new Quantifying Holistic Review: A Multi-Modal Approach to College Admissions Prediction

Authors: Jun-Wei Zeng, Jerry Shen

Abstract: This paper introduces the Comprehensive Applicant Profile Score (CAPS), a novel multi-modal framework designed to quantitatively model and interpret holistic college admissions evaluations. CAPS decomposes applicant profiles into three interpretable components: academic performance (Standardized Academic Score, SAS), essay quality (Essay Quality Index, EQI), and extracurricular engagement (Extracurricular Impact Score, EIS). Leveraging transformer-based semantic embeddings, LLM scoring, and XGBoost regression, CAPS provides transparent and explainable evaluations aligned with human judgment. Experiments on a synthetic but realistic dataset demonstrate strong performance, achieving an EQI prediction R^2 of 0.80, classification accuracy over 75%, a macro F1 score of 0.69, and a weighted F1 score of 0.74. CAPS addresses key limitations in traditional holistic review -- particularly the opacity, inconsistency, and anxiety faced by applicants -- thus paving the way for more equitable and data-informed admissions practices.

new RDMA: Cost Effective Agent-Driven Rare Disease Discovery within Electronic Health Record Systems

Authors: John Wu, Adam Cross, Jimeng Sun

Abstract: Rare diseases affect 1 in 10 Americans, yet standard ICD coding systems fail to capture these conditions in electronic health records (EHR), leaving crucial information buried in clinical notes. Current approaches struggle with medical abbreviations, miss implicit disease mentions, raise privacy concerns with cloud processing, and lack clinical reasoning abilities. We present Rare Disease Mining Agents (RDMA), a framework that mirrors how medical experts identify rare disease patterns in EHR. RDMA connects scattered clinical observations that together suggest specific rare conditions. By handling clinical abbreviations, recognizing implicit disease patterns, and applying contextual reasoning locally on standard hardware, RDMA reduces privacy risks while improving F1 performance by upwards of 30\% and decreasing inferences costs 10-fold. This approach helps clinicians avoid the privacy risk of using cloud services while accessing key rare disease information from EHR systems, supporting earlier diagnosis for rare disease patients. Available at https://github.com/jhnwu3/RDMA.

URLs: https://github.com/jhnwu3/RDMA.

new An open dataset of neural networks for hypernetwork research

Authors: David Kurtenbach, Lior Shamir

Abstract: Despite the transformative potential of AI, the concept of neural networks that can produce other neural networks by generating model weights (hypernetworks) has been largely understudied. One of the possible reasons is the lack of available research resources that can be used for the purpose of hypernetwork research. Here we describe a dataset of neural networks, designed for the purpose of hypernetworks research. The dataset includes $10^4$ LeNet-5 neural networks trained for binary image classification separated into 10 classes, such that each class contains 1,000 different neural networks that can identify a certain ImageNette V2 class from all other classes. A computing cluster of over $10^4$ cores was used to generate the dataset. Basic classification results show that the neural networks can be classified with accuracy of 72.0%, indicating that the differences between the neural networks can be identified by supervised machine learning algorithms. The ultimate purpose of the dataset is to enable hypernetworks research. The dataset and the code that generates it are open and accessible to the public.

new Prompt Smart, Pay Less: Cost-Aware APO for Real-World Applications

Authors: Jayesh Choudhari, Piyush Kumar Singh, Douglas McIlwraith, Snehal Nair

Abstract: Prompt design is a critical factor in the effectiveness of Large Language Models (LLMs), yet remains largely heuristic, manual, and difficult to scale. This paper presents the first comprehensive evaluation of Automatic Prompt Optimization (APO) methods for real-world, high-stakes multiclass classification in a commercial setting, addressing a critical gap in the existing literature where most of the APO frameworks have been validated only on benchmark classification tasks of limited complexity. We introduce APE-OPRO, a novel hybrid framework that combines the complementary strengths of APE and OPRO, achieving notably better cost-efficiency, around $18\%$ improvement over OPRO, without sacrificing performance. We benchmark APE-OPRO alongside both gradient-free (APE, OPRO) and gradient-based (ProTeGi) methods on a dataset of ~2,500 labeled products. Our results highlight key trade-offs: ProTeGi offers the strongest absolute performance at lower API cost but higher computational time as noted in~\cite{protegi}, while APE-OPRO strikes a compelling balance between performance, API efficiency, and scalability. We further conduct ablation studies on depth and breadth hyperparameters, and reveal notable sensitivity to label formatting, indicating implicit sensitivity in LLM behavior. These findings provide actionable insights for implementing APO in commercial applications and establish a foundation for future research in multi-label, vision, and multimodal prompt optimization scenarios.

new ReDi: Rectified Discrete Flow

Authors: Jaehoon Yoo, Wonjung Kim, Seunghoon Hong

Abstract: Discrete Flow-based Models (DFMs) are powerful generative models for high-quality discrete data but typically suffer from slow sampling speeds due to their reliance on iterative decoding processes. This reliance on a multi-step process originates from the factorization approximation of DFMs, which is necessary for handling high-dimensional data. In this paper, we rigorously characterize the approximation error from factorization using Conditional Total Correlation (TC), which depends on the coupling. To reduce the Conditional TC and enable efficient few-step generation, we propose Rectified Discrete Flow (ReDi), a novel iterative method that reduces factorization error by rectifying the coupling between source and target distributions. We theoretically prove that each ReDi step guarantees a monotonic decreasing Conditional TC, ensuring its convergence. Empirically, ReDi significantly reduces Conditional TC and enables few-step generation. Moreover, we demonstrate that the rectified couplings are well-suited for training efficient one-step models on image generation. ReDi offers a simple and theoretically grounded approach for tackling the few-step challenge, providing a new perspective on efficient discrete data synthesis. Code is available at https://github.com/Ugness/ReDi_discrete

URLs: https://github.com/Ugness/ReDi_discrete

new Improving the Generation of VAEs with High Dimensional Latent Spaces by the use of Hyperspherical Coordinates

Authors: Alejandro Ascarate, Leo Lebrat, Rodrigo Santa Cruz, Clinton Fookes, Olivier Salvado

Abstract: Variational autoencoders (VAE) encode data into lower-dimensional latent vectors before decoding those vectors back to data. Once trained, decoding a random latent vector from the prior usually does not produce meaningful data, at least when the latent space has more than a dozen dimensions. In this paper, we investigate this issue by drawing insight from high dimensional statistics: in these regimes, the latent vectors of a standard VAE are by construction distributed uniformly on a hypersphere. We propose to formulate the latent variables of a VAE using hyperspherical coordinates, which allows compressing the latent vectors towards an island on the hypersphere, thereby reducing the latent sparsity and we show that this improves the generation ability of the VAE. We propose a new parameterization of the latent space with limited computational overhead.

new Towards Mitigation of Hallucination for LLM-empowered Agents: Progressive Generalization Bound Exploration and Watchdog Monitor

Authors: Siyuan Liu, Wenjing Liu, Zhiwei Xu, Xin Wang, Bo Chen, Tao Li

Abstract: Empowered by large language models (LLMs), intelligent agents have become a popular paradigm for interacting with open environments to facilitate AI deployment. However, hallucinations generated by LLMs-where outputs are inconsistent with facts-pose a significant challenge, undermining the credibility of intelligent agents. Only if hallucinations can be mitigated, the intelligent agents can be used in real-world without any catastrophic risk. Therefore, effective detection and mitigation of hallucinations are crucial to ensure the dependability of agents. Unfortunately, the related approaches either depend on white-box access to LLMs or fail to accurately identify hallucinations. To address the challenge posed by hallucinations of intelligent agents, we present HalMit, a novel black-box watchdog framework that models the generalization bound of LLM-empowered agents and thus detect hallucinations without requiring internal knowledge of the LLM's architecture. Specifically, a probabilistic fractal sampling technique is proposed to generate a sufficient number of queries to trigger the incredible responses in parallel, efficiently identifying the generalization bound of the target agent. Experimental evaluations demonstrate that HalMit significantly outperforms existing approaches in hallucination monitoring. Its black-box nature and superior performance make HalMit a promising solution for enhancing the dependability of LLM-powered systems.

new Fast-VAT: Accelerating Cluster Tendency Visualization using Cython and Numba

Authors: MSR Avinash (Presidency University, Bangalore), Ismael Lachheb (EPITA School of Engineering,Computer Science, Paris, France)

Abstract: Visual Assessment of Cluster Tendency (VAT) is a widely used unsupervised technique to assess the presence of cluster structure in unlabeled datasets. However, its standard implementation suffers from significant performance limitations due to its O(n^2) time complexity and inefficient memory usage. In this work, we present Fast-VAT, a high-performance reimplementation of the VAT algorithm in Python, augmented with Numba's Just-In-Time (JIT) compilation and Cython's static typing and low-level memory optimizations. Our approach achieves up to 50x speedup over the baseline implementation, while preserving the output fidelity of the original method. We validate Fast-VAT on a suite of real and synthetic datasets -- including Iris, Mall Customers, and Spotify subsets -- and verify cluster tendency using Hopkins statistics, PCA, and t-SNE. Additionally, we compare VAT's structural insights with clustering results from DBSCAN and K-Means to confirm its reliability.

new Foundation Models and Transformers for Anomaly Detection: A Survey

Authors: Mou\"in Ben Ammar, Arturo Mendoza, Nacim Belkhir, Antoine Manzanera, Gianni Franchi

Abstract: In line with the development of deep learning, this survey examines the transformative role of Transformers and foundation models in advancing visual anomaly detection (VAD). We explore how these architectures, with their global receptive fields and adaptability, address challenges such as long-range dependency modeling, contextual modeling and data scarcity. The survey categorizes VAD methods into reconstruction-based, feature-based and zero/few-shot approaches, highlighting the paradigm shift brought about by foundation models. By integrating attention mechanisms and leveraging large-scale pre-training, Transformers and foundation models enable more robust, interpretable, and scalable anomaly detection solutions. This work provides a comprehensive review of state-of-the-art techniques, their strengths, limitations, and emerging trends in leveraging these architectures for VAD.

new Towards Reliable, Uncertainty-Aware Alignment

Authors: Debangshu Banerjee, Kintan Saha, Aditya Gopalan

Abstract: Alignment of large language models (LLMs) typically involves training a reward model on preference data, followed by policy optimization with respect to the reward model. However, optimizing policies with respect to a single reward model estimate can render it vulnerable to inaccuracies in the reward model. We empirically study the variability of reward model training on open-source benchmarks. We observe that independently trained reward models on the same preference dataset can exhibit substantial disagreement, highlighting the instability of current alignment strategies. Employing a theoretical model, we demonstrate that variability in reward model estimation can cause overfitting, leading to the risk of performance degradation. To mitigate this risk, we propose a variance-aware policy optimization framework for preference-based alignment. The key ingredient of the framework is a new policy regularizer that incorporates reward model variance estimates. We show that variance-aware policy optimization provably reduces the risk of outputting a worse policy than the default. Experiments across diverse LLM and reward model configurations confirm that our approach yields more stable and robust alignment than the standard (variance-unaware) pipeline.

new Dual Turing Test: A Framework for Detecting and Mitigating Undetectable AI

Authors: Alberto Messina

Abstract: In this short note, we propose a unified framework that bridges three areas: (1) a flipped perspective on the Turing Test, the "dual Turing test", in which a human judge's goal is to identify an AI rather than reward a machine for deception; (2) a formal adversarial classification game with explicit quality constraints and worst-case guarantees; and (3) a reinforcement learning (RL) alignment pipeline that uses an undetectability detector and a set of quality related components in its reward model. We review historical precedents, from inverted and meta-Turing variants to modern supervised reverse-Turing classifiers, and highlight the novelty of combining quality thresholds, phased difficulty levels, and minimax bounds. We then formalize the dual test: define the judge's task over N independent rounds with fresh prompts drawn from a prompt space Q, introduce a quality function Q and parameters tau and delta, and cast the interaction as a two-player zero-sum game over the adversary's feasible strategy set M. Next, we map this minimax game onto an RL-HF style alignment loop, in which an undetectability detector D provides negative reward for stealthy outputs, balanced by a quality proxy that preserves fluency. Throughout, we include detailed explanations of each component notation, the meaning of inner minimization over sequences, phased tests, and iterative adversarial training and conclude with a suggestion for a couple of immediate actions.

new HyDRA: A Hybrid-Driven Reasoning Architecture for Verifiable Knowledge Graphs

Authors: Adrian Kaiser, Claudiu Leoveanu-Condrei, Ryan Gold, Marius-Constantin Dinu, Markus Hofmarcher

Abstract: The synergy between symbolic knowledge, often represented by Knowledge Graphs (KGs), and the generative capabilities of neural networks is central to advancing neurosymbolic AI. A primary bottleneck in realizing this potential is the difficulty of automating KG construction, which faces challenges related to output reliability, consistency, and verifiability. These issues can manifest as structural inconsistencies within the generated graphs, such as the formation of disconnected $\textit{isolated islands}$ of data or the inaccurate conflation of abstract classes with specific instances. To address these challenges, we propose HyDRA, a $\textbf{Hy}$brid-$\textbf{D}$riven $\textbf{R}$easoning $\textbf{A}$rchitecture designed for verifiable KG automation. Given a domain or an initial set of documents, HyDRA first constructs an ontology via a panel of collaborative neurosymbolic agents. These agents collaboratively agree on a set of competency questions (CQs) that define the scope and requirements the ontology must be able to answer. Given these CQs, we build an ontology graph that subsequently guides the automated extraction of triplets for KG generation from arbitrary documents. Inspired by design-by-contracts (DbC) principles, our method leverages verifiable contracts as the primary control mechanism to steer the generative process of Large Language Models (LLMs). To verify the output of our approach, we extend beyond standard benchmarks and propose an evaluation framework that assesses the functional correctness of the resulting KG by leveraging symbolic verifications as described by the neurosymbolic AI framework, $\textit{SymbolicAI}$. This work contributes a hybrid-driven architecture for improving the reliability of automated KG construction and the exploration of evaluation methods for measuring the functional integrity of its output. The code is publicly available.

new On the transferability of Sparse Autoencoders for interpreting compressed models

Authors: Suchit Gupte, Vishnu Kabir Chhabra, Mohammad Mahdi Khalili

Abstract: Modern LLMs face inference efficiency challenges due to their scale. To address this, many compression methods have been proposed, such as pruning and quantization. However, the effect of compression on a model's interpretability remains elusive. While several model interpretation approaches exist, such as circuit discovery, Sparse Autoencoders (SAEs) have proven particularly effective in decomposing a model's activation space into its feature basis. In this work, we explore the differences in SAEs for the original and compressed models. We find that SAEs trained on the original model can interpret the compressed model albeit with slight performance degradation compared to the trained SAE on the compressed model. Furthermore, simply pruning the original SAE itself achieves performance comparable to training a new SAE on the pruned model. This finding enables us to mitigate the extensive training costs of SAEs.

new Semantic-Aware Gaussian Process Calibration with Structured Layerwise Kernels for Deep Neural Networks

Authors: Kyung-hwan Lee, Kyung-tae Kim

Abstract: Calibrating the confidence of neural network classifiers is essential for quantifying the reliability of their predictions during inference. However, conventional Gaussian Process (GP) calibration methods often fail to capture the internal hierarchical structure of deep neural networks, limiting both interpretability and effectiveness for assessing predictive reliability. We propose a Semantic-Aware Layer-wise Gaussian Process (SAL-GP) framework that mirrors the layered architecture of the target neural network. Instead of applying a single global GP correction, SAL-GP employs a multi-layer GP model, where each layer's feature representation is mapped to a local calibration correction. These layerwise GPs are coupled through a structured multi-layer kernel, enabling joint marginalization across all layers. This design allows SAL-GP to capture both local semantic dependencies and global calibration coherence, while consistently propagating predictive uncertainty through the network. The resulting framework enhances interpretability aligned with the network architecture and enables principled evaluation of confidence consistency and uncertainty quantification in deep models.

new Enhancing Stability of Physics-Informed Neural Network Training Through Saddle-Point Reformulation

Authors: Dmitry Bylinkin, Mikhail Aleksandrov, Savelii Chezhegov, Aleksandr Beznosikov

Abstract: Physics-informed neural networks (PINNs) have gained prominence in recent years and are now effectively used in a number of applications. However, their performance remains unstable due to the complex landscape of the loss function. To address this issue, we reformulate PINN training as a nonconvex-strongly concave saddle-point problem. After establishing the theoretical foundation for this approach, we conduct an extensive experimental study, evaluating its effectiveness across various tasks and architectures. Our results demonstrate that the proposed method outperforms the current state-of-the-art techniques.

new Neural Probabilistic Shaping: Joint Distribution Learning for Optical Fiber Communications

Authors: Mohammad Taha Askari, Lutz Lampe, Amirhossein Ghazisaeidi

Abstract: We present an autoregressive end-to-end learning approach for probabilistic shaping on nonlinear fiber channels. Our proposed scheme learns the joint symbol distribution and provides a 0.3-bits/2D achievable information rate gain over an optimized marginal distribution for dual-polarized 64-QAM transmission over a single-span 205 km link.

new Reactivation: Empirical NTK Dynamics Under Task Shifts

Authors: Yuzhi Liu, Zixuan Chen, Zirui Zhang, Yufei Liu, Giulia Lanzillotta

Abstract: The Neural Tangent Kernel (NTK) offers a powerful tool to study the functional dynamics of neural networks. In the so-called lazy, or kernel regime, the NTK remains static during training and the network function is linear in the static neural tangents feature space. The evolution of the NTK during training is necessary for feature learning, a key driver of deep learning success. The study of the NTK dynamics has led to several critical discoveries in recent years, in generalization and scaling behaviours. However, this body of work has been limited to the single task setting, where the data distribution is assumed constant over time. In this work, we present a comprehensive empirical analysis of NTK dynamics in continual learning, where the data distribution shifts over time. Our findings highlight continual learning as a rich and underutilized testbed for probing the dynamics of neural training. At the same time, they challenge the validity of static-kernel approximations in theoretical treatments of continual learning, even at large scale.

new A Lower Bound for the Number of Linear Regions of Ternary ReLU Regression Neural Networks

Authors: Yuta Nakahara, Manabu Kobayashi, Toshiyasu Matsushima

Abstract: With the advancement of deep learning, reducing computational complexity and memory consumption has become a critical challenge, and ternary neural networks (NNs) that restrict parameters to $\{-1, 0, +1\}$ have attracted attention as a promising approach. While ternary NNs demonstrate excellent performance in practical applications such as image recognition and natural language processing, their theoretical understanding remains insufficient. In this paper, we theoretically analyze the expressivity of ternary NNs from the perspective of the number of linear regions. Specifically, we evaluate the number of linear regions of ternary regression NNs with Rectified Linear Unit (ReLU) for activation functions and prove that the number of linear regions increases polynomially with respect to network width and exponentially with respect to depth, similar to standard NNs. Moreover, we show that it suffices to either square the width or double the depth of ternary NNs to achieve a lower bound on the maximum number of linear regions comparable to that of general ReLU regression NNs. This provides a theoretical explanation, in some sense, for the practical success of ternary NNs.

new TorchAO: PyTorch-Native Training-to-Serving Model Optimization

Authors: Andrew Or, Apurva Jain, Daniel Vega-Myhre, Jesse Cai, Charles David Hernandez, Zhenrui Zheng, Driss Guessous, Vasiliy Kuznetsov, Christian Puhrsch, Mark Saroufim, Supriya Rao, Thien Tran, Aleksandar Samard\v{z}i\'c

Abstract: We present TorchAO, a PyTorch-native model optimization framework leveraging quantization and sparsity to provide an end-to-end, training-to-serving workflow for AI models. TorchAO supports a variety of popular model optimization techniques, including FP8 quantized training, quantization-aware training (QAT), post-training quantization (PTQ), and 2:4 sparsity, and leverages a novel tensor subclass abstraction to represent a variety of widely-used, backend agnostic low precision data types, including INT4, INT8, FP8, MXFP4, MXFP6, and MXFP8. TorchAO integrates closely with the broader ecosystem at each step of the model optimization pipeline, from pre-training (TorchTitan) to fine-tuning (TorchTune, Axolotl) to serving (HuggingFace, vLLM, SGLang, ExecuTorch), connecting an otherwise fragmented space in a single, unified workflow. TorchAO has enabled recent launches of the quantized Llama 3.2 1B/3B and LlamaGuard3-8B models and is open-source at https://github.com/pytorch/ao/.

URLs: https://github.com/pytorch/ao/.

new Learning Patient-Specific Spatial Biomarker Dynamics via Operator Learning for Alzheimer's Disease Progression

Authors: Jindong Wang, Yutong Mao, Xiao Liu, Wenrui Hao

Abstract: Alzheimer's disease (AD) is a complex, multifactorial neurodegenerative disorder with substantial heterogeneity in progression and treatment response. Despite recent therapeutic advances, predictive models capable of accurately forecasting individualized disease trajectories remain limited. Here, we present a machine learning-based operator learning framework for personalized modeling of AD progression, integrating longitudinal multimodal imaging, biomarker, and clinical data. Unlike conventional models with prespecified dynamics, our approach directly learns patient-specific disease operators governing the spatiotemporal evolution of amyloid, tau, and neurodegeneration biomarkers. Using Laplacian eigenfunction bases, we construct geometry-aware neural operators capable of capturing complex brain dynamics. Embedded within a digital twin paradigm, the framework enables individualized predictions, simulation of therapeutic interventions, and in silico clinical trials. Applied to AD clinical data, our method achieves high prediction accuracy exceeding 90% across multiple biomarkers, substantially outperforming existing approaches. This work offers a scalable, interpretable platform for precision modeling and personalized therapeutic optimization in neurodegenerative diseases.

new LLM Data Selection and Utilization via Dynamic Bi-level Optimization

Authors: Yang Yu, Kai Han, Hang Zhou, Yehui Tang, Kaiqi Huang, Yunhe Wang, Dacheng Tao

Abstract: While large-scale training data is fundamental for developing capable large language models (LLMs), strategically selecting high-quality data has emerged as a critical approach to enhance training efficiency and reduce computational costs. Current data selection methodologies predominantly rely on static, training-agnostic criteria, failing to account for the dynamic model training and data interactions. In this paper, we propose a new Data Weighting Model (DWM) to adjust the weight of selected data within each batch to achieve a dynamic data utilization during LLM training. Specially, to better capture the dynamic data preference of the trained model, a bi-level optimization framework is implemented to update the weighting model. Our experiments demonstrate that DWM enhances the performance of models trained with randomly-selected data, and the learned weighting model can be transferred to enhance other data selection methods and models of different sizes. Moreover, we further analyze how a model's data preferences evolve throughout training, providing new insights into the data preference of the model during training.

new EBaReT: Expert-guided Bag Reward Transformer for Auto Bidding

Authors: Kaiyuan Li, Pengyu Wang, Yunshan Peng, Pengjia Yuan, Yanxiang Zeng, Rui Xiang, Yanhua Cheng, Xialong Liu, Peng Jiang

Abstract: Reinforcement learning has been widely applied in automated bidding. Traditional approaches model bidding as a Markov Decision Process (MDP). Recently, some studies have explored using generative reinforcement learning methods to address long-term dependency issues in bidding environments. Although effective, these methods typically rely on supervised learning approaches, which are vulnerable to low data quality due to the amount of sub-optimal bids and low probability rewards resulting from the low click and conversion rates. Unfortunately, few studies have addressed these challenges. In this paper, we formalize the automated bidding as a sequence decision-making problem and propose a novel Expert-guided Bag Reward Transformer (EBaReT) to address concerns related to data quality and uncertainty rewards. Specifically, to tackle data quality issues, we generate a set of expert trajectories to serve as supplementary data in the training process and employ a Positive-Unlabeled (PU) learning-based discriminator to identify expert transitions. To ensure the decision also meets the expert level, we further design a novel expert-guided inference strategy. Moreover, to mitigate the uncertainty of rewards, we consider the transitions within a certain period as a "bag" and carefully design a reward function that leads to a smoother acquisition of rewards. Extensive experiments demonstrate that our model achieves superior performance compared to state-of-the-art bidding methods.

new RealBench: Benchmarking Verilog Generation Models with Real-World IP Designs

Authors: Pengwei Jin, Di Huang, Chongxiao Li, Shuyao Cheng, Yang Zhao, Xinyao Zheng, Jiaguo Zhu, Shuyi Xing, Bohan Dou, Rui Zhang, Zidong Du, Qi Guo, Xing Hu

Abstract: The automatic generation of Verilog code using Large Language Models (LLMs) has garnered significant interest in hardware design automation. However, existing benchmarks for evaluating LLMs in Verilog generation fall short in replicating real-world design workflows due to their designs' simplicity, inadequate design specifications, and less rigorous verification environments. To address these limitations, we present RealBench, the first benchmark aiming at real-world IP-level Verilog generation tasks. RealBench features complex, structured, real-world open-source IP designs, multi-modal and formatted design specifications, and rigorous verification environments, including 100% line coverage testbenches and a formal checker. It supports both module-level and system-level tasks, enabling comprehensive assessments of LLM capabilities. Evaluations on various LLMs and agents reveal that even one of the best-performing LLMs, o1-preview, achieves only a 13.3% pass@1 on module-level tasks and 0% on system-level tasks, highlighting the need for stronger Verilog generation models in the future. The benchmark is open-sourced at https://github.com/IPRC-DIP/RealBench.

URLs: https://github.com/IPRC-DIP/RealBench.

new METER: Multi-modal Evidence-based Thinking and Explainable Reasoning -- Algorithm and Benchmark

Authors: Xu Yang, Qi Zhang, Shuming Jiang, Yaowen Xu, Zhaofan Zou, Hao Sun, Xuelong Li

Abstract: With the rapid advancement of generative AI, synthetic content across images, videos, and audio has become increasingly realistic, amplifying the risk of misinformation. Existing detection approaches predominantly focus on binary classification while lacking detailed and interpretable explanations of forgeries, which limits their applicability in safety-critical scenarios. Moreover, current methods often treat each modality separately, without a unified benchmark for cross-modal forgery detection and interpretation. To address these challenges, we introduce METER, a unified, multi-modal benchmark for interpretable forgery detection spanning images, videos, audio, and audio-visual content. Our dataset comprises four tracks, each requiring not only real-vs-fake classification but also evidence-chain-based explanations, including spatio-temporal localization, textual rationales, and forgery type tracing. Compared to prior benchmarks, METER offers broader modality coverage and richer interpretability metrics such as spatial/temporal IoU, multi-class tracing, and evidence consistency. We further propose a human-aligned, three-stage Chain-of-Thought (CoT) training strategy combining SFT, DPO, and a novel GRPO stage that integrates a human-aligned evaluator with CoT reasoning. We hope METER will serve as a standardized foundation for advancing generalizable and interpretable forgery detection in the era of generative media.

new Aligned Manifold Property and Topology Point Clouds for Learning Molecular Properties

Authors: Alexander Mihalcea

Abstract: Machine learning models for molecular property prediction generally rely on representations -- such as SMILES strings and molecular graphs -- that overlook the surface-local phenomena driving intermolecular behavior. 3D-based approaches often reduce surface detail or require computationally expensive SE(3)-equivariant architectures to manage spatial variance. To overcome these limitations, this work introduces AMPTCR (Aligned Manifold Property and Topology Cloud Representation), a molecular surface representation that combines local quantum-derived scalar fields and custom topological descriptors within an aligned point cloud format. Each surface point includes a chemically meaningful scalar, geodesically derived topology vectors, and coordinates transformed into a canonical reference frame, enabling efficient learning with conventional SE(3)-sensitive architectures. AMPTCR is evaluated using a DGCNN framework on two tasks: molecular weight and bacterial growth inhibition. For molecular weight, results confirm that AMPTCR encodes physically meaningful data, with a validation R^2 of 0.87. In the bacterial inhibition task, AMPTCR enables both classification and direct regression of E. coli inhibition values using Dual Fukui functions as the electronic descriptor and Morgan Fingerprints as auxiliary data, achieving an ROC AUC of 0.912 on the classification task, and an R^2 of 0.54 on the regression task. These results help demonstrate that AMPTCR offers a compact, expressive, and architecture-agnostic representation for modeling surface-mediated molecular properties.

new Multi-Agent Reinforcement Learning for Sample-Efficient Deep Neural Network Mapping

Authors: Srivatsan Krishnan, Jason Jabbour, Dan Zhang, Natasha Jaques, Aleksandra Faust, Shayegan Omidshafiei, Vijay Janapa Reddi

Abstract: Mapping deep neural networks (DNNs) to hardware is critical for optimizing latency, energy consumption, and resource utilization, making it a cornerstone of high-performance accelerator design. Due to the vast and complex mapping space, reinforcement learning (RL) has emerged as a promising approach-but its effectiveness is often limited by sample inefficiency. We present a decentralized multi-agent reinforcement learning (MARL) framework designed to overcome this challenge. By distributing the search across multiple agents, our framework accelerates exploration. To avoid inefficiencies from training multiple agents in parallel, we introduce an agent clustering algorithm that assigns similar mapping parameters to the same agents based on correlation analysis. This enables a decentralized, parallelized learning process that significantly improves sample efficiency. Experimental results show our MARL approach improves sample efficiency by 30-300x over standard single-agent RL, achieving up to 32.61x latency reduction and 16.45x energy-delay product (EDP) reduction under iso-sample conditions.

new Reducing GPU Memory Fragmentation via Spatio-Temporal Planning for Efficient Large-Scale Model Training

Authors: Zixiao Huang, Junhao Hu, Hao Lin, Chunyang Zhu, Yueran Tang, Quanlu Zhang, Zhen Guo, Zhenhua Li, Shengen Yan, Zhenhua Zhu, Guohao Dai, Yu Wang

Abstract: The rapid scaling of large language models (LLMs) has significantly increased GPU memory pressure, which is further aggravated by training optimization techniques such as virtual pipeline and recomputation that disrupt tensor lifespans and introduce considerable memory fragmentation. Default GPU memory allocators of popular deep learning frameworks like PyTorch use online strategies without knowledge of tensor lifespans, which can waste up to 43\% of memory and cause out-of-memory errors, rendering optimization techniques ineffective or even unusable. To address this, we introduce STWeaver, a GPU memory allocator for deep learning frameworks that reduces fragmentation by exploiting the spatial and temporal regularity in memory allocation behaviors of training workloads. STWeaver introduces a novel paradigm that combines offline planning with online allocation. The offline planning leverages spatio-temporal regularities to generate a near-optimal allocation plan, while the online allocation handles complex and dynamic models such as Mixture-of-Experts (MoE). Built as a pluggable PyTorch allocator, STWeaver reduces fragmentation ratio on average by 79.2\% (up to 100\%) across both dense and sparse models, with negligible overhead. This enables more efficient, high-throughput training configurations and improves performance by up to 32.5\%.

new Understanding Generalization, Robustness, and Interpretability in Low-Capacity Neural Networks

Authors: Yash Kumar

Abstract: Although modern deep learning often relies on massive over-parameterized models, the fundamental interplay between capacity, sparsity, and robustness in low-capacity networks remains a vital area of study. We introduce a controlled framework to investigate these properties by creating a suite of binary classification tasks from the MNIST dataset with increasing visual difficulty (e.g., 0 and 1 vs. 4 and 9). Our experiments reveal three core findings. First, the minimum model capacity required for successful generalization scales directly with task complexity. Second, these trained networks are robust to extreme magnitude pruning (up to 95% sparsity), revealing the existence of sparse, high-performing subnetworks. Third, we show that over-parameterization provides a significant advantage in robustness against input corruption. Interpretability analysis via saliency maps further confirms that these identified sparse subnetworks preserve the core reasoning process of the original dense models. This work provides a clear, empirical demonstration of the foundational trade-offs governing simple neural networks.

new Towards Resilient Safety-driven Unlearning for Diffusion Models against Downstream Fine-tuning

Authors: Boheng Li, Renjie Gu, Junjie Wang, Leyi Qi, Yiming Li, Run Wang, Zhan Qin, Tianwei Zhang

Abstract: Text-to-image (T2I) diffusion models have achieved impressive image generation quality and are increasingly fine-tuned for personalized applications. However, these models often inherit unsafe behaviors from toxic pretraining data, raising growing safety concerns. While recent safety-driven unlearning methods have made promising progress in suppressing model toxicity, they are identified to be fragile to downstream fine-tuning, where we reveal that state-of-the-art methods largely fail to retain their effectiveness even when fine-tuned on entirely benign datasets. To mitigate this problem, in this paper, we propose ResAlign, a safety-driven unlearning framework with enhanced resilience against downstream fine-tuning. By modeling downstream fine-tuning as an implicit optimization problem with a Moreau Envelope-based reformulation, ResAlign enables efficient gradient estimation to minimize the recovery of harmful behaviors. Additionally, a meta-learning strategy is proposed to simulate a diverse distribution of fine-tuning scenarios to improve generalization. Extensive experiments across a wide range of datasets, fine-tuning methods, and configurations demonstrate that ResAlign consistently outperforms prior unlearning approaches in retaining safety after downstream fine-tuning while preserving benign generation capability well.

new Perovskite-R1: A Domain-Specialized LLM for Intelligent Discovery of Precursor Additives and Experimental Design

Authors: Xin-De Wang, Zhi-Rui Chen, Peng-Jie Guo, Ze-Feng Gao, Cheng Mu, Zhong-Yi Lu

Abstract: Perovskite solar cells (PSCs) have rapidly emerged as a leading contender in next-generation photovoltaic technologies, owing to their exceptional power conversion efficiencies and advantageous material properties. Despite these advances, challenges such as long-term stability, environmental sustainability, and scalable manufacturing continue to hinder their commercialization. Precursor additive engineering has shown promise in addressing these issues by enhancing both the performance and durability of PSCs. However, the explosive growth of scientific literature and the complex interplay of materials, processes, and device architectures make it increasingly difficult for researchers to efficiently access, organize, and utilize domain knowledge in this rapidly evolving field. To address this gap, we introduce Perovskite-R1, a specialized large language model (LLM) with advanced reasoning capabilities tailored for the discovery and design of PSC precursor additives. By systematically mining and curating 1,232 high-quality scientific publications and integrating a comprehensive library of 33,269 candidate materials, we constructed a domain-specific instruction-tuning dataset using automated question-answer generation and chain-of-thought reasoning. Fine-tuning the QwQ-32B model on this dataset resulted in Perovskite-R1, which can intelligently synthesize literature insights and generate innovative and practical solutions for defect passivation and the selection of precursor additives. Experimental validation of several model-proposed strategies confirms their effectiveness in improving material stability and performance. Our work demonstrates the potential of domain-adapted LLMs in accelerating materials discovery and provides a closed-loop framework for intelligent, data-driven advancements in perovskite photovoltaic research.

new The Cost of Compression: Tight Quadratic Black-Box Attacks on Sketches for $\ell_2$ Norm Estimation

Authors: Sara Ahmadian, Edith Cohen, Uri Stemmer

Abstract: Dimensionality reduction via linear sketching is a powerful and widely used technique, but it is known to be vulnerable to adversarial inputs. We study the black-box adversarial setting, where a fixed, hidden sketching matrix A in $R^{k X n}$ maps high-dimensional vectors v $\in R^n$ to lower-dimensional sketches A v in $R^k$, and an adversary can query the system to obtain approximate ell2-norm estimates that are computed from the sketch. We present a universal, nonadaptive attack that, using tilde(O)($k^2$) queries, either causes a failure in norm estimation or constructs an adversarial input on which the optimal estimator for the query distribution (used by the attack) fails. The attack is completely agnostic to the sketching matrix and to the estimator: It applies to any linear sketch and any query responder, including those that are randomized, adaptive, or tailored to the query distribution. Our lower bound construction tightly matches the known upper bounds of tilde(Omega)($k^2$), achieved by specialized estimators for Johnson Lindenstrauss transforms and AMS sketches. Beyond sketching, our results uncover structural parallels to adversarial attacks in image classification, highlighting fundamental vulnerabilities of compressed representations.

new Leveraging Personalized PageRank and Higher-Order Topological Structures for Heterophily Mitigation in Graph Neural Networks

Authors: Yumeng Wang, Zengyi Wo, Wenjun Wang, Xingcheng Fu, Minglai Shao

Abstract: Graph Neural Networks (GNNs) excel in node classification tasks but often assume homophily, where connected nodes share similar labels. This assumption does not hold in many real-world heterophilic graphs. Existing models for heterophilic graphs primarily rely on pairwise relationships, overlooking multi-scale information from higher-order structures. This leads to suboptimal performance, particularly under noise from conflicting class information across nodes. To address these challenges, we propose HPGNN, a novel model integrating Higher-order Personalized PageRank with Graph Neural Networks. HPGNN introduces an efficient high-order approximation of Personalized PageRank (PPR) to capture long-range and multi-scale node interactions. This approach reduces computational complexity and mitigates noise from surrounding information. By embedding higher-order structural information into convolutional networks, HPGNN effectively models key interactions across diverse graph dimensions. Extensive experiments on benchmark datasets demonstrate HPGNN's effectiveness. The model achieves better performance than five out of seven state-of-the-art methods on heterophilic graphs in downstream tasks while maintaining competitive performance on homophilic graphs. HPGNN's ability to balance multi-scale information and robustness to noise makes it a versatile solution for real-world graph learning challenges. Codes are available at https://github.com/streetcorner/HPGNN.

URLs: https://github.com/streetcorner/HPGNN.

new Bipartite Patient-Modality Graph Learning with Event-Conditional Modelling of Censoring for Cancer Survival Prediction

Authors: Hailin Yue, Hulin Kuang, Jin Liu, Junjian Li, Lanlan Wang, Mengshen He, Jianxin Wang

Abstract: Accurately predicting the survival of cancer patients is crucial for personalized treatment. However, existing studies focus solely on the relationships between samples with known survival risks, without fully leveraging the value of censored samples. Furthermore, these studies may suffer performance degradation in modality-missing scenarios and even struggle during the inference process. In this study, we propose a bipartite patient-modality graph learning with event-conditional modelling of censoring for cancer survival prediction (CenSurv). Specifically, we first use graph structure to model multimodal data and obtain representation. Then, to alleviate performance degradation in modality-missing scenarios, we design a bipartite graph to simulate the patient-modality relationship in various modality-missing scenarios and leverage a complete-incomplete alignment strategy to explore modality-agnostic features. Finally, we design a plug-and-play event-conditional modeling of censoring (ECMC) that selects reliable censored data using dynamic momentum accumulation confidences, assigns more accurate survival times to these censored data, and incorporates them as uncensored data into training. Comprehensive evaluations on 5 publicly cancer datasets showcase the superiority of CenSurv over the best state-of-the-art by 3.1% in terms of the mean C-index, while also exhibiting excellent robustness under various modality-missing scenarios. In addition, using the plug-and-play ECMC module, the mean C-index of 8 baselines increased by 1.3% across 5 datasets. Code of CenSurv is available at https://github.com/yuehailin/CenSurv.

URLs: https://github.com/yuehailin/CenSurv.

new Optimization and generalization analysis for two-layer physics-informed neural networks without over-parametrization

Authors: Zhihan Zeng, Yiqi Gu

Abstract: This work focuses on the behavior of stochastic gradient descent (SGD) in solving least-squares regression with physics-informed neural networks (PINNs). Past work on this topic has been based on the over-parameterization regime, whose convergence may require the network width to increase vastly with the number of training samples. So, the theory derived from over-parameterization may incur prohibitive computational costs and is far from practical experiments. We perform new optimization and generalization analysis for SGD in training two-layer PINNs, making certain assumptions about the target function to avoid over-parameterization. Given $\epsilon>0$, we show that if the network width exceeds a threshold that depends only on $\epsilon$ and the problem, then the training loss and expected loss will decrease below $O(\epsilon)$.

new Improving Predictions on Highly Unbalanced Data Using Open Source Synthetic Data Upsampling

Authors: Ivona Krchova, Michael Platzer, Paul Tiwald

Abstract: Unbalanced tabular data sets present significant challenges for predictive modeling and data analysis across a wide range of applications. In many real-world scenarios, such as fraud detection, medical diagnosis, and rare event prediction, minority classes are vastly underrepresented, making it difficult for traditional machine learning algorithms to achieve high accuracy. These algorithms tend to favor the majority class, leading to biased models that struggle to accurately represent minority classes. Synthetic data holds promise for addressing the under-representation of minority classes by providing new, diverse, and highly realistic samples. This paper presents a benchmark study on the use of AI-generated synthetic data for upsampling highly unbalanced tabular data sets. We evaluate the effectiveness of an open-source solution, the Synthetic Data SDK by MOSTLY AI, which provides a flexible and user-friendly approach to synthetic upsampling for mixed-type data. We compare predictive models trained on data sets upsampled with synthetic records to those using standard methods, such as naive oversampling and SMOTE-NC. Our results demonstrate that synthetic data can improve predictive accuracy for minority groups by generating diverse data points that fill gaps in sparse regions of the feature space. We show that upsampled synthetic training data consistently results in top-performing predictive models, particularly for mixed-type data sets containing very few minority samples.

new RIS-aided Latent Space Alignment for Semantic Channel Equalization

Authors: Tom\'as H\"uttebr\"aucker, Mario Edoardo Pandolfo, Simone Fiorellino, Emilio Calvanese Strinati, Paolo Di Lorenzo

Abstract: Semantic communication systems introduce a new paradigm in wireless communications, focusing on transmitting the intended meaning rather than ensuring strict bit-level accuracy. These systems often rely on Deep Neural Networks (DNNs) to learn and encode meaning directly from data, enabling more efficient communication. However, in multi-user settings where interacting agents are trained independently-without shared context or joint optimization-divergent latent representations across AI-native devices can lead to semantic mismatches, impeding mutual understanding even in the absence of traditional transmission errors. In this work, we address semantic mismatch in Multiple-Input Multiple-Output (MIMO) channels by proposing a joint physical and semantic channel equalization framework that leverages the presence of Reconfigurable Intelligent Surfaces (RIS). The semantic equalization is implemented as a sequence of transformations: (i) a pre-equalization stage at the transmitter; (ii) propagation through the RIS-aided channel; and (iii) a post-equalization stage at the receiver. We formulate the problem as a constrained Minimum Mean Squared Error (MMSE) optimization and propose two solutions: (i) a linear semantic equalization chain, and (ii) a non-linear DNN-based semantic equalizer. Both methods are designed to operate under semantic compression in the latent space and adhere to transmit power constraints. Through extensive evaluations, we show that the proposed joint equalization strategies consistently outperform conventional, disjoint approaches to physical and semantic channel equalization across a broad range of scenarios and wireless channel conditions.

new Canonical Correlation Patterns for Validating Clustering of Multivariate Time Series

Authors: Isabella Degen, Zahraa S Abdallah, Kate Robson Brown, Henry W J Reeve

Abstract: Clustering of multivariate time series using correlation-based methods reveals regime changes in relationships between variables across health, finance, and industrial applications. However, validating whether discovered clusters represent distinct relationships rather than arbitrary groupings remains a fundamental challenge. Existing clustering validity indices were developed for Euclidean data, and their effectiveness for correlation patterns has not been systematically evaluated. Unlike Euclidean clustering, where geometric shapes provide discrete reference targets, correlations exist in continuous space without equivalent reference patterns. We address this validation gap by introducing canonical correlation patterns as mathematically defined validation targets that discretise the infinite correlation space into finite, interpretable reference patterns. Using synthetic datasets with perfect ground truth across controlled conditions, we demonstrate that canonical patterns provide reliable validation targets, with L1 norm for mapping and L5 norm for silhouette width criterion and Davies-Bouldin index showing superior performance. These methods are robust to distribution shifts and appropriately detect correlation structure degradation, enabling practical implementation guidelines. This work establishes a methodological foundation for rigorous correlation-based clustering validation in high-stakes domains.

new Analogy making as amortised model construction

Authors: David G. Nagy, Tingke Shen, Hanqi Zhou, Charley M. Wu, Peter Dayan

Abstract: Humans flexibly construct internal models to navigate novel situations. To be useful, these internal models must be sufficiently faithful to the environment that resource-limited planning leads to adequate outcomes; equally, they must be tractable to construct in the first place. We argue that analogy plays a central role in these processes, enabling agents to reuse solution-relevant structure from past experiences and amortise the computational costs of both model construction (construal) and planning. Formalising analogies as partial homomorphisms between Markov decision processes, we sketch a framework in which abstract modules, derived from previous construals, serve as composable building blocks for new ones. This modular reuse allows for flexible adaptation of policies and representations across domains with shared structural essence.

new confopt: A Library for Implementation and Evaluation of Gradient-based One-Shot NAS Methods

Authors: Abhash Kumar Jha, Shakiba Moradian, Arjun Krishnakumar, Martin Rapp, Frank Hutter

Abstract: Gradient-based one-shot neural architecture search (NAS) has significantly reduced the cost of exploring architectural spaces with discrete design choices, such as selecting operations within a model. However, the field faces two major challenges. First, evaluations of gradient-based NAS methods heavily rely on the DARTS benchmark, despite the existence of other available benchmarks. This overreliance has led to saturation, with reported improvements often falling within the margin of noise. Second, implementations of gradient-based one-shot NAS methods are fragmented across disparate repositories, complicating fair and reproducible comparisons and further development. In this paper, we introduce Configurable Optimizer (confopt), an extensible library designed to streamline the development and evaluation of gradient-based one-shot NAS methods. Confopt provides a minimal API that makes it easy for users to integrate new search spaces, while also supporting the decomposition of NAS optimizers into their core components. We use this framework to create a suite of new DARTS-based benchmarks, and combine them with a novel evaluation protocol to reveal a critical flaw in how gradient-based one-shot NAS methods are currently assessed. The code can be found at https://github.com/automl/ConfigurableOptimizer.

URLs: https://github.com/automl/ConfigurableOptimizer.

new Symbolic Graph Intelligence: Hypervector Message Passing for Learning Graph-Level Patterns with Tsetlin Machines

Authors: Christian D. Blakely

Abstract: We propose a multilayered symbolic framework for general graph classification that leverages sparse binary hypervectors and Tsetlin Machines. Each graph is encoded through structured message passing, where node, edge, and attribute information are bound and bundled into a symbolic hypervector. This process preserves the hierarchical semantics of the graph through layered binding from node attributes to edge relations to structural roles resulting in a compact, discrete representation. We also formulate a local interpretability framework which lends itself to a key advantage of our approach being locally interpretable. We validate our method on TUDataset benchmarks, demonstrating competitive accuracy with strong symbolic transparency compared to neural graph models.

new A Comprehensive Data-centric Overview of Federated Graph Learning

Authors: Zhengyu Wu, Xunkai Li, Yinlin Zhu, Zekai Chen, Guochen Yan, Yanyu Yan, Hao Zhang, Yuming Ai, Xinmo Jin, Rong-Hua Li, Guoren Wang

Abstract: In the era of big data applications, Federated Graph Learning (FGL) has emerged as a prominent solution that reconcile the tradeoff between optimizing the collective intelligence between decentralized datasets holders and preserving sensitive information to maximum. Existing FGL surveys have contributed meaningfully but largely focus on integrating Federated Learning (FL) and Graph Machine Learning (GML), resulting in early stage taxonomies that emphasis on methodology and simulated scenarios. Notably, a data centric perspective, which systematically examines FGL methods through the lens of data properties and usage, remains unadapted to reorganize FGL research, yet it is critical to assess how FGL studies manage to tackle data centric constraints to enhance model performances. This survey propose a two-level data centric taxonomy: Data Characteristics, which categorizes studies based on the structural and distributional properties of datasets used in FGL, and Data Utilization, which analyzes the training procedures and techniques employed to overcome key data centric challenges. Each taxonomy level is defined by three orthogonal criteria, each representing a distinct data centric configuration. Beyond taxonomy, this survey examines FGL integration with Pretrained Large Models, showcases realistic applications, and highlights future direction aligned with emerging trends in GML.

new Families of Optimal Transport Kernels for Cell Complexes

Authors: Rahul Khorana

Abstract: Recent advances have discussed cell complexes as ideal learning representations. However, there is a lack of available machine learning methods suitable for learning on CW complexes. In this paper, we derive an explicit expression for the Wasserstein distance between cell complex signal distributions in terms of a Hodge-Laplacian matrix. This leads to a structurally meaningful measure to compare CW complexes and define the optimal transportation map. In order to simultaneously include both feature and structure information, we extend the Fused Gromov-Wasserstein distance to CW complexes. Finally, we introduce novel kernels over the space of probability measures on CW complexes based on the dual formulation of optimal transport.

new Scaling Linear Attention with Sparse State Expansion

Authors: Yuqi Pan, Yongqi An, Zheng Li, Yuhong Chou, Ruijie Zhu, Xiaohui Wang, Mingxuan Wang, Jinqiao Wang, Guoqi Li

Abstract: The Transformer architecture, despite its widespread success, struggles with long-context scenarios due to quadratic computation and linear memory growth. While various linear attention variants mitigate these efficiency constraints by compressing context into fixed-size states, they often degrade performance in tasks such as in-context retrieval and reasoning. To address this limitation and achieve more effective context compression, we propose two key innovations. First, we introduce a row-sparse update formulation for linear attention by conceptualizing state updating as information classification. This enables sparse state updates via softmax-based top-$k$ hard classification, thereby extending receptive fields and reducing inter-class interference. Second, we present Sparse State Expansion (SSE) within the sparse framework, which expands the contextual state into multiple partitions, effectively decoupling parameter size from state capacity while maintaining the sparse classification paradigm. Our design, supported by efficient parallelized implementations, yields effective classification and discriminative state representations. We extensively validate SSE in both pure linear and hybrid (SSE-H) architectures across language modeling, in-context retrieval, and mathematical reasoning benchmarks. SSE demonstrates strong retrieval performance and scales favorably with state size. Moreover, after reinforcement learning (RL) training, our 2B SSE-H model achieves state-of-the-art mathematical reasoning performance among small reasoning models, scoring 64.7 on AIME24 and 51.3 on AIME25, significantly outperforming similarly sized open-source Transformers. These results highlight SSE as a promising and efficient architecture for long-context modeling.

new Meta-Learning for Cold-Start Personalization in Prompt-Tuned LLMs

Authors: Yushang Zhao, Huijie Shen, Dannier Li, Lu Chang, Chengrui Zhou, Yinuo Yang

Abstract: Generative, explainable, and flexible recommender systems, derived using Large Language Models (LLM) are promising and poorly adapted to the cold-start user situation, where there is little to no history of interaction. The current solutions i.e. supervised fine-tuning and collaborative filtering are dense-user-item focused and would be expensive to maintain and update. This paper introduces a meta-learning framework, that can be used to perform parameter-efficient prompt-tuning, to effectively personalize LLM-based recommender systems quickly at cold-start. The model learns soft prompt embeddings with first-order (Reptile) and second-order (MAML) optimization by treating each of the users as the tasks. As augmentations to the input tokens, these learnable vectors are the differentiable control variables that represent user behavioral priors. The prompts are meta-optimized through episodic sampling, inner-loop adaptation, and outer-loop generalization. On MovieLens-1M, Amazon Reviews, and Recbole, we can see that our adaptive model outperforms strong baselines in NDCG@10, HR@10, and MRR, and it runs in real-time (i.e., below 300 ms) on consumer GPUs. Zero-history personalization is also supported by this scalable solution, and its 275 ms rate of adaptation allows successful real-time risk profiling of financial systems by shortening detection latency and improving payment network stability. Crucially, the 275 ms adaptation capability can enable real-time risk profiling for financial institutions, reducing systemic vulnerability detection latency significantly versus traditional compliance checks. By preventing contagion in payment networks (e.g., Fedwire), the framework strengthens national financial infrastructure resilience.

new GASPnet: Global Agreement to Synchronize Phases

Authors: Andrea Alamiaa, Sabine Muzellec, Thomas Serre, Rufin VanRullen

Abstract: In recent years, Transformer architectures have revolutionized most fields of artificial intelligence, relying on an attentional mechanism based on the agreement between keys and queries to select and route information in the network. In previous work, we introduced a novel, brain-inspired architecture that leverages a similar implementation to achieve a global 'routing by agreement' mechanism. Such a system modulates the network's activity by matching each neuron's key with a single global query, pooled across the entire network. Acting as a global attentional system, this mechanism improves noise robustness over baseline levels but is insufficient for multi-classification tasks. Here, we improve on this work by proposing a novel mechanism that combines aspects of the Transformer attentional operations with a compelling neuroscience theory, namely, binding by synchrony. This theory proposes that the brain binds together features by synchronizing the temporal activity of neurons encoding those features. This allows the binding of features from the same object while efficiently disentangling those from distinct objects. We drew inspiration from this theory and incorporated angular phases into all layers of a convolutional network. After achieving phase alignment via Kuramoto dynamics, we use this approach to enhance operations between neurons with similar phases and suppresses those with opposite phases. We test the benefits of this mechanism on two datasets: one composed of pairs of digits and one composed of a combination of an MNIST item superimposed on a CIFAR-10 image. Our results reveal better accuracy than CNN networks, proving more robust to noise and with better generalization abilities. Overall, we propose a novel mechanism that addresses the visual binding problem in neural networks by leveraging the synergy between neuroscience and machine learning.

new Custom Algorithm-based Fault Tolerance for Attention Layers in Transformers

Authors: Vasileios Titopoulos, Kosmas Alexandridis, Giorgos Dimitrakopoulos

Abstract: Transformers and large language models (LLMs), powered by the attention mechanism, have transformed numerous AI applications, driving the need for specialized hardware accelerators. A major challenge in these accelerators is efficiently detecting errors caused by random hardware faults. Traditional algorithm-based fault tolerance (ABFT) techniques verify individual matrix multiplications but fall short in handling the full attention mechanism, particularly due to intermediate softmax normalization. This work proposes Flash-ABFT, a novel method that computes an online checksum across the entire three-matrix product of query, key and value matrices, of an attention layer, including the softmax operation, with a single check. This approach significantly reduces overhead by eliminating redundant checks while maintaining high fault-detection accuracy. Experimental results demonstrate that Flash-ABFT incurs only 5.3% hardware area overhead and less than 1.9% energy overhead, making it a cost-effective and robust solution for error detection in attention accelerators.

new Latent Space Alignment for AI-Native MIMO Semantic Communications

Authors: Mario Edoardo Pandolfo, Simone Fiorellino, Emilio Calvanese Strinati, Paolo Di Lorenzo

Abstract: Semantic communications focus on prioritizing the understanding of the meaning behind transmitted data and ensuring the successful completion of tasks that motivate the exchange of information. However, when devices rely on different languages, logic, or internal representations, semantic mismatches may occur, potentially hindering mutual understanding. This paper introduces a novel approach to addressing latent space misalignment in semantic communications, exploiting multiple-input multiple-output (MIMO) communications. Specifically, our method learns a MIMO precoder/decoder pair that jointly performs latent space compression and semantic channel equalization, mitigating both semantic mismatches and physical channel impairments. We explore two solutions: (i) a linear model, optimized by solving a biconvex optimization problem via the alternating direction method of multipliers (ADMM); (ii) a neural network-based model, which learns semantic MIMO precoder/decoder under transmission power budget and complexity constraints. Numerical results demonstrate the effectiveness of the proposed approach in a goal-oriented semantic communication scenario, illustrating the main trade-offs between accuracy, communication burden, and complexity of the solutions.

new FISHER: A Foundation Model for Multi-Modal Industrial Signal Comprehensive Representation

Authors: Pingyi Fan, Anbai Jiang, Shuwei Zhang, Zhiqiang Lv, Bing Han, Xinhu Zheng, Wenrui Liang, Junjie Li, Wei-Qiang Zhang, Yanmin Qian, Xie Chen, Cheng Lu, Jia Liu

Abstract: With the rapid deployment of SCADA systems, how to effectively analyze industrial signals and detect abnormal states is an urgent need for the industry. Due to the significant heterogeneity of these signals, which we summarize as the M5 problem, previous works only focus on small sub-problems and employ specialized models, failing to utilize the synergies between modalities and the powerful scaling law. However, we argue that the M5 signals can be modeled in a unified manner due to the intrinsic similarity. As a result, we propose FISHER, a Foundation model for multi-modal Industrial Signal compreHEnsive Representation. To support arbitrary sampling rates, FISHER considers the increment of sampling rate as the concatenation of sub-band information. Specifically, FISHER takes the STFT sub-band as the modeling unit and adopts a teacher student SSL framework for pre-training. We also develop the RMIS benchmark, which evaluates the representations of M5 industrial signals on multiple health management tasks. Compared with top SSL models, FISHER showcases versatile and outstanding capabilities with a general performance gain up to 5.03%, along with much more efficient scaling curves. We also investigate the scaling law on downstream tasks and derive potential avenues for future works. FISHER is now open-sourced on https://github.com/jianganbai/FISHER

URLs: https://github.com/jianganbai/FISHER

new Screen2AX: Vision-Based Approach for Automatic macOS Accessibility Generation

Authors: Viktor Muryn, Marta Sumyk, Mariya Hirna, Sofiya Garkot, Maksym Shamrai

Abstract: Desktop accessibility metadata enables AI agents to interpret screens and supports users who depend on tools like screen readers. Yet, many applications remain largely inaccessible due to incomplete or missing metadata provided by developers - our investigation shows that only 33% of applications on macOS offer full accessibility support. While recent work on structured screen representation has primarily addressed specific challenges, such as UI element detection or captioning, none has attempted to capture the full complexity of desktop interfaces by replicating their entire hierarchical structure. To bridge this gap, we introduce Screen2AX, the first framework to automatically create real-time, tree-structured accessibility metadata from a single screenshot. Our method uses vision-language and object detection models to detect, describe, and organize UI elements hierarchically, mirroring macOS's system-level accessibility structure. To tackle the limited availability of data for macOS desktop applications, we compiled and publicly released three datasets encompassing 112 macOS applications, each annotated for UI element detection, grouping, and hierarchical accessibility metadata alongside corresponding screenshots. Screen2AX accurately infers hierarchy trees, achieving a 77% F1 score in reconstructing a complete accessibility tree. Crucially, these hierarchy trees improve the ability of autonomous agents to interpret and interact with complex desktop interfaces. We introduce Screen2AX-Task, a benchmark specifically designed for evaluating autonomous agent task execution in macOS desktop environments. Using this benchmark, we demonstrate that Screen2AX delivers a 2.2x performance improvement over native accessibility representations and surpasses the state-of-the-art OmniParser V2 system on the ScreenSpot benchmark.

new Improving Model Classification by Optimizing the Training Dataset

Authors: Morad Tukan, Loay Mualem, Eitan Netzer, Liran Sigalat

Abstract: In the era of data-centric AI, the ability to curate high-quality training data is as crucial as model design. Coresets offer a principled approach to data reduction, enabling efficient learning on large datasets through importance sampling. However, conventional sensitivity-based coreset construction often falls short in optimizing for classification performance metrics, e.g., $F1$ score, focusing instead on loss approximation. In this work, we present a systematic framework for tuning the coreset generation process to enhance downstream classification quality. Our method introduces new tunable parameters--including deterministic sampling, class-wise allocation, and refinement via active sampling, beyond traditional sensitivity scores. Through extensive experiments on diverse datasets and classifiers, we demonstrate that tuned coresets can significantly outperform both vanilla coresets and full dataset training on key classification metrics, offering an effective path towards better and more efficient model training.

new A Partitioned Sparse Variational Gaussian Process for Fast, Distributed Spatial Modeling

Authors: Michael Grosskopf, Kellin Rumsey, Ayan Biswas, Earl Lawrence

Abstract: The next generation of Department of Energy supercomputers will be capable of exascale computation. For these machines, far more computation will be possible than that which can be saved to disk. As a result, users will be unable to rely on post-hoc access to data for uncertainty quantification and other statistical analyses and there will be an urgent need for sophisticated machine learning algorithms which can be trained in situ. Algorithms deployed in this setting must be highly scalable, memory efficient and capable of handling data which is distributed across nodes as spatially contiguous partitions. One suitable approach involves fitting a sparse variational Gaussian process (SVGP) model independently and in parallel to each spatial partition. The resulting model is scalable, efficient and generally accurate, but produces the undesirable effect of constructing discontinuous response surfaces due to the disagreement between neighboring models at their shared boundary. In this paper, we extend this idea by allowing for a small amount of communication between neighboring spatial partitions which encourages better alignment of the local models, leading to smoother spatial predictions and a better fit in general. Due to our decentralized communication scheme, the proposed extension remains highly scalable and adds very little overhead in terms of computation (and none, in terms of memory). We demonstrate this Partitioned SVGP (PSVGP) approach for the Energy Exascale Earth System Model (E3SM) and compare the results to the independent SVGP case.

new Steering Out-of-Distribution Generalization with Concept Ablation Fine-Tuning

Authors: Helena Casademunt, Caden Juang, Adam Karvonen, Samuel Marks, Senthooran Rajamanoharan, Neel Nanda

Abstract: Fine-tuning large language models (LLMs) can lead to unintended out-of-distribution generalization. Standard approaches to this problem rely on modifying training data, for example by adding data that better specify the intended generalization. However, this is not always practical. We introduce Concept Ablation Fine-Tuning (CAFT), a technique that leverages interpretability tools to control how LLMs generalize from fine-tuning, without needing to modify the training data or otherwise use data from the target distribution. Given a set of directions in an LLM's latent space corresponding to undesired concepts, CAFT works by ablating these concepts with linear projections during fine-tuning, steering the model away from unintended generalizations. We successfully apply CAFT to three fine-tuning tasks, including emergent misalignment, a phenomenon where LLMs fine-tuned on a narrow task generalize to give egregiously misaligned responses to general questions. Without any changes to the fine-tuning data, CAFT reduces misaligned responses by 10x without degrading performance on the training distribution. Overall, CAFT represents a novel approach for steering LLM generalization without modifying training data.

new Beyond Binary Rewards: Training LMs to Reason About Their Uncertainty

Authors: Mehul Damani, Isha Puri, Stewart Slocum, Idan Shenfeld, Leshem Choshen, Yoon Kim, Jacob Andreas

Abstract: When language models (LMs) are trained via reinforcement learning (RL) to generate natural language "reasoning chains", their performance improves on a variety of difficult question answering tasks. Today, almost all successful applications of RL for reasoning use binary reward functions that evaluate the correctness of LM outputs. Because such reward functions do not penalize guessing or low-confidence outputs, they often have the unintended side-effect of degrading calibration and increasing the rate at which LMs generate incorrect responses (or "hallucinate") in other problem domains. This paper describes RLCR (Reinforcement Learning with Calibration Rewards), an approach to training reasoning models that jointly improves accuracy and calibrated confidence estimation. During RLCR, LMs generate both predictions and numerical confidence estimates after reasoning. They are trained to optimize a reward function that augments a binary correctness score with a Brier score -- a scoring rule for confidence estimates that incentivizes calibrated prediction. We first prove that this reward function (or any analogous reward function that uses a bounded, proper scoring rule) yields models whose predictions are both accurate and well-calibrated. We next show that across diverse datasets, RLCR substantially improves calibration with no loss in accuracy, on both in-domain and out-of-domain evaluations -- outperforming both ordinary RL training and classifiers trained to assign post-hoc confidence scores. While ordinary RL hurts calibration, RLCR improves it. Finally, we demonstrate that verbalized confidence can be leveraged at test time to improve accuracy and calibration via confidence-weighted scaling methods. Our results show that explicitly optimizing for calibration can produce more generally reliable reasoning models.

new Semi-off-Policy Reinforcement Learning for Vision-Language Slow-thinking Reasoning

Authors: Junhao Shen, Haiteng Zhao, Yuzhe Gu, Songyang Gao, Kuikun Liu, Haian Huang, Jianfei Gao, Dahua Lin, Wenwei Zhang, Kai Chen

Abstract: Enhancing large vision-language models (LVLMs) with visual slow-thinking reasoning is crucial for solving complex multimodal tasks. However, since LVLMs are mainly trained with vision-language alignment, it is difficult to adopt on-policy reinforcement learning (RL) to develop the slow thinking ability because the rollout space is restricted by its initial abilities. Off-policy RL offers a way to go beyond the current policy, but directly distilling trajectories from external models may cause visual hallucinations due to mismatched visual perception abilities across models. To address these issues, this paper proposes SOPHIA, a simple and scalable Semi-Off-Policy RL for vision-language slow-tHInking reAsoning. SOPHIA builds a semi-off-policy behavior model by combining on-policy visual understanding from a trainable LVLM with off-policy slow-thinking reasoning from a language model, assigns outcome-based rewards to reasoning, and propagates visual rewards backward. Then LVLM learns slow-thinking reasoning ability from the obtained reasoning trajectories using propagated rewards via off-policy RL algorithms. Extensive experiments with InternVL2.5 and InternVL3.0 with 8B and 38B sizes show the effectiveness of SOPHIA. Notably, SOPHIA improves InternVL3.0-38B by 8.50% in average, reaching state-of-the-art performance among open-source LVLMs on multiple multimodal reasoning benchmarks, and even outperforms some closed-source models (e.g., GPT-4.1) on the challenging MathVision and OlympiadBench, achieving 49.08% and 49.95% pass@1 accuracy, respectively. Analysis shows SOPHIA outperforms supervised fine-tuning and direct on-policy RL methods, offering a better policy initialization for further on-policy training.

cross Adversarial Demonstration Learning for Low-resource NER Using Dual Similarity

Authors: Guowen Yuan, Tien-Hsuan Wu, Lianghao Xia, Ben Kao

Abstract: We study the problem of named entity recognition (NER) based on demonstration learning in low-resource scenarios. We identify two issues in demonstration construction and model training. Firstly, existing methods for selecting demonstration examples primarily rely on semantic similarity; We show that feature similarity can provide significant performance improvement. Secondly, we show that the NER tagger's ability to reference demonstration examples is generally inadequate. We propose a demonstration and training approach that effectively addresses these issues. For the first issue, we propose to select examples by dual similarity, which comprises both semantic similarity and feature similarity. For the second issue, we propose to train an NER model with adversarial demonstration such that the model is forced to refer to the demonstrations when performing the tagging task. We conduct comprehensive experiments in low-resource NER tasks, and the results demonstrate that our method outperforms a range of methods.

cross Document Haystack: A Long Context Multimodal Image/Document Understanding Vision LLM Benchmark

Authors: Goeric Huybrechts, Srikanth Ronanki, Sai Muralidhar Jayanthi, Jack Fitzgerald, Srinivasan Veeravanallur

Abstract: The proliferation of multimodal Large Language Models has significantly advanced the ability to analyze and understand complex data inputs from different modalities. However, the processing of long documents remains under-explored, largely due to a lack of suitable benchmarks. To address this, we introduce Document Haystack, a comprehensive benchmark designed to evaluate the performance of Vision Language Models (VLMs) on long, visually complex documents. Document Haystack features documents ranging from 5 to 200 pages and strategically inserts pure text or multimodal text+image "needles" at various depths within the documents to challenge VLMs' retrieval capabilities. Comprising 400 document variants and a total of 8,250 questions, it is supported by an objective, automated evaluation framework. We detail the construction and characteristics of the Document Haystack dataset, present results from prominent VLMs and discuss potential research avenues in this area.

cross ADEPTS: A Capability Framework for Human-Centered Agent Design

Authors: Pierluca D'Oro, Caley Drooff, Joy Chen, Joseph Tighe

Abstract: Large language models have paved the way to powerful and flexible AI agents, assisting humans by increasingly integrating into their daily life. This flexibility, potential, and growing adoption demands a holistic and cross-disciplinary approach to developing, monitoring and discussing the capabilities required for agent-driven user experiences. However, current guidance on human-centered AI agent development is scattered: UX heuristics focus on interface behaviors, engineering taxonomies describe internal pipelines, and ethics checklists address high-level governance. There is no concise, user-facing vocabulary that tells teams what an agent should fundamentally be able to do. We introduce ADEPTS, a capability framework defining a set of core user-facing capabilities to provide unified guidance around the development of AI agents. ADEPTS is based on six principles for human-centered agent design, that express the minimal, user-facing capabilities an AI agent should demonstrate to be understandable, controllable and trustworthy in everyday use. ADEPTS complements existing frameworks and taxonomies; differently from them, it sits at the interface between technical and experience development. By presenting ADEPTS, we aim to condense complex AI-UX requirements into a compact framework that is actionable guidance for AI researchers, designers, engineers, and policy reviewers alike. We believe ADEPTS has the potential of accelerating the improvement of user-relevant agent capabilities, of easing the design of experiences that take advantage of those capabilities, and of providing a shared language to track and discuss progress around the development of AI agents.

cross AlgoTune: Can Language Models Speed Up General-Purpose Numerical Programs?

Authors: Ori Press, Brandon Amos, Haoyu Zhao, Yikai Wu, Samuel K. Ainsworth, Dominik Krupke, Patrick Kidger, Touqir Sajed, Bartolomeo Stellato, Jisun Park, Nathanael Bosch, Eli Meril, Albert Steppi, Arman Zharmagambetov, Fangzhao Zhang, David Perez-Pineiro, Alberto Mercurio, Ni Zhan, Talor Abramovich, Kilian Lieret, Hanlin Zhang, Shirley Huang, Matthias Bethge, Ofir Press

Abstract: Despite progress in language model (LM) capabilities, evaluations have thus far focused on models' performance on tasks that humans have previously solved, including in programming (Jimenez et al., 2024) and mathematics (Glazer et al., 2024). We therefore propose testing models' ability to design and implement algorithms in an open-ended benchmark: We task LMs with writing code that efficiently solves computationally challenging problems in computer science, physics, and mathematics. Our AlgoTune benchmark consists of 155 coding tasks collected from domain experts and a framework for validating and timing LM-synthesized solution code, which is compared to reference implementations from popular open-source packages. In addition, we develop a baseline LM agent, AlgoTuner, and evaluate its performance across a suite of frontier models. AlgoTuner achieves an average 1.72x speedup against our reference solvers, which use libraries such as SciPy, sk-learn and CVXPY. However, we find that current models fail to discover algorithmic innovations, instead preferring surface-level optimizations. We hope that AlgoTune catalyzes the development of LM agents exhibiting creative problem solving beyond state-of-the-art human performance.

cross Integrating Reason-Based Moral Decision-Making in the Reinforcement Learning Architecture

Authors: Lisa Dargasz

Abstract: Reinforcement Learning is a machine learning methodology that has demonstrated strong performance across a variety of tasks. In particular, it plays a central role in the development of artificial autonomous agents. As these agents become increasingly capable, market readiness is rapidly approaching, which means those agents, for example taking the form of humanoid robots or autonomous cars, are poised to transition from laboratory prototypes to autonomous operation in real-world environments. This transition raises concerns leading to specific requirements for these systems - among them, the requirement that they are designed to behave ethically. Crucially, research directed toward building agents that fulfill the requirement to behave ethically - referred to as artificial moral agents(AMAs) - has to address a range of challenges at the intersection of computer science and philosophy. This study explores the development of reason-based artificial moral agents (RBAMAs). RBAMAs are build on an extension of the reinforcement learning architecture to enable moral decision-making based on sound normative reasoning, which is achieved by equipping the agent with the capacity to learn a reason-theory - a theory which enables it to process morally relevant propositions to derive moral obligations - through case-based feedback. They are designed such that they adapt their behavior to ensure conformance to these obligations while they pursue their designated tasks. These features contribute to the moral justifiability of the their actions, their moral robustness, and their moral trustworthiness, which proposes the extended architecture as a concrete and deployable framework for the development of AMAs that fulfills key ethical desiderata. This study presents a first implementation of an RBAMA and demonstrates the potential of RBAMAs in initial experiments.

cross Structural DID with ML: Theory, Simulation, and a Roadmap for Applied Research

Authors: Yile Yu, Anzhi Xu, Yi Wang

Abstract: Causal inference in observational panel data has become a central concern in economics,policy analysis,and the broader social sciences.To address the core contradiction where traditional difference-in-differences (DID) struggles with high-dimensional confounding variables in observational panel data,while machine learning (ML) lacks causal structure interpretability,this paper proposes an innovative framework called S-DIDML that integrates structural identification with high-dimensional estimation.Building upon the structure of traditional DID methods,S-DIDML employs structured residual orthogonalization techniques (Neyman orthogonality+cross-fitting) to retain the group-time treatment effect (ATT) identification structure while resolving high-dimensional covariate interference issues.It designs a dynamic heterogeneity estimation module combining causal forests and semi-parametric models to capture spatiotemporal heterogeneity effects.The framework establishes a complete modular application process with standardized Stata implementation paths.The introduction of S-DIDML enriches methodological research on DID and DDML innovations, shifting causal inference from method stacking to architecture integration.This advancement enables social sciences to precisely identify policy-sensitive groups and optimize resource allocation.The framework provides replicable evaluation tools, decision optimization references,and methodological paradigms for complex intervention scenarios such as digital transformation policies and environmental regulations.

cross MSGM: A Multi-Scale Spatiotemporal Graph Mamba for EEG Emotion Recognition

Authors: Hanwen Liu, Yifeng Gong, Zuwei Yan, Zeheng Zhuang, Jiaxuan Lu

Abstract: EEG-based emotion recognition struggles with capturing multi-scale spatiotemporal dynamics and ensuring computational efficiency for real-time applications. Existing methods often oversimplify temporal granularity and spatial hierarchies, limiting accuracy. To overcome these challenges, we propose the Multi-Scale Spatiotemporal Graph Mamba (MSGM), a novel framework integrating multi-window temporal segmentation, bimodal spatial graph modeling, and efficient fusion via the Mamba architecture. By segmenting EEG signals across diverse temporal scales and constructing global-local graphs with neuroanatomical priors, MSGM effectively captures fine-grained emotional fluctuations and hierarchical brain connectivity. A multi-depth Graph Convolutional Network (GCN) and token embedding fusion module, paired with Mamba's state-space modeling, enable dynamic spatiotemporal interaction at linear complexity. Notably, with just one MSST-Mamba layer, MSGM surpasses leading methods in the field on the SEED, THU-EP, and FACED datasets, outperforming baselines in subject-independent emotion classification while achieving robust accuracy and millisecond-level inference on the NVIDIA Jetson Xavier NX.

cross Efficient dataset construction using active learning and uncertainty-aware neural networks for plasma turbulent transport surrogate models

Authors: Aaron Ho (MIT Plasma Science and Fusion Center, Cambridge, USA), Lorenzo Zanisi (UKAEA Culham Centre for Fusion Energy, Abingdon, UK), Bram de Leeuw (Radboud University, Nijmegen, Netherlands), Vincent Galvan (MIT Plasma Science and Fusion Center, Cambridge, USA), Pablo Rodriguez-Fernandez (MIT Plasma Science and Fusion Center, Cambridge, USA), Nathaniel T. Howard (MIT Plasma Science and Fusion Center, Cambridge, USA)

Abstract: This work demonstrates a proof-of-principle for using uncertainty-aware architectures, in combination with active learning techniques and an in-the-loop physics simulation code as a data labeller, to construct efficient datasets for data-driven surrogate model generation. Building off of a previous proof-of-principle successfully demonstrating training set reduction on static pre-labelled datasets, using the ADEPT framework, this strategy was applied again to the plasma turbulent transport problem within tokamak fusion plasmas, specifically the QuaLiKiz quasilinear electrostatic gyrokinetic turbulent transport code. While QuaLiKiz provides relatively fast evaluations, this study specifically targeted small datasets to serve as a proxy for more expensive codes, such as CGYRO or GENE. The newly implemented algorithm uses the SNGP architecture for the classification component of the problem and the BNN-NCP architecture for the regression component, training models for all turbulent modes (ITG, TEM, ETG) and all transport fluxes ($Q_e$, $Q_i$, $\Gamma_e$, $\Gamma_i$, and $\Pi_i$) described by the general QuaLiKiz output. With 45 active learning iterations, moving from a small initial training set of $10^{2}$ to a final set of $10^{4}$, the resulting models reached a $F_1$ classification performance of ~0.8 and a $R^2$ regression performance of ~0.75 on an independent test set across all outputs. This extrapolates to reaching the same performance and efficiency as the previous ADEPT pipeline, although on a problem with 1 extra input dimension. While the improvement rate achieved in this implementation diminishes faster than expected, the overall technique is formulated with components that can be upgraded and generalized to many surrogate modeling applications beyond plasma turbulent transport predictions.

cross Generative AI Models for Learning Flow Maps of Stochastic Dynamical Systems in Bounded Domains

Authors: Minglei Yang, Yanfang Liu, Diego del-Castillo-Negrete, Yanzhao Cao, Guannan Zhang

Abstract: Simulating stochastic differential equations (SDEs) in bounded domains, presents significant computational challenges due to particle exit phenomena, which requires accurate modeling of interior stochastic dynamics and boundary interactions. Despite the success of machine learning-based methods in learning SDEs, existing learning methods are not applicable to SDEs in bounded domains because they cannot accurately capture the particle exit dynamics. We present a unified hybrid data-driven approach that combines a conditional diffusion model with an exit prediction neural network to capture both interior stochastic dynamics and boundary exit phenomena. Our ML model consists of two major components: a neural network that learns exit probabilities using binary cross-entropy loss with rigorous convergence guarantees, and a training-free diffusion model that generates state transitions for non-exiting particles using closed-form score functions. The two components are integrated through a probabilistic sampling algorithm that determines particle exit at each time step and generates appropriate state transitions. The performance of the proposed approach is demonstrated via three test cases: a one-dimensional simplified problem for theoretical verification, a two-dimensional advection-diffusion problem in a bounded domain, and a three-dimensional problem of interest to magnetically confined fusion plasmas.

cross Automated Design of Structured Variational Quantum Circuits with Reinforcement Learning

Authors: Gloria Turati, Simone Foder\`a, Riccardo Nembrini, Maurizio Ferrari Dacrema, Paolo Cremonesi

Abstract: Variational Quantum Algorithms (VQAs) are among the most promising approaches for leveraging near-term quantum hardware, yet their effectiveness strongly depends on the design of the underlying circuit ansatz, which is typically constructed with heuristic methods. In this work, we represent the synthesis of variational quantum circuits as a sequential decision-making problem, where gates are added iteratively in order to optimize an objective function, and we introduce two reinforcement learning-based methods, RLVQC Global and RLVQC Block, tailored to combinatorial optimization problems. RLVQC Block creates ansatzes that generalize the Quantum Approximate Optimization Algorithm (QAOA), by discovering a two-qubits block that is applied to all the interacting qubit pairs. While RLVQC Global further generalizes the ansatz and adds gates unconstrained by the structure of the interacting qubits. Both methods adopt the Proximal Policy Optimization (PPO) algorithm and use empirical measurement outcomes as state observations to guide the agent. We evaluate the proposed methods on a broad set of QUBO instances derived from classical graph-based optimization problems. Our results show that both RLVQC methods exhibit strong results with RLVQC Block consistently outperforming QAOA and generally surpassing RLVQC Global. While RLVQC Block produces circuits with depth comparable to QAOA, the Global variant is instead able to find significantly shorter ones. These findings suggest that reinforcement learning methods can be an effective tool to discover new ansatz structures tailored for specific problems and that the most effective circuit design strategy lies between rigid predefined architectures and completely unconstrained ones, offering a favourable trade-off between structure and adaptability.

cross Learning without training: The implicit dynamics of in-context learning

Authors: Benoit Dherin, Michael Munn, Hanna Mazzawi, Michael Wunder, Javier Gonzalvo

Abstract: One of the most striking features of Large Language Models (LLM) is their ability to learn in context. Namely at inference time an LLM is able to learn new patterns without any additional weight update when these patterns are presented in the form of examples in the prompt, even if these patterns were not seen during training. The mechanisms through which this can happen are still largely unknown. In this work, we show that the stacking of a self-attention layer with an MLP, allows the transformer block to implicitly modify the weights of the MLP layer according to the context. We argue through theory and experimentation that this simple mechanism may be the reason why LLMs can learn in context and not only during training. Specifically, we show under mild simplifying assumptions how a transformer block implicitly transforms a context into a low-rank weight-update of the MLP layer.

cross Minor Embedding for Quantum Annealing with Reinforcement Learning

Authors: Riccardo Nembrini, Maurizio Ferrari Dacrema, Paolo Cremonesi

Abstract: Quantum Annealing (QA) is a quantum computing paradigm for solving combinatorial optimization problems formulated as Quadratic Unconstrained Binary Optimization (QUBO) problems. An essential step in QA is minor embedding, which maps the problem graph onto the sparse topology of the quantum processor. This process is computationally expensive and scales poorly with increasing problem size and hardware complexity. Existing heuristics are often developed for specific problem graphs or hardware topologies and are difficult to generalize. Reinforcement Learning (RL) offers a promising alternative by treating minor embedding as a sequential decision-making problem, where an agent learns to construct minor embeddings by iteratively mapping the problem variables to the hardware qubits. We propose a RL-based approach to minor embedding using a Proximal Policy Optimization agent, testing its ability to embed both fully connected and randomly generated problem graphs on two hardware topologies, Chimera and Zephyr. The results show that our agent consistently produces valid minor embeddings, with reasonably efficient number of qubits, in particular on the more modern Zephyr topology. Our proposed approach is also able to scale to moderate problem sizes and adapts well to different graph structures, highlighting RL's potential as a flexible and general-purpose framework for minor embedding in QA.

cross AutoMAT: A Hierarchical Framework for Autonomous Alloy Discovery

Authors: Penghui Yang, Chendong Zhao, Bijun Tang, Zhonghan Zhang, Xinrun Wang, Yanchen Deng, Yuhao Lu, Cuntai Guan, Zheng Liu, Bo An

Abstract: Alloy discovery is central to advancing modern industry but remains hindered by the vastness of compositional design space and the costly validation. Here, we present AutoMAT, a hierarchical and autonomous framework grounded in and validated by experiments, which integrates large language models, automated CALPHAD-based simulations, and AI-driven search to accelerate alloy design. Spanning the entire pipeline from ideation to validation, AutoMAT achieves high efficiency, accuracy, and interpretability without the need for manually curated large datasets. In a case study targeting a lightweight, high-strength alloy, AutoMAT identifies a titanium alloy with 8.1% lower density and comparable yield strength relative to the state-of-the-art reference, achieving the highest specific strength among all comparisons. In a second case targeting high-yield-strength high-entropy alloys, AutoMAT achieves a 28.2% improvement in yield strength over the base alloy. In both cases, AutoMAT reduces the discovery timeline from years to weeks, illustrating its potential as a scalable and versatile platform for next-generation alloy design.

cross Radiological and Biological Dictionary of Radiomics Features: Addressing Understandable AI Issues in Personalized Breast Cancer; Dictionary Version BM1.0

Authors: Arman Gorji, Nima Sanati, Amir Hossein Pouria, Somayeh Sadat Mehrnia, Ilker Hacihaliloglu, Arman Rahmim, Mohammad R. Salmanpour

Abstract: Radiomics-based AI models show promise for breast cancer diagnosis but often lack interpretability, limiting clinical adoption. This study addresses the gap between radiomic features (RF) and the standardized BI-RADS lexicon by proposing a dual-dictionary framework. First, a Clinically-Informed Feature Interpretation Dictionary (CIFID) was created by mapping 56 RFs to BI-RADS descriptors (shape, margin, internal enhancement) through literature and expert review. The framework was applied to classify triple-negative breast cancer (TNBC) versus non-TNBC using dynamic contrast-enhanced MRI from a multi-institutional cohort of 1,549 patients. We trained 27 machine learning classifiers with 27 feature selection methods. SHapley Additive exPlanations (SHAP) were used to interpret predictions and generate a complementary Data-Driven Feature Interpretation Dictionary (DDFID) for 52 additional RFs. The best model, combining Variance Inflation Factor (VIF) selection with Extra Trees Classifier, achieved an average cross-validation accuracy of 0.83. Key predictive RFs aligned with clinical knowledge: higher Sphericity (round/oval shape) and lower Busyness (more homogeneous enhancement) were associated with TNBC. The framework confirmed known imaging biomarkers and uncovered novel, interpretable associations. This dual-dictionary approach (BM1.0) enhances AI model transparency and supports the integration of RFs into routine breast cancer diagnosis and personalized care.

cross Is memory all you need? Data-driven Mori-Zwanzig modeling of Lagrangian particle dynamics in turbulent flows

Authors: Xander de Wit, Alessandro Gabbana, Michael Woodward, Yen Ting Lin, Federico Toschi, Daniel Livescu

Abstract: The dynamics of Lagrangian particles in turbulence play a crucial role in mixing, transport, and dispersion processes in complex flows. Their trajectories exhibit highly non-trivial statistical behavior, motivating the development of surrogate models that can reproduce these trajectories without incurring the high computational cost of direct numerical simulations of the full Eulerian field. This task is particularly challenging because reduced-order models typically lack access to the full set of interactions with the underlying turbulent field. Novel data-driven machine learning techniques can be very powerful in capturing and reproducing complex statistics of the reduced-order/surrogate dynamics. In this work, we show how one can learn a surrogate dynamical system that is able to evolve a turbulent Lagrangian trajectory in a way that is point-wise accurate for short-time predictions (with respect to Kolmogorov time) and stable and statistically accurate at long times. This approach is based on the Mori--Zwanzig formalism, which prescribes a mathematical decomposition of the full dynamical system into resolved dynamics that depend on the current state and the past history of a reduced set of observables and the unresolved orthogonal dynamics due to unresolved degrees of freedom of the initial state. We show how by training this reduced order model on a point-wise error metric on short time-prediction, we are able to correctly learn the dynamics of the Lagrangian turbulence, such that also the long-time statistical behavior is stably recovered at test time. This opens up a range of new applications, for example, for the control of active Lagrangian agents in turbulence.

cross Compositional Coordination for Multi-Robot Teams with Large Language Models

Authors: Zhehui Huang, Guangyao Shi, Yuwei Wu, Vijay Kumar, Gaurav S. Sukhatme

Abstract: Multi-robot coordination has traditionally relied on a task-specific and expert-driven pipeline, where natural language mission descriptions are manually translated by domain experts into mathematical formulation, algorithm design, and executable code. This conventional process is labor-intensive, inaccessible to non-experts, and inflexible to changes in mission requirements. Here, we propose LAN2CB (Language to Collective Behavior), a novel framework that leverages large language models (LLMs) to streamline and generalize the multi-robot coordination pipeline. LAN2CB directly converts natural language mission descriptions into executable Python code for multi-robot systems through two key components: (1) Mission Decomposition for Task Representation, which parses the mission into a task graph with dependencies, and (2) Code Generation, which uses the task graph and a structured knowledge base to generate deployable robot control code. We further introduce a dataset of natural language mission specifications to support development and benchmarking. Experimental results in both simulation and real-world settings show that LAN2CB enables effective and flexible multi-robot coordination from natural language, significantly reducing the need for manual engineering while supporting generalization across mission types. Website: https://sites.google.com/view/lan2cb.

URLs: https://sites.google.com/view/lan2cb.

cross Interpreting CFD Surrogates through Sparse Autoencoders

Authors: Yeping Hu, Shusen Liu

Abstract: Learning-based surrogate models have become a practical alternative to high-fidelity CFD solvers, but their latent representations remain opaque and hinder adoption in safety-critical or regulation-bound settings. This work introduces a posthoc interpretability framework for graph-based surrogate models used in computational fluid dynamics (CFD) by leveraging sparse autoencoders (SAEs). By obtaining an overcomplete basis in the node embedding space of a pretrained surrogate, the method extracts a dictionary of interpretable latent features. The approach enables the identification of monosemantic concepts aligned with physical phenomena such as vorticity or flow structures, offering a model-agnostic pathway to enhance explainability and trustworthiness in CFD applications.

cross AI-driven Orchestration at Scale: Estimating Service Metrics on National-Wide Testbeds

Authors: Rodrigo Moreira, Rafael Pasquini, Joberto S. B. Martins, Tereza C. Carvalho, Fl\'avio de Oliveira Silva

Abstract: Network Slicing (NS) realization requires AI-native orchestration architectures to efficiently and intelligently handle heterogeneous user requirements. To achieve this, network slicing is evolving towards a more user-centric digital transformation, focusing on architectures that incorporate native intelligence to enable self-managed connectivity in an integrated and isolated manner. However, these initiatives face the challenge of validating their results in production environments, particularly those utilizing ML-enabled orchestration, as they are often tested in local networks or laboratory simulations. This paper proposes a large-scale validation method using a network slicing prediction model to forecast latency using Deep Neural Networks (DNNs) and basic ML algorithms embedded within an NS architecture, evaluated in real large-scale production testbeds. It measures and compares the performance of different DNNs and ML algorithms, considering a distributed database application deployed as a network slice over two large-scale production testbeds. The investigation highlights how AI-based prediction models can enhance network slicing orchestration architectures and presents a seamless, production-ready validation method as an alternative to fully controlled simulations or laboratory setups.

cross Efficient Compositional Multi-tasking for On-device Large Language Models

Authors: Ondrej Bohdal, Mete Ozay, Jijoong Moon, Kyeng-Hun Lee, Hyeonmok Ko, Umberto Michieli

Abstract: Adapter parameters provide a mechanism to modify the behavior of machine learning models and have gained significant popularity in the context of large language models (LLMs) and generative AI. These parameters can be merged to support multiple tasks via a process known as task merging. However, prior work on merging in LLMs, particularly in natural language processing, has been limited to scenarios where each test example addresses only a single task. In this paper, we focus on on-device settings and study the problem of text-based compositional multi-tasking, where each test example involves the simultaneous execution of multiple tasks. For instance, generating a translated summary of a long text requires solving both translation and summarization tasks concurrently. To facilitate research in this setting, we propose a benchmark comprising four practically relevant compositional tasks. We also present an efficient method (Learnable Calibration) tailored for on-device applications, where computational resources are limited, emphasizing the need for solutions that are both resource-efficient and high-performing. Our contributions lay the groundwork for advancing the capabilities of LLMs in real-world multi-tasking scenarios, expanding their applicability to complex, resource-constrained use cases.

cross Recursive Equations For Imputation Of Missing Not At Random Data With Sparse Pattern Support

Authors: Trung Phung, Kyle Reese, Ilya Shpitser, Rohit Bhattacharya

Abstract: A common approach for handling missing values in data analysis pipelines is multiple imputation via software packages such as MICE (Van Buuren and Groothuis-Oudshoorn, 2011) and Amelia (Honaker et al., 2011). These packages typically assume the data are missing at random (MAR), and impose parametric or smoothing assumptions upon the imputing distributions in a way that allows imputation to proceed even if not all missingness patterns have support in the data. Such assumptions are unrealistic in practice, and induce model misspecification bias on any analysis performed after such imputation. In this paper, we provide a principled alternative. Specifically, we develop a new characterization for the full data law in graphical models of missing data. This characterization is constructive, is easily adapted for the calculation of imputation distributions for both MAR and MNAR (missing not at random) mechanisms, and is able to handle lack of support for certain patterns of missingness. We use this characterization to develop a new imputation algorithm -- Multivariate Imputation via Supported Pattern Recursion (MISPR) -- which uses Gibbs sampling, by analogy with the Multivariate Imputation with Chained Equations (MICE) algorithm, but which is consistent under both MAR and MNAR settings, and is able to handle missing data patterns with no support without imposing additional assumptions beyond those already imposed by the missing data model itself. In simulations, we show MISPR obtains comparable results to MICE when data are MAR, and superior, less biased results when data are MNAR. Our characterization and imputation algorithm based on it are a step towards making principled missing data methods more practical in applied settings, where the data are likely both MNAR and sufficiently high dimensional to yield missing data patterns with no support at available sample sizes.

cross Expert-Guided LLM Reasoning for Battery Discovery: From AI-Driven Hypothesis to Synthesis and Characterization

Authors: Shengchao Liu, Hannan Xu, Yan Ai, Huanxin Li, Yoshua Bengio, Harry Guo

Abstract: Large language models (LLMs) leverage chain-of-thought (CoT) techniques to tackle complex problems, representing a transformative breakthrough in artificial intelligence (AI). However, their reasoning capabilities have primarily been demonstrated in solving math and coding problems, leaving their potential for domain-specific applications-such as battery discovery-largely unexplored. Inspired by the idea that reasoning mirrors a form of guided search, we introduce ChatBattery, a novel agentic framework that integrates domain knowledge to steer LLMs toward more effective reasoning in materials design. Using ChatBattery, we successfully identify, synthesize, and characterize three novel lithium-ion battery cathode materials, which achieve practical capacity improvements of 28.8%, 25.2%, and 18.5%, respectively, over the widely used cathode material, LiNi0.8Mn0.1Co0.1O2 (NMC811). Beyond this discovery, ChatBattery paves a new path by showing a successful LLM-driven and reasoning-based platform for battery materials invention. This complete AI-driven cycle-from design to synthesis to characterization-demonstrates the transformative potential of AI-driven reasoning in revolutionizing materials discovery.

cross Equivariant Goal Conditioned Contrastive Reinforcement Learning

Authors: Arsh Tangri, Nichols Crawford Taylor, Haojie Huang, Robert Platt

Abstract: Contrastive Reinforcement Learning (CRL) provides a promising framework for extracting useful structured representations from unlabeled interactions. By pulling together state-action pairs and their corresponding future states, while pushing apart negative pairs, CRL enables learning nontrivial policies without manually designed rewards. In this work, we propose Equivariant CRL (ECRL), which further structures the latent space using equivariant constraints. By leveraging inherent symmetries in goal-conditioned manipulation tasks, our method improves both sample efficiency and spatial generalization. Specifically, we formally define Goal-Conditioned Group-Invariant MDPs to characterize rotation-symmetric robotic manipulation tasks, and build on this by introducing a novel rotation-invariant critic representation paired with a rotation-equivariant actor for Contrastive RL. Our approach consistently outperforms strong baselines across a range of simulated tasks in both state-based and image-based settings. Finally, we extend our method to the offline RL setting, demonstrating its effectiveness across multiple tasks.

cross Attacking interpretable NLP systems

Authors: Eldor Abdukhamidov, Tamer Abuhmed, Joanna C. S. Santos, Mohammed Abuhamad

Abstract: Studies have shown that machine learning systems are vulnerable to adversarial examples in theory and practice. Where previous attacks have focused mainly on visual models that exploit the difference between human and machine perception, text-based models have also fallen victim to these attacks. However, these attacks often fail to maintain the semantic meaning of the text and similarity. This paper introduces AdvChar, a black-box attack on Interpretable Natural Language Processing Systems, designed to mislead the classifier while keeping the interpretation similar to benign inputs, thus exploiting trust in system transparency. AdvChar achieves this by making less noticeable modifications to text input, forcing the deep learning classifier to make incorrect predictions and preserve the original interpretation. We use an interpretation-focused scoring approach to determine the most critical tokens that, when changed, can cause the classifier to misclassify the input. We apply simple character-level modifications to measure the importance of tokens, minimizing the difference between the original and new text while generating adversarial interpretations similar to benign ones. We thoroughly evaluated AdvChar by testing it against seven NLP models and three interpretation models using benchmark datasets for the classification task. Our experiments show that AdvChar can significantly reduce the prediction accuracy of current deep learning models by altering just two characters on average in input samples.

cross The Impact of Pseudo-Science in Financial Loans Risk Prediction

Authors: Bruno Scarone, Ricardo Baeza-Yates

Abstract: We study the societal impact of pseudo-scientific assumptions for predicting the behavior of people in a straightforward application of machine learning to risk prediction in financial lending. This use case also exemplifies the impact of survival bias in loan return prediction. We analyze the models in terms of their accuracy and social cost, showing that the socially optimal model may not imply a significant accuracy loss for this downstream task. Our results are verified for commonly used learning methods and datasets. Our findings also show that there is a natural dynamic when training models that suffer survival bias where accuracy slightly deteriorates, and whose recall and precision improves with time. These results act as an illusion, leading the observer to believe that the system is getting better, when in fact the model is suffering from increasingly more unfairness and survival bias.

cross SVAgent: AI Agent for Hardware Security Verification Assertion

Authors: Rui Guo, Avinash Ayalasomayajula, Henian Li, Jingbo Zhou, Sujan Kumar Saha, Farimah Farahmandi

Abstract: Verification using SystemVerilog assertions (SVA) is one of the most popular methods for detecting circuit design vulnerabilities. However, with the globalization of integrated circuit design and the continuous upgrading of security requirements, the SVA development model has exposed major limitations. It is not only inefficient in development, but also unable to effectively deal with the increasing number of security vulnerabilities in modern complex integrated circuits. In response to these challenges, this paper proposes an innovative SVA automatic generation framework SVAgent. SVAgent introduces a requirement decomposition mechanism to transform the original complex requirements into a structured, gradually solvable fine-grained problem-solving chain. Experiments have shown that SVAgent can effectively suppress the influence of hallucinations and random answers, and the key evaluation indicators such as the accuracy and consistency of the SVA are significantly better than existing frameworks. More importantly, we successfully integrated SVAgent into the most mainstream integrated circuit vulnerability assessment framework and verified its practicality and reliability in a real engineering design environment.

cross Towards Compute-Optimal Many-Shot In-Context Learning

Authors: Shahriar Golchin, Yanfei Chen, Rujun Han, Manan Gandhi, Tianli Yu, Swaroop Mishra, Mihai Surdeanu, Rishabh Agarwal, Chen-Yu Lee, Tomas Pfister

Abstract: Long-context large language models (LLMs) are able to process inputs containing up to several million tokens. In the scope of in-context learning (ICL), this translates into using hundreds/thousands of demonstrations in the input prompt, enabling many-shot ICL. In practice, a fixed set of demonstrations is often selected at random in many-shot settings due to (1) high inference costs, (2) the benefits of caching and reusing computations, and (3) the similar performance offered by this strategy compared to others when scaled. In this work, we propose two straightforward strategies for demonstration selection in many-shot ICL that improve performance with minimal computational overhead. Our first method combines a small number of demonstrations, selected based on their similarity to each test sample, with a disproportionately larger set of random demonstrations that are cached. The second strategy improves the first by replacing random demonstrations with those selected using centroids derived from test sample representations via k-means clustering. Our experiments with Gemini Pro and Flash across several datasets indicate that our strategies consistently outperform random selection and surpass or match the most performant selection approach while supporting caching and reducing inference cost by up to an order of magnitude. We also show that adjusting the proportion of demonstrations selected based on different criteria can balance performance and inference cost in many-shot ICL.

cross Toward Routine CSP of Pharmaceuticals: A Fully Automated Protocol Using Neural Network Potentials

Authors: Zachary L. Glick, Derek P. Metcalf, Scott F. Swarthout

Abstract: Crystal structure prediction (CSP) is a useful tool in pharmaceutical development for identifying and assessing risks associated with polymorphism, yet widespread adoption has been hindered by high computational costs and the need for both manual specification and expert knowledge to achieve useful results. Here, we introduce a fully automated, high-throughput CSP protocol designed to overcome these barriers. The protocol's efficiency is driven by Lavo-NN, a novel neural network potential (NNP) architected and trained specifically for pharmaceutical crystal structure generation and ranking. This NNP-driven crystal generation phase is integrated into a scalable cloud-based workflow. We validate this CSP protocol on an extensive retrospective benchmark of 49 unique molecules, almost all of which are drug-like, successfully generating structures that match all 110 $Z' = 1$ experimental polymorphs. The average CSP in this benchmark is performed with approximately 8.4k CPU hours, which is a significant reduction compared to other protocols. The practical utility of the protocol is further demonstrated through case studies that resolve ambiguities in experimental data and a semi-blinded challenge that successfully identifies and ranks polymorphs of three modern drugs from powder X-ray diffraction patterns alone. By significantly reducing the required time and cost, the protocol enables CSP to be routinely deployed earlier in the drug discovery pipeline, such as during lead optimization. Rapid turnaround times and high throughput also enable CSP that can be run in parallel with experimental screening, providing chemists with real-time insights to guide their work in the lab.

cross PAC Off-Policy Prediction of Contextual Bandits

Authors: Yilong Wan, Yuqiang Li, Xianyi Wu

Abstract: This paper investigates off-policy evaluation in contextual bandits, aiming to quantify the performance of a target policy using data collected under a different and potentially unknown behavior policy. Recently, methods based on conformal prediction have been developed to construct reliable prediction intervals that guarantee marginal coverage in finite samples, making them particularly suited for safety-critical applications. To further achieve coverage conditional on a given offline data set, we propose a novel algorithm that constructs probably approximately correct prediction intervals. Our method builds upon a PAC-valid conformal prediction framework, and we strengthen its theoretical guarantees by establishing PAC-type bounds on coverage. We analyze both finite-sample and asymptotic properties of the proposed method, and compare its empirical performance with existing methods in simulations.

cross LLM-Enhanced Reranking for Complementary Product Recommendation

Authors: Zekun Xu, Yudi Zhang

Abstract: Complementary product recommendation, which aims to suggest items that are used together to enhance customer value, is a crucial yet challenging task in e-commerce. While existing graph neural network (GNN) approaches have made significant progress in capturing complex product relationships, they often struggle with the accuracy-diversity tradeoff, particularly for long-tail items. This paper introduces a model-agnostic approach that leverages Large Language Models (LLMs) to enhance the reranking of complementary product recommendations. Unlike previous works that use LLMs primarily for data preprocessing and graph augmentation, our method applies LLM-based prompting strategies directly to rerank candidate items retrieved from existing recommendation models, eliminating the need for model retraining. Through extensive experiments on public datasets, we demonstrate that our approach effectively balances accuracy and diversity in complementary product recommendations, with at least 50% lift in accuracy metrics and 2% lift in diversity metrics on average for the top recommended items across datasets.

cross Toward a Lightweight and Robust Design for Caching

Authors: Peng Chen, Hailiang Zhao, Jiaji Zhang, Xueyan Tang, Yixuan Wang, Shuiguang Deng

Abstract: The online caching problem aims to minimize cache misses when serving a sequence of requests under a limited cache size. While naive learning-augmented caching algorithms achieve ideal $1$-consistency, they lack robustness guarantees. Existing robustification methods either sacrifice $1$-consistency or introduce significant computational overhead. In this paper, we introduce Guard, a lightweight robustification framework that enhances the robustness of a broad class of learning-augmented caching algorithms to $2H_k + 2$, while preserving their $1$-consistency. Guard achieves the current best-known trade-off between consistency and robustness, with only $O(1)$ additional per-request overhead, thereby maintaining the original time complexity of the base algorithm. Extensive experiments across multiple real-world datasets and prediction models validate the effectiveness of Guard in practice.

cross Edge-case Synthesis for Fisheye Object Detection: A Data-centric Perspective

Authors: Seunghyeon Kim, Kyeongryeol Go

Abstract: Fisheye cameras introduce significant distortion and pose unique challenges to object detection models trained on conventional datasets. In this work, we propose a data-centric pipeline that systematically improves detection performance by focusing on the key question of identifying the blind spots of the model. Through detailed error analysis, we identify critical edge-cases such as confusing class pairs, peripheral distortions, and underrepresented contexts. Then we directly address them through edge-case synthesis. We fine-tuned an image generative model and guided it with carefully crafted prompts to produce images that replicate real-world failure modes. These synthetic images are pseudo-labeled using a high-quality detector and integrated into training. Our approach results in consistent performance gains, highlighting how deeply understanding data and selectively fixing its weaknesses can be impactful in specialized domains like fisheye object detection.

cross ToFe: Lagged Token Freezing and Reusing for Efficient Vision Transformer Inference

Authors: Haoyue Zhang, Jie Zhang, Song Guo

Abstract: Although vision transformers (ViT) have shown remarkable success in various vision tasks, their computationally expensive self-attention hinder their deployment on resource-constrained devices. Token reduction, which discards less important tokens during forward propagation, has been proposed to enhance the efficiency of transformer models. However, existing methods handle unimportant tokens irreversibly, preventing their reuse in subsequent blocks. Considering that transformers focus on different information among blocks, tokens reduced in early blocks might be useful later. Furthermore, to adapt transformer models for resource-constrained devices, it is crucial to strike a balance between model performance and computational overhead. To address these challenges, in this paper, we introduce a novel Token Freezing and Reusing (ToFe) framework, where we identify important tokens at each stage and temporarily freeze the unimportant ones, allowing their lagged reusing at a later stage. Specifically, we design a prediction module for token identification and an approximate module for recovery of the frozen tokens. By jointly optimizing with the backbone through computation budget-aware end-to-end training, ToFe can adaptively process the necessary tokens at each block, thereby reducing computational cost while maintaining performance. Extensive experiments demonstrate that ToFe reduces the computational cost of LV-ViT model by 50% with less than 2% drop in Top-1 accuracy, achieving a better trade-off between performance and complexity compared to state-of-the-art methods.

cross Time to Split: Exploring Data Splitting Strategies for Offline Evaluation of Sequential Recommenders

Authors: Danil Gusak, Anna Volodkevich, Anton Klenitskiy, Alexey Vasilev, Evgeny Frolov

Abstract: Modern sequential recommender systems, ranging from lightweight transformer-based variants to large language models, have become increasingly prominent in academia and industry due to their strong performance in the next-item prediction task. Yet common evaluation protocols for sequential recommendations remain insufficiently developed: they often fail to reflect the corresponding recommendation task accurately, or are not aligned with real-world scenarios. Although the widely used leave-one-out split matches next-item prediction, it permits the overlap between training and test periods, which leads to temporal leakage and unrealistically long test horizon, limiting real-world relevance. Global temporal splitting addresses these issues by evaluating on distinct future periods. However, its applications to sequential recommendations remain loosely defined, particularly in terms of selecting target interactions and constructing a validation subset that provides necessary consistency between validation and test metrics. In this paper, we demonstrate that evaluation outcomes can vary significantly across splitting strategies, influencing model rankings and practical deployment decisions. To improve reproducibility in both academic and industrial settings, we systematically compare different splitting strategies for sequential recommendations across multiple datasets and established baselines. Our findings show that prevalent splits, such as leave-one-out, may be insufficiently aligned with more realistic evaluation strategies. Code: https://github.com/monkey0head/time-to-split

URLs: https://github.com/monkey0head/time-to-split

cross Physics-Driven Neural Network for Solving Electromagnetic Inverse Scattering Problems

Authors: Yutong Du, Zicheng Liu, Bazargul Matkerim, Changyou Li, Yali Zong, Bo Qi, Jingwei Kou

Abstract: In recent years, deep learning-based methods have been proposed for solving inverse scattering problems (ISPs), but most of them heavily rely on data and suffer from limited generalization capabilities. In this paper, a new solving scheme is proposed where the solution is iteratively updated following the updating of the physics-driven neural network (PDNN), the hyperparameters of which are optimized by minimizing the loss function which incorporates the constraints from the collected scattered fields and the prior information about scatterers. Unlike data-driven neural network solvers, PDNN is trained only requiring the input of collected scattered fields and the computation of scattered fields corresponding to predicted solutions, thus avoids the generalization problem. Moreover, to accelerate the imaging efficiency, the subregion enclosing the scatterers is identified. Numerical and experimental results demonstrate that the proposed scheme has high reconstruction accuracy and strong stability, even when dealing with composite lossy scatterers.

cross Higher Gauge Flow Models

Authors: Alexander Strunk, Roland Assam

Abstract: This paper introduces Higher Gauge Flow Models, a novel class of Generative Flow Models. Building upon ordinary Gauge Flow Models (arXiv:2507.13414), these Higher Gauge Flow Models leverage an L$_{\infty}$-algebra, effectively extending the Lie Algebra. This expansion allows for the integration of the higher geometry and higher symmetries associated with higher groups into the framework of Generative Flow Models. Experimental evaluation on a Gaussian Mixture Model dataset revealed substantial performance improvements compared to traditional Flow Models.

cross Constructing material network representations for intelligent amorphous alloys design

Authors: S. -Y. Zhang, J. Tian, S. -L. Liu, H. -M. Zhang, H. -Y. Bai, Y. -C. Hu, W. -H. Wang

Abstract: Designing high-performance amorphous alloys is demanding for various applications. But this process intensively relies on empirical laws and unlimited attempts. The high-cost and low-efficiency nature of the traditional strategies prevents effective sampling in the enormous material space. Here, we propose material networks to accelerate the discovery of binary and ternary amorphous alloys. The network topologies reveal hidden material candidates that were obscured by traditional tabular data representations. By scrutinizing the amorphous alloys synthesized in different years, we construct dynamical material networks to track the history of the alloy discovery. We find that some innovative materials designed in the past were encoded in the networks, demonstrating their predictive power in guiding new alloy design. These material networks show physical similarities with several real-world networks in our daily lives. Our findings pave a new way for intelligent materials design, especially for complex alloys.

cross Meta-learning of Gibbs states for many-body Hamiltonians with applications to Quantum Boltzmann Machines

Authors: Ruchira V Bhat, Rahul Bhowmick, Avinash Singh, Krishna Kumar Sabapathy

Abstract: The preparation of quantum Gibbs states is a fundamental challenge in quantum computing, essential for applications ranging from modeling open quantum systems to quantum machine learning. Building on the Meta-Variational Quantum Eigensolver framework proposed by Cervera-Lierta et al.(2021) and a problem driven ansatz design, we introduce two meta-learning algorithms: Meta-Variational Quantum Thermalizer (Meta-VQT) and Neural Network Meta-VQT (NN-Meta VQT) for efficient thermal state preparation of parametrized Hamiltonians on Noisy Intermediate-Scale Quantum (NISQ) devices. Meta-VQT utilizes a fully quantum ansatz, while NN Meta-VQT integrates a quantum classical hybrid architecture. Both leverage collective optimization over training sets to generalize Gibbs state preparation to unseen parameters. We validate our methods on upto 8-qubit Transverse Field Ising Model and the 2-qubit Heisenberg model with all field terms, demonstrating efficient thermal state generation beyond training data. For larger systems, we show that our meta-learned parameters when combined with appropriately designed ansatz serve as warm start initializations, significantly outperforming random initializations in the optimization tasks. Furthermore, a 3- qubit Kitaev ring example showcases our algorithm's effectiveness across finite-temperature crossover regimes. Finally, we apply our algorithms to train a Quantum Boltzmann Machine (QBM) on a 2-qubit Heisenberg model with all field terms, achieving enhanced training efficiency, improved Gibbs state accuracy, and a 30-fold runtime speedup over existing techniques such as variational quantum imaginary time (VarQITE)-based QBM highlighting the scalability and practicality of meta-algorithm-based QBMs.

cross Self-Supervised Inductive Logic Programming

Authors: Stassa Patsantzis

Abstract: Inductive Logic Programming (ILP) approaches like Meta \-/ Interpretive Learning (MIL) can learn, from few examples, recursive logic programs with invented predicates that generalise well to unseen instances. This ability relies on a background theory and negative examples, both carefully selected with expert knowledge of a learning problem and its solutions. But what if such a problem-specific background theory or negative examples are not available? We formalise this question as a new setting for Self-Supervised ILP and present a new MIL algorithm that learns in the new setting from some positive labelled, and zero or more unlabelled examples, and automatically generates, and labels, new positive and negative examples during learning. We implement this algorithm in Prolog in a new MIL system, called Poker. We compare Poker to state-of-the-art MIL system Louise on experiments learning grammars for Context-Free and L-System languages from labelled, positive example strings, no negative examples, and just the terminal vocabulary of a language, seen in examples, as a first-order background theory. We introduce a new approach for the principled selection of a second-order background theory as a Second Order Definite Normal Form (SONF), sufficiently general to learn all programs in a class, thus removing the need for a backgound theory tailored to a learning task. We find that Poker's performance improves with increasing numbers of automatically generated examples while Louise, bereft of negative examples, over-generalises.

cross GG-BBQ: German Gender Bias Benchmark for Question Answering

Authors: Shalaka Satheesh, Katrin Klug, Katharina Beckh, H\'ector Allende-Cid, Sebastian Houben, Teena Hassan

Abstract: Within the context of Natural Language Processing (NLP), fairness evaluation is often associated with the assessment of bias and reduction of associated harm. In this regard, the evaluation is usually carried out by using a benchmark dataset, for a task such as Question Answering, created for the measurement of bias in the model's predictions along various dimensions, including gender identity. In our work, we evaluate gender bias in German Large Language Models (LLMs) using the Bias Benchmark for Question Answering by Parrish et al. (2022) as a reference. Specifically, the templates in the gender identity subset of this English dataset were machine translated into German. The errors in the machine translated templates were then manually reviewed and corrected with the help of a language expert. We find that manual revision of the translation is crucial when creating datasets for gender bias evaluation because of the limitations of machine translation from English to a language such as German with grammatical gender. Our final dataset is comprised of two subsets: Subset-I, which consists of group terms related to gender identity, and Subset-II, where group terms are replaced with proper names. We evaluate several LLMs used for German NLP on this newly created dataset and report the accuracy and bias scores. The results show that all models exhibit bias, both along and against existing social stereotypes.

cross PromptAL: Sample-Aware Dynamic Soft Prompts for Few-Shot Active Learning

Authors: Hui Xiang, Jinqiao Shi, Ting Zhang, Xiaojie Zhao, Yong Liu, Yong Ma

Abstract: Active learning (AL) aims to optimize model training and reduce annotation costs by selecting the most informative samples for labeling. Typically, AL methods rely on the empirical distribution of labeled data to define the decision boundary and perform uncertainty or diversity estimation, subsequently identifying potential high-quality samples. In few-shot scenarios, the empirical distribution often diverges significantly from the target distribution, causing the decision boundary to shift away from its optimal position. However, existing methods overlook the role of unlabeled samples in enhancing the empirical distribution to better align with the target distribution, resulting in a suboptimal decision boundary and the selection of samples that inadequately represent the target distribution. To address this, we propose a hybrid AL framework, termed \textbf{PromptAL} (Sample-Aware Dynamic Soft \textbf{Prompts} for Few-Shot \textbf{A}ctive \textbf{L}earning). This framework accounts for the contribution of each unlabeled data point in aligning the current empirical distribution with the target distribution, thereby optimizing the decision boundary. Specifically, PromptAL first leverages unlabeled data to construct sample-aware dynamic soft prompts that adjust the model's predictive distribution and decision boundary. Subsequently, based on the adjusted decision boundary, it integrates uncertainty estimation with both global and local diversity to select high-quality samples that more accurately represent the target distribution. Experimental results on six in-domain and three out-of-domain datasets show that PromptAL achieves superior performance over nine baselines. Our codebase is openly accessible.

cross Combined Image Data Augmentations diminish the benefits of Adaptive Label Smoothing

Authors: Georg Siedel, Ekagra Gupta, Weijia Shao, Silvia Vock, Andrey Morozov

Abstract: Soft augmentation regularizes the supervised learning process of image classifiers by reducing label confidence of a training sample based on the magnitude of random-crop augmentation applied to it. This paper extends this adaptive label smoothing framework to other types of aggressive augmentations beyond random-crop. Specifically, we demonstrate the effectiveness of the method for random erasing and noise injection data augmentation. Adaptive label smoothing permits stronger regularization via higher-intensity Random Erasing. However, its benefits vanish when applied with a diverse range of image transformations as in the state-of-the-art TrivialAugment method, and excessive label smoothing harms robustness to common corruptions. Our findings suggest that adaptive label smoothing should only be applied when the training data distribution is dominated by a limited, homogeneous set of image transformation types.

cross An effective physics-informed neural operator framework for predicting wavefields

Authors: Xiao Ma, Tariq Alkhalifah

Abstract: Solving the wave equation is fundamental for geophysical applications. However, numerical solutions of the Helmholtz equation face significant computational and memory challenges. Therefore, we introduce a physics-informed convolutional neural operator (PICNO) to solve the Helmholtz equation efficiently. The PICNO takes both the background wavefield corresponding to a homogeneous medium and the velocity model as input function space, generating the scattered wavefield as the output function space. Our workflow integrates PDE constraints directly into the training process, enabling the neural operator to not only fit the available data but also capture the underlying physics governing wave phenomena. PICNO allows for high-resolution reasonably accurate predictions even with limited training samples, and it demonstrates significant improvements over a purely data-driven convolutional neural operator (CNO), particularly in predicting high-frequency wavefields. These features and improvements are important for waveform inversion down the road.

cross Adaptive Multi-task Learning for Multi-sector Portfolio Optimization

Authors: Qingliang Fan, Ruike Wu, Yanrong Yang

Abstract: Accurate transfer of information across multiple sectors to enhance model estimation is both significant and challenging in multi-sector portfolio optimization involving a large number of assets in different classes. Within the framework of factor modeling, we propose a novel data-adaptive multi-task learning methodology that quantifies and learns the relatedness among the principal temporal subspaces (spanned by factors) across multiple sectors under study. This approach not only improves the simultaneous estimation of multiple factor models but also enhances multi-sector portfolio optimization, which heavily depends on the accurate recovery of these factor models. Additionally, a novel and easy-to-implement algorithm, termed projection-penalized principal component analysis, is developed to accomplish the multi-task learning procedure. Diverse simulation designs and practical application on daily return data from Russell 3000 index demonstrate the advantages of multi-task learning methodology.

cross From model-based learning to model-free behaviour with Meta-Interpretive Learning

Authors: Stassa Patsantzis

Abstract: A "model" is a theory that describes the state of an environment and the effects of an agent's decisions on the environment. A model-based agent can use its model to predict the effects of its future actions and so plan ahead, but must know the state of the environment. A model-free agent cannot plan, but can act without a model and without completely observing the environment. An autonomous agent capable of acting independently in novel environments must combine both sets of capabilities. We show how to create such an agent with Meta-Interpretive Learning used to learn a model-based Solver used to train a model-free Controller that can solve the same planning problems as the Solver. We demonstrate the equivalence in problem-solving ability of the two agents on grid navigation problems in two kinds of environment: randomly generated mazes, and lake maps with wide open areas. We find that all navigation problems solved by the Solver are also solved by the Controller, indicating the two are equivalent.

cross The Sweet Danger of Sugar: Debunking Representation Learning for Encrypted Traffic Classification

Authors: Yuqi Zhao, Giovanni Dettori, Matteo Boffa, Luca Vassio, Marco Mellia

Abstract: Recently we have witnessed the explosion of proposals that, inspired by Language Models like BERT, exploit Representation Learning models to create traffic representations. All of them promise astonishing performance in encrypted traffic classification (up to 98% accuracy). In this paper, with a networking expert mindset, we critically reassess their performance. Through extensive analysis, we demonstrate that the reported successes are heavily influenced by data preparation problems, which allow these models to find easy shortcuts - spurious correlation between features and labels - during fine-tuning that unrealistically boost their performance. When such shortcuts are not present - as in real scenarios - these models perform poorly. We also introduce Pcap-Encoder, an LM-based representation learning model that we specifically design to extract features from protocol headers. Pcap-Encoder appears to be the only model that provides an instrumental representation for traffic classification. Yet, its complexity questions its applicability in practical settings. Our findings reveal flaws in dataset preparation and model training, calling for a better and more conscious test design. We propose a correct evaluation methodology and stress the need for rigorous benchmarking.

cross Estimating Treatment Effects with Independent Component Analysis

Authors: Patrik Reizinger, Lester Mackey, Wieland Brendel, Rahul Krishnan

Abstract: The field of causal inference has developed a variety of methods to accurately estimate treatment effects in the presence of nuisance. Meanwhile, the field of identifiability theory has developed methods like Independent Component Analysis (ICA) to identify latent sources and mixing weights from data. While these two research communities have developed largely independently, they aim to achieve similar goals: the accurate and sample-efficient estimation of model parameters. In the partially linear regression (PLR) setting, Mackey et al. (2018) recently found that estimation consistency can be improved with non-Gaussian treatment noise. Non-Gaussianity is also a crucial assumption for identifying latent factors in ICA. We provide the first theoretical and empirical insights into this connection, showing that ICA can be used for causal effect estimation in the PLR model. Surprisingly, we find that linear ICA can accurately estimate multiple treatment effects even in the presence of Gaussian confounders or nonlinear nuisance.

cross Adaptive Bayesian Single-Shot Quantum Sensing

Authors: Ivana Nikoloska, Ruud Van Sloun, Osvaldo Simeone

Abstract: Quantum sensing harnesses the unique properties of quantum systems to enable precision measurements of physical quantities such as time, magnetic and electric fields, acceleration, and gravitational gradients well beyond the limits of classical sensors. However, identifying suitable sensing probes and measurement schemes can be a classically intractable task, as it requires optimizing over Hilbert spaces of high dimension. In variational quantum sensing, a probe quantum system is generated via a parameterized quantum circuit (PQC), exposed to an unknown physical parameter through a quantum channel, and measured to collect classical data. PQCs and measurements are typically optimized using offline strategies based on frequentist learning criteria. This paper introduces an adaptive protocol that uses Bayesian inference to optimize the sensing policy via the maximization of the active information gain. The proposed variational methodology is tailored for non-asymptotic regimes where a single probe can be deployed in each time step, and is extended to support the fusion of estimates from multiple quantum sensing agents.

cross Combining Language and Topic Models for Hierarchical Text Classification

Authors: Jaco du Toit, Marcel Dunaiski

Abstract: Hierarchical text classification (HTC) is a natural language processing task which has the objective of categorising text documents into a set of classes from a predefined structured class hierarchy. Recent HTC approaches use various techniques to incorporate the hierarchical class structure information with the natural language understanding capabilities of pre-trained language models (PLMs) to improve classification performance. Furthermore, using topic models along with PLMs to extract features from text documents has been shown to be an effective approach for multi-label text classification tasks. The rationale behind the combination of these feature extractor models is that the PLM captures the finer-grained contextual and semantic information while the topic model obtains high-level representations which consider the corpus of documents as a whole. In this paper, we use a HTC approach which uses a PLM and a topic model to extract features from text documents which are used to train a classification model. Our objective is to determine whether the combination of the features extracted from the two models is beneficial to HTC performance in general. In our approach, the extracted features are passed through separate convolutional layers whose outputs are combined and passed to a label-wise attention mechanisms which obtains label-specific document representations by weighing the most important features for each class separately. We perform comprehensive experiments on three HTC benchmark datasets and show that using the features extracted from the topic model generally decreases classification performance compared to only using the features obtained by the PLM. In contrast to previous work, this shows that the incorporation of features extracted from topic models for text classification tasks should not be assumed beneficial.

cross C2-Evo: Co-Evolving Multimodal Data and Model for Self-Improving Reasoning

Authors: Xiuwei Chen, Wentao Hu, Hanhui Li, Jun Zhou, Zisheng Chen, Meng Cao, Yihan Zeng, Kui Zhang, Yu-Jie Yuan, Jianhua Han, Hang Xu, Xiaodan Liang

Abstract: Recent advances in multimodal large language models (MLLMs) have shown impressive reasoning capabilities. However, further enhancing existing MLLMs necessitates high-quality vision-language datasets with carefully curated task complexities, which are both costly and challenging to scale. Although recent self-improving models that iteratively refine themselves offer a feasible solution, they still suffer from two core challenges: (i) most existing methods augment visual or textual data separately, resulting in discrepancies in data complexity (e.g., over-simplified diagrams paired with redundant textual descriptions); and (ii) the evolution of data and models is also separated, leading to scenarios where models are exposed to tasks with mismatched difficulty levels. To address these issues, we propose C2-Evo, an automatic, closed-loop self-improving framework that jointly evolves both training data and model capabilities. Specifically, given a base dataset and a base model, C2-Evo enhances them by a cross-modal data evolution loop and a data-model evolution loop. The former loop expands the base dataset by generating complex multimodal problems that combine structured textual sub-problems with iteratively specified geometric diagrams, while the latter loop adaptively selects the generated problems based on the performance of the base model, to conduct supervised fine-tuning and reinforcement learning alternately. Consequently, our method continuously refines its model and training data, and consistently obtains considerable performance gains across multiple mathematical reasoning benchmarks. Our code, models, and datasets will be released.

cross Frontier AI Risk Management Framework in Practice: A Risk Analysis Technical Report

Authors: Shanghai AI Lab, :, Xiaoyang Chen, Yunhao Chen, Zeren Chen, Zhiyun Chen, Hanyun Cui, Yawen Duan, Jiaxuan Guo, Qi Guo, Xuhao Hu, Hong Huang, Lige Huang, Chunxiao Li, Juncheng Li, Qihao Lin, Dongrui Liu, Xinmin Liu, Zicheng Liu, Chaochao Lu, Xiaoya Lu, Jingjing Qu, Qibing Ren, Jing Shao, Jingwei Shi, Jingwei Sun, Peng Wang, Weibing Wang, Jia Xu, Lewen Yan, Xiao Yu, Yi Yu, Boxuan Zhang, Jie Zhang, Weichen Zhang, Zhijie Zheng, Tianyi Zhou, Bowen Zhou

Abstract: To understand and identify the unprecedented risks posed by rapidly advancing artificial intelligence (AI) models, this report presents a comprehensive assessment of their frontier risks. Drawing on the E-T-C analysis (deployment environment, threat source, enabling capability) from the Frontier AI Risk Management Framework (v1.0) (SafeWork-F1-Framework), we identify critical risks in seven areas: cyber offense, biological and chemical risks, persuasion and manipulation, uncontrolled autonomous AI R\&D, strategic deception and scheming, self-replication, and collusion. Guided by the "AI-$45^\circ$ Law," we evaluate these risks using "red lines" (intolerable thresholds) and "yellow lines" (early warning indicators) to define risk zones: green (manageable risk for routine deployment and continuous monitoring), yellow (requiring strengthened mitigations and controlled deployment), and red (necessitating suspension of development and/or deployment). Experimental results show that all recent frontier AI models reside in green and yellow zones, without crossing red lines. Specifically, no evaluated models cross the yellow line for cyber offense or uncontrolled AI R\&D risks. For self-replication, and strategic deception and scheming, most models remain in the green zone, except for certain reasoning models in the yellow zone. In persuasion and manipulation, most models are in the yellow zone due to their effective influence on humans. For biological and chemical risks, we are unable to rule out the possibility of most models residing in the yellow zone, although detailed threat modeling and in-depth assessment are required to make further claims. This work reflects our current understanding of AI frontier risks and urges collective action to mitigate these challenges.

cross Alternative Loss Function in Evaluation of Transformer Models

Authors: Jakub Micha\'nk\'ow, Pawe{\l} Sakowski, Robert \'Slepaczuk

Abstract: The proper design and architecture of testing of machine learning models, especially in their application to quantitative finance problems, is crucial. The most important in this process is selecting an adequate loss function used for training, validation, estimation purposes, and tuning of hyperparameters. Therefore, in this research, through empirical experiments on equity and cryptocurrency assets, we introduce the Mean Absolute Directional Loss (MADL) function which is more adequate for optimizing forecast-generating models used in algorithmic investment strategies. The MADL function results are compared for Transformer and LSTM models and we show that almost in every case Transformer results are significantly better than those obtained with LSTM.

cross Optimization of DNN-based HSI Segmentation FPGA-based SoC for ADS: A Practical Approach

Authors: Jon Guti\'errez-Zaballa, Koldo Basterretxea, Javier Echanobe

Abstract: The use of HSI for autonomous navigation is a promising research field aimed at improving the accuracy and robustness of detection, tracking, and scene understanding systems based on vision sensors. Combining advanced computer algorithms, such as DNNs, with small-size snapshot HSI cameras enhances the reliability of these systems. HSI overcomes intrinsic limitations of greyscale and RGB imaging in depicting physical properties of targets, particularly regarding spectral reflectance and metamerism. Despite promising results in HSI-based vision developments, safety-critical systems like ADS demand strict constraints on latency, resource consumption, and security, motivating the shift of ML workloads to edge platforms. This involves a thorough software/hardware co-design scheme to distribute and optimize the tasks efficiently among the limited resources of computing platforms. With respect to inference, the over-parameterized nature of DNNs poses significant computational challenges for real-time on-the-edge deployment. In addition, the intensive data preprocessing required by HSI, which is frequently overlooked, must be carefully managed in terms of memory arrangement and inter-task communication to enable an efficient integrated pipeline design on a SoC. This work presents a set of optimization techniques for the practical co-design of a DNN-based HSI segmentation processor deployed on a FPGA-based SoC targeted at ADS, including key optimizations such as functional software/hardware task distribution, hardware-aware preprocessing, ML model compression, and a complete pipelined deployment. Applied compression techniques significantly reduce the complexity of the designed DNN to 24.34% of the original operations and to 1.02% of the original number of parameters, achieving a 2.86x speed-up in the inference task without noticeable degradation of the segmentation accuracy.

cross Exploring Gender Bias in Large Language Models: An In-depth Dive into the German Language

Authors: Kristin Gnadt, David Thulke, Simone Kopeinik, Ralf Schl\"uter

Abstract: In recent years, various methods have been proposed to evaluate gender bias in large language models (LLMs). A key challenge lies in the transferability of bias measurement methods initially developed for the English language when applied to other languages. This work aims to contribute to this research strand by presenting five German datasets for gender bias evaluation in LLMs. The datasets are grounded in well-established concepts of gender bias and are accessible through multiple methodologies. Our findings, reported for eight multilingual LLM models, reveal unique challenges associated with gender bias in German, including the ambiguous interpretation of male occupational terms and the influence of seemingly neutral nouns on gender perception. This work contributes to the understanding of gender bias in LLMs across languages and underscores the necessity for tailored evaluation frameworks.

cross Automatic Fine-grained Segmentation-assisted Report Generation

Authors: Frederic Jonske, Constantin Seibold, Osman Alperen Koras, Fin Bahnsen, Marie Bauer, Amin Dada, Hamza Kalisch, Anton Schily, Jens Kleesiek

Abstract: Reliable end-to-end clinical report generation has been a longstanding goal of medical ML research. The end goal for this process is to alleviate radiologists' workloads and provide second opinions to clinicians or patients. Thus, a necessary prerequisite for report generation models is a strong general performance and some type of innate grounding capability, to convince clinicians or patients of the veracity of the generated reports. In this paper, we present ASaRG (\textbf{A}utomatic \textbf{S}egmentation-\textbf{a}ssisted \textbf{R}eport \textbf{G}eneration), an extension of the popular LLaVA architecture that aims to tackle both of these problems. ASaRG proposes to fuse intermediate features and fine-grained segmentation maps created by specialist radiological models into LLaVA's multi-modal projection layer via simple concatenation. With a small number of added parameters, our approach achieves a +0.89\% performance gain ($p=0.012$) in CE F1 score compared to the LLaVA baseline when using only intermediate features, and +2.77\% performance gain ($p<0.001$) when adding a combination of intermediate features and fine-grained segmentation maps. Compared with COMG and ORID, two other report generation methods that utilize segmentations, the performance gain amounts to 6.98\% and 6.28\% in F1 score, respectively. ASaRG is not mutually exclusive with other changes made to the LLaVA architecture, potentially allowing our method to be combined with other advances in the field. Finally, the use of an arbitrary number of segmentations as part of the input demonstrably allows tracing elements of the report to the corresponding segmentation maps and verifying the groundedness of assessments. Our code will be made publicly available at a later date.

cross Towards Automated Regulatory Compliance Verification in Financial Auditing with Large Language Models

Authors: Armin Berger, Lars Hillebrand, David Leonhard, Tobias Deu{\ss}er, Thiago Bell Felix de Oliveira, Tim Dilmaghani, Mohamed Khaled, Bernd Kliem, R\"udiger Loitz, Christian Bauckhage, Rafet Sifa

Abstract: The auditing of financial documents, historically a labor-intensive process, stands on the precipice of transformation. AI-driven solutions have made inroads into streamlining this process by recommending pertinent text passages from financial reports to align with the legal requirements of accounting standards. However, a glaring limitation remains: these systems commonly fall short in verifying if the recommended excerpts indeed comply with the specific legal mandates. Hence, in this paper, we probe the efficiency of publicly available Large Language Models (LLMs) in the realm of regulatory compliance across different model configurations. We place particular emphasis on comparing cutting-edge open-source LLMs, such as Llama-2, with their proprietary counterparts like OpenAI's GPT models. This comparative analysis leverages two custom datasets provided by our partner PricewaterhouseCoopers (PwC) Germany. We find that the open-source Llama-2 70 billion model demonstrates outstanding performance in detecting non-compliance or true negative occurrences, beating all their proprietary counterparts. Nevertheless, proprietary models such as GPT-4 perform the best in a broad variety of scenarios, particularly in non-English contexts.

cross Deep Unfolding Network for Nonlinear Multi-Frequency Electrical Impedance Tomography

Authors: Giovanni S. Alberti, Damiana Lazzaro, Serena Morigi, Luca Ratti, Matteo Santacesaria

Abstract: Multi-frequency Electrical Impedance Tomography (mfEIT) represents a promising biomedical imaging modality that enables the estimation of tissue conductivities across a range of frequencies. Addressing this challenge, we present a novel variational network, a model-based learning paradigm that strategically merges the advantages and interpretability of classical iterative reconstruction with the power of deep learning. This approach integrates graph neural networks (GNNs) within the iterative Proximal Regularized Gauss Newton (PRGN) framework. By unrolling the PRGN algorithm, where each iteration corresponds to a network layer, we leverage the physical insights of nonlinear model fitting alongside the GNN's capacity to capture inter-frequency correlations. Notably, the GNN architecture preserves the irregular triangular mesh structure used in the solution of the nonlinear forward model, enabling accurate reconstruction of overlapping tissue fraction concentrations.

cross Structural Effect and Spectral Enhancement of High-Dimensional Regularized Linear Discriminant Analysis

Authors: Yonghan Zhang, Zhangni Pu, Lu Yan, Jiang Hu

Abstract: Regularized linear discriminant analysis (RLDA) is a widely used tool for classification and dimensionality reduction, but its performance in high-dimensional scenarios is inconsistent. Existing theoretical analyses of RLDA often lack clear insight into how data structure affects classification performance. To address this issue, we derive a non-asymptotic approximation of the misclassification rate and thus analyze the structural effect and structural adjustment strategies of RLDA. Based on this, we propose the Spectral Enhanced Discriminant Analysis (SEDA) algorithm, which optimizes the data structure by adjusting the spiked eigenvalues of the population covariance matrix. By developing a new theoretical result on eigenvectors in random matrix theory, we derive an asymptotic approximation on the misclassification rate of SEDA. The bias correction algorithm and parameter selection strategy are then obtained. Experiments on synthetic and real datasets show that SEDA achieves higher classification accuracy and dimensionality reduction compared to existing LDA methods.

cross Interpretable Topic Extraction and Word Embedding Learning using row-stochastic DEDICOM

Authors: Lars Hillebrand, David Biesner, Christian Bauckhage, Rafet Sifa

Abstract: The DEDICOM algorithm provides a uniquely interpretable matrix factorization method for symmetric and asymmetric square matrices. We employ a new row-stochastic variation of DEDICOM on the pointwise mutual information matrices of text corpora to identify latent topic clusters within the vocabulary and simultaneously learn interpretable word embeddings. We introduce a method to efficiently train a constrained DEDICOM algorithm and a qualitative evaluation of its topic modeling and word embedding performance.

cross Pixel-Resolved Long-Context Learning for Turbulence at Exascale: Resolving Small-scale Eddies Toward the Viscous Limit

Authors: Junqi Yin, Mijanur Palash, M. Paul Laiu, Muralikrishnan Gopalakrishnan Meena, John Gounley, Stephen M. de Bruyn Kops, Feiyi Wang, Ramanan Sankaran, Pei Zhang

Abstract: Turbulence plays a crucial role in multiphysics applications, including aerodynamics, fusion, and combustion. Accurately capturing turbulence's multiscale characteristics is essential for reliable predictions of multiphysics interactions, but remains a grand challenge even for exascale supercomputers and advanced deep learning models. The extreme-resolution data required to represent turbulence, ranging from billions to trillions of grid points, pose prohibitive computational costs for models based on architectures like vision transformers. To address this challenge, we introduce a multiscale hierarchical Turbulence Transformer that reduces sequence length from billions to a few millions and a novel RingX sequence parallelism approach that enables scalable long-context learning. We perform scaling and science runs on the Frontier supercomputer. Our approach demonstrates excellent performance up to 1.1 EFLOPS on 32,768 AMD GPUs, with a scaling efficiency of 94%. To our knowledge, this is the first AI model for turbulence that can capture small-scale eddies down to the dissipative range.

cross Multi-objective Portfolio Optimization Via Gradient Descent

Authors: Christian Oliva, Pedro R. Ventura, Luis F. Lago-Fern\'andez

Abstract: Traditional approaches to portfolio optimization, often rooted in Modern Portfolio Theory and solved via quadratic programming or evolutionary algorithms, struggle with scalability or flexibility, especially in scenarios involving complex constraints, large datasets and/or multiple conflicting objectives. To address these challenges, we introduce a benchmark framework for multi-objective portfolio optimization (MPO) using gradient descent with automatic differentiation. Our method supports any optimization objective, such as minimizing risk measures (e.g., CVaR) or maximizing Sharpe ratio, along with realistic constraints, such as tracking error limits, UCITS regulations, or asset group restrictions. We have evaluated our framework across six experimental scenarios, from single-objective setups to complex multi-objective cases, and have compared its performance against standard solvers like CVXPY and SKFOLIO. Our results show that our method achieves competitive performance while offering enhanced flexibility for modeling multiple objectives and constraints. We aim to provide a practical and extensible tool for researchers and practitioners exploring advanced portfolio optimization problems in real-world conditions.

cross Zebra-CoT: A Dataset for Interleaved Vision Language Reasoning

Authors: Ang Li, Charles Wang, Kaiyu Yue, Zikui Cai, Ollie Liu, Deqing Fu, Peng Guo, Wang Bill Zhu, Vatsal Sharan, Robin Jia, Willie Neiswanger, Furong Huang, Tom Goldstein, Micah Goldblum

Abstract: Humans often use visual aids, for example diagrams or sketches, when solving complex problems. Training multimodal models to do the same, known as Visual Chain of Thought (Visual CoT), is challenging due to: (1) poor off-the-shelf visual CoT performance, which hinders reinforcement learning, and (2) the lack of high-quality visual CoT training data. We introduce $\textbf{Zebra-CoT}$, a diverse large-scale dataset with 182,384 samples, containing logically coherent interleaved text-image reasoning traces. We focus on four categories of tasks where sketching or visual reasoning is especially natural, spanning scientific questions such as geometry, physics, and algorithms; 2D visual reasoning tasks like visual search and jigsaw puzzles; 3D reasoning tasks including 3D multi-hop inference, embodied and robot planning; visual logic problems and strategic games like chess. Fine-tuning the Anole-7B model on the Zebra-CoT training corpus results in an improvement of +12% in our test-set accuracy and yields up to +13% performance gain on standard VLM benchmark evaluations. Fine-tuning Bagel-7B yields a model that generates high-quality interleaved visual reasoning chains, underscoring Zebra-CoT's effectiveness for developing multimodal reasoning abilities. We open-source our dataset and models to support development and evaluation of visual CoT.

cross Faithful, Interpretable Chest X-ray Diagnosis with Anti-Aliased B-cos Networks

Authors: Marcel Kleinmann, Shashank Agnihotri, Margret Keuper

Abstract: Faithfulness and interpretability are essential for deploying deep neural networks (DNNs) in safety-critical domains such as medical imaging. B-cos networks offer a promising solution by replacing standard linear layers with a weight-input alignment mechanism, producing inherently interpretable, class-specific explanations without post-hoc methods. While maintaining diagnostic performance competitive with state-of-the-art DNNs, standard B-cos models suffer from severe aliasing artifacts in their explanation maps, making them unsuitable for clinical use where clarity is essential. Additionally, the original B-cos formulation is limited to multi-class settings, whereas chest X-ray analysis often requires multi-label classification due to co-occurring abnormalities. In this work, we address both limitations: (1) we introduce anti-aliasing strategies using FLCPooling (FLC) and BlurPool (BP) to significantly improve explanation quality, and (2) we extend B-cos networks to support multi-label classification. Our experiments on chest X-ray datasets demonstrate that the modified $\text{B-cos}_\text{FLC}$ and $\text{B-cos}_\text{BP}$ preserve strong predictive performance while providing faithful and artifact-free explanations suitable for clinical application in multi-label settings. Code available at: $\href{https://github.com/mkleinma/B-cos-medical-paper}{GitHub repository}$.

URLs: https://github.com/mkleinma/B-cos-medical-paper

cross Agentar-Fin-R1: Enhancing Financial Intelligence through Domain Expertise, Training Efficiency, and Advanced Reasoning

Authors: Yanjun Zheng, Xiyang Du, Longfei Liao, Xiaoke Zhao, Zhaowen Zhou, Bo Zhang, Jiawei Liu, Xiang Qi, Zhe Li, Zhiqiang Zhang, Wei Wang, Peng Zhang

Abstract: Large Language Models (LLMs) exhibit considerable promise in financial applications; however, prevailing models frequently demonstrate limitations when confronted with scenarios that necessitate sophisticated reasoning capabilities, stringent trustworthiness criteria, and efficient adaptation to domain-specific requirements. We introduce the Agentar-Fin-R1 series of financial large language models (8B and 32B parameters), specifically engineered based on the Qwen3 foundation model to enhance reasoning capabilities, reliability, and domain specialization for financial applications. Our optimization approach integrates a high-quality, systematic financial task label system with a comprehensive multi-layered trustworthiness assurance framework. This framework encompasses high-quality trustworthy knowledge engineering, multi-agent trustworthy data synthesis, and rigorous data validation governance. Through label-guided automated difficulty-aware optimization, tow-stage training pipeline, and dynamic attribution systems, we achieve substantial improvements in training efficiency. Our models undergo comprehensive evaluation on mainstream financial benchmarks including Fineva, FinEval, and FinanceIQ, as well as general reasoning datasets such as MATH-500 and GPQA-diamond. To thoroughly assess real-world deployment capabilities, we innovatively propose the Finova evaluation benchmark, which focuses on agent-level financial reasoning and compliance verification. Experimental results demonstrate that Agentar-Fin-R1 not only achieves state-of-the-art performance on financial tasks but also exhibits exceptional general reasoning capabilities, validating its effectiveness as a trustworthy solution for high-stakes financial applications. The Finova bench is available at https://github.com/antgroup/Finova.

URLs: https://github.com/antgroup/Finova.

cross MegaScience: Pushing the Frontiers of Post-Training Datasets for Science Reasoning

Authors: Run-Ze Fan, Zengzhi Wang, Pengfei Liu

Abstract: Scientific reasoning is critical for developing AI scientists and supporting human researchers in advancing the frontiers of natural science discovery. However, the open-source community has primarily focused on mathematics and coding while neglecting the scientific domain, largely due to the absence of open, large-scale, high-quality, verifiable scientific reasoning datasets. To bridge this gap, we first present TextbookReasoning, an open dataset featuring truthful reference answers extracted from 12k university-level scientific textbooks, comprising 650k reasoning questions spanning 7 scientific disciplines. We further introduce MegaScience, a large-scale mixture of high-quality open-source datasets totaling 1.25 million instances, developed through systematic ablation studies that evaluate various data selection methodologies to identify the optimal subset for each publicly available scientific dataset. Meanwhile, we build a comprehensive evaluation system covering diverse subjects and question types across 15 benchmarks, incorporating comprehensive answer extraction strategies to ensure accurate evaluation metrics. Our experiments demonstrate that our datasets achieve superior performance and training efficiency with more concise response lengths compared to existing open-source scientific datasets. Furthermore, we train Llama3.1, Qwen2.5, and Qwen3 series base models on MegaScience, which significantly outperform the corresponding official instruct models in average performance. In addition, MegaScience exhibits greater effectiveness for larger and stronger models, suggesting a scaling benefit for scientific tuning. We release our data curation pipeline, evaluation system, datasets, and seven trained models to the community to advance scientific reasoning research.

cross ThinkAct: Vision-Language-Action Reasoning via Reinforced Visual Latent Planning

Authors: Chi-Pin Huang, Yueh-Hua Wu, Min-Hung Chen, Yu-Chiang Frank Wang, Fu-En Yang

Abstract: Vision-language-action (VLA) reasoning tasks require agents to interpret multimodal instructions, perform long-horizon planning, and act adaptively in dynamic environments. Existing approaches typically train VLA models in an end-to-end fashion, directly mapping inputs to actions without explicit reasoning, which hinders their ability to plan over multiple steps or adapt to complex task variations. In this paper, we propose ThinkAct, a dual-system framework that bridges high-level reasoning with low-level action execution via reinforced visual latent planning. ThinkAct trains a multimodal LLM to generate embodied reasoning plans guided by reinforcing action-aligned visual rewards based on goal completion and trajectory consistency. These reasoning plans are compressed into a visual plan latent that conditions a downstream action model for robust action execution on target environments. Extensive experiments on embodied reasoning and robot manipulation benchmarks demonstrate that ThinkAct enables few-shot adaptation, long-horizon planning, and self-correction behaviors in complex embodied AI tasks.

replace Learning from Data Streams: An Overview and Update

Authors: Jesse Read, Indr\.e \v{Z}liobait\.e

Abstract: The literature on machine learning in the context of data streams is vast and growing. However, many of the defining assumptions regarding data-stream learning tasks are too strong to hold in practice, or are even contradictory such that they cannot be met in the contexts of supervised learning. Algorithms are chosen and designed based on criteria which are often not clearly stated, for problem settings not clearly defined, tested in unrealistic settings, and/or in isolation from related approaches in the wider literature. This puts into question the potential for real-world impact of many approaches conceived in such contexts, and risks propagating a misguided research focus. We propose to tackle these issues by reformulating the fundamental definitions and settings of supervised data-stream learning with regard to contemporary considerations of concept drift and temporal dependence; and we take a fresh look at what constitutes a supervised data-stream learning task, and a reconsideration of algorithms that may be applied to tackle such tasks. Through and in reflection of this formulation and overview, helped by an informal survey of industrial players dealing with real-world data streams, we provide recommendations. Our main emphasis is that learning from data streams does not impose a single-pass or online-learning approach, or any particular learning regime; and any constraints on memory and time are not specific to streaming. Meanwhile, there exist established techniques for dealing with temporal dependence and concept drift, in other areas of the literature. For the data streams community, we thus encourage a shift in research focus, from dealing with often-artificial constraints and assumptions on the learning mode, to issues such as robustness, privacy, and interpretability which are increasingly relevant to learning in data streams in academic and industrial settings.

replace Energy-Efficient and Real-Time Sensing for Federated Continual Learning via Sample-Driven Control

Authors: Minh Ngoc Luu, Minh-Duong Nguyen, Ebrahim Bedeer, Van Duc Nguyen, Dinh Thai Hoang, Diep N. Nguyen, Quoc-Viet Pham

Abstract: An intelligent Real-Time Sensing (RTS) system must continuously acquire, update, integrate, and apply knowledge to adapt to real-world dynamics. Managing distributed intelligence in this context requires Federated Continual Learning (FCL). However, effectively capturing the diverse characteristics of RTS data in FCL systems poses significant challenges, including severely impacting computational and communication resources, escalating energy costs, and ultimately degrading overall system performance. To overcome these challenges, we investigate how the data distribution shift from ideal to practical RTS scenarios affects Artificial Intelligence (AI) model performance by leveraging the \textit{generalization gap} concept. In this way, we can analyze how sampling time in RTS correlates with the decline in AI performance, computation cost, and communication efficiency. Based on this observation, we develop a novel Sample-driven Control for Federated Continual Learning (SCFL) technique, specifically designed for mobile edge networks with RTS capabilities. In particular, SCFL is an optimization problem that harnesses the sampling process to concurrently minimize the generalization gap and improve overall accuracy while upholding the energy efficiency of the FCL framework. To solve the highly complex and time-varying optimization problem, we introduce a new soft actor-critic algorithm with explicit and implicit constraints (A2C-EI). Our empirical experiments reveal that we can achieve higher efficiency compared to other DRL baselines. Notably, SCFL can significantly reduce energy consumption up to $85\%$ while maintaining FL convergence and timely data transmission.

replace Practical Insights into Knowledge Distillation for Pre-Trained Models

Authors: Norah Alballa, Ahmed M. Abdelmoniem, Marco Canini

Abstract: This research investigates the enhancement of knowledge distillation (KD) processes in pre-trained models, an emerging field in knowledge transfer with significant implications for distributed training and federated learning environments. These environments benefit from reduced communication demands and accommodate various model architectures. Despite the adoption of numerous KD approaches for transferring knowledge among pre-trained models, a comprehensive understanding of KD's application in these scenarios is lacking. Our study conducts an extensive comparison of multiple KD techniques, including standard KD, tuned KD (via optimized temperature and weight parameters), deep mutual learning, and data partitioning KD. We assess these methods across various data distribution strategies to identify the most effective contexts for each. Through detailed examination of hyperparameter tuning, informed by extensive grid search evaluations, we pinpoint when adjustments are crucial to enhance model performance. This paper sheds light on optimal hyperparameter settings for distinct data partitioning scenarios and investigates KD's role in improving federated learning by minimizing communication rounds and expediting the training process. By filling a notable void in current research, our findings serve as a practical framework for leveraging KD in pre-trained models within collaborative and federated learning frameworks.

replace DP-TLDM: Differentially Private Tabular Latent Diffusion Model

Authors: Chaoyi Zhu, Jiayi Tang, Juan F. P\'erez, Marten van Dijk, Lydia Y. Chen

Abstract: Synthetic data from generative models emerges as the privacy-preserving data sharing solution. Such a synthetic data set shall resemble the original data without revealing identifiable private information. Till date, the prior focus on limited types of tabular synthesizers and a small number of privacy attacks, particularly on Generative Adversarial Networks, and overlooks membership inference attacks and defense strategies, i.e., differential privacy. Motivated by the conundrum of keeping high data quality and low privacy risk of synthetic data tables, we propose DPTLDM, Differentially Private Tabular Latent Diffusion Model, which is composed of an autoencoder network to encode the tabular data and a latent diffusion model to synthesize the latent tables. Following the emerging f-DP framework, we apply DP-SGD to train the auto-encoder in combination with batch clipping and use the separation value as the privacy metric to better capture the privacy gain from DP algorithms. Our empirical evaluation demonstrates that DPTLDM is capable of achieving a meaningful theoretical privacy guarantee while also significantly enhancing the utility of synthetic data. Specifically, compared to other DP-protected tabular generative models, DPTLDM improves the synthetic quality by an average of 35% in data resemblance, 15% in the utility for downstream tasks, and 50% in data discriminability, all while preserving a comparable level of privacy risk.

replace Learning Neural Differential Algebraic Equations via Operator Splitting

Authors: James Koch, Madelyn Shapiro, Himanshu Sharma, Draguna Vrabie, Jan Drgona

Abstract: Differential algebraic equations (DAEs) describe the temporal evolution of systems that obey both differential and algebraic constraints. Of particular interest are systems that contain implicit relationships between their components, such as conservation laws. Here, we present an Operator Splitting (OS) numerical integration scheme for learning unknown components of DAEs from time-series data. In this work, we show that the proposed OS-based time-stepping scheme is suitable for relevant system-theoretic data-driven modeling tasks. Presented examples include (i) the inverse problem of tank-manifold dynamics and (ii) discrepancy modeling of a network of pumps, tanks, and pipes. Our experiments demonstrate the proposed method's robustness to noise and extrapolation ability to (i) learn the behaviors of the system components and their interaction physics and (ii) disambiguate between data trends and mechanistic relationships contained in the system.

replace Unisolver: PDE-Conditional Transformers Are Universal PDE Solvers

Authors: Hang Zhou, Yuezhou Ma, Haixu Wu, Haowen Wang, Mingsheng Long

Abstract: Deep models have recently emerged as promising tools to solve partial differential equations (PDEs), known as neural PDE solvers. While neural solvers trained from either simulation data or physics-informed loss can solve PDEs reasonably well, they are mainly restricted to a few instances of PDEs, e.g. a certain equation with a limited set of coefficients. This limits their generalization to diverse PDEs, preventing them from being practical surrogate models of numerical solvers. In this paper, we present Unisolver, a novel Transformer model trained on diverse data and conditioned on diverse PDEs, aiming towards a universal neural PDE solver capable of solving a wide scope of PDEs. Instead of purely scaling up data and parameters, Unisolver stems from the theoretical analysis of the PDE-solving process. Inspired by the mathematical structure of PDEs that a PDE solution is fundamentally governed by a series of PDE components such as equation symbols and boundary conditions, we define a complete set of PDE components and flexibly embed them as domain-wise and point-wise deep conditions for Transformer PDE solvers. Integrating physical insights with recent Transformer advances, Unisolver achieves consistent state-of-the-art on three challenging large-scale benchmarks, showing impressive performance and generalizability. Code is available at https://github.com/thuml/Unisolver.

URLs: https://github.com/thuml/Unisolver.

replace Graph Neural Networks Gone Hogwild

Authors: Olga Solodova, Nick Richardson, Deniz Oktay, Ryan P. Adams

Abstract: Graph neural networks (GNNs) appear to be powerful tools to learn state representations for agents in distributed, decentralized multi-agent systems, but generate catastrophically incorrect predictions when nodes update asynchronously during inference. This failure under asynchrony effectively excludes these architectures from many potential applications where synchrony is difficult or impossible to enforce, e.g., robotic swarms or sensor networks. In this work we identify "implicitly-defined" GNNs as a class of architectures which is provably robust to asynchronous "hogwild" inference, adapting convergence guarantees from work in asynchronous and distributed optimization. We then propose a novel implicitly-defined GNN architecture, which we call an 'energy GNN'. We show that this architecture outperforms other GNNs from this class on a variety of synthetic tasks inspired by multi-agent systems.

replace Accurate and Efficient Fine-Tuning of Quantized Large Language Models Through Optimal Balance

Authors: Ao Shen, Qiang Wang, Zhiquan Lai, Xionglve Li, Dongsheng Li

Abstract: Large Language Models (LLMs) have demonstrated impressive performance across various domains. However, the enormous number of model parameters makes fine-tuning challenging, significantly limiting their application and deployment. Existing solutions combine parameter quantization with Low-Rank Adaptation (LoRA), reducing memory usage but causing performance degradation. Additionally, converting fine-tuned models to low-precision representations further degrades performance. In this paper, we identify an imbalance in fine-tuning quantized LLMs with LoRA: overly complex adapter inputs and outputs versus low effective trainability of the adapter, leading to underfitting during fine-tuning. Thus, we propose Quantized LLMs fine-tuning with Balanced Low-Rank Adaptation (Q-BLoRA), which simplifies the adapter's inputs and outputs while increasing the adapter's rank to alleviate underfitting during fine-tuning. For low-precision deployment, we propose Quantization-Aware fine-tuning with Balanced Low-Rank Adaptation (QA-BLoRA), which aligns with the block-wise quantization and facilitates quantization-aware fine-tuning of low-rank adaptation based on the parameter merging of Q-BLoRA. Both Q-BLoRA and QA-BLoRA are easily implemented and offer the following optimizations: (i) Q-BLoRA consistently achieves state-of-the-art accuracy compared to baselines and other variants; (ii) QA-BLoRA enables the direct generation of low-precision inference models, which exhibit significant performance improvements over other low-precision models. We validate the effectiveness of Q-BLoRA and QA-BLoRA across various models and scenarios. Code will be made available at \href{https://github.com/xiaocaigou/qbaraqahira}{https://github.com/xiaocaigou/qbaraqahira}

URLs: https://github.com/xiaocaigou/qbaraqahira, https://github.com/xiaocaigou/qbaraqahira

replace FLAIN: Mitigating Backdoor Attacks in Federated Learning via Flipping Weight Updates of Low-Activation Input Neurons

Authors: Binbin Ding, Penghui Yang, Sheng-Jun Huang

Abstract: Federated learning (FL) enables multiple clients to collaboratively train machine learning models under the coordination of a central server, while maintaining privacy. However, the server cannot directly monitor the local training processes, leaving room for malicious clients to introduce backdoors into the model. Research has shown that backdoor attacks exploit specific neurons that are activated only by malicious inputs, remaining dormant with clean data. Building on this insight, we propose a novel defense method called Flipping Weight Updates of Low-Activation Input Neurons (FLAIN) to counter backdoor attacks in FL. Specifically, upon the completion of global training, we use an auxiliary dataset to identify low-activation input neurons and iteratively flip their associated weight updates. This flipping process continues while progressively raising the threshold for low-activation neurons, until the model's performance on the auxiliary data begins to degrade significantly. Extensive experiments demonstrate that FLAIN effectively reduces the success rate of backdoor attacks across a variety of scenarios, including Non-IID data distributions and high malicious client ratios (MCR), while maintaining minimal impact on the performance of clean data.

replace Exploring How Generative MLLMs Perceive More Than CLIP with the Same Vision Encoder

Authors: Siting Li, Pang Wei Koh, Simon Shaolei Du

Abstract: Recent research has shown that CLIP models struggle with visual reasoning tasks that require grounding compositionality, understanding spatial relationships, or capturing fine-grained details. One natural hypothesis is that the CLIP vision encoder does not embed essential information for these tasks. However, we find that this is not always the case: The encoder gathers query-relevant visual information, while CLIP fails to extract it. In particular, we show that another branch of Vision-Language Models (VLMs), Generative Multimodal Large Language Models (MLLMs), achieve significantly higher accuracy than CLIP in many of these tasks using the same vision encoder and weights, indicating that these Generative MLLMs perceive more -- as they extract and utilize visual information more effectively. We conduct a series of controlled experiments and reveal that their success is attributed to multiple key design choices, including patch tokens, position embeddings, and prompt-based weighting. On the other hand, enhancing the training data alone or applying a stronger text encoder does not suffice to solve the task, and additional text tokens offer little benefit. Interestingly, we find that fine-grained visual reasoning is not exclusive to generative models trained by an autoregressive loss: When converted into CLIP-like encoders by contrastive finetuning, these MLLMs still outperform CLIP under the same cosine similarity-based evaluation protocol. Our study highlights the importance of VLM architectural choices and suggests directions for improving the performance of CLIP-like contrastive VLMs.

replace Streamlining Prediction in Bayesian Deep Learning

Authors: Rui Li, Marcus Klasson, Arno Solin, Martin Trapp

Abstract: The rising interest in Bayesian deep learning (BDL) has led to a plethora of methods for estimating the posterior distribution. However, efficient computation of inferences, such as predictions, has been largely overlooked with Monte Carlo integration remaining the standard. In this work we examine streamlining prediction in BDL through a single forward pass without sampling. For this we use local linearisation on activation functions and local Gaussian approximations at linear layers. Thus allowing us to analytically compute an approximation to the posterior predictive distribution. We showcase our approach for both MLP and transformers, such as ViT and GPT-2, and assess its performance on regression and classification tasks. Open-source library: https://github.com/AaltoML/SUQ

URLs: https://github.com/AaltoML/SUQ

replace Edge of Stochastic Stability: Revisiting the Edge of Stability for SGD

Authors: Arseniy Andreyev, Pierfrancesco Beneventano

Abstract: Recent findings by Cohen et al., 2021, demonstrate that when training neural networks with full-batch gradient descent with a step size of $\eta$, the largest eigenvalue $\lambda_{\max}$ of the full-batch Hessian consistently stabilizes at $\lambda_{\max} = 2/\eta$. These results have significant implications for convergence and generalization. This, however, is not the case of mini-batch stochastic gradient descent (SGD), limiting the broader applicability of its consequences. We show that SGD trains in a different regime we term Edge of Stochastic Stability (EoSS). In this regime, what stabilizes at $2/\eta$ is *Batch Sharpness*: the expected directional curvature of mini-batch Hessians along their corresponding stochastic gradients. As a consequence $\lambda_{\max}$ -- which is generally smaller than Batch Sharpness -- is suppressed, aligning with the long-standing empirical observation that smaller batches and larger step sizes favor flatter minima. We further discuss implications for mathematical modeling of SGD trajectories.

replace Soft Computing Approaches for Predicting Shade-Seeking Behaviour in Dairy Cattle under Heat Stress: A Comparative Study of Random Forests and Neural Networks

Authors: S. Sanjuan, D. A. M\'endez, R. Arnau, J. M. Calabuig, X. D\'iaz de Ot\'alora Aguirre, F. Estell\'es

Abstract: Heat stress is one of the main welfare and productivity problems faced by dairy cattle in Mediterranean climates. In this study, we approach the prediction of the daily shade-seeking count as a non-linear multivariate regression problem and evaluate two soft computing algorithms -- Random Forests and Neural Networks -- trained on high-resolution behavioral and micro-climatic data collected in a commercial farm in Titaguas (Valencia, Spain) during the 2023 summer season. The raw dataset (6907 daytime observations, 5-10 min resolution) includes the number of cows in the shade, ambient temperature and relative humidity. From these we derive three features: current Temperature--Humidity Index (THI), accumulated daytime THI, and mean night-time THI. To evaluate the models' performance a 5-fold cross-validation is also used. Results show that both soft computing models outperform a single Decision Tree baseline. The best Neural Network (3 hidden layers, 16 neurons each, learning rate = 10e-3) reaches an average RMSE of 14.78, while a Random Forest (10 trees, depth = 5) achieves 14.97 and offers best interpretability. Daily error distributions reveal a median RMSE of 13.84 and confirm that predictions deviate less than one hour from observed shade-seeking peaks. These results demonstrate the suitability of soft computing, data-driven approaches embedded in an applied-mathematical feature framework for modeling noisy biological phenomena, demonstrating their value as low-cost, real-time decision-support tools for precision livestock farming under heat-stress conditions.

replace Learning to Bid in Non-Stationary Repeated First-Price Auctions

Authors: Zihao Hu, Xiaoyu Fan, Yuan Yao, Jiheng Zhang, Zhengyuan Zhou

Abstract: First-price auctions have recently gained significant traction in digital advertising markets, exemplified by Google's transition from second-price to first-price auctions. Unlike in second-price auctions, where bidding one's private valuation is a dominant strategy, determining an optimal bidding strategy in first-price auctions is more complex. From a learning perspective, the learner (a specific bidder) can interact with the environment (other bidders, i.e., opponents) sequentially to infer their behaviors. Existing research often assumes specific environmental conditions and benchmarks performance against the best fixed policy (static benchmark). While this approach ensures strong learning guarantees, the static benchmark can deviate significantly from the optimal strategy in environments with even mild non-stationarity. To address such scenarios, a dynamic benchmark--representing the sum of the highest achievable rewards at each time step--offers a more suitable objective. However, achieving no-regret learning with respect to the dynamic benchmark requires additional constraints. By inspecting reward functions in online first-price auctions, we introduce two metrics to quantify the regularity of the sequence of opponents' highest bids, which serve as measures of non-stationarity. We provide a minimax-optimal characterization of the dynamic regret for the class of sequences of opponents' highest bids that satisfy either of these regularity constraints. Our main technical tool is the Optimistic Mirror Descent (OMD) framework with a novel optimism configuration, which is well-suited for achieving minimax-optimal dynamic regret rates in this context. We then use synthetic datasets to validate our theoretical guarantees and demonstrate that our methods outperform existing ones.

replace BioMaze: Benchmarking and Enhancing Large Language Models for Biological Pathway Reasoning

Authors: Haiteng Zhao, Chang Ma, Fangzhi Xu, Lingpeng Kong, Zhi-Hong Deng

Abstract: The applications of large language models (LLMs) in various biological domains have been explored recently, but their reasoning ability in complex biological systems, such as pathways, remains underexplored, which is crucial for predicting biological phenomena, formulating hypotheses, and designing experiments. This work explores the potential of LLMs in pathway reasoning. We introduce BioMaze, a dataset with 5.1K complex pathway problems derived from real research, covering various biological contexts including natural dynamic changes, disturbances, additional intervention conditions, and multi-scale research targets. Our evaluation of methods such as CoT and graph-augmented reasoning, shows that LLMs struggle with pathway reasoning, especially in perturbed systems. To address this, we propose PathSeeker, an LLM agent that enhances reasoning through interactive subgraph-based navigation, enabling a more effective approach to handling the complexities of biological systems in a scientifically aligned manner. The dataset and code are available at https://github.com/zhao-ht/BioMaze.

URLs: https://github.com/zhao-ht/BioMaze.

replace Global Convergence and Rich Feature Learning in $L$-Layer Infinite-Width Neural Networks under $\mu$P Parametrization

Authors: Zixiang Chen, Greg Yang, Qingyue Zhao, Quanquan Gu

Abstract: Despite deep neural networks' powerful representation learning capabilities, theoretical understanding of how networks can simultaneously achieve meaningful feature learning and global convergence remains elusive. Existing approaches like the neural tangent kernel (NTK) are limited because features stay close to their initialization in this parametrization, leaving open questions about feature properties during substantial evolution. In this paper, we investigate the training dynamics of infinitely wide, $L$-layer neural networks using the tensor program (TP) framework. Specifically, we show that, when trained with stochastic gradient descent (SGD) under the Maximal Update parametrization ($\mu$P) and mild conditions on the activation function, SGD enables these networks to learn linearly independent features that substantially deviate from their initial values. This rich feature space captures relevant data information and ensures that any convergent point of the training process is a global minimum. Our analysis leverages both the interactions among features across layers and the properties of Gaussian random variables, providing new insights into deep representation learning. We further validate our theoretical findings through experiments on real-world datasets.

replace Antithetic Sampling for Top-k Shapley Identification

Authors: Patrick Kolpaczki, Tim Nielen, Eyke H\"ullermeier

Abstract: Additive feature explanations rely primarily on game-theoretic notions such as the Shapley value by viewing features as cooperating players. The Shapley value's popularity in and outside of explainable AI stems from its axiomatic uniqueness. However, its computational complexity severely limits practicability. Most works investigate the uniform approximation of all features' Shapley values, needlessly consuming samples for insignificant features. In contrast, identifying the $k$ most important features can already be sufficiently insightful and yields the potential to leverage algorithmic opportunities connected to the field of multi-armed bandits. We propose Comparable Marginal Contributions Sampling (CMCS), a method for the top-$k$ identification problem utilizing a new sampling scheme taking advantage of correlated observations. We conduct experiments to showcase the efficacy of our method in compared to competitive baselines. Our empirical findings reveal that estimation quality for the approximate-all problem does not necessarily transfer to top-$k$ identification and vice versa.

replace Balancing Robustness and Efficiency in Embedded DNNs Through Activation Function Selection

Authors: Jon Guti\'errez-Zaballa, Koldo Basterretxea, Javier Echanobe

Abstract: Machine learning-based embedded systems for safety-critical applications, such as aerospace and autonomous driving, must be robust to perturbations caused by soft errors. As transistor geometries shrink and voltages decrease, modern electronic devices become more susceptible to background radiation, increasing the concern about failures produced by soft errors. The resilience of deep neural networks (DNNs) to these errors depends not only on target device technology but also on model structure and the numerical representation and arithmetic precision of their parameters. Compression techniques like pruning and quantization, used to reduce memory footprint and computational complexity, alter both model structure and representation, affecting soft error robustness. In this regard, although often overlooked, the choice of activation functions (AFs) impacts not only accuracy and trainability but also compressibility and error resilience. This paper explores the use of bounded AFs to enhance robustness against parameter perturbations, while evaluating their effects on model accuracy, compressibility, and computational load with a technology-agnostic approach. We focus on encoder-decoder convolutional models developed for semantic segmentation of hyperspectral images with application to autonomous driving systems. Experiments are conducted on an AMD-Xilinx's KV260 SoM.

replace Feature Selection and Junta Testing are Statistically Equivalent

Authors: Lorenzo Beretta, Nathaniel Harms, Caleb Koch

Abstract: For a function $f \colon \{0,1\}^n \to \{0,1\}$, the junta testing problem asks whether $f$ depends on only $k$ variables. If $f$ depends on only $k$ variables, the feature selection problem asks to find those variables. We prove that these two tasks are statistically equivalent. Specifically, we show that the ``brute-force'' algorithm, which checks for any set of $k$ variables consistent with the sample, is simultaneously sample-optimal for both problems, and the optimal sample size is \[ \Theta\left(\frac 1 \varepsilon \left( \sqrt{2^k \log {n \choose k}} + \log {n \choose k}\right)\right). \]

replace Manifold Learning with Normalizing Flows: Towards Regularity, Expressivity and Iso-Riemannian Geometry

Authors: Willem Diepeveen, Deanna Needell

Abstract: Modern machine learning increasingly leverages the insight that high-dimensional data often lie near low-dimensional, non-linear manifolds, an idea known as the manifold hypothesis. By explicitly modeling the geometric structure of data through learning Riemannian geometry algorithms can achieve improved performance and interpretability in tasks like clustering, dimensionality reduction, and interpolation. In particular, learned pullback geometry has recently undergone transformative developments that now make it scalable to learn and scalable to evaluate, which further opens the door for principled non-linear data analysis and interpretable machine learning. However, there are still steps to be taken when considering real-world multi-modal data. This work focuses on addressing distortions and modeling errors that can arise in the multi-modal setting and proposes to alleviate both challenges through isometrizing the learned Riemannian structure and balancing regularity and expressivity of the diffeomorphism parametrization. We showcase the effectiveness of the synergy of the proposed approaches in several numerical experiments with both synthetic and real data.

replace Ownership Verification of DNN Models Using White-Box Adversarial Attacks with Specified Probability Manipulation

Authors: Teruki Sano, Minoru Kuribayashi, Masao Sakai, Shuji Ishobe, Eisuke Koizumi

Abstract: In this paper, we propose a novel framework for ownership verification of deep neural network (DNN) models for image classification tasks. It allows verification of model identity by both the rightful owner and third party without presenting the original model. We assume a gray-box scenario where an unauthorized user owns a model that is illegally copied from the original model, provides services in a cloud environment, and the user throws images and receives the classification results as a probability distribution of output classes. The framework applies a white-box adversarial attack to align the output probability of a specific class to a designated value. Due to the knowledge of original model, it enables the owner to generate such adversarial examples. We propose a simple but effective adversarial attack method based on the iterative Fast Gradient Sign Method (FGSM) by introducing control parameters. Experimental results confirm the effectiveness of the identification of DNN models using adversarial attack.

replace Note on Follow-the-Perturbed-Leader in Combinatorial Semi-Bandit Problems

Authors: Botao Chen, Junya Honda

Abstract: This paper studies the optimality and complexity of Follow-the-Perturbed-Leader (FTPL) policy in size-invariant combinatorial semi-bandit problems. Recently, Honda et al. (2023) and Lee et al. (2024) showed that FTPL achieves Best-of-Both-Worlds (BOBW) optimality in standard multi-armed bandit problems with Fr\'{e}chet-type distributions. However, the optimality of FTPL in combinatorial semi-bandit problems remains unclear. In this paper, we consider the regret bound of FTPL with geometric resampling (GR) in size-invariant semi-bandit setting, showing that FTPL respectively achieves $O\left(\sqrt{m^2 d^\frac{1}{\alpha}T}+\sqrt{mdT}\right)$ regret with Fr\'{e}chet distributions, and the best possible regret bound of $O\left(\sqrt{mdT}\right)$ with Pareto distributions in adversarial setting. Furthermore, we extend the conditional geometric resampling (CGR) to size-invariant semi-bandit setting, which reduces the computational complexity from $O(d^2)$ of original GR to $O\left(md\left(\log(d/m)+1\right)\right)$ without sacrificing the regret performance of FTPL.

replace Learning Causally Predictable Outcomes from Psychiatric Longitudinal Data

Authors: Eric V. Strobl

Abstract: Causal inference in longitudinal biomedical data remains a central challenge, especially in psychiatry, where symptom heterogeneity and latent confounding frequently undermine classical estimators. Most existing methods for treatment effect estimation presuppose a fixed outcome variable and address confounding through observed covariate adjustment. However, the assumption of unconfoundedness may not hold for a fixed outcome in practice. To address this foundational limitation, we directly optimize the outcome definition to maximize causal identifiability. Our DEBIAS (Durable Effects with Backdoor-Invariant Aggregated Symptoms) algorithm learns non-negative, clinically interpretable weights for outcome aggregation, maximizing durable treatment effects and empirically minimizing both observed and latent confounding by leveraging the time-limited direct effects of prior treatments in psychiatric longitudinal data. The algorithm also furnishes an empirically verifiable test for outcome unconfoundedness. DEBIAS consistently outperforms state-of-the-art methods in recovering causal effects for clinically interpretable composite outcomes across comprehensive experiments in depression and schizophrenia.

replace Towards a deeper GCN: Alleviate over-smoothing with iterative training and fine-tuning

Authors: Furong Peng, Jinzhen Gao, Xuan Lu, Kang Liu, Yifan Huo, Sheng Wang

Abstract: Graph Convolutional Networks (GCNs) suffer from severe performance degradation in deep architectures due to over-smoothing. While existing studies primarily attribute the over-smoothing to repeated applications of graph Laplacian operators, our empirical analysis reveals a critical yet overlooked factor: trainable linear transformations in GCNs significantly exacerbate feature collapse, even at moderate depths (e.g., 8 layers). In contrast, Simplified Graph Convolution (SGC), which removes these transformations, maintains stable feature diversity up to 32 layers, highlighting linear transformations' dual role in facilitating expressive power and inducing over-smoothing. However, completely removing linear transformations weakens the model's expressive capacity. To address this trade-off, we propose Layer-wise Gradual Training (LGT), a novel training strategy that progressively builds deep GCNs while preserving their expressiveness. LGT integrates three complementary components: (1) layer-wise training to stabilize optimization from shallow to deep layers, (2) low-rank adaptation to fine-tune shallow layers and accelerate training, and (3) identity initialization to ensure smooth integration of new layers and accelerate convergence. Extensive experiments on benchmark datasets demonstrate that LGT achieves state-of-the-art performance on vanilla GCN, significantly improving accuracy even in 32-layer settings. Moreover, as a training method, LGT can be seamlessly combined with existing methods such as PairNorm and ContraNorm, further enhancing their performance in deeper networks. LGT offers a general, architecture-agnostic training framework for scalable deep GCNs. The code is available at [https://github.com/jfklasdfj/LGT_GCN].

URLs: https://github.com/jfklasdfj/LGT_GCN].

replace Neural Approaches for Multi-Objective Routing on Multigraphs

Authors: Filip Rydin, Attila Lischka, Jiaming Wu, Morteza Haghir Chehreghani, Bal\'azs Kulcs\'ar

Abstract: Learning-based methods for routing have gained significant attention in recent years, both in single-objective and multi-objective contexts. Yet, existing methods are unsuitable for routing on multigraphs, which feature multiple edges with distinct attributes between node pairs, despite their strong relevance in real-world scenarios. In this paper, we propose two graph neural network-based methods to address multi-objective routing on multigraphs. Our first approach operates directly on the multigraph by autoregressively selecting edges until a tour is completed. The second model first simplifies the multigraph via a learned pruning strategy and then performs routing on the resulting simple graph. We evaluate both models empirically and demonstrate their strong performance across a range of problems and distributions.

replace FedWSQ: Efficient Federated Learning with Weight Standardization and Distribution-Aware Non-Uniform Quantization

Authors: Seung-Wook Kim, Seongyeol Kim, Jiah Kim, Seowon Ji, Se-Ho Lee

Abstract: Federated learning (FL) often suffers from performance degradation due to key challenges such as data heterogeneity and communication constraints. To address these limitations, we present a novel FL framework called FedWSQ, which integrates weight standardization (WS) and the proposed distribution-aware non-uniform quantization (DANUQ). WS enhances FL performance by filtering out biased components in local updates during training, thereby improving the robustness of the model against data heterogeneity and unstable client participation. In addition, DANUQ minimizes quantization errors by leveraging the statistical properties of local model updates. As a result, FedWSQ significantly reduces communication overhead while maintaining superior model accuracy. Extensive experiments on FL benchmark datasets demonstrate that FedWSQ consistently outperforms existing FL methods across various challenging FL settings, including extreme data heterogeneity and ultra-low-bit communication scenarios.

replace Harnessing Near-Infrared Spectroscopy and Machine Learning for Traceable Classification of Hanwoo and Holstein Beef

Authors: AMM Nurul Alam, Abdul Samad, AMM Shamsul Alam, Jahan Ara Monti, Ayesha Muazzam

Abstract: This study evaluates the use of Near-Infrared spectroscopy (NIRS) combined with advanced machine learning (ML) techniques to differentiate Hanwoo beef (HNB) and Holstein beef (HLB) to address food authenticity, mislabeling, and adulteration. Rapid and non-invasive spectral data were attained by a portable NIRS, recording absorbance data within the wavelength range of 700 to 1100 nm. A total of 40 Longissimus lumborum samples, evenly split between HNB and HLB, were obtained from a local hypermarket. Data analysis using Principal Component Analysis (PCA) demonstrated distinct spectral patterns associated with chemical changes, clearly separating the two beef varieties and accounting for 93.72% of the total variance. ML models, including Linear Discriminant Analysis (LDA), Support Vector Machine (SVM), Logistic Regression (LR), Random Forest, Gradient Boosting (GB), K-Nearest Neighbors, Decision Tree (DT), Naive Bayes (NB), and Neural Networks (NN), were implemented, optimized through hyperparameter tuning, and validated by 5-fold cross-validation techniques to enhance model robustness and prevent overfitting. Random Forest provided the highest predictive accuracy with a Receiver Operating Characteristic (ROC) Area Under the Curve (AUC) of 0.8826, closely followed by the SVM model at 0.8747. Furthermore, GB and NN algorithms exhibited satisfactory performances, with cross-validation scores of 0.752. Notably, the NN model achieved the highest recall rate of 0.7804, highlighting its suitability in scenarios requiring heightened sensitivity. DT and NB exhibited comparatively lower predictive performance. The LR and SVM models emerged as optimal choices by effectively balancing high accuracy, precision, and recall. This study confirms that integrating NIRS with ML techniques offers a powerful and reliable method for meat authenticity, significantly contributing to detecting food fraud.

replace Beyond the ATE: Interpretable Modelling of Treatment Effects over Dose and Time

Authors: Julianna Piskorz, Krzysztof Kacprzyk, Harry Amad, Mihaela van der Schaar

Abstract: The Average Treatment Effect (ATE) is a foundational metric in causal inference, widely used to assess intervention efficacy in randomized controlled trials (RCTs). However, in many applications -- particularly in healthcare -- this static summary fails to capture the nuanced dynamics of treatment effects that vary with both dose and time. We propose a framework for modelling treatment effect trajectories as smooth surfaces over dose and time, enabling the extraction of clinically actionable insights such as onset time, peak effect, and duration of benefit. To ensure interpretability, robustness, and verifiability -- key requirements in high-stakes domains -- we adapt SemanticODE, a recent framework for interpretable trajectory modelling, to the causal setting where treatment effects are never directly observed. Our approach decouples the estimation of trajectory shape from the specification of clinically relevant properties (e.g., maxima, inflection points), supporting domain-informed priors, post-hoc editing, and transparent analysis. We show that our method yields accurate, interpretable, and editable models of treatment dynamics, facilitating both rigorous causal analysis and practical decision-making.

replace OPC: One-Point-Contraction Unlearning Toward Deep Feature Forgetting

Authors: Jaeheun Jung, Bosung Jung, Suhyun Bae, Donghun Lee

Abstract: Machine unlearning seeks to remove the influence of particular data or class from trained models to meet privacy, legal, or ethical requirements. Existing unlearning methods tend to forget shallowly: phenomenon of an unlearned model pretend to forget by adjusting only the model response, while its internal representations retain information sufficiently to restore the forgotten data or behavior. We empirically confirm the widespread shallowness by reverting the forgetting effect of various unlearning methods via training-free performance recovery attack and gradient-inversion-based data reconstruction attack. To address this vulnerability fundamentally, we define a theoretical criterion of ``deep forgetting'' based on one-point-contraction of feature representations of data to forget. We also propose an efficient approximation algorithm, and use it to construct a novel general-purpose unlearning algorithm: One-Point-Contraction (OPC). Empirical evaluations on image classification unlearning benchmarks show that OPC achieves not only effective unlearning performance but also superior resilience against both performance recovery attack and gradient-inversion attack. The distinctive unlearning performance of OPC arises from the deep feature forgetting enforced by its theoretical foundation, and recaps the need for improved robustness of machine unlearning methods.

replace Pre-Training LLMs on a budget: A comparison of three optimizers

Authors: Joel Schlotthauer, Christian Kroos, Chris Hinze, Viktor Hangya, Luzian Hahn, Fabian K\"uch

Abstract: Optimizers play a decisive role in reducing pre-training times for LLMs and achieving better-performing models. In this study, we compare three major variants: the de-facto standard AdamW, the simpler Lion, developed through an evolutionary search, and the second-order optimizer Sophia. For better generalization, we train with two different base architectures and use a single- and a multiple-epoch approach while keeping the number of tokens constant. Using the Maximal Update Parametrization and smaller proxy models, we tune relevant hyperparameters separately for each combination of base architecture and optimizer. We found that while the results from all three optimizers were in approximately the same range, Sophia exhibited the lowest training and validation loss, Lion was fastest in terms of training GPU hours but AdamW led to the best downstream evaluation results.

replace Leveraging Distribution Matching to Make Approximate Machine Unlearning Faster

Authors: Junaid Iqbal Khan

Abstract: Approximate machine unlearning (AMU) enables models to `forget' specific training data through specialized fine-tuning on a retained dataset subset. However, processing this retained subset still dominates computational runtime, while reductions of epochs also remain a challenge. We propose two complementary methods to accelerate classification-oriented AMU. First, \textbf{Blend}, a novel distribution-matching dataset condensation (DC), merges visually similar images with shared blend-weights to significantly reduce the retained set size. It operates with minimal pre-processing overhead and is orders of magnitude faster than state-of-the-art DC methods. Second, our loss-centric method, \textbf{Accelerated-AMU (A-AMU)}, augments the unlearning objective to quicken convergence. A-AMU achieves this by combining a steepened primary loss to expedite forgetting with a novel, differentiable regularizer that matches the loss distributions of forgotten and in-distribution unseen data. Our extensive experiments demonstrate that this dual approach of data and loss-centric optimization dramatically reduces end-to-end unlearning latency across both single and multi-round scenarios, all while preserving model utility and privacy. To our knowledge, this is the first work to systematically tackle unlearning efficiency by jointly designing a specialized dataset condensation technique with a dedicated accelerated loss function. Code is available at https://github.com/algebraicdianuj/DC_Unlearning.

URLs: https://github.com/algebraicdianuj/DC_Unlearning.

replace Rethinking Inductive Bias in Geographically Neural Network Weighted Regression

Authors: Zhenyuan Chen

Abstract: Inductive bias is a key factor in spatial regression models, determining how well a model can learn from limited data and capture spatial patterns. This work revisits the inductive biases in Geographically Neural Network Weighted Regression (GNNWR) and identifies limitations in current approaches for modeling spatial non-stationarity. While GNNWR extends traditional Geographically Weighted Regression by using neural networks to learn spatial weighting functions, existing implementations are often restricted by fixed distance-based schemes and limited inductive bias. We propose to generalize GNNWR by incorporating concepts from convolutional neural networks, recurrent neural networks, and transformers, introducing local receptive fields, sequential context, and self-attention into spatial regression. Through extensive benchmarking on synthetic spatial datasets with varying heterogeneity, noise, and sample sizes, we show that GNNWR outperforms classic methods in capturing nonlinear and complex spatial relationships. Our results also reveal that model performance depends strongly on data characteristics, with local models excelling in highly heterogeneous or small-sample scenarios, and global models performing better with larger, more homogeneous data. These findings highlight the importance of inductive bias in spatial modeling and suggest future directions, including learnable spatial weighting functions, hybrid neural architectures, and improved interpretability for models handling non-stationary spatial data.

replace T-GRAB: A Synthetic Diagnostic Benchmark for Learning on Temporal Graphs

Authors: Alireza Dizaji, Benedict Aaron Tjandra, Mehrab Hamidi, Shenyang Huang, Guillaume Rabusseau

Abstract: Dynamic graph learning methods have recently emerged as powerful tools for modelling relational data evolving through time. However, despite extensive benchmarking efforts, it remains unclear whether current Temporal Graph Neural Networks (TGNNs) effectively capture core temporal patterns such as periodicity, cause-and-effect, and long-range dependencies. In this work, we introduce the Temporal Graph Reasoning Benchmark (T-GRAB), a comprehensive set of synthetic tasks designed to systematically probe the capabilities of TGNNs to reason across time. T-GRAB provides controlled, interpretable tasks that isolate key temporal skills: counting/memorizing periodic repetitions, inferring delayed causal effects, and capturing long-range dependencies over both spatial and temporal dimensions. We evaluate 11 temporal graph learning methods on these tasks, revealing fundamental shortcomings in their ability to generalize temporal patterns. Our findings offer actionable insights into the limitations of current models, highlight challenges hidden by traditional real-world benchmarks, and motivate the development of architectures with stronger temporal reasoning abilities. The code for T-GRAB can be found at: https://github.com/alirezadizaji/T-GRAB.

URLs: https://github.com/alirezadizaji/T-GRAB.

replace Physical models realizing the transformer architecture of large language models

Authors: Zeqian Chen

Abstract: The introduction of the transformer architecture in 2017 marked the most striking advancement in natural language processing. The transformer is a model architecture relying entirely on an attention mechanism to draw global dependencies between input and output. However, we believe there is a gap in our theoretical understanding of what the transformer is, and how it works physically. From a physical perspective on modern chips, such as those chips under 28nm, modern intelligent machines should be regarded as open quantum systems beyond conventional statistical systems. Thereby, in this paper, we construct physical models realizing large language models based on a transformer architecture as open quantum systems in the Fock space over the Hilbert space of tokens. Our physical models underlie the transformer architecture for large language models.

replace Tri-Learn Graph Fusion Network for Attributed Graph Clustering

Authors: Binxiong Li, Xu Xiang, Xue Li, Binyu Zhao, Heyang Gao, Qinyu Zhao

Abstract: In recent years, models based on Graph Convolutional Networks (GCN) have made significant strides in the field of graph data analysis. However, challenges such as over-smoothing and over-compression remain when handling large-scale and complex graph datasets, leading to a decline in clustering quality. Although the Graph Transformer architecture has mitigated some of these issues, its performance is still limited when processing heterogeneous graph data. To address these challenges, this study proposes a novel deep clustering framework that comprising GCN, Autoencoder (AE), and Graph Transformer, termed the Tri-Learn Graph Fusion Network (Tri-GFN). This framework enhances the differentiation and consistency of global and local information through a unique tri-learning mechanism and feature fusion enhancement strategy. The framework integrates GCN, AE, and Graph Transformer modules. These components are meticulously fused by a triple-channel enhancement module, which maximizes the use of both node attributes and topological structures, ensuring robust clustering representation. The tri-learning mechanism allows mutual learning among these modules, while the feature fusion strategy enables the model to capture complex relationships, yielding highly discriminative representations for graph clustering. It surpasses many state-of-the-art methods, achieving an accuracy improvement of approximately 0.87% on the ACM dataset, 14.14 % on the Reuters dataset, and 7.58 % on the USPS dataset. Due to its outstanding performance on the Reuters dataset, Tri-GFN can be applied to automatic news classification, topic retrieval, and related fields.

replace MolPIF: A Parameter Interpolation Flow Model for Molecule Generation

Authors: Yaowei Jin, Junjie Wang, Wenkai Xiang, Duanhua Cao, Dan Teng, Zhehuan Fan, Jiacheng Xiong, Xia Sheng, Chuanlong Zeng, Duo An, Mingyue Zheng, Shuangjia Zheng, Qian Shi

Abstract: Advances in deep learning for molecular generation show promise in accelerating drug discovery. Bayesian Flow Networks (BFNs) have recently shown impressive performance across diverse chemical tasks, with their success often ascribed to the paradigm of modeling in a low-variance parameter space. However, the Bayesian inference-based strategy imposes limitations on designing more flexible distribution transformation pathways, making it challenging to adapt to diverse data distributions and varied task requirements. Furthermore, the potential for simpler, more efficient parameter-space-based models is unexplored. To address this, we propose a novel Parameter Interpolation Flow model (named PIF) with detailed theoretical foundation, training, and inference procedures. We then develop MolPIF for structure-based drug design, demonstrating its superior performance across diverse metrics compared to baselines. This work validates the effectiveness of parameter-space-based generative modeling paradigm for molecules and offers new perspectives for model design.

replace IPPRO: Importance-based Pruning with PRojective Offset for Magnitude-indifferent Structural Pruning

Authors: Jaeheun Jung, Jaehyuk Lee, Yeajin Lee, Donghun Lee

Abstract: With the growth of demand on neural network compression methods, the structured pruning methods including importance-based approach are actively studied. The magnitude importance and many correlated modern importance criteria often limit the capacity of pruning decision, since the filters with larger magnitudes are not likely to be pruned if the smaller one didn't, even if it is redundant. In this paper, we propose a novel pruning strategy to challenge this dominating effect of magnitude and provide fair chance to each filter to be pruned, by placing it on projective space. After that, we observe the gradient descent movement whether the filters move toward the origin or not, to measure how the filter is likely to be pruned. This measurement is used to construct PROscore, a novel importance score for IPPRO, a novel importance-based structured pruning with magnitude-indifference. Our evaluation results shows that the proposed importance criteria using the projective space achieves near-lossless pruning by reducing the performance drop in pruning, with promising performance after the finetuning. Our work debunks the ``size-matters'' myth in pruning and expands the frontier of importance-based pruning both theoretically and empirically.

replace Feature Construction Using Network Control Theory and Rank Encoding for Graph Machine Learning

Authors: Anwar Said, Yifan Wei, Obaid Ullah Ahmad, Mudassir Shabbir, Waseem Abbas, Xenofon Koutsoukos

Abstract: In this article, we utilize the concept of average controllability in graphs, along with a novel rank encoding method, to enhance the performance of Graph Neural Networks (GNNs) in social network classification tasks. GNNs have proven highly effective in various network-based learning applications and require some form of node features to function. However, their performance is heavily influenced by the expressiveness of these features. In social networks, node features are often unavailable due to privacy constraints or the absence of inherent attributes, making it challenging for GNNs to achieve optimal performance. To address this limitation, we propose two strategies for constructing expressive node features. First, we introduce average controllability along with other centrality metrics (denoted as NCT-EFA) as node-level metrics that capture critical aspects of network topology. Building on this, we develop a rank encoding method that transforms average controllability or any other graph-theoretic metric into a fixed-dimensional feature space, thereby improving feature representation. We conduct extensive numerical evaluations using six benchmark GNN models across four social network datasets to compare different node feature construction methods. Our results demonstrate that incorporating average controllability into the feature space significantly improves GNN performance. Moreover, the proposed rank encoding method outperforms traditional one-hot degree encoding, improving the ROC AUC from 68.7% to 73.9% using GraphSAGE on the GitHub Stargazers dataset, underscoring its effectiveness in generating expressive and efficient node representations.

replace FedMultiEmo: Real-Time Emotion Recognition via Multimodal Federated Learning

Authors: Baran Can G\"ul, Suraksha Nadig, Stefanos Tziampazis, Nasser Jazdi, Michael Weyrich

Abstract: In-vehicle emotion recognition underpins adaptive driver-assistance systems and, ultimately, occupant safety. However, practical deployment is hindered by (i) modality fragility - poor lighting and occlusions degrade vision-based methods; (ii) physiological variability - heart-rate and skin-conductance patterns differ across individuals; and (iii) privacy risk - centralized training requires transmission of sensitive data. To address these challenges, we present FedMultiEmo, a privacy-preserving framework that fuses two complementary modalities at the decision level: visual features extracted by a Convolutional Neural Network from facial images, and physiological cues (heart rate, electrodermal activity, and skin temperature) classified by a Random Forest. FedMultiEmo builds on three key elements: (1) a multimodal federated learning pipeline with majority-vote fusion, (2) an end-to-end edge-to-cloud prototype on Raspberry Pi clients and a Flower server, and (3) a personalized Federated Averaging scheme that weights client updates by local data volume. Evaluated on FER2013 and a custom physiological dataset, the federated Convolutional Neural Network attains 77% accuracy, the Random Forest 74%, and their fusion 87%, matching a centralized baseline while keeping all raw data local. The developed system converges in 18 rounds, with an average round time of 120 seconds and a per-client memory footprint below 200 MB. These results indicate that FedMultiEmo offers a practical approach to real-time, privacy-aware emotion recognition in automotive settings.

replace GUI-G$^2$: Gaussian Reward Modeling for GUI Grounding

Authors: Fei Tang, Zhangxuan Gu, Zhengxi Lu, Xuyang Liu, Shuheng Shen, Changhua Meng, Wen Wang, Wenqi Zhang, Yongliang Shen, Weiming Lu, Jun Xiao, Yueting Zhuang

Abstract: Graphical User Interface (GUI) grounding maps natural language instructions to precise interface locations for autonomous interaction. Current reinforcement learning approaches use binary rewards that treat elements as hit-or-miss targets, creating sparse signals that ignore the continuous nature of spatial interactions. Motivated by human clicking behavior that naturally forms Gaussian distributions centered on target elements, we introduce GUI Gaussian Grounding Rewards (GUI-G$^2$), a principled reward framework that models GUI elements as continuous Gaussian distributions across the interface plane. GUI-G$^2$ incorporates two synergistic mechanisms: Gaussian point rewards model precise localization through exponentially decaying distributions centered on element centroids, while coverage rewards assess spatial alignment by measuring the overlap between predicted Gaussian distributions and target regions. To handle diverse element scales, we develop an adaptive variance mechanism that calibrates reward distributions based on element dimensions. This framework transforms GUI grounding from sparse binary classification to dense continuous optimization, where Gaussian distributions generate rich gradient signals that guide models toward optimal interaction positions. Extensive experiments across ScreenSpot, ScreenSpot-v2, and ScreenSpot-Pro benchmarks demonstrate that GUI-G$^2$, substantially outperforms state-of-the-art method UI-TARS-72B, with the most significant improvement of 24.7% on ScreenSpot-Pro. Our analysis reveals that continuous modeling provides superior robustness to interface variations and enhanced generalization to unseen layouts, establishing a new paradigm for spatial reasoning in GUI interaction tasks.

replace-cross Adapt On-the-Go: Behavior Modulation for Single-Life Robot Deployment

Authors: Annie S. Chen, Govind Chada, Laura Smith, Archit Sharma, Zipeng Fu, Sergey Levine, Chelsea Finn

Abstract: To succeed in the real world, robots must cope with situations that differ from those seen during training. We study the problem of adapting on-the-fly to such novel scenarios during deployment, by drawing upon a diverse repertoire of previouslylearned behaviors. Our approach, RObust Autonomous Modulation (ROAM), introduces a mechanism based on the perceived value of pre-trained behaviors to select and adapt pre-trained behaviors to the situation at hand. Crucially, this adaptation process all happens within a single episode at test time, without any human supervision. We demonstrate that ROAM enables a robot to adapt rapidly to changes in dynamics both in simulation and on a real Go1 quadruped, even successfully moving forward with roller skates on its feet. Our approach adapts over 2x as efficiently compared to existing methods when facing a variety of out-of-distribution situations during deployment by effectively choosing and adapting relevant behaviors on-the-fly.

replace-cross Risks of AI Scientists: Prioritizing Safeguarding Over Autonomy

Authors: Xiangru Tang, Qiao Jin, Kunlun Zhu, Tongxin Yuan, Yichi Zhang, Wangchunshu Zhou, Meng Qu, Yilun Zhao, Jian Tang, Zhuosheng Zhang, Arman Cohan, Zhiyong Lu, Mark Gerstein

Abstract: AI scientists powered by large language models have demonstrated substantial promise in autonomously conducting experiments and facilitating scientific discoveries across various disciplines. While their capabilities are promising, these agents also introduce novel vulnerabilities that require careful consideration for safety. However, there has been limited comprehensive exploration of these vulnerabilities. This perspective examines vulnerabilities in AI scientists, shedding light on potential risks associated with their misuse, and emphasizing the need for safety measures. We begin by providing an overview of the potential risks inherent to AI scientists, taking into account user intent, the specific scientific domain, and their potential impact on the external environment. Then, we explore the underlying causes of these vulnerabilities and provide a scoping review of the limited existing works. Based on our analysis, we propose a triadic framework involving human regulation, agent alignment, and an understanding of environmental feedback (agent regulation) to mitigate these identified risks. Furthermore, we highlight the limitations and challenges associated with safeguarding AI scientists and advocate for the development of improved models, robust benchmarks, and comprehensive regulations.

replace-cross Randomization Can Reduce Both Bias and Variance: A Case Study in Random Forests

Authors: Brian Liu, Rahul Mazumder

Abstract: We study the often overlooked phenomenon, first noted in \cite{breiman2001random}, that random forests appear to reduce bias compared to bagging. Motivated by an interesting paper by \cite{mentch2020randomization}, where the authors explain the success of random forests in low signal-to-noise ratio (SNR) settings through regularization, we explore how random forests can capture patterns in the data that bagging ensembles fail to capture. We empirically demonstrate that in the presence of such patterns, random forests reduce bias along with variance and can increasingly outperform bagging ensembles when SNR is high. Our observations offer insights into the real-world success of random forests across a range of SNRs and enhance our understanding of the difference between random forests and bagging ensembles. Our investigations also yield practical insights into the importance of tuning $mtry$ in random forests.

replace-cross Analysis of the 2024 BraTS Meningioma Radiotherapy Planning Automated Segmentation Challenge

Authors: Dominic LaBella, Valeriia Abramova, Mehdi Astaraki, Andre Ferreira, Zhifan Jiang, Mason C. Cleveland, Ramandeep Kang, Uma M. Lal-Trehan Estrada, Cansu Yalcin, Rachika E. Hamadache, Clara Lisazo, Adri\`a Casamitjana, Joaquim Salvi, Arnau Oliver, Xavier Llad\'o, Iuliana Toma-Dasu, Tiago Jesus, Behrus Puladi, Jens Kleesiek, Victor Alves, Jan Egger, Daniel Capell\'an-Mart\'in, Abhijeet Parida, Austin Tapp, Xinyang Liu, Maria J. Ledesma-Carbayo, Jay B. Patel, Thomas N. McNeal, Maya Viera, Owen McCall, Albert E. Kim, Elizabeth R. Gerstner, Christopher P. Bridge, Katherine Schumacher, Michael Mix, Kevin Leu, Shan McBurney-Lin, Pierre Nedelec, Javier Villanueva-Meyer, David R. Raleigh, Jonathan Shapey, Tom Vercauteren, Kazumi Chia, Marina Ivory, Theodore Barfoot, Omar Al-Salihi, Justin Leu, Lia M. Halasz, Yuri S. Velichko, Chunhao Wang, John P. Kirkpatrick, Scott R. Floyd, Zachary J. Reitman, Trey C. Mullikin, Eugene J. Vaios, Christina Huang, Ulas Bagci, Sean Sachdev, Jona A. Hattangadi-Gluth, Tyler M. Seibert, Nikdokht Farid, Connor Puett, Matthew W. Pease, Kevin Shiue, Syed Muhammad Anwar, Shahriar Faghani, Peter Taylor, Pranav Warman, Jake Albrecht, Andr\'as Jakab, Mana Moassefi, Verena Chung, Rong Chai, Alejandro Aristizabal, Alexandros Karargyris, Hasan Kassem, Sarthak Pati, Micah Sheller, Nazanin Maleki, Rachit Saluja, Florian Kofler, Christopher G. Schwarz, Philipp Lohmann, Phillipp Vollmuth, Louis Gagnon, Maruf Adewole, Hongwei Bran Li, Anahita Fathi Kazerooni, Nourel Hoda Tahon, Udunna Anazodo, Ahmed W. Moawad, Bjoern Menze, Marius George Linguraru, Mariam Aboian, Benedikt Wiestler, Ujjwal Baid, Gian-Marco Conte, Andreas M. Rauschecker, Ayman Nada, Aly H. Abayazeed, Raymond Huang, Maria Correia de Verdier, Jeffrey D. Rudie, Spyridon Bakas, Evan Calabrese

Abstract: The 2024 Brain Tumor Segmentation Meningioma Radiotherapy (BraTS-MEN-RT) challenge aimed to advance automated segmentation algorithms using the largest known multi-institutional dataset of 750 radiotherapy planning brain MRIs with expert-annotated target labels for patients with intact or postoperative meningioma that underwent either conventional external beam radiotherapy or stereotactic radiosurgery. Each case included a defaced 3D post-contrast T1-weighted radiotherapy planning MRI in its native acquisition space, accompanied by a single-label "target volume" representing the gross tumor volume (GTV) and any at-risk post-operative site. Target volume annotations adhered to established radiotherapy planning protocols, ensuring consistency across cases and institutions, and were approved by expert neuroradiologists and radiation oncologists. Six participating teams developed, containerized, and evaluated automated segmentation models using this comprehensive dataset. Team rankings were assessed using a modified lesion-wise Dice Similarity Coefficient (DSC) and 95% Hausdorff Distance (95HD). The best reported average lesion-wise DSC and 95HD was 0.815 and 26.92 mm, respectively. BraTS-MEN-RT is expected to significantly advance automated radiotherapy planning by enabling precise tumor segmentation and facilitating tailored treatment, ultimately improving patient outcomes. We describe the design and results from the BraTS-MEN-RT challenge.

replace-cross Network Analytics for Anti-Money Laundering -- A Systematic Literature Review and Experimental Evaluation

Authors: Bruno Deprez, Toon Vanderschueren, Bart Baesens, Tim Verdonck, Wouter Verbeke

Abstract: Money laundering presents a pervasive challenge, burdening society by financing illegal activities. The use of network information is increasingly being explored to effectively combat money laundering, given it involves connected parties. This led to a surge in research on network analytics for anti-money laundering (AML). The literature is, however, fragmented and a comprehensive overview of existing work is missing. This results in limited understanding of the methods to apply and their comparative detection power. This paper presents an extensive and unique literature review, based on 97 papers from Web of Science and Scopus, resulting in a taxonomy following a recently proposed fraud analytics framework. We conclude that most research relies on expert-based rules and manual features, while deep learning methods have been gaining traction. This paper also presents a comprehensive framework to evaluate and compare the performance of prominent methods in a standardized setup. We compare manual feature engineering, random walk-based, and deep learning methods on two publicly available data sets. We conclude that (1) network analytics increases the predictive power, but caution is needed when applying GNNs in the face of class imbalance and network topology, and that (2) care should be taken with synthetic data as this can give overly optimistic results. The open-source implementation facilitates researchers and practitioners to extend this work on proprietary data, promoting a standardised approach for the analysis and evaluation of network analytics for AML.

replace-cross Rethinking Data Input for Point Cloud Upsampling

Authors: Tongxu Zhang

Abstract: Point cloud upsampling is crucial for tasks like 3D reconstruction. While existing methods rely on patch-based inputs, and there is no research discussing the differences and principles between point cloud model full input and patch based input. Ergo, we propose a novel approach using whole model inputs i.e. Average Segment input. Our experiments on PU1K and ABC datasets reveal that patch-based inputs consistently outperform whole model inputs. To understand this, we will delve into factors in feature extraction, and network architecture that influence upsampling results.

replace-cross Risk and cross validation in ridge regression with correlated samples

Authors: Alexander Atanasov, Jacob A. Zavatone-Veth, Cengiz Pehlevan

Abstract: Recent years have seen substantial advances in our understanding of high-dimensional ridge regression, but existing theories assume that training examples are independent. By leveraging techniques from random matrix theory and free probability, we provide sharp asymptotics for the in- and out-of-sample risks of ridge regression when the data points have arbitrary correlations. We demonstrate that in this setting, the generalized cross validation estimator (GCV) fails to correctly predict the out-of-sample risk. However, in the case where the noise residuals have the same correlations as the data points, one can modify the GCV to yield an efficiently-computable unbiased estimator that concentrates in the high-dimensional limit, which we dub CorrGCV. We further extend our asymptotic analysis to the case where the test point has nontrivial correlations with the training set, a setting often encountered in time series forecasting. Assuming knowledge of the correlation structure of the time series, this again yields an extension of the GCV estimator, and sharply characterizes the degree to which such test points yield an overly optimistic prediction of long-time risk. We validate the predictions of our theory across a variety of high dimensional data.

replace-cross Investigation of unsupervised and supervised hyperspectral anomaly detection

Authors: Mazharul Hossain, Aaron Robinson, Lan Wang, Chrysanthe Preza

Abstract: Hyperspectral sensing is a valuable tool for detecting anomalies and distinguishing between materials in a scene. Hyperspectral anomaly detection (HS-AD) helps characterize the captured scenes and separates them into anomaly and background classes. It is vital in agriculture, environment, and military applications such as RSTA (reconnaissance, surveillance, and target acquisition) missions. We previously designed an equal voting ensemble of hyperspectral unmixing and three unsupervised HS-AD algorithms. We later utilized a supervised classifier to determine the weights of a voting ensemble, creating a hybrid of heterogeneous unsupervised HS-AD algorithms with a supervised classifier in a model stacking, which improved detection accuracy. However, supervised classification methods usually fail to detect novel or unknown patterns that substantially deviate from those seen previously. In this work, we evaluate our technique and other supervised and unsupervised methods using general hyperspectral data to provide new insights.

replace-cross A computational transition for detecting correlated stochastic block models by low-degree polynomials

Authors: Guanyi Chen, Jian Ding, Shuyang Gong, Zhangsong Li

Abstract: Detection of correlation in a pair of random graphs is a fundamental statistical and computational problem that has been extensively studied in recent years. In this work, we consider a pair of correlated (sparse) stochastic block models $\mathcal{S}(n,\tfrac{\lambda}{n};k,\epsilon;s)$ that are subsampled from a common parent stochastic block model $\mathcal S(n,\tfrac{\lambda}{n};k,\epsilon)$ with $k=O(1)$ symmetric communities, average degree $\lambda=O(1)$, divergence parameter $\epsilon$, and subsampling probability $s$. For the detection problem of distinguishing this model from a pair of independent Erd\H{o}s-R\'enyi graphs with the same edge density $\mathcal{G}(n,\tfrac{\lambda s}{n})$, we focus on tests based on \emph{low-degree polynomials} of the entries of the adjacency matrices, and we determine the threshold that separates the easy and hard regimes. More precisely, we show that this class of tests can distinguish these two models if and only if $s> \min \{ \sqrt{\alpha}, \frac{1}{\lambda \epsilon^2} \}$, where $\alpha\approx 0.338$ is the Otter's constant and $\frac{1}{\lambda \epsilon^2}$ is the Kesten-Stigum threshold. Combining a reduction argument in \cite{Li25+}, our hardness result also implies low-degree hardness for partial recovery and detection (to independent block models) when $s< \min \{ \sqrt{\alpha}, \frac{1}{\lambda \epsilon^2} \}$. Finally, our proof of low-degree hardness is based on a conditional variant of the low-degree likelihood calculation.

replace-cross Multilevel Picard approximations and deep neural networks with ReLU, leaky ReLU, and softplus activation overcome the curse of dimensionality when approximating semilinear parabolic partial differential equations in $L^p$-sense

Authors: Ariel Neufeld, Tuan Anh Nguyen

Abstract: We prove that multilevel Picard approximations and deep neural networks with ReLU, leaky ReLU, and softplus activation are capable of approximating solutions of semilinear Kolmogorov PDEs in $L^\mathfrak{p}$-sense, $\mathfrak{p}\in [2,\infty)$, in the case of gradient-independent, Lipschitz-continuous nonlinearities, while the computational effort of the multilevel Picard approximations and the required number of parameters in the neural networks grow at most polynomially in both dimension $d\in \mathbb{N}$ and reciprocal of the prescribed accuracy $\epsilon$.

replace-cross Erasing Conceptual Knowledge from Language Models

Authors: Rohit Gandikota, Sheridan Feucht, Samuel Marks, David Bau

Abstract: In this work, we introduce Erasure of Language Memory (ELM), a principled approach to concept-level unlearning that operates by matching distributions defined by the model's own introspective classification capabilities. Our key insight is that effective unlearning should leverage the model's ability to evaluate its own knowledge, using the language model itself as a classifier to identify and reduce the likelihood of generating content related to undesired concepts. ELM applies this framework to create targeted low-rank updates that reduce generation probabilities for concept-specific content while preserving the model's broader capabilities. We demonstrate ELM's efficacy on biosecurity, cybersecurity, and literary domain erasure tasks. Comparative evaluation reveals that ELM-modified models achieve near-random performance on assessments targeting erased concepts, while simultaneously preserving generation coherence, maintaining benchmark performance on unrelated tasks, and exhibiting strong robustness to adversarial attacks. Our code, data, and trained models are available at https://elm.baulab.info

URLs: https://elm.baulab.info

replace-cross Probing Ranking LLMs: A Mechanistic Analysis for Information Retrieval

Authors: Tanya Chowdhury, Atharva Nijasure, James Allan

Abstract: Transformer networks, particularly those achieving performance comparable to GPT models, are well known for their robust feature extraction abilities. However, the nature of these extracted features and their alignment with human-engineered ones remain unexplored. In this work, we investigate the internal mechanisms of state-of-the-art, fine-tuned LLMs for passage reranking. We employ a probing-based analysis to examine neuron activations in ranking LLMs, identifying the presence of known human-engineered and semantic features. Our study spans a broad range of feature categories, including lexical signals, document structure, query-document interactions, and complex semantic representations, to uncover underlying patterns influencing ranking decisions. Through experiments on four different ranking LLMs, we identify statistical IR features that are prominently encoded in LLM activations, as well as others that are notably missing. Furthermore, we analyze how these models respond to out-of-distribution queries and documents, revealing distinct generalization behaviors. By dissecting the latent representations within LLM activations, we aim to improve both the interpretability and effectiveness of ranking models. Our findings offer crucial insights for developing more transparent and reliable retrieval systems, and we release all necessary scripts and code to support further exploration.

replace-cross Sparrow: Data-Efficient Video-LLM with Text-to-Image Augmentation

Authors: Shukang Yin, Chaoyou Fu, Sirui Zhao, Chunjiang Ge, Yan Yang, Yuhan Dai, Yongdong Luo, Tong Xu, Caifeng Shan, Enhong Chen

Abstract: Recent years have seen the success of Multimodal Large Language Models (MLLMs) in the domain of vision understanding. The success of these models can largely be attributed to the dominant scaling law, which states that larger parameter sizes and data volumes contribute to better performance. Notably, data scaling has been primarily driven by automatic data pipelines, which focus on the self-instruction of LLMs. The paradigm has been taken for granted for quite some time, but the study of the effectiveness of scaling with these data has been neglected for a long time. In this context, this work revisits scaling with synthetic data and focuses on developing video-LLMs from a data-centric perspective. Our primary study approach involves fine-tuning pre-trained image-LLMs with video data and examining learning efficiency through data scaling. Results from our preliminary experiments reveal a low learning efficiency phenomenon when simply scaling up video data samples, which, through our probing, can be ascribed to a lack of instruction diversity. Aiming at this issue, we propose a data augmentation method called Sparrow, which synthesizes video-like samples from pure text instruction data. Mixing these synthetic samples with the video data enables a more efficient training scheme. Through comprehensive experiments, we demonstrate that our proposed method achieves performance comparable to or even superior to that of baselines trained with significantly more samples. Meanwhile, we find that incorporating these synthetic samples can enhance the performance of long video understanding without requiring training on long video data. The code and data examples are available at https://github.com/VITA-MLLM/Sparrow.

URLs: https://github.com/VITA-MLLM/Sparrow.

replace-cross R-Bot: An LLM-based Query Rewrite System

Authors: Zhaoyan Sun, Xuanhe Zhou, Guoliang Li, Xiang Yu, Jianhua Feng, Yong Zhang

Abstract: Query rewrite is essential for optimizing SQL queries to improve their execution efficiency without changing their results. Traditionally, this task has been tackled through heuristic and learning-based methods, each with its limitations in terms of inferior quality and low robustness. Recent advancements in LLMs offer a new paradigm by leveraging their superior natural language and code comprehension abilities. Despite their potential, directly applying LLMs like GPT-4 has faced challenges due to problems such as hallucinations, where the model might generate inaccurate or irrelevant results. To address this, we propose R-Bot, an LLM-based query rewrite system with a systematic approach. We first design a multi-source rewrite evidence preparation pipeline to generate query rewrite evidences for guiding LLMs to avoid hallucinations. We then propose a hybrid structure-semantics retrieval method that combines structural and semantic analysis to retrieve the most relevant rewrite evidences for effectively answering an online query. We next propose a step-by-step LLM rewrite method that iteratively leverages the retrieved evidences to select and arrange rewrite rules with self-reflection. We conduct comprehensive experiments on real-world datasets and widely used benchmarks, and demonstrate the superior performance of our system, R-Bot, surpassing state-of-the-art query rewrite methods. The R-Bot system has been deployed at Huawei and with real customers, and the results show that the proposed R-Bot system achieves lower query latency.

replace-cross Progressive-Resolution Policy Distillation: Leveraging Coarse-Resolution Simulations for Time-Efficient Fine-Resolution Policy Learning

Authors: Yuki Kadokawa, Hirotaka Tahara, Takamitsu Matsubara

Abstract: In earthwork and construction, excavators often encounter large rocks mixed with various soil conditions, requiring skilled operators. This paper presents a framework for achieving autonomous excavation using reinforcement learning (RL) through a rock excavation simulator. In the simulation, resolution can be defined by the particle size/number in the whole soil space. Fine-resolution simulations closely mimic real-world behavior but demand significant calculation time and challenging sample collection, while coarse-resolution simulations enable faster sample collection but deviate from real-world behavior. To combine the advantages of both resolutions, we explore using policies developed in coarse-resolution simulations for pre-training in fine-resolution simulations. To this end, we propose a novel policy learning framework called Progressive-Resolution Policy Distillation (PRPD), which progressively transfers policies through some middle-resolution simulations with conservative policy transfer to avoid domain gaps that could lead to policy transfer failure. Validation in a rock excavation simulator and nine real-world rock environments demonstrated that PRPD reduced sampling time to less than 1/7 while maintaining task success rates comparable to those achieved through policy learning in a fine-resolution simulation.

replace-cross FlowEdit: Inversion-Free Text-Based Editing Using Pre-Trained Flow Models

Authors: Vladimir Kulikov, Matan Kleiner, Inbar Huberman-Spiegelglas, Tomer Michaeli

Abstract: Editing real images using a pre-trained text-to-image (T2I) diffusion/flow model often involves inverting the image into its corresponding noise map. However, inversion by itself is typically insufficient for obtaining satisfactory results, and therefore many methods additionally intervene in the sampling process. Such methods achieve improved results but are not seamlessly transferable between model architectures. Here, we introduce FlowEdit, a text-based editing method for pre-trained T2I flow models, which is inversion-free, optimization-free and model agnostic. Our method constructs an ODE that directly maps between the source and target distributions (corresponding to the source and target text prompts) and achieves a lower transport cost than the inversion approach. This leads to state-of-the-art results, as we illustrate with Stable Diffusion 3 and FLUX. Code and examples are available on the project's webpage.

replace-cross RadAlign: Advancing Radiology Report Generation with Vision-Language Concept Alignment

Authors: Difei Gu, Yunhe Gao, Yang Zhou, Mu Zhou, Dimitris Metaxas

Abstract: Automated chest radiographs interpretation requires both accurate disease classification and detailed radiology report generation, presenting a significant challenge in the clinical workflow. Current approaches either focus on classification accuracy at the expense of interpretability or generate detailed but potentially unreliable reports through image captioning techniques. In this study, we present RadAlign, a novel framework that combines the predictive accuracy of vision-language models (VLMs) with the reasoning capabilities of large language models (LLMs). Inspired by the radiologist's workflow, RadAlign first employs a specialized VLM to align visual features with key medical concepts, achieving superior disease classification with an average AUC of 0.885 across multiple diseases. These recognized medical conditions, represented as text-based concepts in the aligned visual-language space, are then used to prompt LLM-based report generation. Enhanced by a retrieval-augmented generation mechanism that grounds outputs in similar historical cases, RadAlign delivers superior report quality with a GREEN score of 0.678, outperforming state-of-the-art methods' 0.634. Our framework maintains strong clinical interpretability while reducing hallucinations, advancing automated medical imaging and report analysis through integrated predictive and generative AI. Code is available at https://github.com/difeigu/RadAlign.

URLs: https://github.com/difeigu/RadAlign.

replace-cross InfiniteHBD: Building Datacenter-Scale High-Bandwidth Domain for LLM with Optical Circuit Switching Transceivers

Authors: Chenchen Shou, Guyue Liu, Hao Nie, Huaiyu Meng, Yu Zhou, Yimin Jiang, Wenqing Lv, Yelong Xu, Yuanwei Lu, Zhang Chen, Yanbo Yu, Yichen Shen, Yibo Zhu, Daxin Jiang

Abstract: Scaling Large Language Model (LLM) training relies on multi-dimensional parallelism, where High-Bandwidth Domains (HBDs) are critical for communication-intensive parallelism like Tensor Parallelism (TP) and Expert Parallelism (EP). However, existing HBD architectures face fundamental limitations in scalability, cost, and fault resiliency: switch-centric HBDs (e.g., NVL-72) incur prohibitive scaling costs, while GPU-centric HBDs (e.g., TPUv3/Dojo) suffer from severe fault propagation. Switch-GPU hybrid HBDs such as TPUv4 take a middle-ground approach, but the fault explosion radius remains large at the cube level (e.g., 64 TPUs). We propose InfiniteHBD, a novel transceiver-centric HBD architecture that unifies connectivity and dynamic switching at the transceiver level} using Optical Circuit Switching (OCS). By embedding OCS within each transceiver, InfiniteHBD achieves reconfigurable point-to-multipoint connectivity, allowing the topology to adapt to variable-size rings. This design provides: i) datacenter-wide scalability without cost explosion; ii) fault resilience by isolating failures to a single node, and iii) full bandwidth utilization for fault-free GPUs. Key innovations include a Silicon Photonic (SiPh)-based low-cost OCS transceiver (OCSTrx), a reconfigurable k-hop ring topology co-designed with intra-/inter-node communication, and an HBD-DCN orchestration algorithm maximizing GPU utilization while minimizing cross-ToR datacenter network traffic. The evaluation demonstrates that InfiniteHBD achieves 31% of the cost of NVL-72, near-zero GPU waste ratio (over one order of magnitude lower than NVL-72 and TPUv4), near-zero cross-ToR traffic when node fault ratios are under 7%, and improves Model FLOPs Utilization by 3.37x compared to NVIDIA DGX (8 GPUs per Node).

replace-cross Universal Model Routing for Efficient LLM Inference

Authors: Wittawat Jitkrittum, Harikrishna Narasimhan, Ankit Singh Rawat, Jeevesh Juneja, Congchao Wang, Zifeng Wang, Alec Go, Chen-Yu Lee, Pradeep Shenoy, Rina Panigrahy, Aditya Krishna Menon, Sanjiv Kumar

Abstract: Model routing is a simple technique for reducing the inference cost of large language models (LLMs), wherein one maintains a pool of candidate LLMs, and learns to route each prompt to the smallest feasible LLM. Existing works focus on learning a router for a fixed pool of LLMs. In this paper, we consider the problem of dynamic routing, where new, previously unobserved LLMs are available at test time. We propose UniRoute, a new approach to this problem that relies on representing each LLM as a feature vector, derived based on predictions on a set of representative prompts. Based on this, we detail two effective instantiations of UniRoute, relying on cluster-based routing and a learned cluster map respectively. We show that these are estimates of a theoretically optimal routing rule, and quantify their errors via an excess risk bound. Experiments on a range of public benchmarks show the effectiveness of UniRoute in routing amongst more than 30 unseen LLMs.

replace-cross Recent Advances in Malware Detection: Graph Learning and Explainability

Authors: Hossein Shokouhinejad, Roozbeh Razavi-Far, Hesamodin Mohammadian, Mahdi Rabbani, Samuel Ansong, Griffin Higgins, Ali A Ghorbani

Abstract: The rapid evolution of malware has necessitated the development of sophisticated detection methods that go beyond traditional signature-based approaches. Graph learning techniques have emerged as powerful tools for modeling and analyzing the complex relationships inherent in malware behavior, leveraging advancements in Graph Neural Networks (GNNs) and related methods. This survey provides a comprehensive exploration of recent advances in malware detection, focusing on the interplay between graph learning and explainability. It begins by reviewing malware analysis techniques and datasets, emphasizing their foundational role in understanding malware behavior and supporting detection strategies. The survey then discusses feature engineering, graph reduction, and graph embedding methods, highlighting their significance in transforming raw data into actionable insights, while ensuring scalability and efficiency. Furthermore, this survey focuses on explainability techniques and their applications in malware detection, ensuring transparency and trustworthiness. By integrating these components, this survey demonstrates how graph learning and explainability contribute to building robust, interpretable, and scalable malware detection systems. Future research directions are outlined to address existing challenges and unlock new opportunities in this critical area of cybersecurity.

replace-cross MPO: An Efficient Post-Processing Framework for Mixing Diverse Preference Alignment

Authors: Tianze Wang, Dongnan Gui, Yifan Hu, Shuhang Lin, Linjun Zhang

Abstract: Reinforcement Learning from Human Feedback (RLHF) has shown promise in aligning large language models (LLMs). Yet its reliance on a singular reward model often overlooks the diversity of human preferences. Recent approaches address this limitation by leveraging multi-dimensional feedback to fine-tune corresponding reward models and train LLMs using reinforcement learning. However, the process is costly and unstable, especially given the competing and heterogeneous nature of human preferences. In this paper, we propose Mixing Preference Optimization (MPO), a post-processing framework for aggregating single-objective policies as an alternative to both multi-objective RLHF (MORLHF) and MaxMin-RLHF. MPO avoids alignment from scratch. Instead, it log-linearly combines existing policies into a unified one with the weight of each policy computed via a batch stochastic mirror descent. Empirical results demonstrate that MPO achieves balanced performance across diverse preferences, outperforming or matching existing models with significantly reduced computational costs.

replace-cross Stable and Accurate Orbital-Free DFT Powered by Machine Learning

Authors: Roman Remme, Tobias Kaczun, Tim Ebert, Christof A. Gehrig, Dominik Geng, Gerrit Gerhartz, Marc K. Ickler, Manuel V. Klockow, Peter Lippmann, Johannes S. Schmidt, Simon Wagner, Andreas Dreuw, Fred A. Hamprecht

Abstract: Hohenberg and Kohn have proven that the electronic energy and the one-particle electron density can, in principle, be obtained by minimizing an energy functional with respect to the density. While decades of theoretical work have produced increasingly faithful approximations to this elusive exact energy functional, their accuracy is still insufficient for many applications, making it reasonable to try and learn it empirically. Using rotationally equivariant atomistic machine learning, we obtain for the first time a density functional that, when applied to the organic molecules in QM9, yields energies with chemical accuracy relative to the Kohn-Sham reference while also converging to meaningful electron densities. Augmenting the training data with densities obtained from perturbed potentials proved key to these advances. This work demonstrates that machine learning can play a crucial role in narrowing the gap between theory and the practical realization of Hohenberg and Kohn's vision, paving the way for more efficient calculations in large molecular systems.

replace-cross Curating Demonstrations using Online Experience

Authors: Annie S. Chen, Alec M. Lessing, Yuejiang Liu, Chelsea Finn

Abstract: Many robot demonstration datasets contain heterogeneous demonstrations of varying quality. This heterogeneity may benefit policy pre-training, but can hinder robot performance when used with a final imitation learning objective. In particular, some strategies in the data may be less reliable than others or may be underrepresented in the data, leading to poor performance when such strategies are sampled at test time. Moreover, such unreliable or underrepresented strategies can be difficult even for people to discern, and sifting through demonstration datasets is time-consuming and costly. On the other hand, policy performance when trained on such demonstrations can reflect the reliability of different strategies. We thus propose for robots to self-curate based on online robot experience (Demo-SCORE). More specifically, we train and cross-validate a classifier to discern successful policy roll-outs from unsuccessful ones and use the classifier to filter heterogeneous demonstration datasets. Our experiments in simulation and the real world show that Demo-SCORE can effectively identify suboptimal demonstrations without manual curation. Notably, Demo-SCORE achieves over 15-35% higher absolute success rate in the resulting policy compared to the base policy trained with all original demonstrations.

replace-cross Balanced Image Stylization with Style Matching Score

Authors: Yuxin Jiang, Liming Jiang, Shuai Yang, Jia-Wei Liu, Ivor Tsang, Mike Zheng Shou

Abstract: We present Style Matching Score (SMS), a novel optimization method for image stylization with diffusion models. Balancing effective style transfer with content preservation is a long-standing challenge. Unlike existing efforts, our method reframes image stylization as a style distribution matching problem. The target style distribution is estimated from off-the-shelf style-dependent LoRAs via carefully designed score functions. To preserve content information adaptively, we propose Progressive Spectrum Regularization, which operates in the frequency domain to guide stylization progressively from low-frequency layouts to high-frequency details. In addition, we devise a Semantic-Aware Gradient Refinement technique that leverages relevance maps derived from diffusion semantic priors to selectively stylize semantically important regions. The proposed optimization formulation extends stylization from pixel space to parameter space, readily applicable to lightweight feedforward generators for efficient one-step stylization. SMS effectively balances style alignment and content preservation, outperforming state-of-the-art approaches, verified by extensive experiments.

replace-cross Antibiotic Resistance Microbiology Dataset (ARMD): A Resource for Antimicrobial Resistance from EHRs

Authors: Fateme Nateghi Haredasht, Fatemeh Amrollahi, Manoj Maddali, Nicholas Marshall, Stephen P. Ma, Lauren N. Cooper, Andrew O. Johnson, Ziming Wei, Richard J. Medford, Sanjat Kanjilal, Niaz Banaei, Stanley Deresinski, Mary K. Goldstein, Steven M. Asch, Amy Chang, Jonathan H. Chen

Abstract: The Antibiotic Resistance Microbiology Dataset (ARMD) is a de-identified resource derived from electronic health records (EHR) that facilitates research in antimicrobial resistance (AMR). ARMD encompasses big data from adult patients collected from over 15 years at two academic-affiliated hospitals, focusing on microbiological cultures, antibiotic susceptibilities, and associated clinical and demographic features. Key attributes include organism identification, susceptibility patterns for 55 antibiotics, implied susceptibility rules, and de-identified patient information. This dataset supports studies on antimicrobial stewardship, causal inference, and clinical decision-making. ARMD is designed to be reusable and interoperable, promoting collaboration and innovation in combating AMR. This paper describes the dataset's acquisition, structure, and utility while detailing its de-identification process.

replace-cross Unsupervised Joint Learning of Optical Flow and Intensity with Event Cameras

Authors: Shuang Guo, Friedhelm Hamann, Guillermo Gallego

Abstract: Event cameras rely on motion to obtain information about scene appearance. This means that appearance and motion are inherently linked: either both are present and recorded in the event data, or neither is captured. Previous works treat the recovery of these two visual quantities as separate tasks, which does not fit with the above-mentioned nature of event cameras and overlooks the inherent relations between them. We propose an unsupervised learning framework that jointly estimates optical flow (motion) and image intensity (appearance) using a single network. From the data generation model, we newly derive the event-based photometric error as a function of optical flow and image intensity. This error is further combined with the contrast maximization framework to form a comprehensive loss function that provides proper constraints for both flow and intensity estimation. Exhaustive experiments show our method's state-of-the-art performance: in optical flow estimation, it reduces EPE by 20% and AE by 25% compared to unsupervised approaches, while delivering competitive intensity estimation results, particularly in high dynamic range scenarios. Our method also achieves shorter inference time than all other optical flow methods and many of the image reconstruction methods, while they output only one quantity. Project page: https://github.com/tub-rip/E2FAI

URLs: https://github.com/tub-rip/E2FAI

replace-cross Graph Neural Network-Based Distributed Optimal Control for Linear Networked Systems: An Online Distributed Training Approach

Authors: Zihao Song, Shirantha Welikala, Panos J. Antsaklis, Hai Lin

Abstract: In this paper, we consider the distributed optimal control problem for discrete-time linear networked systems. In particular, we are interested in learning distributed optimal controllers using graph recurrent neural networks (GRNNs). Most of the existing approaches result in centralized optimal controllers with offline training processes. However, as the increasing demand of network resilience, the optimal controllers are further expected to be distributed, and are desirable to be trained in an online distributed fashion, which are also the main contributions of our work. To solve this problem, we first propose a GRNN-based distributed optimal control method, and we cast the problem as a self-supervised learning problem. Then, the distributed online training is achieved via distributed gradient computation, and inspired by the (consensus-based) distributed optimization idea, a distributed online training optimizer is designed. Furthermore, the local closed-loop stability of the linear networked system under our proposed GRNN-based controller is provided by assuming that the nonlinear activation function of the GRNN-based controller is both local sector-bounded and slope-restricted. The effectiveness of our proposed method is illustrated by numerical simulations using a specifically developed simulator.

replace-cross Spectral Algorithms under Covariate Shift

Authors: Jun Fan, Zheng-Chu Guo, Lei Shi

Abstract: Spectral algorithms leverage spectral regularization techniques to analyze and process data, providing a flexible framework for addressing supervised learning problems. To deepen our understanding of their performance in real-world scenarios where the distributions of training and test data may differ, we conduct a rigorous investigation into the convergence behavior of spectral algorithms under covariate shift. In this setting, the marginal distributions of the input data differ between the training and test datasets, while the conditional distribution of the output given the input remains unchanged. Within a non-parametric regression framework over a reproducing kernel Hilbert space, we analyze the convergence rates of spectral algorithms under covariate shift and show that they achieve minimax optimality when the density ratios between the training and test distributions are uniformly bounded. However, when these density ratios are unbounded, the spectral algorithms may become suboptimal. To address this issue, we propose a novel weighted spectral algorithm with normalized weights that incorporates density ratio information into the learning process. Our theoretical analysis shows that this normalized weighted approach achieves optimal capacity-independent convergence rates, but the rates will suffer from the saturation phenomenon. Furthermore, by introducing a weight clipping technique, we demonstrate that the convergence rates of the weighted spectral algorithm with clipped weights can approach the optimal capacity-dependent convergence rates arbitrarily closely. This improvement resolves the suboptimality issue in unbounded density ratio scenarios and advances the state-of-the-art by refining existing theoretical results.

replace-cross Benchmarking machine learning models for predicting aerofoil performance

Authors: Oliver Summerell, Gerardo Aragon-Camarasa, Stephanie Ordonez Sanchez

Abstract: This paper investigates the capability of Neural Networks (NNs) as alternatives to the traditional methods to analyse the performance of aerofoils used in the wind and tidal energy industry. The current methods used to assess the characteristic lift and drag coefficients include Computational Fluid Dynamics (CFD), thin aerofoil and panel methods, all face trade-offs between computational speed and the accuracy of the results and as such NNs have been investigated as an alternative with the aim that it would perform both quickly and accurately. As such, this paper provides a benchmark for the windAI_bench dataset published by the National Renewable Energy Laboratory (NREL) in the USA. In order to validate the methodology of the benchmarking, the AirfRANSdataset benchmark is used as both a starting point and a point of comparison. This study evaluates four neural networks (MLP, PointNet, GraphSAGE, GUNet) trained on a range of aerofoils at 25 angles of attack (4$^\circ$ to 20$^\circ$) to predict fluid flow and calculate lift coefficients ($C_L$) via the panel method. GraphSAGE and GUNet performed well during the training phase, but underperformed during testing. Accordingly, this paper has identified PointNet and MLP as the two strongest models tested, however whilst the results from MLP are more commonly correct for predicting the behaviour of the fluid, the results from PointNet provide the more accurate results for calculating $C_L$.

replace-cross A Goal-Oriented Reinforcement Learning-Based Path Planning Algorithm for Modular Self-Reconfigurable Satellites

Authors: Bofei Liu, Dong Ye, Zunhao Yao, Zhaowei Sun

Abstract: Modular self-reconfigurable satellites refer to satellite clusters composed of individual modular units capable of altering their configurations. The configuration changes enable the execution of diverse tasks and mission objectives. Existing path planning algorithms for reconfiguration often suffer from high computational complexity, poor generalization capability, and limited support for diverse target configurations. To address these challenges, this paper proposes a goal-oriented reinforcement learning-based path planning algorithm. This algorithm is the first to address the challenge that previous reinforcement learning methods failed to overcome, namely handling multiple target configurations. Moreover, techniques such as Hindsight Experience Replay and Invalid Action Masking are incorporated to overcome the significant obstacles posed by sparse rewards and invalid actions. Based on these designs, our model achieves a 95% and 73% success rate in reaching arbitrary target configurations in a modular satellite cluster composed of four and six units, respectively.

replace-cross Tagging fully hadronic exotic decays of the vectorlike $\mathbf{B}$ quark using a graph neural network

Authors: Jai Bardhan, Tanumoy Mandal, Subhadip Mitra, Cyrin Neeraj, Mihir Rawat

Abstract: Following up on our earlier study in [J. Bardhan et al., Machine learning-enhanced search for a vectorlike singlet B quark decaying to a singlet scalar or pseudoscalar, Phys. Rev. D 107 (2023) 115001; arXiv:2212.02442], we investigate the LHC prospects of pair-produced vectorlike $B$ quarks decaying exotically to a new gauge-singlet (pseudo)scalar field $\Phi$ and a $b$ quark. After the electroweak symmetry breaking, the $\Phi$ decays predominantly to $gg/bb$ final states, leading to a fully hadronic $2b+4j$ or $6b$ signature. Because of the large Standard Model background and the lack of leptonic handles, it is a difficult channel to probe. To overcome the challenge, we employ a hybrid deep learning model containing a graph neural network followed by a deep neural network. We estimate that such a state-of-the-art deep learning analysis pipeline can lead to a performance comparable to that in the semi-leptonic mode, taking the discovery (exclusion) reach up to about $M_B=1.8\:(2.4)$ TeV at HL-LHC when $B$ decays fully exotically, i.e., BR$(B \to b\Phi) = 100\%$.

replace-cross Aitomia: Your Intelligent Assistant for AI-Driven Atomistic and Quantum Chemical Simulations

Authors: Jinming Hu, Hassan Nawaz, Yuting Rui, Lijie Chi, Arif Ullah, Pavlo O. Dral

Abstract: We have developed Aitomia - a platform powered by AI to assist in performing AI-driven atomistic and quantum chemical (QC) simulations. This evolving intelligent assistant platform is equipped with chatbots and AI agents to help experts and guide non-experts in setting up and running atomistic simulations, monitoring their computational status, analyzing simulation results, and summarizing them for the user in both textual and graphical forms. We achieve these goals by exploiting large language models that leverage the versatility of our MLatom ecosystem, supporting AI-enhanced computational chemistry tasks ranging from ground-state to excited-state calculations, including geometry optimizations, thermochemistry, and spectral calculations. The multi-agent implementation enables autonomous executions of the complex computational workflows, such as the computation of the reaction enthalpies. Aitomia is the first intelligent assistant publicly accessible online on a cloud computing platform for atomistic simulations of broad scope (Aitomistic Hub at https://aitomistic.xyz). It may also be deployed locally as described at http://mlatom.com/aitomia. Aitomia is expected to lower the barrier to performing atomistic simulations, thereby democratizing simulations and accelerating research and development in relevant fields.

URLs: https://aitomistic.xyz)., http://mlatom.com/aitomia.

replace-cross Hierarchical Safety Realignment: Lightweight Restoration of Safety in Pruned Large Vision-Language Models

Authors: Yue Li, Xin Yi, Dongsheng Shi, Gerard de Melo, Xiaoling Wang, Linlin Wang

Abstract: With the increasing size of Large Vision-Language Models (LVLMs), network pruning techniques aimed at compressing models for deployment in resource-constrained environments have garnered significant attention. However, we observe that pruning often leads to a degradation in safety performance. To address this issue, we present a novel and lightweight approach, termed Hierarchical Safety Realignment (HSR). HSR operates by first quantifying the contribution of each attention head to safety, identifying the most critical ones, and then selectively restoring neurons directly within these attention heads that play a pivotal role in maintaining safety. This process hierarchically realigns the safety of pruned LVLMs, progressing from the attention head level to the neuron level. We validate HSR across various models and pruning strategies, consistently achieving notable improvements in safety performance. To our knowledge, this is the first work explicitly focused on restoring safety in LVLMs post-pruning.

replace-cross Learning novel representations of variable sources from multi-modal $\textit{Gaia}$ data via autoencoders

Authors: P. Huijse, J. De Ridder, L. Eyer, L. Rimoldini, B. Holl, N. Chornay, J. Roquette, K. Nienartowicz, G. Jevardat de Fombelle, D. J. Fritzewski, A. Kemp, V. Vanlaer, M. Vanrespaille, H. Wang, M. I. Carnerero, C. M. Raiteri, G. Marton, M. Madar\'asz, G. Clementini, P. Gavras, C. Aerts

Abstract: Gaia Data Release 3 (DR3) published for the first time epoch photometry, BP/RP (XP) low-resolution mean spectra, and supervised classification results for millions of variable sources. This extensive dataset offers a unique opportunity to study their variability by combining multiple Gaia data products. In preparation for DR4, we propose and evaluate a machine learning methodology capable of ingesting multiple Gaia data products to achieve an unsupervised classification of stellar and quasar variability. A dataset of 4 million Gaia DR3 sources is used to train three variational autoencoders (VAE), which are artificial neural networks (ANNs) designed for data compression and generation. One VAE is trained on Gaia XP low-resolution spectra, another on a novel approach based on the distribution of magnitude differences in the Gaia G band, and the third on folded Gaia G band light curves. Each Gaia source is compressed into 15 numbers, representing the coordinates in a 15-dimensional latent space generated by combining the outputs of these three models. The learned latent representation produced by the ANN effectively distinguishes between the main variability classes present in Gaia DR3, as demonstrated through both supervised and unsupervised classification analysis of the latent space. The results highlight a strong synergy between light curves and low-resolution spectral data, emphasising the benefits of combining the different Gaia data products. A two-dimensional projection of the latent variables reveals numerous overdensities, most of which strongly correlate with astrophysical properties, showing the potential of this latent space for astrophysical discovery. We show that the properties of our novel latent representation make it highly valuable for variability analysis tasks, including classification, clustering and outlier detection.

replace-cross Autocomp: LLM-Driven Code Optimization for Tensor Accelerators

Authors: Charles Hong, Sahil Bhatia, Alvin Cheung, Yakun Sophia Shao

Abstract: Hardware accelerators, especially those designed for tensor processing, have become ubiquitous in today's computing landscape. However, even with significant efforts in building compilers, programming these tensor accelerators remains challenging, leaving much of their potential underutilized. Recently, large language models (LLMs), trained on large amounts of code, have shown significant promise in code generation and optimization tasks, but generating low-resource languages like specialized tensor accelerator code still poses a significant challenge. We tackle this challenge with Autocomp, an approach that empowers accelerator programmers to leverage domain knowledge and hardware feedback to optimize code via an automated LLM-driven search. We accomplish this by: 1) formulating each optimization pass as a structured two-phase prompt, divided into planning and code generation phases, 2) inserting domain knowledge during planning via a concise and adaptable optimization menu, and 3) integrating correctness and performance metrics from hardware as feedback at each search iteration. Across three categories of representative workloads and two different accelerators, we demonstrate that Autocomp-optimized code runs 5.6x (GEMM) and 2.7x (convolution) faster than the vendor-provided library, and outperforms expert-level hand-tuned code by 1.4x (GEMM), 1.1x (convolution), and 1.3x (fine-grained linear algebra). Additionally, we demonstrate that optimization schedules generated from Autocomp can be reused across similar tensor operations, improving speedups by up to 24% under a fixed sample budget.

replace-cross Audio Geolocation: A Natural Sounds Benchmark

Authors: Mustafa Chasmai, Wuao Liu, Subhransu Maji, Grant Van Horn

Abstract: Can we determine someone's geographic location purely from the sounds they hear? Are acoustic signals enough to localize within a country, state, or even city? We tackle the challenge of global-scale audio geolocation, formalize the problem, and conduct an in-depth analysis with wildlife audio from the iNatSounds dataset. Adopting a vision-inspired approach, we convert audio recordings to spectrograms and benchmark existing image geolocation techniques. We hypothesize that species vocalizations offer strong geolocation cues due to their defined geographic ranges and propose an approach that integrates species range prediction with retrieval-based geolocation. We further evaluate whether geolocation improves when analyzing species-rich recordings or when aggregating across spatiotemporal neighborhoods. Finally, we introduce case studies from movies to explore multimodal geolocation using both audio and visual content. Our work highlights the advantages of integrating audio and visual cues, and sets the stage for future research in audio geolocation.

replace-cross Quantum Cognition Machine Learning for Forecasting Chromosomal Instability

Authors: Giuseppe Di Caro, Vahagn Kirakosyan, Alexander G. Abanov, Jerome R. Busemeyer, Luca Candelori, Nadine Hartmann, Ernest T. Lam, Kharen Musaelian, Ryan Samson, Harold Steinacker, Dario Villani, Martin T. Wells, Richard J. Wenstrup, Mengjia Xu

Abstract: The accurate prediction of chromosomal instability from the morphology of circulating tumor cells (CTCs) enables real-time detection of CTCs with high metastatic potential in the context of liquid biopsy diagnostics. However, it presents a significant challenge due to the high dimensionality and complexity of single-cell digital pathology data. Here, we introduce the application of Quantum Cognition Machine Learning (QCML), a quantum-inspired computational framework, to estimate morphology-predicted chromosomal instability in CTCs from patients with metastatic breast cancer. QCML leverages quantum mechanical principles to represent data as state vectors in a Hilbert space, enabling context-aware feature modeling, dimensionality reduction, and enhanced generalization without requiring curated feature selection. QCML outperforms conventional machine learning methods when tested on out of sample verification CTCs, achieving higher accuracy in identifying predicted large-scale state transitions (pLST) status from CTC-derived morphology features. These preliminary findings support the application of QCML as a novel machine learning tool with superior performance in high-dimensional, low-sample-size biomedical contexts. QCML enables the simulation of cognition-like learning for the identification of biologically meaningful prediction of chromosomal instability from CTC morphology, offering a novel tool for CTC classification in liquid biopsy.

replace-cross Diffusion-Based Electrocardiography Noise Quantification via Anomaly Detection

Authors: Tae-Seong Han, Jae-Wook Heo, Hakseung Kim, Cheol-Hui Lee, Hyub Huh, Eue-Keun Choi, Hye Jin Kim, Dong-Joo Kim

Abstract: Electrocardiography (ECG) signals are frequently degraded by noise, limiting their clinical reliability in both conventional and wearable settings. Existing methods for addressing ECG noise, relying on artifact classification or denoising, are constrained by annotation inconsistencies and poor generalizability. Here, we address these limitations by reframing ECG noise quantification as an anomaly detection task. We propose a diffusion-based framework trained to model the normative distribution of clean ECG signals, identifying deviations as noise without requiring explicit artifact labels. To robustly evaluate performance and mitigate label inconsistencies, we introduce a distribution-based metric using the Wasserstein-1 distance ($W_1$). Our model achieved a macro-average $W_1$ score of 1.308, outperforming the next-best method by over 48\%. External validation confirmed strong generalizability, facilitating the exclusion of noisy segments to improve diagnostic accuracy and support timely clinical intervention. This approach enhances real-time ECG monitoring and broadens ECG applicability in digital health technologies.

replace-cross Hierarchical Reasoning Model

Authors: Guan Wang, Jin Li, Yuhao Sun, Xing Chen, Changling Liu, Yue Wu, Meng Lu, Sen Song, Yasin Abbasi Yadkori

Abstract: Reasoning, the process of devising and executing complex goal-oriented action sequences, remains a critical challenge in AI. Current large language models (LLMs) primarily employ Chain-of-Thought (CoT) techniques, which suffer from brittle task decomposition, extensive data requirements, and high latency. Inspired by the hierarchical and multi-timescale processing in the human brain, we propose the Hierarchical Reasoning Model (HRM), a novel recurrent architecture that attains significant computational depth while maintaining both training stability and efficiency. HRM executes sequential reasoning tasks in a single forward pass without explicit supervision of the intermediate process, through two interdependent recurrent modules: a high-level module responsible for slow, abstract planning, and a low-level module handling rapid, detailed computations. With only 27 million parameters, HRM achieves exceptional performance on complex reasoning tasks using only 1000 training samples. The model operates without pre-training or CoT data, yet achieves nearly perfect performance on challenging tasks including complex Sudoku puzzles and optimal path finding in large mazes. Furthermore, HRM outperforms much larger models with significantly longer context windows on the Abstraction and Reasoning Corpus (ARC), a key benchmark for measuring artificial general intelligence capabilities. These results underscore HRM's potential as a transformative advancement toward universal computation and general-purpose reasoning systems.

replace-cross The Joys of Categorical Conformal Prediction

Authors: Michele Caprio

Abstract: Conformal prediction (CP) is an Uncertainty Representation technique that delivers finite-sample calibrated prediction regions for any underlying Machine Learning model. Its status as an Uncertainty Quantification (UQ) tool, though, has remained conceptually opaque: While Conformal Prediction Regions (CPRs) give an ordinal representation of uncertainty (larger regions typically indicate higher uncertainty), they lack the capability to cardinally quantify it (twice as large regions do not imply twice the uncertainty). We adopt a category-theoretic approach to CP -- framing it as a morphism, embedded in a commuting diagram, of two newly-defined categories -- that brings us three joys. First, we show that -- under minimal assumptions -- CP is intrinsically a UQ mechanism, that is, its cardinal UQ capabilities are a structural feature of the method. Second, we demonstrate that CP bridges (and perhaps subsumes) the Bayesian, frequentist, and imprecise probabilistic approaches to predictive statistical reasoning. Finally, we show that a CPR is the image of a covariant functor. This observation is relevant to AI privacy: It implies that privacy noise added locally does not break the global coverage guarantee.

replace-cross Omni-Router: Sharing Routing Decisions in Sparse Mixture-of-Experts for Speech Recognition

Authors: Zijin Gu, Tatiana Likhomanenko, Navdeep Jaitly

Abstract: Mixture-of-experts (MoE) architectures have expanded from language modeling to automatic speech recognition (ASR). Traditional MoE methods, such as the Switch Transformer, route experts independently within each layer. Our analysis reveals that routers in most layers make expert choices that are not strongly correlated with the choices of the routers in other layers. To increase the cooperation between experts in different layers and encourage greater specialization, we use a shared router across different MoE layers. We call this model Omni-router Transformer. Extensive experiments on a large-scale pseudo-labeled dataset and evaluations across 10 diverse, out-of-domain ASR benchmarks demonstrate that the Omni-router Transformer is able to achieve lower training loss and consistently outperform dense and Switch Transformer models, reducing average word error rates by 11.2% and 8.2%, respectively, while providing structured expert usage and improved robustness to diverse data.

replace-cross Adaptive Gaussian Mixture Models-based Anomaly Detection for under-constrained Cable-Driven Parallel Robots

Authors: Julio Garrido, Javier Vales, Diego Silva-Mu\~niz, Enrique Riveiro, Pablo L\'opez-Matencio, Josu\'e Rivera-Andrade

Abstract: Cable-Driven Parallel Robots (CDPRs) are increasingly used for load manipulation tasks involving predefined toolpaths with intermediate stops. At each stop, where the platform maintains a fixed pose and the motors keep the cables under tension, the system must evaluate whether it is safe to proceed by detecting anomalies that could compromise performance (e.g., wind gusts or cable impacts). This paper investigates whether anomalies can be detected using only motor torque data, without additional sensors. It introduces an adaptive, unsupervised outlier detection algorithm based on Gaussian Mixture Models (GMMs) to identify anomalies from torque signals. The method starts with a brief calibration period, just a few seconds, during which a GMM is fit on known anomaly-free data. Real-time torque measurements are then evaluated using Mahalanobis distance from the GMM, with statistically derived thresholds triggering anomaly flags. Model parameters are periodically updated using the latest segments identified as anomaly-free to adapt to changing conditions. Validation includes 14 long-duration test sessions simulating varied wind intensities. The proposed method achieves a 100% true positive rate and 95.4% average true negative rate, with 1-second detection latency. Comparative evaluation against power threshold and non-adaptive GMM methods indicates higher robustness to drift and environmental variation.

replace-cross A Survey of Deep Learning for Geometry Problem Solving

Authors: Jianzhe Ma, Wenxuan Wang, Qin Jin

Abstract: Geometry problem solving is a key area of mathematical reasoning, which is widely involved in many important fields such as education, mathematical ability assessment of artificial intelligence, and multimodal ability assessment. In recent years, the rapid development of deep learning technology, especially the rise of multimodal large language models, has triggered a widespread research boom. This paper provides a survey of the applications of deep learning in geometry problem solving, including (i) a comprehensive summary of the relevant tasks in geometry problem solving; (ii) a thorough review of related deep learning methods; (iii) a detailed analysis of evaluation metrics and methods; and (iv) a critical discussion of the current challenges and future directions that can be explored. Our goal is to provide a comprehensive and practical reference of deep learning for geometry problem solving to promote further developments in this field. We create a continuously updated list of papers on GitHub: https://github.com/majianz/dl4gps.

URLs: https://github.com/majianz/dl4gps.

replace-cross Multimodal Coordinated Online Behavior: Trade-offs and Strategies

Authors: Lorenzo Mannocci, Stefano Cresci, Matteo Magnani, Anna Monreale, Maurizio Tesconi

Abstract: Coordinated online behavior, which spans from beneficial collective actions to harmful manipulation such as disinformation campaigns, has become a key focus in digital ecosystem analysis. Traditional methods often rely on monomodal approaches, focusing on single types of interactions like co-retweets or co-hashtags, or consider multiple modalities independently of each other. However, these approaches may overlook the complex dynamics inherent in multimodal coordination. This study compares different ways of operationalizing the detection of multimodal coordinated behavior. It examines the trade-off between weakly and strongly integrated multimodal models, highlighting the balance between capturing broader coordination patterns and identifying tightly coordinated behavior. By comparing monomodal and multimodal approaches, we assess the unique contributions of different data modalities and explore how varying implementations of multimodality impact detection outcomes. Our findings reveal that not all the modalities provide distinct insights, but that with a multimodal approach we can get a more comprehensive understanding of coordination dynamics. This work enhances the ability to detect and analyze coordinated online behavior, offering new perspectives for safeguarding the integrity of digital platforms.

replace-cross Assessing Adaptive World Models in Machines with Novel Games

Authors: Lance Ying, Katherine M. Collins, Prafull Sharma, Cedric Colas, Kaiya Ivy Zhao, Adrian Weller, Zenna Tavares, Phillip Isola, Samuel J. Gershman, Jacob D. Andreas, Thomas L. Griffiths, Francois Chollet, Kelsey R. Allen, Joshua B. Tenenbaum

Abstract: Human intelligence exhibits a remarkable capacity for rapid adaptation and effective problem-solving in novel and unfamiliar contexts. We argue that this profound adaptability is fundamentally linked to the efficient construction and refinement of internal representations of the environment, commonly referred to as world models, and we refer to this adaptation mechanism as world model induction. However, current understanding and evaluation of world models in artificial intelligence (AI) remains narrow, often focusing on static representations learned from training on massive corpora of data, instead of the efficiency and efficacy in learning these representations through interaction and exploration within a novel environment. In this Perspective, we provide a view of world model induction drawing on decades of research in cognitive science on how humans learn and adapt so efficiently; we then call for a new evaluation framework for assessing adaptive world models in AI. Concretely, we propose a new benchmarking paradigm based on suites of carefully designed games with genuine, deep and continually refreshing novelty in the underlying game structures -- we refer to this class of games as novel games. We detail key desiderata for constructing these games and propose appropriate metrics to explicitly challenge and evaluate the agent's ability for rapid world model induction. We hope that this new evaluation framework will inspire future evaluation efforts on world models in AI and provide a crucial step towards developing AI systems capable of human-like rapid adaptation and robust generalization -- a critical component of artificial general intelligence.

replace-cross A Collaborative Framework Integrating Large Language Model and Chemical Fragment Space: Mutual Inspiration for Lead Design

Authors: Hao Tuo, Yan Li, Xuanning Hu, Haishi Zhao, Xueyan Liu, Bo Yang

Abstract: Combinatorial optimization algorithm is essential in computer-aided drug design by progressively exploring chemical space to design lead compounds with high affinity to target protein. However current methods face inherent challenges in integrating domain knowledge, limiting their performance in identifying lead compounds with novel and valid binding mode. Here, we propose AutoLeadDesign, a lead compounds design framework that inspires extensive domain knowledge encoded in large language models with chemical fragments to progressively implement efficient exploration of vast chemical space. The comprehensive experiments indicate that AutoLeadDesign outperforms baseline methods. Significantly, empirical lead design campaigns targeting two clinically relevant targets (PRMT5 and SARS-CoV-2 PLpro) demonstrate AutoLeadDesign's competence in de novo generation of lead compounds achieving expert-competitive design efficacy. Structural analysis further confirms their mechanism-validated inhibitory patterns. By tracing the process of design, we find that AutoLeadDesign shares analogous mechanisms with fragment-based drug design which traditionally rely on the expert decision-making, further revealing why it works. Overall, AutoLeadDesign offers an efficient approach for lead compounds design, suggesting its potential utility in drug design.

replace-cross CUDA-L1: Improving CUDA Optimization via Contrastive Reinforcement Learning

Authors: Xiaoya Li, Xiaofei Sun, Albert Wang, Jiwei Li, Chris Shum

Abstract: The exponential growth in demand for GPU computing resources, driven by the rapid advancement of Large Language Models, has created an urgent need for automated CUDA optimization strategies. While recent advances in LLMs show promise for code generation, current SOTA models (e.g. R1, o1) achieve low success rates in improving CUDA speed. In this paper, we introduce CUDA-L1, an automated reinforcement learning framework for CUDA optimization. CUDA-L1 achieves performance improvements on the CUDA optimization task: trained on NVIDIA A100, it delivers an average speedup of x17.7 across all 250 CUDA kernels of KernelBench, with peak speedups reaching x449. Furthermore, the model also demonstrates excellent portability across GPU architectures, achieving average speedups of x17.8 on H100, x19.0 on RTX 3090, x16.5 on L40, x14.7 on H800, and x13.9 on H20 despite being optimized specifically for A100. Beyond these benchmark results, CUDA-L1 demonstrates several remarkable properties: 1) Discovers a variety of CUDA optimization techniques and learns to combine them strategically to achieve optimal performance; 2) Uncovers fundamental principles of CUDA optimization; 3) Identifies non-obvious performance bottlenecks and rejects seemingly beneficial optimizations that harm performance. The capabilities of CUDA-L1 demonstrate that reinforcement learning can transform an initially poor-performing LLM into an effective CUDA optimizer through speedup-based reward signals alone, without human expertise or domain knowledge. More importantly, the trained RL model extend the acquired reasoning abilities to new kernels. This paradigm opens possibilities for automated optimization of CUDA operations, and holds promise to substantially promote GPU efficiency and alleviate the rising pressure on GPU computing resources.

replace-cross Attention-Based Fusion of IQ and FFT Spectrograms with AoA Features for GNSS Jammer Localization

Authors: Lucas Heublein, Christian Wielenberg, Thorsten Nowak, Tobias Feigl, Christopher Mutschler, Felix Ott

Abstract: Jamming devices disrupt signals from the global navigation satellite system (GNSS) and pose a significant threat by compromising the reliability of accurate positioning. Consequently, the detection and localization of these interference signals are essential to achieve situational awareness, mitigating their impact, and implementing effective counter-measures. Classical Angle of Arrival (AoA) methods exhibit reduced accuracy in multipath environments due to signal reflections and scattering, leading to localization errors. Additionally, AoA-based techniques demand substantial computational resources for array signal processing. In this paper, we propose a novel approach for detecting and classifying interference while estimating the distance, azimuth, and elevation of jamming sources. Our benchmark study evaluates 128 vision encoder and time-series models to identify the highest-performing methods for each task. We introduce an attention-based fusion framework that integrates in-phase and quadrature (IQ) samples with Fast Fourier Transform (FFT)-computed spectrograms while incorporating 22 AoA features to enhance localization accuracy. Furthermore, we present a novel dataset of moving jamming devices recorded in an indoor environment with dynamic multipath conditions and demonstrate superior performance compared to state-of-the-art methods.

replace-cross NeuroHD-RA: Neural-distilled Hyperdimensional Model with Rhythm Alignment

Authors: ZhengXiao He, Jinghao Wen, Huayu Li, Siyuan Tian, Ao Li

Abstract: We present a novel and interpretable framework for electrocardiogram (ECG)-based disease detection that combines hyperdimensional computing (HDC) with learnable neural encoding. Unlike conventional HDC approaches that rely on static, random projections, our method introduces a rhythm-aware and trainable encoding pipeline based on RR intervals, a physiological signal segmentation strategy that aligns with cardiac cycles. The core of our design is a neural-distilled HDC architecture, featuring a learnable RR-block encoder and a BinaryLinear hyperdimensional projection layer, optimized jointly with cross-entropy and proxy-based metric loss. This hybrid framework preserves the symbolic interpretability of HDC while enabling task-adaptive representation learning. Experiments on Apnea-ECG and PTB-XL demonstrate that our model significantly outperforms traditional HDC and classical ML baselines, achieving 73.09\% precision and an F1 score of 0.626 on Apnea-ECG, with comparable robustness on PTB-XL. Our framework offers an efficient and scalable solution for edge-compatible ECG classification, with strong potential for interpretable and personalized health monitoring.

replace-cross On exploration of an interior mirror descent flow for stochastic nonconvex constrained problem

Authors: Kuangyu Ding, Kim-Chuan Toh

Abstract: We study a nonsmooth nonconvex optimization problem defined over nonconvex constraints, where the feasible set is given by the intersection of the closure of an open set and a smooth manifold. By endowing the open set with a Riemannian metric induced by a barrier function, we obtain a Riemannian subgradient flow formulated as a differential inclusion, which remains strictly within the interior of the feasible set. This continuous dynamical system unifies two classes of iterative optimization methods, namely the Hessian barrier method and mirror descent scheme, by revealing that these methods can be interpreted as discrete approximations of the continuous flow. We explore the long-term behavior of the trajectories generated by this dynamical system and show that the existing deficient convergence properties of the Hessian barrier and mirror descent scheme can be unifily and more insightfully interpreted through these of the continuous trajectory. For instance, the notorious spurious stationary points \cite{chen2024spurious} observed in Hessian barrier method and mirror descent scheme are interpreted as stable equilibria of the dynamical system that do not correspond to real stationary points of the original optimization problem. We provide two sufficient condition such that these spurious stationary points can be avoided if the strict complementarity conditions holds. In the absence of these regularity condition, we propose a random perturbation strategy that ensures the trajectory converges (subsequentially) to an approximate stationary point. Building on these insights, we introduce two iterative Riemannian subgradient methods, form of interior point methods, that generalizes the existing Hessian barrier method and mirror descent scheme for solving nonsmooth nonconvex optimization problems.

replace-cross Privacy-Preserving Multimodal News Recommendation through Federated Learning

Authors: Mehdi Khalaj, Shahrzad Golestani Najafabadi, Julita Vassileva

Abstract: Personalized News Recommendation systems (PNR) have emerged as a solution to information overload by predicting and suggesting news items tailored to individual user interests. However, traditional PNR systems face several challenges, including an overreliance on textual content, common neglect of short-term user interests, and significant privacy concerns due to centralized data storage. This paper addresses these issues by introducing a novel multimodal federated learning-based approach for news recommendation. First, it integrates both textual and visual features of news items using a multimodal model, enabling a more comprehensive representation of content. Second, it employs a time-aware model that balances users' long-term and short-term interests through multi-head self-attention networks, improving recommendation accuracy. Finally, to enhance privacy, a federated learning framework is implemented, enabling collaborative model training without sharing user data. The framework divides the recommendation model into a large server-maintained news model and a lightweight user model shared between the server and clients. The client requests news representations (vectors) and a user model from the central server, then computes gradients with user local data, and finally sends their locally computed gradients to the server for aggregation. The central server aggregates gradients to update the global user model and news model. The updated news model is further used to infer news representation by the server. To further safeguard user privacy, a secure aggregation algorithm based on Shamir's secret sharing is employed. Experiments on a real-world news dataset demonstrate strong performance compared to existing systems, representing a significant advancement in privacy-preserving personalized news recommendation.

replace-cross Supernova: Achieving More with Less in Transformer Architectures

Authors: Andrei-Valentin Tanase, Elena Pelican

Abstract: We present Supernova, a 650M-parameter decoder-only transformer that demonstrates how careful architectural design and tokenization innovation can achieve the performance of larger models while maintaining computational efficiency. Our architecture combines Rotary Positional Embeddings (RoPE), Grouped Query Attention (GQA) with a 3:1 compression ratio, RMSNorm for computational efficiency, and SwiGLU activation functions. A critical innovation is our custom 128,000-vocabulary byte-level BPE tokenizer, which achieves state-of-the-art compression performance. Through detailed analysis, we show that Supernova achieves 90% of the performance of 1B-parameter models while using 35% fewer parameters and requiring only 100B training tokens--an order of magnitude less than competing models. Our findings challenge the prevailing scaling paradigm, demonstrating that architectural efficiency and tokenization quality can compensate for reduced parameter counts.