new Evaluating Anomaly Detectors for Simulated Highly Imbalanced Industrial Classification Problems

Authors: Lesley Wheat, Martin v. Mohrenschildt, Saeid Habibi

Abstract: Machine learning offers potential solutions to current issues in industrial systems in areas such as quality control and predictive maintenance, but also faces unique barriers in industrial applications. An ongoing challenge is extreme class imbalance, primarily due to the limited availability of faulty data during training. This paper presents a comprehensive evaluation of anomaly detection algorithms using a problem-agnostic simulated dataset that reflects real-world engineering constraints. Using a synthetic dataset with a hyper-spherical based anomaly distribution in 2D and 10D, we benchmark 14 detectors across training datasets with anomaly rates between 0.05% and 20% and training sizes between 1 000 and 10 000 (with a testing dataset size of 40 000) to assess performance and generalization error. Our findings reveal that the best detector is highly dependant on the total number of faulty examples in the training dataset, with additional healthy examples offering insignificant benefits in most cases. With less than 20 faulty examples, unsupervised methods (kNN/LOF) dominate; but around 30-50 faulty examples, semi-supervised (XGBOD) and supervised (SVM/CatBoost) detectors, we see large performance increases. While semi-supervised methods do not show significant benefits with only two features, the improvements are evident at ten features. The study highlights the performance drop on generalization of anomaly detection methods on smaller datasets, and provides practical insights for deploying anomaly detection in industrial environments.

new Yahtzee: Reinforcement Learning Techniques for Stochastic Combinatorial Games

Authors: Nicholas A. Pape

Abstract: Yahtzee is a classic dice game with a stochastic, combinatorial structure and delayed rewards, making it an interesting mid-scale RL benchmark. While an optimal policy for solitaire Yahtzee can be computed using dynamic programming methods, multiplayer is intractable, motivating approximation methods. We formulate Yahtzee as a Markov Decision Process (MDP), and train self-play agents using various policy gradient methods: REINFORCE, Advantage Actor-Critic (A2C), and Proximal Policy Optimization (PPO), all using a multi-headed network with a shared trunk. We ablate feature and action encodings, architecture, return estimators, and entropy regularization to understand their impact on learning. Under a fixed training budget, REINFORCE and PPO prove sensitive to hyperparameters and fail to reach near-optimal performance, whereas A2C trains robustly across a range of settings. Our agent attains a median score of 241.78 points over 100,000 evaluation games, within 5.0\% of the optimal DP score of 254.59, achieving the upper section bonus and Yahtzee at rates of 24.9\% and 34.1\%, respectively. All models struggle to learn the upper bonus strategy, overindexing on four-of-a-kind's, highlighting persistent long-horizon credit-assignment and exploration challenges.

new The Trojan in the Vocabulary: Stealthy Sabotage of LLM Composition

Authors: Xiaoze Liu, Weichen Yu, Matt Fredrikson, Xiaoqian Wang, Jing Gao

Abstract: The open-weight LLM ecosystem is increasingly defined by model composition techniques (such as weight merging, speculative decoding, and vocabulary expansion) that remix capabilities from diverse sources. A critical prerequisite for applying these methods across different model families is tokenizer transplant, which aligns incompatible vocabularies to a shared embedding space. We demonstrate that this essential interoperability step introduces a supply-chain vulnerability: we engineer a single "breaker token" that is functionally inert in a donor model yet reliably reconstructs into a high-salience malicious feature after transplant into a base model. By exploiting the geometry of coefficient reuse, our attack creates an asymmetric realizability gap that sabotages the base model's generation while leaving the donor's utility statistically indistinguishable from nominal behavior. We formalize this as a dual-objective optimization problem and instantiate the attack using a sparse solver. Empirically, the attack is training-free and achieves spectral mimicry to evade outlier detection, while demonstrating structural persistence against fine-tuning and weight merging, highlighting a hidden risk in the pipeline of modular AI composition. Code is available at https://github.com/xz-liu/tokenforge

URLs: https://github.com/xz-liu/tokenforge

new IMBWatch -- a Spatio-Temporal Graph Neural Network approach to detect Illicit Massage Business

Authors: Swetha Varadarajan, Abhishek Ray, Lumina Albert

Abstract: Illicit Massage Businesses (IMBs) are a covert and persistent form of organized exploitation that operate under the facade of legitimate wellness services while facilitating human trafficking, sexual exploitation, and coerced labor. Detecting IMBs is difficult due to encoded digital advertisements, frequent changes in personnel and locations, and the reuse of shared infrastructure such as phone numbers and addresses. Traditional approaches, including community tips and regulatory inspections, are largely reactive and ineffective at revealing the broader operational networks traffickers rely on. To address these challenges, we introduce IMBWatch, a spatio-temporal graph neural network (ST-GNN) framework for large-scale IMB detection. IMBWatch constructs dynamic graphs from open-source intelligence, including scraped online advertisements, business license records, and crowdsourced reviews. Nodes represent heterogeneous entities such as businesses, aliases, phone numbers, and locations, while edges capture spatio-temporal and relational patterns, including co-location, repeated phone usage, and synchronized advertising. The framework combines graph convolutional operations with temporal attention mechanisms to model the evolution of IMB networks over time and space, capturing patterns such as intercity worker movement, burner phone rotation, and coordinated advertising surges. Experiments on real-world datasets from multiple U.S. cities show that IMBWatch outperforms baseline models, achieving higher accuracy and F1 scores. Beyond performance gains, IMBWatch offers improved interpretability, providing actionable insights to support proactive and targeted interventions. The framework is scalable, adaptable to other illicit domains, and released with anonymized data and open-source code to support reproducible research.

new Exploration in the Limit

Authors: Brian M. Cho, Nathan Kallus

Abstract: In fixed-confidence best arm identification (BAI), the objective is to quickly identify the optimal option while controlling the probability of error below a desired threshold. Despite the plethora of BAI algorithms, existing methods typically fall short in practical settings, as stringent exact error control requires using loose tail inequalities and/or parametric restrictions. To overcome these limitations, we introduce a relaxed formulation that requires valid error control asymptotically with respect to a minimum sample size. This aligns with many real-world settings that often involve weak signals, high desired significance, and post-experiment inference requirements, all of which necessitate long horizons. This allows us to achieve tighter optimality, while better handling flexible nonparametric outcome distributions and fully leveraging individual-level contexts. We develop a novel asymptotic anytime-valid confidence sequences over arm indices, and we use it to design a new BAI algorithm for our asymptotic framework. Our method flexibly incorporates covariates for variance reduction and ensures approximate error control in fully nonparametric settings. Under mild convergence assumptions, we provide asymptotic bounds on the sample complexity and show the worst-case sample complexity of our approach matches the best-case sample complexity of Gaussian BAI under exact error guarantees and known variances. Experiments suggest our approach reduces average sample complexities while maintaining error control.

new Dynamic Bayesian Optimization Framework for Instruction Tuning in Partial Differential Equation Discovery

Authors: Junqi Qu, Yan Zhang, Shangqian Gao, Shibo Li

Abstract: Large Language Models (LLMs) show promise for equation discovery, yet their outputs are highly sensitive to prompt phrasing, a phenomenon we term instruction brittleness. Static prompts cannot adapt to the evolving state of a multi-step generation process, causing models to plateau at suboptimal solutions. To address this, we propose NeuroSymBO, which reframes prompt engineering as a sequential decision problem. Our method maintains a discrete library of reasoning strategies and uses Bayesian Optimization to select the optimal instruction at each step based on numerical feedback. Experiments on PDE discovery benchmarks show that adaptive instruction selection significantly outperforms fixed prompts, achieving higher recovery rates with more parsimonious solutions.

new GRL-SNAM: Geometric Reinforcement Learning with Path Differential Hamiltonians for Simultaneous Navigation and Mapping in Unknown Environments

Authors: Aditya Sai Ellendula, Yi Wang, Minh Nguyen, Chandrajit Bajaj

Abstract: We present GRL-SNAM, a geometric reinforcement learning framework for Simultaneous Navigation and Mapping(SNAM) in unknown environments. A SNAM problem is challenging as it needs to design hierarchical or joint policies of multiple agents that control the movement of a real-life robot towards the goal in mapless environment, i.e. an environment where the map of the environment is not available apriori, and needs to be acquired through sensors. The sensors are invoked from the path learner, i.e. navigator, through active query responses to sensory agents, and along the motion path. GRL-SNAM differs from preemptive navigation algorithms and other reinforcement learning methods by relying exclusively on local sensory observations without constructing a global map. Our approach formulates path navigation and mapping as a dynamic shortest path search and discovery process using controlled Hamiltonian optimization: sensory inputs are translated into local energy landscapes that encode reachability, obstacle barriers, and deformation constraints, while policies for sensing, planning, and reconfiguration evolve stagewise via updating Hamiltonians. A reduced Hamiltonian serves as an adaptive score function, updating kinetic/potential terms, embedding barrier constraints, and continuously refining trajectories as new local information arrives. We evaluate GRL-SNAM on two different 2D navigation tasks. Comparing against local reactive baselines and global policy learning references under identical stagewise sensing constraints, it preserves clearance, generalizes to unseen layouts, and demonstrates that Geometric RL learning via updating Hamiltonians enables high-quality navigation through minimal exploration via local energy refinement rather than extensive global mapping. The code is publicly available on \href{https://github.com/CVC-Lab/GRL-SNAM}{Github}.

URLs: https://github.com/CVC-Lab/GRL-SNAM

new Reinforcement Learning with Function Approximation for Non-Markov Processes

Authors: Ali Devran Kara

Abstract: We study reinforcement learning methods with linear function approximation under non-Markov state and cost processes. We first consider the policy evaluation method and show that the algorithm converges under suitable ergodicity conditions on the underlying non-Markov processes. Furthermore, we show that the limit corresponds to the fixed point of a joint operator composed of an orthogonal projection and the Bellman operator of an auxiliary \emph{Markov} decision process. For Q-learning with linear function approximation, as in the Markov setting, convergence is not guaranteed in general. We show, however, that for the special case where the basis functions are chosen based on quantization maps, the convergence can be shown under similar ergodicity conditions. Finally, we apply our results to partially observed Markov decision processes, where finite-memory variables are used as state representations, and we derive explicit error bounds for the limits of the resulting learning algorithms.

new The Weather Paradox: Why Precipitation Fails to Predict Traffic Accident Severity in Large-Scale US Data

Authors: Yann Bellec, Rohan Kaman, Siwen Cui, Aarav Agrawal, Calvin Chen

Abstract: This study investigates the predictive capacity of environmental, temporal, and spatial factors on traffic accident severity in the United States. Using a dataset of 500,000 U.S. traffic accidents spanning 2016-2023, we trained an XGBoost classifier optimized through randomized search cross-validation and adjusted for class imbalance via class weighting. The final model achieves an overall accuracy of 78%, with strong performance on the majority class (Severity 2), attaining 87% precision and recall. Feature importance analysis reveals that time of day, geographic location, and weather-related variables, including visibility, temperature, and wind speed, rank among the strongest predictors of accident severity. However, contrary to initial hypotheses, precipitation and visibility demonstrate limited predictive power, potentially reflecting behavioral adaptation by drivers under overtly hazardous conditions. The dataset's predominance of mid-level severity accidents constrains the model's capacity to learn meaningful patterns for extreme cases, highlighting the need for alternative sampling strategies, enhanced feature engineering, and integration of external datasets. These findings contribute to evidence-based traffic management and suggest future directions for severity prediction research.

new Online Finetuning Decision Transformers with Pure RL Gradients

Authors: Junkai Luo, Yinglun Zhu

Abstract: Decision Transformers (DTs) have emerged as a powerful framework for sequential decision making by formulating offline reinforcement learning (RL) as a sequence modeling problem. However, extending DTs to online settings with pure RL gradients remains largely unexplored, as existing approaches continue to rely heavily on supervised sequence-modeling objectives during online finetuning. We identify hindsight return relabeling -- a standard component in online DTs -- as a critical obstacle to RL-based finetuning: while beneficial for supervised learning, it is fundamentally incompatible with importance sampling-based RL algorithms such as GRPO, leading to unstable training. Building on this insight, we propose new algorithms that enable online finetuning of Decision Transformers using pure reinforcement learning gradients. We adapt GRPO to DTs and introduce several key modifications, including sub-trajectory optimization for improved credit assignment, sequence-level likelihood objectives for enhanced stability and efficiency, and active sampling to encourage exploration in uncertain regions. Through extensive experiments, we demonstrate that our methods outperform existing online DT baselines and achieve new state-of-the-art performance across multiple benchmarks, highlighting the effectiveness of pure-RL-based online finetuning for Decision Transformers.

new Sequential Reservoir Computing for Efficient High-Dimensional Spatiotemporal Forecasting

Authors: Ata Akbari Asanjan, Filip Wudarski, Daniel O'Connor, Shaun Geaney, Elena Strbac, P. Aaron Lott, Davide Venturelli

Abstract: Forecasting high-dimensional spatiotemporal systems remains computationally challenging for recurrent neural networks (RNNs) and long short-term memory (LSTM) models due to gradient-based training and memory bottlenecks. Reservoir Computing (RC) mitigates these challenges by replacing backpropagation with fixed recurrent layers and a convex readout optimization, yet conventional RC architectures still scale poorly with input dimensionality. We introduce a Sequential Reservoir Computing (Sequential RC) architecture that decomposes a large reservoir into a series of smaller, interconnected reservoirs. This design reduces memory and computational costs while preserving long-term temporal dependencies. Using both low-dimensional chaotic systems (Lorenz63) and high-dimensional physical simulations (2D vorticity and shallow-water equations), Sequential RC achieves 15-25% longer valid forecast horizons, 20-30% lower error metrics (SSIM, RMSE), and up to three orders of magnitude lower training cost compared to LSTM and standard RNN baselines. The results demonstrate that Sequential RC maintains the simplicity and efficiency of conventional RC while achieving superior scalability for high-dimensional dynamical systems. This approach provides a practical path toward real-time, energy-efficient forecasting in scientific and engineering applications.

new Early Prediction of Liver Cirrhosis Up to Three Years in Advance: A Machine Learning Study Benchmarking Against the FIB-4 Score

Authors: Zhuqi Miao, Sujan Ravi, Abdulaziz Ahmed

Abstract: Objective: Develop and evaluate machine learning (ML) models for predicting incident liver cirrhosis one, two, and three years prior to diagnosis using routinely collected electronic health record (EHR) data, and to benchmark their performance against the FIB-4 score. Methods: We conducted a retrospective cohort study using de-identified EHR data from a large academic health system. Patients with fatty liver disease were identified and categorized into cirrhosis and non-cirrhosis cohorts based on ICD-9/10 codes. Prediction scenarios were constructed using observation and prediction windows to emulate real-world clinical use. Demographics, diagnoses, laboratory results, vital signs, and comorbidity indices were aggregated from the observation window. XGBoost models were trained for 1-, 2-, and 3-year prediction horizons and evaluated on held-out test sets. Model performance was compared with FIB-4 using area under the receiver operating characteristic curve (AUC). Results: Final cohorts included 3,043 patients for the 1-year prediction, 1,981 for the 2-year prediction, and 1,470 for the 3-year prediction. Across all prediction windows, ML models consistently outperformed FIB-4. The XGBoost models achieved AUCs of 0.81, 0.73, and 0.69 for 1-, 2-, and 3-year predictions, respectively, compared with 0.71, 0.63, and 0.57 for FIB-4. Performance gains persisted with longer prediction horizons, indicating improved early risk discrimination. Conclusions: Machine learning models leveraging routine EHR data substantially outperform the traditional FIB-4 score for early prediction of liver cirrhosis. These models enable earlier and more accurate risk stratification and can be integrated into clinical workflows as automated decision-support tools to support proactive cirrhosis prevention and management.

new Reinforcement-Learned Unequal Error Protection for Quantized Semantic Embeddings

Authors: Moirangthem Tiken Singh, Adnan Arif

Abstract: This paper tackles the pressing challenge of preserving semantic meaning in communication systems constrained by limited bandwidth. We introduce a novel reinforcement learning framework that achieves per-dimension unequal error protection via adaptive repetition coding. Central to our approach is a composite semantic distortion metric that balances global embedding similarity with entity-level preservation, empowering the reinforcement learning agent to allocate protection in a context-aware manner. Experiments show statistically significant gains over uniform protection, achieving 6.8% higher chrF scores and 9.3% better entity preservation at 1 dB SNR. The key innovation of our framework is the demonstration that simple, intelligently allocated repetition coding enables fine-grained semantic protection -- an advantage unattainable with conventional codes such as LDPC or Reed-Solomon. Our findings challenge traditional channel coding paradigms by establishing that code structure must align with semantic granularity. This approach is particularly suited to edge computing and IoT scenarios, where bandwidth is scarce, but semantic fidelity is critical, providing a practical pathway for next-generation semantic-aware networks.

new SSI-GAN: Semi-Supervised Swin-Inspired Generative Adversarial Networks for Neuronal Spike Classification

Authors: Danial Sharifrazi, Nouman Javed, Mojtaba Mohammadi, Seyede Sana Salehi, Roohallah Alizadehsani, Prasad N. Paradkar, U. Rajendra Acharya, Asim Bhatti

Abstract: Mosquitos are the main transmissive agents of arboviral diseases. Manual classification of their neuronal spike patterns is very labor-intensive and expensive. Most available deep learning solutions require fully labeled spike datasets and highly preprocessed neuronal signals. This reduces the feasibility of mass adoption in actual field scenarios. To address the scarcity of labeled data problems, we propose a new Generative Adversarial Network (GAN) architecture that we call the Semi-supervised Swin-Inspired GAN (SSI-GAN). The Swin-inspired, shifted-window discriminator, together with a transformer-based generator, is used to classify neuronal spike trains and, consequently, detect viral neurotropism. We use a multi-head self-attention model in a flat, window-based transformer discriminator that learns to capture sparser high-frequency spike features. Using just 1 to 3% labeled data, SSI-GAN was trained with more than 15 million spike samples collected at five-time post-infection and recording classification into Zika-infected, dengue-infected, or uninfected categories. Hyperparameters were optimized using the Bayesian Optuna framework, and performance for robustness was validated under fivefold Monte Carlo cross-validation. SSI-GAN reached 99.93% classification accuracy on the third day post-infection with only 3% labeled data. It maintained high accuracy across all stages of infection with just 1% supervision. This shows a 97-99% reduction in manual labeling effort relative to standard supervised approaches at the same performance level. The shifted-window transformer design proposed here beat all baselines by a wide margin and set new best marks in spike-based neuronal infection classification.

new Optimized Hybrid Feature Engineering for Resource-Efficient Arrhythmia Detection in ECG Signals: An Optimization Framework

Authors: Moirangthem Tiken Singh, Manibhushan Yaikhom

Abstract: Cardiovascular diseases, particularly arrhythmias, remain a leading global cause of mortality, necessitating continuous monitoring via the Internet of Medical Things (IoMT). However, state-of-the-art deep learning approaches often impose prohibitive computational overheads, rendering them unsuitable for resource-constrained edge devices. This study proposes a resource-efficient, data-centric framework that prioritizes feature engineering over complexity. Our optimized pipeline makes the complex, high-dimensional arrhythmia data linearly separable. This is achieved by integrating time-frequency wavelet decompositions with graph-theoretic structural descriptors, such as PageRank centrality. This hybrid feature space, combining wavelet decompositions and graph-theoretic descriptors, is then refined using mutual information and recursive elimination, enabling interpretable, ultra-lightweight linear classifiers. Validation on the MIT-BIH and INCART datasets yields 98.44% diagnostic accuracy with an 8.54 KB model footprint. The system achieves 0.46 $\mu$s classification inference latency within a 52 ms per-beat pipeline, ensuring real-time operation. These outcomes provide an order-of-magnitude efficiency gain over compressed models, such as KD-Light (25 KB, 96.32% accuracy), advancing battery-less cardiac sensors.

new Unknown Aware AI-Generated Content Attribution

Authors: Ellie Thieu, Jifan Zhang, Haoyue Bai

Abstract: The rapid advancement of photorealistic generative models has made it increasingly important to attribute the origin of synthetic content, moving beyond binary real or fake detection toward identifying the specific model that produced a given image. We study the problem of distinguishing outputs from a target generative model (e.g., OpenAI Dalle 3) from other sources, including real images and images generated by a wide range of alternative models. Using CLIP features and a simple linear classifier, shown to be effective in prior work, we establish a strong baseline for target generator attribution using only limited labeled data from the target model and a small number of known generators. However, this baseline struggles to generalize to harder, unseen, and newly released generators. To address this limitation, we propose a constrained optimization approach that leverages unlabeled wild data, consisting of images collected from the Internet that may include real images, outputs from unknown generators, or even samples from the target model itself. The proposed method encourages wild samples to be classified as non target while explicitly constraining performance on labeled data to remain high. Experimental results show that incorporating wild data substantially improves attribution performance on challenging unseen generators, demonstrating that unlabeled data from the wild can be effectively exploited to enhance AI generated content attribution in open world settings.

new Robust Graph Fine-Tuning with Adversarial Graph Prompting

Authors: Ziyan Zhang, Bo Jiang, Jin Tang

Abstract: Parameter-Efficient Fine-Tuning (PEFT) method has emerged as a dominant paradigm for adapting pre-trained GNN models to downstream tasks. However, existing PEFT methods usually exhibit significant vulnerability to various noise and attacks on graph topology and node attributes/features. To address this issue, for the first time, we propose integrating adversarial learning into graph prompting and develop a novel Adversarial Graph Prompting (AGP) framework to achieve robust graph fine-tuning. Our AGP has two key aspects. First, we propose the general problem formulation of AGP as a min-max optimization problem and develop an alternating optimization scheme to solve it. For inner maximization, we propose Joint Projected Gradient Descent (JointPGD) algorithm to generate strong adversarial noise. For outer minimization, we employ a simple yet effective module to learn the optimal node prompts to counteract the adversarial noise. Second, we demonstrate that the proposed AGP can theoretically address both graph topology and node noise. This confirms the versatility and robustness of our AGP fine-tuning method across various graph noise. Note that, the proposed AGP is a general method that can be integrated with various pre-trained GNN models to enhance their robustness on the downstream tasks. Extensive experiments on multiple benchmark tasks validate the robustness and effectiveness of AGP method compared to state-of-the-art methods.

new GRIT -- Geometry-Aware PEFT with K-FACPreconditioning, Fisher-Guided Reprojection, andDynamic Rank Adaptation

Authors: Pritish Saha, Chandrav Rajbangshi, Rudra Goyal, Mohit Goyal, Anurag Deo, Biswajit Roy, Ningthoujam Dhanachandra Singh, Raxit Goswami, Amitava Das

Abstract: Parameter-efficient fine-tuning (PEFT) is the default way to adapt LLMs, but widely used LoRA and QLoRA are largely geometry-agnostic: they optimize in fixed, randomly oriented low-rank subspaces with first-order descent, mostly ignoring local loss curvature. This can inflate the effective update budget and amplify drift along weakly constrained directions. We introduce GRIT, a dynamic, curvature-aware LoRA procedure that preserves the LoRA parameterization but: (1) preconditions gradients in rank space using K-FAC as a natural-gradient proxy; (2) periodically reprojects the low-rank basis onto dominant Fisher eigendirections to suppress drift; and (3) adapts the effective rank from the spectrum so capacity concentrates where signal resides. Across instruction-following, comprehension, and reasoning benchmarks on LLaMA backbones, GRIT matches or surpasses LoRA and QLoRA while reducing trainable parameters by 46% on average (25--80% across tasks), without practical quality loss across prompt styles and data mixes. To model forgetting, we fit a curvature-modulated power law. Empirically, GRIT yields lower drift and a better updates-vs-retention frontier than strong PEFT-optimizer baselines (Orthogonal-LoRA, IA3, DoRA, Eff-FT, Shampoo).

new Task-Driven Kernel Flows: Label Rank Compression and Laplacian Spectral Filtering

Authors: Hongxi Li, Chunlin Huang

Abstract: We present a theory of feature learning in wide L2-regularized networks showing that supervised learning is inherently compressive. We derive a kernel ODE that predicts a "water-filling" spectral evolution and prove that for any stable steady state, the kernel rank is bounded by the number of classes ($C$). We further demonstrate that SGD noise is similarly low-rank ($O(C)$), confining dynamics to the task-relevant subspace. This framework unifies the deterministic and stochastic views of alignment and contrasts the low-rank nature of supervised learning with the high-rank, expansive representations of self-supervision.

new Can Optimal Transport Improve Federated Inverse Reinforcement Learning?

Authors: David Millard, Ali Baheri

Abstract: In robotics and multi-agent systems, fleets of autonomous agents often operate in subtly different environments while pursuing a common high-level objective. Directly pooling their data to learn a shared reward function is typically impractical due to differences in dynamics, privacy constraints, and limited communication bandwidth. This paper introduces an optimal transport-based approach to federated inverse reinforcement learning (IRL). Each client first performs lightweight Maximum Entropy IRL locally, adhering to its computational and privacy limitations. The resulting reward functions are then fused via a Wasserstein barycenter, which considers their underlying geometric structure. We further prove that this barycentric fusion yields a more faithful global reward estimate than conventional parameter averaging methods in federated learning. Overall, this work provides a principled and communication-efficient framework for deriving a shared reward that generalizes across heterogeneous agents and environments.

new Quantum King-Ring Domination in Chess: A QAOA Approach

Authors: Gerhard Stenzel, Michael K\"olle, Tobias Rohe, Julian Hager, Leo S\"unkel, Maximilian Zorn, Claudia Linnhoff-Popien

Abstract: The Quantum Approximate Optimization Algorithm (QAOA) is extensively benchmarked on synthetic random instances such as MaxCut, TSP, and SAT problems, but these lack semantic structure and human interpretability, offering limited insight into performance on real-world problems with meaningful constraints. We introduce Quantum King-Ring Domination (QKRD), a NISQ-scale benchmark derived from chess tactical positions that provides 5,000 structured instances with one-hot constraints, spatial locality, and 10--40 qubit scale. The benchmark pairs human-interpretable coverage metrics with intrinsic validation against classical heuristics, enabling algorithmic conclusions without external oracles. Using QKRD, we systematically evaluate QAOA design choices and find that constraint-preserving mixers (XY, domain-wall) converge approximately 13 steps faster than standard mixers (p<10^{-7}, d\approx0.5) while eliminating penalty tuning, warm-start strategies reduce convergence by 45 steps (p<10^{-127}, d=3.35) with energy improvements exceeding d=8, and Conditional Value-at-Risk (CVaR) optimization yields an informative negative result with worse energy (p<10^{-40}, d=1.21) and no coverage benefit. Intrinsic validation shows QAOA outperforms greedy heuristics by 12.6\% and random selection by 80.1\%. Our results demonstrate that structured benchmarks reveal advantages of problem-informed QAOA techniques obscured in random instances. We release all code, data, and experimental artifacts for reproducible NISQ algorithm research.

new Smart Fault Detection in Nanosatellite Electrical Power System

Authors: Alireza Rezaee, Niloofar Nobahari, Amin Asgarifar, Farshid Hajati

Abstract: This paper presents a new detection method of faults at Nanosatellites' electrical power without an Attitude Determination Control Subsystem (ADCS) at the LEO orbit. Each part of this system is at risk of fault due to pressure tolerance, launcher pressure, and environmental circumstances. Common faults are line to line fault and open circuit for the photovoltaic subsystem, short circuit and open circuit IGBT at DC to DC converter, and regulator fault of the ground battery. The system is simulated without fault based on a neural network using solar radiation and solar panel's surface temperature as input data and current and load as outputs. Finally, using the neural network classifier, different faults are diagnosed by pattern and type of fault. For fault classification, other machine learning methods are also used, such as PCA classification, decision tree, and KNN.

new Real-Time Human Detection for Aerial Captured Video Sequences via Deep Models

Authors: Nouar AlDahoul, Aznul Qalid Md Sabri, Ali Mohammed Mansoor

Abstract: Human detection in videos plays an important role in various real-life applications. Most traditional approaches depend on utilizing handcrafted features, which are problem-dependent and optimal for specific tasks. Moreover, they are highly susceptible to dynamical events such as illumination changes, camera jitter, and variations in object sizes. On the other hand, the proposed feature learning approaches are cheaper and easier because highly abstract and discriminative features can be produced automatically without the need of expert knowledge. In this paper, we utilize automatic feature learning methods, which combine optical flow and three different deep models (i.e., supervised convolutional neural network (S-CNN), pretrained CNN feature extractor, and hierarchical extreme learning machine) for human detection in videos captured using a nonstatic camera on an aerial platform with varying altitudes. The models are trained and tested on the publicly available and highly challenging UCF-ARG aerial dataset. The comparison between these models in terms of training, testing accuracy, and learning speed is analyzed. The performance evaluation considers five human actions (digging, waving, throwing, walking, and running). Experimental results demonstrated that the proposed methods are successful for the human detection task. The pretrained CNN produces an average accuracy of 98.09%. S-CNN produces an average accuracy of 95.6% with softmax and 91.7% with Support Vector Machines (SVM). H-ELM has an average accuracy of 95.9%. Using a normal Central Processing Unit (CPU), H-ELM's training time takes 445 seconds. Learning in S-CNN takes 770 seconds with a high-performance Graphical Processing Unit (GPU).

new Deep Delta Learning

Authors: Yifan Zhang, Yifeng Liu, Mengdi Wang, Quanquan Gu

Abstract: The efficacy of deep residual networks is fundamentally predicated on the identity shortcut connection. While this mechanism effectively mitigates the vanishing gradient problem, it imposes a strictly additive inductive bias on feature transformations, thereby limiting the network's capacity to model complex state transitions. In this paper, we introduce Deep Delta Learning (DDL), a novel architecture that generalizes the standard residual connection by modulating the identity shortcut with a learnable, data-dependent geometric transformation. This transformation, termed the Delta Operator, constitutes a rank-1 perturbation of the identity matrix, parameterized by a reflection direction vector $\mathbf{k}(\mathbf{X})$ and a gating scalar $\beta(\mathbf{X})$. We provide a spectral analysis of this operator, demonstrating that the gate $\beta(\mathbf{X})$ enables dynamic interpolation between identity mapping, orthogonal projection, and geometric reflection. Furthermore, we restructure the residual update as a synchronous rank-1 injection, where the gate acts as a dynamic step size governing both the erasure of old information and the writing of new features. This unification empowers the network to explicitly control the spectrum of its layer-wise transition operator, enabling the modeling of complex, non-monotonic dynamics while preserving the stable training characteristics of gated residual architectures.

new E-GRPO: High Entropy Steps Drive Effective Reinforcement Learning for Flow Models

Authors: Shengjun Zhang, Zhang Zhang, Chensheng Dai, Yueqi Duan

Abstract: Recent reinforcement learning has enhanced the flow matching models on human preference alignment. While stochastic sampling enables the exploration of denoising directions, existing methods which optimize over multiple denoising steps suffer from sparse and ambiguous reward signals. We observe that the high entropy steps enable more efficient and effective exploration while the low entropy steps result in undistinguished roll-outs. To this end, we propose E-GRPO, an entropy aware Group Relative Policy Optimization to increase the entropy of SDE sampling steps. Since the integration of stochastic differential equations suffer from ambiguous reward signals due to stochasticity from multiple steps, we specifically merge consecutive low entropy steps to formulate one high entropy step for SDE sampling, while applying ODE sampling on other steps. Building upon this, we introduce multi-step group normalized advantage, which computes group-relative advantages within samples sharing the same consolidated SDE denoising step. Experimental results on different reward settings have demonstrated the effectiveness of our methods.

new A Comparative Analysis of Interpretable Machine Learning Methods

Authors: Mattia Billa, Giovanni Orlandi, Veronica Guidetti, Federica Mandreoli

Abstract: In recent years, Machine Learning (ML) has seen widespread adoption across a broad range of sectors, including high-stakes domains such as healthcare, finance, and law. This growing reliance has raised increasing concerns regarding model interpretability and accountability, particularly as legal and regulatory frameworks place tighter constraints on using black-box models in critical applications. Although interpretable ML has attracted substantial attention, systematic evaluations of inherently interpretable models, especially for tabular data, remain relatively scarce and often focus primarily on aggregated performance outcomes. To address this gap, we present a large-scale comparative evaluation of 16 inherently interpretable methods, ranging from classical linear models and decision trees to more recent approaches such as Explainable Boosting Machines (EBMs), Symbolic Regression (SR), and Generalized Optimal Sparse Decision Trees (GOSDT). Our study spans 216 real-world tabular datasets and goes beyond aggregate rankings by stratifying performance according to structural dataset characteristics, including dimensionality, sample size, linearity, and class imbalance. In addition, we assess training time and robustness under controlled distributional shifts. Our results reveal clear performance hierarchies, especially for regression tasks, where EBMs consistently achieve strong predictive accuracy. At the same time, we show that performance is highly context-dependent: SR and Interpretable Generalized Additive Neural Networks (IGANNs) perform particularly well in non-linear regimes, while GOSDT models exhibit pronounced sensitivity to class imbalance. Overall, these findings provide practical guidance for practitioners seeking a balance between interpretability and predictive performance, and contribute to a deeper empirical understanding of interpretable modeling for tabular data.

new A Comparative Study of Adaptation Strategies for Time Series Foundation Models in Anomaly Detection

Authors: Miseon Park, Kijung Yoon

Abstract: Time series anomaly detection is essential for the reliable operation of complex systems, but most existing methods require extensive task-specific training. We explore whether time series foundation models (TSFMs), pretrained on large heterogeneous data, can serve as universal backbones for anomaly detection. Through systematic experiments across multiple benchmarks, we compare zero-shot inference, full model adaptation, and parameter-efficient fine-tuning (PEFT) strategies. Our results demonstrate that TSFMs outperform task-specific baselines, achieving notable gains in AUC-PR and VUS-PR, particularly under severe class imbalance. Moreover, PEFT methods such as LoRA, OFT, and HRA not only reduce computational cost but also match or surpass full fine-tuning in most cases, indicating that TSFMs can be efficiently adapted for anomaly detection, even when pretrained for forecasting. These findings position TSFMs as promising general-purpose models for scalable and efficient time series anomaly detection.

new Controllable Concept Bottleneck Models

Authors: Hongbin Lin, Chenyang Ren, Juangui Xu, Zhengyu Hu, Cheng-Long Wang, Yao Shu, Hui Xiong, Jingfeng Zhang, Di Wang, Lijie Hu

Abstract: Concept Bottleneck Models (CBMs) have garnered much attention for their ability to elucidate the prediction process through a human-understandable concept layer. However, most previous studies focused on static scenarios where the data and concepts are assumed to be fixed and clean. In real-world applications, deployed models require continuous maintenance: we often need to remove erroneous or sensitive data (unlearning), correct mislabeled concepts, or incorporate newly acquired samples (incremental learning) to adapt to evolving environments. Thus, deriving efficient editable CBMs without retraining from scratch remains a significant challenge, particularly in large-scale applications. To address these challenges, we propose Controllable Concept Bottleneck Models (CCBMs). Specifically, CCBMs support three granularities of model editing: concept-label-level, concept-level, and data-level, the latter of which encompasses both data removal and data addition. CCBMs enjoy mathematically rigorous closed-form approximations derived from influence functions that obviate the need for retraining. Experimental results demonstrate the efficiency and adaptability of our CCBMs, affirming their practical value in enabling dynamic and trustworthy CBMs.

new Imitation from Observations with Trajectory-Level Generative Embeddings

Authors: Yongtao Qu, Shangzhe Li, Weitong Zhang

Abstract: We consider the offline imitation learning from observations (LfO) where the expert demonstrations are scarce and the available offline suboptimal data are far from the expert behavior. Many existing distribution-matching approaches struggle in this regime because they impose strict support constraints and rely on brittle one-step models, making it hard to extract useful signal from imperfect data. To tackle this challenge, we propose TGE, a trajectory-level generative embedding for offline LfO that constructs a dense, smooth surrogate reward by estimating expert state density in the latent space of a temporal diffusion model trained on offline trajectory data. By leveraging the smooth geometry of the learned diffusion embedding, TGE captures long-horizon temporal dynamics and effectively bridges the gap between disjoint supports, ensuring a robust learning signal even when offline data is distributionally distinct from the expert. Empirically, the proposed approach consistently matches or outperforms prior offline LfO methods across a range of D4RL locomotion and manipulation benchmarks.

new Deep Networks Learn Deep Hierarchical Models

Authors: Amit Daniely

Abstract: We consider supervised learning with $n$ labels and show that layerwise SGD on residual networks can efficiently learn a class of hierarchical models. This model class assumes the existence of an (unknown) label hierarchy $L_1 \subseteq L_2 \subseteq \dots \subseteq L_r = [n]$, where labels in $L_1$ are simple functions of the input, while for $i > 1$, labels in $L_i$ are simple functions of simpler labels. Our class surpasses models that were previously shown to be learnable by deep learning algorithms, in the sense that it reaches the depth limit of efficient learnability. That is, there are models in this class that require polynomial depth to express, whereas previous models can be computed by log-depth circuits. Furthermore, we suggest that learnability of such hierarchical models might eventually form a basis for understanding deep learning. Beyond their natural fit for domains where deep learning excels, we argue that the mere existence of human ``teachers" supports the hypothesis that hierarchical structures are inherently available. By providing granular labels, teachers effectively reveal ``hints'' or ``snippets'' of the internal algorithms used by the brain. We formalize this intuition, showing that in a simplified model where a teacher is partially aware of their internal logic, a hierarchical structure emerges that facilitates efficient learnability.

new Geometric Regularization in Mixture-of-Experts: The Disconnect Between Weights and Activations

Authors: Hyunjun Kim

Abstract: Mixture-of-Experts (MoE) models achieve efficiency through sparse activation, but the role of geometric regularization in expert specialization remains unclear. We apply orthogonality loss to enforce expert diversity and find it fails on multiple fronts: it does not reduce weight-space overlap (MSO actually increases by up to 114%), activation-space overlap remains high (~0.6) regardless of regularization, and effects on performance are inconsistent -- marginal improvement on WikiText-103 (-0.9%), slight degradation on TinyStories (+0.9%), and highly variable results on PTB (std > 1.0). Our analysis across 7 regularization strengths reveals no significant correlation (r = -0.293, p = 0.523) between weight and activation orthogonality. These findings demonstrate that weight-space regularization neither achieves its geometric goal nor reliably improves performance, making it unsuitable for MoE diversity.

new Detecting Spike Wave Discharges (SWD) using 1-dimensional Residual UNet

Authors: Saurav Sengupta, Scott Kilianski, Suchetha Sharma, Sakina Lashkeri, Ashley McHugh, Mark Beenhakker, Donald E. Brown

Abstract: The manual labeling of events in electroencephalography (EEG) records is time-consuming. This is especially true when EEG recordings are taken continuously over weeks to months. Therefore, a method to automatically label pertinent EEG events reduces the manual workload. Spike wave discharges (SWD), which are the electrographic hallmark of absence seizures, are EEG events that are often labeled manually. While some previous studies have utilized machine learning to automatically segment and classify EEG signals like SWDs, they can be improved. Here we compare the performance of 14 machine learning classifiers on our own manually annotated dataset of 961 hours of EEG recordings from C3H/HeJ mice, including 22,637 labeled SWDs. We find that a 1D UNet performs best for labeling SWDs in this dataset. We also improve the 1D UNet by augmenting our training data and determine that scaling showed the greatest benefit of all augmentation procedures applied. We then compare the 1D UNet with data augmentation, AugUNet1D, against a recently published time- and frequency-based algorithmic approach called "Twin Peaks". AugUNet1D showed superior performance and detected events with more similar features to the SWDs labeled manually. AugUNet1D, pretrained on our manually annotated data or untrained, is made public for others users.

new Laplacian Kernelized Bandit

Authors: Shuang Wu, Arash A. Amini

Abstract: We study multi-user contextual bandits where users are related by a graph and their reward functions exhibit both non-linear behavior and graph homophily. We introduce a principled joint penalty for the collection of user reward functions $\{f_u\}$, combining a graph smoothness term based on RKHS distances with an individual roughness penalty. Our central contribution is proving that this penalty is equivalent to the squared norm within a single, unified \emph{multi-user RKHS}. We explicitly derive its reproducing kernel, which elegantly fuses the graph Laplacian with the base arm kernel. This unification allows us to reframe the problem as learning a single ''lifted'' function, enabling the design of principled algorithms, \texttt{LK-GP-UCB} and \texttt{LK-GP-TS}, that leverage Gaussian Process posteriors over this new kernel for exploration. We provide high-probability regret bounds that scale with an \emph{effective dimension} of the multi-user kernel, replacing dependencies on user count or ambient dimension. Empirically, our methods outperform strong linear and non-graph-aware baselines in non-linear settings and remain competitive even when the true rewards are linear. Our work delivers a unified, theoretically grounded, and practical framework that bridges Laplacian regularization with kernelized bandits for structured exploration.

new Neural Chains and Discrete Dynamical Systems

Authors: Sauro Succi, Abhisek Ganguly, Santosh Ansumali

Abstract: We inspect the analogy between machine-learning (ML) applications based on the transformer architecture without self-attention, {\it neural chains} hereafter, and discrete dynamical systems associated with discretised versions of neural integral and partial differential equations (NIE, PDE). A comparative analysis of the numerical solution of the (viscid and inviscid) Burgers and Eikonal equations via standard numerical discretization (also cast in terms of neural chains) and via PINN's learning is presented and commented on. It is found that standard numerical discretization and PINN learning provide two different paths to acquire essentially the same knowledge about the dynamics of the system. PINN learning proceeds through random matrices which bear no direct relation to the highly structured matrices associated with finite-difference (FD) procedures. Random matrices leading to acceptable solutions are far more numerous than the unique tridiagonal form in matrix space, which explains why the PINN search typically lands on the random ensemble. The price is a much larger number of parameters, causing lack of physical transparency (explainability) as well as large training costs with no counterpart in the FD procedure. However, our results refer to one-dimensional dynamic problems, hence they don't rule out the possibility that PINNs and ML in general, may offer better strategies for high-dimensional problems.

new When Small Models Are Right for Wrong Reasons: Process Verification for Trustworthy Agents

Authors: Laksh Advani

Abstract: Deploying small language models (7-9B parameters) as autonomous agents requires trust in their reasoning, not just their outputs. We reveal a critical reliability crisis: 50-69\% of correct answers from these models contain fundamentally flawed reasoning -- a ``Right-for-Wrong-Reasons'' phenomenon invisible to standard accuracy metrics. Through analysis of 10,734 reasoning traces across three models and diverse tasks, we introduce the Reasoning Integrity Score (RIS), a process-based metric validated with substantial inter-rater agreement ($\kappa=0.657$). Conventional practices are challenged by our findings: while retrieval-augmented generation (RAG) significantly improves reasoning integrity (Cohen's $d=0.23$--$0.93$), meta-cognitive interventions like self-critique often harm performance ($d=-0.14$ to $-0.33$) in small models on the evaluated tasks. Mechanistic analysis reveals RAG succeeds by grounding calculations in external evidence, reducing errors by 7.6\%, while meta-cognition amplifies confusion without sufficient model capacity. To enable deployment, verification capabilities are distilled into a neural classifier achieving 0.86 F1-score with 100$\times$ speedup. These results underscore the necessity of process-based verification for trustworthy agents: accuracy alone is dangerously insufficient when models can be right for entirely wrong reasons.

new Trajectory Guard -- A Lightweight, Sequence-Aware Model for Real-Time Anomaly Detection in Agentic AI

Authors: Laksh Advani

Abstract: Autonomous LLM agents generate multi-step action plans that can fail due to contextual misalignment or structural incoherence. Existing anomaly detection methods are ill-suited for this challenge: mean-pooling embeddings dilutes anomalous steps, while contrastive-only approaches ignore sequential structure. Standard unsupervised methods on pre-trained embeddings achieve F1-scores no higher than 0.69. We introduce Trajectory Guard, a Siamese Recurrent Autoencoder with a hybrid loss function that jointly learns task-trajectory alignment via contrastive learning and sequential validity via reconstruction. This dual objective enables unified detection of both "wrong plan for this task" and "malformed plan structure." On benchmarks spanning synthetic perturbations and real-world failures from security audits (RAS-Eval) and multi-agent systems (Who\&When), we achieve F1-scores of 0.88-0.94 on balanced sets and recall of 0.86-0.92 on imbalanced external benchmarks. At 32 ms inference latency, our approach runs 17-27$\times$ faster than LLM Judge baselines, enabling real-time safety verification in production deployments.

new A Sparse-Attention Deep Learning Model Integrating Heterogeneous Multimodal Features for Parkinson's Disease Severity Profiling

Authors: Dristi Datta, Tanmoy Debnath, Minh Chau, Manoranjan Paul, Gourab Adhikary, Md Geaur Rahman

Abstract: Characterising the heterogeneous presentation of Parkinson's disease (PD) requires integrating biological and clinical markers within a unified predictive framework. While multimodal data provide complementary information, many existing computational models struggle with interpretability, class imbalance, or effective fusion of high-dimensional imaging and tabular clinical features. To address these limitations, we propose the Class-Weighted Sparse-Attention Fusion Network (SAFN), an interpretable deep learning framework for robust multimodal profiling. SAFN integrates MRI cortical thickness, MRI volumetric measures, clinical assessments, and demographic variables using modality-specific encoders and a symmetric cross-attention mechanism that captures nonlinear interactions between imaging and clinical representations. A sparsity-constrained attention-gating fusion layer dynamically prioritises informative modalities, while a class-balanced focal loss (beta = 0.999, gamma = 1.5) mitigates dataset imbalance without synthetic oversampling. Evaluated on 703 participants (570 PD, 133 healthy controls) from the Parkinson's Progression Markers Initiative using subject-wise five-fold cross-validation, SAFN achieves an accuracy of 0.98 plus or minus 0.02 and a PR-AUC of 1.00 plus or minus 0.00, outperforming established machine learning and deep learning baselines. Interpretability analysis shows a clinically coherent decision process, with approximately 60 percent of predictive weight assigned to clinical assessments, consistent with Movement Disorder Society diagnostic principles. SAFN provides a reproducible and transparent multimodal modelling paradigm for computational profiling of neurodegenerative disease.

new Optimizing LSTM Neural Networks for Resource-Constrained Retail Sales Forecasting: A Model Compression Study

Authors: Ravi Teja Pagidoju

Abstract: Standard LSTM(Long Short-Term Memory) neural networks provide accurate predictions for sales data in the retail industry, but require a lot of computing power. It can be challenging especially for mid to small retail industries. This paper examines LSTM model compression by gradually reducing the number of hidden units from 128 to 16. We used the Kaggle Store Item Demand Forecasting dataset, which has 913,000 daily sales records from 10 stores and 50 items, to look at the trade-off between model size and how accurate the predictions are. Experiments show that lowering the number of hidden LSTM units to 64 maintains the same level of accuracy while also improving it. The mean absolute percentage error (MAPE) ranges from 23.6% for the full 128-unit model to 12.4% for the 64-unit model. The optimized model is 73% smaller (from 280KB to 76KB) and 47% more accurate. These results show that larger models do not always achieve better results.

new Federated Customization of Large Models: Approaches, Experiments, and Insights

Authors: Yuchuan Ye, Ming Ding, Youjia Chen, Peng Cheng, Dusit Niyato

Abstract: In this article, we explore federated customization of large models and highlight the key challenges it poses within the federated learning framework. We review several popular large model customization techniques, including full fine-tuning, efficient fine-tuning, prompt engineering, prefix-tuning, knowledge distillation, and retrieval-augmented generation. Then, we discuss how these techniques can be implemented within the federated learning framework. Moreover, we conduct experiments on federated prefix-tuning, which, to the best of our knowledge, is the first trial to apply prefix-tuning in the federated learning setting. The conducted experiments validate its feasibility with performance close to centralized approaches. Further comparison with three other federated customization methods demonstrated its competitive performance, satisfactory efficiency, and consistent robustness.

new Cloud-Native Generative AI for Automated Planogram Synthesis: A Diffusion Model Approach for Multi-Store Retail Optimization

Authors: Ravi Teja Pagidoju, Shriya Agarwal

Abstract: Planogram creation is a significant challenge for retail, requiring an average of 30 hours per complex layout. This paper introduces a cloud-native architecture using diffusion models to automatically generate store-specific planograms. Unlike conventional optimization methods that reorganize existing layouts, our system learns from successful shelf arrangements across multiple retail locations to create new planogram configurations. The architecture combines cloud-based model training via AWS with edge deployment for real-time inference. The diffusion model integrates retail-specific constraints through a modified loss function. Simulation-based analysis demonstrates the system reduces planogram design time by 98.3% (from 30 to 0.5 hours) while achieving 94.4% constraint satisfaction. Economic analysis reveals a 97.5% reduction in creation expenses with a 4.4-month break-even period. The cloud-native architecture scales linearly, supporting up to 10,000 concurrent store requests. This work demonstrates the viability of generative AI for automated retail space optimization.

new Entropy Production in Machine Learning Under Fokker-Planck Probability Flow

Authors: Lennon Shikhman

Abstract: Machine learning models deployed in nonstationary environments experience performance degradation due to data drift. While many drift detection heuristics exist, most lack a principled dynamical interpretation and provide limited guidance on how retraining frequency should be balanced against operational cost. In this work, we propose an entropy--based retraining framework grounded in nonequilibrium stochastic dynamics. Modeling deployment--time data drift as probability flow governed by a Fokker--Planck equation, we quantify model--data mismatch using a time--evolving Kullback--Leibler divergence. We show that the time derivative of this mismatch admits an entropy--balance decomposition featuring a nonnegative entropy production term driven by probability currents. This interpretation motivates entropy--triggered retraining as a label--free intervention strategy that responds to accumulated mismatch rather than delayed performance collapse. In a controlled nonstationary classification experiment, entropy--triggered retraining achieves predictive performance comparable to high--frequency retraining while reducing retraining events by an order of magnitude relative to daily and label--based policies.

new Adversarial Samples Are Not Created Equal

Authors: Jennifer Crawford, Amol Khanna, Fred Lu, Amy R. Wagoner, Stella Biderman, Andre T. Nguyen, Edward Raff

Abstract: Over the past decade, numerous theories have been proposed to explain the widespread vulnerability of deep neural networks to adversarial evasion attacks. Among these, the theory of non-robust features proposed by Ilyas et al. has been widely accepted, showing that brittle but predictive features of the data distribution can be directly exploited by attackers. However, this theory overlooks adversarial samples that do not directly utilize these features. In this work, we advocate that these two kinds of samples - those which use use brittle but predictive features and those that do not - comprise two types of adversarial weaknesses and should be differentiated when evaluating adversarial robustness. For this purpose, we propose an ensemble-based metric to measure the manipulation of non-robust features by adversarial perturbations and use this metric to analyze the makeup of adversarial samples generated by attackers. This new perspective also allows us to re-examine multiple phenomena, including the impact of sharpness-aware minimization on adversarial robustness and the robustness gap observed between adversarially training and standard training on robust datasets.

new Learning to be Reproducible: Custom Loss Design for Robust Neural Networks

Authors: Waqas Ahmed, Sheeba Samuel, Kevin Coakley, Birgitta Koenig-Ries, Odd Erik Gundersen

Abstract: To enhance the reproducibility and reliability of deep learning models, we address a critical gap in current training methodologies: the lack of mechanisms that ensure consistent and robust performance across runs. Our empirical analysis reveals that even under controlled initialization and training conditions, the accuracy of the model can exhibit significant variability. To address this issue, we propose a Custom Loss Function (CLF) that reduces the sensitivity of training outcomes to stochastic factors such as weight initialization and data shuffling. By fine-tuning its parameters, CLF explicitly balances predictive accuracy with training stability, leading to more consistent and reliable model performance. Extensive experiments across diverse architectures for both image classification and time series forecasting demonstrate that our approach significantly improves training robustness without sacrificing predictive performance. These results establish CLF as an effective and efficient strategy for developing more stable, reliable and trustworthy neural networks.

new HFedMoE: Resource-aware Heterogeneous Federated Learning with Mixture-of-Experts

Authors: Zihan Fang, Zheng Lin, Senkang Hu, Yanan Ma, Yihang Tao, Yiqin Deng, Xianhao Chen, Yuguang Fang

Abstract: While federated learning (FL) enables fine-tuning of large language models (LLMs) without compromising data privacy, the substantial size of an LLM renders on-device training impractical for resource-constrained clients, such as mobile devices. Thus, Mixture-of-Experts (MoE) models have emerged as a computation-efficient solution, which activates only a sparse subset of experts during model training to reduce computing burden without sacrificing performance. Though integrating MoE into FL fine-tuning holds significant potential, it still encounters three key challenges: i) selecting appropriate experts for clients remains challenging due to the lack of a reliable metric to measure each expert's impact on local fine-tuning performance, ii) the heterogeneous computing resources across clients severely hinder MoE-based LLM fine-tuning, as dynamic expert activations across diverse input samples can overwhelm resource-constrained devices, and iii) client-specific expert subsets and routing preference undermine global aggregation, where misaligned expert updates and inconsistent gating networks in troduce destructive interference. To address these challenges, we propose HFedMoE, a heterogeneous MoE-based FL fine-tuning framework that customizes a subset of experts to each client for computation-efficient LLM fine-tuning. Specifically, HFedMoE identifies the expert importance based on its contributions to fine-tuning performance, and then adaptively selects a subset of experts from an information bottleneck perspective to align with each client' s computing budget. A sparsity-aware model aggregation strategy is also designed to aggregate the actively fine-tuned experts and gating parameters with importance weighted contributions. Extensive experiments demonstrate that HFedMoE outperforms state-of-the-art benchmarks in training accuracy and convergence speed.

new Cycling Race Time Prediction: A Personalized Machine Learning Approach Using Route Topology and Training Load

Authors: Francisco Aguilera Moreno

Abstract: Predicting cycling duration for a given route is essential for training planning and event preparation. Existing solutions rely on physics-based models that require extensive parameterization, including aerodynamic drag coefficients and real-time wind forecasts, parameters impractical for most amateur cyclists. This work presents a machine learning approach that predicts ride duration using route topology features combined with the athlete's current fitness state derived from training load metrics. The model learns athlete-specific performance patterns from historical data, substituting complex physical measurements with historical performance proxies. We evaluate the approach using a single-athlete dataset (N=96 rides) in an N-of-1 study design. After rigorous feature engineering to eliminate data leakage, we find that Lasso regression with Topology + Fitness features achieves MAE=6.60 minutes and R2=0.922. Notably, integrating fitness metrics (CTL, ATL) reduces error by 14% compared to topology alone (MAE=7.66 min), demonstrating that physiological state meaningfully constrains performance even in self-paced efforts. Progressive checkpoint predictions enable dynamic race planning as route difficulty becomes apparent.

new Traffic-Aware Optimal Taxi Placement Using Graph Neural Network-Based Reinforcement Learning

Authors: Sonia Khetarpaul, P Y Sharan

Abstract: In the context of smart city transportation, efficient matching of taxi supply with passenger demand requires real-time integration of urban traffic network data and mobility patterns. Conventional taxi hotspot prediction models often rely solely on historical demand, overlooking dynamic influences such as traffic congestion, road incidents, and public events. This paper presents a traffic-aware, graph-based reinforcement learning (RL) framework for optimal taxi placement in metropolitan environments. The urban road network is modeled as a graph where intersections represent nodes, road segments serve as edges, and node attributes capture historical demand, event proximity, and real-time congestion scores obtained from live traffic APIs. Graph Neural Network (GNN) embeddings are employed to encode spatial-temporal dependencies within the traffic network, which are then used by a Q-learning agent to recommend optimal taxi hotspots. The reward mechanism jointly optimizes passenger waiting time, driver travel distance, and congestion avoidance. Experiments on a simulated Delhi taxi dataset, generated using real geospatial boundaries and historic ride-hailing request patterns, demonstrate that the proposed model reduced passenger waiting time by about 56% and reduced travel distance by 38% compared to baseline stochastic selection. The proposed approach is adaptable to multi-modal transport systems and can be integrated into smart city platforms for real-time urban mobility optimization.

new Stronger Approximation Guarantees for Non-Monotone {\gamma}-Weakly DR-Submodular Maximization

Authors: Hareshkumar Jadav, Ranveer Singh, Vaneet Aggarwal

Abstract: Maximizing submodular objectives under constraints is a fundamental problem in machine learning and optimization. We study the maximization of a nonnegative, non-monotone $\gamma$-weakly DR-submodular function over a down-closed convex body. Our main result is an approximation algorithm whose guarantee depends smoothly on $\gamma$; in particular, when $\gamma=1$ (the DR-submodular case) our bound recovers the $0.401$ approximation factor, while for $\gamma<1$ the guarantee degrades gracefully and, it improves upon previously reported bounds for $\gamma$-weakly DR-submodular maximization under the same constraints. Our approach combines a Frank-Wolfe-guided continuous-greedy framework with a $\gamma$-aware double-greedy step, yielding a simple yet effective procedure for handling non-monotonicity. This results in state-of-the-art guarantees for non-monotone $\gamma$-weakly DR-submodular maximization over down-closed convex bodies.

new Do Chatbot LLMs Talk Too Much? The YapBench Benchmark

Authors: Vadim Borisov, Michael Gr\"oger, Mina Mikhael, Richard H. Schreiber

Abstract: Large Language Models (LLMs) such as ChatGPT, Claude, and Gemini increasingly act as general-purpose copilots, yet they often respond with unnecessary length on simple requests, adding redundant explanations, hedging, or boilerplate that increases cognitive load and inflates token-based inference cost. Prior work suggests that preference-based post-training and LLM-judged evaluations can induce systematic length bias, where longer answers are rewarded even at comparable quality. We introduce YapBench, a lightweight benchmark for quantifying user-visible over-generation on brevity-ideal prompts. Each item consists of a single-turn prompt, a curated minimal-sufficient baseline answer, and a category label. Our primary metric, YapScore, measures excess response length beyond the baseline in characters, enabling comparisons across models without relying on any specific tokenizer. We summarize model performance via the YapIndex, a uniformly weighted average of category-level median YapScores. YapBench contains over three hundred English prompts spanning three common brevity-ideal settings: (A) minimal or ambiguous inputs where the ideal behavior is a short clarification, (B) closed-form factual questions with short stable answers, and (C) one-line coding tasks where a single command or snippet suffices. Evaluating 76 assistant LLMs, we observe an order-of-magnitude spread in median excess length and distinct category-specific failure modes, including vacuum-filling on ambiguous inputs and explanation or formatting overhead on one-line technical requests. We release the benchmark and maintain a live leaderboard for tracking verbosity behavior over time.

new Interpretability-Guided Bi-objective Optimization: Aligning Accuracy and Explainability

Authors: Kasra Fouladi, Hamta Rahmani

Abstract: This paper introduces Interpretability-Guided Bi-objective Optimization (IGBO), a framework that trains interpretable models by incorporating structured domain knowledge via a bi-objective formulation. IGBO encodes feature importance hierarchies as a Directed Acyclic Graph (DAG) and uses Temporal Integrated Gradients (TIG) to measure feature importance. To address the Out-of-Distribution (OOD) problem in TIG computation, we propose an Optimal Path Oracle that learns data-manifold-aware integration paths. Theoretical analysis proves convergence properties and robustness to mini-batch noise, while empirical results on time-series data demonstrate IGBO's effectiveness in enforcing DAG constraints with minimal accuracy loss, outperforming standard regularization baselines.

new Avatar Forcing: Real-Time Interactive Head Avatar Generation for Natural Conversation

Authors: Taekyung Ki, Sangwon Jang, Jaehyeong Jo, Jaehong Yoon, Sung Ju Hwang

Abstract: Talking head generation creates lifelike avatars from static portraits for virtual communication and content creation. However, current models do not yet convey the feeling of truly interactive communication, often generating one-way responses that lack emotional engagement. We identify two key challenges toward truly interactive avatars: generating motion in real-time under causal constraints and learning expressive, vibrant reactions without additional labeled data. To address these challenges, we propose Avatar Forcing, a new framework for interactive head avatar generation that models real-time user-avatar interactions through diffusion forcing. This design allows the avatar to process real-time multimodal inputs, including the user's audio and motion, with low latency for instant reactions to both verbal and non-verbal cues such as speech, nods, and laughter. Furthermore, we introduce a direct preference optimization method that leverages synthetic losing samples constructed by dropping user conditions, enabling label-free learning of expressive interaction. Experimental results demonstrate that our framework enables real-time interaction with low latency (approximately 500ms), achieving 6.8X speedup compared to the baseline, and produces reactive and expressive avatar motion, which is preferred over 80% against the baseline.

new IRPO: Scaling the Bradley-Terry Model via Reinforcement Learning

Authors: Haonan Song, Qingchen Xie, Huan Zhu, Feng Xiao, Luxi Xing, Fuzhen Li, Liu Kang, Feng Jiang, Zhiyong Zheng, Fan Yang

Abstract: Generative Reward Models (GRMs) have attracted considerable research interest in reward modeling due to their interpretability, inference-time scalability, and potential for refinement through reinforcement learning (RL). However, widely used pairwise GRMs create a computational bottleneck when integrated with RL algorithms such as Group Relative Policy Optimization (GRPO). This bottleneck arises from two factors: (i) the O(n^2) time complexity of pairwise comparisons required to obtain relative scores, and (ii) the computational overhead of repeated sampling or additional chain-of-thought (CoT) reasoning to improve performance. To address the first factor, we propose Intergroup Relative Preference Optimization (IRPO), a novel RL framework that incorporates the well-established Bradley-Terry model into GRPO. By generating a pointwise score for each response, IRPO enables efficient evaluation of arbitrarily many candidates during RL training while preserving interpretability and fine-grained reward signals. Experimental results demonstrate that IRPO achieves state-of-the-art (SOTA) performance among pointwise GRMs across multiple benchmarks, with performance comparable to that of current leading pairwise GRMs. Furthermore, we show that IRPO significantly outperforms pairwise GRMs in post-training evaluations.

new TeleDoCTR: Domain-Specific and Contextual Troubleshooting for Telecommunications

Authors: Mohamed Trabelsi, Huseyin Uzunalioglu

Abstract: Ticket troubleshooting refers to the process of analyzing and resolving problems that are reported through a ticketing system. In large organizations offering a wide range of services, this task is highly complex due to the diversity of submitted tickets and the need for specialized domain knowledge. In particular, troubleshooting in telecommunications (telecom) is a very time-consuming task as it requires experts to interpret ticket content, consult documentation, and search historical records to identify appropriate resolutions. This human-intensive approach not only delays issue resolution but also hinders overall operational efficiency. To enhance the effectiveness and efficiency of ticket troubleshooting in telecom, we propose TeleDoCTR, a novel telecom-related, domain-specific, and contextual troubleshooting system tailored for end-to-end ticket resolution in telecom. TeleDoCTR integrates both domain-specific ranking and generative models to automate key steps of the troubleshooting workflow which are: routing tickets to the appropriate expert team responsible for resolving the ticket (classification task), retrieving contextually and semantically similar historical tickets (retrieval task), and generating a detailed fault analysis report outlining the issue, root cause, and potential solutions (generation task). We evaluate TeleDoCTR on a real-world dataset from a telecom infrastructure and demonstrate that it achieves superior performance over existing state-of-the-art methods, significantly enhancing the accuracy and efficiency of the troubleshooting process.

new ARISE: Adaptive Reinforcement Integrated with Swarm Exploration

Authors: Rajiv Chaitanya M, D R Ramesh Babu

Abstract: Effective exploration remains a key challenge in RL, especially with non-stationary rewards or high-dimensional policies. We introduce ARISE, a lightweight framework that enhances reinforcement learning by augmenting standard policy-gradient methods with a compact swarm-based exploration layer. ARISE blends policy actions with particle-driven proposals, where each particle represents a candidate policy trajectory sampled in the action space, and modulates exploration adaptively using reward-variance cues. While easy benchmarks exhibit only slight improvements (e.g., +0.7% on CartPole-v1), ARISE yields substantial gains on more challenging tasks, including +46% on LunarLander-v3 and +22% on Hopper-v4, while preserving stability on Walker2d and Ant. Under non-stationary reward shifts, ARISE provides marked robustness advantages, outperforming PPO by +75 points on CartPole and improving LunarLander accordingly. Ablation studies confirm that both the swarm component and the adaptive mechanism contribute to the performance. Overall, ARISE offers a simple, architecture-agnostic route to more exploratory and resilient RL agents without altering core algorithmic structures.

new Bayesian Inverse Games with High-Dimensional Multi-Modal Observations

Authors: Yash Jain, Xinjie Liu, Lasse Peters, David Fridovich-Keil, Ufuk Topcu

Abstract: Many multi-agent interaction scenarios can be naturally modeled as noncooperative games, where each agent's decisions depend on others' future actions. However, deploying game-theoretic planners for autonomous decision-making requires a specification of all agents' objectives. To circumvent this practical difficulty, recent work develops maximum likelihood techniques for solving inverse games that can identify unknown agent objectives from interaction data. Unfortunately, these methods only infer point estimates and do not quantify estimator uncertainty; correspondingly, downstream planning decisions can overconfidently commit to unsafe actions. We present an approximate Bayesian inference approach for solving the inverse game problem, which can incorporate observation data from multiple modalities and be used to generate samples from the Bayesian posterior over the hidden agent objectives given limited sensor observations in real time. Concretely, the proposed Bayesian inverse game framework trains a structured variational autoencoder with an embedded differentiable Nash game solver on interaction datasets and does not require labels of agents' true objectives. Extensive experiments show that our framework successfully learns prior and posterior distributions, improves inference quality over maximum likelihood estimation-based inverse game approaches, and enables safer downstream decision-making without sacrificing efficiency. When trajectory information is uninformative or unavailable, multimodal inference further reduces uncertainty by exploiting additional observation modalities.

new BSAT: B-Spline Adaptive Tokenizer for Long-Term Time Series Forecasting

Authors: Maximilian Reinwardt, Michael Eichelbeck, Matthias Althoff

Abstract: Long-term time series forecasting using transformers is hampered by the quadratic complexity of self-attention and the rigidity of uniform patching, which may be misaligned with the data's semantic structure. In this paper, we introduce the \textit{B-Spline Adaptive Tokenizer (BSAT)}, a novel, parameter-free method that adaptively segments a time series by fitting it with B-splines. BSAT algorithmically places tokens in high-curvature regions and represents each variable-length basis function as a fixed-size token, composed of its coefficient and position. Further, we propose a hybrid positional encoding that combines a additive learnable positional encoding with Rotary Positional Embedding featuring a layer-wise learnable base: L-RoPE. This allows each layer to attend to different temporal dependencies. Our experiments on several public benchmarks show that our model is competitive with strong performance at high compression rates. This makes it particularly well-suited for use cases with strong memory constraints.

new Precision Autotuning for Linear Solvers via Contextual Bandit-Based RL

Authors: Erin Carson, Xinye Chen

Abstract: We propose a reinforcement learning (RL) framework for adaptive precision tuning of linear solvers, and can be extended to general algorithms. The framework is formulated as a contextual bandit problem and solved using incremental action-value estimation with a discretized state space to select optimal precision configurations for computational steps, balancing precision and computational efficiency. To verify its effectiveness, we apply the framework to iterative refinement for solving linear systems $Ax = b$. In this application, our approach dynamically chooses precisions based on calculated features from the system. In detail, a Q-table maps discretized features (e.g., approximate condition number and matrix norm)to actions (chosen precision configurations for specific steps), optimized via an epsilon-greedy strategy to maximize a multi-objective reward balancing accuracy and computational cost. Empirical results demonstrate effective precision selection, reducing computational cost while maintaining accuracy comparable to double-precision baselines. The framework generalizes to diverse out-of-sample data and offers insight into utilizing RL precision selection for other numerical algorithms, advancing mixed-precision numerical methods in scientific computing. To the best of our knowledge, this is the first work on precision autotuning with RL and verified on unseen datasets.

new Stochastic Actor-Critic: Mitigating Overestimation via Temporal Aleatoric Uncertainty

Authors: U\u{g}urcan \"Ozalp

Abstract: Off-policy actor-critic methods in reinforcement learning train a critic with temporal-difference updates and use it as a learning signal for the policy (actor). This design typically achieves higher sample efficiency than purely on-policy methods. However, critic networks tend to overestimate value estimates systematically. This is often addressed by introducing a pessimistic bias based on uncertainty estimates. Current methods employ ensembling to quantify the critic's epistemic uncertainty-uncertainty due to limited data and model ambiguity-to scale pessimistic updates. In this work, we propose a new algorithm called Stochastic Actor-Critic (STAC) that incorporates temporal (one-step) aleatoric uncertainty-uncertainty arising from stochastic transitions, rewards, and policy-induced variability in Bellman targets-to scale pessimistic bias in temporal-difference updates, rather than relying on epistemic uncertainty. STAC uses a single distributional critic network to model the temporal return uncertainty, and applies dropout to both the critic and actor networks for regularization. Our results show that pessimism based on a distributional critic alone suffices to mitigate overestimation, and naturally leads to risk-averse behavior in stochastic environments. Introducing dropout further improves training stability and performance by means of regularization. With this design, STAC achieves improved computational efficiency using a single distributional critic network.

new The Reasoning-Creativity Trade-off: Toward Creativity-Driven Problem Solving

Authors: Max Ruiz Luyten, Mihaela van der Schaar

Abstract: State-of-the-art large language model (LLM) pipelines rely on bootstrapped reasoning loops: sampling diverse chains of thought and reinforcing the highest-scoring ones, mainly optimizing correctness. We analyze how this design choice is sensitive to the collapse of the model's distribution over reasoning paths, slashing semantic entropy and undermining creative problem-solving. To analyze this failure, we introduce Distributional Creative Reasoning (DCR), a unified variational objective that casts training as gradient flow through probability measures on solution traces. STaR, GRPO, and DPO, as well as entropy bonuses, and other methods, all constitute special cases of the same loss. The framework delivers three core results: (i) the diversity decay theorem, describing how correctness-based objectives lead to distinct modes of diversity decay for STaR, GRPO, and DPO; (ii) designs that ensure convergence to a stable and diverse policy, effectively preventing collapse; and (iii) simple, actionable recipes to achieve this in practice. DCR thus offers the first principled recipe for LLMs that remain both correct and creative.

new A Machine Learning Framework for Off Ball Defensive Role and Performance Evaluation in Football

Authors: Sean Groom, Shuo Wang, Francisco Belo, Axl Rice, Liam Anderson

Abstract: Evaluating off-ball defensive performance in football is challenging, as traditional metrics do not capture the nuanced coordinated movements that limit opponent action selection and success probabilities. Although widely used possession value models excel at appraising on-ball actions, their application to defense remains limited. Existing counterfactual methods, such as ghosting models, help extend these analyses but often rely on simulating "average" behavior that lacks tactical context. To address this, we introduce a covariate-dependent Hidden Markov Model (CDHMM) tailored to corner kicks, a highly structured aspect of football games. Our label-free model infers time-resolved man-marking and zonal assignments directly from player tracking data. We leverage these assignments to propose a novel framework for defensive credit attribution and a role-conditioned ghosting method for counterfactual analysis of off-ball defensive performance. We show how these contributions provide a interpretable evaluation of defensive contributions against context-aware baselines.

new Memory Bank Compression for Continual Adaptation of Large Language Models

Authors: Thomas Katraouras, Dimitrios Rafailidis

Abstract: Large Language Models (LLMs) have become a mainstay for many everyday applications. However, as data evolve their knowledge quickly becomes outdated. Continual learning aims to update LLMs with new information without erasing previously acquired knowledge. Although methods such as full fine-tuning can incorporate new data, they are computationally expensive and prone to catastrophic forgetting, where prior knowledge is overwritten. Memory-augmented approaches address this by equipping LLMs with a memory bank, that is an external memory module which stores information for future use. However, these methods face a critical limitation, in particular, the memory bank constantly grows in the real-world scenario when large-scale data streams arrive. In this paper, we propose MBC, a model that compresses the memory bank through a codebook optimization strategy during online adaptation learning. To ensure stable learning, we also introduce an online resetting mechanism that prevents codebook collapse. In addition, we employ Key-Value Low-Rank Adaptation in the attention layers of the LLM, enabling efficient utilization of the compressed memory representations. Experiments with benchmark question-answering datasets demonstrate that MBC reduces the memory bank size to 0.3% when compared against the most competitive baseline, while maintaining high retention accuracy during online adaptation learning. Our code is publicly available at https://github.com/Thomkat/MBC.

URLs: https://github.com/Thomkat/MBC.

new Categorical Reparameterization with Denoising Diffusion models

Authors: Samson Gourevitch, Alain Durmus, Eric Moulines, Jimmy Olsson, Yazid Janati

Abstract: Gradient-based optimization with categorical variables typically relies on score-function estimators, which are unbiased but noisy, or on continuous relaxations that replace the discrete distribution with a smooth surrogate admitting a pathwise (reparameterized) gradient, at the cost of optimizing a biased, temperature-dependent objective. In this paper, we extend this family of relaxations by introducing a diffusion-based soft reparameterization for categorical distributions. For these distributions, the denoiser under a Gaussian noising process admits a closed form and can be computed efficiently, yielding a training-free diffusion sampler through which we can backpropagate. Our experiments show that the proposed reparameterization trick yields competitive or improved optimization performance on various benchmarks.

new FedHypeVAE: Federated Learning with Hypernetwork Generated Conditional VAEs for Differentially Private Embedding Sharing

Authors: Sunny Gupta, Amit Sethi

Abstract: Federated data sharing promises utility without centralizing raw data, yet existing embedding-level generators struggle under non-IID client heterogeneity and provide limited formal protection against gradient leakage. We propose FedHypeVAE, a differentially private, hypernetwork-driven framework for synthesizing embedding-level data across decentralized clients. Building on a conditional VAE backbone, we replace the single global decoder and fixed latent prior with client-aware decoders and class-conditional priors generated by a shared hypernetwork from private, trainable client codes. This bi-level design personalizes the generative layerrather than the downstream modelwhile decoupling local data from communicated parameters. The shared hypernetwork is optimized under differential privacy, ensuring that only noise-perturbed, clipped gradients are aggregated across clients. A local MMD alignment between real and synthetic embeddings and a Lipschitz regularizer on hypernetwork outputs further enhance stability and distributional coherence under non-IID conditions. After training, a neutral meta-code enables domain agnostic synthesis, while mixtures of meta-codes provide controllable multi-domain coverage. FedHypeVAE unifies personalization, privacy, and distribution alignment at the generator level, establishing a principled foundation for privacy-preserving data synthesis in federated settings. Code: github.com/sunnyinAI/FedHypeVAE

new Geometry of Reason: Spectral Signatures of Valid Mathematical Reasoning

Authors: Valentin No\"el

Abstract: We present a training-free method for detecting valid mathematical reasoning in large language models through spectral analysis of attention patterns. By treating attention matrices as adjacency matrices of dynamic graphs over tokens, we extract four interpretable spectral diagnostics, the Fiedler value (algebraic connectivity), high-frequency energy ratio (HFER), graph signal smoothness, and spectral entropy, that exhibit statistically significant differences between valid and invalid mathematical proofs. Experiments across seven transformer models from four independent architectural families (Meta Llama, Alibaba Qwen, Microsoft Phi, and Mistral AI) demonstrate that this spectral signature produces effect sizes up to Cohen's $d = 3.30$ ($p < 10^{-116}$), enabling 85.0--95.6\% classification accuracy under rigorous evaluation, with calibrated thresholds reaching 93--95\% on the full dataset. The method requires no training data, fine-tuning, or learned classifiers: a single threshold on a spectral metric suffices for high accuracy. Through systematic label correction, we discover that the spectral method detects logical coherence rather than compiler acceptance, identifying mathematically valid proofs that formal verifiers reject due to technical failures. We further identify an architectural dependency: Mistral-7B's Sliding Window Attention shifts the discriminative signal from HFER to late-layer Smoothness ($d = 2.09$, $p_{\text{MW}} = 1.16 \times 10^{-48}$), revealing that attention mechanism design affects which spectral features capture reasoning validity. These findings establish spectral graph analysis as a principled framework for reasoning verification with immediate applications to hallucination detection and AI safety monitoring.

cross Finetuning Large Language Models for Automated Depression Screening in Nigerian Pidgin English: GENSCORE Pilot Study

Authors: Isaac Iyinoluwa Olufadewa, Miracle Ayomikun Adesina, Ezekiel Ayodeji Oladejo, Uthman Babatunde Usman, Owen Kolade Adeniyi, Matthew Tolulope Olawoyin

Abstract: Depression is a major contributor to the mental-health burden in Nigeria, yet screening coverage remains limited due to low access to clinicians, stigma, and language barriers. Traditional tools like the Patient Health Questionnaire-9 (PHQ-9) were validated in high-income countries but may be linguistically or culturally inaccessible for low- and middle-income countries and communities such as Nigeria where people communicate in Nigerian Pidgin and more than 520 local languages. This study presents a novel approach to automated depression screening using fine-tuned large language models (LLMs) adapted for conversational Nigerian Pidgin. We collected a dataset of 432 Pidgin-language audio responses from Nigerian young adults aged 18-40 to prompts assessing psychological experiences aligned with PHQ-9 items, performed transcription, rigorous preprocessing and annotation, including semantic labeling, slang and idiom interpretation, and PHQ-9 severity scoring. Three LLMs - Phi-3-mini-4k-instruct, Gemma-3-4B-it, and GPT-4.1 - were fine-tuned on this annotated dataset, and their performance was evaluated quantitatively (accuracy, precision and semantic alignment) and qualitatively (clarity, relevance, and cultural appropriateness). GPT-4.1 achieved the highest quantitative performance, with 94.5% accuracy in PHQ-9 severity scoring prediction, outperforming Gemma-3-4B-it and Phi-3-mini-4k-instruct. Qualitatively, GPT-4.1 also produced the most culturally appropriate, clear, and contextually relevant responses. AI-mediated depression screening for underserved Nigerian communities. This work provides a foundation for deploying conversational mental-health tools in linguistically diverse, resource-constrained environments.

cross Neural Brain Fields: A NeRF-Inspired Approach for Generating Nonexistent EEG Electrodes

Authors: Shahar Ain Kedem, Itamar Zimerman, Eliya Nachmani

Abstract: Electroencephalography (EEG) data present unique modeling challenges because recordings vary in length, exhibit very low signal to noise ratios, differ significantly across participants, drift over time within sessions, and are rarely available in large and clean datasets. Consequently, developing deep learning methods that can effectively process EEG signals remains an open and important research problem. To tackle this problem, this work presents a new method inspired by Neural Radiance Fields (NeRF). In computer vision, NeRF techniques train a neural network to memorize the appearance of a 3D scene and then uses its learned parameters to render and edit the scene from any viewpoint. We draw an analogy between the discrete images captured from different viewpoints used to learn a continuous 3D scene in NeRF, and EEG electrodes positioned at different locations on the scalp, which are used to infer the underlying representation of continuous neural activity. Building on this connection, we show that a neural network can be trained on a single EEG sample in a NeRF style manner to produce a fixed size and informative weight vector that encodes the entire signal. Moreover, via this representation we can render the EEG signal at previously unseen time steps and spatial electrode positions. We demonstrate that this approach enables continuous visualization of brain activity at any desired resolution, including ultra high resolution, and reconstruction of raw EEG signals. Finally, our empirical analysis shows that this method can effectively simulate nonexistent electrodes data in EEG recordings, allowing the reconstructed signal to be fed into standard EEG processing networks to improve performance.

cross Modeling Day-Long ECG Signals to Predict Heart Failure Risk with Explainable AI

Authors: Eran Zvuloni, Ronit Almog, Michael Glikson, Shany Brimer Biton, Ilan Green, Izhar Laufer, Offer Amir, Joachim A. Behar

Abstract: Heart failure (HF) affects 11.8% of adults aged 65 and older, reducing quality of life and longevity. Preventing HF can reduce morbidity and mortality. We hypothesized that artificial intelligence (AI) applied to 24-hour single-lead electrocardiogram (ECG) data could predict the risk of HF within five years. To research this, the Technion-Leumit Holter ECG (TLHE) dataset, including 69,663 recordings from 47,729 patients, collected over 20 years was used. Our deep learning model, DeepHHF, trained on 24-hour ECG recordings, achieved an area under the receiver operating characteristic curve of 0.80 that outperformed a model using 30-second segments and a clinical score. High-risk individuals identified by DeepHHF had a two-fold chance of hospitalization or death incidents. Explainability analysis showed DeepHHF focused on arrhythmias and heart abnormalities, with key attention between 8 AM and 3 PM. This study highlights the feasibility of deep learning to model 24-hour continuous ECG data, capturing paroxysmal events and circadian variations essential for reliable risk prediction. Artificial intelligence applied to single-lead Holter ECG is non-invasive, inexpensive, and widely accessible, making it a promising tool for HF risk prediction.

cross Personalized Spiking Neural Networks with Ferroelectric Synapses for EEG Signal Processing

Authors: Nikhil Garg, Anxiong Song, Niklas Plessnig, Nathan Savoia, Laura B\'egon-Lours

Abstract: Electroencephalography (EEG)-based brain-computer interfaces (BCIs) are strongly affected by non-stationary neural signals that vary across sessions and individuals, limiting the generalization of subject-agnostic models and motivating adaptive and personalized learning on resource-constrained platforms. Programmable memristive hardware offers a promising substrate for such post-deployment adaptation; however, practical realization is challenged by limited weight resolution, device variability, nonlinear programming dynamics, and finite device endurance. In this work, we show that spiking neural networks (SNNs) can be deployed on ferroelectric memristive synaptic devices for adaptive EEG-based motor imagery decoding under realistic device constraints. We fabricate, characterize, and model ferroelectric synapses. We evaluate a convolutional-recurrent SNN architecture under two complementary deployment strategies: (i) device-aware training using a ferroelectric synapse model, and (ii) transfer of software-trained weights followed by low-overhead on-device re-tuning. To enable efficient adaptation, we introduce a device-aware weight-update strategy in which gradient-based updates are accumulated digitally and converted into discrete programming events only when a threshold is exceeded, emulating nonlinear, state-dependent programming dynamics while reducing programming frequency. Both deployment strategies achieve classification performance comparable to state-of-the-art software-based SNNs. Furthermore, subject-specific transfer learning achieved by retraining only the final network layers improves classification accuracy. These results demonstrate that programmable ferroelectric hardware can support robust, low-overhead adaptation in spiking neural networks, opening a practical path toward personalized neuromorphic processing of neural signals.

cross A multi-algorithm approach for operational human resources workload balancing in a last mile urban delivery system

Authors: Luis M. Moreno-Saavedra, Silvia Jimenez-Fernandez, Antonio Portilla-Figueras, David Casillas-Perez, Sancho Salcedo-Sanz

Abstract: Efficient workload assignment to the workforce is critical in last-mile package delivery systems. In this context, traditional methods of assigning package deliveries to workers based on geographical proximity can be inefficient and surely guide to an unbalanced workload distribution among delivery workers. In this paper, we look at the problem of operational human resources workload balancing in last-mile urban package delivery systems. The idea is to consider the effort workload to optimize the system, i.e., the optimization process is now focused on improving the delivery time, so that the workload balancing is complete among all the staff. This process should correct significant decompensations in workload among delivery workers in a given zone. Specifically, we propose a multi-algorithm approach to tackle this problem. The proposed approach takes as input a set of delivery points and a defined number of workers, and then assigns packages to workers, in such a way that it ensures that each worker completes a similar amount of work per day. The proposed algorithms use a combination of distance and workload considerations to optimize the allocation of packages to workers. In this sense, the distance between the delivery points and the location of each worker is also taken into account. The proposed multi-algorithm methodology includes different versions of k-means, evolutionary approaches, recursive assignments based on k-means initialization with different problem encodings, and a hybrid evolutionary ensemble algorithm. We have illustrated the performance of the proposed approach in a real-world problem in an urban last-mile package delivery workforce operating at Azuqueca de Henares, Spain.

cross Active learning for data-driven reduced models of parametric differential systems with Bayesian operator inference

Authors: Shane A. McQuarrie, Mengwu Guo, Anirban Chaudhuri

Abstract: This work develops an active learning framework to intelligently enrich data-driven reduced-order models (ROMs) of parametric dynamical systems, which can serve as the foundation of virtual assets in a digital twin. Data-driven ROMs are explainable, computationally efficient scientific machine learning models that aim to preserve the underlying physics of complex dynamical simulations. Since the quality of data-driven ROMs is sensitive to the quality of the limited training data, we seek to identify training parameters for which using the associated training data results in the best possible parametric ROM. Our approach uses the operator inference methodology, a regression-based strategy which can be tailored to particular parametric structure for a large class of problems. We establish a probabilistic version of parametric operator inference, casting the learning problem as a Bayesian linear regression. Prediction uncertainties stemming from the resulting probabilistic ROM solutions are used to design a sequential adaptive sampling scheme to select new training parameter vectors that promote ROM stability and accuracy globally in the parameter domain. We conduct numerical experiments for several nonlinear parametric systems of partial differential equations and compare the results to ROMs trained on random parameter samples. The results demonstrate that the proposed adaptive sampling strategy consistently yields more stable and accurate ROMs than random sampling does under the same computational budget.

cross Deep Learning Approach for the Diagnosis of Pediatric Pneumonia Using Chest X-ray Imaging

Authors: Fatemeh Hosseinabadi, Mohammad Mojtaba Rohani

Abstract: Pediatric pneumonia remains a leading cause of morbidity and mortality in children worldwide. Timely and accurate diagnosis is critical but often challenged by limited radiological expertise and the physiological and procedural complexity of pediatric imaging. This study investigates the performance of state-of-the-art convolutional neural network (CNN) architectures ResNetRS, RegNet, and EfficientNetV2 using transfer learning for the automated classification of pediatric chest Xray images as either pneumonia or normal.A curated subset of 1,000 chest X-ray images was extracted from a publicly available dataset originally comprising 5,856 pediatric images. All images were preprocessed and labeled for binary classification. Each model was fine-tuned using pretrained ImageNet weights and evaluated based on accuracy and sensitivity. RegNet achieved the highest classification performance with an accuracy of 92.4 and a sensitivity of 90.1, followed by ResNetRS (accuracy: 91.9, sensitivity: 89.3) and EfficientNetV2 (accuracy: 88.5, sensitivity: 88.1).

cross Large Empirical Case Study: Go-Explore adapted for AI Red Team Testing

Authors: Manish Bhatt, Adrian Wood, Idan Habler, Ammar Al-Kahfah

Abstract: Production LLM agents with tool-using capabilities require security testing despite their safety training. We adapt Go-Explore to evaluate GPT-4o-mini across 28 experimental runs spanning six research questions. We find that random-seed variance dominates algorithmic parameters, yielding an 8x spread in outcomes; single-seed comparisons are unreliable, while multi-seed averaging materially reduces variance in our setup. Reward shaping consistently harms performance, causing exploration collapse in 94% of runs or producing 18 false positives with zero verified attacks. In our environment, simple state signatures outperform complex ones. For comprehensive security testing, ensembles provide attack-type diversity, whereas single agents optimize coverage within a given attack type. Overall, these results suggest that seed variance and targeted domain knowledge can outweigh algorithmic sophistication when testing safety-trained models.

cross Group Cross-Correlations with Faintly Constrained Filters

Authors: Benedikt Fluhr

Abstract: We provide a notion of group cross-correlations, where the associated filter is not as tightly constrained as in the previous literature. This resolves an incompatibility previous constraints have for group actions with non-compact stabilizers. Moreover, we generalize previous results to group actions that are not necessarily transitive, and we weaken the common assumption of unimodularity.

cross Automated electrostatic characterization of quantum dot devices in single- and bilayer heterostructures

Authors: Merritt P. R. Losert, Dario Denora, Barnaby van Straaten, Michael Chan, Stefan D. Oosterhout, Lucas Stehouwer, Giordano Scappucci, Menno Veldhorst, Justyna P. Zwolak

Abstract: As quantum dot (QD)-based spin qubits advance toward larger, more complex device architectures, rapid, automated device characterization and data analysis tools become critical. The orientation and spacing of transition lines in a charge stability diagram (CSD) contain a fingerprint of a QD device's capacitive environment, making these measurements useful tools for device characterization. However, manually interpreting these features is time-consuming, error-prone, and impractical at scale. Here, we present an automated protocol for extracting underlying capacitive properties from CSDs. Our method integrates machine learning, image processing, and object detection to identify and track charge transitions across large datasets without manual labeling. We demonstrate this method using experimentally measured data from a strained-germanium single-quantum-well (planar) and a strained-germanium double-quantum-well (bilayer) QD device. Unlike for planar QD devices, CSDs in bilayer germanium heterostructure exhibit a larger set of transitions, including interlayer tunneling and distinct loading lines for the vertically stacked QDs, making them a powerful testbed for automation methods. By analyzing the properties of many CSDs, we can statistically estimate physically relevant quantities, like relative lever arms and capacitive couplings. Thus, our protocol enables rapid extraction of useful, nontrivial information about QD devices.

cross Cuffless, calibration-free hemodynamic monitoring with physics-informed machine learning models

Authors: Henry Crandall, Tyler Schuessler, Filip B\v{e}l\'ik, Albert Fabregas, Barry M. Stults, Alexandra Boyadzhiev, Huanan Zhang, Jim S. Wu, Aylin R. Rodan, Stephen P. Juraschek, Ramakrishna Mukkamala, Alfred K. Cheung, Stavros G. Drakos, Christel Hohenegger, Braxton Osting, Benjamin Sanchez

Abstract: Wearable technologies have the potential to transform ambulatory and at-home hemodynamic monitoring by providing continuous assessments of cardiovascular health metrics and guiding clinical management. However, existing cuffless wearable devices for blood pressure (BP) monitoring often rely on methods lacking theoretical foundations, such as pulse wave analysis or pulse arrival time, making them vulnerable to physiological and experimental confounders that undermine their accuracy and clinical utility. Here, we developed a smartwatch device with real-time electrical bioimpedance (BioZ) sensing for cuffless hemodynamic monitoring. We elucidate the biophysical relationship between BioZ and BP via a multiscale analytical and computational modeling framework, and identify physiological, anatomical, and experimental parameters that influence the pulsatile BioZ signal at the wrist. A signal-tagged physics-informed neural network incorporating fluid dynamics principles enables calibration-free estimation of BP and radial and axial blood velocity. We successfully tested our approach with healthy individuals at rest and after physical activity including physical and autonomic challenges, and with patients with hypertension and cardiovascular disease in outpatient and intensive care settings. Our findings demonstrate the feasibility of BioZ technology for cuffless BP and blood velocity monitoring, addressing critical limitations of existing cuffless technologies.

cross Reinforcement learning with timed constraints for robotics motion planning

Authors: Zhaoan Wang, Junchao Li, Mahdi Mohammad, Shaoping Xiao

Abstract: Robotic systems operating in dynamic and uncertain environments increasingly require planners that satisfy complex task sequences while adhering to strict temporal constraints. Metric Interval Temporal Logic (MITL) offers a formal and expressive framework for specifying such time-bounded requirements; however, integrating MITL with reinforcement learning (RL) remains challenging due to stochastic dynamics and partial observability. This paper presents a unified automata-based RL framework for synthesizing policies in both Markov Decision Processes (MDPs) and Partially Observable Markov Decision Processes (POMDPs) under MITL specifications. MITL formulas are translated into Timed Limit-Deterministic Generalized B\"uchi Automata (Timed-LDGBA) and synchronized with the underlying decision process to construct product timed models suitable for Q-learning. A simple yet expressive reward structure enforces temporal correctness while allowing additional performance objectives. The approach is validated in three simulation studies: a $5 \times 5$ grid-world formulated as an MDP, a $10 \times 10$ grid-world formulated as a POMDP, and an office-like service-robot scenario. Results demonstrate that the proposed framework consistently learns policies that satisfy strict time-bounded requirements under stochastic transitions, scales to larger state spaces, and remains effective in partially observable environments, highlighting its potential for reliable robotic planning in time-critical and uncertain settings.

cross It's Never Too Late: Noise Optimization for Collapse Recovery in Trained Diffusion Models

Authors: Anne Harrington, A. Sophia Koepke, Shyamgopal Karthik, Trevor Darrell, Alexei A. Efros

Abstract: Contemporary text-to-image models exhibit a surprising degree of mode collapse, as can be seen when sampling several images given the same text prompt. While previous work has attempted to address this issue by steering the model using guidance mechanisms, or by generating a large pool of candidates and refining them, in this work we take a different direction and aim for diversity in generations via noise optimization. Specifically, we show that a simple noise optimization objective can mitigate mode collapse while preserving the fidelity of the base model. We also analyze the frequency characteristics of the noise and show that alternative noise initializations with different frequency profiles can improve both optimization and search. Our experiments demonstrate that noise optimization yields superior results in terms of generation quality and variety.

cross Combining datasets with different ground truths using Low-Rank Adaptation to generalize image-based CNN models for photometric redshift prediction

Authors: Vikram Seenivasan (University of California, Los Angeles), Srinath Saikrishnan (University of California, Los Angeles), Andrew Lizarraga (University of California, Los Angeles), Jonathan Soriano (University of California, Los Angeles), Bernie Boscoe (Southern Oregon University), Tuan Do (University of California, Los Angeles)

Abstract: In this work, we demonstrate how Low-Rank Adaptation (LoRA) can be used to combine different galaxy imaging datasets to improve redshift estimation with CNN models for cosmology. LoRA is an established technique for large language models that adds adapter networks to adjust model weights and biases to efficiently fine-tune large base models without retraining. We train a base model using a photometric redshift ground truth dataset, which contains broad galaxy types but is less accurate. We then fine-tune using LoRA on a spectroscopic redshift ground truth dataset. These redshifts are more accurate but limited to bright galaxies and take orders of magnitude more time to obtain, so are less available for large surveys. Ideally, the combination of the two datasets would yield more accurate models that generalize well. The LoRA model performs better than a traditional transfer learning method, with $\sim2.5\times$ less bias and $\sim$2.2$\times$ less scatter. Retraining the model on a combined dataset yields a model that generalizes better than LoRA but at a cost of greater computation time. Our work shows that LoRA is useful for fine-tuning regression models in astrophysics by providing a middle ground between full retraining and no retraining. LoRA shows potential in allowing us to leverage existing pretrained astrophysical models, especially for data sparse tasks.

cross StockBot 2.0: Vanilla LSTMs Outperform Transformer-based Forecasting for Stock Prices

Authors: Shaswat Mohanty

Abstract: Accurate forecasting of financial markets remains a long-standing challenge due to complex temporal and often latent dependencies, non-linear dynamics, and high volatility. Building on our earlier recurrent neural network framework, we present an enhanced StockBot architecture that systematically evaluates modern attention-based, convolutional, and recurrent time-series forecasting models within a unified experimental setting. While attention-based and transformer-inspired models offer increased modeling flexibility, extensive empirical evaluation reveals that a carefully constructed vanilla LSTM consistently achieves superior predictive accuracy and more stable buy/sell decision-making when trained under a common set of default hyperparameters. These results highlight the robustness and data efficiency of recurrent sequence models for financial time-series forecasting, particularly in the absence of extensive hyperparameter tuning or the availability of sufficient data when discretized to single-day intervals. Additionally, these results underscore the importance of architectural inductive bias in data-limited market prediction tasks.

cross Detecting Unobserved Confounders: A Kernelized Regression Approach

Authors: Yikai Chen, Yunxin Mao, Chunyuan Zheng, Hao Zou, Shanzhi Gu, Shixuan Liu, Yang Shi, Wenjing Yang, Kun Kuang, Haotian Wang

Abstract: Detecting unobserved confounders is crucial for reliable causal inference in observational studies. Existing methods require either linearity assumptions or multiple heterogeneous environments, limiting applicability to nonlinear single-environment settings. To bridge this gap, we propose Kernel Regression Confounder Detection (KRCD), a novel method for detecting unobserved confounding in nonlinear observational data under single-environment conditions. KRCD leverages reproducing kernel Hilbert spaces to model complex dependencies. By comparing standard and higherorder kernel regressions, we derive a test statistic whose significant deviation from zero indicates unobserved confounding. Theoretically, we prove two key results: First, in infinite samples, regression coefficients coincide if and only if no unobserved confounders exist. Second, finite-sample differences converge to zero-mean Gaussian distributions with tractable variance. Extensive experiments on synthetic benchmarks and the Twins dataset demonstrate that KRCD not only outperforms existing baselines but also achieves superior computational efficiency.

cross Application Research of a Deep Learning Model Integrating CycleGAN and YOLO in PCB Infrared Defect Detection

Authors: Chao Yang, Haoyuan Zheng, Yue Ma

Abstract: This paper addresses the critical bottleneck of infrared (IR) data scarcity in Printed Circuit Board (PCB) defect detection by proposing a cross-modal data augmentation framework integrating CycleGAN and YOLOv8. Unlike conventional methods relying on paired supervision, we leverage CycleGAN to perform unpaired image-to-image translation, mapping abundant visible-light PCB images into the infrared domain. This generative process synthesizes high-fidelity pseudo-IR samples that preserve the structural semantics of defects while accurately simulating thermal distribution patterns. Subsequently, we construct a heterogeneous training strategy that fuses generated pseudo-IR data with limited real IR samples to train a lightweight YOLOv8 detector. Experimental results demonstrate that this method effectively enhances feature learning under low-data conditions. The augmented detector significantly outperforms models trained on limited real data alone and approaches the performance benchmarks of fully supervised training, proving the efficacy of pseudo-IR synthesis as a robust augmentation strategy for industrial inspection.

cross Neural Minimum Weight Perfect Matching for Quantum Error Codes

Authors: Yotam Peled, David Zenati, Eliya Nachmani

Abstract: Realizing the full potential of quantum computation requires Quantum Error Correction (QEC). QEC reduces error rates by encoding logical information across redundant physical qubits, enabling errors to be detected and corrected. A common decoder used for this task is Minimum Weight Perfect Matching (MWPM) a graph-based algorithm that relies on edge weights to identify the most likely error chains. In this work, we propose a data-driven decoder named Neural Minimum Weight Perfect Matching (NMWPM). Our decoder utilizes a hybrid architecture that integrates Graph Neural Networks (GNNs) to extract local syndrome features and Transformers to capture long-range global dependencies, which are then used to predict dynamic edge weights for the MWPM decoder. To facilitate training through the non-differentiable MWPM algorithm, we formulate a novel proxy loss function that enables end-to-end optimization. Our findings demonstrate significant performance reduction in the Logical Error Rate (LER) over standard baselines, highlighting the advantage of hybrid decoders that combine the predictive capabilities of neural networks with the algorithmic structure of classical matching.

cross Modern Neuromorphic AI: From Intra-Token to Inter-Token Processing

Authors: Osvaldo Simeone

Abstract: The rapid growth of artificial intelligence (AI) has brought novel data processing and generative capabilities but also escalating energy requirements. This challenge motivates renewed interest in neuromorphic computing principles, which promise brain-like efficiency through discrete and sparse activations, recurrent dynamics, and non-linear feedback. In fact, modern AI architectures increasingly embody neuromorphic principles through heavily quantized activations, state-space dynamics, and sparse attention mechanisms. This paper elaborates on the connections between neuromorphic models, state-space models, and transformer architectures through the lens of the distinction between intra-token processing and inter-token processing. Most early work on neuromorphic AI was based on spiking neural networks (SNNs) for intra-token processing, i.e., for transformations involving multiple channels, or features, of the same vector input, such as the pixels of an image. In contrast, more recent research has explored how neuromorphic principles can be leveraged to design efficient inter-token processing methods, which selectively combine different information elements depending on their contextual relevance. Implementing associative memorization mechanisms, these approaches leverage state-space dynamics or sparse self-attention. Along with a systematic presentation of modern neuromorphic AI models through the lens of intra-token and inter-token processing, training methodologies for neuromorphic AI models are also reviewed. These range from surrogate gradients leveraging parallel convolutional processing to local learning rules based on reinforcement learning mechanisms.

cross Rectifying Adversarial Examples Using Their Vulnerabilities

Authors: Fumiya Morimoto, Ryuto Morita, Satoshi Ono

Abstract: Deep neural network-based classifiers are prone to errors when processing adversarial examples (AEs). AEs are minimally perturbed input data undetectable to humans posing significant risks to security-dependent applications. Hence, extensive research has been undertaken to develop defense mechanisms that mitigate their threats. Most existing methods primarily focus on discriminating AEs based on the input sample features, emphasizing AE detection without addressing the correct sample categorization before an attack. While some tasks may only require mere rejection on detected AEs, others necessitate identifying the correct original input category such as traffic sign recognition in autonomous driving. The objective of this study is to propose a method for rectifying AEs to estimate the correct labels of their original inputs. Our method is based on re-attacking AEs to move them beyond the decision boundary for accurate label prediction, effectively addressing the issue of rectifying minimally perceptible AEs created using white-box attack methods. However, challenge remains with respect to effectively rectifying AEs produced by black-box attacks at a distance from the boundary, or those misclassified into low-confidence categories by targeted attacks. By adopting a straightforward approach of only considering AEs as inputs, the proposed method can address diverse attacks while avoiding the requirement of parameter adjustments or preliminary training. Results demonstrate that the proposed method exhibits consistent performance in rectifying AEs generated via various attack methods, including targeted and black-box attacks. Moreover, it outperforms conventional rectification and input transformation methods in terms of stability against various attacks.

cross Can Large Language Models Still Explain Themselves? Investigating the Impact of Quantization on Self-Explanations

Authors: Qianli Wang, Nils Feldhus, Pepa Atanasova, Fedor Splitt, Simon Ostermann, Sebastian M\"oller, Vera Schmitt

Abstract: Quantization is widely used to accelerate inference and streamline the deployment of large language models (LLMs), yet its effects on self-explanations (SEs) remain unexplored. SEs, generated by LLMs to justify their own outputs, require reasoning about the model's own decision-making process, a capability that may exhibit particular sensitivity to quantization. As SEs are increasingly relied upon for transparency in high-stakes applications, understanding whether and to what extent quantization degrades SE quality and faithfulness is critical. To address this gap, we examine two types of SEs: natural language explanations (NLEs) and counterfactual examples, generated by LLMs quantized using three common techniques at distinct bit widths. Our findings indicate that quantization typically leads to moderate declines in both SE quality (up to 4.4\%) and faithfulness (up to 2.38\%). The user study further demonstrates that quantization diminishes both the coherence and trustworthiness of SEs (up to 8.5\%). Compared to smaller models, larger models show limited resilience to quantization in terms of SE quality but better maintain faithfulness. Moreover, no quantization technique consistently excels across task accuracy, SE quality, and faithfulness. Given that quantization's impact varies by context, we recommend validating SE quality for specific use cases, especially for NLEs, which show greater sensitivity. Nonetheless, the relatively minor deterioration in SE quality and faithfulness does not undermine quantization's effectiveness as a model compression technique.

cross Solving nonlinear subsonic compressible flow in infinite domain via multi-stage neural networks

Authors: Xuehui Qian, Hongkai Tao, Yongji Wang

Abstract: In aerodynamics, accurately modeling subsonic compressible flow over airfoils is critical for aircraft design. However, solving the governing nonlinear perturbation velocity potential equation presents computational challenges. Traditional approaches often rely on linearized equations or finite, truncated domains, which introduce non-negligible errors and limit applicability in real-world scenarios. In this study, we propose a novel framework utilizing Physics-Informed Neural Networks (PINNs) to solve the full nonlinear compressible potential equation in an unbounded (infinite) domain. We address the unbounded-domain and convergence challenges inherent in standard PINNs by incorporating a coordinate transformation and embedding physical asymptotic constraints directly into the network architecture. Furthermore, we employ a Multi-Stage PINN (MS-PINN) approach to iteratively minimize residuals, achieving solution accuracy approaching machine precision. We validate this framework by simulating flow over circular and elliptical geometries, comparing our results against traditional finite-domain and linearized solutions. Our findings quantify the noticeable discrepancies introduced by domain truncation and linearization, particularly at higher Mach numbers, and demonstrate that this new framework is a robust, high-fidelity tool for computational fluid dynamics.

cross Deterministic Coreset for Lp Subspace

Authors: Rachit Chhaya, Anirban Dasgupta, Dan Feldman, Supratim Shit

Abstract: We introduce the first iterative algorithm for constructing a $\varepsilon$-coreset that guarantees deterministic $\ell_p$ subspace embedding for any $p \in [1,\infty)$ and any $\varepsilon > 0$. For a given full rank matrix $\mathbf{X} \in \mathbb{R}^{n \times d}$ where $n \gg d$, $\mathbf{X}' \in \mathbb{R}^{m \times d}$ is an $(\varepsilon,\ell_p)$-subspace embedding of $\mathbf{X}$, if for every $\mathbf{q} \in \mathbb{R}^d$, $(1-\varepsilon)\|\mathbf{Xq}\|_{p}^{p} \leq \|\mathbf{X'q}\|_{p}^{p} \leq (1+\varepsilon)\|\mathbf{Xq}\|_{p}^{p}$. Specifically, in this paper, $\mathbf{X}'$ is a weighted subset of rows of $\mathbf{X}$ which is commonly known in the literature as a coreset. In every iteration, the algorithm ensures that the loss on the maintained set is upper and lower bounded by the loss on the original dataset with appropriate scalings. So, unlike typical coreset guarantees, due to bounded loss, our coreset gives a deterministic guarantee for the $\ell_p$ subspace embedding. For an error parameter $\varepsilon$, our algorithm takes $O(\mathrm{poly}(n,d,\varepsilon^{-1}))$ time and returns a deterministic $\varepsilon$-coreset, for $\ell_p$ subspace embedding whose size is $O\left(\frac{d^{\max\{1,p/2\}}}{\varepsilon^{2}}\right)$. Here, we remove the $\log$ factors in the coreset size, which had been a long-standing open problem. Our coresets are optimal as they are tight with the lower bound. As an application, our coreset can also be used for approximately solving the $\ell_p$ regression problem in a deterministic manner.

cross BERT-JEPA: Reorganizing CLS Embeddings for Language-Invariant Semantics

Authors: Taj Gillin, Adam Lalani, Kenneth Zhang, Marcel Mateos Salles

Abstract: Joint Embedding Predictive Architectures (JEPA) are a novel self supervised training technique that have shown recent promise across domains. We introduce BERT-JEPA (BEPA), a training paradigm that adds a JEPA training objective to BERT-style models, working to combat a collapsed [CLS] embedding space and turning it into a language-agnostic space. This new structure leads to increased performance across multilingual benchmarks.

cross Engineering Attack Vectors and Detecting Anomalies in Additive Manufacturing

Authors: Md Mahbub Hasan, Marcus Sternhagen, Krishna Chandra Roy

Abstract: Additive manufacturing (AM) is rapidly integrating into critical sectors such as aerospace, automotive, and healthcare. However, this cyber-physical convergence introduces new attack surfaces, especially at the interface between computer-aided design (CAD) and machine execution layers. In this work, we investigate targeted cyberattacks on two widely used fused deposition modeling (FDM) systems, Creality's flagship model K1 Max, and Ender 3. Our threat model is a multi-layered Man-in-the-Middle (MitM) intrusion, where the adversary intercepts and manipulates G-code files during upload from the user interface to the printer firmware. The MitM intrusion chain enables several stealthy sabotage scenarios. These attacks remain undetectable by conventional slicer software or runtime interfaces, resulting in structurally defective yet externally plausible printed parts. To counter these stealthy threats, we propose an unsupervised Intrusion Detection System (IDS) that analyzes structured machine logs generated during live printing. Our defense mechanism uses a frozen Transformer-based encoder (a BERT variant) to extract semantic representations of system behavior, followed by a contrastively trained projection head that learns anomaly-sensitive embeddings. Later, a clustering-based approach and a self-attention autoencoder are used for classification. Experimental results demonstrate that our approach effectively distinguishes between benign and compromised executions.

cross NOS-Gate: Queue-Aware Streaming IDS for Consumer Gateways under Timing-Controlled Evasion

Authors: Muhammad Bilal, Omer Tariq, Hasan Ahmed

Abstract: Timing and burst patterns can leak through encryption, and an adaptive adversary can exploit them. This undermines metadata-only detection in a stand-alone consumer gateway. Therefore, consumer gateways need streaming intrusion detection on encrypted traffic using metadata only, under tight CPU and latency budgets. We present a streaming IDS for stand-alone gateways that instantiates a lightweight two-state unit derived from Network-Optimised Spiking (NOS) dynamics per flow, named NOS-Gate. NOS-Gate scores fixed-length windows of metadata features and, under a $K$-of-$M$ persistence rule, triggers a reversible mitigation that temporarily reduces the flow's weight under weighted fair queueing (WFQ). We evaluate NOS-Gate under timing-controlled evasion using an executable 'worlds' benchmark that specifies benign device processes, auditable attacker budgets, contention structure, and packet-level WFQ replay to quantify queue impact. All methods are calibrated label-free via burn-in quantile thresholding. Across multiple reproducible worlds and malicious episodes, at an achieved $0.1%$ false-positive operating point, NOS-Gate attains 0.952 incident recall versus 0.857 for the best baseline in these runs. Under gating, it reduces p99.9 queueing delay and p99.9 collateral delay with a mean scoring cost of ~ 2.09 {\mu}s per flow-window on CPU.

cross Revati: Transparent GPU-Free Time-Warp Emulation for LLM Serving

Authors: Amey Agrawal, Mayank Yadav, Sukrit Kumar, Anirudha Agrawal, Garv Ghai, Souradeep Bera, Elton Pinto, Sirish Gambhira, Mohammad Adain, Kasra Sohrab, Chus Antonanzas, Alexey Tumanov

Abstract: Deploying LLMs efficiently requires testing hundreds of serving configurations, but evaluating each one on a GPU cluster takes hours and costs thousands of dollars. Discrete-event simulators are faster and cheaper, but they require re-implementing the serving system's control logic -- a burden that compounds as frameworks evolve. We present Revati, a time-warp emulator that enables performance modeling by directly executing real serving system code at simulation-like speed. The system intercepts CUDA API calls to virtualize device management, allowing serving frameworks to run without physical GPUs. Instead of executing GPU kernels, it performs time jumps -- fast-forwarding virtual time by predicted kernel durations. We propose a coordination protocol that synchronizes these jumps across distributed processes while preserving causality. On vLLM and SGLang, Revati achieves less than 5% prediction error across multiple models and parallelism configurations, while running 5-17x faster than real GPU execution.

cross Secure, Verifiable, and Scalable Multi-Client Data Sharing via Consensus-Based Privacy-Preserving Data Distribution

Authors: Prajwal Panth, Sahaj Raj Malla

Abstract: We propose the Consensus-Based Privacy-Preserving Data Distribution (CPPDD) framework, a lightweight and post-setup autonomous protocol for secure multi-client data aggregation. The framework enforces unanimous-release confidentiality through a dual-layer protection mechanism that combines per-client affine masking with priority-driven sequential consensus locking. Decentralized integrity is verified via step (sigma_S) and data (sigma_D) checksums, facilitating autonomous malicious deviation detection and atomic abort without requiring persistent coordination. The design supports scalar, vector, and matrix payloads with O(N*D) computation and communication complexity, optional edge-server offloading, and resistance to collusion under N-1 corruptions. Formal analysis proves correctness, Consensus-Dependent Integrity and Fairness (CDIF) with overwhelming-probability abort on deviation, and IND-CPA security assuming a pseudorandom function family. Empirical evaluations on MNIST-derived vectors demonstrate linear scalability up to N = 500 with sub-millisecond per-client computation times. The framework achieves 100% malicious deviation detection, exact data recovery, and three-to-four orders of magnitude lower FLOPs compared to MPC and HE baselines. CPPDD enables atomic collaboration in secure voting, consortium federated learning, blockchain escrows, and geo-information capacity building, addressing critical gaps in scalability, trust minimization, and verifiable multi-party computation for regulated and resource-constrained environments.

cross RMAAT: Astrocyte-Inspired Memory Compression and Replay for Efficient Long-Context Transformers

Authors: Md Zesun Ahmed Mia, Malyaban Bal, Abhronil Sengupta

Abstract: The quadratic complexity of self-attention mechanism presents a significant impediment to applying Transformer models to long sequences. This work explores computational principles derived from astrocytes-glial cells critical for biological memory and synaptic modulation-as a complementary approach to conventional architectural modifications for efficient self-attention. We introduce the Recurrent Memory Augmented Astromorphic Transformer (RMAAT), an architecture integrating abstracted astrocyte functionalities. RMAAT employs a recurrent, segment-based processing strategy where persistent memory tokens propagate contextual information. An adaptive compression mechanism, governed by a novel retention factor derived from simulated astrocyte long-term plasticity (LTP), modulates these tokens. Attention within segments utilizes an efficient, linear-complexity mechanism inspired by astrocyte short-term plasticity (STP). Training is performed using Astrocytic Memory Replay Backpropagation (AMRB), a novel algorithm designed for memory efficiency in recurrent networks. Evaluations on the Long Range Arena (LRA) benchmark demonstrate RMAAT's competitive accuracy and substantial improvements in computational and memory efficiency, indicating the potential of incorporating astrocyte-inspired dynamics into scalable sequence models.

cross Noise-Aware Named Entity Recognition for Historical VET Documents

Authors: Alexander M. Esser, Jens D\"orpinghaus

Abstract: This paper addresses Named Entity Recognition (NER) in the domain of Vocational Education and Training (VET), focusing on historical, digitized documents that suffer from OCR-induced noise. We propose a robust NER approach leveraging Noise-Aware Training (NAT) with synthetically injected OCR errors, transfer learning, and multi-stage fine-tuning. Three complementary strategies, training on noisy, clean, and artificial data, are systematically compared. Our method is one of the first to recognize multiple entity types in VET documents. It is applied to German documents but transferable to arbitrary languages. Experimental results demonstrate that domain-specific and noise-aware fine-tuning substantially increases robustness and accuracy under noisy conditions. We provide publicly available code for reproducible noise-aware NER in domain-specific contexts.

cross Interpretable Machine Learning for Quantum-Informed Property Predictions in Artificial Sensing Materials

Authors: Li Chen, Leonardo Medrano Sandonas, Shirong Huang, Alexander Croy, Gianaurelio Cuniberti

Abstract: Digital sensing faces challenges in developing sustainable methods to extend the applicability of customized e-noses to complex body odor volatilome (BOV). To address this challenge, we developed MORE-ML, a computational framework that integrates quantum-mechanical (QM) property data of e-nose molecular building blocks with machine learning (ML) methods to predict sensing-relevant properties. Within this framework, we expanded our previous dataset, MORE-Q, to MORE-QX by sampling a larger conformational space of interactions between BOV molecules and mucin-derived receptors. This dataset provides extensive electronic binding features (BFs) computed upon BOV adsorption. Analysis of MORE-QX property space revealed weak correlations between QM properties of building blocks and resulting BFs. Leveraging this observation, we defined electronic descriptors of building blocks as inputs for tree-based ML models to predict BFs. Benchmarking showed CatBoost models outperform alternatives, especially in transferability to unseen compounds. Explainable AI methods further highlighted which QM properties most influence BF predictions. Collectively, MORE-ML combines QM insights with ML to provide mechanistic understanding and rational design principles for molecular receptors in BOV sensing. This approach establishes a foundation for advancing artificial sensing materials capable of analyzing complex odor mixtures, bridging the gap between molecular-level computations and practical e-nose applications.

cross Improving LLM-Assisted Secure Code Generation through Retrieval-Augmented-Generation and Multi-Tool Feedback

Authors: Vidyut Sriram, Sawan Pandita, Achintya Lakshmanan, Aneesh Shamraj, Suman Saha

Abstract: Large Language Models (LLMs) can generate code but often introduce security vulnerabilities, logical inconsistencies, and compilation errors. Prior work demonstrates that LLMs benefit substantially from structured feedback, static analysis, retrieval augmentation, and execution-based refinement. We propose a retrieval-augmented, multi-tool repair workflow in which a single code-generating LLM iteratively refines its outputs using compiler diagnostics, CodeQL security scanning, and KLEE symbolic execution. A lightweight embedding model is used for semantic retrieval of previously successful repairs, providing security-focused examples that guide generation. Evaluated on a combined dataset of 3,242 programs generated by DeepSeek-Coder-1.3B and CodeLlama-7B, the system demonstrates significant improvements in robustness. For DeepSeek, security vulnerabilities were reduced by 96%. For the larger CodeLlama model, the critical security defect rate was decreased from 58.55% to 22.19%, highlighting the efficacy of tool-assisted self-repair even on "stubborn" models.

cross Generative Conditional Missing Imputation Networks

Authors: George Sun, Yi-Hui Zhou

Abstract: In this study, we introduce a sophisticated generative conditional strategy designed to impute missing values within datasets, an area of considerable importance in statistical analysis. Specifically, we initially elucidate the theoretical underpinnings of the Generative Conditional Missing Imputation Networks (GCMI), demonstrating its robust properties in the context of the Missing Completely at Random (MCAR) and the Missing at Random (MAR) mechanisms. Subsequently, we enhance the robustness and accuracy of GCMI by integrating a multiple imputation framework using a chained equations approach. This innovation serves to bolster model stability and improve imputation performance significantly. Finally, through a series of meticulous simulations and empirical assessments utilizing benchmark datasets, we establish the superior efficacy of our proposed methods when juxtaposed with other leading imputation techniques currently available. This comprehensive evaluation not only underscores the practicality of GCMI but also affirms its potential as a leading-edge tool in the field of statistical data analysis.

cross AceFF: A State-of-the-Art Machine Learning Potential for Small Molecules

Authors: Stephen E. Farr, Stefan Doerr, Antonio Mirarchi, Francesc Sabanes Zariquiey, Gianni De Fabritiis

Abstract: We introduce AceFF, a pre-trained machine learning interatomic potential (MLIP) optimized for small molecule drug discovery. While MLIPs have emerged as efficient alternatives to Density Functional Theory (DFT), generalizability across diverse chemical spaces remains difficult. AceFF addresses this via a refined TensorNet2 architecture trained on a comprehensive dataset of drug-like compounds. This approach yields a force field that balances high-throughput inference speed with DFT-level accuracy. AceFF fully supports the essential medicinal chemistry elements (H, B, C, N, O, F, Si, P, S, Cl, Br, I) and is explicitly trained to handle charged states. Validation against rigorous benchmarks, including complex torsional energy scans, molecular dynamics trajectories, batched minimizations, and forces and anergy accuracy demonstrates that AceFF establishes a new state-of-the-art for organic molecules. The AceFF-2 model weights and inference code are available at https://huggingface.co/Acellera/AceFF-2.0.

URLs: https://huggingface.co/Acellera/AceFF-2.0.

cross HyperPriv-EPN: Hypergraph Learning with Privileged Knowledge for Ependymoma Prognosis

Authors: Shuren Gabriel Yu, Sikang Ren, Yongji Tian

Abstract: Preoperative prognosis of Ependymoma is critical for treatment planning but challenging due to the lack of semantic insights in MRI compared to post-operative surgical reports. Existing multimodal methods fail to leverage this privileged text data when it is unavailable during inference. To bridge this gap, we propose HyperPriv-EPN, a hypergraph-based Learning Using Privileged Information (LUPI) framework. We introduce a Severed Graph Strategy, utilizing a shared encoder to process both a Teacher graph (enriched with privileged post-surgery information) and a Student graph (restricted to pre-operation data). Through dual-stream distillation, the Student learns to hallucinate semantic community structures from visual features alone. Validated on a multi-center cohort of 311 patients, HyperPriv-EPN achieves state-of-the-art diagnostic accuracy and survival stratification. This effectively transfers expert knowledge to the preoperative setting, unlocking the value of historical post-operative data to guide the diagnosis of new patients without requiring text at inference.

cross Three factor delay learning rules for spiking neural networks

Authors: Luke Vassallo, Nima Taherinejad

Abstract: Spiking Neural Networks (SNNs) are dynamical systems that operate on spatiotemporal data, yet their learnable parameters are often limited to synaptic weights, contributing little to temporal pattern recognition. Learnable parameters that delay spike times can improve classification performance in temporal tasks, but existing methods rely on large networks and offline learning, making them unsuitable for real-time operation in resource-constrained environments. In this paper, we introduce synaptic and axonal delays to leaky integrate and fire (LIF)-based feedforward and recurrent SNNs, and propose three-factor learning rules to simultaneously learn delay parameters online. We employ a smooth Gaussian surrogate to approximate spike derivatives exclusively for the eligibility trace calculation, and together with a top-down error signal determine parameter updates. Our experiments show that incorporating delays improves accuracy by up to 20% over a weights-only baseline, and for networks with similar parameter counts, jointly learning weights and delays yields up to 14% higher accuracy. On the SHD speech recognition dataset, our method achieves similar accuracy to offline backpropagation-based approaches. Compared to state-of-the-art methods, it reduces model size by 6.6x and inference latency by 67%, with only a 2.4% drop in classification accuracy. Our findings benefit the design of power and area-constrained neuromorphic processors by enabling on-device learning and lowering memory requirements.

cross Sparse FEONet: A Low-Cost, Memory-Efficient Operator Network via Finite-Element Local Sparsity for Parametric PDEs

Authors: Seungchan Ko, Jiyeon Kim, Dongwook Shin

Abstract: In this paper, we study the finite element operator network (FEONet), an operator-learning method for parametric problems, originally introduced in J. Y. Lee, S. Ko, and Y. Hong, Finite Element Operator Network for Solving Elliptic-Type Parametric PDEs, SIAM J. Sci. Comput., 47(2), C501-C528, 2025. FEONet realizes the parameter-to-solution map on a finite element space and admits a training procedure that does not require training data, while exhibiting high accuracy and robustness across a broad class of problems. However, its computational cost increases and accuracy may deteriorate as the number of elements grows, posing notable challenges for large-scale problems. In this paper, we propose a new sparse network architecture motivated by the structure of the finite elements to address this issue. Throughout extensive numerical experiments, we show that the proposed sparse network achieves substantial improvements in computational cost and efficiency while maintaining comparable accuracy. We also establish theoretical results demonstrating that the sparse architecture can approximate the target operator effectively and provide a stability analysis ensuring reliable training and prediction.

cross QSLM: A Performance- and Memory-aware Quantization Framework with Tiered Search Strategy for Spike-driven Language Models

Authors: Rachmad Vidya Wicaksana Putra, Pasindu Wickramasinghe, Muhammad Shafique

Abstract: Large Language Models (LLMs) have been emerging as prominent AI models for solving many natural language tasks due to their high performance (e.g., accuracy) and capabilities in generating high-quality responses to the given inputs. However, their large computational cost, huge memory footprints, and high processing power/energy make it challenging for their embedded deployments. Amid several tinyLLMs, recent works have proposed spike-driven language models (SLMs) for significantly reducing the processing power/energy of LLMs. However, their memory footprints still remain too large for low-cost and resource-constrained embedded devices. Manual quantization approach may effectively compress SLM memory footprints, but it requires a huge design time and compute power to find the quantization setting for each network, hence making this approach not-scalable for handling different networks, performance requirements, and memory budgets. To bridge this gap, we propose QSLM, a novel framework that performs automated quantization for compressing pre-trained SLMs, while meeting the performance and memory constraints. To achieve this, QSLM first identifies the hierarchy of the given network architecture and the sensitivity of network layers under quantization, then employs a tiered quantization strategy (e.g., global-, block-, and module-level quantization) while leveraging a multi-objective performance-and-memory trade-off function to select the final quantization setting. Experimental results indicate that our QSLM reduces memory footprint by up to 86.5%, reduces power consumption by up to 20%, maintains high performance across different tasks (i.e., by up to 84.4% accuracy of sentiment classification on the SST-2 dataset and perplexity score of 23.2 for text generation on the WikiText-2 dataset) close to the original non-quantized model while meeting the performance and memory constraints.

cross Cost Optimization in Production Line Using Genetic Algorithm

Authors: Alireza Rezaee

Abstract: This paper presents a genetic algorithm (GA) approach to cost-optimal task scheduling in a production line. The system consists of a set of serial processing tasks, each with a given duration, unit execution cost, and precedence constraints, which must be assigned to an unlimited number of stations subject to a per-station duration bound. The objective is to minimize the total production cost, modeled as a station-wise function of task costs and the duration bound, while strictly satisfying all prerequisite and capacity constraints. Two chromosome encoding strategies are investigated: a station-based representation implemented using the JGAP library with SuperGene validity checks, and a task-based representation in which genes encode station assignments directly. For each encoding, standard GA operators (crossover, mutation, selection, and replacement) are adapted to preserve feasibility and drive the population toward lower-cost schedules. Experimental results on three classes of precedence structures-tightly coupled, loosely coupled, and uncoupled-demonstrate that the task-based encoding yields smoother convergence and more reliable cost minimization than the station-based encoding, particularly when the number of valid schedules is large. The study highlights the advantages of GA over gradient-based and analytical methods for combinatorial scheduling problems, especially in the presence of complex constraints and non-differentiable cost landscapes.

cross Two Deep Learning Approaches for Automated Segmentation of Left Ventricle in Cine Cardiac MRI

Authors: Wenhui Chu, Nikolaos V. Tsekos

Abstract: Left ventricle (LV) segmentation is critical for clinical quantification and diagnosis of cardiac images. In this work, we propose two novel deep learning architectures called LNU-Net and IBU-Net for left ventricle segmentation from short-axis cine MRI images. LNU-Net is derived from layer normalization (LN) U-Net architecture, while IBU-Net is derived from the instance-batch normalized (IB) U-Net for medical image segmentation. The architectures of LNU-Net and IBU-Net have a down-sampling path for feature extraction and an up-sampling path for precise localization. We use the original U-Net as the basic segmentation approach and compared it with our proposed architectures. Both LNU-Net and IBU-Net have left ventricle segmentation methods: LNU-Net applies layer normalization in each convolutional block, while IBU-Net incorporates instance and batch normalization together in the first convolutional block and passes its result to the next layer. Our method incorporates affine transformations and elastic deformations for image data processing. Our dataset that contains 805 MRI images regarding the left ventricle from 45 patients is used for evaluation. We experimentally evaluate the results of the proposed approaches outperforming the dice coefficient and the average perpendicular distance than other state-of-the-art approaches.

replace Density-Based Algorithms for Corruption-Robust Contextual Search and Convex Optimization

Authors: Renato Paes Leme, Chara Podimata, Jon Schneider

Abstract: We study the problem of contextual search, a generalization of binary search in higher dimensions, in the adversarial noise model. Let $d$ be the dimension of the problem, $T$ be the time horizon and $C$ be the total amount of adversarial noise in the system. We focus on the $\epsilon$-ball and the symmetric loss. For the $\epsilon$-ball loss, we give a tight regret bound of $O(C + d \log(1/\epsilon))$ improving over the $O(d^3 \log(1/\epsilon) \log^2(T) + C \log(T) \log(1/\epsilon))$ bound of Krishnamurthy et al (Operations Research '23). For the symmetric loss, we give an efficient algorithm with regret $O(C+d \log T)$. To tackle the symmetric loss case, we study the more general setting of Corruption-Robust Convex Optimization with Subgradient feedback, which is of independent interest. Our techniques are a significant departure from prior approaches. Specifically, we keep track of density functions over the candidate target vectors instead of a knowledge set consisting of the candidate target vectors consistent with the feedback obtained.

replace Distributed Sparse Linear Regression under Communication Constraints

Authors: Rodney Fonseca, Boaz Nadler

Abstract: In multiple domains, statistical tasks are performed in distributed settings, with data split among several end machines that are connected to a fusion center. In various applications, the end machines have limited bandwidth and power, and thus a tight communication budget. In this work we focus on distributed learning of a sparse linear regression model, under severe communication constraints. We propose several two round distributed schemes, whose communication per machine is sublinear in the data dimension. In our schemes, individual machines compute debiased lasso estimators, but send to the fusion center only very few values. On the theoretical front, we analyze one of these schemes and prove that with high probability it achieves exact support recovery at low signal to noise ratios, where individual machines fail to recover the support. We show in simulations that our scheme works as well as, and in some cases better, than more communication intensive approaches.

replace LeanQuant: Accurate and Scalable Large Language Model Quantization with Loss-error-aware Grid

Authors: Tianyi Zhang, Anshumali Shrivastava

Abstract: Large language models (LLMs) have shown immense potential across various domains, but their high memory requirements and inference costs remain critical challenges for deployment. Post-training quantization (PTQ) has emerged as a promising technique to reduce memory requirements and decoding latency. However, recent accurate quantization methods often depend on specialized computations or custom data formats to achieve better model quality, which limits their compatibility with popular frameworks, as they require dedicated inference kernels tailored to specific hardware and software platforms, hindering wider adoption. Furthermore, many competitive methods have high resource requirements and computational overhead for quantizing models, making it challenging to scale them to hundreds of billions of parameters. In response to these challenges, we propose LeanQuant (Loss-Error-Aware Network Quantization), a novel quantization method that is accurate, versatile, and scalable. In the existing popular iterative loss-error-based quantization framework, we identify a critical limitation in prior methods: the min-max affine quantization grid fails to preserve model quality due to outliers in inverse Hessian diagonals. To overcome this fundamental issue, we propose learning loss-error-aware grids, instead of using non-adaptive min-max affine grids. Our approach not only produces quantized models that are more accurate but also generalizes to a wider range of quantization types, including affine and non-uniform quantization, enhancing compatibility with more frameworks. Extensive experiments with recent LLMs demonstrate that LeanQuant is highly accurate, comparing favorably against competitive baselines in model quality, and scalable, achieving very accurate quantization of Llama-3.1 405B, one of the largest open-source LLMs to date, using two Quadro RTX 8000-48GB GPUs in 21 hours.

replace Sketch to Adapt: Fine-Tunable Sketches for Efficient LLM Adaptation

Authors: Tianyi Zhang, Junda Su, Aditya Desai, Oscar Wu, Zhaozhuo Xu, Anshumali Shrivastava

Abstract: Adapting pre-trained large language models (LLMs) is crucial but challenging due to their enormous size. Parameter-efficient fine-tuning (PEFT) techniques typically employ additive adapters applied to frozen model weights. To further reduce memory usage, model weights are often compressed through quantization. However, existing PEFT methods often yield suboptimal model quality because they rely on restrictive assumptions, such as low-rank constraints on adapters to limit the number of trainable parameters. We find that sketching, a popular data compression technique, can serve as an efficient LLM adaptation strategy while avoiding the low-rank assumption. We introduce SketchTune, a compressive adaptation strategy that compresses LLM weights into compact fine-tunable sketches, integrating compression and adaptation into a unified framework. This integration eliminates the need for complex two-path computation in existing PEFT techniques, enabling faster and more memory-efficient training and inference. SketchTune is supported by mathematical insights into matrix classes that are better approximated using sketching rather than low-rank methods. Our extensive evaluations with Llama and Mistral models demonstrate that SketchTune outperforms leading PEFT methods across diverse tasks while using substantially smaller base models and comparable trainable parameters. As a highlight, SketchTune outperforms LoRA, DoRA, and S2FT on commonsense and math benchmarks using 2.6-3.5$\times$ smaller base models and exceeds LoftQ in accuracy by 14.48% on GSM8K with 7.3$\times$ fewer trainable parameters. Our code is available at https://github.com/LeanModels/SketchTune.

URLs: https://github.com/LeanModels/SketchTune.

replace A New Flexible Train-Test Split Algorithm, an approach for choosing among the Hold-out, K-fold cross-validation, and Hold-out iteration

Authors: Zahra Bami, Ali Behnampour, Aniruddha Bora, Hassan Doosti

Abstract: Choosing an appropriate strategy for partitioning data into training and evaluation sets is a critical step in machine learning, yet validation methods are often selected using default or conventional settings without considering their impact on generalizability and real-world performance. Common approaches such as hold-out validation or k-fold cross-validation with fixed k values are frequently applied based solely on empirical practice. To address this issue, we propose a flexible Python-based framework that systematically examines how different validation strategies affect predictive performance across seven widely used machine learning algorithms, including Decision Trees, K-Nearest Neighbors, Naive Bayes variants, Logistic Regression, calibrated linear Support Vector Machines, and histogram-based gradient boosting. The framework evaluates these methods under a wide range of validation schemes, including hold-out splits from 10% to 90%, k-fold cross-validation with k between 3 and 15, repeated hold-out, and nested cross-validation. The framework is applied to three biomedical datasets of varying size, and performance is assessed using ROC-AUC, accuracy, and the Matthews correlation coefficient. The results show that no single validation strategy consistently outperforms others across all algorithms and datasets, indicating that optimal validation depends on the interaction between the algorithm, dataset characteristics, and evaluation metric.

replace It's complicated. The relationship of algorithmic fairness and non-discrimination provisions for high-risk systems in the EU AI Act

Authors: Kristof Meding

Abstract: What constitutes a fair decision? This question is not only difficult for humans but becomes more challenging when Artificial Intelligence (AI) models are used. In light of discriminatory algorithmic behaviors, the EU has recently passed the AI Act, which mandates specific rules for high-risk systems, incorporating both traditional legal non-discrimination regulations and machine learning based algorithmic fairness concepts. This paper aims to bridge these two different concepts in the AI Act through: First, a necessary high-level introduction of both concepts targeting legal and computer science-oriented scholars, and second, an in-depth analysis of the AI Act's relationship between legal non-discrimination regulations and algorithmic fairness. Our analysis reveals three key findings: (1.) Most non-discrimination regulations target only high-risk AI systems. (2.) The regulation of high-risk systems encompasses both data input requirements and output monitoring, though these regulations are partly inconsistent and raise questions of computational feasibility. (3.) Finally, we consider the possible (future) interaction of classical EU non-discrimination law and the AI Act regulations. We recommend developing more specific auditing and testing methodologies for AI systems. This paper aims to serve as a foundation for future interdisciplinary collaboration between legal scholars and computer science-oriented machine learning researchers studying discrimination in AI systems.

replace Brain network science modelling of sparse neural networks enables Transformers and LLMs to perform as fully connected

Authors: Yingtao Zhang, Diego Cerretti, Jialin Zhao, Wenjing Wu, Ziheng Liao, Umberto Michieli, Carlo Vittorio Cannistraci

Abstract: Dynamic sparse training (DST) can reduce the computational demands in ANNs, but faces difficulties in keeping peak performance at high sparsity levels. The Cannistraci-Hebb training (CHT) is a brain-inspired method for growing connectivity in DST. CHT leverages a gradient-free, topology-driven link regrowth, which has shown ultra-sparse (less than 1% connectivity) advantage across various tasks compared to fully connected networks. Yet, CHT suffers two main drawbacks: (i) its time complexity is $O(Nd^3)$ - N node network size, d node degree - restricting it to ultra-sparse regimes. (ii) it selects top link prediction scores, which is inappropriate for the early training epochs, when the network presents unreliable connections. Here, we design the first brain-inspired network model - termed bipartite receptive field (BRF) - to initialize the connectivity of sparse artificial neural networks. We further introduce a GPU-friendly matrix-based approximation of CH link prediction, reducing complexity to $O(N^3)$. We introduce the Cannistraci-Hebb training soft rule (CHTs), which adopts a flexible strategy for sampling connections in both link removal and regrowth, balancing the exploration and exploitation of network topology. Additionally, we integrate CHTs with a sigmoid gradual density decay (CHTss). Empirical results show that BRF offers performance advantages over previous network science models. Using 1% of connections, CHTs outperforms fully connected networks in MLP architectures on image classification tasks, compressing some networks to less than 30% of the nodes. Using 5% of the connections, CHTss outperforms fully connected networks in two Transformer-based machine translation tasks. Finally, at 30% connectivity, both CHTs and CHTss outperform other DST methods in language modeling task.

replace A Gaussian Process View on Observation Noise and Initialization in Wide Neural Networks

Authors: Sergio Calvo-Ordo\~nez, Jonathan Plenk, Richard Bergna, Alvaro Cartea, Jose Miguel Hernandez-Lobato, Konstantina Palla, Kamil Ciosek

Abstract: Performing gradient descent in a wide neural network is equivalent to computing the posterior mean of a Gaussian Process with the Neural Tangent Kernel (NTK-GP), for a specific prior mean and with zero observation noise. However, existing formulations have two limitations: (i) the NTK-GP assumes noiseless targets, leading to misspecification on noisy data; (ii) the equivalence does not extend to arbitrary prior means, which are essential for well-specified models. To address (i), we introduce a regularizer into the training objective, showing its correspondence to incorporating observation noise in the NTK-GP. To address (ii), we propose a \textit{shifted network} that enables arbitrary prior means and allows obtaining the posterior mean with gradient descent on a single network, without ensembling or kernel inversion. We validate our results with experiments across datasets and architectures, showing that this approach removes key obstacles to the practical use of NTK-GP equivalence in applied Gaussian process modeling.

replace The Curse of Depth in Large Language Models

Authors: Wenfang Sun, Xinyuan Song, Pengxiang Li, Lu Yin, Yefeng Zheng, Shiwei Liu

Abstract: In this paper, we introduce the Curse of Depth, a concept that highlights, explains, and addresses the recent observation in modern Large Language Models (LLMs) where nearly half of the layers are less effective than expected. We first confirm the wide existence of this phenomenon across the most popular families of LLMs such as Llama, Mistral, DeepSeek, and Qwen. Our analysis, theoretically and empirically, identifies that the underlying reason for the ineffectiveness of deep layers in LLMs is the widespread usage of Pre-Layer Normalization (Pre-LN). While Pre-LN stabilizes the training of Transformer LLMs, its output variance exponentially grows with the model depth, which undesirably causes the derivative of the deep Transformer blocks to be an identity matrix, and therefore barely contributes to the training. To resolve this training pitfall, we propose LayerNorm Scaling (LNS), which scales the variance of output of the layer normalization inversely by the square root of its depth. This simple modification mitigates the output variance explosion of deeper Transformer layers, improving their contribution. Across a wide range of model sizes (130M to 7B), our experiments show that LNS consistently outperforms previous normalization and scaling techniques in enhancing LLM pre-training performance. Moreover, this improvement seamlessly carries over to supervised fine-tuning. All these gains can be attributed to the fact that LayerNorm Scaling enables deeper layers to contribute more effectively during training. Our code is available at \href{https://github.com/lmsdss/LayerNorm-Scaling}{LayerNorm-Scaling}.

URLs: https://github.com/lmsdss/LayerNorm-Scaling

replace A Near-optimal, Scalable and Parallelizable Framework for Stochastic Bandits Robust to Adversarial Corruptions and Beyond

Authors: Zicheng Hu, Cheng Chen

Abstract: We investigate various stochastic bandit problems in the presence of adversarial corruptions. A seminal work for this problem is the BARBAR~\cite{gupta2019better} algorithm, which achieves both robustness and efficiency. However, it suffers from a regret of $O(KC)$, which does not match the lower bound of $\Omega(C)$, where $K$ denotes the number of arms and $C$ denotes the corruption level. In this paper, we first improve the BARBAR algorithm by proposing a novel framework called BARBAT, which eliminates the factor of $K$ to achieve an optimal regret bound up to a logarithmic factor. We also extend BARBAT to various settings, including multi-agent bandits, graph bandits, combinatorial semi-bandits and batched bandits. Compared with the Follow-the-Regularized-Leader framework, our methods are more amenable to parallelization, making them suitable for multi-agent and batched bandit settings, and they incur lower computational costs, particularly in semi-bandit problems. Numerical experiments verify the efficiency of the proposed methods.

replace Digital implementations of deep feature extractors are intrinsically informative

Authors: Max Getter

Abstract: Rapid information (energy) propagation in deep feature extractors is crucial to balance computational complexity versus expressiveness as a representation of the input. We prove an upper bound for the speed of energy propagation in a unified framework that covers different neural network models, both over Euclidean and non-Euclidean domains. Additional structural information about the signal domain can be used to explicitly determine or improve the rate of decay. To illustrate this, we show global exponential energy decay for a range of 1) feature extractors with discrete-domain input signals, and 2) convolutional neural networks (CNNs) via scattering over locally compact abelian (LCA) groups.

replace Tabby: A Language Model Architecture for Tabular and Structured Data Synthesis

Authors: Sonia Cromp, Satya Sai Srinath Namburi GNVV, Mohammed Alkhudhayri, Catherine Cao, Samuel Guo, Nicholas Roberts, Frederic Sala

Abstract: While advances in large language models (LLMs) have greatly improved the quality of synthetic text data in recent years, synthesizing tabular data has received relatively less attention. We address this disparity with Tabby, a simple but powerful post-training modification to the standard Transformer language model architecture, enabling its use for tabular dataset synthesis. Tabby enables the representation of differences across columns using Gated Mixture-of-Experts, with column-specific sets of parameters. Empirically, Tabby results in data quality near or equal to that of real data. By pairing our novel LLM table training technique, Plain, with Tabby, we observe up to a 44% improvement in quality over previous methods. We also show that Tabby extends beyond tables to more general structured data, reaching parity with real data on a nested JSON dataset as well.

replace From Continual Learning to SGD and Back: Better Rates for Continual Linear Models

Authors: Itay Evron, Ran Levinstein, Matan Schliserman, Uri Sherman, Tomer Koren, Daniel Soudry, Nathan Srebro

Abstract: We study the common continual learning setup where an overparameterized model is sequentially fitted to a set of jointly realizable tasks. We analyze forgetting, defined as the loss on previously seen tasks, after $k$ iterations. For continual linear models, we prove that fitting a task is equivalent to a single stochastic gradient descent (SGD) step on a modified objective. We develop novel last-iterate SGD upper bounds in the realizable least squares setup and leverage them to derive new results for continual learning. Focusing on random orderings over $T$ tasks, we establish universal forgetting rates, whereas existing rates depend on problem dimensionality or complexity and become prohibitive in highly overparameterized regimes. In continual regression with replacement, we improve the best existing rate from $O((d-\bar{r})/k)$ to $O(\min(1/\sqrt[4]{k}, \sqrt{(d-\bar{r})}/k, \sqrt{T\bar{r}}/k))$, where $d$ is the dimensionality and $\bar{r}$ the average task rank. Furthermore, we establish the first rate for random task orderings without replacement. The resulting rate $O(\min(1/\sqrt[4]{T},\, (d-\bar{r})/T))$ shows that randomization alone, without task repetition, prevents catastrophic forgetting in sufficiently long task sequences. Finally, we prove a matching $O(1/\sqrt[4]{k})$ forgetting rate for continual linear classification on separable data. Our universal rates extend to broader methods, such as block Kaczmarz and POCS, illuminating their loss convergence under i.i.d. and single-pass orderings.

replace 70% Size, 100% Accuracy: Lossless LLM Compression for Efficient GPU Inference via Dynamic-Length Float (DFloat11)

Authors: Tianyi Zhang, Mohsen Hariri, Shaochen Zhong, Vipin Chaudhary, Yang Sui, Xia Hu, Anshumali Shrivastava

Abstract: Large-scale AI models, such as Large Language Models (LLMs) and Diffusion Models (DMs), have grown rapidly in size, creating significant challenges for efficient deployment on resource-constrained hardware. In this paper, we introduce Dynamic-Length Float (DFloat11), a lossless compression framework that reduces LLM and DM size by 30% while preserving outputs that are bit-for-bit identical to the original model. DFloat11 is motivated by the low entropy in the BFloat16 weight representation of LLMs, which reveals significant inefficiency in the existing storage format. By applying entropy coding, DFloat11 assigns dynamic-length encodings to weights based on frequency, achieving near information-optimal compression without any loss of precision. To facilitate efficient inference with dynamic-length encodings, we develop a custom GPU kernel for fast online decompression. Our design incorporates the following: (i) compact, hierarchical lookup tables (LUTs) that fit within GPU SRAM for efficient decoding, (ii) a two-phase GPU kernel for coordinating thread read/write positions using lightweight auxiliary variables, and (iii) transformer-block-level decompression to minimize latency. Experiments on Llama 3.3, Qwen 3, Mistral 3, FLUX.1, and others validate our hypothesis that DFloat11 achieves around 30% model size reduction while preserving bit-for-bit identical outputs. Compared to a potential alternative of offloading parts of an uncompressed model to the CPU to meet memory constraints, DFloat11 achieves 2.3--46.2x higher throughput in token generation. With a fixed GPU memory budget, DFloat11 enables 5.7--14.9x longer generation lengths than uncompressed models. Notably, our method enables lossless inference of Llama 3.1 405B, an 810GB model, on a single node equipped with 8x80GB GPUs.

replace Reinforcement Learning from Human Feedback

Authors: Nathan Lambert

Abstract: Reinforcement learning from human feedback (RLHF) has become an important technical and storytelling tool to deploy the latest machine learning systems. In this book, we hope to give a gentle introduction to the core methods for people with some level of quantitative background. The book starts with the origins of RLHF -- both in recent literature and in a convergence of disparate fields of science in economics, philosophy, and optimal control. We then set the stage with definitions, problem formulation, data collection, and other common math used in the literature. The core of the book details every optimization stage in using RLHF, from starting with instruction tuning to training a reward model and finally all of rejection sampling, reinforcement learning, and direct alignment algorithms. The book concludes with advanced topics -- understudied research questions in synthetic data and evaluation -- and open questions for the field.

replace Streaming Sliced Optimal Transport

Authors: Khai Nguyen

Abstract: Sliced optimal transport (SOT), or sliced Wasserstein (SW) distance, is widely recognized for its statistical and computational scalability. In this work, we further enhance computational scalability by proposing the first method for estimating SW from sample streams, called \emph{streaming sliced Wasserstein} (Stream-SW). To define Stream-SW, we first introduce a streaming estimator of the one-dimensional Wasserstein distance (1DW). Since the 1DW has a closed-form expression, given by the absolute difference between the quantile functions of the compared distributions, we leverage quantile approximation techniques for sample streams to define a streaming 1DW estimator. By applying the streaming 1DW to all projections, we obtain Stream-SW. The key advantage of Stream-SW is its low memory complexity while providing theoretical guarantees on the approximation error. We demonstrate that Stream-SW achieves a more accurate approximation of SW than random subsampling, with lower memory consumption, when comparing Gaussian distributions and mixtures of Gaussians from streaming samples. Additionally, we conduct experiments on point cloud classification, point cloud gradient flows, and streaming change point detection to further highlight the favorable performance of the proposed Stream-SW

replace Flattening Hierarchies with Policy Bootstrapping

Authors: John L. Zhou, Jonathan C. Kao

Abstract: Offline goal-conditioned reinforcement learning (GCRL) is a promising approach for pretraining generalist policies on large datasets of reward-free trajectories, akin to the self-supervised objectives used to train foundation models for computer vision and natural language processing. However, scaling GCRL to longer horizons remains challenging due to the combination of sparse rewards and discounting, which obscures the comparative advantages of primitive actions with respect to distant goals. Hierarchical RL methods achieve strong empirical results on long-horizon goal-reaching tasks, but their reliance on modular, timescale-specific policies and subgoal generation introduces significant additional complexity and hinders scaling to high-dimensional goal spaces. In this work, we introduce an algorithm to train a flat (non-hierarchical) goal-conditioned policy by bootstrapping on subgoal-conditioned policies with advantage-weighted importance sampling. Our approach eliminates the need for a generative model over the (sub)goal space, which we find is key for scaling to high-dimensional control in large state spaces. We further show that existing hierarchical and bootstrapping-based approaches correspond to specific design choices within our derivation. Across a comprehensive suite of state- and pixel-based locomotion and manipulation benchmarks, our method matches or surpasses state-of-the-art offline GCRL algorithms and scales to complex, long-horizon tasks where prior approaches fail. Project page: https://johnlyzhou.github.io/saw/

URLs: https://johnlyzhou.github.io/saw/

replace Infinite-Width Limit of a Single Attention Layer: Analysis via Tensor Programs

Authors: Mana Sakai, Ryo Karakida, Masaaki Imaizumi

Abstract: In modern theoretical analyses of neural networks, the infinite-width limit is often invoked to justify Gaussian approximations of neuron preactivations (e.g., via neural network Gaussian processes or Tensor Programs). However, these Gaussian-based asymptotic theories have so far been unable to capture the behavior of attention layers, except under special regimes such as infinitely many heads or tailored scaling schemes. In this paper, leveraging the Tensor Programs framework, we rigorously identify the infinite-width limit distribution of variables within a single attention layer under realistic architectural dimensionality and standard $1/\sqrt{n}$-scaling with $n$ dimensionality. We derive the exact form of this limit law without resorting to infinite-head approximations or tailored scalings, demonstrating that it departs fundamentally from Gaussianity. This limiting distribution exhibits non-Gaussianity from a hierarchical structure, being Gaussian conditional on the random similarity scores. Numerical experiments validate our theoretical predictions, confirming the effectiveness of our theory at finite width and accurate description of finite-head attentions. Beyond characterizing a standalone attention layer, our findings lay the groundwork for developing a unified theory of deep Transformer architectures in the infinite-width regime.

replace Robust Molecular Property Prediction via Densifying Scarce Labeled Data

Authors: Jina Kim, Jeffrey Willette, Bruno Andreis, Sung Ju Hwang

Abstract: A widely recognized limitation of molecular prediction models is their reliance on structures observed in the training data, resulting in poor generalization to out-of-distribution compounds. Yet in drug discovery, the compounds most critical for advancing research often lie beyond the training set, making the bias toward the training data particularly problematic. This mismatch introduces substantial covariate shift, under which standard deep learning models produce unstable and inaccurate predictions. Furthermore, the scarcity of labeled data-stemming from the onerous and costly nature of experimental validation-further exacerbates the difficulty of achieving reliable generalization. To address these limitations, we propose a novel bilevel optimization approach that leverages unlabeled data to interpolate between in-distribution (ID) and out-of-distribution (OOD) data, enabling the model to learn how to generalize beyond the training distribution. We demonstrate significant performance gains on challenging real-world datasets with substantial covariate shift, supported by t-SNE visualizations highlighting our interpolation method.

replace Fair Domain Generalization: An Information-Theoretic View

Authors: Tangzheng Lian, Guanyu Hu, Dimitrios Kollias, Xinyu Yang, Oya Celiktutan

Abstract: Domain generalization (DG) and algorithmic fairness are two critical challenges in machine learning. However, most DG methods focus only on minimizing expected risk in the unseen target domain without considering algorithmic fairness. Conversely, fairness methods typically do not account for domain shifts, so the fairness achieved during training may not generalize to unseen test domains. In this work, we bridge these gaps by studying the problem of Fair Domain Generalization (FairDG), which aims to minimize both expected risk and fairness violations in unseen target domains. We derive novel mutual information-based upper bounds for expected risk and fairness violations in multi-class classification tasks with multi-group sensitive attributes. These bounds provide key insights for algorithm design from an information-theoretic perspective. Guided by these insights, we introduce PAFDG (Pareto-Optimal Fairness for Domain Generalization), a practical framework that solves the FairDG problem and models the utility-fairness trade-off through Pareto optimization. Experiments on real-world vision and language datasets show that PAFDG achieves superior utility-fairness trade-offs compared to existing methods.

replace Episodic Contextual Bandits with Knapsacks under Conversion Models

Authors: Wang Chi Cheung, Zitian Li

Abstract: We study an online setting, where a decision maker (DM) interacts with contextual bandit-with-knapsack (BwK) instances in repeated episodes. These episodes start with different resource amounts, and the contexts' probability distributions are non-stationary in an episode. All episodes share the same latent conversion model, which governs the random outcome contingent upon a request's context and an allocation decision. Our model captures applications such as dynamic pricing on perishable resources with episodic replenishment, and first price auctions in repeated episodes with different starting budgets. We design an online algorithm that achieves a regret sub-linear in $T$, the number of episodes, assuming access to a \emph{confidence bound oracle} that achieves an $o(T)$-regret. Such an oracle is readily available from existing contextual bandit literature. We overcome the technical challenge with arbitrarily many possible contexts, which leads to a reinforcement learning problem with an unbounded state space. Our framework provides improved regret bounds in certain settings when the DM is provided with unlabeled feature data, which is novel to the contextual BwK literature.

replace Simulation as Supervision: Mechanistic Pretraining for Scientific Discovery

Authors: Carson Dudley, Reiden Magdaleno, Christopher Harding, Marisa Eisenberg

Abstract: Scientific modeling faces a tradeoff between the interpretability of mechanistic theory and the predictive power of machine learning. While hybrid approaches like Physics-Informed Neural Networks (PINNs) embed domain knowledge as functional constraints, they can be brittle under model misspecification. We introduce Simulation-Grounded Neural Networks (SGNNs), a framework that instead embeds domain knowledge into the training data to establish a structural prior. By pretraining on synthetic corpora spanning diverse model structures and observational artifacts, SGNNs learn the broad patterns of physical possibility. This allows the model to internalize the underlying dynamics of a system without being forced to satisfy a single, potentially incorrect equation. We evaluated SGNNs across scientific disciplines and found that this approach confers significant robustness. In prediction tasks, SGNNs nearly tripled COVID-19 forecasting skill versus CDC baselines. In tests on dengue outbreaks, SGNNs outperformed physics-constrained models even when both were restricted to incorrect human-to-human transmission equations, demonstrating that SGNNs are potentially more robust to model misspecification. For inference, SGNNs extend the logic of simulation-based inference to enable supervised learning for unobservable targets, estimating early COVID-19 transmissibility more accurately than traditional methods. Finally, SGNNs enable back-to-simulation attribution, a form of mechanistic interpretability that maps real-world data back to the simulated manifold to identify underlying processes. By unifying these disparate simulation-based techniques into a single framework, we demonstrate that mechanistic simulations can serve as effective training data for robust scientific inference that generalizes beyond the limitations of fixed functional forms.

replace Weighted Conditional Flow Matching

Authors: Sergio Calvo-Ordonez, Matthieu Meunier, Alvaro Cartea, Christoph Reisinger, Yarin Gal, Jose Miguel Hernandez-Lobato

Abstract: Conditional flow matching (CFM) has emerged as a powerful framework for training continuous normalizing flows due to its computational efficiency and effectiveness. However, standard CFM often produces paths that deviate significantly from straight-line interpolations between prior and target distributions, making generation slower and less accurate due to the need for fine discretization at inference. Recent methods enhance CFM performance by inducing shorter and straighter trajectories but typically rely on computationally expensive mini-batch optimal transport (OT). Drawing insights from entropic optimal transport (EOT), we propose Weighted Conditional Flow Matching (W-CFM), a novel approach that modifies the classical CFM loss by weighting each training pair $(x, y)$ with a Gibbs kernel. We show that this weighting recovers the entropic OT coupling up to some bias in the marginals, and we provide the conditions under which the marginals remain nearly unchanged. Moreover, we establish an equivalence between W-CFM and the minibatch OT method in the large-batch limit, showing how our method overcomes computational and performance bottlenecks linked to batch size. Empirically, we test our method on unconditional generation on various synthetic and real datasets, confirming that W-CFM achieves comparable or superior sample quality, fidelity, and diversity to other alternative baselines while maintaining the computational efficiency of vanilla CFM.

replace uGMM-NN: Univariate Gaussian Mixture Model Neural Network

Authors: Zakeria Sharif Ali

Abstract: This paper introduces the Univariate Gaussian Mixture Model Neural Network (uGMM-NN), a novel neural architecture that embeds probabilistic reasoning directly into the computational units of deep networks. Unlike traditional neurons, which apply weighted sums followed by fixed non-linearities, each uGMM-NN node parameterizes its activations as a univariate Gaussian mixture, with learnable means, variances, and mixing coefficients. This design enables richer representations by capturing multimodality and uncertainty at the level of individual neurons, while retaining the scalability of standard feed-forward networks. We demonstrate that uGMM-NN can achieve competitive discriminative performance compared to conventional multilayer perceptrons, while additionally offering a probabilistic interpretation of activations. The proposed framework provides a foundation for integrating uncertainty-aware components into modern neural architectures, opening new directions for both discriminative and generative modeling.

replace Homogenization with Guaranteed Bounds via Primal-Dual Physically Informed Neural Networks

Authors: Liya Gaynutdinova, Martin Do\v{s}k\'a\v{r}, Ond\v{r}ej Roko\v{s}, Ivana Pultarov\'a

Abstract: Physics-informed neural networks (PINNs) have shown promise in solving partial differential equations (PDEs) relevant to multiscale modeling, but they often fail when applied to materials with discontinuous coefficients, such as media with piecewise constant properties. This paper introduces a dual formulation for the PINN framework to improve the reliability of the homogenization of periodic thermo-conductive composites, for both strong and variational (weak) formulations. The dual approach facilitates the derivation of guaranteed upper and lower error bounds, enabling more robust detection of PINN failure. We compare standard PINNs applied to smoothed material approximations with variational PINNs (VPINNs) using both spectral and neural network-based test functions. Our results indicate that while strong-form PINNs may outperform VPINNs in controlled settings, they are sensitive to material discontinuities and may fail without clear diagnostics. In contrast, VPINNs accommodate piecewise constant material parameters directly but require careful selection of test functions to avoid instability. Dual formulation serves as a reliable indicator of convergence quality, and its integration into PINN frameworks enhances their applicability to homogenization problems in micromechanics.

replace From Autoencoders to CycleGAN: Robust Unpaired Face Manipulation via Adversarial Learning

Authors: Collin Guo, Yi Qian

Abstract: Human face synthesis and manipulation are increasingly important in entertainment and AI, with a growing demand for highly realistic, identity-preserving images even when only unpaired, unaligned datasets are available. We study unpaired face manipulation via adversarial learning, moving from autoencoder baselines to a robust, guided CycleGAN framework. While autoencoders capture coarse identity, they often miss fine details. Our approach integrates spectral normalization for stable training, identity- and perceptual-guided losses to preserve subject identity and high-level structure, and landmark-weighted cycle constraints to maintain facial geometry across pose and illumination changes. Experiments show that our adversarial trained CycleGAN improves realism (FID), perceptual quality (LPIPS), and identity preservation (ID-Sim) over autoencoders, with competitive cycle-reconstruction SSIM and practical inference times, which achieved high quality without paired datasets and approaching pix2pix on curated paired subsets. These results demonstrate that guided, spectrally normalized CycleGANs provide a practical path from autoencoders to robust unpaired face manipulation.

replace Personalized Federated Heat-Kernel Enhanced Multi-View Clustering via Advanced Tensor Decomposition Techniques

Authors: Kristina P. Sinaga

Abstract: This paper introduces mathematical frameworks that address the challenges of multi-view clustering in federated learning environments. The objective is to integrate optimization techniques based on new objective functions employing heat-kernel coefficients to replace conventional distance metrics with quantum-inspired measures. The proposed frameworks utilize advanced tensor decomposition methods, specifically, PARAFAC2 and Tucker decomposition to efficiently represent high-dimensional, multi-view data while preserving inter-view relationships. The research has yielded the development of four novel algorithms, an efficient federated kernel multi-view clustering (E-FKMVC) model, FedHK-PARAFAC2, FedHK-Tucker, and FedHK-MVC-Person with PARAFAC2 Decomposition (Personalized FedHK-PARAFAC2). The primary objective of these algorithms is to enhance the efficacy of clustering processes while ensuring the confidentiality and efficient communication in federated learning environments. Theoretical analyses of convergence guarantees, privacy bounds, and complexity are provided to validate the effectiveness of the proposed methods. In essence, this paper makes a significant academic contribution to the field of federated multi-view clustering through its innovative integration of mathematical modeling and algorithm design. This approach addresses the critical challenges of data heterogeneity and privacy concerns, paving the way for enhanced data management and analytics in various contexts.

replace KANO: Kolmogorov-Arnold Neural Operator

Authors: Jin Lee, Ziming Liu, Xinling Yu, Yixuan Wang, Haewon Jeong, Murphy Yuezhen Niu, Zheng Zhang

Abstract: We introduce Kolmogorov--Arnold Neural Operator (KANO), a dual-domain neural operator jointly parameterized by both spectral and spatial bases with intrinsic symbolic interpretability. We theoretically demonstrate that KANO overcomes the pure-spectral bottleneck of Fourier Neural Operator (FNO): KANO remains expressive over generic position-dependent dynamics (variable coefficient PDEs) for any physical input, whereas FNO stays practical only for spectrally sparse operators and strictly imposes a fast-decaying input Fourier tail. We verify our claims empirically on position-dependent differential operators, for which KANO robustly generalizes but FNO fails to. In the quantum Hamiltonian learning benchmark, KANO reconstructs ground-truth Hamiltonians in closed-form symbolic representations accurate to the fourth decimal place in coefficients and attains $\approx 6\times10^{-6}$ state infidelity from projective measurement data, substantially outperforming that of the FNO trained with ideal full wave function data, $\approx 1.5\times10^{-2}$, by orders of magnitude.

replace PTQTP: Post-Training Quantization to Trit-Planes for Large Language Models

Authors: He Xiao, Runming Yang, Qingyao Yang, Wendong Xu, Zhen Li, Yupeng Su, Zhengwu Liu, Hongxia Yang, Ngai Wong

Abstract: Post-training quantization (PTQ) of large language models (LLMs) to extremely low bit-widths remains challenging due to the fundamental trade-off between computational efficiency and representational capacity. While existing ultra-low-bit methods rely on binary approximations or quantization-aware training(QAT), they often suffer from either limited representational capacity or huge training resource overhead. We introduce PTQ to Trit-Planes (PTQTP), a structured PTQ framework that decomposes weight matrices into dual ternary {-1, 0, 1} trit-planes. This approach achieves multiplication-free additive inference by decoupling weights into discrete topology (trit-planes) and continuous magnitude (scales), effectively enabling high-fidelity sparse approximation. PTQTP provides: (1) a theoretically grounded progressive approximation algorithm ensuring global weight consistency; (2) model-agnostic deployment without architectural modifications; and (3) uniform ternary operations that eliminate mixed-precision overhead. Comprehensive experiments on LLaMA3.x and Qwen3 (0.6B-70B) demonstrate that PTQTP significantly outperforms sub-4bit PTQ methods on both language reasoning tasks and mathematical reasoning as well as coding. PTQTP rivals the 1.58-bit QAT performance while requiring only single-hour quantization compared to 10-14 GPU days for training-based methods, and the end-to-end inference speed achieves 4.63$\times$ faster than the FP16 baseline model, establishing a new and practical solution for efficient LLM deployment in resource-constrained environments. Code will available at https://github.com/HeXiao-55/PTQTP.

URLs: https://github.com/HeXiao-55/PTQTP.

replace Machine Learnability as a Measure of Order in Aperiodic Sequences

Authors: Jennifer Dodgson, Michael Joedhitya, Adith Ramdas, Surender Suresh Kumar, Adarsh Singh Chauhan, Akira Rafhael, Wang Mingshu, Nordine Lotfi

Abstract: Research on the distribution of prime numbers has revealed a dual character: deterministic in definition yet exhibiting statistical behavior reminiscent of random processes. In this paper we show that it is possible to use an image-focused machine learning model to measure the comparative regularity of prime number fields at specific regions of an Ulam spiral. Specifically, we demonstrate that in pure accuracy terms, models trained on blocks extracted from regions of the spiral in the vicinity of 500m outperform models trained on blocks extracted from the region representing integers lower than 25m. This implies existence of more easily learnable order in the former region than in the latter. Moreover, a detailed breakdown of precision and recall scores seem to imply that the model is favouring a different approach to classification in different regions of the spiral, focusing more on identifying prime patterns at lower numbers and more on eliminating composites at higher numbers. This aligns with number theory conjectures suggesting that at higher orders of magnitude we should see diminishing noise in prime number distributions, with averages (density, AP equidistribution) coming to dominate, while local randomness regularises after scaling by log x. Taken together, these findings point toward an interesting possibility: that machine learning can serve as a new experimental instrument for number theory. Notably, the method shows potential 1 for investigating the patterns in strong and weak primes for cryptographic purposes.

replace Collaborative Device-Cloud LLM Inference through Reinforcement Learning

Authors: Wenzhi Fang, Dong-Jun Han, Liangqi Yuan, Christopher Brinton

Abstract: Device-cloud collaboration has emerged as a promising paradigm for deploying large language models (LLMs), combining the efficiency of lightweight on-device inference with the superior performance of powerful cloud LLMs. An essential problem in this scenario lies in deciding whether a given query is best handled locally or delegated to the cloud. Existing approaches typically rely on external routers, implemented as binary classifiers, which often struggle to determine task difficulty from the prompt's surface pattern. To address these limitations, we propose a framework where the on-device LLM makes routing decisions at the end of its solving process, with this capability instilled through post-training. In particular, we formulate a reward maximization problem with carefully designed rewards that encourage effective problem solving and judicious offloading to the cloud. To solve this problem, we develop a group-adaptive policy gradient algorithm, featuring a group-level policy gradient, designed to yield an unbiased gradient estimator of the reward, and adaptive prompt filtering, developed to enforce the constraint on cloud LLM usage. Extensive experiments across models and benchmarks show that the proposed methodology consistently outperforms existing baselines and significantly narrows the gap to full cloud LLM performance.

replace Clustering by Denoising: Latent plug-and-play diffusion for single-cell data

Authors: Dominik Meier, Shixing Yu, Sagnik Nandy, Promit Ghosal, Kyra Gan

Abstract: Single-cell RNA sequencing (scRNA-seq) enables the study of cellular heterogeneity. Yet, clustering accuracy, and with it downstream analyses based on cell labels, remain challenging due to measurement noise and biological variability. In standard latent spaces (e.g., obtained through PCA), data from different cell types can be projected close together, making accurate clustering difficult. We introduce a latent plug-and-play diffusion framework that separates the observation and denoising space. This separation is operationalized through a novel Gibbs sampling procedure: the learned diffusion prior is applied in a low-dimensional latent space to perform denoising, while to steer this process, noise is reintroduced into the original high-dimensional observation space. This unique "input-space steering" ensures the denoising trajectory remains faithful to the original data structure. Our approach offers three key advantages: (1) adaptive noise handling via a tunable balance between prior and observed data; (2) uncertainty quantification through principled uncertainty estimates for downstream analysis; and (3) generalizable denoising by leveraging clean reference data to denoise noisier datasets, and via averaging, improve quality beyond the training set. We evaluate robustness on both synthetic and real single-cell genomics data. Our method improves clustering accuracy on synthetic data across varied noise levels and dataset shifts. On real-world single-cell data, our method demonstrates improved biological coherence in the resulting cell clusters, with cluster boundaries that better align with known cell type markers and developmental trajectories.

replace Scaling Patterns in Adversarial Alignment: Evidence from Multi-LLM Jailbreak Experiments

Authors: Samuel Nathanson, Rebecca Williams, Cynthia Matuszek

Abstract: Large language models (LLMs) increasingly operate in multi-agent and safety-critical settings, raising open questions about how their vulnerabilities scale when models interact adversarially. This study examines whether larger models can systematically jailbreak smaller ones - eliciting harmful or restricted behavior despite alignment safeguards. Using standardized adversarial tasks from JailbreakBench, we simulate over 6,000 multi-turn attacker-target exchanges across major LLM families and scales (0.6B-120B parameters), measuring both harm score and refusal behavior as indicators of adversarial potency and alignment integrity. Each interaction is evaluated through aggregated harm and refusal scores assigned by three independent LLM judges, providing a consistent, model-based measure of adversarial outcomes. Aggregating results across prompts, we find a strong and statistically significant correlation between mean harm and the logarithm of the attacker-to-target size ratio (Pearson r = 0.51, p < 0.001; Spearman rho = 0.52, p < 0.001), indicating that relative model size correlates with the likelihood and severity of harmful completions. Mean harm score variance is higher across attackers (0.18) than across targets (0.10), suggesting that attacker-side behavioral diversity contributes more to adversarial outcomes than target susceptibility. Attacker refusal frequency is strongly and negatively correlated with harm (rho = -0.93, p < 0.001), showing that attacker-side alignment mitigates harmful responses. These findings reveal that size asymmetry influences robustness and provide exploratory evidence for adversarial scaling patterns, motivating more controlled investigations into inter-model alignment and safety.

replace Feature-Modulated UFNO for Improved Prediction of Multiphase Flow in Porous Media

Authors: Alhasan Abdellatif, Hannah P. Menke, Florian Doster, Kamaljit Singh, Ahmed H. Elsheikh

Abstract: The UNet-enhanced Fourier Neural Operator (UFNO) extends the Fourier Neural Operator (FNO) by incorporating a parallel UNet pathway, enabling the retention of both high- and low-frequency components. While UFNO improves predictive accuracy over FNO, it inefficiently treats scalar inputs (e.g., temperature, injection rate) as spatially distributed fields by duplicating their values across the domain. This forces the model to process redundant constant signals within the frequency domain. Additionally, its standard loss function does not account for spatial variations in error sensitivity, limiting performance in regions of high physical importance. We introduce UFNO-FiLM, an enhanced architecture that incorporates two key innovations. First, we decouple scalar inputs from spatial features using a Feature-wise Linear Modulation (FiLM) layer, allowing the model to modulate spatial feature maps without introducing constant signals into the Fourier transform. Second, we employ a spatially weighted loss function that prioritizes learning in critical regions. Our experiments on subsurface multiphase flow demonstrate a 21\% reduction in gas saturation Mean Absolute Error (MAE) compared to UFNO, highlighting the effectiveness of our approach in improving predictive accuracy.

replace Fusion of Multiscale Features Via Centralized Sparse-attention Network for EEG Decoding

Authors: Xiangrui Cai, Shaocheng Ma, Lei Cao, Jie Li, Tianyu Liu, Yilin Dong

Abstract: Electroencephalography (EEG) signal decoding is a key technology that translates brain activity into executable commands, laying the foundation for direct brain-machine interfacing and intelligent interaction. To address the inherent spatiotemporal heterogeneity of EEG signals, this paper proposes a multi-branch parallel architecture, where each temporal scale is equipped with an independent spatial feature extraction module. To further enhance multi-branch feature fusion, we propose a Fusion of Multiscale Features via Centralized Sparse-attention Network (EEG-CSANet), a centralized sparse-attention network. It employs a main-auxiliary branch architecture, where the main branch models core spatiotemporal patterns via multiscale self-attention, and the auxiliary branch facilitates efficient local interactions through sparse cross-attention. Experimental results show that EEG-CSANet achieves state-of-the-art (SOTA) performance across five public datasets (BCIC-IV-2A, BCIC-IV-2B, HGD, SEED, and SEED-VIG), with accuracies of 88.54%, 91.09%, 97.15%, 96.03%, and 90.56%, respectively. Such performance demonstrates its strong adaptability and robustness across various EEG decoding tasks. Moreover, extensive ablation studies are conducted to enhance the interpretability of EEG-CSANet. In the future, we hope that EEG-CSANet could serve as a promising baseline model in the field of EEG signal decoding. The source code is publicly available at: https://github.com/Xiangrui-Cai/EEG-CSANet

URLs: https://github.com/Xiangrui-Cai/EEG-CSANet

replace Decomposing Uncertainty in Probabilistic Knowledge Graph Embeddings: Why Entity Variance Is Not Enough

Authors: Chorok Lee

Abstract: Probabilistic knowledge graph embeddings represent entities as distributions, using learned variances to quantify epistemic uncertainty. We identify a fundamental limitation: these variances are relation-agnostic, meaning an entity receives identical uncertainty regardless of relational context. This conflates two distinct out-of-distribution phenomena that behave oppositely: emerging entities (rare, poorly-learned) and novel relational contexts (familiar entities in unobserved relationships). We prove an impossibility result: any uncertainty estimator using only entity-level statistics independent of relation context achieves near-random OOD detection on novel contexts. We empirically validate this on three datasets, finding 100 percent of novel-context triples have frequency-matched in-distribution counterparts. This explains why existing probabilistic methods achieve 0.99 AUROC on random corruptions but only 0.52-0.64 on temporal distribution shift. We formalize uncertainty decomposition into complementary components: semantic uncertainty from entity embedding variance (detecting emerging entities) and structural uncertainty from entity-relation co-occurrence (detecting novel contexts). Our main theoretical result proves these signals are non-redundant, and that any convex combination strictly dominates either signal alone. Our method (CAGP) combines semantic and structural uncertainty via learned weights, achieving 0.94-0.99 AUROC on temporal OOD detection across multiple benchmarks, a 60-80 percent relative improvement over relation-agnostic baselines. Empirical validation confirms complete frequency overlap on three datasets (FB15k-237, WN18RR, YAGO3-10). On selective prediction, our method reduces errors by 43 percent at 85 percent answer rate.

replace Causality-Inspired Safe Residual Correction for Multivariate Time Series

Authors: Jianxiang Xie, Yuncheng Hua, Mingyue Cheng, Flora Salim, Hao Xue

Abstract: While modern multivariate forecasters such as Transformers and GNNs achieve strong benchmark performance, they often suffer from systematic errors at specific variables or horizons and, critically, lack guarantees against performance degradation in deployment. Existing post-hoc residual correction methods attempt to fix these errors, but are inherently greedy: although they may improve average accuracy, they can also "help in the wrong way" by overcorrecting reliable predictions and causing local failures in unseen scenarios. To address this critical "safety gap," we propose CRC (Causality-inspired Safe Residual Correction), a plug-and-play framework explicitly designed to ensure non-degradation. CRC follows a divide-and-conquer philosophy: it employs a causality-inspired encoder to expose direction-aware structure by decoupling self- and cross-variable dynamics, and a hybrid corrector to model residual errors. Crucially, the correction process is governed by a strict four-fold safety mechanism that prevents harmful updates. Experiments across multiple datasets and forecasting backbones show that CRC consistently improves accuracy, while an in-depth ablation study confirms that its core safety mechanisms ensure exceptionally high non-degradation rates (NDR), making CRC a correction framework suited for safe and reliable deployment.

replace Data-Driven Analysis of Crash Patterns in SAE Level 2 and Level 4 Automated Vehicles Using K-means Clustering and Association Rule Mining

Authors: Jewel Rana Palit (Traffic Engineer/Project Manager-II, Collier County Government, Traffic Management Center, 2695 Francis Ave Unit D, Naples, Fl, 37221), Vijayalakshmi K Kumarasamy (Department of Computer Science and Engineering, University of Tennessee at Chattanooga, Chattanooga, TN, USA 37403), Osama A. Osman (Chief Scientist, Center of Urban Informatics and Progress, Chattanooga, TN, USA 37403)

Abstract: Automated Vehicles (AV) hold potential to reduce or eliminate human driving errors, enhance traffic safety, and support sustainable mobility. Recently, crash data has increasingly revealed that AV behavior can deviate from expected safety outcomes, raising concerns about the technology's safety and operational reliability in mixed traffic environments. While past research has investigated AV crash, most studies rely on small-size California-centered datasets, with a limited focus on understanding crash trends across various SAE Levels of automation. This study analyzes over 2,500 AV crash records from the United States National Highway Traffic Safety Administration (NHTSA), covering SAE Levels 2 and 4, to uncover underlying crash dynamics. A two-stage data mining framework is developed. K-means clustering is first applied to segment crash records into 4 distinct behavioral clusters based on temporal, spatial, and environmental factors. Then, Association Rule Mining (ARM) is used to extract interpretable multivariate relationships between crash patterns and crash contributors including lighting conditions, surface condition, vehicle dynamics, and environmental conditions within each cluster. These insights provide actionable guidance for AV developers, safety regulators, and policymakers in formulating AV deployment strategies and minimizing crash risks.

replace BOAD: Discovering Hierarchical Software Engineering Agents via Bandit Optimization

Authors: Iris Xu, Guangtao Zeng, Zexue He, Charles Jin, Aldo Pareja, Dan Gutfreund, Chuang Gan, Zhang-Wei Hong

Abstract: Large language models (LLMs) have shown strong reasoning and coding capabilities, yet they struggle to generalize to real-world software engineering (SWE) problems that are long-horizon and out of distribution. Existing systems often rely on a single agent to handle the entire workflow-interpreting issues, navigating large codebases, and implementing fixes-within one reasoning chain. Such monolithic designs force the model to retain irrelevant context, leading to spurious correlations and poor generalization. Motivated by how human engineers decompose complex problems, we propose structuring SWE agents as orchestrators coordinating specialized sub-agents for sub-tasks such as localization, editing, and validation. The challenge lies in discovering effective hierarchies automatically: as the number of sub-agents grows, the search space becomes combinatorial, and it is difficult to attribute credit to individual sub-agents within a team. We address these challenges by formulating hierarchy discovery as a multi-armed bandit (MAB) problem, where each arm represents a candidate sub-agent and the reward measures its helpfulness when collaborating with others. This framework, termed Bandit Optimization for Agent Design (BOAD), enables efficient exploration of sub-agent designs under limited evaluation budgets. On SWE-bench-Verified, BOAD outperforms single-agent and manually designed multi-agent systems. On SWE-bench-Live, featuring more recent and out-of-distribution issues, our 36B system ranks second on the leaderboard at the time of evaluation, surpassing larger models such as GPT-4 and Claude. These results demonstrate that automatically discovered hierarchical multi-agent systems significantly improve generalization on challenging long-horizon SWE tasks. Code is available at https://github.com/iamxjy/BOAD-SWE-Agent.

URLs: https://github.com/iamxjy/BOAD-SWE-Agent.

replace Coordinate Matrix Machine: A Human-level Concept Learning to Classify Very Similar Documents

Authors: Amin Sadri, M Maruf Hossain

Abstract: Human-level concept learning argues that humans typically learn new concepts from a single example, whereas machine learning algorithms typically require hundreds of samples to learn a single concept. Our brain subconsciously identifies important features and learns more effectively. Contribution: In this paper, we present the Coordinate Matrix Machine (CM$^2$). This purpose-built small model augments human intelligence by learning document structures and using this information to classify documents. While modern "Red AI" trends rely on massive pre-training and energy-intensive GPU infrastructure, CM$^2$ is designed as a Green AI solution. It achieves human-level concept learning by identifying only the structural "important features" a human would consider, allowing it to classify very similar documents using only one sample per class. Advantage: Our algorithm outperforms traditional vectorizers and complex deep learning models that require larger datasets and significant compute. By focusing on structural coordinates rather than exhaustive semantic vectors, CM$^2$ offers: 1. High accuracy with minimal data (one-shot learning) 2. Geometric and structural intelligence 3. Green AI and environmental sustainability 4. Optimized for CPU-only environments 5. Inherent explainability (glass-box model) 6. Faster computation and low latency 7. Robustness against unbalanced classes 8. Economic viability 9. Generic, expandable, and extendable

replace Information-Theoretic Quality Metric of Low-Dimensional Embeddings

Authors: Sebasti\'an Guti\'errez-Bernal, Hector Medel Cobaxin, Abiel Galindo Gonz\'alez

Abstract: In this work we study the quality of low-dimensional embeddings from an explicitly information-theoretic perspective. We begin by noting that classical evaluation metrics such as stress, rank-based neighborhood criteria, or Local Procrustes quantify distortions in distances or in local geometries, but do not directly assess how much information is preserved when projecting high-dimensional data onto a lower-dimensional space. To address this limitation, we introduce the Entropy Rank Preservation Measure (ERPM), a local metric based on the Shannon entropy of the singular-value spectrum of neighborhood matrices and on the stable rank, which quantifies changes in uncertainty between the original representation and its reduced projection, providing neighborhood-level indicators and a global summary statistic. To validate the results of the metric, we compare its outcomes with the Mean Relative Rank Error (MRRE), which is distance-based, and with Local Procrustes, which is based on geometric properties, using a financial time series and a manifold commonly studied in the literature. We observe that distance-based criteria exhibit very low correlation with geometric and spectral measures, while ERPM and Local Procrustes show strong average correlation but display significant discrepancies in local regimes, leading to the conclusion that ERPM complements existing metrics by identifying neighborhoods with severe information loss, thereby enabling a more comprehensive assessment of embeddings, particularly in information-sensitive applications such as the construction of early-warning indicators.

replace Adaptive Learning Guided by Bias-Noise-Alignment Diagnostics

Authors: Akash Samanta, Sheldon Williamson

Abstract: Learning systems deployed in nonstationary and safety-critical environments often suffer from instability, slow convergence, or brittle adaptation when learning dynamics evolve over time. While modern optimization, reinforcement learning, and meta-learning methods adapt to gradient statistics, they largely ignore the temporal structure of the error signal itself. This paper proposes a diagnostic-driven adaptive learning framework that explicitly models error evolution through a principled decomposition into bias, capturing persistent drift; noise, capturing stochastic variability; and alignment, capturing repeated directional excitation leading to overshoot. These diagnostics are computed online from lightweight statistics of loss or temporal-difference (TD) error trajectories and are independent of model architecture or task domain. We show that the proposed bias-noise-alignment decomposition provides a unifying control backbone for supervised optimization, actor-critic reinforcement learning, and learned optimizers. Within this framework, we introduce three diagnostic-driven instantiations: the Human-inspired Supervised Adaptive Optimizer (HSAO), Hybrid Error-Diagnostic Reinforcement Learning (HED-RL) for actor-critic methods, and the Meta-Learned Learning Policy (MLLP). Under standard smoothness assumptions, we establish bounded effective updates and stability properties for all cases. Representative diagnostic illustrations in actor-critic learning highlight how the proposed signals modulate adaptation in response to TD error structure. Overall, this work elevates error evolution to a first-class object in adaptive learning and provides an interpretable, lightweight foundation for reliable learning in dynamic environments.

replace Unregularized Linear Convergence in Zero-Sum Game from Preference Feedback

Authors: Shulun Chen, Runlong Zhou, Zihan Zhang, Maryam Fazel, Simon S. Du

Abstract: Aligning large language models (LLMs) with human preferences has proven effective for enhancing model capabilities, yet standard preference modeling using the Bradley-Terry model assumes transitivity, overlooking the inherent complexity of human population preferences. Nash learning from human feedback (NLHF) addresses this by framing non-transitive preferences as a two-player zero-sum game, where alignment reduces to finding the Nash equilibrium (NE). However, existing algorithms typically rely on regularization, incurring unavoidable bias when computing the duality gap in the original game. In this work, we provide the first convergence guarantee for Optimistic Multiplicative Weights Update ($\mathtt{OMWU}$) in NLHF, showing that it achieves last-iterate linear convergence after a burn-in phase whenever an NE with full support exists, with an instance-dependent linear convergence rate to the original NE, measured by duality gaps. Compared to prior results in Wei et al. (2020), we do not require the assumption of NE uniqueness. Our analysis identifies a novel marginal convergence behavior, where the probability of rarely played actions grows exponentially from exponentially small values, enabling exponentially better dependence on instance-dependent constants than prior results. Experiments corroborate the theoretical strengths of $\mathtt{OMWU}$ in both tabular and neural policy classes, demonstrating its potential for LLM applications.

replace Frequent subgraph-based persistent homology for graph classification

Authors: Xinyang Chen, Ama\"el Broustet, Guanyuan Zeng, Cheng He, Guoting Chen

Abstract: Persistent homology (PH) has recently emerged as a powerful tool for extracting topological features. Integrating PH into machine learning and deep learning models enhances topology awareness and interpretability. However, most PH methods on graphs rely on a limited set of filtrations, such as degree-based or weight-based filtrations, which overlook richer features like recurring information across the dataset and thus restrict expressive power. In this work, we propose a novel graph filtration called Frequent Subgraph Filtration (FSF), which is derived from frequent subgraphs and produces stable and information-rich frequency-based persistent homology (FPH) features. We study the theoretical properties of FSF and provide both proofs and experimental validation. Beyond persistent homology itself, we introduce two approaches for graph classification: an FPH-based machine learning model (FPH-ML) and a hybrid framework that integrates FPH with graph neural networks (FPH-GNNs) to enhance topology-aware graph representation learning. Our frameworks bridge frequent subgraph mining and topological data analysis, offering a new perspective on topology-aware feature extraction. Experimental results show that FPH-ML achieves competitive or superior accuracy compared with kernel-based and degree-based filtration methods. When integrated into graph neural networks, FPH yields relative performance gains ranging from 0.4 to 21 percent, with improvements of up to 8.2 percentage points over GCN and GIN backbones across benchmarks.

replace-cross MCD: Marginal Contrastive Discrimination for conditional density estimation

Authors: Katia Meziani, Aminata Ndiaye, Benjamin Riu

Abstract: We consider the problem of conditional density estimation, which is a major topic of interest in the fields of statistical and machine learning. Our method, called Marginal Contrastive Discrimination, MCD, reformulates the conditional density function into two factors, the marginal density function of the target variable and a ratio of density functions which can be estimated through binary classification. Like noise-contrastive methods, MCD can leverage state-of-the-art supervised learning techniques to perform conditional density estimation, including neural networks. Our benchmark reveals that our method significantly outperforms in practice existing methods on most density models and regression datasets.

replace-cross Sparse-Input Neural Network using Group Concave Regularization

Authors: Bin Luo, Susan Halabi

Abstract: Simultaneous feature selection and non-linear function estimation is challenging in modeling, especially in high-dimensional settings where the number of variables exceeds the available sample size. In this article, we investigate the problem of feature selection in neural networks. Although the group least absolute shrinkage and selection operator (LASSO) has been utilized to select variables for learning with neural networks, it tends to select unimportant variables into the model to compensate for its over-shrinkage. To overcome this limitation, we propose a framework of sparse-input neural networks using group concave regularization for feature selection in both low-dimensional and high-dimensional settings. The main idea is to apply a proper concave penalty to the $l_2$ norm of weights from all outgoing connections of each input node, and thus obtain a neural net that only uses a small subset of the original variables. In addition, we develop an effective algorithm based on backward path-wise optimization to yield stable solution paths, in order to tackle the challenge of complex optimization landscapes. We provide a rigorous theoretical analysis of the proposed framework, establishing finite-sample guarantees for both variable selection consistency and prediction accuracy. These results are supported by extensive simulation studies and real data applications, which demonstrate the finite-sample performance of the estimator in feature selection and prediction across continuous, binary, and time-to-event outcomes.

replace-cross Evolutionary Optimization of Physics-Informed Neural Networks: Advancing Generalizability by the Baldwin Effect

Authors: Jian Cheng Wong, Chin Chun Ooi, Abhishek Gupta, Pao-Hsiung Chiu, Joshua Shao Zheng Low, My Ha Dao, Yew-Soon Ong

Abstract: Physics-informed neural networks (PINNs) are at the forefront of scientific machine learning, making possible the creation of machine intelligence that is cognizant of physical laws and able to accurately simulate them. However, today's PINNs are often trained for a single physics task and require computationally expensive re-training for each new task, even for tasks from similar physics domains. To address this limitation, this paper proposes a pioneering approach to advance the generalizability of PINNs through the framework of Baldwinian evolution. Drawing inspiration from the neurodevelopment of precocial species that have evolved to learn, predict and react quickly to their environment, we envision PINNs that are pre-wired with connection strengths inducing strong biases towards efficient learning of physics. A novel two-stage stochastic programming formulation coupling evolutionary selection pressure (based on proficiency over a distribution of physics tasks) with lifetime learning (to specialize on a sampled subset of those tasks) is proposed to instantiate the Baldwin effect. The evolved Baldwinian-PINNs demonstrate fast and physics-compliant prediction capabilities across a range of empirically challenging problem instances with more than an order of magnitude improvement in prediction accuracy at a fraction of the computation cost compared to state-of-the-art gradient-based meta-learning methods. For example, when solving the diffusion-reaction equation, a 70x improvement in accuracy was obtained while taking 700x less computational time. This paper thus marks a leap forward in the meta-learning of PINNs as generalizable physics solvers. Sample codes are available at https://github.com/chiuph/Baldwinian-PINN.

URLs: https://github.com/chiuph/Baldwinian-PINN.

replace-cross Real-Time Forecasting of Pathological Gait via IMU Navigation: A Few-Shot and Generative Learning Framework for Wearable Devices

Authors: Wenwen Zhang, Hao Zhang, Zenan Jiang, Amir Servati, Peyman Servati

Abstract: Current gait analysis faces challenges in various aspects, including limited and poorly labeled data within existing wearable electronics databases, difficulties in collecting patient data due to privacy concerns, and the inadequacy of the Zero-Velocity Update Technique (ZUPT) in accurately analyzing pathological gait patterns. To address these limitations, we introduce GaitMotion, a novel machine-learning framework that employs few-shot learning on a multitask dataset collected via wearable IMU sensors for real-time pathological gait analysis. GaitMotion enhances data quality through detailed, ground-truth-labeled sequences and achieves accurate step and stride segmentation and stride length estimation, which are essential for diagnosing neurological disorders. We incorporate a generative augmentation component, which synthesizes rare or underrepresented pathological gait patterns. GaitMotion achieves a 65\% increase in stride length estimation accuracy compared to ZUPT. In addition, its application to real patient datasets via transfer learning confirms its robust predictive capability. By integrating generative AI into wearable gait analysis, GaitMotion not only refines the precision of pathological gait forecasting but also demonstrates a scalable framework for leveraging synthetic data in biomechanical pattern recognition, paving the way for more personalized and data-efficient digital health services.

replace-cross CIC: Circular Image Compression

Authors: Honggui Li, Sinan Chen, Dingtai Li, Zhengyang Zhang, Nahid Md Lokman Hossain, Xinfeng Xu, Yinlu Qin, Ruobing Wang, Maria Trocan, Dimitri Galayko, Amara Amara, Mohamad Sawan

Abstract: Learned image compression (LIC) is currently the cutting-edge method. However, the inherent difference between testing and training images of LIC results in performance degradation to some extent. Especially for out-of-sample, out-of-distribution, or out-of-domain testing images, the performance of LIC degrades significantly. Classical LIC is a serial image compression (SIC) approach that utilizes an open-loop architecture with serial encoding and decoding units. Nevertheless, according to the principles of automatic control systems, a closed-loop architecture holds the potential to improve the dynamic and static performance of LIC. Therefore, a circular image compression (CIC) approach with closed-loop encoding and decoding elements is proposed to minimize the gap between testing and training images and upgrade the capability of LIC. The proposed CIC establishes a nonlinear loop equation and proves that steady-state error between reconstructed and original images is close to zero by Taylor series expansion. The proposed CIC method possesses the property of Post-Training and Plug-and-Play which can be built on any existing advanced SIC methods. Experimental results including rate-distortion curves on five public image compression datasets demonstrate that the proposed CIC outperforms eight competing state-of-the-art open-source SIC algorithms in reconstruction capacity. Experimental results further show that the proposed method is suitable for out-of-sample testing images with dark backgrounds, sharp edges, high contrast, grid shapes, or complex patterns.

replace-cross Towards Knowledge Guided Pretraining Approaches for Multimodal Foundation Models: Applications in Remote Sensing

Authors: Praveen Ravirathinam, Ajitesh Parthasarathy, Ankush Khandelwal, Rahul Ghosh, Vipin Kumar

Abstract: Self-supervised learning has emerged as a powerful paradigm for pretraining foundation models using large-scale data. Existing pretraining approaches predominantly rely on masked reconstruction or next-token prediction strategies, demonstrating strong performance across various downstream tasks, including geoscience applications. However, these approaches do not fully capture the knowledge of causal interplay between different geospatial and environmental variables. To address this limitation, we propose Knowledge Guided Variable-Step Forecasting (KG-VSF), a novel pretraining task that models forecasting as a conditional generation task, where driver variables (e.g., weather) inform the prediction of response variables (e.g., satellite imagery). We demonstrate that pretraining in such a fashion leads to strong embeddings which give enhanced performance when finetuned on downstream tasks where capturing this causality matters such as pixel wise crop type mapping, soil moisture estimation and forecasting, missing image prediction, and future image forecasting when compared to finetuning embeddings from other standard pretraining approaches.

replace-cross Survey of Data-driven Newsvendor: Unified Analysis and Spectrum of Achievable Regrets

Authors: Zhuoxin Chen, Will Ma

Abstract: In the Newsvendor problem, the goal is to guess the number that will be drawn from some distribution, with asymmetric consequences for guessing too high vs. too low. In the data-driven version, the distribution is unknown, and one must work with samples from the distribution. Data-driven Newsvendor has been studied under many variants: additive vs. multiplicative regret, high probability vs. expectation bounds, and different distribution classes. This paper studies all combinations of these variants, filling in many gaps in the literature and simplifying many proofs. In particular, we provide a unified analysis based on the notion of clustered distributions, which in conjunction with our new lower bounds, shows that the entire spectrum of regrets between $1/\sqrt{n}$ and $1/n$ can be possible. Simulations on commonly-used distributions demonstrate that our notion is the "correct" predictor of empirical regret across varying data sizes.

replace-cross Sorbet: A Neuromorphic Hardware-Compatible Transformer-Based Spiking Language Model

Authors: Kaiwen Tang, Zhanglu Yan, Weng-Fai Wong

Abstract: For reasons such as privacy, there are use cases for language models at the edge. This has given rise to small language models targeted for deployment in resource-constrained devices where energy efficiency is critical. Spiking neural networks (SNNs) offer a promising solution due to their energy efficiency, and there are already works on realizing transformer-based models on SNNs. However, key operations like softmax and layer normalization (LN) are difficult to implement on neuromorphic hardware, and many of these early works sidestepped them. To address these challenges, we introduce Sorbet, a transformer-based spiking language model that is more neuromorphic hardware-compatible. Sorbet incorporates a novel shifting-based softmax called PTsoftmax and a Bit Shifting PowerNorm (BSPN), both designed to replace the respective energy-intensive operations. By leveraging knowledge distillation and model quantization, Sorbet achieved a highly compressed binary weight model that maintains competitive performance while achieving $27.16\times$ energy savings compared to BERT. We validate Sorbet through extensive testing on the GLUE benchmark and a series of ablation studies, demonstrating its potential as an energy-efficient solution for language model inference. Our code is publicly available at \href{https://github.com/Kaiwen-Tang/Sorbet}{https://github.com/Kaiwen-Tang/Sorbet}

URLs: https://github.com/Kaiwen-Tang/Sorbet, https://github.com/Kaiwen-Tang/Sorbet

replace-cross Mitigating optimistic bias in entropic risk estimation and optimization

Authors: Utsav Sadana, Erick Delage, Angelos Georghiou

Abstract: The entropic risk measure is widely used in high-stakes decision-making across economics, management science, finance, and safety-critical control systems because it captures tail risks associated with uncertain losses. However, when data are limited, the empirical entropic risk estimator, formed by replacing the expectation in the risk measure with a sample average, underestimates true risk. We show that this negative bias grows superlinearly with the standard deviation of the loss for distributions with unbounded right tails. We further demonstrate that several existing bias reduction techniques developed for empirical risk either continue to underestimate entropic risk or substantially overestimate it, potentially leading to overly risky or overly conservative decisions. To address this issue, we develop a parametric bootstrap procedure that is strongly asymptotically consistent and provides a controlled overestimation of entropic risk under mild assumptions. The method first fits a distribution to the data and then estimates the empirical estimator's bias via bootstrapping. We show that the fitted distribution must satisfy only weak regularity conditions, and Gaussian mixture models offer a convenient and flexible choice within this class. As an application, we introduce a distributionally robust optimization model for an insurance contract design problem that incorporates correlations in household losses. We show that selecting regularization parameters using standard cross-validation can lead to substantially higher out-of-sample risk for the insurer if the validation bias is not corrected. Our approach improves performance by recommending higher and more accurate premiums, thereby better reflecting the underlying tail risk.

replace-cross Towards Acyclic Preference Evaluation of Language Models via Multiple Evaluators

Authors: Zhengyu Hu, Jieyu Zhang, Zhihan Xiong, Alexander Ratner, Kaize Ding, Ranjay Krishna

Abstract: Despite the remarkable success of Large Language Models (LLMs), evaluating their outputs' quality regarding preference remains a critical challenge. While existing works usually leverage a strong LLM as the judge for comparing LLMs' response pairwisely, such a single-evaluator approach is vulnerable to cyclic preference, i.e., output A is better than B, B than C, but C is better than A, causing contradictory evaluation results. To address this, we introduce PGED (Preference Graph Ensemble and Denoising), a novel approach that leverages multiple model-based evaluators to construct preference graphs, and then ensembles and denoises these graphs for acyclic, non-contradictory evaluation results. We provide theoretical guarantees for our framework, demonstrating its efficacy in recovering the ground truth preference structure. Extensive experiments on ten benchmarks demonstrate PGED's superiority in three applications: 1) model ranking for evaluation, 2) response selection for test-time scaling, and 3) data selection for model fine-tuning. Notably, PGED combines small LLM evaluators (e.g., Llama3-8B, Mistral-7B, Qwen2-7B) to outperform strong ones (e.g., Qwen2-72B), showcasing its effectiveness in enhancing evaluation reliability and improving model performance.

replace-cross AutoTrust: Benchmarking Trustworthiness in Large Vision Language Models for Autonomous Driving

Authors: Shuo Xing, Hongyuan Hua, Xiangbo Gao, Shenzhe Zhu, Renjie Li, Kexin Tian, Xiaopeng Li, Heng Huang, Tianbao Yang, Zhangyang Wang, Yang Zhou, Huaxiu Yao, Zhengzhong Tu

Abstract: Recent advancements in large vision language models (VLMs) tailored for autonomous driving (AD) have shown strong scene understanding and reasoning capabilities, making them undeniable candidates for end-to-end driving systems. However, limited work exists on studying the trustworthiness of DriveVLMs -- a critical factor that directly impacts public transportation safety. In this paper, we introduce AutoTrust, a comprehensive trustworthiness benchmark for large vision-language models in autonomous driving (DriveVLMs), considering diverse perspectives -- including trustfulness, safety, robustness, privacy, and fairness. We constructed the largest visual question-answering dataset for investigating trustworthiness issues in driving scenarios, comprising over 10k unique scenes and 18k queries. We evaluated six publicly available VLMs, spanning from generalist to specialist, from open-source to commercial models. Our exhaustive evaluations have unveiled previously undiscovered vulnerabilities of DriveVLMs to trustworthiness threats. Specifically, we found that the general VLMs like LLaVA-v1.6 and GPT-4o-mini surprisingly outperform specialized models fine-tuned for driving in terms of overall trustworthiness. DriveVLMs like DriveLM-Agent are particularly vulnerable to disclosing sensitive information. Additionally, both generalist and specialist VLMs remain susceptible to adversarial attacks and struggle to ensure unbiased decision-making across diverse environments and populations. Our findings call for immediate and decisive action to address the trustworthiness of DriveVLMs -- an issue of critical importance to public safety and the welfare of all citizens relying on autonomous transportation systems. We release all the codes and datasets in https://github.com/taco-group/AutoTrust.

URLs: https://github.com/taco-group/AutoTrust.

replace-cross Evolutionary Optimization of Physics-Informed Neural Networks: Evo-PINN Frontiers and Opportunities

Authors: Jian Cheng Wong, Abhishek Gupta, Chin Chun Ooi, Pao-Hsiung Chiu, Jiao Liu, Yew-Soon Ong

Abstract: Deep learning models trained on finite data lack a complete understanding of the physical world. On the other hand, physics-informed neural networks (PINNs) are infused with such knowledge through the incorporation of mathematically expressible laws of nature into their training loss function. By complying with physical laws, PINNs provide advantages over purely data-driven models in limited-data regimes and present as a promising route towards Physical AI. This feature has propelled them to the forefront of scientific machine learning, a domain characterized by scarce and costly data. However, the vision of accurate physics-informed learning comes with significant challenges. This work examines PINNs in terms of model optimization and generalization, shedding light on the need for new algorithmic advances to overcome issues pertaining to the training speed, precision, and generalizability of today's PINN models. Of particular interest are gradient-free evolutionary algorithms (EAs) for optimizing the uniquely complex loss landscapes arising in PINN training. Methods synergizing gradient descent and EAs for discovering bespoke neural architectures and balancing multiple terms in physics-informed learning objectives are positioned as important avenues for future research. Another exciting track is to cast EAs as a meta-learner of generalizable PINN models. To substantiate these proposed avenues, we further highlight results from recent literature to showcase the early success of such approaches in addressing the aforementioned challenges in PINN optimization and generalization.

replace-cross Support Vector Machine Kernels as Quantum Propagators

Authors: Nan-Hong Kuo, Renata Wong

Abstract: Selecting optimal kernels for regression in physical systems remains a challenge, often relying on trial-and-error with standard functions. In this work, we establish a mathematical correspondence between support vector machine kernels and quantum propagators, demonstrating that kernel efficacy is determined by its spectral alignment with the system's Green's function. Based on this isomorphism, we propose a unified, physics-informed framework for kernel selection and design. For systems with known propagator forms, we derive analytical selection rules that map standard kernels to physical operators. For complex systems where the Green's function is analytically intractable, we introduce a constructive numerical method using the Kernel Polynomial Method with Jackson smoothing to generate custom, physics-aligned kernels. Numerical experiments spanning electrical conductivity, electronic band structure, anharmonic oscillators, and photonic crystals demonstrate that this framework consistently performs well as long as there is an alignment with a Green's function.

replace-cross Sparse Additive Contextual Bandits: A Nonparametric Approach for Online Decision-Making with High-Dimensional Covariates

Authors: Wenjia Wang, Qingwen Zhang, Xiaowei Zhang

Abstract: Personalized services are central to today's digital economy, and their sequential decisions are often modeled as contextual bandits. Modern applications pose two main challenges: high-dimensional covariates and the need for nonparametric models to capture complex reward-covariate relationships. We propose a contextual bandit algorithm based on a sparse additive reward model that addresses both challenges through (i) a doubly penalized estimator for nonparametric reward estimation and (ii) an epoch-based design with adaptive screening to balance exploration and exploitation. We prove a sublinear regret bound that grows only logarithmically in the covariate dimensionality; to our knowledge, this is the first such result for nonparametric contextual bandits with high-dimensional covariates. We also derive an information-theoretic lower bound, and the gap to the upper bound vanishes as the reward smoothness increases. Extensive experiments on synthetic data and real data from video recommendation and personalized medicine show strong performance in high-dimensional settings.

replace-cross Beyond Accuracy: What Matters in Designing Well-Behaved Image Classification Models?

Authors: Robin Hesse, Do\u{g}ukan Ba\u{g}c{\i}, Bernt Schiele, Simone Schaub-Meyer, Stefan Roth

Abstract: Deep learning has become an essential part of computer vision, with deep neural networks (DNNs) excelling in predictive performance. However, they often fall short in other critical quality dimensions, such as robustness, calibration, or fairness. While existing studies have focused on a subset of these quality dimensions, none have explored a more general form of "well-behavedness" of DNNs. With this work, we address this gap by simultaneously studying nine different quality dimensions for image classification. Through a large-scale study, we provide a bird's-eye view by analyzing 326 backbone models and how different training paradigms and model architectures affect these quality dimensions. We reveal various new insights such that (i) vision-language models exhibit high class balance on ImageNet-1k classification and strong robustness against domain changes; (ii) training models initialized with weights obtained through self-supervised learning is an effective strategy to improve most considered quality dimensions; and (iii) the training dataset size is a major driver for most of the quality dimensions. We conclude our study by introducing the QUBA score (Quality Understanding Beyond Accuracy), a novel metric that ranks models across multiple dimensions of quality, enabling tailored recommendations based on specific user needs.

replace-cross Through a Compressed Lens: Investigating The Impact of Quantization on Factual Knowledge Recall

Authors: Qianli Wang, Mingyang Wang, Nils Feldhus, Simon Ostermann, Yuan Cao, Hinrich Sch\"utze, Sebastian M\"oller, Vera Schmitt

Abstract: Quantization methods are widely used to accelerate inference and streamline the deployment of large language models (LLMs). Although quantization's effects on various LLM capabilities have been extensively studied, one critical area remains underexplored: factual knowledge recall (FKR), the process by which LLMs access stored knowledge. To this end, we conduct comprehensive experiments using three common quantization techniques at distinct bit widths, in conjunction with interpretability-driven analyses on two tasks, knowledge memorization and latent multi-hop reasoning. We show that quantization typically results in information loss within LLMs, consequently diminishing their capacity for FKR. This effect is particularly amplified in smaller models within the same architectural families. However, models quantized at reduced bit precision do not consistently exhibit inferior performance and occasionally quantization may even enhance model FKR. We find that BitSandBytes demonstrates highest preservation of the original full-precision model's FKR. Despite variability across models and methods, quantization causes modest performance degradation and remains an effective compression strategy.

replace-cross PolarGrad: A Class of Matrix-Gradient Optimizers from a Unifying Preconditioning Perspective

Authors: Tim Tsz-Kit Lau, Qi Long, Weijie Su

Abstract: The ever-growing scale of deep learning models and training data underscores the critical importance of efficient optimization methods. While preconditioned gradient methods such as Adam and AdamW are the de facto optimizers for training neural networks and large language models, structure-aware preconditioned optimizers like Shampoo and Muon, which utilize the matrix structure of gradients, have demonstrated promising evidence of faster convergence. In this paper, we introduce a unifying framework for analyzing "matrix-aware" preconditioned methods, which not only sheds light on the effectiveness of Muon and related optimizers but also leads to a class of new structure-aware preconditioned methods. A key contribution of this framework is its precise distinction between preconditioning strategies that treat neural network weights as vectors (addressing curvature anisotropy) versus those that consider their matrix structure (addressing gradient anisotropy). This perspective provides new insights into several empirical phenomena in language model pre-training, including Adam's training instabilities, Muon's accelerated convergence, and the necessity of learning rate warmup for Adam. Building upon this framework, we introduce PolarGrad, a new class of preconditioned optimization methods based on the polar decomposition of matrix-valued gradients. As a special instance, PolarGrad includes Muon with updates scaled by the nuclear norm of the gradients. We provide numerical implementations of these methods, leveraging efficient numerical polar decomposition algorithms for enhanced convergence. Our extensive evaluations across diverse matrix optimization problems and language model pre-training tasks demonstrate that PolarGrad outperforms both Adam and Muon.

replace-cross Esoteric Language Models

Authors: Subham Sekhar Sahoo, Zhihan Yang, Yash Akhauri, Johnna Liu, Deepansha Singh, Zhoujun Cheng, Zhengzhong Liu, Eric Xing, John Thickstun, Arash Vahdat

Abstract: Diffusion-based language models offer a compelling alternative to autoregressive (AR) models by enabling parallel and controllable generation. Within this family, Masked Diffusion Models (MDMs) currently perform best but still underperform AR models in perplexity and lack key inference-time efficiency features, most notably KV caching. We introduce Eso-LMs, a new family of models that fuses AR and MDM paradigms, smoothly interpolating between their perplexities while overcoming their respective limitations. Unlike prior work, which uses transformers with bidirectional attention as MDM denoisers, we exploit the connection between MDMs and Any-Order autoregressive models and adopt causal attention. This design lets us compute the exact likelihood of MDMs for the first time and, crucially, enables us \to introduce KV caching for MDMs while preserving parallel generation for the first time, significantly improving inference efficiency. Combined with an optimized sampling schedule, Eso-LMs achieves a new state of the art on the speed-quality Pareto frontier for unconditional generation. On long contexts, it yields $\mathbf{14 - 65{}\times}$ faster inference than standard MDMs and $\mathbf{3 - 4{}\times}$ faster inference than prior semi-autoregressive approaches. We provide code, model checkpoints, and video tutorials on the project page: http://s-sahoo.github.io/Eso-LMs

URLs: http://s-sahoo.github.io/Eso-LMs

replace-cross Towards Streaming LiDAR Object Detection with Point Clouds as Egocentric Sequences

Authors: Mellon M. Zhang, Glen Chou, Saibal Mukhopadhyay

Abstract: Accurate and low-latency 3D object detection is essential for autonomous driving, where safety hinges on both rapid response and reliable perception. While rotating LiDAR sensors are widely adopted for their robustness and fidelity, current detectors face a trade-off: streaming methods process partial polar sectors on the fly for fast updates but suffer from limited visibility, cross-sector dependencies, and distortions from retrofitted Cartesian designs, whereas full-scan methods achieve higher accuracy but are bottlenecked by the inherent latency of a LiDAR revolution. We propose Polar-Fast-Cartesian-Full (PFCF), a hybrid detector that combines fast polar processing for intra-sector feature extraction with accurate Cartesian reasoning for full-scene understanding. Central to PFCF is a custom Mamba SSM-based streaming backbone with dimensionally-decomposed convolutions that avoids distortion-heavy planes, enabling parameter-efficient, translation-invariant, and distortion-robust polar representation learning. Local sector features are extracted via this backbone, then accumulated into a sector feature buffer to enable efficient inter-sector communication through a full-scan backbone. PFCF establishes a new Pareto frontier on the Waymo Open dataset, surpassing prior streaming baselines by 10% mAP and matching full-scan accuracy at twice the update rate. Code is available at \href{https://github.com/meilongzhang/Polar-Hierarchical-Mamba}{https://github.com/meilongzhang/Polar-Hierarchical-Mamba}.

URLs: https://github.com/meilongzhang/Polar-Hierarchical-Mamba, https://github.com/meilongzhang/Polar-Hierarchical-Mamba

replace-cross Effects of Structural Allocation of Geometric Task Diversity in Linear Meta-Learning Models

Authors: Saptati Datta, Nicolas W. Hengartner, Yulia Pimonova, Natalie E. Klein, Nicholas Lubbers

Abstract: Meta-learning aims to leverage information across related tasks to improve prediction on unlabeled data for new tasks when only a small number of labeled observations are available ("few-shot" learning). Increased task diversity is often believed to enhance meta-learning by providing richer information across tasks. However, recent work by Kumar et al. (2022) shows that increasing task diversity, quantified through the overall geometric spread of task representations, can in fact degrade meta-learning prediction performance across a range of models and datasets. In this work, we build on this observation by showing that meta-learning performance is affected not only by the overall geometric variability of task parameters, but also by how this variability is allocated relative to an underlying low-dimensional structure. Similar to Pimonova et al. (2025), we decompose task-specific regression effects into a structurally informative component and an orthogonal, non-informative component. We show theoretically and through simulation that meta-learning prediction degrades when a larger fraction of between-task variability lies in orthogonal, non-informative directions, even when the overall geometric variability of tasks is held fixed.

replace-cross An Analytical and AI-discovered Stable, Accurate, and Generalizable Subgrid-scale Closure for Geophysical Turbulence

Authors: Karan Jakhar, Yifei Guan, Pedram Hassanzadeh

Abstract: By combining AI and fluid physics, we discover a closed-form closure for 2D turbulence from small direct numerical simulation (DNS) data. Large-eddy simulation (LES) with this closure is accurate and stable, reproducing DNS statistics including those of extremes. We also show that the new closure could be derived from a 4th-order truncated Taylor expansion. Prior analytical and AI-based work only found the 2nd-order expansion, which led to unstable LES. The additional terms emerge only when inter-scale energy transfer is considered alongside standard reconstruction criterion in the sparse-equation discovery.

replace-cross Combinatorial Creativity: A New Frontier in Generalization Abilities

Authors: Samuel Schapiro, Sumuk Shashidhar, Alexi Gladstone, Jonah Black, Royce Moon, Dilek Hakkani-Tur, Lav R. Varshney

Abstract: Artificial intelligence (AI) systems, and Large Language Models (LLMs) in particular, are increasingly employed for creative tasks like scientific idea generation, constituting a form of generalization from training data unaddressed by existing conceptual frameworks. Despite its similarities to compositional generalization (CG), combinatorial creativity (CC) is an open-ended ability. Instead of evaluating for accuracy or correctness against fixed targets, which would contradict the open-ended nature of CC, we propose a theoretical framework and algorithmic task for evaluating outputs by their degrees of novelty and utility. From here, we make several important empirical contributions: (1) We obtain the first insights into the scaling behavior of creativity for LLMs. (2) We discover that, for fixed compute budgets, there exist optimal model depths and widths for creative ability. (3) We find that the ideation-execution gap, whereby LLMs excel at generating novel scientific ideas but struggle to ensure their practical feasibility, may be explained by a more fundamental novelty-utility tradeoff characteristic of creativity algorithms in general. Though our findings persist up to the 100M scale, frontier models today are well into the billions of parameters. Therefore, our conceptual framework and empirical findings can best serve as a starting point for understanding and improving the creativity of frontier-size models today, as we begin to bridge the gap between human and machine intelligence.

replace-cross MTSQL-R1: Towards Long-Horizon Multi-Turn Text-to-SQL via Agentic Training

Authors: Taicheng Guo, Hai Wang, ChaoChun Liu, Mohsen Golalikhani, Xin Chen, Xiangliang Zhang, Chandan K. Reddy

Abstract: Multi-turn Text-to-SQL aims to translate a user's conversational utterances into executable SQL while preserving dialogue coherence and grounding to the target schema. However, most existing systems only regard this task as a simple text translation task and follow a short-horizon paradigm, generating a query per turn without execution, explicit verification, and refinement, which leads to non-executable or incoherent outputs. We present MTSQL-R1, an agentic training framework for long-horizon multi-turn Text-to-SQL. We cast the task as a Markov Decision Process (MDP) in which an agent interacts with (i) a database for execution feedback and (ii) a persistent dialogue memory for coherence verification, performing an iterative propose to execute -> verify -> refine cycle until all checks pass. Experiments on COSQL and SPARC demonstrate that MTSQL-R1 consistently outperforms strong baselines, highlighting the importance of environment-driven verification and memory-guided refinement for conversational semantic parsing. Full recipes (including code, trained models, logs, reasoning trajectories, etc.) will be released after the internal review to contribute to community research.

replace-cross Why Do Multilingual Reasoning Gaps Emerge in Reasoning Language Models?

Authors: Deokhyung Kang, Seonjeong Hwang, Daehui Kim, Hyounghun Kim, Gary Geunbae Lee

Abstract: Reasoning language models (RLMs) achieve strong performance on complex reasoning tasks, yet they still exhibit a multilingual reasoning gap, performing better in high-resource languages than in low-resource ones. While recent efforts have been made to address this gap, its underlying causes remain largely unexplored. In this work, we show that this gap primarily stems from failures in language understanding-specifically, the model's inability to translate multilingual inputs into the language dominating its reasoning traces (typically English). As identifying understanding failures can enable targeted mitigation of the gap, we evaluate a range of detection methods and find that understanding failures are detectable to a meaningful extent, with supervised approaches performing best. Building on this, we propose Selective Translation, a strategy that incorporates an English translation into the initial reasoning trace only when an understanding failure is detected. Experimental results using Qwen3-4B show that Selective Translation substantially bridges the multilingual reasoning gap, achieving near full-translation performance while translating only about 20% of inputs. Together, our results show that failures in language understanding are the primary driver of the multilingual reasoning gap and can be detected and selectively mitigated, clarifying its origin and suggesting a path toward more equitable multilingual reasoning. Our code and data are publicly available at https://github.com/deokhk/RLM_analysis

URLs: https://github.com/deokhk/RLM_analysis

replace-cross Designing an Optimal Sensor Network via Minimizing Information Loss

Authors: Daniel Waxman, Fernando Llorente, Katia Lamer, Petar M. Djuri\'c

Abstract: Optimal experimental design is a classic topic in statistics, with many well-studied problems, applications, and solutions. The design problem we study is the placement of sensors to monitor spatiotemporal processes, explicitly accounting for the temporal dimension in our modeling and optimization. We observe that recent advancements in computational sciences often yield large datasets based on physics-based simulations, which are rarely leveraged in experimental design. We introduce a novel model-based sensor placement criterion, along with a highly-efficient optimization algorithm, which integrates physics-based simulations and Bayesian experimental design principles to identify sensor networks that "minimize information loss" from simulated data. Our technique relies on sparse variational inference and (separable) Gauss-Markov priors, and thus may adapt many techniques from Bayesian experimental design. We validate our method through a case study monitoring air temperature in Phoenix, Arizona, using state-of-the-art physics-based simulations. Our results show our framework to be superior to random or quasi-random sampling, particularly with a limited number of sensors. We conclude by discussing practical considerations and implications of our framework, including more complex modeling tools and real-world deployments.

replace-cross PrivTune: Efficient and Privacy-Preserving Fine-Tuning of Large Language Models via Device-Cloud Collaboration

Authors: Yi Liu, Weixiang Han, Chengjun Cai, Xingliang Yuan, Cong Wang

Abstract: With the rise of large language models, service providers offer language models as a service, enabling users to fine-tune customized models via uploaded private datasets. However, this raises concerns about sensitive data leakage. Prior methods, relying on differential privacy within device-cloud collaboration frameworks, struggle to balance privacy and utility, exposing users to inference attacks or degrading fine-tuning performance. To address this, we propose PrivTune, an efficient and privacy-preserving fine-tuning framework via Split Learning (SL). The key idea of PrivTune is to inject crafted noise into token representations from the SL bottom model, making each token resemble the $n$-hop indirect neighbors. PrivTune formulates this as an optimization problem to compute the optimal noise vector, aligning with defense-utility goals. On this basis, it then adjusts the parameters (i.e., mean) of the $d_\chi$-Privacy noise distribution to align with the optimization direction and scales the noise according to token importance to minimize distortion. Experiments on five datasets (covering both classification and generation tasks) against three embedding inversion and three attribute inference attacks show that, using RoBERTa on the Stanford Sentiment Treebank dataset, PrivTune reduces the attack success rate to 10% with only a 3.33% drop in utility performance, outperforming state-of-the-art baselines.

replace-cross Iterative Tuning of Nonlinear Model Predictive Control for Robotic Manufacturing Tasks

Authors: Deepak Ingole, Valentin Bhend, Shiva Ganesh Murali, Oliver Dobrich, Alisa Rupenyan

Abstract: Manufacturing processes are often perturbed by drifts in the environment and wear in the system, requiring control re-tuning even in the presence of repetitive operations. This paper presents an iterative learning framework for automatic tuning of Nonlinear Model Predictive Control (NMPC) weighting matrices based on task-level performance feedback. Inspired by norm-optimal Iterative Learning Control (ILC), the proposed method adaptively adjusts NMPC weights Q and R across task repetitions to minimize key performance indicators (KPIs) related to tracking accuracy, control effort, and saturation. Unlike gradient-based approaches that require differentiating through the NMPC solver, we construct an empirical sensitivity matrix, enabling structured weight updates without analytic derivatives. The framework is validated through simulation on a UR10e robot performing carbon fiber winding on a tetrahedral core. Results demonstrate that the proposed approach converges to near-optimal tracking performance (RMSE within 0.3% of offline Bayesian Optimization (BO)) in just 4 online repetitions, compared to 100 offline evaluations required by BO algorithm. The method offers a practical solution for adaptive NMPC tuning in repetitive robotic tasks, combining the precision of carefully optimized controllers with the flexibility of online adaptation.

replace-cross Memento 2: Learning by Stateful Reflective Memory

Authors: Jun Wang

Abstract: We study continual learning in large language model (LLM) based agents that integrate episodic memory with reinforcement learning. We focus on reflection, the ability of an agent to revisit past experience and adjust how it selects future actions, as the central mechanism for continual adaptation without fine tuning model weights. To formalise this, we introduce the Stateful Reflective Decision Process (SRDP), in which an agent maintains and updates episodic memory and alternates between writing new experiences to memory and reading relevant cases to guide decisions. This framework casts reflective memory dynamics as part of the decision process itself and makes them amenable to control and learning analysis. Building on this formulation, we develop a Read-Write Reflective Learning algorithm that incorporates memory retrieval into a soft policy iteration procedure and prove that it converges. We further show that as memory grows and more densely covers the task environment, the resulting policy approaches optimality. Our framework unifies memory based reasoning with reinforcement learning and provides a formal foundation for LLM agents capable of continual, experience driven learning.

replace-cross Benchmark Success, Clinical Failure: When Reinforcement Learning Optimizes for Benchmarks, Not Patients

Authors: Armin Berger, Manuela Bergau, Helen Schneider, Saad Ahmad, Tom Anglim Lagones, Gianluca Brugnara, Martha Foltyn-Dumitru, Kai Schlamp, Philipp Vollmuth, Rafet Sifa

Abstract: Recent Reinforcement Learning (RL) advances for Large Language Models (LLMs) have improved reasoning tasks, yet their resource-constrained application to medical imaging remains underexplored. We introduce ChexReason, a vision-language model trained via R1-style methodology (SFT followed by GRPO) using only 2,000 SFT samples, 1,000 RL samples, and a single A100 GPU. Evaluations on CheXpert and NIH benchmarks reveal a fundamental tension: GRPO recovers in-distribution performance (23% improvement on CheXpert, macro-F1 = 0.346) but degrades cross-dataset transferability (19% drop on NIH). This mirrors high-resource models like NV-Reason-CXR-3B, suggesting the issue stems from the RL paradigm rather than scale. We identify a generalization paradox where the SFT checkpoint uniquely improves on NIH before optimization, indicating teacher-guided reasoning captures more institution-agnostic features. Furthermore, cross-model comparisons show structured reasoning scaffolds benefit general-purpose VLMs but offer minimal gain for medically pre-trained models. Consequently, curated supervised fine-tuning may outperform aggressive RL for clinical deployment requiring robustness across diverse populations.

replace-cross Quantum Intelligence Meets BD-RIS-Enabled AmBC: Challenges, Opportunities, and Practical Insights

Authors: Abd Ullah Khan, Uman Khalid, Trung Q. Duong, Hyundong Shin

Abstract: A beyond-diagonal reconfigurable intelligent surface (BD-RIS) is an innovative type of reconfigurable intelligent surface (RIS) that has recently been proposed and is considered a revolutionary advancement in wave manipulation. Unlike the mutually disconnected arrangement of elements in traditional RISs, BD-RIS creates cost-effective and simple inter-element connections, allowing for greater freedom in configuring the amplitude and phase of impinging waves. However, there are numerous underlying challenges in realizing the advantages associated with BD-RIS, prompting the research community to actively investigate cutting-edge schemes and algorithms in this direction. Particularly, the passive beamforming design for BD-RIS under specific environmental conditions has become a major focus in this research area. In this article, we provide a systematic introduction to BD-RIS, elaborating on its functional principles concerning architectural design, promising advantages, and classification. Subsequently, we present recent advances and identify a series of challenges and opportunities. Additionally, we consider a specific case study where beamforming is designed using four different algorithms, and we analyze their performance with respect to sum rate and computation cost. To augment the beamforming capabilities in 6G BD-RIS with quantum enhancement, we analyze various hybrid quantum-classical machine learning (ML) models to improve beam prediction performance, employing real-world communication Scenario 8 from the DeepSense 6G dataset. Consequently, we derive useful insights about the practical implications of BD-RIS.

replace-cross Improving Multi-step RAG with Hypergraph-based Memory for Long-Context Complex Relational Modeling

Authors: Chulun Zhou, Chunkang Zhang, Guoxin Yu, Fandong Meng, Jie Zhou, Wai Lam, Mo Yu

Abstract: Multi-step retrieval-augmented generation (RAG) has become a widely adopted strategy for enhancing large language models (LLMs) on tasks that demand global comprehension and intensive reasoning. Many RAG systems incorporate a working memory module to consolidate retrieved information. However, existing memory designs function primarily as passive storage that accumulates isolated facts for the purpose of condensing the lengthy inputs and generating new sub-queries through deduction. This static nature overlooks the crucial high-order correlations among primitive facts, the compositions of which can often provide stronger guidance for subsequent steps. Therefore, their representational strength and impact on multi-step reasoning and knowledge evolution are limited, resulting in fragmented reasoning and weak global sense-making capacity in extended contexts. We introduce HGMem, a hypergraph-based memory mechanism that extends the concept of memory beyond simple storage into a dynamic, expressive structure for complex reasoning and global understanding. In our approach, memory is represented as a hypergraph whose hyperedges correspond to distinct memory units, enabling the progressive formation of higher-order interactions within memory. This mechanism connects facts and thoughts around the focal problem, evolving into an integrated and situated knowledge structure that provides strong propositions for deeper reasoning in subsequent steps. We evaluate HGMem on several challenging datasets designed for global sense-making. Extensive experiments and in-depth analyses show that our method consistently improves multi-step RAG and substantially outperforms strong baseline systems across diverse tasks.

replace-cross Training a Huggingface Model on AWS Sagemaker (Without Tears)

Authors: Liling Tan

Abstract: The development of Large Language Models (LLMs) has primarily been driven by resource-rich research groups and industry partners. Due to the lack of on-premise computing resources required for increasingly complex models, many researchers are turning to cloud services like AWS SageMaker to train Hugging Face models. However, the steep learning curve of cloud platforms often presents a barrier for researchers accustomed to local environments. Existing documentation frequently leaves knowledge gaps, forcing users to seek fragmented information across the web. This demo paper aims to democratize cloud adoption by centralizing the essential information required for researchers to successfully train their first Hugging Face model on AWS SageMaker from scratch.