new Neural Autoregressive Modeling of Brain Aging

Authors: Ridvan Yesiloglu, Wei Peng, Md Tauhidul Islam, Ehsan Adeli

Abstract: Brain aging synthesis is a critical task with broad applications in clinical and computational neuroscience. The ability to predict the future structural evolution of a subject's brain from an earlier MRI scan provides valuable insights into aging trajectories. Yet, the high-dimensionality of data, subtle changes of structure across ages, and subject-specific patterns constitute challenges in the synthesis of the aging brain. To overcome these challenges, we propose NeuroAR, a novel brain aging simulation model based on generative autoregressive transformers. NeuroAR synthesizes the aging brain by autoregressively estimating the discrete token maps of a future scan from a convenient space of concatenated token embeddings of a previous and future scan. To guide the generation, it concatenates into each scale the subject's previous scan, and uses its acquisition age and the target age at each block via cross-attention. We evaluate our approach on both the elderly population and adolescent subjects, demonstrating superior performance over state-of-the-art generative models, including latent diffusion models (LDM) and generative adversarial networks, in terms of image fidelity. Furthermore, we employ a pre-trained age predictor to further validate the consistency and realism of the synthesized images with respect to expected aging patterns. NeuroAR significantly outperforms key models, including LDM, demonstrating its ability to model subject-specific brain aging trajectories with high fidelity.

new LLM-Assisted Cheating Detection in Korean Language via Keystrokes

Authors: Dong Hyun Roh, Rajesh Kumar, An Ngo

Abstract: This paper presents a keystroke-based framework for detecting LLM-assisted cheating in Korean, addressing key gaps in prior research regarding language coverage, cognitive context, and the granularity of LLM involvement. Our proposed dataset includes 69 participants who completed writing tasks under three conditions: Bona fide writing, paraphrasing ChatGPT responses, and transcribing ChatGPT responses. Each task spans six cognitive processes defined in Bloom's Taxonomy (remember, understand, apply, analyze, evaluate, and create). We extract interpretable temporal and rhythmic features and evaluate multiple classifiers under both Cognition-Aware and Cognition-Unaware settings. Temporal features perform well under Cognition-Aware evaluation scenarios, while rhythmic features generalize better under cross-cognition scenarios. Moreover, detecting bona fide and transcribed responses was easier than paraphrased ones for both the proposed models and human evaluators, with the models significantly outperforming the humans. Our findings affirm that keystroke dynamics facilitate reliable detection of LLM-assisted writing across varying cognitive demands and writing strategies, including paraphrasing and transcribing LLM-generated responses.

new Scientific Machine Learning with Kolmogorov-Arnold Networks

Authors: Salah A. Faroughi, Farinaz Mostajeran, Amin Hamed Mashhadzadeh, Shirko Faroughi

Abstract: The field of scientific machine learning, which originally utilized multilayer perceptrons (MLPs), is increasingly adopting Kolmogorov-Arnold Networks (KANs) for data encoding. This shift is driven by the limitations of MLPs, including poor interpretability, fixed activation functions, and difficulty capturing localized or high-frequency features. KANs address these issues with enhanced interpretability and flexibility, enabling more efficient modeling of complex nonlinear interactions and effectively overcoming the constraints associated with conventional MLP architectures. This review categorizes recent progress in KAN-based models across three distinct perspectives: (i) data-driven learning, (ii) physics-informed modeling, and (iii) deep operator learning. Each perspective is examined through the lens of architectural design, training strategies, application efficacy, and comparative evaluation against MLP-based counterparts. By benchmarking KANs against MLPs, we highlight consistent improvements in accuracy, convergence, and spectral representation, clarifying KANs' advantages in capturing complex dynamics while learning more effectively. Finally, this review identifies critical challenges and open research questions in KAN development, particularly regarding computational efficiency, theoretical guarantees, hyperparameter tuning, and algorithm complexity. We also outline future research directions aimed at improving the robustness, scalability, and physical consistency of KAN-based frameworks.

new Multi-Hazard Early Warning Systems for Agriculture with Featural-Temporal Explanations

Authors: Boyuan Zheng, Victor W. Chu

Abstract: Climate extremes present escalating risks to agriculture intensifying the need for reliable multi-hazard early warning systems (EWS). The situation is evolving due to climate change and hence such systems should have the intelligent to continue to learn from recent climate behaviours. However, traditional single-hazard forecasting methods fall short in capturing complex interactions among concurrent climatic events. To address this deficiency, in this paper, we combine sequential deep learning models and advanced Explainable Artificial Intelligence (XAI) techniques to introduce a multi-hazard forecasting framework for agriculture. In our experiments, we utilize meteorological data from four prominent agricultural regions in the United States (between 2010 and 2023) to validate the predictive accuracy of our framework on multiple severe event types, which are extreme cold, floods, frost, hail, heatwaves, and heavy rainfall, with tailored models for each area. The framework uniquely integrates attention mechanisms with TimeSHAP (a recurrent XAI explainer for time series) to provide comprehensive temporal explanations revealing not only which climatic features are influential but precisely when their impacts occur. Our results demonstrate strong predictive accuracy, particularly with the BiLSTM architecture, and highlight the system's capacity to inform nuanced, proactive risk management strategies. This research significantly advances the explainability and applicability of multi-hazard EWS, fostering interdisciplinary trust and effective decision-making process for climate risk management in the agricultural industry.

new FedCVD++: Communication-Efficient Federated Learning for Cardiovascular Risk Prediction with Parametric and Non-Parametric Model Optimization

Authors: Abdelrhman Gaber, Hassan Abd-Eltawab, John Elgallab, Youssif Abuzied, Dineo Mpanya, Turgay Celik, Swarun Kumar, Tamer ElBatt

Abstract: Cardiovascular diseases (CVD) cause over 17 million deaths annually worldwide, highlighting the urgent need for privacy-preserving predictive systems. We introduce FedCVD++, an enhanced federated learning (FL) framework that integrates both parametric models (logistic regression, SVM, neural networks) and non-parametric models (Random Forest, XGBoost) for coronary heart disease risk prediction. To address key FL challenges, we propose: (1) tree-subset sampling that reduces Random Forest communication overhead by 70%, (2) XGBoost-based feature extraction enabling lightweight federated ensembles, and (3) federated SMOTE synchronization for resolving cross-institutional class imbalance. Evaluated on the Framingham dataset (4,238 records), FedCVD++ achieves state-of-the-art results: federated XGBoost (F1 = 0.80) surpasses its centralized counterpart (F1 = 0.78), and federated Random Forest (F1 = 0.81) matches non-federated performance. Additionally, our communication-efficient strategies reduce bandwidth consumption by 3.2X while preserving 95% accuracy. Compared to existing FL frameworks, FedCVD++ delivers up to 15% higher F1-scores and superior scalability for multi-institutional deployment. This work represents the first practical integration of non-parametric models into federated healthcare systems, providing a privacy-preserving solution validated under real-world clinical constraints.

new Planning for Cooler Cities: A Multimodal AI Framework for Predicting and Mitigating Urban Heat Stress through Urban Landscape Transformation

Authors: Shengao Yi, Xiaojiang Li, Wei Tu, Tianhong Zhao

Abstract: As extreme heat events intensify due to climate change and urbanization, cities face increasing challenges in mitigating outdoor heat stress. While traditional physical models such as SOLWEIG and ENVI-met provide detailed assessments of human-perceived heat exposure, their computational demands limit scalability for city-wide planning. In this study, we propose GSM-UTCI, a multimodal deep learning framework designed to predict daytime average Universal Thermal Climate Index (UTCI) at 1-meter hyperlocal resolution. The model fuses surface morphology (nDSM), high-resolution land cover data, and hourly meteorological conditions using a feature-wise linear modulation (FiLM) architecture that dynamically conditions spatial features on atmospheric context. Trained on SOLWEIG-derived UTCI maps, GSM-UTCI achieves near-physical accuracy, with an R2 of 0.9151 and a mean absolute error (MAE) of 0.41{\deg}C, while reducing inference time from hours to under five minutes for an entire city. To demonstrate its planning relevance, we apply GSM-UTCI to simulate systematic landscape transformation scenarios in Philadelphia, replacing bare earth, grass, and impervious surfaces with tree canopy. Results show spatially heterogeneous but consistently strong cooling effects, with impervious-to-tree conversion producing the highest aggregated benefit (-4.18{\deg}C average change in UTCI across 270.7 km2). Tract-level bivariate analysis further reveals strong alignment between thermal reduction potential and land cover proportions. These findings underscore the utility of GSM-UTCI as a scalable, fine-grained decision support tool for urban climate adaptation, enabling scenario-based evaluation of greening strategies across diverse urban environments.

new Stop Evaluating AI with Human Tests, Develop Principled, AI-specific Tests instead

Authors: Tom S\"uhr, Florian E. Dorner, Olawale Salaudeen, Augustin Kelava, Samira Samadi

Abstract: Large Language Models (LLMs) have achieved remarkable results on a range of standardized tests originally designed to assess human cognitive and psychological traits, such as intelligence and personality. While these results are often interpreted as strong evidence of human-like characteristics in LLMs, this paper argues that such interpretations constitute an ontological error. Human psychological and educational tests are theory-driven measurement instruments, calibrated to a specific human population. Applying these tests to non-human subjects without empirical validation, risks mischaracterizing what is being measured. Furthermore, a growing trend frames AI performance on benchmarks as measurements of traits such as ``intelligence'', despite known issues with validity, data contamination, cultural bias and sensitivity to superficial prompt changes. We argue that interpreting benchmark performance as measurements of human-like traits, lacks sufficient theoretical and empirical justification. This leads to our position: Stop Evaluating AI with Human Tests, Develop Principled, AI-specific Tests instead. We call for the development of principled, AI-specific evaluation frameworks tailored to AI systems. Such frameworks might build on existing frameworks for constructing and validating psychometrics tests, or could be created entirely from scratch to fit the unique context of AI.

new Investigating the Invertibility of Multimodal Latent Spaces: Limitations of Optimization-Based Methods

Authors: Siwoo Park

Abstract: This paper investigates the inverse capabilities and broader utility of multimodal latent spaces within task-specific AI (Artificial Intelligence) models. While these models excel at their designed forward tasks (e.g., text-to-image generation, audio-to-text transcription), their potential for inverse mappings remains largely unexplored. We propose an optimization-based framework to infer input characteristics from desired outputs, applying it bidirectionally across Text-Image (BLIP, Flux.1-dev) and Text-Audio (Whisper-Large-V3, Chatterbox-TTS) modalities. Our central hypothesis posits that while optimization can guide models towards inverse tasks, their multimodal latent spaces will not consistently support semantically meaningful and perceptually coherent inverse mappings. Experimental results consistently validate this hypothesis. We demonstrate that while optimization can force models to produce outputs that align textually with targets (e.g., a text-to-image model generating an image that an image captioning model describes correctly, or an ASR model transcribing optimized audio accurately), the perceptual quality of these inversions is chaotic and incoherent. Furthermore, when attempting to infer the original semantic input from generative models, the reconstructed latent space embeddings frequently lack semantic interpretability, aligning with nonsensical vocabulary tokens. These findings highlight a critical limitation. multimodal latent spaces, primarily optimized for specific forward tasks, do not inherently possess the structure required for robust and interpretable inverse mappings. Our work underscores the need for further research into developing truly semantically rich and invertible multimodal latent spaces.

new KLLM: Fast LLM Inference with K-Means Quantization

Authors: Xueying Wu, Baijun Zhou, Zhihui Gao, Yuzhe Fu, Qilin Zheng, Yintao He, Hai Li

Abstract: Large language model (LLM) inference poses significant challenges due to its intensive memory and computation demands. Weight and activation quantization (WAQ) offers a promising solution by reducing both memory footprint and arithmetic complexity. However, two key challenges remain in the existing WAQ designs. (1) Traditional WAQ designs rely on uniform integer-based quantization for hardware efficiency, but this often results in significant accuracy degradation at low precision. K-Means-based quantization, a non-uniform quantization technique, achieves higher accuracy by matching the Gaussian-like distributions of weights and activations in LLMs. However, its non-uniform nature prevents direct execution on low-precision compute units, requiring dequantization and floating-point matrix multiplications (MatMuls) during inference. (2) Activation outliers further hinder effective low-precision WAQ. Offline thresholding methods for outlier detection can lead to significant model performance degradation, while existing online detection techniques introduce substantial runtime overhead. To address the aforementioned challenges and fully unleash the potential of WAQ with K-Means quantization for LLM inference, in this paper, we propose KLLM, a hardware-software co-design framework. KLLM features an index-based computation scheme for efficient execution of MatMuls and nonlinear operations on K-Means-quantized data, which avoids most of the dequantization and full-precision computations. Moreover, KLLM incorporates a novel outlier detection engine, Orizuru, that efficiently identifies the top-$k$ largest and smallest elements in the activation data stream during online inference. Extensive experiments show that, on average, KLLM achieves speedups of 9.67x, 7.03x and energy efficiency improvements of 229.50x, 150.21x compared to the A100 GPU and Atom, respectively.

new Linking Actor Behavior to Process Performance Over Time

Authors: Aur\'elie Leribaux, Rafael Oyamada, Johannes De Smedt, Zahra Dasht Bozorgi, Artem Polyvyanyy, Jochen De Weerdt

Abstract: Understanding how actor behavior influences process outcomes is a critical aspect of process mining. Traditional approaches often use aggregate and static process data, overlooking the temporal and causal dynamics that arise from individual actor behavior. This limits the ability to accurately capture the complexity of real-world processes, where individual actor behavior and interactions between actors significantly shape performance. In this work, we address this gap by integrating actor behavior analysis with Granger causality to identify correlating links in time series data. We apply this approach to realworld event logs, constructing time series for actor interactions, i.e. continuation, interruption, and handovers, and process outcomes. Using Group Lasso for lag selection, we identify a small but consistently influential set of lags that capture the majority of causal influence, revealing that actor behavior has direct and measurable impacts on process performance, particularly throughput time. These findings demonstrate the potential of actor-centric, time series-based methods for uncovering the temporal dependencies that drive process outcomes, offering a more nuanced understanding of how individual behaviors impact overall process efficiency.

new Prediction of Significant Creatinine Elevation in First ICU Stays with Vancomycin Use: A retrospective study through Catboost

Authors: Junyi Fan, Li Sun, Shuheng Chen, Yong Si, Minoo Ahmadi, Greg Placencia, Elham Pishgar, Kamiar Alaei, Maryam Pishgar

Abstract: Background: Vancomycin, a key antibiotic for severe Gram-positive infections in ICUs, poses a high nephrotoxicity risk. Early prediction of kidney injury in critically ill patients is challenging. This study aimed to develop a machine learning model to predict vancomycin-related creatinine elevation using routine ICU data. Methods: We analyzed 10,288 ICU patients (aged 18-80) from the MIMIC-IV database who received vancomycin. Kidney injury was defined by KDIGO criteria (creatinine rise >=0.3 mg/dL within 48h or >=50% within 7d). Features were selected via SelectKBest (top 30) and Random Forest ranking (final 15). Six algorithms were tested with 5-fold cross-validation. Interpretability was evaluated using SHAP, Accumulated Local Effects (ALE), and Bayesian posterior sampling. Results: Of 10,288 patients, 2,903 (28.2%) developed creatinine elevation. CatBoost performed best (AUROC 0.818 [95% CI: 0.801-0.834], sensitivity 0.800, specificity 0.681, negative predictive value 0.900). Key predictors were phosphate, total bilirubin, magnesium, Charlson index, and APSIII. SHAP confirmed phosphate as a major risk factor. ALE showed dose-response patterns. Bayesian analysis estimated mean risk 60.5% (95% credible interval: 16.8-89.4%) in high-risk cases. Conclusions: This machine learning model predicts vancomycin-associated creatinine elevation from routine ICU data with strong accuracy and interpretability, enabling early risk detection and supporting timely interventions in critical care.

new Locally Differentially Private Thresholding Bandits

Authors: Annalisa Barbara, Joseph Lazzaro, Ciara Pike-Burke

Abstract: This work investigates the impact of ensuring local differential privacy in the thresholding bandit problem. We consider both the fixed budget and fixed confidence settings. We propose methods that utilize private responses, obtained through a Bernoulli-based differentially private mechanism, to identify arms with expected rewards exceeding a predefined threshold. We show that this procedure provides strong privacy guarantees and derive theoretical performance bounds on the proposed algorithms. Additionally, we present general lower bounds that characterize the additional loss incurred by any differentially private mechanism, and show that the presented algorithms match these lower bounds up to poly-logarithmic factors. Our results provide valuable insights into privacy-preserving decision-making frameworks in bandit problems.

new A Foundation Model for Material Fracture Prediction

Authors: Agnese Marcato, Aleksandra Pachalieva, Ryley G. Hill, Kai Gao, Xiaoyu Wang, Esteban Rougier, Zhou Lei, Vinamra Agrawal, Janel Chua, Qinjun Kang, Jeffrey D. Hyman, Abigail Hunter, Nathan DeBardeleben, Earl Lawrence, Hari Viswanathan, Daniel O'Malley, Javier E. Santos

Abstract: Accurately predicting when and how materials fail is critical to designing safe, reliable structures, mechanical systems, and engineered components that operate under stress. Yet, fracture behavior remains difficult to model across the diversity of materials, geometries, and loading conditions in real-world applications. While machine learning (ML) methods show promise, most models are trained on narrow datasets, lack robustness, and struggle to generalize. Meanwhile, physics-based simulators offer high-fidelity predictions but are fragmented across specialized methods and require substantial high-performance computing resources to explore the input space. To address these limitations, we present a data-driven foundation model for fracture prediction, a transformer-based architecture that operates across simulators, a wide range of materials (including plastic-bonded explosives, steel, aluminum, shale, and tungsten), and diverse loading conditions. The model supports both structured and unstructured meshes, combining them with large language model embeddings of textual input decks specifying material properties, boundary conditions, and solver settings. This multimodal input design enables flexible adaptation across simulation scenarios without changes to the model architecture. The trained model can be fine-tuned with minimal data on diverse downstream tasks, including time-to-failure estimation, modeling fracture evolution, and adapting to combined finite-discrete element method simulations. It also generalizes to unseen materials such as titanium and concrete, requiring as few as a single sample, dramatically reducing data needs compared to standard ML. Our results show that fracture prediction can be unified under a single model architecture, offering a scalable, extensible alternative to simulator-specific workflows.

new On the Sustainability of AI Inferences in the Edge

Authors: Ghazal Sobhani, Md. Monzurul Amin Ifath, Tushar Sharma, Israat Haque

Abstract: The proliferation of the Internet of Things (IoT) and its cutting-edge AI-enabled applications (e.g., autonomous vehicles and smart industries) combine two paradigms: data-driven systems and their deployment on the edge. Usually, edge devices perform inferences to support latency-critical applications. In addition to the performance of these resource-constrained edge devices, their energy usage is a critical factor in adopting and deploying edge applications. Examples of such devices include Raspberry Pi (RPi), Intel Neural Compute Stick (INCS), NVIDIA Jetson nano (NJn), and Google Coral USB (GCU). Despite their adoption in edge deployment for AI inferences, there is no study on their performance and energy usage for informed decision-making on the device and model selection to meet the demands of applications. This study fills the gap by rigorously characterizing the performance of traditional, neural networks, and large language models on the above-edge devices. Specifically, we analyze trade-offs among model F1 score, inference time, inference power, and memory usage. Hardware and framework optimization, along with external parameter tuning of AI models, can balance between model performance and resource usage to realize practical edge AI deployments.

new Scalable Generative Modeling of Weighted Graphs

Authors: Richard Williams, Eric Nalisnick, Andrew Holbrook

Abstract: Weighted graphs are ubiquitous throughout biology, chemistry, and the social sciences, motivating the development of generative models for abstract weighted graph data using deep neural networks. However, most current deep generative models are either designed for unweighted graphs and are not easily extended to weighted topologies or incorporate edge weights without consideration of a joint distribution with topology. Furthermore, learning a distribution over weighted graphs must account for complex nonlocal dependencies between both the edges of the graph and corresponding weights of each edge. We develop an autoregressive model BiGG-E, a nontrivial extension of the BiGG model, that learns a joint distribution over weighted graphs while still exploiting sparsity to generate a weighted graph with $n$ nodes and $m$ edges in $O((n + m)\log n)$ time. Simulation studies and experiments on a variety of benchmark datasets demonstrate that BiGG-E best captures distributions over weighted graphs while remaining scalable and computationally efficient.

new FLOSS: Federated Learning with Opt-Out and Straggler Support

Authors: David J Goetze, Dahlia J Felten, Jeannie R Albrecht, Rohit Bhattacharya

Abstract: Previous work on data privacy in federated learning systems focuses on privacy-preserving operations for data from users who have agreed to share their data for training. However, modern data privacy agreements also empower users to use the system while opting out of sharing their data as desired. When combined with stragglers that arise from heterogeneous device capabilities, the result is missing data from a variety of sources that introduces bias and degrades model performance. In this paper, we present FLOSS, a system that mitigates the impacts of such missing data on federated learning in the presence of stragglers and user opt-out, and empirically demonstrate its performance in simulations.

new Evaluating and Improving the Robustness of Speech Command Recognition Models to Noise and Distribution Shifts

Authors: Ana\"is Baranger, Lucas Maison

Abstract: Although prior work in computer vision has shown strong correlations between in-distribution (ID) and out-of-distribution (OOD) accuracies, such relationships remain underexplored in audio-based models. In this study, we investigate how training conditions and input features affect the robustness and generalization abilities of spoken keyword classifiers under OOD conditions. We benchmark several neural architectures across a variety of evaluation sets. To quantify the impact of noise on generalization, we make use of two metrics: Fairness (F), which measures overall accuracy gains compared to a baseline model, and Robustness (R), which assesses the convergence between ID and OOD performance. Our results suggest that noise-aware training improves robustness in some configurations. These findings shed new light on the benefits and limitations of noise-based augmentation for generalization in speech models.

new Observational Multiplicity

Authors: Erin George, Deanna Needell, Berk Ustun

Abstract: Many prediction tasks can admit multiple models that can perform almost equally well. This phenomenon can can undermine interpretability and safety when competing models assign conflicting predictions to individuals. In this work, we study how arbitrariness can arise in probabilistic classification tasks as a result of an effect that we call \emph{observational multiplicity}. We discuss how this effect arises in a broad class of practical applications where we learn a classifier to predict probabilities $p_i \in [0,1]$ but are given a dataset of observations $y_i \in \{0,1\}$. We propose to evaluate the arbitrariness of individual probability predictions through the lens of \emph{regret}. We introduce a measure of regret for probabilistic classification tasks, which measures how the predictions of a model could change as a result of different training labels change. We present a general-purpose method to estimate the regret in a probabilistic classification task. We use our measure to show that regret is higher for certain groups in the dataset and discuss potential applications of regret. We demonstrate how estimating regret promote safety in real-world applications by abstention and data collection.

new AI paradigm for solving differential equations: first-principles data generation and scale-dilation operator AI solver

Authors: Xiangshu Gong, Zhiqiang Xie, Xiaowei Jin, Chen Wang, Yanling Qu, Wangmeng Zuo, Hui Li

Abstract: Many problems are governed by differential equations (DEs). Artificial intelligence (AI) is a new path for solving DEs. However, data is very scarce and existing AI solvers struggle with approximation of high frequency components (AHFC). We propose an AI paradigm for solving diverse DEs, including DE-ruled first-principles data generation methodology and scale-dilation operator (SDO) AI solver. Using either prior knowledge or random fields, we generate solutions and then substitute them into the DEs to derive the sources and initial/boundary conditions through balancing DEs, thus producing arbitrarily vast amount of, first-principles-consistent training datasets at extremely low computational cost. We introduce a reversible SDO that leverages the Fourier transform of the multiscale solutions to fix AHFC, and design a spatiotemporally coupled, attention-based Transformer AI solver of DEs with SDO. An upper bound on the Hessian condition number of the loss function is proven to be proportional to the squared 2-norm of the solution gradient, revealing that SDO yields a smoother loss landscape, consequently fixing AHFC with efficient training. Extensive tests on diverse DEs demonstrate that our AI paradigm achieves consistently superior accuracy over state-of-the-art methods. This work makes AI solver of DEs to be truly usable in broad nature and engineering fields.

new FuseTen: A Generative Model for Daily 10 m Land Surface Temperature Estimation from Spatio-Temporal Satellite Observations

Authors: Sofiane Bouaziz, Adel Hafiane, Raphael Canals, Rachid Nedjai

Abstract: Urban heatwaves, droughts, and land degradation are pressing and growing challenges in the context of climate change. A valuable approach to studying them requires accurate spatio-temporal information on land surface conditions. One of the most important variables for assessing and understanding these phenomena is Land Surface Temperature (LST), which is derived from satellites and provides essential information about the thermal state of the Earth's surface. However, satellite platforms inherently face a trade-off between spatial and temporal resolutions. To bridge this gap, we propose FuseTen, a novel generative framework that produces daily LST observations at a fine 10 m spatial resolution by fusing spatio-temporal observations derived from Sentinel-2, Landsat 8, and Terra MODIS. FuseTen employs a generative architecture trained using an averaging-based supervision strategy grounded in physical principles. It incorporates attention and normalization modules within the fusion process and uses a PatchGAN discriminator to enforce realism. Experiments across multiple dates show that FuseTen outperforms linear baselines, with an average 32.06% improvement in quantitative metrics and 31.42% in visual fidelity. To the best of our knowledge, this is the first non-linear method to generate daily LST estimates at such fine spatial resolution.

new BAR Conjecture: the Feasibility of Inference Budget-Constrained LLM Services with Authenticity and Reasoning

Authors: Jinan Zhou, Rajat Ghosh, Vaishnavi Bhargava, Debojyoti Dutta, Aryan Singhal

Abstract: When designing LLM services, practitioners care about three key properties: inference-time budget, factual authenticity, and reasoning capacity. However, our analysis shows that no model can simultaneously optimize for all three. We formally prove this trade-off and propose a principled framework named The BAR Theorem for LLM-application design.

new NaN-Propagation: A Novel Method for Sparsity Detection in Black-Box Computational Functions

Authors: Peter Sharpe

Abstract: Sparsity detection in black-box functions enables significant computational speedups in gradient-based optimization through Jacobian compression, but existing finite-difference methods suffer from false negatives due to coincidental zero gradients. These false negatives can silently corrupt gradient calculations, leading to difficult-to-diagnose errors. We introduce NaN-propagation, which exploits the universal contamination property of IEEE 754 Not-a-Number floating-point values to trace input-output dependencies through floating-point numerical computations. By systematically contaminating inputs with NaN and observing which outputs become NaN, the method reconstructs conservative sparsity patterns that eliminate false negatives. We demonstrate the approach on an aerospace wing weight model, achieving a 1.52x speedup while detecting dozens of dependencies missed by conventional methods -- a significant improvement since gradient computation is the bottleneck in many optimization workflows. The technique leverages IEEE 754 compliance to work across programming languages and math libraries without modifying existing black-box codes. Advanced strategies including NaN payload encoding enable faster-than-linear time complexity, improving upon existing black-box sparsity detection methods. Practical algorithms are also proposed to mitigate challenges from branching code execution common in engineering applications.

new Zero-Shot Document Understanding using Pseudo Table of Contents-Guided Retrieval-Augmented Generation

Authors: Hyeon Seong Jeong, Sangwoo Jo, Byeong Hyun Yoon, Yoonseok Heo, Haedong Jeong, Taehoon Kim

Abstract: Understanding complex multimodal documents remains challenging due to their structural inconsistencies and limited training data availability. We introduce \textit{DocsRay}, a training-free document understanding system that integrates pseudo Table of Contents (TOC) generation with hierarchical Retrieval-Augmented Generation (RAG). Our approach leverages multimodal Large Language Models' (LLMs) native capabilities to seamlessly process documents containing diverse elements such as text, images, charts, and tables without requiring specialized models or additional training. DocsRay's framework synergistically combines three key techniques: (1) a semantic structuring module using prompt-based LLM interactions to generate a hierarchical pseudo-TOC, (2) zero-shot multimodal analysis that converts diverse document elements into unified, text-centric representations using the inherent capabilities of multimodal LLMs, and (3) an efficient two-stage hierarchical retrieval system that reduces retrieval complexity from $O(N)$ to $O(S + k_1 \cdot N_s)$. Evaluated on documents averaging 49.4 pages and 20,971 textual tokens, DocsRay reduced query latency from 3.89 to 2.12 seconds, achieving a 45% efficiency improvement. On the MMLongBench-Doc benchmark, DocsRay-Pro attains an accuracy of 64.7%, substantially surpassing previous state-of-the-art results.

new A Single Direction of Truth: An Observer Model's Linear Residual Probe Exposes and Steers Contextual Hallucinations

Authors: Charles O'Neill, Slava Chalnev, Chi Chi Zhao, Max Kirkby, Mudith Jayasekara

Abstract: Contextual hallucinations -- statements unsupported by given context -- remain a significant challenge in AI. We demonstrate a practical interpretability insight: a generator-agnostic observer model detects hallucinations via a single forward pass and a linear probe on its residual stream. This probe isolates a single, transferable linear direction separating hallucinated from faithful text, outperforming baselines by 5-27 points and showing robust mid-layer performance across Gemma-2 models (2B to 27B). Gradient-times-activation localises this signal to sparse, late-layer MLP activity. Critically, manipulating this direction causally steers generator hallucination rates, proving its actionability. Our results offer novel evidence of internal, low-dimensional hallucination tracking linked to specific MLP sub-circuits, exploitable for detection and mitigation. We release the 2000-example ContraTales benchmark for realistic assessment of such solutions.

new Efficient Machine Unlearning via Influence Approximation

Authors: Jiawei Liu, Chenwang Wu, Defu Lian, Enhong Chen

Abstract: Due to growing privacy concerns, machine unlearning, which aims at enabling machine learning models to ``forget" specific training data, has received increasing attention. Among existing methods, influence-based unlearning has emerged as a prominent approach due to its ability to estimate the impact of individual training samples on model parameters without retraining. However, this approach suffers from prohibitive computational overhead arising from the necessity to compute the Hessian matrix and its inverse across all training samples and parameters, rendering it impractical for large-scale models and scenarios involving frequent data deletion requests. This highlights the difficulty of forgetting. Inspired by cognitive science, which suggests that memorizing is easier than forgetting, this paper establishes a theoretical link between memorizing (incremental learning) and forgetting (unlearning). This connection allows machine unlearning to be addressed from the perspective of incremental learning. Unlike the time-consuming Hessian computations in unlearning (forgetting), incremental learning (memorizing) typically relies on more efficient gradient optimization, which supports the aforementioned cognitive theory. Based on this connection, we introduce the Influence Approximation Unlearning (IAU) algorithm for efficient machine unlearning from the incremental perspective. Extensive empirical evaluations demonstrate that IAU achieves a superior balance among removal guarantee, unlearning efficiency, and comparable model utility, while outperforming state-of-the-art methods across diverse datasets and model architectures. Our code is available at https://github.com/Lolo1222/IAU.

URLs: https://github.com/Lolo1222/IAU.

new DynaSwarm: Dynamically Graph Structure Selection for LLM-based Multi-agent System

Authors: Hui Yi Leong, Yuqing Wu

Abstract: Current multi-agent systems (MAS) frameworks often rely on manually designed and static collaboration graph structures, limiting adaptability and performance. To address these limitations, we propose DynaSwarm, a dynamic framework that enhances LLM-based MAS through two key innovations: (1) an actor-critic reinforcement learning (A2C) mechanism to optimize graph structures with improved stability over prior RL methods, and (2) a dynamic graph selector that adaptively chooses the optimal graph structure for each input sample via parameter-efficient LLM fine-tuning. DynaSwarm eliminates the need for rigid, one-fits-all graph architectures, instead leveraging sample-specific idiosyncrasies to dynamically route queries through specialized agent networks. (c) We propose to fine-tune the demonstration retriever to fully exploit the power of in-context learning (ICL). Extensive experiments on question answering, mathematical reasoning, and coding tasks demonstrate that DynaSwarm consistently outperforms state-of-the-art single-agent and MAS baselines across multiple LLM backbones. Our findings highlight the importance of sample-aware structural flexibility in LLM MAS designs.

new Evaluating the Dynamics of Membership Privacy in Deep Learning

Authors: Yuetian Chen, Zhiqi Wang, Nathalie Baracaldo, Swanand Ravindra Kadhe, Lei Yu

Abstract: Membership inference attacks (MIAs) pose a critical threat to the privacy of training data in deep learning. Despite significant progress in attack methodologies, our understanding of when and how models encode membership information during training remains limited. This paper presents a dynamic analytical framework for dissecting and quantifying privacy leakage dynamics at the individual sample level. By tracking per-sample vulnerabilities on an FPR-TPR plane throughout training, our framework systematically measures how factors such as dataset complexity, model architecture, and optimizer choice influence the rate and severity at which samples become vulnerable. Crucially, we discover a robust correlation between a sample's intrinsic learning difficulty, and find that the privacy risk of samples highly vulnerable in the final trained model is largely determined early during training. Our results thus provide a deeper understanding of how privacy risks dynamically emerge during training, laying the groundwork for proactive, privacy-aware model training strategies.

new SequenceLayers: Sequence Processing and Streaming Neural Networks Made Easy

Authors: RJ Skerry-Ryan, Julian Salazar, Soroosh Mariooryad, David Kao, Daisy Stanton, Eric Battenberg, Matt Shannon, Ron J. Weiss, Robin Scheibler, Jonas Rothfuss, Tom Bagby

Abstract: We introduce a neural network layer API and library for sequence modeling, designed for easy creation of sequence models that can be executed both layer-by-layer (e.g., teacher-forced training) and step-by-step (e.g., autoregressive sampling). To achieve this, layers define an explicit representation of their state over time (e.g., a Transformer KV cache, a convolution buffer, an RNN hidden state), and a step method that evolves that state, tested to give identical results to a stateless layer-wise invocation. This and other aspects of the SequenceLayers contract enables complex models to be immediately streamable, mitigates a wide range of common bugs arising in both streaming and parallel sequence processing, and can be implemented in any deep learning library. A composable and declarative API, along with a comprehensive suite of layers and combinators, streamlines the construction of production-scale models from simple streamable components while preserving strong correctness guarantees. Our current implementations of SequenceLayers (JAX, TensorFlow 2) are available at https://github.com/google/sequence-layers.

URLs: https://github.com/google/sequence-layers.

new An Interpretable Data-Driven Unsupervised Approach for the Prevention of Forgotten Items

Authors: Luca Corbucci, Javier Alejandro Borges Legrottaglie, Francesco Spinnato, Anna Monreale, Riccardo Guidotti

Abstract: Accurately identifying items forgotten during a supermarket visit and providing clear, interpretable explanations for recommending them remains an underexplored problem within the Next Basket Prediction (NBP) domain. Existing NBP approaches typically only focus on forecasting future purchases, without explicitly addressing the detection of unintentionally omitted items. This gap is partly due to the scarcity of real-world datasets that allow for the reliable estimation of forgotten items. Furthermore, most current NBP methods rely on black-box models, which lack transparency and limit the ability to justify recommendations to end users. In this paper, we formally introduce the forgotten item prediction task and propose two novel interpretable-by-design algorithms. These methods are tailored to identify forgotten items while offering intuitive, human-understandable explanations. Experiments on a real-world retail dataset show our algorithms outperform state-of-the-art NBP baselines by 10-15% across multiple evaluation metrics.

new Good Learners Think Their Thinking: Generative PRM Makes Large Reasoning Model More Efficient Math Learner

Authors: Tao He, Rongchuan Mu, Lizi Liao, Yixin Cao, Ming Liu, Bing Qin

Abstract: Large reasoning models (LRMs) have recently shown promise in solving complex math problems when optimized with Reinforcement Learning (RL). But conventional approaches rely on outcome-only rewards that provide sparse feedback, resulting in inefficient optimization process. In this work, we investigate the function of process reward models (PRMs) to accelerate the RL training for LRMs. We propose a novel intrinsic signal-driven generative process evaluation mechanism operating at the thought level to address major bottlenecks in RL-based training. Specifically, instead of requiring PRMs to know how to solve problems, our method uses intrinsic signals in solutions to judge stepwise correctness and aggregate contiguous correct/incorrect steps into coherent 'thought' units. This structured, thought-level rewards enable more reliable credit assignment by reducing ambiguity in step segmentation and alleviating reward hacking. We further introduce a capability-adaptive reward mechanism that dynamically balances exploration and exploitation based on the LRM's current proficiency, guiding learning without stifling creative trial-and-error. These innovations are integrated into a new off-policy RL algorithm, TP-GRPO, which extends grouped proximal optimization with process-based rewards and improves training efficiency. Experiments on 1.5B and 7B parameter LRMs demonstrate that our method achieves higher problem-solving accuracy with significantly fewer training samples than outcome-only reward baselines. The results validate that well-structured process rewards can substantially accelerate LRM optimization in math reasoning tasks. Code is available at https://github.com/cs-holder/tp_grpo.

URLs: https://github.com/cs-holder/tp_grpo.

new Scalable and Precise Patch Robustness Certification for Deep Learning Models with Top-k Predictions

Authors: Qilin Zhou, Haipeng Wang, Zhengyuan Wei, W. K. Chan

Abstract: Patch robustness certification is an emerging verification approach for defending against adversarial patch attacks with provable guarantees for deep learning systems. Certified recovery techniques guarantee the prediction of the sole true label of a certified sample. However, existing techniques, if applicable to top-k predictions, commonly conduct pairwise comparisons on those votes between labels, failing to certify the sole true label within the top k prediction labels precisely due to the inflation on the number of votes controlled by the attacker (i.e., attack budget); yet enumerating all combinations of vote allocation suffers from the combinatorial explosion problem. We propose CostCert, a novel, scalable, and precise voting-based certified recovery defender. CostCert verifies the true label of a sample within the top k predictions without pairwise comparisons and combinatorial explosion through a novel design: whether the attack budget on the sample is infeasible to cover the smallest total additional votes on top of the votes uncontrollable by the attacker to exclude the true labels from the top k prediction labels. Experiments show that CostCert significantly outperforms the current state-of-the-art defender PatchGuard, such as retaining up to 57.3% in certified accuracy when the patch size is 96, whereas PatchGuard has already dropped to zero.

new Designing Dynamic Pricing for Bike-sharing Systems via Differentiable Agent-based Simulation

Authors: Tatsuya Mitomi, Fumiyasu Makinoshima, Fumiya Makihara, Eigo Segawa

Abstract: Bike-sharing systems are emerging in various cities as a new ecofriendly transportation system. In these systems, spatiotemporally varying user demands lead to imbalanced inventory at bicycle stations, resulting in additional relocation costs. Therefore, it is essential to manage user demand through optimal dynamic pricing for the system. However, optimal pricing design for such a system is challenging because the system involves users with diverse backgrounds and their probabilistic choices. To address this problem, we develop a differentiable agent-based simulation to rapidly design dynamic pricing in bike-sharing systems, achieving balanced bicycle inventory despite spatiotemporally heterogeneous trips and probabilistic user decisions. We first validate our approach against conventional methods through numerical experiments involving 25 bicycle stations and five time slots, yielding 100 parameters. Compared to the conventional methods, our approach obtains a more accurate solution with a 73% to 78% reduction in loss while achieving more than a 100-fold increase in convergence speed. We further validate our approach on a large-scale urban bike-sharing system scenario involving 289 bicycle stations, resulting in a total of 1156 parameters. Through simulations using the obtained pricing policies, we confirm that these policies can naturally induce balanced inventory without any manual relocation. Additionally, we find that the cost of discounts to induce the balanced inventory can be minimized by setting appropriate initial conditions.

new Causal Explanation of Concept Drift -- A Truly Actionable Approach

Authors: David Komnick, Kathrin Lammers, Barbara Hammer, Valerie Vaquet, Fabian Hinder

Abstract: In a world that constantly changes, it is crucial to understand how those changes impact different systems, such as industrial manufacturing or critical infrastructure. Explaining critical changes, referred to as concept drift in the field of machine learning, is the first step towards enabling targeted interventions to avoid or correct model failures, as well as malfunctions and errors in the physical world. Therefore, in this work, we extend model-based drift explanations towards causal explanations, which increases the actionability of the provided explanations. We evaluate our explanation strategy on a number of use cases, demonstrating the practical usefulness of our framework, which isolates the causally relevant features impacted by concept drift and, thus, allows for targeted intervention.

new Policy Learning from Large Vision-Language Model Feedback without Reward Modeling

Authors: Tung M. Luu, Donghoon Lee, Younghwan Lee, Chang D. Yoo

Abstract: Offline reinforcement learning (RL) provides a powerful framework for training robotic agents using pre-collected, suboptimal datasets, eliminating the need for costly, time-consuming, and potentially hazardous online interactions. This is particularly useful in safety-critical real-world applications, where online data collection is expensive and impractical. However, existing offline RL algorithms typically require reward labeled data, which introduces an additional bottleneck: reward function design is itself costly, labor-intensive, and requires significant domain expertise. In this paper, we introduce PLARE, a novel approach that leverages large vision-language models (VLMs) to provide guidance signals for agent training. Instead of relying on manually designed reward functions, PLARE queries a VLM for preference labels on pairs of visual trajectory segments based on a language task description. The policy is then trained directly from these preference labels using a supervised contrastive preference learning objective, bypassing the need to learn explicit reward models. Through extensive experiments on robotic manipulation tasks from the MetaWorld, PLARE achieves performance on par with or surpassing existing state-of-the-art VLM-based reward generation methods. Furthermore, we demonstrate the effectiveness of PLARE in real-world manipulation tasks with a physical robot, further validating its practical applicability.

new A Machine Learning Approach for Honey Adulteration Detection using Mineral Element Profiles

Authors: Mokhtar A. Al-Awadhi, Ratnadeep R. Deshmukh

Abstract: This paper aims to develop a Machine Learning (ML)-based system for detecting honey adulteration utilizing honey mineral element profiles. The proposed system comprises two phases: preprocessing and classification. The preprocessing phase involves the treatment of missing-value attributes and normalization. In the classifica-tion phase, we use three supervised ML models: logistic regression, decision tree, and random forest, to dis-criminate between authentic and adulterated honey. To evaluate the performance of the ML models, we use a public dataset comprising measurements of mineral element content of authentic honey, sugar syrups, and adul-terated honey. Experimental findings show that mineral element content in honey provides robust discriminative information for detecting honey adulteration. Results also demonstrate that the random forest-based classifier outperforms other classifiers on this dataset, achieving the highest cross-validation accuracy of 98.37%.

new Detection of Adulteration in Coconut Milk using Infrared Spectroscopy and Machine Learning

Authors: Mokhtar A. Al-Awadhi, Ratnadeep R. Deshmukh

Abstract: In this paper, we propose a system for detecting adulteration in coconut milk, utilizing infrared spectroscopy. The machine learning-based proposed system comprises three phases: preprocessing, feature extraction, and classification. The first phase involves removing irrelevant data from coconut milk spectral signals. In the second phase, we employ the Linear Discriminant Analysis (LDA) algorithm for extracting the most discriminating features. In the third phase, we use the K-Nearest Neighbor (KNN) model to classify coconut milk samples into authentic or adulterated. We evaluate the performance of the proposed system using a public dataset comprising Fourier Transform Infrared (FTIR) spectral information of pure and contaminated coconut milk samples. Findings show that the proposed method successfully detects adulteration with a cross-validation accuracy of 93.33%.

new Merging Memory and Space: A Spatiotemporal State Space Neural Operator

Authors: Nodens F. Koren, Samuel Lanthaler

Abstract: We propose the Spatiotemporal State Space Neural Operator (ST-SSM), a compact architecture for learning solution operators of time-dependent partial differential equations (PDEs). ST-SSM introduces a novel factorization of the spatial and temporal dimensions, using structured state-space models to independently model temporal evolution and spatial interactions. This design enables parameter efficiency and flexible modeling of long-range spatiotemporal dynamics. A theoretical connection is established between SSMs and neural operators, and a unified universality theorem is proved for the resulting class of architectures. Empirically, we demonstrate that our factorized formulation outperforms alternative schemes such as zigzag scanning and parallel independent processing on several PDE benchmarks, including 1D Burgers' equation, 1D Kuramoto-Sivashinsky equation, and 2D Navier-Stokes equations under varying physical conditions. Our model performs competitively with existing baselines while using significantly fewer parameters. In addition, our results reinforce previous findings on the benefits of temporal memory by showing improved performance under partial observability. Our results highlight the advantages of dimensionally factorized operator learning for efficient and generalizable PDE modeling, and put this approach on a firm theoretical footing.

new Coflex: Enhancing HW-NAS with Sparse Gaussian Processes for Efficient and Scalable DNN Accelerator Design

Authors: Yinhui Ma, Tomomasa Yamasaki, Zhehui Wang, Tao Luo, Bo Wang

Abstract: Hardware-Aware Neural Architecture Search (HW-NAS) is an efficient approach to automatically co-optimizing neural network performance and hardware energy efficiency, making it particularly useful for the development of Deep Neural Network accelerators on the edge. However, the extensive search space and high computational cost pose significant challenges to its practical adoption. To address these limitations, we propose Coflex, a novel HW-NAS framework that integrates the Sparse Gaussian Process (SGP) with multi-objective Bayesian optimization. By leveraging sparse inducing points, Coflex reduces the GP kernel complexity from cubic to near-linear with respect to the number of training samples, without compromising optimization performance. This enables scalable approximation of large-scale search space, substantially decreasing computational overhead while preserving high predictive accuracy. We evaluate the efficacy of Coflex across various benchmarks, focusing on accelerator-specific architecture. Our experi- mental results show that Coflex outperforms state-of-the-art methods in terms of network accuracy and Energy-Delay-Product, while achieving a computational speed-up ranging from 1.9x to 9.5x.

new Manifold-regularised Signature Kernel Large-Margin $\ell_p$-SVDD for Multidimensional Time Series Anomaly Detection

Authors: Shervin Rahimzadeh Arashloo

Abstract: We generalise the recently introduced large-margin $\ell_p$-SVDD approach to exploit the geometry of data distribution via manifold regularising and a signature kernel representation for time series anomaly detection. Specifically, we formulate a manifold-regularised variant of the $\ell_p$-SVDD method to encourage label smoothness on the underlying manifold to capture structural information for improved detection performance. Drawing on an existing Representer theorem, we then provide an effective optimisation technique for the proposed method and show that it can benefit from the signature kernel to capture time series complexities for anomaly detection. We theoretically study the proposed approach using Rademacher complexities to analyse its generalisation performance and also provide an experimental assessment of the proposed method across various data sets to compare its performance against other methods.

new Explainable artificial intelligence model predicting the risk of all-cause mortality in patients with type 2 diabetes mellitus

Authors: Olga Vershinina, Jacopo Sabbatinelli, Anna Rita Bonfigli, Dalila Colombaretti, Angelica Giuliani, Mikhail Krivonosov, Arseniy Trukhanov, Claudio Franceschi, Mikhail Ivanchenko, Fabiola Olivieri

Abstract: Objective. Type 2 diabetes mellitus (T2DM) is a highly prevalent non-communicable chronic disease that substantially reduces life expectancy. Accurate estimation of all-cause mortality risk in T2DM patients is crucial for personalizing and optimizing treatment strategies. Research Design and Methods. This study analyzed a cohort of 554 patients (aged 40-87 years) with diagnosed T2DM over a maximum follow-up period of 16.8 years, during which 202 patients (36%) died. Key survival-associated features were identified, and multiple machine learning (ML) models were trained and validated to predict all-cause mortality risk. To improve model interpretability, Shapley additive explanations (SHAP) was applied to the best-performing model. Results. The extra survival trees (EST) model, incorporating ten key features, demonstrated the best predictive performance. The model achieved a C-statistic of 0.776, with the area under the receiver operating characteristic curve (AUC) values of 0.86, 0.80, 0.841, and 0.826 for 5-, 10-, 15-, and 16.8-year all-cause mortality predictions, respectively. The SHAP approach was employed to interpret the model's individual decision-making processes. Conclusions. The developed model exhibited strong predictive performance for mortality risk assessment. Its clinically interpretable outputs enable potential bedside application, improving the identification of high-risk patients and supporting timely treatment optimization.

new Incorporating structural uncertainty in causal decision making

Authors: Maurits Kaptein

Abstract: Practitioners making decisions based on causal effects typically ignore structural uncertainty. We analyze when this uncertainty is consequential enough to warrant methodological solutions (Bayesian model averaging over competing causal structures). Focusing on bivariate relationships ($X \rightarrow Y$ vs. $X \leftarrow Y$), we establish that model averaging is beneficial when: (1) structural uncertainty is moderate to high, (2) causal effects differ substantially between structures, and (3) loss functions are sufficiently sensitive to the size of the causal effect. We prove optimality results of our suggested methodological solution under regularity conditions and demonstrate through simulations that modern causal discovery methods can provide, within limits, the necessary quantification. Our framework complements existing robust causal inference approaches by addressing a distinct source of uncertainty typically overlooked in practice.

new Directional Ensemble Aggregation for Actor-Critics

Authors: Nicklas Werge, Yi-Shan Wu, Bahareh Tasdighi, Melih Kandemir

Abstract: Off-policy reinforcement learning in continuous control tasks depends critically on accurate $Q$-value estimates. Conservative aggregation over ensembles, such as taking the minimum, is commonly used to mitigate overestimation bias. However, these static rules are coarse, discard valuable information from the ensemble, and cannot adapt to task-specific needs or different learning regimes. We propose Directional Ensemble Aggregation (DEA), an aggregation method that adaptively combines $Q$-value estimates in actor-critic frameworks. DEA introduces two fully learnable directional parameters: one that modulates critic-side conservatism and another that guides actor-side policy exploration. Both parameters are learned using ensemble disagreement-weighted Bellman errors, which weight each sample solely by the direction of its Bellman error. This directional learning mechanism allows DEA to adjust conservatism and exploration in a data-driven way, adapting aggregation to both uncertainty levels and the phase of training. We evaluate DEA across continuous control benchmarks and learning regimes - from interactive to sample-efficient - and demonstrate its effectiveness over static ensemble strategies.

new A Verifier Hierarchy

Authors: Maurits Kaptein

Abstract: We investigate the trade-off between certificate length and verifier runtime. We prove a Verifier Trade-off Theorem showing that reducing the inherent verification time of a language from $f(n)$ to $g(n)$, where $f(n) \ge g(n)$, requires certificates of length at least $\Omega(\log(f(n) / g(n)))$. This theorem induces a natural hierarchy based on certificate complexity. We demonstrate its applicability to analyzing conjectured separations between complexity classes (e.g., $\np$ and $\exptime$) and to studying natural problems such as string periodicity and rotation detection. Additionally, we provide perspectives on the $\p$ vs. $\np$ problem by relating it to the existence of sub-linear certificates.

new Differentially Private Clipped-SGD: High-Probability Convergence with Arbitrary Clipping Level

Authors: Saleh Vatan Khah, Savelii Chezhegov, Shahrokh Farahmand, Samuel Horv\'ath, Eduard Gorbunov

Abstract: Gradient clipping is a fundamental tool in Deep Learning, improving the high-probability convergence of stochastic first-order methods like SGD, AdaGrad, and Adam under heavy-tailed noise, which is common in training large language models. It is also a crucial component of Differential Privacy (DP) mechanisms. However, existing high-probability convergence analyses typically require the clipping threshold to increase with the number of optimization steps, which is incompatible with standard DP mechanisms like the Gaussian mechanism. In this work, we close this gap by providing the first high-probability convergence analysis for DP-Clipped-SGD with a fixed clipping level, applicable to both convex and non-convex smooth optimization under heavy-tailed noise, characterized by a bounded central $\alpha$-th moment assumption, $\alpha \in (1,2]$. Our results show that, with a fixed clipping level, the method converges to a neighborhood of the optimal solution with a faster rate than the existing ones. The neighborhood can be balanced against the noise introduced by DP, providing a refined trade-off between convergence speed and privacy guarantees.

new Continual Learning with Synthetic Boundary Experience Blending

Authors: Chih-Fan Hsu, Ming-Ching Chang, Wei-Chao Chen

Abstract: Continual learning (CL) aims to address catastrophic forgetting in models trained sequentially on multiple tasks. While experience replay has shown promise, its effectiveness is often limited by the sparse distribution of stored key samples, leading to overly simplified decision boundaries. We hypothesize that introducing synthetic data near the decision boundary (Synthetic Boundary Data, or SBD) during training serves as an implicit regularizer, improving boundary stability and mitigating forgetting. To validate this hypothesis, we propose a novel training framework, {\bf Experience Blending}, which integrates knowledge from both stored key samples and synthetic, boundary-adjacent data. Experience blending consists of two core components: (1) a multivariate Differential Privacy (DP) noise mechanism that injects batch-wise noise into low-dimensional feature representations, generating SBD; and (2) an end-to-end training strategy that jointly leverages both stored key samples and SBD. Extensive experiments on CIFAR-10, CIFAR-100, and Tiny ImageNet demonstrate that our method outperforms nine CL baselines, achieving accuracy improvements of 10%, 6%, and 13%, respectively.

new Transparent AI: The Case for Interpretability and Explainability

Authors: Dhanesh Ramachandram, Himanshu Joshi, Judy Zhu, Dhari Gandhi, Lucas Hartman, Ananya Raval

Abstract: As artificial intelligence systems increasingly inform high-stakes decisions across sectors, transparency has become foundational to responsible and trustworthy AI implementation. Leveraging our role as a leading institute in advancing AI research and enabling industry adoption, we present key insights and lessons learned from practical interpretability applications across diverse domains. This paper offers actionable strategies and implementation guidance tailored to organizations at varying stages of AI maturity, emphasizing the integration of interpretability as a core design principle rather than a retrospective add-on.

new From LLMs to Edge: Parameter-Efficient Fine-Tuning on Edge Devices

Authors: Georg Slamanig, Francesco Corti, Olga Saukh

Abstract: Parameter-efficient fine-tuning (PEFT) methods reduce the computational costs of updating deep learning models by minimizing the number of additional parameters used to adapt a model to a down- stream task. While extensively researched in large language models (LLMs), their application to smaller models used on edge devices, such as convolutional neural networks, remains underexplored. This paper benchmarks and analyzes popular PEFT methods on convolutional architectures typically deployed in resource-constrained edge environments. We evaluate LoRA, DoRA, and GaLore for updating standard and depthwise convolutional architectures to handle distribution shifts and accommodate unseen classes. We utilize recently proposed PyTorch profilers to compare the updated model performance and computational costs of these PEFT methods with traditional fine-tuning approaches. With resource efficiency in mind, we investigate their update behavior across different rank dimensions. We find that the evaluated PEFT methods are only half as memory-efficient when applied to depthwise-separable convolution architectures, compared to their efficiency with LLMs. Conversely, when targeting convolu- tional architectures optimized for edge deployment, adapter-based PEFT methods can reduce floating point operations (FLOPs) during model updates by up to 95%. These insights offer valuable guidance for selecting PEFT methods based on hardware constraints, performance requirements, and application needs. Our code is online.

new Improved Algorithms for Kernel Matrix-Vector Multiplication Under Sparsity Assumptions

Authors: Piotr Indyk, Michael Kapralov, Kshiteej Sheth, Tal Wagner

Abstract: Motivated by the problem of fast processing of attention matrices, we study fast algorithms for computing matrix-vector products for asymmetric Gaussian Kernel matrices $K\in \mathbb{R}^{n\times n}$. $K$'s columns are indexed by a set of $n$ keys $k_1,k_2\ldots, k_n\in \mathbb{R}^d$, rows by a set of $n$ queries $q_1,q_2,\ldots,q_n\in \mathbb{R}^d $, and its $i,j$ entry is $K_{ij} = e^{-\|q_i-k_j\|_2^2/2\sigma^2}$ for some bandwidth parameter $\sigma>0$. Given a vector $x\in \mathbb{R}^n$ and error parameter $\epsilon>0$, our task is to output a $y\in \mathbb{R}^n$ such that $\|Kx-y\|_2\leq \epsilon \|x\|_2$ in time subquadratic in $n$ and linear in $d$. Our algorithms rely on the following modelling assumption about the matrices $K$: the sum of the entries of $K$ scales linearly in $n$, as opposed to worst case quadratic growth. We validate this assumption experimentally, for Gaussian kernel matrices encountered in various settings such as fast attention computation in LLMs. We obtain the first subquadratic-time algorithm that works under this assumption, for unrestricted vectors.

new Hardware-Aware Fine-Tuning of Spiking Q-Networks on the SpiNNaker2 Neuromorphic Platform

Authors: Sirine Arfa, Bernhard Vogginger, Christian Mayr

Abstract: Spiking Neural Networks (SNNs) promise orders-of-magnitude lower power consumption and low-latency inference on neuromorphic hardware for a wide range of robotic tasks. In this work, we present an energy-efficient implementation of a reinforcement learning (RL) algorithm using quantized SNNs to solve two classical control tasks. The network is trained using the Q-learning algorithm, then fine-tuned and quantized to low-bit (8-bit) precision for embedded deployment on the SpiNNaker2 neuromorphic chip. To evaluate the comparative advantage of SpiNNaker2 over conventional computing platforms, we analyze inference latency, dynamic power consumption, and energy cost per inference for our SNN models, comparing performance against a GTX 1650 GPU baseline. Our results demonstrate SpiNNaker2's strong potential for scalable, low-energy neuromorphic computing, achieving up to 32x reduction in energy consumption. Inference latency remains on par with GPU-based execution, with improvements observed in certain task settings, reinforcing SpiNNaker2's viability for real-time neuromorphic control and making the neuromorphic approach a compelling direction for efficient deep Q-learning.

new Optimised Feature Subset Selection via Simulated Annealing

Authors: Fernando Mart\'inez-Garc\'ia, \'Alvaro Rubio-Garc\'ia, Samuel Fern\'andez-Lorenzo, Juan Jos\'e Garc\'ia-Ripoll, Diego Porras

Abstract: We introduce SA-FDR, a novel algorithm for $\ell_0$-norm feature selection that considers this task as a combinatorial optimisation problem and solves it by using simulated annealing to perform a global search over the space of feature subsets. The optimisation is guided by the Fisher discriminant ratio, which we use as a computationally efficient proxy for model quality in classification tasks. Our experiments, conducted on datasets with up to hundreds of thousands of samples and hundreds of features, demonstrate that SA-FDR consistently selects more compact feature subsets while achieving a high predictive accuracy. This ability to recover informative yet minimal sets of features stems from its capacity to capture inter-feature dependencies often missed by greedy optimisation approaches. As a result, SA-FDR provides a flexible and effective solution for designing interpretable models in high-dimensional settings, particularly when model sparsity, interpretability, and performance are crucial.

new GraphRAG-R1: Graph Retrieval-Augmented Generation with Process-Constrained Reinforcement Learning

Authors: Chuanyue Yu, Kuo Zhao, Yuhan Li, Heng Chang, Mingjian Feng, Xiangzhe Jiang, Yufei Sun, Jia Li, Yuzhi Zhang, Jianxin Li, Ziwei Zhang

Abstract: Graph Retrieval-Augmented Generation (GraphRAG) has shown great effectiveness in enhancing the reasoning abilities of LLMs by leveraging graph structures for knowledge representation and modeling complex real-world relationships. However, existing GraphRAG methods still face significant bottlenecks when handling complex problems that require multi-hop reasoning, as their query and retrieval phases are largely based on pre-defined heuristics and do not fully utilize the reasoning potentials of LLMs. To address this problem, we propose GraphRAG-R1, an adaptive GraphRAG framework by training LLMs with process-constrained outcome-based reinforcement learning (RL) to enhance the multi-hop reasoning ability. Our method can decompose complex problems, autonomously invoke retrieval tools to acquire necessary information, and perform effective reasoning. Specifically, we utilize a modified version of Group Relative Policy Optimization (GRPO) that supports rollout-with-thinking capability. Next, we design two process-constrained reward functions. To handle the shallow retrieval problem, we design a Progressive Retrieval Attenuation (PRA) reward to encourage essential retrievals. Then, to handle the over-thinking problem, we design Cost-Aware F1 (CAF) reward to balance the model performance with computational costs. We further design a phase-dependent training strategy, containing three training stages corresponding to cold start and these two rewards. Lastly, our method adopts a hybrid graph-textual retrieval to improve the reasoning capacity. Extensive experimental results demonstrate that GraphRAG-R1 boosts LLM capabilities in solving complex reasoning problems compared to state-of-the-art GraphRAG methods on both in-domain and out-of-domain datasets. Furthermore, our framework can be flexibly integrated with various existing retrieval methods, consistently delivering performance improvements.

new EB-gMCR: Energy-Based Generative Modeling for Signal Unmixing and Multivariate Curve Resolution

Authors: Yu-Tang Chang, Shih-Fang Chen

Abstract: Signal unmixing analysis decomposes data into basic patterns and is widely applied in chemical and biological research. Multivariate curve resolution (MCR), a branch of signal unmixing, separates mixed chemical signals into base patterns (components) and their concentrations, playing a key role in understanding composition. Classical MCR is typically framed as matrix factorization (MF) and requires a user-specified component count, usually unknown in real data. As dataset size or component count increases, the scalability and reliability of MF-based MCR face significant challenges. This study reformulates MCR as a generative process (gMCR), and introduces an energy-based deep learning solver, EB-gMCR, that automatically discovers the smallest component set able to reconstruct the data faithfully. EB-gMCR starts from a large candidate pool (e.g., 1024 spectra) and employs a differentiable gating network to retain only active components while estimating their concentrations. On noisy synthetic datasets containing up to 256 latent sources, EB-gMCR maintained R^2 >= 0.98 and recovered the component count within 5% of the ground truth; at lower noise it achieved R^2 >= 0.99 with near exact component estimation. Additional chemical priors, such as non-negativity or nonlinear mixing, enter as simple plug-in functions, enabling adaptation to other instruments or domains without altering the core learning process. By uniting high-capacity generative modeling and hard component selection, EB-gMCR offers a practical route to large-scale signal unmixing analysis, including chemical library-driven scenarios. The source code is available at https://github.com/b05611038/ebgmcr_solver.

URLs: https://github.com/b05611038/ebgmcr_solver.

new Hierarchical Message-Passing Policies for Multi-Agent Reinforcement Learning

Authors: Tommaso Marzi, Cesare Alippi, Andrea Cini

Abstract: Decentralized Multi-Agent Reinforcement Learning (MARL) methods allow for learning scalable multi-agent policies, but suffer from partial observability and induced non-stationarity. These challenges can be addressed by introducing mechanisms that facilitate coordination and high-level planning. Specifically, coordination and temporal abstraction can be achieved through communication (e.g., message passing) and Hierarchical Reinforcement Learning (HRL) approaches to decision-making. However, optimization issues limit the applicability of hierarchical policies to multi-agent systems. As such, the combination of these approaches has not been fully explored. To fill this void, we propose a novel and effective methodology for learning multi-agent hierarchies of message-passing policies. We adopt the feudal HRL framework and rely on a hierarchical graph structure for planning and coordination among agents. Agents at lower levels in the hierarchy receive goals from the upper levels and exchange messages with neighboring agents at the same level. To learn hierarchical multi-agent policies, we design a novel reward-assignment method based on training the lower-level policies to maximize the advantage function associated with the upper levels. Results on relevant benchmarks show that our method performs favorably compared to the state of the art.

new Deep Learning-based Prediction of Clinical Trial Enrollment with Uncertainty Estimates

Authors: Tien Huu Do, Antoine Masquelier, Nae Eoun Lee, Jonathan Crowther

Abstract: Clinical trials are a systematic endeavor to assess the safety and efficacy of new drugs or treatments. Conducting such trials typically demands significant financial investment and meticulous planning, highlighting the need for accurate predictions of trial outcomes. Accurately predicting patient enrollment, a key factor in trial success, is one of the primary challenges during the planning phase. In this work, we propose a novel deep learning-based method to address this critical challenge. Our method, implemented as a neural network model, leverages pre-trained language models (PLMs) to capture the complexities and nuances of clinical documents, transforming them into expressive representations. These representations are then combined with encoded tabular features via an attention mechanism. To account for uncertainties in enrollment prediction, we enhance the model with a probabilistic layer based on the Gamma distribution, which enables range estimation. We apply the proposed model to predict clinical trial duration, assuming site-level enrollment follows a Poisson-Gamma process. We carry out extensive experiments on real-world clinical trial data, and show that the proposed method can effectively predict the number of patients enrolled at a number of sites for a given clinical trial, outperforming established baseline models.

new L-GTA: Latent Generative Modeling for Time Series Augmentation

Authors: Luis Roque, Carlos Soares, Vitor Cerqueira, Luis Torgo

Abstract: Data augmentation is gaining importance across various aspects of time series analysis, from forecasting to classification and anomaly detection tasks. We introduce the Latent Generative Transformer Augmentation (L-GTA) model, a generative approach using a transformer-based variational recurrent autoencoder. This model uses controlled transformations within the latent space of the model to generate new time series that preserve the intrinsic properties of the original dataset. L-GTA enables the application of diverse transformations, ranging from simple jittering to magnitude warping, and combining these basic transformations to generate more complex synthetic time series datasets. Our evaluation of several real-world datasets demonstrates the ability of L-GTA to produce more reliable, consistent, and controllable augmented data. This translates into significant improvements in predictive accuracy and similarity measures compared to direct transformation methods.

new On the Expressiveness of Softmax Attention: A Recurrent Neural Network Perspective

Authors: Gabriel Mongaras, Eric C. Larson

Abstract: Since its introduction, softmax attention has become the backbone of modern transformer architectures due to its expressiveness and scalability across a wide range of tasks. However, the main drawback of softmax attention is the quadratic memory requirement and computational complexity with respect to the sequence length. By replacing the softmax nonlinearity, linear attention and similar methods have been introduced to avoid the quadratic bottleneck of softmax attention. Despite these linear forms of attention being derived from the original softmax formulation, they typically lag in terms of downstream accuracy. While strong intuition of the softmax nonlinearity on the query and key inner product suggests that it has desirable properties compared to other nonlinearities, the question of why this discrepancy exists still remains unanswered. This work demonstrates that linear attention is an approximation of softmax attention by deriving the recurrent form of softmax attention. Using this form, each part of softmax attention can be described in the language of recurrent neural networks (RNNs). Describing softmax attention as an RNN allows for the ablation of the components of softmax attention to understand the importance of each part and how they interact. In this way, our work helps explain why softmax attention is more expressive than its counterparts.

new OptiGradTrust: Byzantine-Robust Federated Learning with Multi-Feature Gradient Analysis and Reinforcement Learning-Based Trust Weighting

Authors: Mohammad Karami, Fatemeh Ghassemi, Hamed Kebriaei, Hamid Azadegan

Abstract: Federated Learning (FL) enables collaborative model training across distributed medical institutions while preserving patient privacy, but remains vulnerable to Byzantine attacks and statistical heterogeneity. We present OptiGradTrust, a comprehensive defense framework that evaluates gradient updates through a novel six-dimensional fingerprint including VAE reconstruction error, cosine similarity metrics, $L_2$ norm, sign-consistency ratio, and Monte Carlo Shapley value, which drive a hybrid RL-attention module for adaptive trust scoring. To address convergence challenges under data heterogeneity, we develop FedBN-Prox (FedBN-P), combining Federated Batch Normalization with proximal regularization for optimal accuracy-convergence trade-offs. Extensive evaluation across MNIST, CIFAR-10, and Alzheimer's MRI datasets under various Byzantine attack scenarios demonstrates significant improvements over state-of-the-art defenses, achieving up to +1.6 percentage points over FLGuard under non-IID conditions while maintaining robust performance against diverse attack patterns through our adaptive learning approach.

new SHAP-Guided Regularization in Machine Learning Models

Authors: Amal Saadallah

Abstract: Feature attribution methods such as SHapley Additive exPlanations (SHAP) have become instrumental in understanding machine learning models, but their role in guiding model optimization remains underexplored. In this paper, we propose a SHAP-guided regularization framework that incorporates feature importance constraints into model training to enhance both predictive performance and interpretability. Our approach applies entropy-based penalties to encourage sparse, concentrated feature attributions while promoting stability across samples. The framework is applicable to both regression and classification tasks. Our first exploration started with investigating a tree-based model regularization using TreeSHAP. Through extensive experiments on benchmark regression and classification datasets, we demonstrate that our method improves generalization performance while ensuring robust and interpretable feature attributions. The proposed technique offers a novel, explainability-driven regularization approach, making machine learning models both more accurate and more reliable.

new TweakLLM: A Routing Architecture for Dynamic Tailoring of Cached Responses

Authors: Muhammad Taha Cheema, Abeer Aamir, Khawaja Gul Muhammad, Naveed Anwar Bhatti, Ihsan Ayyub Qazi, Zafar Ayyub Qazi

Abstract: Large Language Models (LLMs) process millions of queries daily, making efficient response caching a compelling optimization for reducing cost and latency. However, preserving relevance to user queries using this approach proves difficult due to the personalized nature of chatbot interactions and the limited accuracy of semantic similarity search. To address this, we present TweakLLM, a novel routing architecture that employs a lightweight LLM to dynamically adapt cached responses to incoming prompts. Through comprehensive evaluation, including user studies with side-by-side comparisons, satisfaction voting, as well as multi-agent LLM debates, we demonstrate that TweakLLM maintains response quality comparable to frontier models while significantly improving cache effectiveness. Our results across real-world datasets highlight TweakLLM as a scalable, resource-efficient caching solution for high-volume LLM deployments without compromising user experience.

new One-Step Flow Policy Mirror Descent

Authors: Tianyi Chen, Haitong Ma, Na Li, Kai Wang, Bo Dai

Abstract: Diffusion policies have achieved great success in online reinforcement learning (RL) due to their strong expressive capacity. However, the inference of diffusion policy models relies on a slow iterative sampling process, which limits their responsiveness. To overcome this limitation, we propose Flow Policy Mirror Descent (FPMD), an online RL algorithm that enables 1-step sampling during policy inference. Our approach exploits a theoretical connection between the distribution variance and the discretization error of single-step sampling in straight interpolation flow matching models, and requires no extra distillation or consistency training. We present two algorithm variants based on flow policy and MeanFlow policy parametrizations, respectively. Extensive empirical evaluations on MuJoCo benchmarks demonstrate that our algorithms show strong performance comparable to diffusion policy baselines while requiring hundreds of times fewer function evaluations during inference.

new DepMicroDiff: Diffusion-Based Dependency-Aware Multimodal Imputation for Microbiome Data

Authors: Rabeya Tus Sadia, Qiang Cheng

Abstract: Microbiome data analysis is essential for understanding host health and disease, yet its inherent sparsity and noise pose major challenges for accurate imputation, hindering downstream tasks such as biomarker discovery. Existing imputation methods, including recent diffusion-based models, often fail to capture the complex interdependencies between microbial taxa and overlook contextual metadata that can inform imputation. We introduce DepMicroDiff, a novel framework that combines diffusion-based generative modeling with a Dependency-Aware Transformer (DAT) to explicitly capture both mutual pairwise dependencies and autoregressive relationships. DepMicroDiff is further enhanced by VAE-based pretraining across diverse cancer datasets and conditioning on patient metadata encoded via a large language model (LLM). Experiments on TCGA microbiome datasets show that DepMicroDiff substantially outperforms state-of-the-art baselines, achieving higher Pearson correlation (up to 0.712), cosine similarity (up to 0.812), and lower RMSE and MAE across multiple cancer types, demonstrating its robustness and generalizability for microbiome imputation.

new Anomalous Samples for Few-Shot Anomaly Detection

Authors: Aymane Abdali, Bartosz Boguslawski, Lucas Drumetz, Vincent Gripon

Abstract: Several anomaly detection and classification methods rely on large amounts of non-anomalous or "normal" samples under the assump- tion that anomalous data is typically harder to acquire. This hypothesis becomes questionable in Few-Shot settings, where as little as one anno- tated sample can make a significant difference. In this paper, we tackle the question of utilizing anomalous samples in training a model for bi- nary anomaly classification. We propose a methodology that incorporates anomalous samples in a multi-score anomaly detection score leveraging recent Zero-Shot and memory-based techniques. We compare the utility of anomalous samples to that of regular samples and study the benefits and limitations of each. In addition, we propose an augmentation-based validation technique to optimize the aggregation of the different anomaly scores and demonstrate its effectiveness on popular industrial anomaly detection datasets.

new Improving annotator selection in Active Learning using a mood and fatigue-aware Recommender System

Authors: Diana Mortagua

Abstract: This study centers on overcoming the challenge of selecting the best annotators for each query in Active Learning (AL), with the objective of minimizing misclassifications. AL recognizes the challenges related to cost and time when acquiring labeled data, and decreases the number of labeled data needed. Nevertheless, there is still the necessity to reduce annotation errors, aiming to be as efficient as possible, to achieve the expected accuracy faster. Most strategies for query-annotator pairs do not consider internal factors that affect productivity, such as mood, attention, motivation, and fatigue levels. This work addresses this gap in the existing literature, by not only considering how the internal factors influence annotators (mood and fatigue levels) but also presenting a new query-annotator pair strategy, using a Knowledge-Based Recommendation System (RS). The RS ranks the available annotators, allowing to choose one or more to label the queried instance using their past accuracy values, and their mood and fatigue levels, as well as information about the instance queried. This work bases itself on existing literature on mood and fatigue influence on human performance, simulating annotators in a realistic manner, and predicting their performance with the RS. The results show that considering past accuracy values, as well as mood and fatigue levels reduces the number of annotation errors made by the annotators, and the uncertainty of the model through its training, when compared to not using internal factors. Accuracy and F1-score values were also better in the proposed approach, despite not being as substantial as the aforementioned. The methodologies and findings presented in this study begin to explore the open challenge of human cognitive factors affecting AL.

new Consensus-Driven Active Model Selection

Authors: Justin Kay, Grant Van Horn, Subhransu Maji, Daniel Sheldon, Sara Beery

Abstract: The widespread availability of off-the-shelf machine learning models poses a challenge: which model, of the many available candidates, should be chosen for a given data analysis task? This question of model selection is traditionally answered by collecting and annotating a validation dataset -- a costly and time-intensive process. We propose a method for active model selection, using predictions from candidate models to prioritize the labeling of test data points that efficiently differentiate the best candidate. Our method, CODA, performs consensus-driven active model selection by modeling relationships between classifiers, categories, and data points within a probabilistic framework. The framework uses the consensus and disagreement between models in the candidate pool to guide the label acquisition process, and Bayesian inference to update beliefs about which model is best as more information is collected. We validate our approach by curating a collection of 26 benchmark tasks capturing a range of model selection scenarios. CODA outperforms existing methods for active model selection significantly, reducing the annotation effort required to discover the best model by upwards of 70% compared to the previous state-of-the-art. Code and data are available at https://github.com/justinkay/coda.

URLs: https://github.com/justinkay/coda.

cross Typing Tensor Calculus in 2-Categories (I)

Authors: Fatimah Rita Ahmadi

Abstract: To formalize calculations in linear algebra for the development of efficient algorithms and a framework suitable for functional programming languages and faster parallelized computations, we adopt an approach that treats elements of linear algebra, such as matrices, as morphisms in the category of matrices, $\mathbf{Mat_{k}}$. This framework is further extended by generalizing the results to arbitrary monoidal semiadditive categories. To enrich this perspective and accommodate higher-rank matrices (tensors), we define semiadditive 2-categories, where matrices $T_{ij}$ are represented as 1-morphisms, and tensors with four indices $T_{ijkl}$ as 2-morphisms. This formalization provides an index-free, typed linear algebra framework that includes matrices and tensors with up to four indices. Furthermore, we extend the framework to monoidal semiadditive 2-categories and demonstrate detailed operations and vectorization within the 2-category of 2Vec introduced by Kapranov and Voevodsky.

cross DNN-based Methods of Jointly Sensing Number and Directions of Targets via a Green Massive H2AD MIMO Receiver

Authors: Bin Deng, Jiatong Bai, Feilong Zhao, Zuming Xie, Maolin Li, Yan Wang, Feng Shu

Abstract: As a green MIMO structure, the heterogeneous hybrid analog-digital H2AD MIMO architecture has been shown to own a great potential to replace the massive or extremely large-scale fully-digital MIMO in the future wireless networks to address the three challenging problems faced by the latter: high energy consumption, high circuit cost, and high complexity. However, how to intelligently sense the number and direction of multi-emitters via such a structure is still an open hard problem. To address this, we propose a two-stage sensing framework that jointly estimates the number and direction values of multiple targets. Specifically, three target number sensing methods are designed: an improved eigen-domain clustering (EDC) framework, an enhanced deep neural network (DNN) based on five key statistical features, and an improved one-dimensional convolutional neural network (1D-CNN) utilizing full eigenvalues. Subsequently, a low-complexity and high-accuracy DOA estimation is achieved via the introduced online micro-clustering (OMC-DOA) method. Furthermore, we derive the Cram\'er-Rao lower bound (CRLB) for the H2AD under multiple-source conditions as a theoretical performance benchmark. Simulation results show that the developed three methods achieve 100\% number of targets sensing at moderate-to-high SNRs, while the improved 1D-CNN exhibits superior under extremely-low SNR conditions. The introduced OMC-DOA outperforms existing clustering and fusion-based DOA methods in multi-source environments.

cross A Privacy-Preserving Federated Framework with Hybrid Quantum-Enhanced Learning for Financial Fraud Detection

Authors: Abhishek Sawaika, Swetang Krishna, Tushar Tomar, Durga Pritam Suggisetti, Aditi Lal, Tanmaya Shrivastav, Nouhaila Innan, Muhammad Shafique

Abstract: Rapid growth of digital transactions has led to a surge in fraudulent activities, challenging traditional detection methods in the financial sector. To tackle this problem, we introduce a specialised federated learning framework that uniquely combines a quantum-enhanced Long Short-Term Memory (LSTM) model with advanced privacy preserving techniques. By integrating quantum layers into the LSTM architecture, our approach adeptly captures complex cross-transactional patters, resulting in an approximate 5% performance improvement across key evaluation metrics compared to conventional models. Central to our framework is "FedRansel", a novel method designed to defend against poisoning and inference attacks, thereby reducing model degradation and inference accuracy by 4-8%, compared to standard differential privacy mechanisms. This pseudo-centralised setup with a Quantum LSTM model, enhances fraud detection accuracy and reinforces the security and confidentiality of sensitive financial data.

cross A Language Model-Driven Semi-Supervised Ensemble Framework for Illicit Market Detection Across Deep/Dark Web and Social Platforms

Authors: Navid Yazdanjue, Morteza Rakhshaninejad, Hossein Yazdanjouei, Mohammad Sadegh Khorshidi, Mikko S. Niemela, Fang Chen, Amir H. Gandomi

Abstract: Illegal marketplaces have increasingly shifted to concealed parts of the internet, including the deep and dark web, as well as platforms such as Telegram, Reddit, and Pastebin. These channels enable the anonymous trade of illicit goods including drugs, weapons, and stolen credentials. Detecting and categorizing such content remains challenging due to limited labeled data, the evolving nature of illicit language, and the structural heterogeneity of online sources. This paper presents a hierarchical classification framework that combines fine-tuned language models with a semi-supervised ensemble learning strategy to detect and classify illicit marketplace content across diverse platforms. We extract semantic representations using ModernBERT, a transformer model for long documents, finetuned on domain-specific data from deep and dark web pages, Telegram channels, Subreddits, and Pastebin pastes to capture specialized jargon and ambiguous linguistic patterns. In addition, we incorporate manually engineered features such as document structure, embedded patterns including Bitcoin addresses, emails, and IPs, and metadata, which complement language model embeddings. The classification pipeline operates in two stages. The first stage uses a semi-supervised ensemble of XGBoost, Random Forest, and SVM with entropy-based weighted voting to detect sales-related documents. The second stage further classifies these into drug, weapon, or credential sales. Experiments on three datasets, including our multi-source corpus, DUTA, and CoDA, show that our model outperforms several baselines, including BERT, ModernBERT, DarkBERT, ALBERT, Longformer, and BigBird. The model achieves an accuracy of 0.96489, an F1-score of 0.93467, and a TMCC of 0.95388, demonstrating strong generalization, robustness under limited supervision, and effectiveness in real-world illicit content detection.

cross Semantic Convergence: Investigating Shared Representations Across Scaled LLMs

Authors: Daniel Son, Sanjana Rathore, Andrew Rufail, Adrian Simon, Daniel Zhang, Soham Dave, Cole Blondin, Kevin Zhu, Sean O'Brien

Abstract: We investigate feature universality in Gemma-2 language models (Gemma-2-2B and Gemma-2-9B), asking whether models with a four-fold difference in scale still converge on comparable internal concepts. Using the Sparse Autoencoder (SAE) dictionary-learning pipeline, we utilize SAEs on each model's residual-stream activations, align the resulting monosemantic features via activation correlation, and compare the matched feature spaces with SVCCA and RSA. Middle layers yield the strongest overlap, while early and late layers show far less similarity. Preliminary experiments extend the analysis from single tokens to multi-token subspaces, showing that semantically similar subspaces interact similarly with language models. These results strengthen the case that large language models carve the world into broadly similar, interpretable features despite size differences, reinforcing universality as a foundation for cross-model interpretability.

cross SigBERT: Combining Narrative Medical Reports and Rough Path Signature Theory for Survival Risk Estimation in Oncology

Authors: Paul Minchella, Lo\"ic Verlingue, St\'ephane Chr\'etien, R\'emi Vaucher, Guillaume Metzler

Abstract: Electronic medical reports (EHR) contain a vast amount of information that can be leveraged for machine learning applications in healthcare. However, existing survival analysis methods often struggle to effectively handle the complexity of textual data, particularly in its sequential form. Here, we propose SigBERT, an innovative temporal survival analysis framework designed to efficiently process a large number of clinical reports per patient. SigBERT processes timestamped medical reports by extracting and averaging word embeddings into sentence embeddings. To capture temporal dynamics from the time series of sentence embedding coordinates, we apply signature extraction from rough path theory to derive geometric features for each patient, which significantly enhance survival model performance by capturing complex temporal dynamics. These features are then integrated into a LASSO-penalized Cox model to estimate patient-specific risk scores. The model was trained and evaluated on a real-world oncology dataset from the L\'eon B\'erard Center corpus, with a C-index score of 0.75 (sd 0.014) on the independent test cohort. SigBERT integrates sequential medical data to enhance risk estimation, advancing narrative-based survival analysis.

cross ELMES: An Automated Framework for Evaluating Large Language Models in Educational Scenarios

Authors: Shou'ang Wei, Xinyun Wang, Shuzhen Bi, Jian Chen, Ruijia Li, Bo Jiang, Xin Lin, Min Zhang, Yu Song, BingDong Li, Aimin Zhou, Hao Hao

Abstract: The emergence of Large Language Models (LLMs) presents transformative opportunities for education, generating numerous novel application scenarios. However, significant challenges remain: evaluation metrics vary substantially across different educational scenarios, while many emerging scenarios lack appropriate assessment metrics. Current benchmarks predominantly measure general intelligence rather than pedagogical capabilities. To address this gap, we introduce ELMES, an open-source automated evaluation framework specifically designed for assessing LLMs in educational settings. ELMES features a modular architecture that enables researchers to create dynamic, multi-agent dialogues through simple configuration files, facilitating flexible scenario design without requiring extensive programming expertise. The framework incorporates a hybrid evaluation engine that objectively quantifies traditionally subjective pedagogical metrics using an LLM-as-a-Judge methodology. We conduct systematic benchmarking of state-of-the-art LLMs across four critical educational scenarios: Knowledge Point Explanation, Guided Problem-Solving Teaching, Interdisciplinary Lesson Plan Generation, and Contextualized Question Generation, employing fine-grained metrics developed in collaboration with education specialists. Our results demonstrate distinct capability distributions among models, revealing context-specific strengths and limitations. ELMES provides educators and researchers with an accessible evaluation framework that significantly reduces adaptation barriers for diverse educational applications while advancing the practical implementation of LLMs in pedagogy. The framework is publicly available at \emph{https://github.com/sii-research/elmes.git}.

URLs: https://github.com/sii-research/elmes.git

cross Unifying Post-hoc Explanations of Knowledge Graph Completions

Authors: Alessandro Lonardi, Samy Badreddine, Tarek R. Besold, Pablo Sanchez Martin

Abstract: Post-hoc explainability for Knowledge Graph Completion (KGC) lacks formalization and consistent evaluations, hindering reproducibility and cross-study comparisons. This paper argues for a unified approach to post-hoc explainability in KGC. First, we propose a general framework to characterize post-hoc explanations via multi-objective optimization, balancing their effectiveness and conciseness. This unifies existing post-hoc explainability algorithms in KGC and the explanations they produce. Next, we suggest and empirically support improved evaluation protocols using popular metrics like Mean Reciprocal Rank and Hits@$k$. Finally, we stress the importance of interpretability as the ability of explanations to address queries meaningful to end-users. By unifying methods and refining evaluation standards, this work aims to make research in KGC explainability more reproducible and impactful.

cross Automated Label Placement on Maps via Large Language Models

Authors: Harry Shomer, Jiejun Xu

Abstract: Label placement is a critical aspect of map design, serving as a form of spatial annotation that directly impacts clarity and interpretability. Despite its importance, label placement remains largely manual and difficult to scale, as existing automated systems struggle to integrate cartographic conventions, adapt to context, or interpret labeling instructions. In this work, we introduce a new paradigm for automatic label placement (ALP) that formulates the task as a data editing problem and leverages large language models (LLMs) for context-aware spatial annotation. To support this direction, we curate MAPLE, the first known benchmarking dataset for evaluating ALP on real-world maps, encompassing diverse landmark types and label placement annotations from open-source data. Our method retrieves labeling guidelines relevant to each landmark type leveraging retrieval-augmented generation (RAG), integrates them into prompts, and employs instruction-tuned LLMs to generate ideal label coordinates. We evaluate four open-source LLMs on MAPLE, analyzing both overall performance and generalization across different types of landmarks. This includes both zero-shot and instruction-tuned performance. Our results demonstrate that LLMs, when guided by structured prompts and domain-specific retrieval, can learn to perform accurate spatial edits, aligning the generated outputs with expert cartographic standards. Overall, our work presents a scalable framework for AI-assisted map finishing and demonstrates the potential of foundation models in structured data editing tasks. The code and data can be found at https://github.com/HarryShomer/MAPLE.

URLs: https://github.com/HarryShomer/MAPLE.

cross LLMs Between the Nodes: Community Discovery Beyond Vectors

Authors: Ekta Gujral, Apurva Sinha

Abstract: Community detection in social network graphs plays a vital role in uncovering group dynamics, influence pathways, and the spread of information. Traditional methods focus primarily on graph structural properties, but recent advancements in Large Language Models (LLMs) open up new avenues for integrating semantic and contextual information into this task. In this paper, we present a detailed investigation into how various LLM-based approaches perform in identifying communities within social graphs. We introduce a two-step framework called CommLLM, which leverages the GPT-4o model along with prompt-based reasoning to fuse language model outputs with graph structure. Evaluations are conducted on six real-world social network datasets, measuring performance using key metrics such as Normalized Mutual Information (NMI), Adjusted Rand Index (ARI), Variation of Information (VOI), and cluster purity. Our findings reveal that LLMs, particularly when guided by graph-aware strategies, can be successfully applied to community detection tasks in small to medium-sized graphs. We observe that the integration of instruction-tuned models and carefully engineered prompts significantly improves the accuracy and coherence of detected communities. These insights not only highlight the potential of LLMs in graph-based research but also underscore the importance of tailoring model interactions to the specific structure of graph data.

cross CHECK-MAT: Checking Hand-Written Mathematical Answers for the Russian Unified State Exam

Authors: Ruslan Khrulev

Abstract: This paper introduces a novel benchmark, EGE-Math Solutions Assessment Benchmark, for evaluating Vision-Language Models (VLMs) on their ability to assess hand-written mathematical solutions. Unlike existing benchmarks that focus on problem solving, our approach centres on understanding student solutions, identifying mistakes, and assigning grades according to fixed criteria. We compile 122 scanned solutions from the Russian Unified State Exam (EGE) together with official expert grades, and evaluate seven modern VLMs from Google, OpenAI, Arcee AI, and Alibaba Cloud in three inference modes. The results reveal current limitations in mathematical reasoning and human-rubric alignment, opening new research avenues in AI-assisted assessment. You can find code in https://github.com/Karifannaa/Auto-check-EGE-math

URLs: https://github.com/Karifannaa/Auto-check-EGE-math

cross Learning to Prune Branches in Modern Tree-Fruit Orchards

Authors: Abhinav Jain, Cindy Grimm, Stefan Lee

Abstract: Dormant tree pruning is labor-intensive but essential to maintaining modern highly-productive fruit orchards. In this work we present a closed-loop visuomotor controller for robotic pruning. The controller guides the cutter through a cluttered tree environment to reach a specified cut point and ensures the cutters are perpendicular to the branch. We train the controller using a novel orchard simulation that captures the geometric distribution of branches in a target apple orchard configuration. Unlike traditional methods requiring full 3D reconstruction, our controller uses just optical flow images from a wrist-mounted camera. We deploy our learned policy in simulation and the real-world for an example V-Trellis envy tree with zero-shot transfer, achieving a 30% success rate -- approximately half the performance of an oracle planner.

cross A Smoothing Newton Method for Rank-one Matrix Recovery

Authors: Tyler Maunu, Gabriel Abreu

Abstract: We consider the phase retrieval problem, which involves recovering a rank-one positive semidefinite matrix from rank-one measurements. A recently proposed algorithm based on Bures-Wasserstein gradient descent (BWGD) exhibits superlinear convergence, but it is unstable, and existing theory can only prove local linear convergence for higher rank matrix recovery. We resolve this gap by revealing that BWGD implements Newton's method with a nonsmooth and nonconvex objective. We develop a smoothing framework that regularizes the objective, enabling a stable method with rigorous superlinear convergence guarantees. Experiments on synthetic data demonstrate this superior stability while maintaining fast convergence.

cross Data Readiness for Scientific AI at Scale

Authors: Wesley Brewer, Patrick Widener, Valentine Anantharaj, Feiyi Wang, Tom Beck, Arjun Shankar, Sarp Oral

Abstract: This paper examines how Data Readiness for AI (DRAI) principles apply to leadership-scale scientific datasets used to train foundation models. We analyze archetypal workflows across four representative domains - climate, nuclear fusion, bio/health, and materials - to identify common preprocessing patterns and domain-specific constraints. We introduce a two-dimensional readiness framework composed of Data Readiness Levels (raw to AI-ready) and Data Processing Stages (ingest to shard), both tailored to high performance computing (HPC) environments. This framework outlines key challenges in transforming scientific data for scalable AI training, emphasizing transformer-based generative models. Together, these dimensions form a conceptual maturity matrix that characterizes scientific data readiness and guides infrastructure development toward standardized, cross-domain support for scalable and reproducible AI for science.

cross Early Goal-Guided Multi-Scale Fusion for Real-Time Vision-Language Driving

Authors: Santosh Patapati, Trisanth Srinivasan

Abstract: Autonomous vehicles must react in milliseconds while reasoning about road geometry and traffic intent to navigate complex situations. We introduce NovaDrive, a single-branch vision-language architecture that processes front-camera images, HD-map tiles, LiDAR depth, and textual waypoints in a single branch. A lightweight, two-stage cross-attention block first aligns waypoint tokens with the HD map, then refines attention over fine-grained image and depth patches. Coupled with a novel smoothness loss that discourages abrupt steering and speed changes, this design eliminates the need for recurrent memory. We fine-tune the top 15 layers of an 11B LLaMA-3.2 vision-language backbone, enabling real-time inference. On the nuScenes / Waymo subset of the MD-NEX Outdoor benchmark, NovaDrive raises success rate to 84% (+4%), boosts path-efficiency (SPL) to 0.66 (+0.11), and reduces collision frequency from 2.6% to 1.2% (-1.4%) relative to the previous state-of-the-art. Our ablations confirm that waypoint tokens, partial VLM fine-tuning, and the cross-attention fusion each contribute the most to these gains. Beyond safety, NovaDrive's shorter routes (resulting from the novel smoothness loss) translate to lower fuel or battery usage, pointing toward leaner, more easily updated driving stacks. NovaDrive can be extended to other embodied-AI domains as well.

cross Vision-Language Fusion for Real-Time Autonomous Driving: Goal-Centered Cross-Attention of Camera, HD-Map, & Waypoints

Authors: Santosh Patapati, Trisanth Srinivasan, Murari Ambati

Abstract: Autonomous cars need geometric accuracy and semantic understanding to navigate complex environments, yet most stacks handle them separately. We present XYZ-Drive, a single vision-language model that reads a front-camera frame, a 25m $\times$ 25m overhead map, and the next waypoint, then outputs steering and speed. A lightweight goal-centered cross-attention layer lets waypoint tokens highlight relevant image and map patches, supporting both action and textual explanations, before the fused tokens enter a partially fine-tuned LLaMA-3.2 11B model. On the MD-NEX Outdoor-Driving benchmark XYZ-Drive attains 95% success and 0.80 Success weighted by Path Length (SPL), surpassing PhysNav-DG by 15%. and halving collisions, all while significantly improving efficiency by using only a single branch. Sixteen ablations explain the gains. Removing any modality (vision, waypoint, map) drops success by up to 11%, confirming their complementary roles and rich connections. Replacing goal-centered attention with simple concatenation cuts 3% in performance, showing query-based fusion injects map knowledge more effectively. Keeping the transformer frozen loses 5%, showing the importance of fine-tuning when applying VLMs for specific tasks such as autonomous driving. Coarsening map resolution from 10 cm to 40 cm blurs lane edges and raises crash rate. Overall, these results demonstrate that early, token-level fusion of intent and map layout enables accurate, transparent, real-time driving.

cross RASL: Retrieval Augmented Schema Linking for Massive Database Text-to-SQL

Authors: Jeffrey Eben, Aitzaz Ahmad, Stephen Lau

Abstract: Despite advances in large language model (LLM)-based natural language interfaces for databases, scaling to enterprise-level data catalogs remains an under-explored challenge. Prior works addressing this challenge rely on domain-specific fine-tuning - complicating deployment - and fail to leverage important semantic context contained within database metadata. To address these limitations, we introduce a component-based retrieval architecture that decomposes database schemas and metadata into discrete semantic units, each separately indexed for targeted retrieval. Our approach prioritizes effective table identification while leveraging column-level information, ensuring the total number of retrieved tables remains within a manageable context budget. Experiments demonstrate that our method maintains high recall and accuracy, with our system outperforming baselines over massive databases with varying structure and available metadata. Our solution enables practical text-to-SQL systems deployable across diverse enterprise settings without specialized fine-tuning, addressing a critical scalability gap in natural language database interfaces.

cross On the Complexity of Finding Stationary Points in Nonconvex Simple Bilevel Optimization

Authors: Jincheng Cao, Ruichen Jiang, Erfan Yazdandoost Hamedani, Aryan Mokhtari

Abstract: In this paper, we study the problem of solving a simple bilevel optimization problem, where the upper-level objective is minimized over the solution set of the lower-level problem. We focus on the general setting in which both the upper- and lower-level objectives are smooth but potentially nonconvex. Due to the absence of additional structural assumptions for the lower-level objective-such as convexity or the Polyak-{\L}ojasiewicz (PL) condition-guaranteeing global optimality is generally intractable. Instead, we introduce a suitable notion of stationarity for this class of problems and aim to design a first-order algorithm that finds such stationary points in polynomial time. Intuitively, stationarity in this setting means the upper-level objective cannot be substantially improved locally without causing a larger deterioration in the lower-level objective. To this end, we show that a simple and implementable variant of the dynamic barrier gradient descent (DBGD) framework can effectively solve the considered nonconvex simple bilevel problems up to stationarity. Specifically, to reach an $(\epsilon_f, \epsilon_g)$-stationary point-where $\epsilon_f$ and $\epsilon_g$ denote the target stationarity accuracies for the upper- and lower-level objectives, respectively-the considered method achieves a complexity of $\mathcal{O}\left(\max\left(\epsilon_f^{-\frac{3+p}{1+p}}, \epsilon_g^{-\frac{3+p}{2}}\right)\right)$, where $p \geq 0$ is an arbitrary constant balancing the terms. To the best of our knowledge, this is the first complexity result for a discrete-time algorithm that guarantees joint stationarity for both levels in general nonconvex simple bilevel problems.

cross Extended Factorization Machine Annealing for Rapid Discovery of Transparent Conducting Materials

Authors: Daisuke Makino, Tatsuya Goto, Yoshinori Suga

Abstract: The development of novel transparent conducting materials (TCMs) is essential for enhancing the performance and reducing the cost of next-generation devices such as solar cells and displays. In this research, we focus on the (Al$_x$Ga$_y$In$_z$)$_2$O$_3$ system and extend the FMA framework, which combines a Factorization Machine (FM) and annealing, to search for optimal compositions and crystal structures with high accuracy and low cost. The proposed method introduces (i) the binarization of continuous variables, (ii) the utilization of good solutions using a Hopfield network, (iii) the activation of global search through adaptive random flips, and (iv) fine-tuning via a bit-string local search. Validation using the (Al$_x$Ga$_y$In$_z$)$_2$O$_3$ data from the Kaggle "Nomad2018 Predicting Transparent Conductors" competition demonstrated that our method achieves faster and more accurate searches than Bayesian optimization and genetic algorithms. Furthermore, its application to multi-objective optimization showed its capability in designing materials by simultaneously considering both the band gap and formation energy. These results suggest that applying our method to larger, more complex search problems and diverse material designs that reflect realistic experimental conditions is expected to contribute to the further advancement of materials informatics.

cross LENS: Learning Ensemble Confidence from Neural States for Multi-LLM Answer Integration

Authors: Jizhou Guo

Abstract: Large Language Models (LLMs) have demonstrated impressive performance across various tasks, with different models excelling in distinct domains and specific abilities. Effectively combining the predictions of multiple LLMs is crucial for enhancing system robustness and performance. However, existing ensemble methods often rely on simple techniques like voting or logits ensembling, which overlook the varying confidence and reliability of models in different contexts. In this work, we propose LENS (Learning ENsemble confidence from Neural States), a novel approach that learns to estimate model confidence by analyzing internal representations. For each LLM, we train a lightweight linear confidence predictor that leverages layer-wise hidden states and normalized probabilities as inputs. This allows for more nuanced weighting of model predictions based on their context-dependent reliability. Our method does not require modifying the model parameters and requires negligible additional computation. Experimental results on multiple-choice and boolean question-answering tasks demonstrate that LENS outperforms traditional ensemble methods by a substantial margin. Our findings suggest that internal representations provide valuable signals for determining model confidence and can be effectively leveraged for ensemble learning.

cross CNN-based solution for mango classification in agricultural environments

Authors: Beatriz D\'iaz Pe\'on, Jorge Torres G\'omez, Ariel Fajardo M\'arquez

Abstract: This article exemplifies the design of a fruit detection and classification system using Convolutional Neural Networks (CNN). The goal is to develop a system that automatically assesses fruit quality for farm inventory management. Specifically, a method for mango fruit classification was developed using image processing, ensuring both accuracy and efficiency. Resnet-18 was selected as the preliminary architecture for classification, while a cascade detector was used for detection, balancing execution speed and computational resource consumption. Detection and classification results were displayed through a graphical interface developed in MatLab App Designer, streamlining system interaction. The integration of convolutional neural networks and cascade detectors proffers a reliable solution for fruit classification and detection, with potential applications in agricultural quality control.

cross Geak: Introducing Triton Kernel AI Agent & Evaluation Benchmarks

Authors: Jianghui Wang, Vinay Joshi, Saptarshi Majumder, Xu Chao, Bin Ding, Ziqiong Liu, Pratik Prabhanjan Brahma, Dong Li, Zicheng Liu, Emad Barsoum

Abstract: The demand for AI-generated GPU kernels is rapidly growing, influenced by the need for scalable, hardware-optimized solutions in both industry and academia. As deep learning workloads grow in complexity and diversity, it is imperative to automate low-level kernel development to meet performance and productivity demands. Major cloud providers, semiconductor companies, and research institutions are now investing heavily in AI-driven code generation for GPUs, aiming to reduce manual optimization efforts while achieving near-expert performance on hardware like AMD MI300X. The Triton language, a Python-based DSL for GPU programming, has emerged as a popular target for such AI-generated kernels due to its balance of performance and ease-of-coding. In this work, we present an evaluation suite for Triton-based GPU kernels and GEAK (Generating Efficient AI-centric GPU Kernels)-a framework that leverages cutting-edge LLMs to generate performant Triton code specifically for AMD GPUs, including the AMD MI300X and MI250. GEAK leverages inference-time compute scaling to produce Triton-based GPU kernels using a reasoning loop adapted from Reflexion-style feedback mechanisms. On two evaluation benchmarks, GEAK significantly outperformed the baselines of directly prompting frontier LLMs as well as Reflexion-based generation pipelines by achieving correctness up to $63$% and execution speed up of up to $2.59$X. These results highlight the promise of GEAK-like agentic code generation for accelerating the adoption of diverse hardware platforms and democratizing access to expert-level kernel performance.

cross Are Recommenders Self-Aware? Label-Free Recommendation Performance Estimation via Model Uncertainty

Authors: Jiayu Li, Ziyi Ye, Guohao Jian, Zhiqiang Guo, Weizhi Ma, Qingyao Ai, Min Zhang

Abstract: Can a recommendation model be self-aware? This paper investigates the recommender's self-awareness by quantifying its uncertainty, which provides a label-free estimation of its performance. Such self-assessment can enable more informed understanding and decision-making before the recommender engages with any users. To this end, we propose an intuitive and effective method, probability-based List Distribution uncertainty (LiDu). LiDu measures uncertainty by determining the probability that a recommender will generate a certain ranking list based on the prediction distributions of individual items. We validate LiDu's ability to represent model self-awareness in two settings: (1) with a matrix factorization model on a synthetic dataset, and (2) with popular recommendation algorithms on real-world datasets. Experimental results show that LiDu is more correlated with recommendation performance than a series of label-free performance estimators. Additionally, LiDu provides valuable insights into the dynamic inner states of models throughout training and inference. This work establishes an empirical connection between recommendation uncertainty and performance, framing it as a step towards more transparent and self-evaluating recommender systems.

cross Not Just What, But When: Integrating Irregular Intervals to LLM for Sequential Recommendation

Authors: Wei-Wei Du, Takuma Udagawa, Kei Tateno

Abstract: Time intervals between purchasing items are a crucial factor in sequential recommendation tasks, whereas existing approaches focus on item sequences and often overlook by assuming the intervals between items are static. However, dynamic intervals serve as a dimension that describes user profiling on not only the history within a user but also different users with the same item history. In this work, we propose IntervalLLM, a novel framework that integrates interval information into LLM and incorporates the novel interval-infused attention to jointly consider information of items and intervals. Furthermore, unlike prior studies that address the cold-start scenario only from the perspectives of users and items, we introduce a new viewpoint: the interval perspective to serve as an additional metric for evaluating recommendation methods on the warm and cold scenarios. Extensive experiments on 3 benchmarks with both traditional- and LLM-based baselines demonstrate that our IntervalLLM achieves not only 4.4% improvements in average but also the best-performing warm and cold scenarios across all users, items, and the proposed interval perspectives. In addition, we observe that the cold scenario from the interval perspective experiences the most significant performance drop among all recommendation methods. This finding underscores the necessity of further research on interval-based cold challenges and our integration of interval information in the realm of sequential recommendation tasks. Our code is available here: https://github.com/sony/ds-research-code/tree/master/recsys25-IntervalLLM.

URLs: https://github.com/sony/ds-research-code/tree/master/recsys25-IntervalLLM.

cross Model Directions, Not Words: Mechanistic Topic Models Using Sparse Autoencoders

Authors: Carolina Zheng, Nicolas Beltran-Velez, Sweta Karlekar, Claudia Shi, Achille Nazaret, Asif Mallik, Amir Feder, David M. Blei

Abstract: Traditional topic models are effective at uncovering latent themes in large text collections. However, due to their reliance on bag-of-words representations, they struggle to capture semantically abstract features. While some neural variants use richer representations, they are similarly constrained by expressing topics as word lists, which limits their ability to articulate complex topics. We introduce Mechanistic Topic Models (MTMs), a class of topic models that operate on interpretable features learned by sparse autoencoders (SAEs). By defining topics over this semantically rich space, MTMs can reveal deeper conceptual themes with expressive feature descriptions. Moreover, uniquely among topic models, MTMs enable controllable text generation using topic-based steering vectors. To properly evaluate MTM topics against word-list-based approaches, we propose \textit{topic judge}, an LLM-based pairwise comparison evaluation framework. Across five datasets, MTMs match or exceed traditional and neural baselines on coherence metrics, are consistently preferred by topic judge, and enable effective steering of LLM outputs.

cross Enabling Few-Shot Alzheimer's Disease Diagnosis on Tabular Biomarker Data with LLMs

Authors: Sophie Kearney, Shu Yang, Zixuan Wen, Bojian Hou, Duy Duong-Tran, Tianlong Chen, Jason Moore, Marylyn Ritchie, Li Shen

Abstract: Early and accurate diagnosis of Alzheimer's disease (AD), a complex neurodegenerative disorder, requires analysis of heterogeneous biomarkers (e.g., neuroimaging, genetic risk factors, cognitive tests, and cerebrospinal fluid proteins) typically represented in a tabular format. With flexible few-shot reasoning, multimodal integration, and natural-language-based interpretability, large language models (LLMs) offer unprecedented opportunities for prediction with structured biomedical data. We propose a novel framework called TAP-GPT, Tabular Alzheimer's Prediction GPT, that adapts TableGPT2, a multimodal tabular-specialized LLM originally developed for business intelligence tasks, for AD diagnosis using structured biomarker data with small sample sizes. Our approach constructs few-shot tabular prompts using in-context learning examples from structured biomedical data and finetunes TableGPT2 using the parameter-efficient qLoRA adaption for a clinical binary classification task of AD or cognitively normal (CN). The TAP-GPT framework harnesses the powerful tabular understanding ability of TableGPT2 and the encoded prior knowledge of LLMs to outperform more advanced general-purpose LLMs and a tabular foundation model (TFM) developed for prediction tasks. To our knowledge, this is the first application of LLMs to the prediction task using tabular biomarker data, paving the way for future LLM-driven multi-agent frameworks in biomedical informatics.

cross Generalized Reinforcement Learning for Retriever-Specific Query Rewriter with Unstructured Real-World Documents

Authors: Sungguk Cha, DongWook Kim, Taeseung Hahn, Mintae Kim, Youngsub Han, Byoung-Ki Jeon

Abstract: Retrieval-Augmented Generation (RAG) systems rely heavily on effective query formulation to unlock external knowledge, yet optimizing queries for diverse, unstructured real-world documents remains a challenge. We introduce \textbf{RL-QR}, a reinforcement learning framework for retriever-specific query rewriting that eliminates the need for human-annotated datasets and extends applicability to both text-only and multi-modal databases. By synthesizing scenario-question pairs and leveraging Generalized Reward Policy Optimization (GRPO), RL-QR trains query rewriters tailored to specific retrievers, enhancing retrieval performance across varied domains. Experiments on industrial in-house data demonstrate significant improvements, with $\text{RL-QR}_{\text{multi-modal}}$ achieving an 11\% relative gain in NDCG@3 for multi-modal RAG and $\text{RL-QR}_{\text{lexical}}$ yielding a 9\% gain for lexical retrievers. However, challenges persist with semantic and hybrid retrievers, where rewriters failed to improve performance, likely due to training misalignments. Our findings highlight RL-QR's potential to revolutionize query optimization for RAG systems, offering a scalable, annotation-free solution for real-world retrieval tasks, while identifying avenues for further refinement in semantic retrieval contexts.

cross Evaluating LLMs' Multilingual Capabilities for Bengali: Benchmark Creation and Performance Analysis

Authors: Shimanto Bhowmik, Tawsif Tashwar Dipto, Md Sazzad Islam, Sheryl Hsu, Tahsin Reasat

Abstract: Bengali is an underrepresented language in NLP research. However, it remains a challenge due to its unique linguistic structure and computational constraints. In this work, we systematically investigate the challenges that hinder Bengali NLP performance by focusing on the absence of standardized evaluation benchmarks. We then evaluated 10 recent open source Large Language Models (LLMs) in 8 of the translated datasets and performed a comprehensive error analysis to pinpoint their primary failure modes. Our findings reveal consistent performance gaps for Bengali compared to English, particularly for smaller models and specific model families like Mistral. We also identified promising robustness in certain architectures, such as DeepSeek, that maintain more stable performance across languages. Our analysis reveals an inverse relationship between tokenization efficiency and LLM accuracy where models tend to perform worse when inputs are excessively tokenized, whereas more efficient \& concise tokenization results in improved performance. These findings highlight critical areas where current models fall short and underscore the need for improved dataset quality and evaluation methodologies tailored to multilingual contexts. This work will catalyze further research on NLP for underrepresented languages, helping to democratize access to advanced language technologies worldwide. The code and dataset used in this research is publicly available at https://github.com/BengaliAI/bn-llm-benchmark.

URLs: https://github.com/BengaliAI/bn-llm-benchmark.

cross Simulation-based inference for Precision Neutrino Physics through Neural Monte Carlo tuning

Authors: A. Gavrikov, A. Serafini, D. Dolzhikov, A. Garfagnini, M. Gonchar, M. Grassi, R. Brugnera, V. Cerrone, L. V. D'Auria, R. M. Guizzetti, L. Lastrucci, G. Andronico, V. Antonelli, A. Barresi, D. Basilico, M. Beretta, A. Bergnoli, M. Borghesi, A. Brigatti, R. Bruno, A. Budano, B. Caccianiga, A. Cammi, R. Caruso, D. Chiesa, C. Clementi, C. Coletta, S. Dusini, A. Fabbri, G. Felici, G. Ferrante, M. G. Giammarchi, N. Giudice, N. Guardone, F. Houria, C. Landini, I. Lippi, L. Loi, P. Lombardi, F. Mantovani, S. M. Mari, A. Martini, L. Miramonti, M. Montuschi, M. Nastasi, D. Orestano, F. Ortica, A. Paoloni, L. Pelicci, E. Percalli, F. Petrucci, E. Previtali, G. Ranucci, A. C. Re, B. Ricci, A. Romani, C. Sirignano, M. Sisti, L. Stanco, E. Stanescu Farilla, V. Strati, M. D. C Torri, C. Tuv\`e, C. Venettacci, G. Verde, L. Votano

Abstract: Precise modeling of detector energy response is crucial for next-generation neutrino experiments which present computational challenges due to lack of analytical likelihoods. We propose a solution using neural likelihood estimation within the simulation-based inference framework. We develop two complementary neural density estimators that model likelihoods of calibration data: conditional normalizing flows and a transformer-based regressor. We adopt JUNO - a large neutrino experiment - as a case study. The energy response of JUNO depends on several parameters, all of which should be tuned, given their non-linear behavior and strong correlations in the calibration data. To this end, we integrate the modeled likelihoods with Bayesian nested sampling for parameter inference, achieving uncertainties limited only by statistics with near-zero systematic biases. The normalizing flows model enables unbinned likelihood analysis, while the transformer provides an efficient binned alternative. By providing both options, our framework offers flexibility to choose the most appropriate method for specific needs. Finally, our approach establishes a template for similar applications across experimental neutrino and broader particle physics.

cross Impact of Hyperparameter Optimization on the Accuracy of Lightweight Deep Learning Models for Real-Time Image Classification

Authors: Vineet Kumar Rakesh, Soumya Mazumdar, Tapas Samanta, Sarbajit Pal, Amitabha Das

Abstract: Lightweight convolutional and transformer-based models have become vital for real-time image classification in resource-constrained applications, such as embedded systems and edge devices. This work analyzes the influence of hyperparameter adjustment on the accuracy and convergence behavior of seven efficient deep learning architectures: EfficientNetV2-S, ConvNeXt-T, MobileViT v2 (XXS/XS/S), MobileNetV3-L, TinyViT-21M, and RepVGG-A2. All models are trained on the ImageNet-1K dataset under consistent training settings, with an emphasis on real-time practicality. An comprehensive ablation study is undertaken to separate the effect of critical hyperparameters, including learning rate schedules, batch sizes, input resolution, data augmentation, regularization approaches, and optimizer choice. To assess appropriateness for real-time applications, each model is assessed not only in terms of Top-1 and Top-5 classification accuracy, but also in terms of inference time, parameter count, model size, and frames-per-second (FPS) on a GPU-accelerated edge deployment simulation. Results demonstrate that cosine learning rate decay and adjustable batch size may greatly boost both accuracy and convergence speed, while keeping low latency and memory cost. Notably, RepVGG-A2 achieves over 80% Top-1 accuracy with efficient inference performance, offering a compelling balance between accuracy and deployment cost for VGG-style models. The results give practical guidance for constructing resource-efficient deep learning models appropriate for real-time image processing pipelines. All code and training logs are publicly accessible at https://github.com/VineetKumarRakesh/lcnn-opt.

URLs: https://github.com/VineetKumarRakesh/lcnn-opt.

cross MUST-RAG: MUSical Text Question Answering with Retrieval Augmented Generation

Authors: Daeyong Kwon, SeungHeon Doh, Juhan Nam

Abstract: Recent advancements in Large language models (LLMs) have demonstrated remarkable capabilities across diverse domains. While they exhibit strong zero-shot performance on various tasks, LLMs' effectiveness in music-related applications remains limited due to the relatively small proportion of music-specific knowledge in their training data. To address this limitation, we propose MusT-RAG, a comprehensive framework based on Retrieval Augmented Generation (RAG) to adapt general-purpose LLMs for text-only music question answering (MQA) tasks. RAG is a technique that provides external knowledge to LLMs by retrieving relevant context information when generating answers to questions. To optimize RAG for the music domain, we (1) propose MusWikiDB, a music-specialized vector database for the retrieval stage, and (2) utilizes context information during both inference and fine-tuning processes to effectively transform general-purpose LLMs into music-specific models. Our experiment demonstrates that MusT-RAG significantly outperforms traditional fine-tuning approaches in enhancing LLMs' music domain adaptation capabilities, showing consistent improvements across both in-domain and out-of-domain MQA benchmarks. Additionally, our MusWikiDB proves substantially more effective than general Wikipedia corpora, delivering superior performance and computational efficiency.

cross SWE-Debate: Competitive Multi-Agent Debate for Software Issue Resolution

Authors: Han Li, Yuling Shi, Shaoxin Lin, Xiaodong Gu, Heng Lian, Xin Wang, Yantao Jia, Tao Huang, Qianxiang Wang

Abstract: Issue resolution has made remarkable progress thanks to the advanced reasoning capabilities of large language models (LLMs). Recently, agent-based frameworks such as SWE-agent have further advanced this progress by enabling autonomous, tool-using agents to tackle complex software engineering tasks. While existing agent-based issue resolution approaches are primarily based on agents' independent explorations, they often get stuck in local solutions and fail to identify issue patterns that span across different parts of the codebase. To address this limitation, we propose SWE-Debate, a competitive multi-agent debate framework that encourages diverse reasoning paths and achieves more consolidated issue localization. SWE-Debate first creates multiple fault propagation traces as localization proposals by traversing a code dependency graph. Then, it organizes a three-round debate among specialized agents, each embodying distinct reasoning perspectives along the fault propagation trace. This structured competition enables agents to collaboratively converge on a consolidated fix plan. Finally, this consolidated fix plan is integrated into an MCTS-based code modification agent for patch generation. Experiments on the SWE-bench benchmark show that SWE-Debate achieves new state-of-the-art results in open-source agent frameworks and outperforms baselines by a large margin.

cross Optimal Transport Learning: Balancing Value Optimization and Fairness in Individualized Treatment Rules

Authors: Wenhai Cui, Xiaoting Ji, Wen Su, Xiaodong Yan, Xingqiu Zhao

Abstract: Individualized treatment rules (ITRs) have gained significant attention due to their wide-ranging applications in fields such as precision medicine, ridesharing, and advertising recommendations. However, when ITRs are influenced by sensitive attributes such as race, gender, or age, they can lead to outcomes where certain groups are unfairly advantaged or disadvantaged. To address this gap, we propose a flexible approach based on the optimal transport theory, which is capable of transforming any optimal ITR into a fair ITR that ensures demographic parity. Recognizing the potential loss of value under fairness constraints, we introduce an ``improved trade-off ITR," designed to balance value optimization and fairness while accommodating varying levels of fairness through parameter adjustment. To maximize the value of the improved trade-off ITR under specific fairness levels, we propose a smoothed fairness constraint for estimating the adjustable parameter. Additionally, we establish a theoretical upper bound on the value loss for the improved trade-off ITR. We demonstrate performance of the proposed method through extensive simulation studies and application to the Next 36 entrepreneurial program dataset.

cross SWE-Exp: Experience-Driven Software Issue Resolution

Authors: Silin Chen, Shaoxin Lin, Xiaodong Gu, Yuling Shi, Heng Lian, Longfei Yun, Dong Chen, Weiguo Sun, Lin Cao, Qianxiang Wang

Abstract: Recent advances in large language model (LLM) agents have shown remarkable progress in software issue resolution, leveraging advanced techniques such as multi-agent collaboration and Monte Carlo Tree Search (MCTS). However, current agents act as memoryless explorers - treating each problem separately without retaining or reusing knowledge from previous repair experiences. This leads to redundant exploration of failed trajectories and missed chances to adapt successful issue resolution methods to similar problems. To address this problem, we introduce SWE-Exp, an experience - enhanced approach that distills concise and actionable experience from prior agent trajectories, enabling continuous learning across issues. Our method introduces a multi-faceted experience bank that captures both successful and failed repair attempts. Specifically, it extracts reusable issue resolution knowledge at different levels - from high-level problem comprehension to specific code changes. Experiments show that SWE-Exp achieves state-of-the-art resolution rate (41.6% Pass@1) on SWE-bench-Verified under open-source agent frameworks. Our approach establishes a new paradigm in which automated software engineering agents systematically accumulate and leverage repair expertise, fundamentally shifting from trial-and-error exploration to strategic, experience-driven issue resolution.

cross AGA: An adaptive group alignment framework for structured medical cross-modal representation learning

Authors: Wei Li, Xun Gong, Jiao Li, Xiaobin Sun

Abstract: Learning medical visual representations from paired images and reports is a promising direction in representation learning. However, current vision-language pretraining methods in the medical domain often simplify clinical reports into single entities or fragmented tokens, ignoring their inherent structure. In addition, contrastive learning frameworks typically depend on large quantities of hard negative samples, which is impractical for small-scale medical datasets. To tackle these challenges, we propose Adaptive Grouped Alignment (AGA), a new framework that captures structured semantics from paired medical images and reports. AGA introduces a bidirectional grouping mechanism based on a sparse similarity matrix. For each image-report pair, we compute fine-grained similarities between text tokens and image patches. Each token selects its top-matching patches to form a visual group, and each patch selects its most related tokens to form a language group. To enable adaptive grouping, we design two threshold gating modules, called Language Grouped Threshold Gate and Vision Grouped Threshold Gate, which learn grouping thresholds dynamically. Group representations are computed as weighted averages based on similarity scores. To align each token with its group representation, we introduce an Instance Aware Group Alignment loss that operates within each image-text pair, removing the need for external negatives. Finally, a Bidirectional Cross-modal Grouped Alignment module is applied to enhance fine-grained alignment between visual and linguistic group representations. Extensive experiments on public and private datasets show that our method achieves strong performance on image-text retrieval and classification tasks under both fine-tuning and zero-shot settings.

cross Honey Adulteration Detection using Hyperspectral Imaging and Machine Learning

Authors: Mokhtar A. Al-Awadhi, Ratnadeep R. Deshmukh

Abstract: This paper aims to develop a machine learning-based system for automatically detecting honey adulteration with sugar syrup, based on honey hyperspectral imaging data. First, the floral source of a honey sample is classified by a botanical origin identification subsystem. Then, the sugar syrup adulteration is identified, and its concentration is quantified by an adulteration detection subsystem. Both subsystems consist of two steps. The first step involves extracting relevant features from the honey sample using Linear Discriminant Analysis (LDA). In the second step, we utilize the K-Nearest Neighbors (KNN) model to classify the honey botanical origin in the first subsystem and identify the adulteration level in the second subsystem. We assess the proposed system performance on a public honey hyperspectral image dataset. The result indicates that the proposed system can detect adulteration in honey with an overall cross-validation accuracy of 96.39%, making it an appropriate alternative to the current chemical-based detection methods.

cross Adjoint-Based Aerodynamic Shape Optimization with a Manifold Constraint Learned by Diffusion Models

Authors: Long Chen, Emre Oezkaya, Jan Rottmayer, Nicolas R. Gauger, Zebang Shen, Yinyu Ye

Abstract: We introduce an adjoint-based aerodynamic shape optimization framework that integrates a diffusion model trained on existing designs to learn a smooth manifold of aerodynamically viable shapes. This manifold is enforced as an equality constraint to the shape optimization problem. Central to our method is the computation of adjoint gradients of the design objectives (e.g., drag and lift) with respect to the manifold space. These gradients are derived by first computing shape derivatives with respect to conventional shape design parameters (e.g., Hicks-Henne parameters) and then backpropagating them through the diffusion model to its latent space via automatic differentiation. Our framework preserves mathematical rigor and can be integrated into existing adjoint-based design workflows with minimal modification. Demonstrated on extensive transonic RANS airfoil design cases using off-the-shelf and general-purpose nonlinear optimizers, our approach eliminates ad hoc parameter tuning and variable scaling, maintains robustness across initialization and optimizer choices, and achieves superior aerodynamic performance compared to conventional approaches. This work establishes how AI generated priors integrates effectively with adjoint methods to enable robust, high-fidelity aerodynamic shape optimization through automatic differentiation.

cross Machine learning and machine learned prediction in chest X-ray images

Authors: Shereiff Garrett, Abhinav Adhikari, Sarina Gautam, DaShawn Marquis Morris, Chandra Mani Adhikari

Abstract: Machine learning and artificial intelligence are fast-growing fields of research in which data is used to train algorithms, learn patterns, and make predictions. This approach helps to solve seemingly intricate problems with significant accuracy without explicit programming by recognizing complex relationships in data. Taking an example of 5824 chest X-ray images, we implement two machine learning algorithms, namely, a baseline convolutional neural network (CNN) and a DenseNet-121, and present our analysis in making machine-learned predictions in predicting patients with ailments. Both baseline CNN and DenseNet-121 perform very well in the binary classification problem presented in this work. Gradient-weighted class activation mapping shows that DenseNet-121 correctly focuses on essential parts of the input chest X-ray images in its decision-making more than the baseline CNN.

cross H-RDT: Human Manipulation Enhanced Bimanual Robotic Manipulation

Authors: Hongzhe Bi, Lingxuan Wu, Tianwei Lin, Hengkai Tan, Zhizhong Su, Hang Su, Jun Zhu

Abstract: Imitation learning for robotic manipulation faces a fundamental challenge: the scarcity of large-scale, high-quality robot demonstration data. Recent robotic foundation models often pre-train on cross-embodiment robot datasets to increase data scale, while they face significant limitations as the diverse morphologies and action spaces across different robot embodiments make unified training challenging. In this paper, we present H-RDT (Human to Robotics Diffusion Transformer), a novel approach that leverages human manipulation data to enhance robot manipulation capabilities. Our key insight is that large-scale egocentric human manipulation videos with paired 3D hand pose annotations provide rich behavioral priors that capture natural manipulation strategies and can benefit robotic policy learning. We introduce a two-stage training paradigm: (1) pre-training on large-scale egocentric human manipulation data, and (2) cross-embodiment fine-tuning on robot-specific data with modular action encoders and decoders. Built on a diffusion transformer architecture with 2B parameters, H-RDT uses flow matching to model complex action distributions. Extensive evaluations encompassing both simulation and real-world experiments, single-task and multitask scenarios, as well as few-shot learning and robustness assessments, demonstrate that H-RDT outperforms training from scratch and existing state-of-the-art methods, including Pi0 and RDT, achieving significant improvements of 13.9% and 40.5% over training from scratch in simulation and real-world experiments, respectively. The results validate our core hypothesis that human manipulation data can serve as a powerful foundation for learning bimanual robotic manipulation policies.

cross Consistent Point Matching

Authors: Halid Ziya Yerebakan, Gerardo Hermosillo Valadez

Abstract: This study demonstrates that incorporating a consistency heuristic into the point-matching algorithm \cite{yerebakan2023hierarchical} improves robustness in matching anatomical locations across pairs of medical images. We validated our approach on diverse longitudinal internal and public datasets spanning CT and MRI modalities. Notably, it surpasses state-of-the-art results on the Deep Lesion Tracking dataset. Additionally, we show that the method effectively addresses landmark localization. The algorithm operates efficiently on standard CPU hardware and allows configurable trade-offs between speed and robustness. The method enables high-precision navigation between medical images without requiring a machine learning model or training data.

cross DivControl: Knowledge Diversion for Controllable Image Generation

Authors: Yucheng Xie, Fu Feng, Ruixiao Shi, Jing Wang, Yong Rui, Xin Geng

Abstract: Diffusion models have advanced from text-to-image (T2I) to image-to-image (I2I) generation by incorporating structured inputs such as depth maps, enabling fine-grained spatial control. However, existing methods either train separate models for each condition or rely on unified architectures with entangled representations, resulting in poor generalization and high adaptation costs for novel conditions. To this end, we propose DivControl, a decomposable pretraining framework for unified controllable generation and efficient adaptation. DivControl factorizes ControlNet via SVD into basic components-pairs of singular vectors-which are disentangled into condition-agnostic learngenes and condition-specific tailors through knowledge diversion during multi-condition training. Knowledge diversion is implemented via a dynamic gate that performs soft routing over tailors based on the semantics of condition instructions, enabling zero-shot generalization and parameter-efficient adaptation to novel conditions. To further improve condition fidelity and training efficiency, we introduce a representation alignment loss that aligns condition embeddings with early diffusion features. Extensive experiments demonstrate that DivControl achieves state-of-the-art controllability with 36.4$\times$ less training cost, while simultaneously improving average performance on basic conditions. It also delivers strong zero-shot and few-shot performance on unseen conditions, demonstrating superior scalability, modularity, and transferability.

cross SAMSA: Segment Anything Model Enhanced with Spectral Angles for Hyperspectral Interactive Medical Image Segmentation

Authors: Alfie Roddan, Tobias Czempiel, Chi Xu, Daniel S. Elson, Stamatia Giannarou

Abstract: Hyperspectral imaging (HSI) provides rich spectral information for medical imaging, yet encounters significant challenges due to data limitations and hardware variations. We introduce SAMSA, a novel interactive segmentation framework that combines an RGB foundation model with spectral analysis. SAMSA efficiently utilizes user clicks to guide both RGB segmentation and spectral similarity computations. The method addresses key limitations in HSI segmentation through a unique spectral feature fusion strategy that operates independently of spectral band count and resolution. Performance evaluation on publicly available datasets has shown 81.0% 1-click and 93.4% 5-click DICE on a neurosurgical and 81.1% 1-click and 89.2% 5-click DICE on an intraoperative porcine hyperspectral dataset. Experimental results demonstrate SAMSA's effectiveness in few-shot and zero-shot learning scenarios and using minimal training examples. Our approach enables seamless integration of datasets with different spectral characteristics, providing a flexible framework for hyperspectral medical image analysis.

cross villa-X: Enhancing Latent Action Modeling in Vision-Language-Action Models

Authors: Xiaoyu Chen, Hangxing Wei, Pushi Zhang, Chuheng Zhang, Kaixin Wang, Yanjiang Guo, Rushuai Yang, Yucen Wang, Xinquan Xiao, Li Zhao, Jianyu Chen, Jiang Bian

Abstract: Visual-Language-Action (VLA) models have emerged as a popular paradigm for learning robot manipulation policies that can follow language instructions and generalize to novel scenarios. Recent work has begun to explore the incorporation of latent actions, an abstract representation of visual change between two frames, into VLA pre-training. In this paper, we introduce villa-X, a novel Visual-Language-Latent-Action (ViLLA) framework that advances latent action modeling for learning generalizable robot manipulation policies. Our approach improves both how latent actions are learned and how they are incorporated into VLA pre-training. Together, these contributions enable villa-X to achieve superior performance across simulated environments including SIMPLER and LIBERO, as well as on two real-world robot setups including gripper and dexterous hand manipulation. We believe the ViLLA paradigm holds significant promise, and that our villa-X provides a strong foundation for future research.

cross DICOM De-Identification via Hybrid AI and Rule-Based Framework for Scalable, Uncertainty-Aware Redaction

Authors: Kyle Naddeo, Nikolas Koutsoubis, Rahul Krish, Ghulam Rasool, Nidhal Bouaynaya, Tony OSullivan, Raj Krish

Abstract: Access to medical imaging and associated text data has the potential to drive major advances in healthcare research and patient outcomes. However, the presence of Protected Health Information (PHI) and Personally Identifiable Information (PII) in Digital Imaging and Communications in Medicine (DICOM) files presents a significant barrier to the ethical and secure sharing of imaging datasets. This paper presents a hybrid de-identification framework developed by Impact Business Information Solutions (IBIS) that combines rule-based and AI-driven techniques, and rigorous uncertainty quantification for comprehensive PHI/PII removal from both metadata and pixel data. Our approach begins with a two-tiered rule-based system targeting explicit and inferred metadata elements, further augmented by a large language model (LLM) fine-tuned for Named Entity Recognition (NER), and trained on a suite of synthetic datasets simulating realistic clinical PHI/PII. For pixel data, we employ an uncertainty-aware Faster R-CNN model to localize embedded text, extract candidate PHI via Optical Character Recognition (OCR), and apply the NER pipeline for final redaction. Crucially, uncertainty quantification provides confidence measures for AI-based detections to enhance automation reliability and enable informed human-in-the-loop verification to manage residual risks. This uncertainty-aware deidentification framework achieves robust performance across benchmark datasets and regulatory standards, including DICOM, HIPAA, and TCIA compliance metrics. By combining scalable automation, uncertainty quantification, and rigorous quality assurance, our solution addresses critical challenges in medical data de-identification and supports the secure, ethical, and trustworthy release of imaging data for research.

cross Rule2Text: Natural Language Explanation of Logical Rules in Knowledge Graphs

Authors: Nasim Shirvani-Mahdavi, Devin Wingfield, Amin Ghasemi, Chengkai Li

Abstract: Knowledge graphs (KGs) often contain sufficient information to support the inference of new facts. Identifying logical rules not only improves the completeness of a knowledge graph but also enables the detection of potential errors, reveals subtle data patterns, and enhances the overall capacity for reasoning and interpretation. However, the complexity of such rules, combined with the unique labeling conventions of each KG, can make them difficult for humans to understand. In this paper, we explore the potential of large language models to generate natural language explanations for logical rules. Specifically, we extract logical rules using the AMIE 3.5.1 rule discovery algorithm from the benchmark dataset FB15k-237 and two large-scale datasets, FB-CVT-REV and FB+CVT-REV. We examine various prompting strategies, including zero- and few-shot prompting, including variable entity types, and chain-of-thought reasoning. We conduct a comprehensive human evaluation of the generated explanations based on correctness, clarity, and hallucination, and also assess the use of large language models as automatic judges. Our results demonstrate promising performance in terms of explanation correctness and clarity, although several challenges remain for future research. All scripts and data used in this study are publicly available at https://github.com/idirlab/KGRule2NL}{https://github.com/idirlab/KGRule2NL.

URLs: https://github.com/idirlab/KGRule2NL, https://github.com/idirlab/KGRule2NL.

cross Scaled Beta Models and Feature Dilution for Dynamic Ticket Pricing

Authors: Jonathan R. Landers

Abstract: A novel approach is presented for identifying distinct signatures of performing acts in the secondary ticket resale market by analyzing dynamic pricing distributions. Using a newly curated, time series dataset from the SeatGeek API, we model ticket pricing distributions as scaled Beta distributions. This enables accurate parameter estimation from incomplete statistical data using a hybrid of quantile matching and the method of moments. Incorporating the estimated $\alpha$ and $\beta$ parameters into Random Forest classifiers significantly improves pairwise artist classification accuracy, demonstrating the unique economic signatures in event pricing data. Additionally, we provide theoretical and empirical evidence that incorporating zero-variance (constant-value) features into Random Forest models acts as an implicit regularizer, enhancing feature variety and robustness. This regularization promotes deeper, more varied trees in the ensemble, improving the bias-variance tradeoff and mitigating overfitting to dominant features. These findings are validated on both the new ticket pricing dataset and the standard UCI ML handwritten digits dataset.

cross Formal Bayesian Transfer Learning via the Total Risk Prior

Authors: Nathan Wycoff, Ali Arab, Lisa O. Singh

Abstract: In analyses with severe data-limitations, augmenting the target dataset with information from ancillary datasets in the application domain, called source datasets, can lead to significantly improved statistical procedures. However, existing methods for this transfer learning struggle to deal with situations where the source datasets are also limited and not guaranteed to be well-aligned with the target dataset. A typical strategy is to use the empirical loss minimizer on the source data as a prior mean for the target parameters, which places the estimation of source parameters outside of the Bayesian formalism. Our key conceptual contribution is to use a risk minimizer conditional on source parameters instead. This allows us to construct a single joint prior distribution for all parameters from the source datasets as well as the target dataset. As a consequence, we benefit from full Bayesian uncertainty quantification and can perform model averaging via Gibbs sampling over indicator variables governing the inclusion of each source dataset. We show how a particular instantiation of our prior leads to a Bayesian Lasso in a transformed coordinate system and discuss computational techniques to scale our approach to moderately sized datasets. We also demonstrate that recently proposed minimax-frequentist transfer learning techniques may be viewed as an approximate Maximum a Posteriori approach to our model. Finally, we demonstrate superior predictive performance relative to the frequentist baseline on a genetics application, especially when the source data are limited.

cross SimuRA: Towards General Goal-Oriented Agent via Simulative Reasoning Architecture with LLM-Based World Model

Authors: Mingkai Deng, Jinyu Hou, Yilin Shen, Hongxia Jin, Graham Neubig, Zhiting Hu, Eric Xing

Abstract: AI agents built on large language models (LLMs) hold enormous promise, but current practice focuses on a one-task-one-agent approach, which not only falls short of scalability and generality, but also suffers from the fundamental limitations of autoregressive LLMs. On the other hand, humans are general agents who reason by mentally simulating the outcomes of their actions and plans. Moving towards a more general and powerful AI agent, we introduce SimuRA, a goal-oriented architecture for generalized agentic reasoning. Based on a principled formulation of optimal agent in any environment, \modelname overcomes the limitations of autoregressive reasoning by introducing a world model for planning via simulation. The generalized world model is implemented using LLM, which can flexibly plan in a wide range of environments using the concept-rich latent space of natural language. Experiments on difficult web browsing tasks show that \modelname improves the success of flight search from 0\% to 32.2\%. World-model-based planning, in particular, shows consistent advantage of up to 124\% over autoregressive planning, demonstrating the advantage of world model simulation as a reasoning paradigm. We are excited about the possibility for training a single, general agent model based on LLMs that can act superintelligently in all environments. To start, we make SimuRA, a web-browsing agent built on \modelname with pretrained LLMs, available as a research demo for public testing.

cross XSpecMesh: Quality-Preserving Auto-Regressive Mesh Generation Acceleration via Multi-Head Speculative Decoding

Authors: Dian Chen, Yansong Qu, Xinyang Li, Ming Li, Shengchuan Zhang

Abstract: Current auto-regressive models can generate high-quality, topologically precise meshes; however, they necessitate thousands-or even tens of thousands-of next-token predictions during inference, resulting in substantial latency. We introduce XSpecMesh, a quality-preserving acceleration method for auto-regressive mesh generation models. XSpecMesh employs a lightweight, multi-head speculative decoding scheme to predict multiple tokens in parallel within a single forward pass, thereby accelerating inference. We further propose a verification and resampling strategy: the backbone model verifies each predicted token and resamples any tokens that do not meet the quality criteria. In addition, we propose a distillation strategy that trains the lightweight decoding heads by distilling from the backbone model, encouraging their prediction distributions to align and improving the success rate of speculative predictions. Extensive experiments demonstrate that our method achieves a 1.7x speedup without sacrificing generation quality. Our code will be released.

cross SUB: Benchmarking CBM Generalization via Synthetic Attribute Substitutions

Authors: Jessica Bader, Leander Girrbach, Stephan Alaniz, Zeynep Akata

Abstract: Concept Bottleneck Models (CBMs) and other concept-based interpretable models show great promise for making AI applications more transparent, which is essential in fields like medicine. Despite their success, we demonstrate that CBMs struggle to reliably identify the correct concepts under distribution shifts. To assess the robustness of CBMs to concept variations, we introduce SUB: a fine-grained image and concept benchmark containing 38,400 synthetic images based on the CUB dataset. To create SUB, we select a CUB subset of 33 bird classes and 45 concepts to generate images which substitute a specific concept, such as wing color or belly pattern. We introduce a novel Tied Diffusion Guidance (TDG) method to precisely control generated images, where noise sharing for two parallel denoising processes ensures that both the correct bird class and the correct attribute are generated. This novel benchmark enables rigorous evaluation of CBMs and similar interpretable models, contributing to the development of more robust methods. Our code is available at https://github.com/ExplainableML/sub and the dataset at http://huggingface.co/datasets/Jessica-bader/SUB.

URLs: https://github.com/ExplainableML/sub, http://huggingface.co/datasets/Jessica-bader/SUB.

replace Disparate Conditional Prediction in Multiclass Classifiers

Authors: Sivan Sabato, Eran Treister, Elad Yom-Tov

Abstract: We propose methods for auditing multiclass classifiers for fairness under multiclass equalized odds,by estimating the deviation from equalized odds when the classifier is not completely fair. We generalize to multiclass classifiers the measure of Disparate Conditional Prediction (DCP), originally suggested by Sabato & Yom-Tov (2020) for binary classifiers. DCP is defined as the fraction of the population for which the classifier predicts with conditional prediction probabilities that differ from the closest common baseline. We provide new local-optimization methods for estimating the multiclass DCPunder two different regimes,one in which the conditional confusion matrices for each protected sub-population are known, and one in which these cannot be estimated, for instance, because the classifier is inaccessible or because good-quality individual-level data is not available. These methods can be used to detect classifiers that likely treat a significant fraction of the population unfairly. Experiments demonstrate the accuracy of the methods. Code is provided at https://github.com/sivansabato/ DCPmulticlass.

URLs: https://github.com/sivansabato/

replace Insights into Closed-form IPM-GAN Discriminator Guidance for Diffusion Modeling

Authors: Aadithya Srikanth, Siddarth Asokan, Nishanth Shetty, Chandra Sekhar Seelamantula

Abstract: Diffusion models are a state-of-the-art generative modeling framework that transform noise to images via Langevin sampling, guided by the score, which is the gradient of the logarithm of the data distribution. Recent works have shown empirically that the generation quality can be improved when guided by classifier network, which is typically the discriminator trained in a generative adversarial network (GAN) setting. In this paper, we propose a theoretical framework to analyze the effect of the GAN discriminator on Langevin-based sampling, and show that the IPM-GAN optimization can be seen as one of smoothed score-matching, wherein the scores of the data and the generator distributions are convolved with the kernel function associated with the IPM. The proposed approach serves to unify score-based training and optimization of IPM-GANs. Based on these insights, we demonstrate that closed-form kernel-based discriminator guidance, results in improvements (in terms of CLIP-FID and KID metrics) when applied atop baseline diffusion models. We demonstrate these results on the denoising diffusion implicit model (DDIM) and latent diffusion model (LDM) settings on various standard datasets. We also show that the proposed approach can be combined with existing accelerated-diffusion techniques to improve latent-space image generation.

replace Optimal and Near-Optimal Adaptive Vector Quantization

Authors: Ran Ben-Basat, Yaniv Ben-Itzhak, Michael Mitzenmacher, Shay Vargaftik

Abstract: Quantization is a fundamental optimization for many machine-learning use cases, including compressing gradients, model weights and activations, and datasets. The most accurate form of quantization is \emph{adaptive}, where the error is minimized with respect to a given input, rather than optimizing for the worst case. However, optimal adaptive quantization methods are considered infeasible in terms of both their runtime and memory requirements. We revisit the Adaptive Vector Quantization (AVQ) problem and present algorithms that find optimal solutions with asymptotically improved time and space complexity. We also present an even faster near-optimal algorithm for large inputs. Our experiments show our algorithms may open the door to using AVQ more extensively in a variety of machine learning applications.

replace Molecule Graph Networks with Many-body Equivariant Interactions

Authors: Zetian Mao, Chuan-Shen Hu, Jiawen Li, Chen Liang, Diptesh Das, Masato Sumita, Kelin Xia, Koji Tsuda

Abstract: Message passing neural networks have demonstrated significant efficacy in predicting molecular interactions. Introducing equivariant vectorial representations augments expressivity by capturing geometric data symmetries, thereby improving model accuracy. However, two-body bond vectors in opposition may cancel each other out during message passing, leading to the loss of directional information on their shared node. In this study, we develop Equivariant N-body Interaction Networks (ENINet) that explicitly integrates l = 1 equivariant many-body interactions to enhance directional symmetric information in the message passing scheme. We provided a mathematical analysis demonstrating the necessity of incorporating many-body equivariant interactions and generalized the formulation to $N$-body interactions. Experiments indicate that integrating many-body equivariant representations enhances prediction accuracy across diverse scalar and tensorial quantum chemical properties.

replace Deciphering interventional dynamical causality from non-intervention complex systems

Authors: Jifan Shi, Yang Li, Juan Zhao, Siyang Leng, Rui Bao, Kazuyuki Aihara, Luonan Chen, Wei Lin

Abstract: Detecting and quantifying causality is a focal topic in the fields of science, engineering, and interdisciplinary studies. However, causal studies on non-intervention systems attract much attention but remain extremely challenging. Delay-embedding technique provides a promising approach. In this study, we propose a framework named Interventional Dynamical Causality (IntDC) in contrast to the traditional Constructive Dynamical Causality (ConDC). ConDC, including Granger causality, transfer entropy and convergence of cross-mapping, measures the causality by constructing a dynamical model without considering interventions. A computational criterion, Interventional Embedding Entropy (IEE), is proposed to measure causal strengths in an interventional manner. IEE is an intervened causal information flow but in the delay-embedding space. Further, the IEE theoretically and numerically enables the deciphering of IntDC solely from observational (non-interventional) time-series data, without requiring any knowledge of dynamical models or real interventions in the considered system. In particular, IEE can be applied to rank causal effects according to their importance and construct causal networks from data. We conducted numerical experiments to demonstrate that IEE can find causal edges accurately, eliminate effects of confounding, and quantify causal strength robustly over traditional indices. We also applied IEE to real-world tasks. IEE performed as an accurate and robust tool for causal analyses solely from the observational data. The IntDC framework and IEE algorithm provide an efficient approach to the study of causality from time series in diverse non-intervention complex systems.

replace Artificial Inductive Bias for Synthetic Tabular Data Generation in Data-Scarce Scenarios

Authors: Patricia A. Apell\'aniz, Ana Jim\'enez, Borja Arroyo Galende, Juan Parras, Santiago Zazo

Abstract: While synthetic tabular data generation using Deep Generative Models (DGMs) offers a compelling solution to data scarcity and privacy concerns, their effectiveness relies on the availability of substantial training data, often lacking in real-world scenarios. To overcome this limitation, we propose a novel methodology that explicitly integrates artificial inductive biases into the generative process to improve data quality in low-data regimes. Our framework leverages transfer learning and meta-learning techniques to construct and inject informative inductive biases into DGMs. We evaluate four approaches (pre-training, model averaging, Model-Agnostic Meta-Learning (MAML), and Domain Randomized Search (DRS)) and analyze their impact on the quality of the generated text. Experimental results show that incorporating inductive bias substantially improves performance, with transfer learning methods outperforming meta-learning, achieving up to 60\% gains in Jensen-Shannon divergence. The methodology is model-agnostic and especially relevant in domains such as healthcare and finance, where high-quality synthetic data are essential, and data availability is often limited.

replace Parallel Split Learning with Global Sampling

Authors: Mohammad Kohankhaki, Ahmad Ayad, Mahdi Barhoush, Anke Schmeink

Abstract: Distributed deep learning in resource-constrained environments faces scalability and generalization challenges due to large effective batch sizes and non-identically distributed client data. We introduce a server-driven sampling strategy that maintains a fixed global batch size by dynamically adjusting client-side batch sizes. This decouples the effective batch size from the number of participating devices and ensures that global batches better reflect the overall data distribution. Using standard concentration bounds, we establish tighter deviation guarantees compared to existing approaches. Empirical results on a benchmark dataset confirm that the proposed method improves model accuracy, training efficiency, and convergence stability, offering a scalable solution for learning at the network edge.

replace Coarse Graining with Neural Operators for Simulating Chaotic Systems

Authors: Chuwei Wang, Julius Berner, Zongyi Li, Di Zhou, Jiayun Wang, Jane Bae, Anima Anandkumar

Abstract: Accurately predicting the long-term behavior of chaotic systems is crucial for various applications such as climate modeling. However, achieving such predictions typically requires iterative computations over a dense spatiotemporal grid to account for the unstable nature of chaotic systems, which is expensive and impractical in many real-world situations. An alternative approach to such a full-resolved simulation is using a coarse grid and then correcting its errors through a \textit{closure model}, which approximates the overall information from fine scales not captured in the coarse-grid simulation. Recently, ML approaches have been used for closure modeling, but they typically require a large number of training samples from expensive fully-resolved simulations (FRS). In this work, we prove an even more fundamental limitation, i.e., the standard approach to learning closure models suffers from a large approximation error for generic problems, no matter how large the model is, and it stems from the non-uniqueness of the mapping. We propose an alternative end-to-end learning approach using a physics-informed neural operator (PINO) that overcomes this limitation by not using a closure model or a coarse-grid solver. We first train the PINO model on data from a coarse-grid solver and then fine-tune it with (a small amount of) FRS and physics-based losses on a fine grid. The discretization-free nature of neural operators means that they do not suffer from the restriction of a coarse grid that closure models face, and they can provably approximate the long-term statistics of chaotic systems. In our experiments, our PINO model achieves a 330x speedup compared to FRS with a relative error $\sim 10\%$. In contrast, the closure model coupled with a coarse-grid solver is $60$x slower than PINO while having a much higher error $\sim186\%$ when the closure model is trained on the same FRS dataset.

replace On the Approximation of Stationary Processes using the ARMA Model

Authors: Anand Ganesh, Babhrubahan Bose, Anand Rajagopalan

Abstract: We look at a problem related to Autoregressive Moving Average (ARMA) models, on quantifying the approximation error between a true stationary process $X_t$ and an ARMA model $Y_t$. We take the transfer function representation $x(L)$ of a stationary process $X_t$ and show that the $L^{\infty}$ norm of $x$ acts as a valid norm on $X_t$ that controls the $\ell^2$ norm of its Wold coefficients. We then show that a certain subspace of stationary processes, which includes ARMA models, forms a Banach algebra under the $L^{\infty}$ norm that respects the multiplicative structure of $H^{\infty}$ transfer functions and thus improves on the structural properties of the cepstral norm for ARMA models. The natural definition of invertibility in this algebra is consistent with the original definition of ARMA invertibility, and generalizes better to non-ARMA processes than Wiener's $\ell^1$ condition. Finally, we calculate some explicit approximation bounds in the simpler context of continuous transfer functions, and critique some heuristic ideas on Pad\'e approximations and parsimonious models.

replace Accumulator-Aware Post-Training Quantization for Large Language Models

Authors: Ian Colbert, Giuseppe Franco, Fabian Grob, Jinjie Zhang, Rayan Saab

Abstract: When quantizing weights and activations to increasingly narrower representations, the cost of additions begins to dominate that of multiplications in multiply-accumulate (MAC) units. Recent studies show that reducing addition costs via low-precision accumulation improves throughput, power, and area across inference platforms, albeit with an increased risk of overflow. Accumulator-aware quantization research has so far only considered the quantization-aware training (QAT) paradigm, in which models are fine-tuned or trained from scratch with quantization in the loop. As models and datasets continue to grow in size, QAT techniques become increasingly more expensive, which has motivated the recent surge in post-training quantization (PTQ) research. To bridge this gap, we introduce AXE, the first accumulator-aware quantization framework explicitly designed to endow overflow avoidance guarantees to PTQ algorithms. We present theoretical motivation for AXE and demonstrate its flexibility by implementing it on top of two existing algorithms: GPFQ and OPTQ. We design AXE to support multi-stage accumulation, opening the door to full datapath optimization for the first time. We evaluate AXE using recent language generation models; when quantizing Llama3 8B for a 16-bit multi-stage accumulation datapath, AXE maintains up to 98% of the FP16 perplexity, surpassing naive bit width manipulation by up to 15%.

replace ActSafe: Active Exploration with Safety Constraints for Reinforcement Learning

Authors: Yarden As, Bhavya Sukhija, Lenart Treven, Carmelo Sferrazza, Stelian Coros, Andreas Krause

Abstract: Reinforcement learning (RL) is ubiquitous in the development of modern AI systems. However, state-of-the-art RL agents require extensive, and potentially unsafe, interactions with their environments to learn effectively. These limitations confine RL agents to simulated environments, hindering their ability to learn directly in real-world settings. In this work, we present ActSafe, a novel model-based RL algorithm for safe and efficient exploration. ActSafe learns a well-calibrated probabilistic model of the system and plans optimistically w.r.t. the epistemic uncertainty about the unknown dynamics, while enforcing pessimism w.r.t. the safety constraints. Under regularity assumptions on the constraints and dynamics, we show that ActSafe guarantees safety during learning while also obtaining a near-optimal policy in finite time. In addition, we propose a practical variant of ActSafe that builds on latest model-based RL advancements and enables safe exploration even in high-dimensional settings such as visual control. We empirically show that ActSafe obtains state-of-the-art performance in difficult exploration tasks on standard safe deep RL benchmarks while ensuring safety during learning.

replace Electricity Price Prediction Using Multi-Kernel Gaussian Process Regression Combined with Kernel-Based Support Vector Regression

Authors: Abhinav Das, Stephan Schl\"uter, Lorenz Schneider

Abstract: This paper presents a new hybrid model for predicting German electricity prices. The algorithm is based on a combination of Gaussian Process Regression (GPR) and Support Vector Regression (SVR). Although GPR is a competent model for learning stochastic patterns within data and for interpolation, its performance for out-of-sample data is not very promising. By choosing a suitable data-dependent covariance function, we can enhance the performance of GPR for the German hourly power prices being tested. However, since the out-of-sample prediction is dependent on the training data, the prediction is vulnerable to noise and outliers. To overcome this issue, a separate prediction is calculated using SVR, which applies margin-based optimization. This method is advantageous when dealing with non-linear processes and outliers, since only certain necessary points (support vectors) in the training data are responsible for regression. The individual predictions are then linearly combined using uniform weights. When tested on historic German power prices, this approach outperforms the publicly available benchmarks, namely the LASSO estimated autoregressive regression model, deep neural network provided in the recent research by [1].

replace MaxInfoRL: Boosting exploration in reinforcement learning through information gain maximization

Authors: Bhavya Sukhija, Stelian Coros, Andreas Krause, Pieter Abbeel, Carmelo Sferrazza

Abstract: Reinforcement learning (RL) algorithms aim to balance exploiting the current best strategy with exploring new options that could lead to higher rewards. Most common RL algorithms use undirected exploration, i.e., select random sequences of actions. Exploration can also be directed using intrinsic rewards, such as curiosity or model epistemic uncertainty. However, effectively balancing task and intrinsic rewards is challenging and often task-dependent. In this work, we introduce a framework, MaxInfoRL, for balancing intrinsic and extrinsic exploration. MaxInfoRL steers exploration towards informative transitions, by maximizing intrinsic rewards such as the information gain about the underlying task. When combined with Boltzmann exploration, this approach naturally trades off maximization of the value function with that of the entropy over states, rewards, and actions. We show that our approach achieves sublinear regret in the simplified setting of multi-armed bandits. We then apply this general formulation to a variety of off-policy model-free RL methods for continuous state-action spaces, yielding novel algorithms that achieve superior performance across hard exploration problems and complex scenarios such as visual control tasks.

replace InfAlign: Inference-aware language model alignment

Authors: Ananth Balashankar, Ziteng Sun, Jonathan Berant, Jacob Eisenstein, Michael Collins, Adrian Hutter, Jong Lee, Chirag Nagpal, Flavien Prost, Aradhana Sinha, Ananda Theertha Suresh, Ahmad Beirami

Abstract: Language model alignment is a critical step in training modern generative language models. Alignment targets to improve win rate of a sample from the aligned model against the base model. Today, we are increasingly using inference-time algorithms (e.g., Best-of-N, controlled decoding, tree search) to decode from language models rather than standard sampling. We show that this train/test mismatch makes standard RLHF framework sub-optimal in view of such inference-time methods. To this end, we propose a framework for inference-aware alignment (InfAlign), which aims to optimize inference-time win rate of the aligned policy against the base model. We prove that for any inference-time decoding procedure, the optimal aligned policy is the solution to the standard RLHF problem with a transformation of the reward. This motivates us to provide the calibrate-and-transform RL (InfAlign-CTRL) algorithm to solve this problem, which involves a reward calibration step and a KL-regularized reward maximization step with a transformation of the calibrated reward. For best-of-N sampling and best-of-N jailbreaking, we propose specific transformations offering up to 3-8% improvement on inference-time win rates. Finally, we also show that our proposed reward calibration method is a strong baseline for optimizing standard win rate.

replace Transformed Low-rank Adaptation via Tensor Decomposition and Its Applications to Text-to-image Models

Authors: Zerui Tao, Yuhta Takida, Naoki Murata, Qibin Zhao, Yuki Mitsufuji

Abstract: Parameter-Efficient Fine-Tuning (PEFT) of text-to-image models has become an increasingly popular technique with many applications. Among the various PEFT methods, Low-Rank Adaptation (LoRA) and its variants have gained significant attention due to their effectiveness, enabling users to fine-tune models with limited computational resources. However, the approximation gap between the low-rank assumption and desired fine-tuning weights prevents the simultaneous acquisition of ultra-parameter-efficiency and better performance. To reduce this gap and further improve the power of LoRA, we propose a new PEFT method that combines two classes of adaptations, namely, transform and residual adaptations. In specific, we first apply a full-rank and dense transform to the pre-trained weight. This learnable transform is expected to align the pre-trained weight as closely as possible to the desired weight, thereby reducing the rank of the residual weight. Then, the residual part can be effectively approximated by more compact and parameter-efficient structures, with a smaller approximation error. To achieve ultra-parameter-efficiency in practice, we design highly flexible and effective tensor decompositions for both the transform and residual adaptations. Additionally, popular PEFT methods such as DoRA can be summarized under this transform plus residual adaptation scheme. Experiments are conducted on fine-tuning Stable Diffusion models in subject-driven and controllable generation. The results manifest that our method can achieve better performances and parameter efficiency compared to LoRA and several baselines.

replace Advancing Generative Artificial Intelligence and Large Language Models for Demand Side Management with Internet of Electric Vehicles

Authors: Hanwen Zhang, Ruichen Zhang, Wei Zhang, Dusit Niyato, Yonggang Wen, Chunyan Miao

Abstract: Generative artificial intelligence, particularly through large language models (LLMs), is poised to transform energy optimization and demand side management (DSM) within microgrids. This paper explores the integration of LLMs into energy management, emphasizing their roles in automating the optimization of DSM strategies with Internet of electric vehicles. We investigate challenges and solutions associated with DSM and explore the new opportunities presented by leveraging LLMs. Then, we propose an innovative solution that enhances LLMs with retrieval-augmented generation for automatic problem formulation, code generation, and customizing optimization. We present a case study to demonstrate the effectiveness of our proposed solution in charging scheduling and optimization for electric vehicles, highlighting our solution's significant advancements in energy efficiency and user adaptability. This work underscores the potential of LLMs for energy optimization and fosters a new era of intelligent DSM solutions.

replace Tailored Forecasting from Short Time Series via Meta-learning

Authors: Declan A. Norton, Edward Ott, Andrew Pomerance, Brian Hunt, Michelle Girvan

Abstract: Machine learning models can effectively forecast dynamical systems from time-series data, but they typically require large amounts of past data, making forecasting particularly challenging for systems with limited history. To overcome this, we introduce Meta-learning for Tailored Forecasting using Related Time Series (METAFORS), which generalizes knowledge across systems to enable forecasting in data-limited scenarios. By learning from a library of models trained on longer time series from potentially related systems, METAFORS builds and initializes a model tailored to short time-series data from the system of interest. Using a reservoir computing implementation and testing on simulated chaotic systems, we demonstrate that METAFORS can reliably predict both short-term dynamics and long-term statistics without requiring contextual labels. We see this even when test and related systems exhibit substantially different behaviors, highlighting METAFORS' strengths in data-limited scenarios.

replace Achieving Deep Continual Learning via Evolution

Authors: Aojun Lu, Junchao Ke, Chunhui Ding, Jiahao Fan, Jiancheng Lv, Yanan Sun

Abstract: Deep neural networks, despite their remarkable success, remain fundamentally limited in their ability to perform Continual Learning (CL). While most current methods aim to enhance the capabilities of a single model, Inspired by the collective learning mechanisms of human populations, we introduce Evolving Continual Learning (ECL), a framework that maintains and evolves a diverse population of neural network models. ECL continually searches for an optimal architecture for each introduced incremental task. This tailored model is trained on the corresponding task and archived as a specialized expert, contributing to a growing collection of skills. This approach inherently resolves the core CL challenges: stability is achieved through the isolation of expert models, while plasticity is greatly enhanced by evolving unique, task-specific architectures. Experimental results demonstrate that ECL significantly outperforms state-of-the-art individual-level CL methods. By shifting the focus from individual adaptation to collective evolution, ECL presents a novel path toward AI systems capable of CL.

replace Kandinsky Conformal Prediction: Beyond Class- and Covariate-Conditional Coverage

Authors: Konstantina Bairaktari, Jiayun Wu, Zhiwei Steven Wu

Abstract: Conformal prediction is a powerful distribution-free framework for constructing prediction sets with coverage guarantees. Classical methods, such as split conformal prediction, provide marginal coverage, ensuring that the prediction set contains the label of a random test point with a target probability. However, these guarantees may not hold uniformly across different subpopulations, leading to disparities in coverage. Prior work has explored coverage guarantees conditioned on events related to the covariates and label of the test point. We present Kandinsky conformal prediction, a framework that significantly expands the scope of conditional coverage guarantees. In contrast to Mondrian conformal prediction, which restricts its coverage guarantees to disjoint groups -- reminiscent of the rigid, structured grids of Piet Mondrian's art -- our framework flexibly handles overlapping and fractional group memberships defined jointly on covariates and labels, reflecting the layered, intersecting forms in Wassily Kandinsky's compositions. Our algorithm unifies and extends existing methods, encompassing covariate-based group conditional, class conditional, and Mondrian conformal prediction as special cases, while achieving a minimax-optimal high-probability conditional coverage bound. Finally, we demonstrate the practicality of our approach through empirical evaluation on real-world datasets.

replace Decision by Supervised Learning with Deep Ensembles: A Practical Framework for Robust Portfolio Optimization

Authors: Juhyeong Kim, Sungyoon Choi, Youngbin Lee, Yejin Kim, Yongmin Choi, Yongjae Lee

Abstract: We propose Decision by Supervised Learning (DSL), a practical framework for robust portfolio optimization. DSL reframes portfolio construction as a supervised learning problem: models are trained to predict optimal portfolio weights, using cross-entropy loss and portfolios constructed by maximizing the Sharpe or Sortino ratio. To further enhance stability and reliability, DSL employs Deep Ensemble methods, substantially reducing variance in portfolio allocations. Through comprehensive backtesting across diverse market universes and neural architectures, shows superior performance compared to both traditional strategies and leading machine learning-based methods, including Prediction-Focused Learning and End-to-End Learning. We show that increasing the ensemble size leads to higher median returns and more stable risk-adjusted performance. The code is available at https://github.com/DSLwDE/DSLwDE.

URLs: https://github.com/DSLwDE/DSLwDE.

replace Satellite Federated Fine-Tuning for Foundation Models in Space Computing Power Networks

Authors: Yan Zhu, Jingyang Zhu, Ting Wang, Yuanming Shi, Chunxiao Jiang, Khaled Ben Letaief

Abstract: Advancements in artificial intelligence (AI) and low-earth orbit (LEO) satellites have promoted the application of large remote sensing foundation models for various downstream tasks. However, direct downloading of these models for fine-tuning on the ground is impeded by privacy concerns and limited bandwidth. Satellite federated learning (FL) offers a solution by enabling model fine-tuning directly on-board satellites and aggregating model updates without data downloading. Nevertheless, for large foundation models, the computational capacity of satellites is insufficient to support effective on-board fine-tuning in traditional satellite FL frameworks. To address these challenges, we propose a satellite-ground collaborative federated fine-tuning framework. The key of the framework lies in how to reasonably decompose and allocate model components to alleviate insufficient on-board computation capabilities. During fine-tuning, satellites exchange intermediate results with ground stations or other satellites for forward propagation and back propagation, which brings communication challenges due to the special communication topology of space transmission networks, such as intermittent satellite-ground communication, short duration of satellite-ground communication windows, and unstable inter-orbit inter-satellite links (ISLs). To reduce transmission delays, we further introduce tailored communication strategies that integrate both communication and computing resources. Specifically, we propose a parallel intra-orbit communication strategy, a topology-aware satellite-ground communication strategy, and a latency-minimalization inter-orbit communication strategy to reduce space communication costs. Simulation results demonstrate significant reductions in training time with improvements of approximately 33%.

replace Affect Models Have Weak Generalizability to Atypical Speech

Authors: Jaya Narain, Amrit Romana, Vikramjit Mitra, Colin Lea, Shirley Ren

Abstract: Speech and voice conditions can alter the acoustic properties of speech, which could impact the performance of paralinguistic models for affect for people with atypical speech. We evaluate publicly available models for recognizing categorical and dimensional affect from speech on a dataset of atypical speech, comparing results to datasets of typical speech. We investigate three dimensions of speech atypicality: intelligibility, which is related to pronounciation; monopitch, which is related to prosody, and harshness, which is related to voice quality. We look at (1) distributional trends of categorical affect predictions within the dataset, (2) distributional comparisons of categorical affect predictions to similar datasets of typical speech, and (3) correlation strengths between text and speech predictions for spontaneous speech for valence and arousal. We find that the output of affect models is significantly impacted by the presence and degree of speech atypicalities. For instance, the percentage of speech predicted as sad is significantly higher for all types and grades of atypical speech when compared to similar typical speech datasets. In a preliminary investigation on improving robustness for atypical speech, we find that fine-tuning models on pseudo-labeled atypical speech data improves performance on atypical speech without impacting performance on typical speech. Our results emphasize the need for broader training and evaluation datasets for speech emotion models, and for modeling approaches that are robust to voice and speech differences.

replace Intersectional Divergence: Measuring Fairness in Regression

Authors: Joe Germino, Nuno Moniz, Nitesh V. Chawla

Abstract: Fairness in machine learning research is commonly framed in the context of classification tasks, leaving critical gaps in regression. In this paper, we propose a novel approach to measure intersectional fairness in regression tasks, going beyond the focus on single protected attributes from existing work to consider combinations of all protected attributes. Furthermore, we contend that it is insufficient to measure the average error of groups without regard for imbalanced domain preferences. Accordingly, we propose Intersectional Divergence (ID) as the first fairness measure for regression tasks that 1) describes fair model behavior across multiple protected attributes and 2) differentiates the impact of predictions in target ranges most relevant to users. We extend our proposal demonstrating how ID can be adapted into a loss function, IDLoss, that satisfies convergence guarantees and has piecewise smooth properties that enable practical optimization. Through an extensive experimental evaluation, we demonstrate how ID allows unique insights into model behavior and fairness, and how incorporating IDLoss into optimization can considerably improve single-attribute and intersectional model fairness while maintaining a competitive balance in predictive performance.

replace Hypergraph Neural Sheaf Diffusion: A Symmetric Simplicial Set Framework for Higher-Order Learning

Authors: Seongjin Choi, Gahee Kim, Yong-Geun Oh

Abstract: The absence of intrinsic adjacency relations and orientation systems in hypergraphs creates fundamental challenges for constructing sheaf Laplacians of arbitrary degrees. We resolve these limitations through symmetric simplicial sets derived directly from hypergraphs, called symmetric simplicial lifting, which encode all possible oriented subrelations within each hyperedge as ordered tuples. This construction canonically defines adjacency via facet maps while inherently preserving hyperedge provenance. We establish that the normalized degree zero sheaf Laplacian on our symmetric simplicial lifting reduces exactly to the traditional graph normalized sheaf Laplacian when restricted to graphs, validating its mathematical consistency with prior graph-based sheaf theory. Furthermore, the induced structure preserves all structural information from the original hypergraph, ensuring that every multi-way relational detail is faithfully retained. Leveraging this framework, we introduce Hypergraph Neural Sheaf Diffusion (HNSD), the first principled extension of neural sheaf diffusion to hypergraphs. HNSD operates via normalized degree zero sheaf Laplacian over symmetric simplicial lifting, resolving orientation ambiguity and adjacency sparsity inherent to hypergraph learning. Experimental evaluations demonstrate HNSDs competitive performance across established benchmarks.

replace SinBasis Networks: Matrix-Equivalent Feature Extraction for Wave-Like Optical Spectrograms

Authors: Yuzhou Zhu, Zheng Zhang, Ruyi Zhang, Liang Zhou

Abstract: Wave-like images-from attosecond streaking spectrograms to optical spectra, audio mel-spectrograms and periodic video frames-encode critical harmonic structures that elude conventional feature extractors. We propose a unified, matrix-equivalent framework that reinterprets convolution and attention as linear transforms on flattened inputs, revealing filter weights as basis vectors spanning latent feature subspaces. To infuse spectral priors we apply elementwise $\sin(\cdot)$ mappings to each weight matrix. Embedding these transforms into CNN, ViT and Capsule architectures yields Sin-Basis Networks with heightened sensitivity to periodic motifs and built-in invariance to spatial shifts. Experiments on a diverse collection of wave-like image datasets-including 80,000 synthetic attosecond streaking spectrograms, thousands of Raman, photoluminescence and FTIR spectra, mel-spectrograms from AudioSet and cycle-pattern frames from Kinetics-demonstrate substantial gains in reconstruction accuracy, translational robustness and zero-shot cross-domain transfer. Theoretical analysis via matrix isomorphism and Mercer-kernel truncation quantifies how sinusoidal reparametrization enriches expressivity while preserving stability in data-scarce regimes. Sin-Basis Networks thus offer a lightweight, physics-informed approach to deep learning across all wave-form imaging modalities.

replace A Theoretical Framework for Explaining Reinforcement Learning with Shapley Values

Authors: Daniel Beechey, Thomas M. S. Smith, \"Ozg\"ur \c{S}im\c{s}ek

Abstract: Reinforcement learning agents can achieve super-human performance in complex decision-making tasks, but their behaviour is often difficult to understand and explain. This lack of explanation limits deployment, especially in safety-critical settings where understanding and trust are essential. We identify three core explanatory targets that together provide a comprehensive view of reinforcement learning agents: behaviour, outcomes, and predictions. We develop a unified theoretical framework for explaining these three elements of reinforcement learning agents through the influence of individual features that the agent observes in its environment. We derive feature influences by using Shapley values, which collectively and uniquely satisfy a set of well-motivated axioms for fair and consistent credit assignment. The proposed approach, Shapley Values for Explaining Reinforcement Learning (SVERL), provides a single theoretical framework to comprehensively and meaningfully explain reinforcement learning agents. It yields explanations with precise semantics that are not only interpretable but also mathematically justified, enabling us to identify and correct conceptual issues in prior explanations. Through illustrative examples, we show how SVERL produces useful, intuitive explanations of agent behaviour, outcomes, and predictions, which are not apparent from observing agent behaviour alone.

replace How Can I Publish My LLM Benchmark Without Giving the True Answers Away?

Authors: Takashi Ishida, Thanawat Lodkaew, Ikko Yamane

Abstract: Publishing a large language model (LLM) benchmark on the Internet risks contaminating future LLMs: the benchmark may be unintentionally (or intentionally) used to train or select a model. A common mitigation is to keep the benchmark private and let participants submit their models or predictions to the organizers. However, this strategy will require trust in a single organization and still permits test-set overfitting through repeated queries. To overcome this issue, we propose a way to publish benchmarks without completely disclosing the ground-truth answers to the questions, while still maintaining the ability to openly evaluate LLMs. Our main idea is to inject randomness to the answers by preparing several logically correct answers, and only include one of them as the solution in the benchmark. This reduces the best possible accuracy, i.e., Bayes accuracy, of the benchmark. Not only is this helpful to keep us from disclosing the ground truth, but this approach also offers a test for detecting data contamination. In principle, even fully capable models should not surpass the Bayes accuracy. If a model surpasses this ceiling despite this expectation, this is a strong signal of data contamination. We present experimental evidence that our method can detect data contamination accurately on a wide range of benchmarks, models, and training methodologies.

replace A ZeNN architecture to avoid the Gaussian trap

Authors: Lu\'is Carvalho, Jo\~ao L. Costa, Jos\'e Mour\~ao, Gon\c{c}alo Oliveira

Abstract: We propose a new simple architecture, Zeta Neural Networks (ZeNNs), in order to overcome several shortcomings of standard multi-layer perceptrons (MLPs). Namely, in the large width limit, MLPs are non-parametric, they do not have a well-defined pointwise limit, they lose non-Gaussian attributes and become unable to perform feature learning; moreover, finite width MLPs perform poorly in learning high frequencies. The new ZeNN architecture is inspired by three simple principles from harmonic analysis: i) Enumerate the perceptons and introduce a non-learnable weight to enforce convergence; ii) Introduce a scaling (or frequency) factor; iii) Choose activation functions that lead to near orthogonal systems. We will show that these ideas allow us to fix the referred shortcomings of MLPs. In fact, in the infinite width limit, ZeNNs converge pointwise, they exhibit a rich asymptotic structure beyond Gaussianity, and perform feature learning. Moreover, when appropriate activation functions are chosen, (finite width) ZeNNs excel at learning high-frequency features of functions with low dimensional domains.

replace Adapt before Continual Learning

Authors: Aojun Lu, Tao Feng, Hangjie Yuan, Chunhui Ding, Yanan Sun

Abstract: Continual Learning (CL) seeks to enable neural networks to incrementally acquire new knowledge (plasticity) while retaining existing knowledge (stability). Although pre-trained models (PTMs) have provided a strong foundation for CL, existing approaches face a fundamental challenge in balancing these two competing objectives. Current methods typically address stability by freezing the PTM backbone, which severely limits the model's plasticity, particularly when incoming data distribution diverges largely from the pre-training data. Alternatively, sequentially fine-tuning the entire PTM can adapt to new knowledge but often leads to catastrophic forgetting, highlighting the critical stability-plasticity trade-off in PTM-based CL. To address this limitation, we propose Adapting PTMs before the core CL} process (ACL), a novel framework that introduces a plug-and-play adaptation phase prior to learning each new task. During this phase, ACL refines the PTM backbone by aligning embeddings with their original class prototypes while distancing them from irrelevant classes. This mechanism theoretically and empirically demonstrates desirable balance between stability and plasticity, significantly improving CL performance across benchmarks and integrated methods. Code is available at https://github.com/byyx666/ACL_code.

URLs: https://github.com/byyx666/ACL_code.

replace GrokAlign: Geometric Characterisation and Acceleration of Grokking

Authors: Thomas Walker, Ahmed Imtiaz Humayun, Randall Balestriero, Richard Baraniuk

Abstract: A key challenge for the machine learning community is to understand and accelerate the training dynamics of deep networks that lead to delayed generalisation and emergent robustness to input perturbations, also known as grokking. Prior work has associated phenomena like delayed generalisation with the transition of a deep network from a linear to a feature learning regime, and emergent robustness with changes to the network's functional geometry, in particular the arrangement of the so-called linear regions in deep networks employing continuous piecewise affine nonlinearities. Here, we explain how grokking is realised in the Jacobian of a deep network and demonstrate that aligning a network's Jacobians with the training data (in the sense of cosine similarity) ensures grokking under a low-rank Jacobian assumption. Our results provide a strong theoretical motivation for the use of Jacobian regularisation in optimizing deep networks -- a method we introduce as GrokAlign -- which we show empirically to induce grokking much sooner than more conventional regularizers like weight decay. Moreover, we introduce centroid alignment as a tractable and interpretable simplification of Jacobian alignment that effectively identifies and tracks the stages of deep network training dynamics. Accompanying webpage (https://thomaswalker1.github.io/blog/grokalign.html) and code (https://github.com/ThomasWalker1/grokalign).

URLs: https://thomaswalker1.github.io/blog/grokalign.html), https://github.com/ThomasWalker1/grokalign).

replace Two-dimensional Parallel Tempering for Constrained Optimization

Authors: Corentin Delacour, M Mahmudul Hasan Sajeeb, Joao P. Hespanha, Kerem Y. Camsari

Abstract: Sampling Boltzmann probability distributions plays a key role in machine learning and optimization, motivating the design of hardware accelerators such as Ising machines. While the Ising model can in principle encode arbitrary optimization problems, practical implementations are often hindered by soft constraints that either slow down mixing when too strong, or fail to enforce feasibility when too weak. We introduce a two-dimensional extension of the powerful parallel tempering algorithm (PT) that addresses this challenge by adding a second dimension of replicas interpolating the penalty strengths. This scheme ensures constraint satisfaction in the final replicas, analogous to low-energy states at low temperature. The resulting two-dimensional parallel tempering algorithm (2D-PT) improves mixing in heavily constrained replicas and eliminates the need to explicitly tune the penalty strength. In a representative example of graph sparsification with copy constraints, 2D-PT achieves near-ideal mixing, with Kullback-Leibler divergence decaying as O(1/t). When applied to sparsified Wishart instances, 2D-PT yields orders of magnitude speedup over conventional PT with the same number of replicas. The method applies broadly to constrained Ising problems and can be deployed on existing Ising machines.

replace MLE-STAR: Machine Learning Engineering Agent via Search and Targeted Refinement

Authors: Jaehyun Nam, Jinsung Yoon, Jiefeng Chen, Jinwoo Shin, Sercan \"O. Ar{\i}k, Tomas Pfister

Abstract: Agents based on large language models (LLMs) for machine learning engineering (MLE) can automatically implement ML models via code generation. However, existing approaches to build such agents often rely heavily on inherent LLM knowledge and employ coarse exploration strategies that modify the entire code structure at once. This limits their ability to select effective task-specific models and perform deep exploration within specific components, such as experimenting extensively with feature engineering options. To overcome these, we propose MLE-STAR, a novel approach to build MLE agents. MLE-STAR first leverages external knowledge by using a search engine to retrieve effective models from the web, forming an initial solution, then iteratively refines it by exploring various strategies targeting specific ML components. This exploration is guided by ablation studies analyzing the impact of individual code blocks. Furthermore, we introduce a novel ensembling method using an effective strategy suggested by MLE-STAR. Our experimental results show that MLE-STAR achieves medals in 64% of the Kaggle competitions on the MLE-bench Lite, significantly outperforming the best alternative.

replace Recursive Learning-Based Virtual Buffering for Analytical Global Placement

Authors: Andrew B. Kahng, Yiting Liu, Zhiang Wang

Abstract: Due to the skewed scaling of interconnect versus cell delay in modern technology nodes, placement with buffer porosity (i.e., cell density) awareness is essential for timing closure in physical synthesis flows. However, existing approaches face two key challenges: (i) traditional van Ginneken-Lillis-style buffering approaches are computationally expensive during global placement; and (ii) machine learning-based approaches, such as BufFormer, lack a thorough consideration of Electrical Rule Check (ERC) violations and fail to "close the loop" back into the physical design flow. In this work, we propose MLBuf-RePlAce, the first open-source learning-driven virtual buffering-aware analytical global placement framework, built on top of the OpenROAD infrastructure. MLBuf-RePlAce adopts an efficient recursive learning-based generative buffering approach to predict buffer types and locations, addressing ERC violations during global placement. We compare MLBuf-RePlAce against the default virtual buffering-based timing-driven global placer in OpenROAD, using open-source testcases from the TILOS MacroPlacement and OpenROAD-flow-scripts repositories. Without degradation of post-route power, MLBuf-RePlAce achieves (maximum, average) improvements of (56%, 31%) in total negative slack (TNS) within the open-source OpenROAD flow. When evaluated by completion in a commercial flow, MLBuf-RePlAce achieves (maximum, average) improvements of (53%, 28%) in TNS with an average of 0.2% improvement in post-route power.

replace Some Theoretical Results on Layerwise Effective Dimension Oscillations in Finite Width ReLU Networks

Authors: Darshan Makwana

Abstract: We analyze the layerwise effective dimension (rank of the feature matrix) in fully-connected ReLU networks of finite width. Specifically, for a fixed batch of $m$ inputs and random Gaussian weights, we derive closed-form expressions for the expected rank of the \$m\times n\$ hidden activation matrices. Our main result shows that $\mathbb{E}[EDim(\ell)]=m[1-(1-2/\pi)^\ell]+O(e^{-c m})$ so that the rank deficit decays geometrically with ratio $1-2 / \pi \approx 0.3634$. We also prove a sub-Gaussian concentration bound, and identify the "revival" depths at which the expected rank attains local maxima. In particular, these peaks occur at depths $\ell_k^*\approx(k+1/2)\pi/\log(1/\rho)$ with height $\approx (1-e^{-\pi/2}) m \approx 0.79m$. We further show that this oscillatory rank behavior is a finite-width phenomenon: under orthogonal weight initialization or strong negative-slope leaky-ReLU, the rank remains (nearly) full. These results provide a precise characterization of how random ReLU layers alternately collapse and partially revive the subspace of input variations, adding nuance to prior work on expressivity of deep networks.

replace TPP-SD: Accelerating Transformer Point Process Sampling with Speculative Decoding

Authors: Shukai Gong, Yiyang Fu, Fengyuan Ran, Quyu Kong, Feng Zhou

Abstract: We propose TPP-SD, a novel approach that accelerates Transformer temporal point process (TPP) sampling by adapting speculative decoding (SD) techniques from language models. By identifying the structural similarities between thinning algorithms for TPPs and speculative decoding for language models, we develop an efficient sampling framework that leverages a smaller draft model to generate multiple candidate events, which are then verified by the larger target model in parallel. TPP-SD maintains the same output distribution as autoregressive sampling while achieving significant acceleration. Experiments on both synthetic and real datasets demonstrate that our approach produces samples from identical distributions as standard methods, but with 2-6$\times$ speedup. Our ablation studies analyze the impact of hyperparameters such as draft length and draft model size on sampling efficiency. TPP-SD bridges the gap between powerful Transformer TPP models and the practical need for rapid sequence sampling.

replace MolPIF: A Parameter Interpolation Flow Model for Molecule Generation

Authors: Yaowei Jin, Junjie Wang, Wenkai Xiang, Duanhua Cao, Dan Teng, Zhehuan Fan, Jiacheng Xiong, Xia Sheng, Chuanlong Zeng, Duo An, Mingyue Zheng, Shuangjia Zheng, Qian Shi

Abstract: Advances in deep learning for molecular generation show promise in accelerating drug discovery. Bayesian Flow Networks (BFNs) have recently shown impressive performance across diverse chemical tasks, with their success often ascribed to the paradigm of modeling in a low-variance parameter space. However, the Bayesian inference-based strategy imposes limitations on designing more flexible distribution transformation pathways, making it challenging to adapt to diverse data distributions and varied task requirements. Furthermore, the potential for simpler, more efficient parameter-space-based models is unexplored. To address this, we propose a novel Parameter Interpolation Flow model (named PIF) with detailed theoretical foundation, training, and inference procedures. We then develop MolPIF for structure-based drug design, demonstrating its superior performance across diverse metrics compared to baselines. This work validates the effectiveness of parameter-space-based generative modeling paradigm for molecules and offers new perspectives for model design.

replace Diffusion Beats Autoregressive in Data-Constrained Settings

Authors: Mihir Prabhudesai, Mengning Wu, Amir Zadeh, Katerina Fragkiadaki, Deepak Pathak

Abstract: Autoregressive (AR) models have long dominated the landscape of large language models, driving progress across a wide range of tasks. Recently, diffusion-based language models have emerged as a promising alternative, though their advantages over AR models remain underexplored. In this paper, we systematically study masked diffusion models in data-constrained settings-where training involves repeated passes over limited data-and find that they significantly outperform AR models when compute is abundant but data is scarce. Diffusion models make better use of repeated data, achieving lower validation loss and superior downstream performance. We interpret this advantage as implicit data augmentation: masked diffusion exposes the model to a diverse distribution of token orderings and prediction tasks, unlike AR's fixed left-to-right factorization. We find new scaling laws for diffusion models and derive a closed-form expression for the critical compute threshold at which diffusion begins to outperform AR. These results suggest that when data, not compute, is the bottleneck, diffusion models offer a compelling alternative to the standard AR paradigm. Our code is available at: https://diffusion-scaling.github.io.

URLs: https://diffusion-scaling.github.io.

replace A Comprehensive Review of Diffusion Models in Smart Agriculture: Progress, Applications, and Challenges

Authors: Xing Hu, Haodong Chen, Qianqian Duan, Danfeng Hong, Ruijiao Li, Huiliang Shang, Linghua Jiang, Haima Yang, Dawei Zhang

Abstract: With the global population growing and arable land resources becoming increasingly scarce,smart agriculture and precision agriculture have emerged as key directions for the future ofagricultural development.Artificial intelligence (AI) technologies, particularly deep learning models, have found widespread applications in areas such as crop monitoring and pest detection. As an emerging generative model, diffusion models have shown significant promise in tasks like agricultural image processing, data augmentation, and remote sensing. Compared to traditional generative adversarial networks (GANs), diffusion models offer superior training stability and generation quality, effectively addressing challenges such as limited agricultural data and imbalanced image samples. This paper reviews the latest advancements in the application of diffusion models in agriculture, focusing on their potential in crop pest and disease detection, remote sensing image enhancement, crop growth prediction, and agricultural resource management. Experimental results demonstrate that diffusion models significantly improve model accuracy and robustness in data augmentation, image generation, and denoising, especially in complex environments. Despite challenges related to computational efficiency and generalization capabilities, diffusion models are expected to play an increasingly important role in smart and precision agriculture as technology advances, providing substantial support for the sustainable development of global agriculture.

replace GCL-GCN: Graphormer and Contrastive Learning Enhanced Attributed Graph Clustering Network

Authors: Binxiong Li, Xu Xiang, Xue Li, Quanzhou Lou, Binyu Zhao, Yujie Liu, Huijie Tang, Benhan Yang

Abstract: Attributed graph clustering holds significant importance in modern data analysis. However, due to the complexity of graph data and the heterogeneity of node attributes, leveraging graph information for clustering remains challenging. To address this, we propose a novel deep graph clustering model, GCL-GCN, specifically designed to address the limitations of existing models in capturing local dependencies and complex structures when dealing with sparse and heterogeneous graph data. GCL-GCN introduces an innovative Graphormer module that combines centrality encoding and spatial relationships, effectively capturing both global and local information between nodes, thereby enhancing the quality of node representations. Additionally, we propose a novel contrastive learning module that significantly enhances the discriminative power of feature representations. In the pre-training phase, this module increases feature distinction through contrastive learning on the original feature matrix, ensuring more identifiable initial representations for subsequent graph convolution and clustering tasks. Extensive experimental results on six datasets demonstrate that GCL-GCN outperforms 14 advanced methods in terms of clustering quality and robustness. Specifically, on the Cora dataset, it improves ACC, NMI, and ARI by 4.94%, 13.01%, and 10.97%, respectively, compared to the primary comparison method MBN.

replace Compositional Function Networks: A High-Performance Alternative to Deep Neural Networks with Built-in Interpretability

Authors: Fang Li

Abstract: Deep Neural Networks (DNNs) deliver impressive performance but their black-box nature limits deployment in high-stakes domains requiring transparency. We introduce Compositional Function Networks (CFNs), a novel framework that builds inherently interpretable models by composing elementary mathematical functions with clear semantics. Unlike existing interpretable approaches that are limited to simple additive structures, CFNs support diverse compositional patterns -- sequential, parallel, and conditional -- enabling complex feature interactions while maintaining transparency. A key innovation is that CFNs are fully differentiable, allowing efficient training through standard gradient descent. We demonstrate CFNs' versatility across multiple domains, from symbolic regression to image classification with deep hierarchical networks. Our empirical evaluation shows CFNs achieve competitive performance against black-box models (96.24% accuracy on CIFAR-10) while outperforming state-of-the-art interpretable models like Explainable Boosting Machines. By combining the hierarchical expressiveness and efficient training of deep learning with the intrinsic interpretability of well-defined mathematical functions, CFNs offer a powerful framework for applications where both performance and accountability are paramount.

replace AdaptHetero: Machine Learning Interpretation-Driven Subgroup Adaptation for EHR-Based Clinical Prediction

Authors: Ling Liao, Eva Aagaard

Abstract: Machine learning interpretation (MLI) has primarily been leveraged to build clinician trust and uncover actionable insights in EHRs. However, the intrinsic complexity and heterogeneity of EHR data limit its effectiveness in guiding subgroup-specific modeling. We propose AdaptHetero, a novel MLI-driven framework that transforms interpretability insights into actionable guidance for tailoring model training and evaluation across subpopulations within individual hospital systems. Evaluated on three large-scale EHR datasets: GOSSIS-1-eICU, WiDS, and MIMIC-IV, AdaptHetero consistently identifies heterogeneous model behaviors in predicting ICU mortality, in-hospital death, and hidden hypoxemia. By integrating SHAP-based interpretation and unsupervised clustering, the framework enhances the identification of clinically meaningful subgroup-specific characteristics, leading to improved predictive performance and optimized clinical deployment.

replace MemShare: Memory Efficient Inference for Large Reasoning Models through KV Cache Reuse

Authors: Kaiwen Chen, Xin Tan, Minchen Yu, Hong Xu

Abstract: Large Reasoning Models (LRMs) have achieved significant advances in mathematical reasoning and formal logic tasks. However, their tendency to generate lengthy chain-of-thought sequences leads to substantial memory overhead during inference. We observe that LRMs frequently produce highly similar intermediate reasoning steps, which correspond to similar KV cache states across layers. Motivated by this observation, we propose MemShare, a novel KV cache management approach that effectively reduces memory overhead. MemShare employs a collaborative filtering algorithm to efficiently identify reusable KV cache blocks and enables zero copy cache reuse to significantly reduce memory overhead, improve throughput while maintaining accuracy. Experimental results demonstrate that MemShare delivers up to 84.79\% improvement in throughput while maintaining better accuracy compared to existing KV cache management methods.

replace Spatial-Temporal Reinforcement Learning for Network Routing with Non-Markovian Traffic

Authors: Molly Wang, Kin. K Leung

Abstract: Reinforcement Learning (RL) has been widely used for packet routing in communication networks, but traditional RL methods rely on the Markov assumption that the current state contains all necessary information for decision-making. In reality, internet traffic is non-Markovian, and past states do influence routing performance. Moreover, common deep RL approaches use function approximators, such as neural networks, that do not model the spatial structure in network topologies. To address these shortcomings, we design a network environment with non-Markovian traffic and introduce a spatial-temporal RL (STRL) framework for packet routing. Our approach outperforms traditional baselines by more than 19% during training and 7% for inference despite a change in network topology.

replace CS-SHRED: Enhancing SHRED for Robust Recovery of Spatiotemporal Dynamics

Authors: Romulo B. da Silva, Diego Passos, C\'assio M. Oishi, J. Nathan Kutz

Abstract: We present CS-SHRED, a novel deep learning architecture that integrates Compressed Sensing (CS) into a Shallow Recurrent Decoder (SHRED) to reconstruct spatiotemporal dynamics from incomplete, compressed, or corrupted data. Our approach introduces two key innovations. First, by incorporating CS techniques into the SHRED architecture, our method leverages a batch-based forward framework with $\ell_1$ regularization to robustly recover signals even in scenarios with sparse sensor placements, noisy measurements, and incomplete sensor acquisitions. Second, an adaptive loss function dynamically combines Mean Squared Error (MSE) and Mean Absolute Error (MAE) terms with a piecewise Signal-to-Noise Ratio (SNR) regularization, which suppresses noise and outliers in low-SNR regions while preserving fine-scale features in high-SNR regions. We validate CS-SHRED on challenging problems including viscoelastic fluid flows, maximum specific humidity fields, sea surface temperature distributions, and rotating turbulent flows. Compared to the traditional SHRED approach, CS-SHRED achieves significantly higher reconstruction fidelity -- as demonstrated by improved SSIM and PSNR values, lower normalized errors, and enhanced LPIPS scores-thereby providing superior preservation of small-scale structures and increased robustness against noise and outliers. Our results underscore the advantages of the jointly trained CS and SHRED design architecture which includes an LSTM sequence model for characterizing the temporal evolution with a shallow decoder network (SDN) for modeling the high-dimensional state space. The SNR-guided adaptive loss function for the spatiotemporal data recovery establishes CS-SHRED as a promising tool for a wide range of applications in environmental, climatic, and scientific data analyses.

replace H2Tune: Federated Foundation Model Fine-Tuning with Hybrid Heterogeneity

Authors: Wei Guo, Siyuan Lu, Yiqi Tong, Zhaojun Hu, Fuzhen Zhuang, Xiao Zhang, Tao Fan, Jin Dong

Abstract: Different from existing federated fine-tuning (FFT) methods for foundation models, hybrid heterogeneous federated fine-tuning (HHFFT) is an under-explored scenario where clients exhibit double heterogeneity in model architectures and downstream tasks. This hybrid heterogeneity introduces two significant challenges: 1) heterogeneous matrix aggregation, where clients adopt different large-scale foundation models based on their task requirements and resource limitations, leading to dimensional mismatches during LoRA parameter aggregation; and 2) multi-task knowledge interference, where local shared parameters, trained with both task-shared and task-specific knowledge, cannot ensure only task-shared knowledge is transferred between clients. To address these challenges, we propose H2Tune, a federated foundation model fine-tuning with hybrid heterogeneity. Our framework H2Tune consists of three key components: (i) sparsified triple matrix decomposition to align hidden dimensions across clients through constructing rank-consistent middle matrices, with adaptive sparsification based on client resources; (ii) relation-guided matrix layer alignment to handle heterogeneous layer structures and representation capabilities; and (iii) alternating task-knowledge disentanglement mechanism to decouple shared and specific knowledge of local model parameters through alternating optimization. Theoretical analysis proves a convergence rate of O(1/\sqrt{T}). Extensive experiments show our method achieves up to 15.4% accuracy improvement compared to state-of-the-art baselines. Our code is available at https://anonymous.4open.science/r/H2Tune-1407.

URLs: https://anonymous.4open.science/r/H2Tune-1407.

replace G-Core: A Simple, Scalable and Balanced RLHF Trainer

Authors: Junyu Wu, Weiming Chang, Xiaotao Liu, Guanyou He, Haoqiang Hong, Boqi Liu, Hongtao Tian, Tao Yang, Yunsheng Shi, Feng Lin, Ting Yao

Abstract: Reinforcement Learning from Human Feedback (RLHF) has become an increasingly popular paradigm for training large language models (LLMs) and diffusion models. While existing RLHF training systems have enabled significant progress, they often face challenges in scaling to multi-modal and diffusion workflows and adapting to dynamic workloads. In particular, current approaches may encounter limitations in controller scalability, flexible resource placement, and efficient orchestration when handling complex RLHF pipelines, especially in scenarios involving dynamic sampling or generative reward modeling. In this paper, we present \textbf{G-Core}, a simple, scalable, and balanced RLHF training framework designed to address these challenges. G-Core introduces a parallel controller programming model, enabling flexible and efficient orchestration of complex RLHF workflows without the bottlenecks of a single centralized controller. Furthermore, we propose a dynamic placement schema that adaptively partitions resources and schedules workloads, significantly reducing hardware idle time and improving utilization, even under highly variable training conditions. G-Core has successfully trained models that support WeChat product features serving a large-scale user base, demonstrating its effectiveness and robustness in real-world scenarios. Our results show that G-Core advances the state of the art in RLHF training, providing a solid foundation for future research and deployment of large-scale, human-aligned models.

replace-cross Divided Attention: Unsupervised Multi-Object Discovery with Contextually Separated Slots

Authors: Dong Lao, Zhengyang Hu, Francesco Locatello, Yanchao Yang, Stefano Soatto

Abstract: We investigate the emergence of objects in visual perception in the absence of any semantic annotation. The resulting model has received no supervision, does not use any pre-trained features, and yet it can segment the domain of an image into multiple independently moving regions. The resulting motion segmentation method can handle an unknown and varying number of objects in real-time. The core multi-modal conditional encoder-decoder architecture has one modality (optical flow) feed the encoder to produce a collection of latent codes (slots), and the other modality (color image) conditions the decoder to generate the first modality (flow) from the slots. The training criterion is designed to foster 'information separation' among the slots, while the architecture explicitly allocates activations to individual slots, leading to a method we call Divided Attention (DivA). At test time, DivA handles a different number of objects and different image resolution than seen at training, and is invariant to permutations of the slots. DivA achieves state-of-the-art performance while tripling the runtime speed of comparable methods, up to 104 FPS, and reduces the performance gap from supervised methods to 12% or less. Objects bootstrapped by DivA can then be used to prime static classifiers via contrastive learning. On fewer than 5,000 video clips, training DINO on DivA's object proposals narrows the performance gap to ImageNet-based training by up to 30.2% compared to training directly on the video frames.

replace-cross Weighted least-squares approximation with determinantal point processes and generalized volume sampling

Authors: Anthony Nouy, Bertrand Michel

Abstract: We consider the problem of approximating a function from $L^2$ by an element of a given $m$-dimensional space $V_m$, associated with some feature map $\boldsymbol{\varphi}$, using evaluations of the function at random points $x_1, \dots,x_n$. After recalling some results on optimal weighted least-squares using independent and identically distributed points, we consider weighted least-squares using projection determinantal point processes (DPP) or volume sampling. These distributions introduce dependence between the points that promotes diversity in the selected features $\boldsymbol{\varphi}(x_i)$. We first provide a generalized version of volume-rescaled sampling yielding quasi-optimality results in expectation with a number of samples $n = O(m\log(m))$, that means that the expected $L^2$ error is bounded by a constant times the best approximation error in $L^2$. Also, further assuming that the function is in some normed vector space $H$ continuously embedded in $L^2$, we further prove that the approximation error in $L^2$ is almost surely bounded by the best approximation error measured in the $H$-norm. This includes the cases of functions from $L^\infty$ or reproducing kernel Hilbert spaces. Finally, we present an alternative strategy consisting in using independent repetitions of projection DPP (or volume sampling), yielding similar error bounds as with i.i.d. or volume sampling, but in practice with a much lower number of samples. Numerical experiments illustrate the performance of the different strategies.

replace-cross Momentum-based gradient descent methods for Lie groups

Authors: C\'edric M. Campos, David Mart\'in de Diego, Jos\'e Torrente

Abstract: Polyak's Heavy Ball (PHB; Polyak, 1964), a.k.a. Classical Momentum, and Nesterov's Accelerated Gradient (NAG; Nesterov, 1983) are well-established momentum-descent methods for optimization. Although the latter generally outperforms the former, primarily, generalizations of PHB-like methods to nonlinear spaces have not been sufficiently explored in the literature. In this paper, we propose a generalization of NAG-like methods for Lie group optimization. This generalization is based on the variational one-to-one correspondence between classical and accelerated momentum methods (Campos et al., 2023). We provide numerical experiments for chosen retractions on the group of rotations based on the Frobenius norm and the Rosenbrock function to demonstrate the effectiveness of our proposed methods, and that align with results of the Euclidean case, that is, a faster convergence rate for NAG.

replace-cross Controlling diverse robots by inferring Jacobian fields with deep networks

Authors: Sizhe Lester Li, Annan Zhang, Boyuan Chen, Hanna Matusik, Chao Liu, Daniela Rus, Vincent Sitzmann

Abstract: Mirroring the complex structures and diverse functions of natural organisms is a long-standing challenge in robotics. Modern fabrication techniques have greatly expanded the feasible hardware, but using these systems requires control software to translate the desired motions into actuator commands. Conventional robots can easily be modeled as rigid links connected by joints, but it remains an open challenge to model and control biologically inspired robots that are often soft or made of several materials, lack sensing capabilities, and may change their material properties with use. Here, we introduce a method that uses deep neural networks to map a video stream of a robot to its visuomotor Jacobian field (the sensitivity of all 3D points to the robot's actuators). Our method enables the control of robots from only a single camera, makes no assumptions about the robots' materials, actuation, or sensing, and is trained without expert intervention by observing the execution of random commands. We demonstrate our method on a diverse set of robot manipulators that vary in actuation, materials, fabrication, and cost. Our approach achieves accurate closed-loop control and recovers the causal dynamic structure of each robot. Because it enables robot control using a generic camera as the only sensor, we anticipate that our work will broaden the design space of robotic systems and serve as a starting point for lowering the barrier to robotic automation.

replace-cross FovEx: Human-Inspired Explanations for Vision Transformers and Convolutional Neural Networks

Authors: Mahadev Prasad Panda, Matteo Tiezzi, Martina Vilas, Gemma Roig, Bjoern M. Eskofier, Dario Zanca

Abstract: Explainability in artificial intelligence (XAI) remains a crucial aspect for fostering trust and understanding in machine learning models. Current visual explanation techniques, such as gradient-based or class-activation-based methods, often exhibit a strong dependence on specific model architectures. Conversely, perturbation-based methods, despite being model-agnostic, are computationally expensive as they require evaluating models on a large number of forward passes. In this work, we introduce Foveation-based Explanations (FovEx), a novel XAI method inspired by human vision. FovEx seamlessly integrates biologically inspired perturbations by iteratively creating foveated renderings of the image and combines them with gradient-based visual explorations to determine locations of interest efficiently. These locations are selected to maximize the performance of the model to be explained with respect to the downstream task and then combined to generate an attribution map. We provide a thorough evaluation with qualitative and quantitative assessments on established benchmarks. Our method achieves state-of-the-art performance on both transformers (on 4 out of 5 metrics) and convolutional models (on 3 out of 5 metrics), demonstrating its versatility among various architectures. Furthermore, we show the alignment between the explanation map produced by FovEx and human gaze patterns (+14\% in NSS compared to RISE, +203\% in NSS compared to GradCAM). This comparison enhances our confidence in FovEx's ability to close the interpretation gap between humans and machines.

replace-cross Quantum Transfer Learning for MNIST Classification Using a Hybrid Quantum-Classical Approach

Authors: Soumyadip Sarkar

Abstract: We implement a hybrid quantum-classical model for image classification that compresses MNIST digit images into a low-dimensional feature space and then maps these features onto a 5-qubit quantum state. First, an autoencoder compresses each $28\times28$ image (784 pixels) into a 64-dimensional latent vector, preserving salient features of the digit with minimal reconstruction error. We further reduce the latent representation to 5 principal components using Principal Component Analysis (PCA), to match the 5 available qubits. These 5 features are encoded as rotation angles in a quantum circuit with 5 qubits. The quantum feature map applies single-qubit rotations ($R_y$ gates) proportional to the feature values, followed by a Hadamard gate and a cascade of entangling CNOT gates to produce a non-product entangled state. Measuring the 5-qubit state yields a 32-dimensional probability distribution over basis outcomes, which serves as a quantum-enhanced feature vector for classification. A classical neural network with a softmax output is then trained on these 32-dimensional quantum feature vectors to predict the digit class. We evaluate the hybrid model on the MNIST dataset and compare it to a purely classical baseline that uses the 64-dimensional autoencoder latent features for classification. The results show that the hybrid model can successfully classify digits, demonstrating the feasibility of integrating quantum computing in the classification pipeline, although its accuracy (about 75\% on test data) currently falls below the classical baseline (about 98\% on the same compressed data).

replace-cross Neural-ANOVA: Analytical Model Decomposition using Automatic Integration

Authors: Steffen Limmer, Steffen Udluft, Clemens Otte

Abstract: The analysis of variance (ANOVA) decomposition offers a systematic method to understand the interaction effects that contribute to a specific decision output. In this paper we introduce Neural-ANOVA, an approach to decompose neural networks into the sum of lower-order models using the functional ANOVA decomposition. Our approach formulates a learning problem, which enables fast analytical evaluation of integrals over subspaces that appear in the calculation of the ANOVA decomposition. Finally, we conduct numerical experiments to provide insights into the approximation properties compared to other regression approaches from the literature.

replace-cross Neutral Residues: Revisiting Adapters for Model Extension

Authors: Franck Signe Talla, Edouard Grave, Herv\'e J\'egou

Abstract: We address the problem of extending a pretrained large language model to a new domain that was not seen during training. Standard techniques, such as finetuning or low-rank adaptation (LoRA) are successful at domain adaptation, but do not formally add capacity to the model. This often leads to a trade-off, between performing well on the new domain vs. degrading performance on the original domain. Here, we revisit and improve adapters to extend LLMs from three angles: data, architecture and training procedure, which are advantageously considered jointly. The resulting method, called neutral residues, modifies adapters in a way that leads each new residual block to output near-zeros on the original domain. This solution leads to strong results when adapting a state-of-the-art model originally trained on English to a new language. Neutral residues significantly outperform competing approaches such as finetuning, LoRA or vanilla adapters in terms of the trade-off between learning the new language and not forgetting English.

replace-cross Entanglement-induced provable and robust quantum learning advantages

Authors: Haimeng Zhao, Dong-Ling Deng

Abstract: Quantum computing holds unparalleled potentials to enhance machine learning. However, a demonstration of quantum learning advantage has not been achieved so far. We make a step forward by rigorously establishing a noise-robust, unconditional quantum learning advantage in expressivity, inference speed, and training efficiency, compared to commonly-used classical models. Our proof is information-theoretic and pinpoints the origin of this advantage: entanglement can be used to reduce the communication required by non-local tasks. In particular, we design a task that can be solved with certainty by quantum models with a constant number of parameters using entanglement, whereas commonly-used classical models must scale linearly to achieve a larger-than-exponentially-small accuracy. We show that the quantum model is trainable with constant resources and robust against constant noise. Through numerical and trapped-ion experiments on IonQ Aria, we demonstrate the desired advantage. Our results provide valuable guidance for demonstrating quantum learning advantages with current noisy intermediate-scale devices.

replace-cross Graph Sampling for Scalable and Expressive Graph Neural Networks on Homophilic Graphs

Authors: Haolin Li, Haoyu Wang, Luana Ruiz

Abstract: Graph Neural Networks (GNNs) excel in many graph machine learning tasks but face challenges when scaling to large networks. GNN transferability allows training on smaller graphs and applying the model to larger ones, but existing methods often rely on random subsampling, leading to disconnected subgraphs and reduced model expressivity. We propose a novel graph sampling algorithm that leverages feature homophily to preserve graph structure. By minimizing the trace of the data correlation matrix, our method better preserves the graph Laplacian trace -- a proxy for the graph connectivity -- than random sampling, while achieving lower complexity than spectral methods. Experiments on citation networks show improved performance in preserving Laplacian trace and GNN transferability compared to random sampling.

replace-cross Physics-informed Gaussian Processes as Linear Model Predictive Controller

Authors: J\"orn Tebbe, Andreas Besginow, Markus Lange-Hegermann

Abstract: We introduce a novel algorithm for controlling linear time invariant systems in a tracking problem. The controller is based on a Gaussian Process (GP) whose realizations satisfy a system of linear ordinary differential equations with constant coefficients. Control inputs for tracking are determined by conditioning the prior GP on the setpoints, i.e. control as inference. The resulting Model Predictive Control scheme incorporates pointwise soft constraints by introducing virtual setpoints to the posterior Gaussian process. We show theoretically that our controller satisfies open-loop stability for the optimal control problem by leveraging general results from Bayesian inference and demonstrate this result in a numerical example.

replace-cross A Deep Learning Powered Numerical Relativity Surrogate for Binary Black Hole Waveforms

Authors: Osvaldo Gramaxo Freitas, Anastasios Theodoropoulos, Nino Villanueva, Tiago Fernandes, Solange Nunes, Jos\'e A. Font, Antonio Onofre, Alejandro Torres-Forn\'e, Jos\'e D. Martin-Guerrero

Abstract: Gravitational-wave approximants are essential for gravitational-wave astronomy, allowing the coverage binary black hole parameter space for inference or match filtering without costly numerical relativity (NR) simulations, but generally trading some accuracy for computational efficiency. To reduce this trade-off, NR surrogate models can be constructed using interpolation within NR waveform space. We present a 2-stage training approach for neural network-based NR surrogate models. Initially trained on approximant-generated waveforms and then fine-tuned with NR data, these dual-stage artificial neural surrogate (\texttt{DANSur}) models offer rapid and competitively accurate waveform generation, generating millions in under 20ms on a GPU while keeping mean mismatches with NR around $10^{-4}$. Implemented in the \textsc{bilby} framework, we show they can be used for parameter estimation tasks.

replace-cross Insights into resource utilization of code small language models serving with runtime engines and execution providers

Authors: Francisco Dur\'an, Matias Martinez, Patricia Lago, Silverio Mart\'inez-Fern\'andez

Abstract: The rapid growth of language models, particularly in code generation, requires substantial computational resources, raising concerns about energy consumption and environmental impact. Optimizing language models inference resource utilization is crucial, and Small Language Models (SLMs) offer a promising solution to reduce resource demands. Our goal is to analyze the impact of deep learning serving configurations, defined as combinations of runtime engines and execution providers, on resource utilization, in terms of energy consumption, execution time, and computing-resource utilization from the point of view of software engineers conducting inference in the context of code generation SLMs. We conducted a technology-oriented, multi-stage experimental pipeline using twelve code generation SLMs to investigate energy consumption, execution time, and computing-resource utilization across the configurations. Significant differences emerged across configurations. CUDA execution provider configurations outperformed CPU execution provider configurations in both energy consumption and execution time. Among the configurations, TORCH paired with CUDA demonstrated the greatest energy efficiency, achieving energy savings from 37.99% up to 89.16% compared to other serving configurations. Similarly, optimized runtime engines like ONNX with the CPU execution provider achieved from 8.98% up to 72.04% energy savings within CPU-based configurations. Also, TORCH paired with CUDA exhibited efficient computing-resource utilization. Serving configuration choice significantly impacts resource utilization. While further research is needed, we recommend the above configurations best suited to software engineers' requirements for enhancing serving resource utilization efficiency.

replace-cross Tensor Product Neural Networks for Functional ANOVA Model

Authors: Seokhun Park, Insung Kong, Yongchan Choi, Chanmoo Park, Yongdai Kim

Abstract: Interpretability for machine learning models is becoming more and more important as machine learning models become more complex. The functional ANOVA model, which decomposes a high-dimensional function into a sum of lower dimensional functions (commonly referred to as components), is one of the most popular tools for interpretable AI, and recently, various neural networks have been developed for estimating each component in the functional ANOVA model. However, such neural networks are highly unstable when estimating each component since the components themselves are not uniquely defined. That is, there are multiple functional ANOVA decompositions for a given function. In this paper, we propose a novel neural network which guarantees a unique functional ANOVA decomposition and thus is able to estimate each component stably and accurately. We call our proposed neural network ANOVA Tensor Product Neural Network (ANOVA-TPNN) since it is motivated by the tensor product basis expansion. Theoretically, we prove that ANOVA-TPNN can approximate any smooth function well. Empirically, we show that ANOVA-TPNN provide much more stable estimation of each component and thus much more stable interpretation when training data and initial values of the model parameters vary than existing neural networks do.

replace-cross MVCNet: Multi-View Contrastive Network for Motor Imagery Classification

Authors: Ziwei Wang, Siyang Li, Xiaoqing Chen, Dongrui Wu

Abstract: Electroencephalography (EEG)-based brain-computer interfaces (BCIs) enable neural interaction by decoding brain activity for external communication. Motor imagery (MI) decoding has received significant attention due to its intuitive mechanism. However, most existing models rely on single-stream architectures and overlook the multi-view nature of EEG signals, leading to limited performance and generalization. We propose a multi-view contrastive network (MVCNet), a dual-branch architecture that parallelly integrates CNN and Transformer blocks to capture both local spatial-temporal features and global temporal dependencies. To enhance the informativeness of training data, MVCNet incorporates a unified augmentation pipeline across time, frequency, and spatial domains. Two contrastive modules are further introduced: a cross-view contrastive module that enforces consistency of original and augmented views, and a cross-model contrastive module that aligns features extracted from both branches. Final representations are fused and jointly optimized by contrastive and classification losses. Experiments on five public MI datasets across three scenarios demonstrate that MVCNet consistently outperforms nine state-of-the-art MI decoding networks, highlighting its effectiveness and generalization ability. MVCNet provides a robust solution for MI decoding by integrating multi-view information and dual-branch modeling, contributing to the development of more reliable BCI systems.

replace-cross Lattice Protein Folding with Variational Annealing

Authors: Shoummo Ahsan Khandoker, Estelle M. Inack, Mohamed Hibat-Allah

Abstract: Understanding the principles of protein folding is a cornerstone of computational biology, with implications for drug design, bioengineering, and the understanding of fundamental biological processes. Lattice protein folding models offer a simplified yet powerful framework for studying the complexities of protein folding, enabling the exploration of energetically optimal folds under constrained conditions. However, finding these optimal folds is a computationally challenging combinatorial optimization problem. In this work, we introduce a novel upper-bound training scheme that employs masking to identify the lowest-energy folds in two-dimensional Hydrophobic-Polar (HP) lattice protein folding. By leveraging Dilated Recurrent Neural Networks (RNNs) integrated with an annealing process driven by temperature-like fluctuations, our method accurately predicts optimal folds for benchmark systems of up to 60 beads. Our approach also effectively masks invalid folds from being sampled without compromising the autoregressive sampling properties of RNNs. This scheme is generalizable to three spatial dimensions and can be extended to lattice protein models with larger alphabets. Our findings emphasize the potential of advanced machine learning techniques in tackling complex protein folding problems and a broader class of constrained combinatorial optimization challenges.

replace-cross Learning 3D Scene Analogies with Neural Contextual Scene Maps

Authors: Junho Kim, Gwangtak Bae, Eun Sun Lee, Young Min Kim

Abstract: Understanding scene contexts is crucial for machines to perform tasks and adapt prior knowledge in unseen or noisy 3D environments. As data-driven learning is intractable to comprehensively encapsulate diverse ranges of layouts and open spaces, we propose teaching machines to identify relational commonalities in 3D spaces. Instead of focusing on point-wise or object-wise representations, we introduce 3D scene analogies, which are smooth maps between 3D scene regions that align spatial relationships. Unlike well-studied single instance-level maps, these scene-level maps smoothly link large scene regions, potentially enabling unique applications in trajectory transfer in AR/VR, long demonstration transfer for imitation learning, and context-aware object rearrangement. To find 3D scene analogies, we propose neural contextual scene maps, which extract descriptor fields summarizing semantic and geometric contexts, and holistically align them in a coarse-to-fine manner for map estimation. This approach reduces reliance on individual feature points, making it robust to input noise or shape variations. Experiments demonstrate the effectiveness of our approach in identifying scene analogies and transferring trajectories or object placements in diverse indoor scenes, indicating its potential for robotics and AR/VR applications. Project page including the code is available through this link: https://82magnolia.github.io/3d_scene_analogies/.

URLs: https://82magnolia.github.io/3d_scene_analogies/.

replace-cross ModalTune: Fine-Tuning Slide-Level Foundation Models with Multi-Modal Information for Multi-task Learning in Digital Pathology

Authors: Vishwesh Ramanathan, Tony Xu, Pushpak Pati, Faruk Ahmed, Maged Goubran, Anne L. Martel

Abstract: Prediction tasks in digital pathology are challenging due to the massive size of whole-slide images (WSIs) and the weak nature of training signals. Advances in computing, data availability, and self-supervised learning (SSL) have paved the way for slide-level foundation models (SLFMs) that can improve prediction tasks in low-data regimes. However, current methods under-utilize shared information between tasks and modalities. To overcome this challenge, we propose ModalTune, a novel fine-tuning framework which introduces the Modal Adapter to integrate new modalities without modifying SLFM weights. Additionally, we use large-language models (LLMs) to encode labels as text, capturing semantic relationships across multiple tasks and cancer types in a single training recipe. ModalTune achieves state-of-the-art (SOTA) results against both uni-modal and multi-modal models across four cancer types, jointly improving survival and cancer subtype prediction while remaining competitive in pan-cancer settings. Additionally, we show ModalTune is generalizable to two out-of-distribution (OOD) datasets. To our knowledge, this is the first unified fine-tuning framework for multi-modal, multi-task, and pan-cancer modeling in digital pathology.

replace-cross Accenture-NVS1: A Novel View Synthesis Dataset

Authors: Thomas Sugg, Kyle O'Brien, Lekh Poudel, Alex Dumouchelle, Michelle Jou, Marc Bosch, Deva Ramanan, Srinivasa Narasimhan, Shubham Tulsiani

Abstract: This paper introduces ACC-NVS1, a specialized dataset designed for research on Novel View Synthesis specifically for airborne and ground imagery. Data for ACC-NVS1 was collected in Austin, TX and Pittsburgh, PA in 2023 and 2024. The collection encompasses six diverse real-world scenes captured from both airborne and ground cameras, resulting in a total of 148,000 images. ACC-NVS1 addresses challenges such as varying altitudes and transient objects. This dataset is intended to supplement existing datasets, providing additional resources for comprehensive research, rather than serving as a benchmark.

replace-cross EP-Diffuser: An Efficient Diffusion Model for Traffic Scene Generation and Prediction via Polynomial Representations

Authors: Yue Yao, Mohamed-Khalil Bouzidi, Daniel Goehring, Joerg Reichardt

Abstract: As the prediction horizon increases, predicting the future evolution of traffic scenes becomes increasingly difficult due to the multi-modal nature of agent motion. Most state-of-the-art (SotA) prediction models primarily focus on forecasting the most likely future. However, for the safe operation of autonomous vehicles, it is equally important to cover the distribution for plausible motion alternatives. To address this, we introduce EP-Diffuser, a novel parameter-efficient diffusion-based generative model designed to capture the distribution of possible traffic scene evolutions. Conditioned on road layout and agent history, our model acts as a predictor and generates diverse, plausible scene continuations. We benchmark EP-Diffuser against two SotA models in terms of accuracy and plausibility of predictions on the Argoverse 2 dataset. Despite its significantly smaller model size, our approach achieves both highly accurate and plausible traffic scene predictions. We further evaluate model generalization ability in an out-of-distribution (OoD) test setting using Waymo Open dataset and show superior robustness of our approach. The code and model checkpoints are available at: https://github.com/continental/EP-Diffuser.

URLs: https://github.com/continental/EP-Diffuser.

replace-cross Robust and Fine-Grained Detection of AI Generated Texts

Authors: Ram Mohan Rao Kadiyala, Siddartha Pullakhandam, Kanwal Mehreen, Drishti Sharma, Siddhant Gupta, Jebish Purbey, Ashay Srivastava, Subhasya TippaReddy, Arvind Reddy Bobbili, Suraj Telugara Chandrashekhar, Modabbir Adeeb, Srinadh Vura, Suman Debnath, Hamza Farooq

Abstract: An ideal detection system for machine generated content is supposed to work well on any generator as many more advanced LLMs come into existence day by day. Existing systems often struggle with accurately identifying AI-generated content over shorter texts. Further, not all texts might be entirely authored by a human or LLM, hence we focused more over partial cases i.e human-LLM co-authored texts. Our paper introduces a set of models built for the task of token classification which are trained on an extensive collection of human-machine co-authored texts, which performed well over texts of unseen domains, unseen generators, texts by non-native speakers and those with adversarial inputs. We also introduce a new dataset of over 2.4M such texts mostly co-authored by several popular proprietary LLMs over 23 languages. We also present findings of our models' performance over each texts of each domain and generator. Additional findings include comparison of performance against each adversarial method, length of input texts and characteristics of generated texts compared to the original human authored texts.

replace-cross CEE: An Inference-Time Jailbreak Defense for Embodied Intelligence via Subspace Concept Rotation

Authors: Jirui Yang, Zheyu Lin, Zhihui Lu, Yinggui Wang, Lei Wang, Tao Wei, Xin Du, Shuhan Yang

Abstract: Large Language Models (LLMs) are increasingly becoming the cognitive core of Embodied Intelligence (EI) systems, such as robots and autonomous vehicles. However, this integration also exposes them to serious jailbreak risks, where malicious instructions can be transformed into dangerous physical actions. Existing defense mechanisms suffer from notable drawbacks--including high training costs, significant inference delays, and complex hyperparameter tuning--which limit their practical applicability. To address these challenges, we propose a novel and efficient inference-time defense framework: Concept Enhancement Engineering (CEE). CEE enhances the model's inherent safety mechanisms by directly manipulating its internal representations, requiring neither additional training nor external modules, thereby improving defense efficiency. Furthermore, CEE introduces a rotation-based control mechanism that enables stable and linearly tunable behavioral control of the model. This design eliminates the need for tedious manual tuning and avoids the output degradation issues commonly observed in other representation engineering methods. Extensive experiments across multiple EI safety benchmarks and diverse attack scenarios demonstrate that CEE significantly improves the defense success rates of various multimodal LLMs. It effectively mitigates safety risks while preserving high-quality generation and inference efficiency, offering a promising solution for deploying safer embodied intelligence systems.

replace-cross Learning dynamically inspired invariant subspaces for Koopman and transfer operator approximation

Authors: Gary Froyland, Kevin K\"uhl

Abstract: Transfer and Koopman operator methods offer a framework for representing complex, nonlinear dynamical systems via linear transformations, enabling a deeper understanding of the underlying dynamics. The spectra of these operators provide important insights into system predictability and emergent behaviour, although efficiently estimating them from data can be challenging. We approach this issue through the lens of general operator and representational learning, in which we approximate these linear operators using efficient finite-dimensional representations. Specifically, we machine-learn orthonormal basis functions that are dynamically tailored to the system. This learned basis provides a particularly accurate approximation of the operator's action as well as a nearly invariant finite-dimensional subspace. We illustrate our approach with examples that showcase the retrieval of spectral properties from the estimated operator, and emphasise the dynamically adaptive quality of the machine-learned basis.

replace-cross Where Paths Collide: A Comprehensive Survey of Classic and Learning-Based Multi-Agent Pathfinding

Authors: Shiyue Wang, Haozheng Xu, Yuhan Zhang, Jingran Lin, Changhong Lu, Xiangfeng Wang, Wenhao Li

Abstract: Multi-Agent Path Finding (MAPF) is a fundamental problem in artificial intelligence and robotics, requiring the computation of collision-free paths for multiple agents navigating from their start locations to designated goals. As autonomous systems become increasingly prevalent in warehouses, urban transportation, and other complex environments, MAPF has evolved from a theoretical challenge to a critical enabler of real-world multi-robot coordination. This comprehensive survey bridges the long-standing divide between classical algorithmic approaches and emerging learning-based methods in MAPF research. We present a unified framework that encompasses search-based methods (including Conflict-Based Search, Priority-Based Search, and Large Neighborhood Search), compilation-based approaches (SAT, SMT, CSP, ASP, and MIP formulations), and data-driven techniques (reinforcement learning, supervised learning, and hybrid strategies). Through systematic analysis of experimental practices across 200+ papers, we uncover significant disparities in evaluation methodologies, with classical methods typically tested on larger-scale instances (up to 200 by 200 grids with 1000+ agents) compared to learning-based approaches (predominantly 10-100 agents). We provide a comprehensive taxonomy of evaluation metrics, environment types, and baseline selections, highlighting the need for standardized benchmarking protocols. Finally, we outline promising future directions including mixed-motive MAPF with game-theoretic considerations, language-grounded planning with large language models, and neural solver architectures that combine the rigor of classical methods with the flexibility of deep learning. This survey serves as both a comprehensive reference for researchers and a practical guide for deploying MAPF solutions in increasingly complex real-world applications.

replace-cross Identifying Super Spreaders in Multilayer Networks

Authors: Micha{\l} Czuba, Mateusz Stolarski, Adam Pir\'og, Piotr Bielak, Piotr Br\'odka

Abstract: Identifying super-spreaders can be framed as a subtask of the influence maximisation problem. It seeks to pinpoint agents within a network that, if selected as single diffusion seeds, disseminate information most effectively. Multilayer networks, a specific class of heterogeneous graphs, can capture diverse types of interactions (e.g., physical-virtual or professional-social), and thus offer a more accurate representation of complex relational structures. In this work, we introduce a novel approach to identifying super-spreaders in such networks by leveraging graph neural networks. To this end, we construct a dataset by simulating information diffusion across hundreds of networks - to the best of our knowledge, the first of its kind tailored specifically to multilayer networks. We further formulate the task as a variation of the ranking prediction problem based on a four-dimensional vector that quantifies each agent's spreading potential: (i) the number of activations; (ii) the duration of the diffusion process; (iii) the peak number of activations; and (iv) the simulation step at which this peak occurs. Our model, TopSpreadersNetwork, comprises a relationship-agnostic encoder and a custom aggregation layer. This design enables generalisation to previously unseen data and adapts to varying graph sizes. In an extensive evaluation, we compare our model against classic centrality-based heuristics and competitive deep learning methods. The results, obtained across a broad spectrum of real-world and synthetic multilayer networks, demonstrate that TopSpreadersNetwork achieves superior performance in identifying high-impact nodes, while also offering improved interpretability through its structured output.

replace-cross EaqVLA: Encoding-aligned Quantization for Vision-Language-Action Models

Authors: Feng Jiang, Zihao Zheng, Xiuping Cui, Maoliang Li, JIayu Chen, Xiang Chen

Abstract: With the development of Embodied Artificial intelligence, the end-to-end control policy such as Vision-Language-Action (VLA) model has become the mainstream. Existing VLA models faces expensive computing/storage cost, which need to be optimized. Quantization is considered as the most effective method which can not only reduce the memory cost but also achieve computation acceleration. However, we find the token alignment of VLA models hinders the application of existing quantization methods. To address this, we proposed an optimized framework called EaqVLA, which apply encoding-aligned quantization to VLA models. Specifically, we propose an complete analysis method to find the misalignment in various granularity. Based on the analysis results, we propose a mixed precision quantization with the awareness of encoding alignment. Experiments shows that the porposed EaqVLA achieves better quantization performance (with the minimal quantization loss for end-to-end action control and xxx times acceleration) than existing quantization methods.

replace-cross HER2 Expression Prediction with Flexible Multi-Modal Inputs via Dynamic Bidirectional Reconstruction

Authors: Jie Qin, Wei Yang, Yan Su, Yiran Zhu, Weizhen Li, Yunyue Pan, Chengchang Pan, Honggang Qi

Abstract: In breast cancer HER2 assessment, clinical evaluation relies on combined H&E and IHC images, yet acquiring both modalities is often hindered by clinical constraints and cost. We propose an adaptive bimodal prediction framework that flexibly supports single- or dual-modality inputs through two core innovations: a dynamic branch selector activating modality completion or joint inference based on input availability, and a cross-modal GAN (CM-GAN) enabling feature-space reconstruction of missing modalities. This design dramatically improves H&E-only accuracy from 71.44% to 94.25%, achieves 95.09% with full dual-modality inputs, and maintains 90.28% reliability under single-modality conditions. The "dual-modality preferred, single-modality compatible" architecture delivers near-dual-modality accuracy without mandatory synchronized acquisition, offering a cost-effective solution for resource-limited regions and significantly improving HER2 assessment accessibility.

replace-cross Tile and Slide : A New Framework for Scaling NeRF from Local to Global 3D Earth Observation

Authors: Camille Billouard, Dawa Derksen, Alexandre Constantin, Bruno Vallet

Abstract: Neural Radiance Fields (NeRF) have recently emerged as a paradigm for 3D reconstruction from multiview satellite imagery. However, state-of-the-art NeRF methods are typically constrained to small scenes due to the memory footprint during training, which we study in this paper. Previous work on large-scale NeRFs palliate this by dividing the scene into NeRFs. This paper introduces Snake-NeRF, a framework that scales to large scenes. Our out-of-core method eliminates the need to load all images and networks simultaneously, and operates on a single device. We achieve this by dividing the region of interest into NeRFs that 3D tile without overlap. Importantly, we crop the images with overlap to ensure each NeRFs is trained with all the necessary pixels. We introduce a novel $2\times 2$ 3D tile progression strategy and segmented sampler, which together prevent 3D reconstruction errors along the tile edges. Our experiments conclude that large satellite images can effectively be processed with linear time complexity, on a single GPU, and without compromise in quality.

replace-cross AI Should Sense Better, Not Just Scale Bigger: Adaptive Sensing as a Paradigm Shift

Authors: Eunsu Baek, Keondo Park, Jeonggil Ko, Min-hwan Oh, Taesik Gong, Hyung-Sin Kim

Abstract: Current AI advances largely rely on scaling neural models and expanding training datasets to achieve generalization and robustness. Despite notable successes, this paradigm incurs significant environmental, economic, and ethical costs, limiting sustainability and equitable access. Inspired by biological sensory systems, where adaptation occurs dynamically at the input (e.g., adjusting pupil size, refocusing vision)--we advocate for adaptive sensing as a necessary and foundational shift. Adaptive sensing proactively modulates sensor parameters (e.g., exposure, sensitivity, multimodal configurations) at the input level, significantly mitigating covariate shifts and improving efficiency. Empirical evidence from recent studies demonstrates that adaptive sensing enables small models (e.g., EfficientNet-B0) to surpass substantially larger models (e.g., OpenCLIP-H) trained with significantly more data and compute. We (i) outline a roadmap for broadly integrating adaptive sensing into real-world applications spanning humanoid, healthcare, autonomous systems, agriculture, and environmental monitoring, (ii) critically assess technical and ethical integration challenges, and (iii) propose targeted research directions, such as standardized benchmarks, real-time adaptive algorithms, multimodal integration, and privacy-preserving methods. Collectively, these efforts aim to transition the AI community toward sustainable, robust, and equitable artificial intelligence systems.

replace-cross LaViPlan : Language-Guided Visual Path Planning with RLVR

Authors: Hayeon Oh

Abstract: Out-of-distribution (OOD) scenarios in autonomous driving refer to situations that deviate from the training domain, often leading to unexpected and potentially hazardous behavior from planners that lack prior exposure to such cases. Recently, Vision-Language Models (VLMs) have been introduced into autonomous driving research for their promising generalization capabilities in OOD settings. Early studies demonstrated that VLMs could recognize OOD scenarios and generate user-level decisions such as "go straight" or "turn right." However, a new challenge has emerged due to the misalignment between the VLM's high-level decisions or visual reasoning expressed in language, and the low-level predicted trajectories interpreted as actions. In this paper, we propose LaViPlan, a framework that leverages Reinforcement Learning with Verifiable Rewards (RLVR) to optimize VLMs using planning-oriented metrics. This approach addresses the vision-language-action misalignment observed in existing VLMs fine-tuned via supervised learning, which can recognize driving scenarios but often produce context-unaware decisions. Experimental results demonstrate that our method improves situational awareness and decision-making under OOD conditions, highlighting its potential to mitigate the misalignment issue. This work introduces a promising post-training paradigm for VLM agents in the context of autonomous driving.

replace-cross Advancing Vision-based Human Action Recognition: Exploring Vision-Language CLIP Model for Generalisation in Domain-Independent Tasks

Authors: Utkarsh Shandilya, Marsha Mariya Kappan, Sanyam Jain, Vijeta Sharma

Abstract: Human action recognition plays a critical role in healthcare and medicine, supporting applications such as patient behavior monitoring, fall detection, surgical robot supervision, and procedural skill assessment. While traditional models like CNNs and RNNs have achieved moderate success, they often struggle to generalize across diverse and complex actions. Recent advancements in vision-language models, especially the transformer-based CLIP model, offer promising capabilities for generalizing action recognition from video data. In this work, we evaluate CLIP on the UCF-101 dataset and systematically analyze its performance under three masking strategies: (1) percentage-based and shape-based black masking at 10%, 30%, and 50%, (2) feature-specific masking to suppress bias-inducing elements, and (3) isolation masking that retains only class-specific regions. Our results reveal that CLIP exhibits inconsistent behavior and frequent misclassifications, particularly when essential visual cues are obscured. To overcome these limitations, we propose incorporating class-specific noise, learned via a custom loss function, to reinforce attention to class-defining features. This enhancement improves classification accuracy and model confidence while reducing bias. We conclude with a discussion on the challenges of applying such models in clinical domains and outline directions for future work to improve generalizability across domain-independent healthcare scenarios.

replace-cross PurpCode: Reasoning for Safer Code Generation

Authors: Jiawei Liu, Nirav Diwan, Zhe Wang, Haoyu Zhai, Xiaona Zhou, Kiet A. Nguyen, Tianjiao Yu, Muntasir Wahed, Yinlin Deng, Hadjer Benkraouda, Yuxiang Wei, Lingming Zhang, Ismini Lourentzou, Gang Wang

Abstract: We introduce PurpCode, the first post-training recipe for training safe code reasoning models towards generating secure code and defending against malicious cyberactivities. PurpCode trains a reasoning model in two stages: (i) Rule Learning, which explicitly teaches the model to reference cybersafety rules to generate vulnerability-free code and to avoid facilitating malicious cyberactivities; and (ii) Reinforcement Learning, which optimizes model safety and preserves model utility through diverse, multi-objective reward mechanisms. To empower the training pipelines with comprehensive cybersafety data, we conduct internal red-teaming to synthesize comprehensive and high-coverage prompts based on real-world tasks for inducing unsafe cyberactivities in the model. Based on PurpCode, we develop a reasoning-based coding model, namely PurpCode-32B, which demonstrates state-of-the-art cybersafety, outperforming various frontier models. Meanwhile, our alignment method decreases the model overrefusal rates in both general and cybersafety-specific scenarios, while preserving model utility in both code generation and common security knowledge.

replace-cross SmartPNT-MSF: A Multi-Sensor Fusion Dataset for Positioning and Navigation Research

Authors: Feng Zhu, Zihang Zhang, Kangcheng Teng, Abduhelil Yakup, Xiaohong Zhang

Abstract: High-precision navigation and positioning systems are critical for applications in autonomous vehicles and mobile mapping, where robust and continuous localization is essential. To test and enhance the performance of algorithms, some research institutions and companies have successively constructed and publicly released datasets. However, existing datasets still suffer from limitations in sensor diversity and environmental coverage. To address these shortcomings and advance development in related fields, the SmartPNT Multisource Integrated Navigation, Positioning, and Attitude Dataset has been developed. This dataset integrates data from multiple sensors, including Global Navigation Satellite Systems (GNSS), Inertial Measurement Units (IMU), optical cameras, and LiDAR, to provide a rich and versatile resource for research in multi-sensor fusion and high-precision navigation. The dataset construction process is thoroughly documented, encompassing sensor configurations, coordinate system definitions, and calibration procedures for both cameras and LiDAR. A standardized framework for data collection and processing ensures consistency and scalability, enabling large-scale analysis. Validation using state-of-the-art Simultaneous Localization and Mapping (SLAM) algorithms, such as VINS-Mono and LIO-SAM, demonstrates the dataset's applicability for advanced navigation research. Covering a wide range of real-world scenarios, including urban areas, campuses, tunnels, and suburban environments, the dataset offers a valuable tool for advancing navigation technologies and addressing challenges in complex environments. By providing a publicly accessible, high-quality dataset, this work aims to bridge gaps in sensor diversity, data accessibility, and environmental representation, fostering further innovation in the field.

replace-cross TokenBlowUp: Resolving Representational Singularities in LLM Token Spaces via Monoidal Transformations

Authors: Dongfang Zhao

Abstract: Recent work has provided compelling evidence challenging the foundational manifold hypothesis for the token embedding spaces of Large Language Models (LLMs). These findings reveal the presence of geometric singularities around polysemous tokens, which can lead to representational instability. Existing methodologies, which presuppose a smooth data manifold, are ill-equipped to address such intrinsic structural flaws. In this paper, we formalize this problem in the language of scheme theory and propose a rigorous resolution by applying the scheme-theoretic blow-up at each singular point. This procedure replaces a singular point in the ambient affine scheme with its exceptional divisor, which we identify as a canonical geometric space -- a projective space of directions -- that houses the disambiguated semantic meanings of the token. This process of ``representational desingularization'' constructs a new geometric landscape for embeddings. We prove a formal theorem guaranteeing the geometric regularization of this new space, showing that the original pathologies are resolved. Finally, we outline the architectural implications of our framework, arguing for a paradigm shift from static look-ups to dynamic, geometrically-grounded computation.

replace-cross GenoMAS: A Multi-Agent Framework for Scientific Discovery via Code-Driven Gene Expression Analysis

Authors: Haoyang Liu, Yijiang Li, Haohan Wang

Abstract: Gene expression analysis holds the key to many biomedical discoveries, yet extracting insights from raw transcriptomic data remains formidable due to the complexity of multiple large, semi-structured files and the need for extensive domain expertise. Current automation approaches are often limited by either inflexible workflows that break down in edge cases or by fully autonomous agents that lack the necessary precision for rigorous scientific inquiry. GenoMAS charts a different course by presenting a team of LLM-based scientists that integrates the reliability of structured workflows with the adaptability of autonomous agents. GenoMAS orchestrates six specialized LLM agents through typed message-passing protocols, each contributing complementary strengths to a shared analytic canvas. At the heart of GenoMAS lies a guided-planning framework: programming agents unfold high-level task guidelines into Action Units and, at each juncture, elect to advance, revise, bypass, or backtrack, thereby maintaining logical coherence while bending gracefully to the idiosyncrasies of genomic data. On the GenoTEX benchmark, GenoMAS reaches a Composite Similarity Correlation of 89.13% for data preprocessing and an F$_1$ of 60.48% for gene identification, surpassing the best prior art by 10.61% and 16.85% respectively. Beyond metrics, GenoMAS surfaces biologically plausible gene-phenotype associations corroborated by the literature, all while adjusting for latent confounders. Code is available at https://github.com/Liu-Hy/GenoMAS.

URLs: https://github.com/Liu-Hy/GenoMAS.

replace-cross Efficient Pain Recognition via Respiration Signals: A Single Cross-Attention Transformer Multi-Window Fusion Pipeline

Authors: Stefanos Gkikas, Ioannis Kyprakis, Manolis Tsiknakis

Abstract: Pain is a complex condition affecting a large portion of the population. Accurate and consistent evaluation is essential for individuals experiencing pain, and it supports the development of effective and advanced management strategies. Automatic pain assessment systems provide continuous monitoring and support clinical decision-making, aiming to reduce distress and prevent functional decline. This study has been submitted to the \textit{Second Multimodal Sensing Grand Challenge for Next-Gen Pain Assessment (AI4PAIN)}. The proposed method introduces a pipeline that leverages respiration as the input signal and incorporates a highly efficient cross-attention transformer alongside a multi-windowing strategy. Extensive experiments demonstrate that respiration is a valuable physiological modality for pain assessment. Moreover, experiments revealed that compact and efficient models, when properly optimized, can achieve strong performance, often surpassing larger counterparts. The proposed multi-window approach effectively captures both short-term and long-term features, as well as global characteristics, thereby enhancing the model's representational capacity.

replace-cross Unveiling the Influence of Amplifying Language-Specific Neurons

Authors: Inaya Rahmanisa, Lyzander Marciano Andrylie, Mahardika Krisna Ihsani, Alfan Farizki Wicaksono, Haryo Akbarianto Wibowo, Alham Fikri Aji

Abstract: Language-specific neurons in LLMs that strongly correlate with individual languages have been shown to influence model behavior by deactivating them. However, their role in amplification remains underexplored. This work investigates the effect of amplifying language-specific neurons through interventions across 18 languages, including low-resource ones, using three models primarily trained in different languages. We compare amplification factors by their effectiveness in steering to the target language using a proposed Language Steering Shift (LSS) evaluation score, then evaluate it on downstream tasks: commonsense reasoning (XCOPA, XWinograd), knowledge (Include), and translation (FLORES). The optimal amplification factors effectively steer output toward nearly all tested languages. Intervention using this factor on downstream tasks improves self-language performance in some cases but generally degrades cross-language results. These findings highlight the effect of language-specific neurons in multilingual behavior, where amplification can be beneficial especially for low-resource languages, but provides limited advantage for cross-lingual transfer.

replace-cross Enhancing Multi-Agent Collaboration with Attention-Based Actor-Critic Policies

Authors: Hugo Garrido-Lestache, Jeremy Kedziora

Abstract: This paper introduces Team-Attention-Actor-Critic (TAAC), a reinforcement learning algorithm designed to enhance multi-agent collaboration in cooperative environments. TAAC employs a Centralized Training/Centralized Execution scheme incorporating multi-headed attention mechanisms in both the actor and critic. This design facilitates dynamic, inter-agent communication, allowing agents to explicitly query teammates, thereby efficiently managing the exponential growth of joint-action spaces while ensuring a high degree of collaboration. We further introduce a penalized loss function which promotes diverse yet complementary roles among agents. We evaluate TAAC in a simulated soccer environment against benchmark algorithms representing other multi-agent paradigms, including Proximal Policy Optimization and Multi-Agent Actor-Attention-Critic. We find that TAAC exhibits superior performance and enhanced collaborative behaviors across a variety of metrics (win rates, goal differentials, Elo ratings, inter-agent connectivity, balanced spatial distributions, and frequent tactical interactions such as ball possession swaps).