new Revisiting Deepfake Detection: Chronological Continual Learning and the Limits of Generalization

Authors: Federico Fontana, Anxhelo Diko, Romeo Lanzino, Marco Raoul Marini, Bachir Kaddar, Gian Luca Foresti, Luigi Cinque

Abstract: The rapid evolution of deepfake generation technologies poses critical challenges for detection systems, as non-continual learning methods demand frequent and expensive retraining. We reframe deepfake detection (DFD) as a Continual Learning (CL) problem, proposing an efficient framework that incrementally adapts to emerging visual manipulation techniques while retaining knowledge of past generators. Our framework, unlike prior approaches that rely on unreal simulation sequences, simulates the real-world chronological evolution of deepfake technologies in extended periods across 7 years. Simultaneously, our framework builds upon lightweight visual backbones to allow for the real-time performance of DFD systems. Additionally, we contribute two novel metrics: Continual AUC (C-AUC) for historical performance and Forward Transfer AUC (FWT-AUC) for future generalization. Through extensive experimentation (over 600 simulations), we empirically demonstrate that while efficient adaptation (+155 times faster than full retraining) and robust retention of historical knowledge is possible, the generalization of current approaches to future generators without additional training remains near-random (FWT-AUC $\approx$ 0.5) due to the unique imprint characterizing each existing generator. Such observations are the foundation of our newly proposed Non-Universal Deepfake Distribution Hypothesis. \textbf{Code will be released upon acceptance.}

new How Far Are We from True Unlearnability?

Authors: Kai Ye, Liangcai Su, Chenxiong Qian

Abstract: High-quality data plays an indispensable role in the era of large models, but the use of unauthorized data for model training greatly damages the interests of data owners. To overcome this threat, several unlearnable methods have been proposed, which generate unlearnable examples (UEs) by compromising the training availability of data. Clearly, due to unknown training purposes and the powerful representation learning capabilities of existing models, these data are expected to be unlearnable for models across multiple tasks, i.e., they will not help improve the model's performance. However, unexpectedly, we find that on the multi-task dataset Taskonomy, UEs still perform well in tasks such as semantic segmentation, failing to exhibit cross-task unlearnability. This phenomenon leads us to question: How far are we from attaining truly unlearnable examples? We attempt to answer this question from the perspective of model optimization. To this end, we observe the difference in the convergence process between clean and poisoned models using a simple model architecture. Subsequently, from the loss landscape we find that only a part of the critical parameter optimization paths show significant differences, implying a close relationship between the loss landscape and unlearnability. Consequently, we employ the loss landscape to explain the underlying reasons for UEs and propose Sharpness-Aware Learnability (SAL) to quantify the unlearnability of parameters based on this explanation. Furthermore, we propose an Unlearnable Distance (UD) to measure the unlearnability of data based on the SAL distribution of parameters in clean and poisoned models. Finally, we conduct benchmark tests on mainstream unlearnable methods using the proposed UD, aiming to promote community awareness of the capability boundaries of existing unlearnable methods.

new JEL: A Novel Model Linking Knowledge Graph entities to News Mentions

Authors: Michael Kishelev, Pranab Bhadani, Wanying Ding, Vinay Chaudhri

Abstract: We present JEL, a novel computationally efficient end-to-end multi-neural network based entity linking model, which beats current state-of-art model. Knowledge Graphs have emerged as a compelling abstraction for capturing critical relationships among the entities of interest and integrating data from multiple heterogeneous sources. A core problem in leveraging a knowledge graph is linking its entities to the mentions (e.g., people, company names) that are encountered in textual sources (e.g., news, blogs., etc) correctly, since there are thousands of entities to consider for each mention. This task of linking mentions and entities is referred as Entity Linking (EL). It is a fundamental task in natural language processing and is beneficial in various uses cases, such as building a New Analytics platform. News Analytics, in JPMorgan, is an essential task that benefits multiple groups across the firm. According to a survey conducted by the Innovation Digital team 1 , around 25 teams across the firm are actively looking for news analytics solutions, and more than \$2 million is being spent annually on external vendor costs. Entity linking is critical for bridging unstructured news text with knowledge graphs, enabling users access to vast amounts of curated data in a knowledge graph and dramatically facilitating their daily work.

new Performance Assessment Strategies for Generative AI Applications in Healthcare

Authors: Victor Garcia, Mariia Sidulova, Aldo Badano

Abstract: Generative artificial intelligence (GenAI) represent an emerging paradigm within artificial intelligence, with applications throughout the medical enterprise. Assessing GenAI applications necessitates a comprehensive understanding of the clinical task and awareness of the variability in performance when implemented in actual clinical environments. Presently, a prevalent method for evaluating the performance of generative models relies on quantitative benchmarks. Such benchmarks have limitations and may suffer from train-to-the-test overfitting, optimizing performance for a specified test set at the cost of generalizability across other task and data distributions. Evaluation strategies leveraging human expertise and utilizing cost-effective computational models as evaluators are gaining interest. We discuss current state-of-the-art methodologies for assessing the performance of GenAI applications in healthcare and medical devices.

new Hammer and Anvil: A Principled Defense Against Backdoors in Federated Learning

Authors: Lucas Fenaux, Zheng Wang, Jacob Yan, Nathan Chung, Florian Kerschbaum

Abstract: Federated Learning is a distributed learning technique in which multiple clients cooperate to train a machine learning model. Distributed settings facilitate backdoor attacks by malicious clients, who can embed malicious behaviors into the model during their participation in the training process. These malicious behaviors are activated during inference by a specific trigger. No defense against backdoor attacks has stood the test of time, especially against adaptive attackers, a powerful but not fully explored category of attackers. In this work, we first devise a new adaptive adversary that surpasses existing adversaries in capabilities, yielding attacks that only require one or two malicious clients out of 20 to break existing state-of-the-art defenses. Then, we present Hammer and Anvil, a principled defense approach that combines two defenses orthogonal in their underlying principle to produce a combined defense that, given the right set of parameters, must succeed against any attack. We show that our best combined defense, Krum+, is successful against our new adaptive adversary and state-of-the-art attacks.

new Domain Knowledge is Power: Leveraging Physiological Priors for Self Supervised Representation Learning in Electrocardiography

Authors: Nooshin Maghsoodi, Sarah Nassar, Paul F R Wilson, Minh Nguyen Nhat To, Sophia Mannina, Shamel Addas, Stephanie Sibley, David Maslove, Purang Abolmaesumi, Parvin Mousavi

Abstract: Objective: Electrocardiograms (ECGs) play a crucial role in diagnosing heart conditions; however, the effectiveness of artificial intelligence (AI)-based ECG analysis is often hindered by the limited availability of labeled data. Self-supervised learning (SSL) can address this by leveraging large-scale unlabeled data. We introduce PhysioCLR (Physiology-aware Contrastive Learning Representation for ECG), a physiology-aware contrastive learning framework that incorporates domain-specific priors to enhance the generalizability and clinical relevance of ECG-based arrhythmia classification. Methods: During pretraining, PhysioCLR learns to bring together embeddings of samples that share similar clinically relevant features while pushing apart those that are dissimilar. Unlike existing methods, our method integrates ECG physiological similarity cues into contrastive learning, promoting the learning of clinically meaningful representations. Additionally, we introduce ECG- specific augmentations that preserve the ECG category post augmentation and propose a hybrid loss function to further refine the quality of learned representations. Results: We evaluate PhysioCLR on two public ECG datasets, Chapman and Georgia, for multilabel ECG diagnoses, as well as a private ICU dataset labeled for binary classification. Across the Chapman, Georgia, and private cohorts, PhysioCLR boosts the mean AUROC by 12% relative to the strongest baseline, underscoring its robust cross-dataset generalization. Conclusion: By embedding physiological knowledge into contrastive learning, PhysioCLR enables the model to learn clinically meaningful and transferable ECG eatures. Significance: PhysioCLR demonstrates the potential of physiology-informed SSL to offer a promising path toward more effective and label-efficient ECG diagnostics.

new Optimization Methods and Software for Federated Learning

Authors: Konstantin Burlachenko

Abstract: Federated Learning (FL) is a novel, multidisciplinary Machine Learning paradigm where multiple clients, such as mobile devices, collaborate to solve machine learning problems. Initially introduced in Kone{\v{c}}n{\'y} et al. (2016a,b); McMahan et al. (2017), FL has gained further attention through its inclusion in the National AI Research and Development Strategic Plan (2023 Update) of the United States (Science and on Artificial Intelligence, 2023). The FL training process is inherently decentralized and often takes place in less controlled settings compared to data centers, posing unique challenges distinct from those in fully controlled environments. In this thesis, we identify five key challenges in Federated Learning and propose novel approaches to address them. These challenges arise from the heterogeneity of data and devices, communication issues, and privacy concerns for clients in FL training. Moreover, even well-established theoretical advances in FL require diverse forms of practical implementation to enhance their real-world applicability. Our contributions advance FL algorithms and systems, bridging theoretical advancements and practical implementations. More broadly, our work serves as a guide for researchers navigating the complexities of translating theoretical methods into efficient real-world implementations and software. Additionally, it offers insights into the reverse process of adapting practical implementation aspects back into theoretical algorithm design. This reverse process is particularly intriguing, as the practical perspective compels us to examine the underlying mechanics and flexibilities of algorithms more deeply, often uncovering new dimensions of the algorithms under study.

new In-Context Learning Enhanced Credibility Transformer

Authors: Kishan Padayachy, Ronald Richman, Salvatore Scognamiglio, Mario V. W\"uthrich

Abstract: The starting point of our network architecture is the Credibility Transformer which extends the classical Transformer architecture by a credibility mechanism to improve model learning and predictive performance. This Credibility Transformer learns credibilitized CLS tokens that serve as learned representations of the original input features. In this paper we present a new paradigm that augments this architecture by an in-context learning mechanism, i.e., we increase the information set by a context batch consisting of similar instances. This allows the model to enhance the CLS token representations of the instances by additional in-context information and fine-tuning. We empirically verify that this in-context learning enhances predictive accuracy by adapting to similar risk patterns. Moreover, this in-context learning also allows the model to generalize to new instances which, e.g., have feature levels in the categorical covariates that have not been present when the model was trained -- for a relevant example, think of a new vehicle model which has just been developed by a car manufacturer.

new torchmil: A PyTorch-based library for deep Multiple Instance Learning

Authors: Francisco M. Castro-Mac\'ias, Francisco J. S\'aez-Maldonado, Pablo Morales-\'Alvarez, Rafael Molina

Abstract: Multiple Instance Learning (MIL) is a powerful framework for weakly supervised learning, particularly useful when fine-grained annotations are unavailable. Despite growing interest in deep MIL methods, the field lacks standardized tools for model development, evaluation, and comparison, which hinders reproducibility and accessibility. To address this, we present torchmil, an open-source Python library built on PyTorch. torchmil offers a unified, modular, and extensible framework, featuring basic building blocks for MIL models, a standardized data format, and a curated collection of benchmark datasets and models. The library includes comprehensive documentation and tutorials to support both practitioners and researchers. torchmil aims to accelerate progress in MIL and lower the entry barrier for new users. Available at https://torchmil.readthedocs.io.

URLs: https://torchmil.readthedocs.io.

new From Limited Data to Rare-event Prediction: LLM-powered Feature Engineering and Multi-model Learning in Venture Capital

Authors: Mihir Kumar, Aaron Ontoyin Yin, Zakari Salifu, Kelvin Amoaba, Afriyie Kwesi Samuel, Fuat Alican, Yigit Ihlamur

Abstract: This paper presents a framework for predicting rare, high-impact outcomes by integrating large language models (LLMs) with a multi-model machine learning (ML) architecture. The approach combines the predictive strength of black-box models with the interpretability required for reliable decision-making. We use LLM-powered feature engineering to extract and synthesize complex signals from unstructured data, which are then processed within a layered ensemble of models including XGBoost, Random Forest, and Linear Regression. The ensemble first produces a continuous estimate of success likelihood, which is then thresholded to produce a binary rare-event prediction. We apply this framework to the domain of Venture Capital (VC), where investors must evaluate startups with limited and noisy early-stage data. The empirical results show strong performance: the model achieves precision between 9.8X and 11.1X the random classifier baseline in three independent test subsets. Feature sensitivity analysis further reveals interpretable success drivers: the startup's category list accounts for 15.6% of predictive influence, followed by the number of founders, while education level and domain expertise contribute smaller yet consistent effects.

new MMM-fair: An Interactive Toolkit for Exploring and Operationalizing Multi-Fairness Trade-offs

Authors: Swati Swati, Arjun Roy, Emmanouil Panagiotou, Eirini Ntoutsi

Abstract: Fairness-aware classification requires balancing performance and fairness, often intensified by intersectional biases. Conflicting fairness definitions further complicate the task, making it difficult to identify universally fair solutions. Despite growing regulatory and societal demands for equitable AI, popular toolkits offer limited support for exploring multi-dimensional fairness and related trade-offs. To address this, we present mmm-fair, an open-source toolkit leveraging boosting-based ensemble approaches that dynamically optimizes model weights to jointly minimize classification errors and diverse fairness violations, enabling flexible multi-objective optimization. The system empowers users to deploy models that align with their context-specific needs while reliably uncovering intersectional biases often missed by state-of-the-art methods. In a nutshell, mmm-fair uniquely combines in-depth multi-attribute fairness, multi-objective optimization, a no-code, chat-based interface, LLM-powered explanations, interactive Pareto exploration for model selection, custom fairness constraint definition, and deployment-ready models in a single open-source toolkit, a combination rarely found in existing fairness tools. Demo walkthrough available at: https://youtu.be/_rcpjlXFqkw.

URLs: https://youtu.be/_rcpjlXFqkw.

new Machine Learning with Multitype Protected Attributes: Intersectional Fairness through Regularisation

Authors: Ho Ming Lee, Katrien Antonio, Benjamin Avanzi, Lorenzo Marchi, Rui Zhou

Abstract: Ensuring equitable treatment (fairness) across protected attributes (such as gender or ethnicity) is a critical issue in machine learning. Most existing literature focuses on binary classification, but achieving fairness in regression tasks-such as insurance pricing or hiring score assessments-is equally important. Moreover, anti-discrimination laws also apply to continuous attributes, such as age, for which many existing methods are not applicable. In practice, multiple protected attributes can exist simultaneously; however, methods targeting fairness across several attributes often overlook so-called "fairness gerrymandering", thereby ignoring disparities among intersectional subgroups (e.g., African-American women or Hispanic men). In this paper, we propose a distance covariance regularisation framework that mitigates the association between model predictions and protected attributes, in line with the fairness definition of demographic parity, and that captures both linear and nonlinear dependencies. To enhance applicability in the presence of multiple protected attributes, we extend our framework by incorporating two multivariate dependence measures based on distance covariance: the previously proposed joint distance covariance (JdCov) and our novel concatenated distance covariance (CCdCov), which effectively address fairness gerrymandering in both regression and classification tasks involving protected attributes of various types. We discuss and illustrate how to calibrate regularisation strength, including a method based on Jensen-Shannon divergence, which quantifies dissimilarities in prediction distributions across groups. We apply our framework to the COMPAS recidivism dataset and a large motor insurance claims dataset.

new MARLINE: Multi-Source Mapping Transfer Learning for Non-Stationary Environments

Authors: Honghui Du, Leandro Minku, Huiyu Zhou

Abstract: Concept drift is a major problem in online learning due to its impact on the predictive performance of data stream mining systems. Recent studies have started exploring data streams from different sources as a strategy to tackle concept drift in a given target domain. These approaches make the assumption that at least one of the source models represents a concept similar to the target concept, which may not hold in many real-world scenarios. In this paper, we propose a novel approach called Multi-source mApping with tRansfer LearnIng for Non-stationary Environments (MARLINE). MARLINE can benefit from knowledge from multiple data sources in non-stationary environments even when source and target concepts do not match. This is achieved by projecting the target concept to the space of each source concept, enabling multiple source sub-classifiers to contribute towards the prediction of the target concept as part of an ensemble. Experiments on several synthetic and real-world datasets show that MARLINE was more accurate than several state-of-the-art data stream learning approaches.

new The Domain Mixed Unit: A New Neural Arithmetic Layer

Authors: Paul Curry

Abstract: The Domain Mixed Unit (DMU) is a new neural arithmetic unit that learns a single parameter gate that mixes between log-space and linear-space representations while performing either addition (DMU add) or subtraction (DMU sub). Two initializations are proposed for the DMU: one covering addition and multiplication, and another covering subtraction and division. The DMU achieves state-of-the-art performance on the NALM Benchmark, a dataset designed to test the ability of neural arithmetic units to generalize arithmetic operations, specifically performing with the highest percentage solved over all seeds on multiplication and division. The DMU will be submitted as a pull request to the open-source NALM benchmark, and its code is available on GitHub at https://github.com/marict?tab=repositories

URLs: https://github.com/marict?tab=repositories

new Multi-Label Transfer Learning in Non-Stationary Data Streams

Authors: Honghui Du, Leandro Minku, Aonghus Lawlor, Huiyu Zhou

Abstract: Label concepts in multi-label data streams often experience drift in non-stationary environments, either independently or in relation to other labels. Transferring knowledge between related labels can accelerate adaptation, yet research on multi-label transfer learning for data streams remains limited. To address this, we propose two novel transfer learning methods: BR-MARLENE leverages knowledge from different labels in both source and target streams for multi-label classification; BRPW-MARLENE builds on this by explicitly modelling and transferring pairwise label dependencies to enhance learning performance. Comprehensive experiments show that both methods outperform state-of-the-art multi-label stream approaches in non-stationary environments, demonstrating the effectiveness of inter-label knowledge transfer for improved predictive performance.

new Selective Induction Heads: How Transformers Select Causal Structures In Context

Authors: Francesco D'Angelo, Francesco Croce, Nicolas Flammarion

Abstract: Transformers have exhibited exceptional capabilities in sequence modeling tasks, leveraging self-attention and in-context learning. Critical to this success are induction heads, attention circuits that enable copying tokens based on their previous occurrences. In this work, we introduce a novel framework that showcases transformers' ability to dynamically handle causal structures. Existing works rely on Markov Chains to study the formation of induction heads, revealing how transformers capture causal dependencies and learn transition probabilities in-context. However, they rely on a fixed causal structure that fails to capture the complexity of natural languages, where the relationship between tokens dynamically changes with context. To this end, our framework varies the causal structure through interleaved Markov chains with different lags while keeping the transition probabilities fixed. This setting unveils the formation of Selective Induction Heads, a new circuit that endows transformers with the ability to select the correct causal structure in-context. We empirically demonstrate that transformers learn this mechanism to predict the next token by identifying the correct lag and copying the corresponding token from the past. We provide a detailed construction of a 3-layer transformer to implement the selective induction head, and a theoretical analysis proving that this mechanism asymptotically converges to the maximum likelihood solution. Our findings advance the understanding of how transformers select causal structures, providing new insights into their functioning and interpretability.

new ArtifactGen: Benchmarking WGAN-GP vs Diffusion for Label-Aware EEG Artifact Synthesis

Authors: Hritik Arasu, Faisal R Jahangiri

Abstract: Artifacts in electroencephalography (EEG) -- muscle, eye movement, electrode, chewing, and shiver -- confound automated analysis yet are costly to label at scale. We study whether modern generative models can synthesize realistic, label-aware artifact segments suitable for augmentation and stress-testing. Using the TUH EEG Artifact (TUAR) corpus, we curate subject-wise splits and fixed-length multi-channel windows (e.g., 250 samples) with preprocessing tailored to each model (per-window min-max for adversarial training; per-recording/channel $z$-score for diffusion). We compare a conditional WGAN-GP with a projection discriminator to a 1D denoising diffusion model with classifier-free guidance, and evaluate along three axes: (i) fidelity via Welch band-power deltas ($\Delta\delta,\ \Delta\theta,\ \Delta\alpha,\ \Delta\beta$), channel-covariance Frobenius distance, autocorrelation $L_2$, and distributional metrics (MMD/PRD); (ii) specificity via class-conditional recovery with lightweight $k$NN/classifiers; and (iii) utility via augmentation effects on artifact recognition. In our setting, WGAN-GP achieves closer spectral alignment and lower MMD to real data, while both models exhibit weak class-conditional recovery, limiting immediate augmentation gains and revealing opportunities for stronger conditioning and coverage. We release a reproducible pipeline -- data manifests, training configurations, and evaluation scripts -- to establish a baseline for EEG artifact synthesis and to surface actionable failure modes for future work.

new Rollout-LaSDI: Enhancing the long-term accuracy of Latent Space Dynamics

Authors: Robert Stephany, Youngsoo Choi

Abstract: Solving complex partial differential equations is vital in the physical sciences, but often requires computationally expensive numerical methods. Reduced-order models (ROMs) address this by exploiting dimensionality reduction to create fast approximations. While modern ROMs can solve parameterized families of PDEs, their predictive power degrades over long time horizons. We address this by (1) introducing a flexible, high-order, yet inexpensive finite-difference scheme and (2) proposing a Rollout loss that trains ROMs to make accurate predictions over arbitrary time horizons. We demonstrate our approach on the 2D Burgers equation.

new Prescribe-then-Select: Adaptive Policy Selection for Contextual Stochastic Optimization

Authors: Caio de Prospero Iglesias, Kimberly Villalobos Carballo, Dimitris Bertsimas

Abstract: We address the problem of policy selection in contextual stochastic optimization (CSO), where covariates are available as contextual information and decisions must satisfy hard feasibility constraints. In many CSO settings, multiple candidate policies--arising from different modeling paradigms--exhibit heterogeneous performance across the covariate space, with no single policy uniformly dominating. We propose Prescribe-then-Select (PS), a modular framework that first constructs a library of feasible candidate policies and then learns a meta-policy to select the best policy for the observed covariates. We implement the meta-policy using ensembles of Optimal Policy Trees trained via cross-validation on the training set, making policy choice entirely data-driven. Across two benchmark CSO problems--single-stage newsvendor and two-stage shipment planning--PS consistently outperforms the best single policy in heterogeneous regimes of the covariate space and converges to the dominant policy when such heterogeneity is absent. All the code to reproduce the results can be found at https://anonymous.4open.science/r/Prescribe-then-Select-TMLR.

URLs: https://anonymous.4open.science/r/Prescribe-then-Select-TMLR.

new Sketched Gaussian Mechanism for Private Federated Learning

Authors: Qiaobo Li, Zhijie Chen, Arindam Banerjee

Abstract: Communication cost and privacy are two major considerations in federated learning (FL). For communication cost, gradient compression by sketching the clients' transmitted model updates is often used for reducing per-round communication. For privacy, the Gaussian mechanism (GM), which consists of clipping updates and adding Gaussian noise, is commonly used to guarantee client-level differential privacy. Existing literature on private FL analyzes privacy of sketching and GM in an isolated manner, illustrating that sketching provides privacy determined by the sketching dimension and that GM has to supply any additional desired privacy. In this paper, we introduce the Sketched Gaussian Mechanism (SGM), which directly combines sketching and the Gaussian mechanism for privacy. Using R\'enyi-DP tools, we present a joint analysis of SGM's overall privacy guarantee, which is significantly more flexible and sharper compared to isolated analysis of sketching and GM privacy. In particular, we prove that the privacy level of SGM for a fixed noise magnitude is proportional to $1/\sqrt{b}$, where $b$ is the sketching dimension, indicating that (for moderate $b$) SGM can provide much stronger privacy guarantees than the original GM under the same noise budget. We demonstrate the application of SGM to FL with either gradient descent or adaptive server optimizers, and establish theoretical results on optimization convergence, which exhibits only a logarithmic dependence on the number of parameters $d$. Experimental results confirm that at the same privacy level, SGM based FL is at least competitive with non-sketching private FL variants and outperforms them in some settings. Moreover, using adaptive optimization at the server improves empirical performance while maintaining the privacy guarantees.

new Ensemble Distribution Distillation for Self-Supervised Human Activity Recognition

Authors: Matthew Nolan, Lina Yao, Robert Davidson

Abstract: Human Activity Recognition (HAR) has seen significant advancements with the adoption of deep learning techniques, yet challenges remain in terms of data requirements, reliability and robustness. This paper explores a novel application of Ensemble Distribution Distillation (EDD) within a self-supervised learning framework for HAR aimed at overcoming these challenges. By leveraging unlabeled data and a partially supervised training strategy, our approach yields an increase in predictive accuracy, robust estimates of uncertainty, and substantial increases in robustness against adversarial perturbation; thereby significantly improving reliability in real-world scenarios without increasing computational complexity at inference. We demonstrate this with an evaluation on several publicly available datasets. The contributions of this work include the development of a self-supervised EDD framework, an innovative data augmentation technique designed for HAR, and empirical validation of the proposed method's effectiveness in increasing robustness and reliability.

new Strategies for Improving Communication Efficiency in Distributed and Federated Learning: Compression, Local Training, and Personalization

Authors: Kai Yi

Abstract: Distributed and federated learning are essential paradigms for training models across decentralized data sources while preserving privacy, yet communication overhead remains a major bottleneck. This dissertation explores strategies to improve communication efficiency, focusing on model compression, local training, and personalization. We establish a unified framework for biased and unbiased compression operators with convergence guarantees, then propose adaptive local training strategies that incorporate personalization to accelerate convergence and mitigate client drift. In particular, Scafflix balances global and personalized objectives, achieving superior performance under both IID and non-IID settings. We further introduce privacy-preserving pruning frameworks that optimize sparsity while minimizing communication costs, with Cohort-Squeeze leveraging hierarchical aggregation to reduce cross-device overhead. Finally, SymWanda, a symmetric post-training pruning method, enhances robustness under high sparsity and maintains accuracy without retraining. Extensive experiments on benchmarks and large-scale language models demonstrate favorable trade-offs among accuracy, convergence, and communication, offering theoretical and practical insights for scalable, efficient distributed learning.

new The CRITICAL Records Integrated Standardization Pipeline (CRISP): End-to-End Processing of Large-scale Multi-institutional OMOP CDM Data

Authors: Xiaolong Luo, Michael Lingzhi Li

Abstract: While existing critical care EHR datasets such as MIMIC and eICU have enabled significant advances in clinical AI research, the CRITICAL dataset opens new frontiers by providing extensive scale and diversity -- containing 1.95 billion records from 371,365 patients across four geographically diverse CTSA institutions. CRITICAL's unique strength lies in capturing full-spectrum patient journeys, including pre-ICU, ICU, and post-ICU encounters across both inpatient and outpatient settings. This multi-institutional, longitudinal perspective creates transformative opportunities for developing generalizable predictive models and advancing health equity research. However, the richness of this multi-site resource introduces substantial complexity in data harmonization, with heterogeneous collection practices and diverse vocabulary usage patterns requiring sophisticated preprocessing approaches. We present CRISP to unlock the full potential of this valuable resource. CRISP systematically transforms raw Observational Medical Outcomes Partnership Common Data Model data into ML-ready datasets through: (1) transparent data quality management with comprehensive audit trails, (2) cross-vocabulary mapping of heterogeneous medical terminologies to unified SNOMED-CT standards, with deduplication and unit standardization, (3) modular architecture with parallel optimization enabling complete dataset processing in $<$1 day even on standard computing hardware, and (4) comprehensive baseline model benchmarks spanning multiple clinical prediction tasks to establish reproducible performance standards. By providing processing pipeline, baseline implementations, and detailed transformation documentation, CRISP saves researchers months of preprocessing effort and democratizes access to large-scale multi-institutional critical care data, enabling them to focus on advancing clinical AI.

new Mitigating Catastrophic Forgetting in Large Language Models with Forgetting-aware Pruning

Authors: Wei Huang, Anda Cheng, Yinggui Wang

Abstract: Recent advancements in large language models (LLMs) have shown impressive capabilities in various downstream tasks but typically face Catastrophic Forgetting (CF) during fine-tuning. In this paper, we propose the Forgetting-Aware Pruning Metric (FAPM), a novel pruning-based approach to balance CF and downstream task performance. Our investigation reveals that the degree to which task vectors (i.e., the subtraction of pre-trained weights from the weights fine-tuned on downstream tasks) overlap with pre-trained model parameters is a critical factor for CF. Based on this finding, FAPM employs the ratio of the task vector to pre-trained model parameters as a metric to quantify CF, integrating this measure into the pruning criteria. Importantly, FAPM does not necessitate modifications to the training process or model architecture, nor does it require any auxiliary data. We conducted extensive experiments across eight datasets, covering natural language inference, General Q&A, Medical Q&A, Math Q&A, reading comprehension, and cloze tests. The results demonstrate that FAPM limits CF to just 0.25\% while maintaining 99.67\% accuracy on downstream tasks. We provide the code to reproduce our results.

new Interpretable Physics Reasoning and Performance Taxonomy in Vision-Language Models

Authors: Pranav Pawar, Kavish Shah, Akshat Bhalani, Komal Kasat, Dev Mittal, Hadi Gala, Deepali Patil, Nikita Raichada, Monali Deshmukh

Abstract: As Vision-Language Models (VLMs) grow in sophistication, their ability to perform reasoning is coming under increasing supervision. While they excel at many tasks, their grasp of fundamental scientific principles, such as physics, remains an underexplored frontier. To reflect the advancements in these capabilities, we introduce a novel and accessible framework designed to rigorously evaluate VLMs on their understanding of 2D physics. Our framework features a pragmatic scenario generator that creates a diverse testbed of over 400 problems across four core domains: Projectile Motion, Collision Dynamics, Mechanics, and Fluid Dynamics. Through comprehensive evaluation of four state-of-the-art VLMs, we demonstrate a strong correlation between model scale and reasoning ability, with our top-performing model, Qwen2.5-VL-7B, achieving an overall score of 0.815. We find that while models excel at formulaic problems, they struggle significantly with domains requiring abstract spatial reasoning. By designing this framework, we aim to democratize the study of scientific reasoning in VLMs and foster deeper insights into their capabilities and limitations.

new Adaptive Rainfall Forecasting from Multiple Geographical Models Using Matrix Profile and Ensemble Learning

Authors: Dung T. Tran, Huyen Ngoc Huyen, Hong Nguyen, Xuan-Vu Phan, Nam-Phong Nguyen

Abstract: Rainfall forecasting in Vietnam is highly challenging due to its diverse climatic conditions and strong geographical variability across river basins, yet accurate and reliable forecasts are vital for flood management, hydropower operation, and disaster preparedness. In this work, we propose a Matrix Profile-based Weighted Ensemble (MPWE), a regime-switching framework that dynamically captures covariant dependencies among multiple geographical model forecasts while incorporating redundancy-aware weighting to balance contributions across models. We evaluate MPWE using rainfall forecasts from eight major basins in Vietnam, spanning five forecast horizons (1 hour and accumulated rainfall over 12, 24, 48, 72, and 84 hours). Experimental results show that MPWE consistently achieves lower mean and standard deviation of prediction errors compared to geographical models and ensemble baselines, demonstrating both improved accuracy and stability across basins and horizons.

new \emph{FoQuS}: A Forgetting-Quality Coreset Selection Framework for Automatic Modulation Recognition

Authors: Yao Lu, Chunfeng Sun, Dongwei Xu, Yun Lin, Qi Xuan, Guan Gui

Abstract: Deep learning-based Automatic Modulation Recognition (AMR) model has made significant progress with the support of large-scale labeled data. However, when developing new models or performing hyperparameter tuning, the time and energy consumption associated with repeated training using massive amounts of data are often unbearable. To address the above challenges, we propose \emph{FoQuS}, which approximates the effect of full training by selecting a coreset from the original dataset, thereby significantly reducing training overhead. Specifically, \emph{FoQuS} records the prediction trajectory of each sample during full-dataset training and constructs three importance metrics based on training dynamics. Experiments show that \emph{FoQuS} can maintain high recognition accuracy and good cross-architecture generalization on multiple AMR datasets using only 1\%-30\% of the original data.

new EvolKV: Evolutionary KV Cache Compression for LLM Inference

Authors: Bohan Yu, Yekun Chai

Abstract: Existing key-value (KV) cache compression methods typically rely on heuristics, such as uniform cache allocation across layers or static eviction policies, however, they ignore the critical interplays among layer-specific feature patterns and task performance, which can lead to degraded generalization. In this paper, we propose EvolKV, an adaptive framework for layer-wise, task-driven KV cache compression that jointly optimizes the memory efficiency and task performance. By reformulating cache allocation as a multi-objective optimization problem, EvolKV leverages evolutionary search to dynamically configure layer budgets while directly maximizing downstream performance. Extensive experiments on 11 tasks demonstrate that our approach outperforms all baseline methods across a wide range of KV cache budgets on long-context tasks and surpasses heuristic baselines by up to 7 percentage points on GSM8K. Notably, EvolKV achieves superior performance over the full KV cache setting on code completion while utilizing only 1.5% of the original budget, suggesting the untapped potential in learned compression strategies for KV cache budget allocation.

new Accelerating Reinforcement Learning Algorithms Convergence using Pre-trained Large Language Models as Tutors With Advice Reusing

Authors: Lukas Toral, Teddy Lazebnik

Abstract: Reinforcement Learning (RL) algorithms often require long training to become useful, especially in complex environments with sparse rewards. While techniques like reward shaping and curriculum learning exist to accelerate training, these are often extremely specific and require the developer's professionalism and dedicated expertise in the problem's domain. Tackling this challenge, in this study, we explore the effectiveness of pre-trained Large Language Models (LLMs) as tutors in a student-teacher architecture with RL algorithms, hypothesizing that LLM-generated guidance allows for faster convergence. In particular, we explore the effectiveness of reusing the LLM's advice on the RL's convergence dynamics. Through an extensive empirical examination, which included 54 configurations, varying the RL algorithm (DQN, PPO, A2C), LLM tutor (Llama, Vicuna, DeepSeek), and environment (Blackjack, Snake, Connect Four), our results demonstrate that LLM tutoring significantly accelerates RL convergence while maintaining comparable optimal performance. Furthermore, the advice reuse mechanism shows a further improvement in training duration but also results in less stable convergence dynamics. Our findings suggest that LLM tutoring generally improves convergence, and its effectiveness is sensitive to the specific task, RL algorithm, and LLM model combination.

new Accelerating Mixture-of-Expert Inference with Adaptive Expert Split Mechanism

Authors: Jiaming Yan, Jianchun Liu, Hongli Xu, Liusheng Huang

Abstract: Mixture-of-Experts (MoE) has emerged as a promising architecture for modern large language models (LLMs). However, massive parameters impose heavy GPU memory (i.e., VRAM) demands, hindering the widespread adoption of MoE LLMs. Offloading the expert parameters to CPU RAM offers an effective way to alleviate the VRAM requirements for MoE inference. Existing approaches typically cache a small subset of experts in VRAM and dynamically prefetch experts from RAM during inference, leading to significant degradation in inference speed due to the poor cache hit rate and substantial expert loading latency. In this work, we propose MoEpic, an efficient MoE inference system with a novel expert split mechanism. Specifically, each expert is vertically divided into two segments: top and bottom. MoEpic caches the top segment of hot experts, so that more experts will be stored under the limited VRAM budget, thereby improving the cache hit rate. During each layer's inference, MoEpic predicts and prefetches the activated experts for the next layer. Since the top segments of cached experts are exempt from fetching, the loading time is reduced, which allows efficient transfer-computation overlap. Nevertheless, the performance of MoEpic critically depends on the cache configuration (i.e., each layer's VRAM budget and expert split ratio). To this end, we propose a divide-and-conquer algorithm based on fixed-point iteration for adaptive cache configuration. Extensive experiments on popular MoE LLMs demonstrate that MoEpic can save about half of the GPU cost, while lowering the inference latency by about 37.51%-65.73% compared to the baselines.

new Prediction Loss Guided Decision-Focused Learning

Authors: Haeun Jeon, Hyunglip Bae, Chanyeong Kim, Yongjae Lee, Woo Chang Kim

Abstract: Decision-making under uncertainty is often considered in two stages: predicting the unknown parameters, and then optimizing decisions based on predictions. While traditional prediction-focused learning (PFL) treats these two stages separately, decision-focused learning (DFL) trains the predictive model by directly optimizing the decision quality in an end-to-end manner. However, despite using exact or well-approximated gradients, vanilla DFL often suffers from unstable convergence due to its flat-and-sharp loss landscapes. In contrast, PFL yields more stable optimization, but overlooks the downstream decision quality. To address this, we propose a simple yet effective approach: perturbing the decision loss gradient using the prediction loss gradient to construct an update direction. Our method requires no additional training and can be integrated with any DFL solvers. Using the sigmoid-like decaying parameter, we let the prediction loss gradient guide the decision loss gradient to train a predictive model that optimizes decision quality. Also, we provide a theoretical convergence guarantee to Pareto stationary point under mild assumptions. Empirically, we demonstrate our method across three stochastic optimization problems, showing promising results compared to other baselines. We validate that our approach achieves lower regret with more stable training, even in situations where either PFL or DFL struggles.

new Rethinking the Backbone in Class Imbalanced Federated Source Free Domain Adaptation: The Utility of Vision Foundation Models

Authors: Kosuke Kihara, Junki Mori, Taiki Miyagawa, Akinori F. Ebihara

Abstract: Federated Learning (FL) offers a framework for training models collaboratively while preserving data privacy of each client. Recently, research has focused on Federated Source-Free Domain Adaptation (FFREEDA), a more realistic scenario wherein client-held target domain data remains unlabeled, and the server can access source domain data only during pre-training. We extend this framework to a more complex and realistic setting: Class Imbalanced FFREEDA (CI-FFREEDA), which takes into account class imbalances in both the source and target domains, as well as label shifts between source and target and among target clients. The replication of existing methods in our experimental setup lead us to rethink the focus from enhancing aggregation and domain adaptation methods to improving the feature extractors within the network itself. We propose replacing the FFREEDA backbone with a frozen vision foundation model (VFM), thereby improving overall accuracy without extensive parameter tuning and reducing computational and communication costs in federated learning. Our experimental results demonstrate that VFMs effectively mitigate the effects of domain gaps, class imbalances, and even non-IID-ness among target clients, suggesting that strong feature extractors, not complex adaptation or FL methods, are key to success in the real-world FL.

new Efficient Decoding Methods for Language Models on Encrypted Data

Authors: Matan Avitan, Moran Baruch, Nir Drucker, Itamar Zimerman, Yoav Goldberg

Abstract: Large language models (LLMs) power modern AI applications, but processing sensitive data on untrusted servers raises privacy concerns. Homomorphic encryption (HE) enables computation on encrypted data for secure inference. However, neural text generation requires decoding methods like argmax and sampling, which are non-polynomial and thus computationally expensive under encryption, creating a significant performance bottleneck. We introduce cutmax, an HE-friendly argmax algorithm that reduces ciphertext operations compared to prior methods, enabling practical greedy decoding under encryption. We also propose the first HE-compatible nucleus (top-p) sampling method, leveraging cutmax for efficient stochastic decoding with provable privacy guarantees. Both techniques are polynomial, supporting efficient inference in privacy-preserving settings. Moreover, their differentiability facilitates gradient-based sequence-level optimization as a polynomial alternative to straight-through estimators. We further provide strong theoretical guarantees for cutmax, proving it converges globally to a unique two-level fixed point, independent of the input values beyond the identity of the maximizer, which explains its rapid convergence in just a few iterations. Evaluations on realistic LLM outputs show latency reductions of 24x-35x over baselines, advancing secure text generation.

new Two Sides of the Same Optimization Coin: Model Degradation and Representation Collapse in Graph Foundation Models

Authors: Xunkai Li, Daohan Su, Sicheng Liu, Ru Zhang, Zhenjun Li, Bing Zhou, Rong-Hua Li, Guoren Wang

Abstract: Graph foundation models, inspired by the success of LLMs, are designed to learn the optimal embedding from multi-domain TAGs for the downstream cross-task generalization capability. During our investigation, graph VQ-MAE stands out among the increasingly diverse landscape of GFM architectures. This is attributed to its ability to jointly encode topology and textual attributes from multiple domains into discrete embedding spaces with clear semantic boundaries. Despite its potential, domain generalization conflicts cause imperceptible pitfalls. In this paper, we instantiate two of them, and they are just like two sides of the same GFM optimization coin - Side 1 Model Degradation: The encoder and codebook fail to capture the diversity of inputs; Side 2 Representation Collapse: The hidden embedding and codebook vector fail to preserve semantic separability due to constraints from narrow representation subspaces. These two pitfalls (sides) collectively impair the decoder and generate the low-quality reconstructed supervision, causing the GFM optimization dilemma during pre-training (coin). Through empirical investigation, we attribute the above challenges to Information Bottleneck and Regularization Deficit. To address them, we propose MoT (Mixture-of-Tinkers) - (1) Information Tinker for Two Pitfalls, which utilizes an edge-wise semantic fusion strategy and a mixture-of-codebooks with domain-aware routing to improve information capacity. (2) Regularization Tinker for Optimization Coin, which utilizes two additional regularizations to further improve gradient supervision in our proposed Information Tinker. Notably, as a flexible architecture, MoT adheres to the scaling laws of GFM, offering a controllable model scale. Compared to SOTA baselines, experiments on 22 datasets across 6 domains demonstrate that MoT achieves significant improvements in supervised, few-shot, and zero-shot scenarios.

new Adapting Vision-Language Models for Neutrino Event Classification in High-Energy Physics

Authors: Dikshant Sagar, Kaiwen Yu, Alejandro Yankelevich, Jianming Bian, Pierre Baldi

Abstract: Recent advances in Large Language Models (LLMs) have demonstrated their remarkable capacity to process and reason over structured and unstructured data modalities beyond natural language. In this work, we explore the applications of Vision Language Models (VLMs), specifically a fine-tuned variant of LLaMa 3.2, to the task of identifying neutrino interactions in pixelated detector data from high-energy physics (HEP) experiments. We benchmark this model against a state-of-the-art convolutional neural network (CNN) architecture, similar to those used in the NOvA and DUNE experiments, which have achieved high efficiency and purity in classifying electron and muon neutrino events. Our evaluation considers both the classification performance and interpretability of the model predictions. We find that VLMs can outperform CNNs, while also providing greater flexibility in integrating auxiliary textual or semantic information and offering more interpretable, reasoning-based predictions. This work highlights the potential of VLMs as a general-purpose backbone for physics event classification, due to their high performance, interpretability, and generalizability, which opens new avenues for integrating multimodal reasoning in experimental neutrino physics.

new An Interpretable Deep Learning Model for General Insurance Pricing

Authors: Patrick J. Laub, Tu Pho, Bernard Wong

Abstract: This paper introduces the Actuarial Neural Additive Model, an inherently interpretable deep learning model for general insurance pricing that offers fully transparent and interpretable results while retaining the strong predictive power of neural networks. This model assigns a dedicated neural network (or subnetwork) to each individual covariate and pairwise interaction term to independently learn its impact on the modeled output while implementing various architectural constraints to allow for essential interpretability (e.g. sparsity) and practical requirements (e.g. smoothness, monotonicity) in insurance applications. The development of our model is grounded in a solid foundation, where we establish a concrete definition of interpretability within the insurance context, complemented by a rigorous mathematical framework. Comparisons in terms of prediction accuracy are made with traditional actuarial and state-of-the-art machine learning methods using both synthetic and real insurance datasets. The results show that the proposed model outperforms other methods in most cases while offering complete transparency in its internal logic, underscoring the strong interpretability and predictive capability.

new SHAining on Process Mining: Explaining Event Log Characteristics Impact on Algorithms

Authors: Andrea Maldonado, Christian M. M. Frey, Sai Anirudh Aryasomayajula, Ludwig Zellner, Stephan A. Fahrenkrog-Petersen, Thomas Seidl

Abstract: Process mining aims to extract and analyze insights from event logs, yet algorithm metric results vary widely depending on structural event log characteristics. Existing work often evaluates algorithms on a fixed set of real-world event logs but lacks a systematic analysis of how event log characteristics impact algorithms individually. Moreover, since event logs are generated from processes, where characteristics co-occur, we focus on associational rather than causal effects to assess how strong the overlapping individual characteristic affects evaluation metrics without assuming isolated causal effects, a factor often neglected by prior work. We introduce SHAining, the first approach to quantify the marginal contribution of varying event log characteristics to process mining algorithms' metrics. Using process discovery as a downstream task, we analyze over 22,000 event logs covering a wide span of characteristics to uncover which affect algorithms across metrics (e.g., fitness, precision, complexity) the most. Furthermore, we offer novel insights about how the value of event log characteristics correlates with their contributed impact, assessing the algorithm's robustness.

new Modified Loss of Momentum Gradient Descent: Fine-Grained Analysis

Authors: Matias D. Cattaneo, Boris Shigida

Abstract: We analyze gradient descent with Polyak heavy-ball momentum (HB) whose fixed momentum parameter $\beta \in (0, 1)$ provides exponential decay of memory. Building on Kovachki and Stuart (2021), we prove that on an exponentially attractive invariant manifold the algorithm is exactly plain gradient descent with a modified loss, provided that the step size $h$ is small enough. Although the modified loss does not admit a closed-form expression, we describe it with arbitrary precision and prove global (finite "time" horizon) approximation bounds $O(h^{R})$ for any finite order $R \geq 2$. We then conduct a fine-grained analysis of the combinatorics underlying the memoryless approximations of HB, in particular, finding a rich family of polynomials in $\beta$ hidden inside which contains Eulerian and Narayana polynomials. We derive continuous modified equations of arbitrary approximation order (with rigorous bounds) and the principal flow that approximates the HB dynamics, generalizing Rosca et al. (2023). Approximation theorems cover both full-batch and mini-batch HB. Our theoretical results shed new light on the main features of gradient descent with heavy-ball momentum, and outline a road-map for similar analysis of other optimization algorithms.

new Heart Disease Prediction: A Comparative Study of Optimisers Performance in Deep Neural Networks

Authors: Chisom Chibuike, Adeyinka Ogunsanya

Abstract: Optimization has been an important factor and topic of interest in training deep learning models, yet less attention has been given to how we select the optimizers we use to train these models. Hence, there is a need to dive deeper into how we select the optimizers we use for training and the metrics that determine this selection. In this work, we compare the performance of 10 different optimizers in training a simple Multi-layer Perceptron model using a heart disease dataset from Kaggle. We set up a consistent training paradigm and evaluate the optimizers based on metrics such as convergence speed and stability. We also include some other Machine Learning Evaluation metrics such as AUC, Precision, and Recall, which are central metrics to classification problems. Our results show that there are trade-offs between convergence speed and stability, as optimizers like Adagrad and Adadelta, which are more stable, took longer time to converge. Across all our metrics, we chose RMSProp to be the most effective optimizer for this heart disease prediction task because it offered a balanced performance across key metrics. It achieved a precision of 0.765, a recall of 0.827, and an AUC of 0.841, along with faster training time. However, it was not the most stable. We recommend that, in less compute-constrained environments, this method of choosing optimizers through a thorough evaluation should be adopted to increase the scientific nature and performance in training deep learning models.

new Variational Rank Reduction Autoencoders for Generative

Authors: Alicia Tierz, Jad Mounayer, Beatriz Moya, Francisco Chinesta

Abstract: Generative thermal design for complex geometries is fundamental in many areas of engineering, yet it faces two main challenges: the high computational cost of high-fidelity simulations and the limitations of conventional generative models. Approaches such as autoencoders (AEs) and variational autoencoders (VAEs) often produce unstructured latent spaces with discontinuities, which restricts their capacity to explore designs and generate physically consistent solutions. To address these limitations, we propose a hybrid framework that combines Variational Rank-Reduction Autoencoders (VRRAEs) with Deep Operator Networks (DeepONets). The VRRAE introduces a truncated SVD within the latent space, leading to continuous, interpretable, and well-structured representations that mitigate posterior collapse and improve geometric reconstruction. The DeepONet then exploits this compact latent encoding in its branch network, together with spatial coordinates in the trunk network, to predict temperature gradients efficiently and accurately. This hybrid approach not only enhances the quality of generated geometries and the accuracy of gradient prediction, but also provides a substantial advantage in inference efficiency compared to traditional numerical solvers. Overall, the study underscores the importance of structured latent representations for operator learning and highlights the potential of combining generative models and operator networks in thermal design and broader engineering applications.

new Data Skeleton Learning: Scalable Active Clustering with Sparse Graph Structures

Authors: Wen-Bo Xie, Xun Fu, Bin Chen, Yan-Li Lee, Tao Deng, Tian Zou, Xin Wang, Zhen Liu, Jaideep Srivastavad

Abstract: In this work, we focus on the efficiency and scalability of pairwise constraint-based active clustering, crucial for processing large-scale data in applications such as data mining, knowledge annotation, and AI model pre-training. Our goals are threefold: (1) to reduce computational costs for iterative clustering updates; (2) to enhance the impact of user-provided constraints to minimize annotation requirements for precise clustering; and (3) to cut down memory usage in practical deployments. To achieve these aims, we propose a graph-based active clustering algorithm that utilizes two sparse graphs: one for representing relationships between data (our proposed data skeleton) and another for updating this data skeleton. These two graphs work in concert, enabling the refinement of connected subgraphs within the data skeleton to create nested clusters. Our empirical analysis confirms that the proposed algorithm consistently facilitates more accurate clustering with dramatically less input of user-provided constraints, and outperforms its counterparts in terms of computational performance and scalability, while maintaining robustness across various distance metrics.

new MAESTRO: Multi-modal Adaptive Ensemble for Spectro-Temporal Robust Optimization

Authors: Hong Liu

Abstract: Timely and robust influenza incidence forecasting is critical for public health decision-making. To address this, we present MAESTRO, a Multi-modal Adaptive Ensemble for Spectro-Temporal Robust Optimization. MAESTRO achieves robustness by adaptively fusing multi-modal inputs-including surveillance, web search trends, and meteorological data-and leveraging a comprehensive spectro-temporal architecture. The model first decomposes time series into seasonal and trend components. These are then processed through a hybrid feature enhancement pipeline combining Transformer-based encoders, a Mamba state-space model for long-range dependencies, multi-scale temporal convolutions, and a frequency-domain analysis module. A cross-channel attention mechanism further integrates information across the different data modalities. Finally, a temporal projection head performs sequence-to-sequence forecasting, with an optional estimator to quantify prediction uncertainty. Evaluated on over 11 years of Hong Kong influenza data (excluding the COVID-19 period), MAESTRO shows strong competitive performance, demonstrating a superior model fit and relative accuracy, achieving a state-of-the-art R-square of 0.956. Extensive ablations confirm the significant contributions of both multi-modal fusion and the spectro-temporal components. Our modular and reproducible pipeline is made publicly available to facilitate deployment and extension to other regions and pathogens.Our publicly available pipeline presents a powerful, unified framework, demonstrating the critical synergy of advanced spectro-temporal modeling and multi-modal data fusion for robust epidemiological forecasting.

new Interpretability as Alignment: Making Internal Understanding a Design Principle

Authors: Aadit Sengupta, Pratinav Seth, Vinay Kumar Sankarapu

Abstract: Large neural models are increasingly deployed in high-stakes settings, raising concerns about whether their behavior reliably aligns with human values. Interpretability provides a route to internal transparency by revealing the computations that drive outputs. We argue that interpretability especially mechanistic approaches should be treated as a design principle for alignment, not an auxiliary diagnostic tool. Post-hoc methods such as LIME or SHAP offer intuitive but correlational explanations, while mechanistic techniques like circuit tracing or activation patching yield causal insight into internal failures, including deceptive or misaligned reasoning that behavioral methods like RLHF, red teaming, or Constitutional AI may overlook. Despite these advantages, interpretability faces challenges of scalability, epistemic uncertainty, and mismatches between learned representations and human concepts. Our position is that progress on safe and trustworthy AI will depend on making interpretability a first-class objective of AI research and development, ensuring that systems are not only effective but also auditable, transparent, and aligned with human intent.

new Classification of 24-hour movement behaviors from wrist-worn accelerometer data: from handcrafted features to deep learning techniques

Authors: Alireza Sameh, Mehrdad Rostami, Mourad Oussalah, Vahid Farrahi

Abstract: Purpose: We compared the performance of deep learning (DL) and classical machine learning (ML) algorithms for the classification of 24-hour movement behavior into sleep, sedentary, light intensity physical activity (LPA), and moderate-to-vigorous intensity physical activity (MVPA). Methods: Open-access data from 151 adults wearing a wrist-worn accelerometer (Axivity-AX3) was used. Participants were randomly divided into training, validation, and test sets (121, 15, and 15 participants each). Raw acceleration signals were segmented into non-overlapping 10-second windows, and then a total of 104 handcrafted features were extracted. Four DL algorithms-Long Short-Term Memory (LSTM), Bidirectional Long Short-Term Memory (BiLSTM), Gated Recurrent Units (GRU), and One-Dimensional Convolutional Neural Network (1D-CNN)-were trained using raw acceleration signals and with handcrafted features extracted from these signals to predict 24-hour movement behavior categories. The handcrafted features were also used to train classical ML algorithms, namely Random Forest (RF), Support Vector Machine (SVM), Extreme Gradient Boosting (XGBoost), Logistic Regression (LR), Artificial Neural Network (ANN), and Decision Tree (DT) for classifying 24-hour movement behavior intensities. Results: LSTM, BiLSTM, and GRU showed an overall accuracy of approximately 85% when trained with raw acceleration signals, and 1D-CNN an overall accuracy of approximately 80%. When trained on handcrafted features, the overall accuracy for both DL and classical ML algorithms ranged from 70% to 81%. Overall, there was a higher confusion in classification of MVPA and LPA, compared to sleep and sedentary categories. Conclusion: DL methods with raw acceleration signals had only slightly better performance in predicting 24-hour movement behavior intensities, compared to when DL and classical ML were trained with handcrafted features.

new Towards Interpretable Deep Neural Networks for Tabular Data

Authors: Khawla Elhadri, J\"org Schl\"otterer, Christin Seifert

Abstract: Tabular data is the foundation of many applications in fields such as finance and healthcare. Although DNNs tailored for tabular data achieve competitive predictive performance, they are blackboxes with little interpretability. We introduce XNNTab, a neural architecture that uses a sparse autoencoder (SAE) to learn a dictionary of monosemantic features within the latent space used for prediction. Using an automated method, we assign human-interpretable semantics to these features. This allows us to represent predictions as linear combinations of semantically meaningful components. Empirical evaluations demonstrate that XNNTab attains performance on par with or exceeding that of state-of-the-art, black-box neural models and classical machine learning approaches while being fully interpretable.

new An upper bound of the silhouette validation metric for clustering

Authors: Hugo Str\"ang, Tai Dinh

Abstract: The silhouette coefficient summarizes, per observation, cohesion versus separation in [-1, 1]; the average silhouette width (ASW) is a common internal measure of clustering quality where higher values indicate more coveted results. However, the dataset-specific maximum of ASW is typically unknown, and the standard upper limit 1 is often unattainable. In this work, we derive for each data point in a given dataset a sharp upper bound on its silhouette width. By aggregating these individual bounds, we present a canonical data-dependent upper bound on ASW that often assumes values well below 1. The presented bounds can indicate whether individual data points can ever be well placed, enable early stopping of silhouette-based optimization loops, and help answer a key question: How close is my clustering result to the best possible outcome on this specific data? Across synthetic and real datasets, the bounds are provably near-tight in many cases and offer significant enrichment of cluster quality evaluation.

new Generative Data Refinement: Just Ask for Better Data

Authors: Minqi Jiang, Jo\~ao G. M. Ara\'ujo, Will Ellsworth, Sian Gooding, Edward Grefenstette

Abstract: For a fixed parameter size, the capabilities of large models are primarily determined by the quality and quantity of its training data. Consequently, training datasets now grow faster than the rate at which new data is indexed on the web, leading to projected data exhaustion over the next decade. Much more data exists as user-generated content that is not publicly indexed, but incorporating such data comes with considerable risks, such as leaking private information and other undesirable content. We introduce a framework, Generative Data Refinement (GDR), for using pretrained generative models to transform a dataset with undesirable content into a refined dataset that is more suitable for training. Our experiments show that GDR can outperform industry-grade solutions for dataset anonymization, as well as enable direct detoxification of highly unsafe datasets. Moreover, we show that by generating synthetic data that is conditioned on each example in the real dataset, GDR's refined outputs naturally match the diversity of web scale datasets, and thereby avoid the often challenging task of generating diverse synthetic data via model prompting. The simplicity and effectiveness of GDR make it a powerful tool for scaling up the total stock of training data for frontier models.

new Replicable Reinforcement Learning with Linear Function Approximation

Authors: Eric Eaton, Marcel Hussing, Michael Kearns, Aaron Roth, Sikata Bela Sengupta, Jessica Sorrell

Abstract: Replication of experimental results has been a challenge faced by many scientific disciplines, including the field of machine learning. Recent work on the theory of machine learning has formalized replicability as the demand that an algorithm produce identical outcomes when executed twice on different samples from the same distribution. Provably replicable algorithms are especially interesting for reinforcement learning (RL), where algorithms are known to be unstable in practice. While replicable algorithms exist for tabular RL settings, extending these guarantees to more practical function approximation settings has remained an open problem. In this work, we make progress by developing replicable methods for linear function approximation in RL. We first introduce two efficient algorithms for replicable random design regression and uncentered covariance estimation, each of independent interest. We then leverage these tools to provide the first provably efficient replicable RL algorithms for linear Markov decision processes in both the generative model and episodic settings. Finally, we evaluate our algorithms experimentally and show how they can inspire more consistent neural policies.

new Signal Fidelity Index-Aware Calibration for Dementia Predictions Across Heterogeneous Real-World Data

Authors: Jingya Cheng, Jiazi Tian, Federica Spoto, Alaleh Azhir, Daniel Mork, Hossein Estiri

Abstract: \textbf{Background:} Machine learning models trained on electronic health records (EHRs) often degrade across healthcare systems due to distributional shift. A fundamental but underexplored factor is diagnostic signal decay: variability in diagnostic quality and consistency across institutions, which affects the reliability of codes used for training and prediction. \textbf{Objective:} To develop a Signal Fidelity Index (SFI) quantifying diagnostic data quality at the patient level in dementia, and to test SFI-aware calibration for improving model performance across heterogeneous datasets without outcome labels. \textbf{Methods:} We built a simulation framework generating 2,500 synthetic datasets, each with 1,000 patients and realistic demographics, encounters, and coding patterns based on dementia risk factors. The SFI was derived from six interpretable components: diagnostic specificity, temporal consistency, entropy, contextual concordance, medication alignment, and trajectory stability. SFI-aware calibration applied a multiplicative adjustment, optimized across 50 simulation batches. \textbf{Results:} At the optimal parameter ($\alpha$ = 2.0), SFI-aware calibration significantly improved all metrics (p $<$ 0.001). Gains ranged from 10.3\% for Balanced Accuracy to 32.5\% for Recall, with notable increases in Precision (31.9\%) and F1-score (26.1\%). Performance approached reference standards, with F1-score and Recall within 1\% and Balanced Accuracy and Detection Rate improved by 52.3\% and 41.1\%, respectively. \textbf{Conclusions:} Diagnostic signal decay is a tractable barrier to model generalization. SFI-aware calibration provides a practical, label-free strategy to enhance prediction across healthcare contexts, particularly for large-scale administrative datasets lacking outcome labels.

new Perfectly-Private Analog Secure Aggregation in Federated Learning

Authors: Delio Jaramillo-Velez, Charul Rajput, Ragnar Freij-Hollanti, Camilla Hollanti, Alexandre Graell i Amat

Abstract: In federated learning, multiple parties train models locally and share their parameters with a central server, which aggregates them to update a global model. To address the risk of exposing sensitive data through local models, secure aggregation via secure multiparty computation has been proposed to enhance privacy. At the same time, perfect privacy can only be achieved by a uniform distribution of the masked local models to be aggregated. This raises a problem when working with real valued data, as there is no measure on the reals that is invariant under the masking operation, and hence information leakage is bound to occur. Shifting the data to a finite field circumvents this problem, but as a downside runs into an inherent accuracy complexity tradeoff issue due to fixed point modular arithmetic as opposed to floating point numbers that can simultaneously handle numbers of varying magnitudes. In this paper, a novel secure parameter aggregation method is proposed that employs the torus rather than a finite field. This approach guarantees perfect privacy for each party's data by utilizing the uniform distribution on the torus, while avoiding accuracy losses. Experimental results show that the new protocol performs similarly to the model without secure aggregation while maintaining perfect privacy. Compared to the finite field secure aggregation, the torus-based protocol can in some cases significantly outperform it in terms of model accuracy and cosine similarity, hence making it a safer choice.

new Reshaping the Forward-Forward Algorithm with a Similarity-Based Objective

Authors: James Gong, Raymond Luo, Emma Wang, Leon Ge, Bruce Li, Felix Marattukalam, Waleed Abdulla

Abstract: Backpropagation is the pivotal algorithm underpinning the success of artificial neural networks, yet it has critical limitations such as biologically implausible backward locking and global error propagation. To circumvent these constraints, the Forward-Forward algorithm was proposed as a more biologically plausible method that replaces the backward pass with an additional forward pass. Despite this advantage, the Forward-Forward algorithm significantly trails backpropagation in accuracy, and its optimal form exhibits low inference efficiency due to multiple forward passes required. In this work, the Forward-Forward algorithm is reshaped through its integration with similarity learning frameworks, eliminating the need for multiple forward passes during inference. This proposed algorithm is named Forward-Forward Algorithm Unified with Similarity-based Tuplet loss (FAUST). Empirical evaluations on MNIST, Fashion-MNIST, and CIFAR-10 datasets indicate that FAUST substantially improves accuracy, narrowing the gap with backpropagation. On CIFAR-10, FAUST achieves 56.22\% accuracy with a simple multi-layer perceptron architecture, approaching the backpropagation benchmark of 57.63\% accuracy.

new A layered architecture for log analysis in complex IT systems

Authors: Thorsten Wittkopp

Abstract: In the evolving IT landscape, stability and reliability of systems are essential, yet their growing complexity challenges DevOps teams in implementation and maintenance. Log analysis, a core element of AIOps, provides critical insights into complex behaviors and failures. This dissertation introduces a three-layered architecture to support DevOps in failure resolution. The first layer, Log Investigation, performs autonomous log labeling and anomaly classification. We propose a method that labels log data without manual effort, enabling supervised training and precise evaluation of anomaly detection. Additionally, we define a taxonomy that groups anomalies into three categories, ensuring appropriate method selection. The second layer, Anomaly Detection, detects behaviors deviating from the norm. We propose a flexible Anomaly Detection method adaptable to unsupervised, weakly supervised, and supervised training. Evaluations on public and industry datasets show F1-scores between 0.98 and 1.0, ensuring reliable anomaly detection. The third layer, Root Cause Analysis, identifies minimal log sets describing failures, their origin, and event sequences. By balancing training data and identifying key services, our Root Cause Analysis method consistently detects 90-98% of root cause log lines within the top 10 candidates, providing actionable insights for mitigation. Our research addresses how log analysis methods can be designed and optimized to help DevOps resolve failures efficiently. By integrating these three layers, the architecture equips teams with robust methods to enhance IT system reliability.

new Machine Learning-Based Prediction of Speech Arrest During Direct Cortical Stimulation Mapping

Authors: Nikasadat Emami, Amirhossein Khalilian-Gourtani, Jianghao Qian, Antoine Ratouchniak, Xupeng Chen, Yao Wang, Adeen Flinker

Abstract: Identifying cortical regions critical for speech is essential for safe brain surgery in or near language areas. While Electrical Stimulation Mapping (ESM) remains the clinical gold standard, it is invasive and time-consuming. To address this, we analyzed intracranial electrocorticographic (ECoG) data from 16 participants performing speech tasks and developed machine learning models to directly predict if the brain region underneath each ECoG electrode is critical. Ground truth labels indicating speech arrest were derived independently from Electrical Stimulation Mapping (ESM) and used to train classification models. Our framework integrates neural activity signals, anatomical region labels, and functional connectivity features to capture both local activity and network-level dynamics. We found that models combining region and connectivity features matched the performance of the full feature set, and outperformed models using either type alone. To classify each electrode, trial-level predictions were aggregated using an MLP applied to histogram-encoded scores. Our best-performing model, a trial-level RBF-kernel Support Vector Machine together with MLP-based aggregation, achieved strong accuracy on held-out participants (ROC-AUC: 0.87, PR-AUC: 0.57). These findings highlight the value of combining spatial and network information with non-linear modeling to improve functional mapping in presurgical evaluation.

new Securing Private Federated Learning in a Malicious Setting: A Scalable TEE-Based Approach with Client Auditing

Authors: Shun Takagi, Satoshi Hasegawa

Abstract: In cross-device private federated learning, differentially private follow-the-regularized-leader (DP-FTRL) has emerged as a promising privacy-preserving method. However, existing approaches assume a semi-honest server and have not addressed the challenge of securely removing this assumption. This is due to its statefulness, which becomes particularly problematic in practical settings where clients can drop out or be corrupted. While trusted execution environments (TEEs) might seem like an obvious solution, a straightforward implementation can introduce forking attacks or availability issues due to state management. To address this problem, our paper introduces a novel server extension that acts as a trusted computing base (TCB) to realize maliciously secure DP-FTRL. The TCB is implemented with an ephemeral TEE module on the server side to produce verifiable proofs of server actions. Some clients, upon being selected, participate in auditing these proofs with small additional communication and computational demands. This extension solution reduces the size of the TCB while maintaining the system's scalability and liveness. We provide formal proofs based on interactive differential privacy, demonstrating privacy guarantee in malicious settings. Finally, we experimentally show that our framework adds small constant overhead to clients in several realistic settings.

new Compressing CNN models for resource-constrained systems by channel and layer pruning

Authors: Ahmed Sadaqa, Di Liu

Abstract: Convolutional Neural Networks (CNNs) have achieved significant breakthroughs in various fields. However, these advancements have led to a substantial increase in the complexity and size of these networks. This poses a challenge when deploying large and complex networks on edge devices. Consequently, model compression has emerged as a research field aimed at reducing the size and complexity of CNNs. One prominent technique in model compression is model pruning. This paper will present a new technique of pruning that combines both channel and layer pruning in what is called a "hybrid pruning framework". Inspired by EfficientNet, a renowned CNN architecture known for scaling up networks from both channel and layer perspectives, this hybrid approach applies the same principles but in reverse, where it scales down the network through pruning. Experiments on the hybrid approach demonstrated a notable decrease in the overall complexity of the model, with only a minimal reduction in accuracy compared to the baseline model. This complexity reduction translates into reduced latency when deploying the pruned models on an NVIDIA JETSON TX2 embedded AI device.

new Sharing is Caring: Efficient LM Post-Training with Collective RL Experience Sharing

Authors: Jeffrey Amico, Gabriel Passamani Andrade, John Donaghy, Ben Fielding, Tristin Forbus, Harry Grieve, Semih Kara, Jari Kolehmainen, Yihua Lou, Christopher Nies, Edward Phillip Flores Nu\~no, Diogo Ortega, Shikhar Rastogi, Austin Virts, Matthew J. Wright

Abstract: Post-training language models (LMs) with reinforcement learning (RL) can enhance their complex reasoning capabilities without supervised fine-tuning, as demonstrated by DeepSeek-R1-Zero. However, effectively utilizing RL for LMs requires significant parallelization to scale-up inference, which introduces non-trivial technical challenges (e.g. latency, memory, and reliability) alongside ever-growing financial costs. We present Swarm sAmpling Policy Optimization (SAPO), a fully decentralized and asynchronous RL post-training algorithm. SAPO is designed for decentralized networks of heterogenous compute nodes, where each node manages its own policy model(s) while "sharing" rollouts with others in the network; no explicit assumptions about latency, model homogeneity, or hardware are required and nodes can operate in silo if desired. As a result, the algorithm avoids common bottlenecks in scaling RL post-training while also allowing (and even encouraging) new possibilities. By sampling rollouts "shared" across the network, it enables "Aha moments" to propagate, thereby bootstrapping the learning process. In this paper we show SAPO achieved cumulative reward gains of up to 94% in controlled experiments. We also share insights from tests on a network with thousands of nodes contributed by Gensyn community members running the algorithm on diverse hardware and models during an open-source demo.

new Data-driven generative simulation of SDEs using diffusion models

Authors: Xuefeng Gao, Jiale Zha, Xun Yu Zhou

Abstract: This paper introduces a new approach to generating sample paths of unknown stochastic differential equations (SDEs) using diffusion models, a class of generative AI models commonly employed in image and video applications. Unlike the traditional Monte Carlo methods for simulating SDEs, which require explicit specifications of the drift and diffusion coefficients, our method takes a model-free, data-driven approach. Given a finite set of sample paths from an SDE, we utilize conditional diffusion models to generate new, synthetic paths of the same SDE. To demonstrate the effectiveness of our approach, we conduct a simulation experiment to compare our method with alternative benchmark ones including neural SDEs. Furthermore, in an empirical study we leverage these synthetically generated sample paths to enhance the performance of reinforcement learning algorithms for continuous-time mean-variance portfolio selection, hinting promising applications of diffusion models in financial analysis and decision-making.

new DEQuify your force field: More efficient simulations using deep equilibrium models

Authors: Andreas Burger, Luca Thiede, Al\'an Aspuru-Guzik, Nandita Vijaykumar

Abstract: Machine learning force fields show great promise in enabling more accurate molecular dynamics simulations compared to manually derived ones. Much of the progress in recent years was driven by exploiting prior knowledge about physical systems, in particular symmetries under rotation, translation, and reflections. In this paper, we argue that there is another important piece of prior information that, thus fa,r hasn't been explored: Simulating a molecular system is necessarily continuous, and successive states are therefore extremely similar. Our contribution is to show that we can exploit this information by recasting a state-of-the-art equivariant base model as a deep equilibrium model. This allows us to recycle intermediate neural network features from previous time steps, enabling us to improve both accuracy and speed by $10\%-20\%$ on the MD17, MD22, and OC20 200k datasets, compared to the non-DEQ base model. The training is also much more memory efficient, allowing us to train more expressive models on larger systems.

new ChemBOMAS: Accelerated BO in Chemistry with LLM-Enhanced Multi-Agent System

Authors: Dong Han, Zhehong Ai, Pengxiang Cai, Shuzhou Sun, Shanya Lu, Jianpeng Chen, Ben Gao, Lingli Ge, Weida Wang, Xiangxin Zhou, Xihui Liu, Mao Su, Wanli Ouyang, Lei Bai, Dongzhan Zhou, Tao XU, Yuqiang Li, Shufei Zhang

Abstract: The efficiency of Bayesian optimization (BO) in chemistry is often hindered by sparse experimental data and complex reaction mechanisms. To overcome these limitations, we introduce ChemBOMAS, a new framework named LLM-Enhanced Multi-Agent System for accelerating BO in chemistry. ChemBOMAS's optimization process is enhanced by LLMs and synergistically employs two strategies: knowledge-driven coarse-grained optimization and data-driven fine-grained optimization. First, in the knowledge-driven coarse-grained optimization stage, LLMs intelligently decompose the vast search space by reasoning over existing chemical knowledge to identify promising candidate regions. Subsequently, in the data-driven fine-grained optimization stage, LLMs enhance the BO process within these candidate regions by generating pseudo-data points, thereby improving data utilization efficiency and accelerating convergence. Benchmark evaluations** further confirm that ChemBOMAS significantly enhances optimization effectiveness and efficiency compared to various BO algorithms. Importantly, the practical utility of ChemBOMAS was validated through wet-lab experiments conducted under pharmaceutical industry protocols, targeting conditional optimization for a previously unreported and challenging chemical reaction. In the wet experiment, ChemBOMAS achieved an optimal objective value of 96%. This was substantially higher than the 15% achieved by domain experts. This real-world success, together with strong performance on benchmark evaluations, highlights ChemBOMAS as a powerful tool to accelerate chemical discovery.

new PracMHBench: Re-evaluating Model-Heterogeneous Federated Learning Based on Practical Edge Device Constraints

Authors: Yuanchun Guo, Bingyan Liu, Yulong Sha, Zhensheng Xian

Abstract: Federating heterogeneous models on edge devices with diverse resource constraints has been a notable trend in recent years. Compared to traditional federated learning (FL) that assumes an identical model architecture to cooperate, model-heterogeneous FL is more practical and flexible since the model can be customized to satisfy the deployment requirement. Unfortunately, no prior work ever dives into the existing model-heterogeneous FL algorithms under the practical edge device constraints and provides quantitative analysis on various data scenarios and metrics, which motivates us to rethink and re-evaluate this paradigm. In our work, we construct the first system platform \textbf{PracMHBench} to evaluate model-heterogeneous FL on practical constraints of edge devices, where diverse model heterogeneity algorithms are classified and tested on multiple data tasks and metrics. Based on the platform, we perform extensive experiments on these algorithms under the different edge constraints to observe their applicability and the corresponding heterogeneity pattern.

new AgentGym-RL: Training LLM Agents for Long-Horizon Decision Making through Multi-Turn Reinforcement Learning

Authors: Zhiheng Xi, Jixuan Huang, Chenyang Liao, Baodai Huang, Honglin Guo, Jiaqi Liu, Rui Zheng, Junjie Ye, Jiazheng Zhang, Wenxiang Chen, Wei He, Yiwen Ding, Guanyu Li, Zehui Chen, Zhengyin Du, Xuesong Yao, Yufei Xu, Jiecao Chen, Tao Gui, Zuxuan Wu, Qi Zhang, Xuanjing Huang, Yu-Gang Jiang

Abstract: Developing autonomous LLM agents capable of making a series of intelligent decisions to solve complex, real-world tasks is a fast-evolving frontier. Like human cognitive development, agents are expected to acquire knowledge and skills through exploration and interaction with the environment. Despite advances, the community still lacks a unified, interactive reinforcement learning (RL) framework that can effectively train such agents from scratch -- without relying on supervised fine-tuning (SFT) -- across diverse and realistic environments. To bridge this gap, we introduce AgentGym-RL, a new framework to train LLM agents for multi-turn interactive decision-making through RL. The framework features a modular and decoupled architecture, ensuring high flexibility and extensibility. It encompasses a wide variety of real-world scenarios, and supports mainstream RL algorithms. Furthermore, we propose ScalingInter-RL, a training approach designed for exploration-exploitation balance and stable RL optimization. In early stages, it emphasizes exploitation by restricting the number of interactions, and gradually shifts towards exploration with larger horizons to encourage diverse problem-solving strategies. In this way, the agent develops more diverse behaviors and is less prone to collapse under long horizons. We perform extensive experiments to validate the stability and effectiveness of both the AgentGym-RL framework and the ScalingInter-RL approach. Our agents match or surpass commercial models on 27 tasks across diverse environments. We offer key insights and will open-source the complete AgentGym-RL framework -- including code and datasets -- to empower the research community in developing the next generation of intelligent agents.

new Using AI to Optimize Patient Transfer and Resource Utilization During Mass-Casualty Incidents: A Simulation Platform

Authors: Zhaoxun "Lorenz" Liu, Wagner H. Souza, Jay Han, Amin Madani

Abstract: Mass casualty incidents (MCIs) overwhelm healthcare systems and demand rapid, accurate patient-hospital allocation decisions under extreme pressure. Here, we developed and validated a deep reinforcement learning-based decision-support AI agent to optimize patient transfer decisions during simulated MCIs by balancing patient acuity levels, specialized care requirements, hospital capacities, and transport logistics. To integrate this AI agent, we developed MasTER, a web-accessible command dashboard for MCI management simulations. Through a controlled user study with 30 participants (6 trauma experts and 24 non-experts), we evaluated three interaction approaches with the AI agent (human-only, human-AI collaboration, and AI-only) across 20- and 60-patient MCI scenarios in the Greater Toronto Area. Results demonstrate that increasing AI involvement significantly improves decision quality and consistency. The AI agent outperforms trauma surgeons (p < 0.001) and enables non-experts to achieve expert-level performance when assisted, contrasting sharply with their significantly inferior unassisted performance (p < 0.001). These findings establish the potential for our AI-driven decision support to enhance both MCI preparedness training and real-world emergency response management.

new Fourier Learning Machines: Nonharmonic Fourier-Based Neural Networks for Scientific Machine Learning

Authors: Mominul Rubel, Adam Meyers, Gabriel Nicolosi

Abstract: We introduce the Fourier Learning Machine (FLM), a neural network (NN) architecture designed to represent a multidimensional nonharmonic Fourier series. The FLM uses a simple feedforward structure with cosine activation functions to learn the frequencies, amplitudes, and phase shifts of the series as trainable parameters. This design allows the model to create a problem-specific spectral basis adaptable to both periodic and nonperiodic functions. Unlike previous Fourier-inspired NN models, the FLM is the first architecture able to represent a complete, separable Fourier basis in multiple dimensions using a standard Multilayer Perceptron-like architecture. A one-to-one correspondence between the Fourier coefficients and amplitudes and phase-shifts is demonstrated, allowing for the translation between a full, separable basis form and the cosine phase--shifted one. Additionally, we evaluate the performance of FLMs on several scientific computing problems, including benchmark Partial Differential Equations (PDEs) and a family of Optimal Control Problems (OCPs). Computational experiments show that the performance of FLMs is comparable, and often superior, to that of established architectures like SIREN and vanilla feedforward NNs.

new ADHDeepNet From Raw EEG to Diagnosis: Improving ADHD Diagnosis through Temporal-Spatial Processing, Adaptive Attention Mechanisms, and Explainability in Raw EEG Signals

Authors: Ali Amini, Mohammad Alijanpour, Behnam Latifi, Ali Motie Nasrabadi

Abstract: Attention Deficit Hyperactivity Disorder (ADHD) is a common brain disorder in children that can persist into adulthood, affecting social, academic, and career life. Early diagnosis is crucial for managing these impacts on patients and the healthcare system but is often labor-intensive and time-consuming. This paper presents a novel method to improve ADHD diagnosis precision and timeliness by leveraging Deep Learning (DL) approaches and electroencephalogram (EEG) signals. We introduce ADHDeepNet, a DL model that utilizes comprehensive temporal-spatial characterization, attention modules, and explainability techniques optimized for EEG signals. ADHDeepNet integrates feature extraction and refinement processes to enhance ADHD diagnosis. The model was trained and validated on a dataset of 121 participants (61 ADHD, 60 Healthy Controls), employing nested cross-validation for robust performance. The proposed two-stage methodology uses a 10-fold cross-subject validation strategy. Initially, each iteration optimizes the model's hyper-parameters with inner 2-fold cross-validation. Then, Additive Gaussian Noise (AGN) with various standard deviations and magnification levels is applied for data augmentation. ADHDeepNet achieved 100% sensitivity and 99.17% accuracy in classifying ADHD/HC subjects. To clarify model explainability and identify key brain regions and frequency bands for ADHD diagnosis, we analyzed the learned weights and activation patterns of the model's primary layers. Additionally, t-distributed Stochastic Neighbor Embedding (t-SNE) visualized high-dimensional data, aiding in interpreting the model's decisions. This study highlights the potential of DL and EEG in enhancing ADHD diagnosis accuracy and efficiency.

new Merge-of-Thought Distillation

Authors: Zhanming Shen, Zeyu Qin, Zenan Huang, Hao Chen, Jiaqi Hu, Yihong Zhuang, Guoshan Lu, Gang Chen, Junbo Zhao

Abstract: Efficient reasoning distillation for long chain-of-thought (CoT) models is increasingly constrained by the assumption of a single oracle teacher, despite practical availability of multiple candidate teachers and growing CoT corpora. We revisit teacher selection and observe that different students have different "best teachers," and even for the same student the best teacher can vary across datasets. Therefore, to unify multiple teachers' reasoning abilities into student with overcoming conflicts among various teachers' supervision, we propose Merge-of-Thought Distillation (MoT), a lightweight framework that alternates between teacher-specific supervised fine-tuning branches and weight-space merging of the resulting student variants. On competition math benchmarks, using only about 200 high-quality CoT samples, applying MoT to a Qwen3-14B student surpasses strong models including DEEPSEEK-R1, QWEN3-30B-A3B, QWEN3-32B, and OPENAI-O1, demonstrating substantial gains. Besides, MoT consistently outperforms the best single-teacher distillation and the naive multi-teacher union, raises the performance ceiling while mitigating overfitting, and shows robustness to distribution-shifted and peer-level teachers. Moreover, MoT reduces catastrophic forgetting, improves general reasoning beyond mathematics and even cultivates a better teacher, indicating that consensus-filtered reasoning features transfer broadly. These results position MoT as a simple, scalable route to efficiently distilling long CoT capabilities from diverse teachers into compact students.

new A Survey of TinyML Applications in Beekeeping for Hive Monitoring and Management

Authors: Willy Sucipto, Jianlong Zhou, Ray Seung Min Kwon, Fang Chen

Abstract: Honey bee colonies are essential for global food security and ecosystem stability, yet they face escalating threats from pests, diseases, and environmental stressors. Traditional hive inspections are labor-intensive and disruptive, while cloud-based monitoring solutions remain impractical for remote or resource-limited apiaries. Recent advances in Internet of Things (IoT) and Tiny Machine Learning (TinyML) enable low-power, real-time monitoring directly on edge devices, offering scalable and non-invasive alternatives. This survey synthesizes current innovations at the intersection of TinyML and apiculture, organized around four key functional areas: monitoring hive conditions, recognizing bee behaviors, detecting pests and diseases, and forecasting swarming events. We further examine supporting resources, including publicly available datasets, lightweight model architectures optimized for embedded deployment, and benchmarking strategies tailored to field constraints. Critical limitations such as data scarcity, generalization challenges, and deployment barriers in off-grid environments are highlighted, alongside emerging opportunities in ultra-efficient inference pipelines, adaptive edge learning, and dataset standardization. By consolidating research and engineering practices, this work provides a foundation for scalable, AI-driven, and ecologically informed monitoring systems to support sustainable pollinator management.

cross ToDMA: Large Model-Driven Token-Domain Multiple Access for Semantic Communications

Authors: Li Qiao, Mahdi Boloursaz Mashhadi, Zhen Gao, Robert Schober, Deniz G\"und\"uz

Abstract: Token communications (TokCom) is an emerging generative semantic communication concept that reduces transmission rates by using context and multimodal large language model (MLLM)-based token processing, with tokens serving as universal semantic units across modalities. In this paper, we propose a semantic multiple access scheme in the token domain, referred to as token domain multiple access (ToDMA), where a large number of devices share a token codebook and a modulation codebook for source and channel coding, respectively. Specifically, each transmitter first tokenizes its source signal and modulate each token to a codeword. At the receiver, compressed sensing is employed first to detect active tokens and the corresponding channel state information (CSI) from the superposed signals. Then, the source token sequences are reconstructed by clustering the token-associated CSI across multiple time slots. In case of token collisions, some active tokens cannot be assigned and some positions in the reconstructed token sequences are empty. We propose to use pre-trained MLLMs to leverage the context, predict masked tokens, and thus mitigate token collisions. Simulation results demonstrate the effectiveness of the proposed ToDMA framework for both text and image transmission tasks, achieving significantly lower latency compared to context-unaware orthogonal communication schemes, while also delivering superior distortion and perceptual quality compared to state-of-the-art context-unaware non-orthogonal communication methods.

cross Steering Protein Language Models

Authors: Long-Kai Huang, Rongyi Zhu, Bing He, Jianhua Yao

Abstract: Protein Language Models (PLMs), pre-trained on extensive evolutionary data from natural proteins, have emerged as indispensable tools for protein design. While powerful, PLMs often struggle to produce proteins with precisely specified functionalities or properties due to inherent challenges in controlling their outputs. In this work, we investigate the potential of Activation Steering, a technique originally developed for controlling text generation in Large Language Models (LLMs), to direct PLMs toward generating protein sequences with targeted properties. We propose a simple yet effective method that employs activation editing to steer PLM outputs, and extend this approach to protein optimization through a novel editing site identification module. Through comprehensive experiments on lysozyme-like sequence generation and optimization, we demonstrate that our methods can be seamlessly integrated into both auto-encoding and autoregressive PLMs without requiring additional training. These results highlight a promising direction for precise protein engineering using foundation models.

cross Signals vs. Videos: Advancing Motion Intention Recognition for Human-Robot Collaboration in Construction

Authors: Charan Gajjala Chenchu, Kinam Kim, Gao Lu, Zia Ud Din

Abstract: Human-robot collaboration (HRC) in the construction industry depends on precise and prompt recognition of human motion intentions and actions by robots to maximize safety and workflow efficiency. There is a research gap in comparing data modalities, specifically signals and videos, for motion intention recognition. To address this, the study leverages deep learning to assess two different modalities in recognizing workers' motion intention at the early stage of movement in drywall installation tasks. The Convolutional Neural Network - Long Short-Term Memory (CNN-LSTM) model utilizing surface electromyography (sEMG) data achieved an accuracy of around 87% with an average time of 0.04 seconds to perform prediction on a sample input. Meanwhile, the pre-trained Video Swin Transformer combined with transfer learning harnessed video sequences as input to recognize motion intention and attained an accuracy of 94% but with a longer average time of 0.15 seconds for a similar prediction. This study emphasizes the unique strengths and trade-offs of both data formats, directing their systematic deployments to enhance HRC in real-world construction projects.

cross DLGE: Dual Local-Global Encoding for Generalizable Cross-BCI-Paradigm

Authors: Jingyuan Wang, Junhua Li

Abstract: Deep learning models have been frequently used to decode a single brain-computer interface (BCI) paradigm based on electroencephalography (EEG). It is challenging to decode multiple BCI paradigms using one model due to diverse barriers, such as different channel configurations and disparate task-related representations. In this study, we propose Dual Local-Global Encoder (DLGE), enabling the classification across different BCI paradigms. To address the heterogeneity in EEG channel configurations across paradigms, we employ an anatomically inspired brain-region partitioning and padding strategy to standardize EEG channel configuration. In the proposed model, the local encoder is designed to learn shared features across BCI paradigms within each brain region based on time-frequency information, which integrates temporal attention on individual channels with spatial attention among channels for each brain region. These shared features are subsequently aggregated in the global encoder to form respective paradigm-specific feature representations. Three BCI paradigms (motor imagery, resting state, and driving fatigue) were used to evaluate the proposed model. The results demonstrate that our model is capable of processing diverse BCI paradigms without retraining and retuning, achieving average macro precision, recall, and F1-score of 60.16\%, 59.88\%, and 59.56\%, respectively. We made an initial attempt to develop a general model for cross-BCI-paradigm classification, avoiding retraining or redevelopment for each paradigm. This study paves the way for the development of an effective but simple model for cross-BCI-paradigm decoding, which might benefit the design of portable devices for universal BCI decoding.

cross STROKEVISION-BENCH: A Multimodal Video And 2D Pose Benchmark For Tracking Stroke Recovery

Authors: David Robinson, Animesh Gupta, Rizwan Quershi, Qiushi Fu, Mubarak Shah

Abstract: Despite advancements in rehabilitation protocols, clinical assessment of upper extremity (UE) function after stroke largely remains subjective, relying heavily on therapist observation and coarse scoring systems. This subjectivity limits the sensitivity of assessments to detect subtle motor improvements, which are critical for personalized rehabilitation planning. Recent progress in computer vision offers promising avenues for enabling objective, quantitative, and scalable assessment of UE motor function. Among standardized tests, the Box and Block Test (BBT) is widely utilized for measuring gross manual dexterity and tracking stroke recovery, providing a structured setting that lends itself well to computational analysis. However, existing datasets targeting stroke rehabilitation primarily focus on daily living activities and often fail to capture clinically structured assessments such as block transfer tasks. Furthermore, many available datasets include a mixture of healthy and stroke-affected individuals, limiting their specificity and clinical utility. To address these critical gaps, we introduce StrokeVision-Bench, the first-ever dedicated dataset of stroke patients performing clinically structured block transfer tasks. StrokeVision-Bench comprises 1,000 annotated videos categorized into four clinically meaningful action classes, with each sample represented in two modalities: raw video frames and 2D skeletal keypoints. We benchmark several state-of-the-art video action recognition and skeleton-based action classification methods to establish performance baselines for this domain and facilitate future research in automated stroke rehabilitation assessment.

cross Network Contagion in Financial Labor Markets: Predicting Turnover in Hong Kong

Authors: Abdulla AlKetbi, Patrick Yam, Gautier Marti, Raed Jaradat

Abstract: Employee turnover is a critical challenge in financial markets, yet little is known about the role of professional networks in shaping career moves. Using the Hong Kong Securities and Futures Commission (SFC) public register (2007-2024), we construct temporal networks of 121,883 professionals and 4,979 firms to analyze and predict employee departures. We introduce a graph-based feature propagation framework that captures peer influence and organizational stability. Our analysis shows a contagion effect: professionals are 23% more likely to leave when over 30% of their peers depart within six months. Embedding these network signals into machine learning models improves turnover prediction by 30% over baselines. These results highlight the predictive power of temporal network effects in workforce dynamics, and demonstrate how network-based analytics can inform regulatory monitoring, talent management, and systemic risk assessment.

cross CardioComposer: Flexible and Compositional Anatomical Structure Generation with Disentangled Geometric Guidance

Authors: Karim Kadry, Shoaib Goraya, Ajay Manicka, Abdalla Abdelwahed, Farhad Nezami, Elazer Edelman

Abstract: Generative models of 3D anatomy, when integrated with biophysical simulators, enable the study of structure-function relationships for clinical research and medical device design. However, current models face a trade-off between controllability and anatomical realism. We propose a programmable and compositional framework for guiding unconditional diffusion models of human anatomy using interpretable ellipsoidal primitives embedded in 3D space. Our method involves the selection of certain tissues within multi-tissue segmentation maps, upon which we apply geometric moment losses to guide the reverse diffusion process. This framework supports the independent control over size, shape, and position, as well as the composition of multi-component constraints during inference.

cross Video Parallel Scaling: Aggregating Diverse Frame Subsets for VideoLLMs

Authors: Hyungjin Chung, Hyelin Nam, Jiyeon Kim, Hyojun Go, Byeongjun Park, Junho Kim, Joonseok Lee, Seongsu Ha, Byung-Hoon Kim

Abstract: Video Large Language Models (VideoLLMs) face a critical bottleneck: increasing the number of input frames to capture fine-grained temporal detail leads to prohibitive computational costs and performance degradation from long context lengths. We introduce Video Parallel Scaling (VPS), an inference-time method that expands a model's perceptual bandwidth without increasing its context window. VPS operates by running multiple parallel inference streams, each processing a unique, disjoint subset of the video's frames. By aggregating the output probabilities from these complementary streams, VPS integrates a richer set of visual information than is possible with a single pass. We theoretically show that this approach effectively contracts the Chinchilla scaling law by leveraging uncorrelated visual evidence, thereby improving performance without additional training. Extensive experiments across various model architectures and scales (2B-32B) on benchmarks such as Video-MME and EventHallusion demonstrate that VPS consistently and significantly improves performance. It scales more favorably than other parallel alternatives (e.g. Self-consistency) and is complementary to other decoding strategies, offering a memory-efficient and robust framework for enhancing the temporal reasoning capabilities of VideoLLMs.

cross Enhancing Privacy Preservation and Reducing Analysis Time with Federated Transfer Learning in Digital Twins-based Computed Tomography Scan Analysis

Authors: Avais Jan, Qasim Zia, Murray Patterson

Abstract: The application of Digital Twin (DT) technology and Federated Learning (FL) has great potential to change the field of biomedical image analysis, particularly for Computed Tomography (CT) scans. This paper presents Federated Transfer Learning (FTL) as a new Digital Twin-based CT scan analysis paradigm. FTL uses pre-trained models and knowledge transfer between peer nodes to solve problems such as data privacy, limited computing resources, and data heterogeneity. The proposed framework allows real-time collaboration between cloud servers and Digital Twin-enabled CT scanners while protecting patient identity. We apply the FTL method to a heterogeneous CT scan dataset and assess model performance using convergence time, model accuracy, precision, recall, F1 score, and confusion matrix. It has been shown to perform better than conventional FL and Clustered Federated Learning (CFL) methods with better precision, accuracy, recall, and F1-score. The technique is beneficial in settings where the data is not independently and identically distributed (non-IID), and it offers reliable, efficient, and secure solutions for medical diagnosis. These findings highlight the possibility of using FTL to improve decision-making in digital twin-based CT scan analysis, secure and efficient medical image analysis, promote privacy, and open new possibilities for applying precision medicine and smart healthcare systems.

cross MCTED: A Machine-Learning-Ready Dataset for Digital Elevation Model Generation From Mars Imagery

Authors: Rafa{\l} Osadnik, Pablo G\'omez, Eleni Bohacek, Rickbir Bahia

Abstract: This work presents a new dataset for the Martian digital elevation model prediction task, ready for machine learning applications called MCTED. The dataset has been generated using a comprehensive pipeline designed to process high-resolution Mars orthoimage and DEM pairs from Day et al., yielding a dataset consisting of 80,898 data samples. The source images are data gathered by the Mars Reconnaissance Orbiter using the CTX instrument, providing a very diverse and comprehensive coverage of the Martian surface. Given the complexity of the processing pipelines used in large-scale DEMs, there are often artefacts and missing data points in the original data, for which we developed tools to solve or mitigate their impact. We divide the processed samples into training and validation splits, ensuring samples in both splits cover no mutual areas to avoid data leakage. Every sample in the dataset is represented by the optical image patch, DEM patch, and two mask patches, indicating values that were originally missing or were altered by us. This allows future users of the dataset to handle altered elevation regions as they please. We provide statistical insights of the generated dataset, including the spatial distribution of samples, the distributions of elevation values, slopes and more. Finally, we train a small U-Net architecture on the MCTED dataset and compare its performance to a monocular depth estimation foundation model, DepthAnythingV2, on the task of elevation prediction. We find that even a very small architecture trained on this dataset specifically, beats a zero-shot performance of a depth estimation foundation model like DepthAnythingV2. We make the dataset and code used for its generation completely open source in public repositories.

cross AU-Harness: An Open-Source Toolkit for Holistic Evaluation of Audio LLMs

Authors: Sidharth Surapaneni, Hoang Nguyen, Jash Mehta, Aman Tiwari, Oluwanifemi Bamgbose, Akshay Kalkunte, Sai Rajeswar, Sathwik Tejaswi Madhusudhan

Abstract: Large Audio Language Models (LALMs) are rapidly advancing, but evaluating them remains challenging due to inefficient toolkits that limit fair comparison and systematic assessment. Current frameworks suffer from three critical issues: slow processing that bottlenecks large-scale studies, inconsistent prompting that hurts reproducibility, and narrow task coverage that misses important audio reasoning capabilities. We introduce AU-Harness, an efficient and comprehensive evaluation framework for LALMs. Our system achieves a speedup of up to 127% over existing toolkits through optimized batch processing and parallel execution, enabling large-scale evaluations previously impractical. We provide standardized prompting protocols and flexible configurations for fair model comparison across diverse scenarios. Additionally, we introduce two new evaluation categories: LLM-Adaptive Diarization for temporal audio understanding and Spoken Language Reasoning for complex audio-based cognitive tasks. Through evaluation across 380+ tasks, we reveal significant gaps in current LALMs, particularly in temporal understanding and complex spoken language reasoning tasks. Our findings also highlight a lack of standardization in instruction modality existent across audio benchmarks, which can lead up performance differences up to 9.5 absolute points on the challenging complex instruction following downstream tasks. AU-Harness provides both practical evaluation tools and insights into model limitations, advancing systematic LALM development.

cross Forecasting Generative Amplification

Authors: Henning Bahl, Sascha Diefenbacher, Nina Elmer, Tilman Plehn, Jonas Spinner

Abstract: Generative networks are perfect tools to enhance the speed and precision of LHC simulations. It is important to understand their statistical precision, especially when generating events beyond the size of the training dataset. We present two complementary methods to estimate the amplification factor without large holdout datasets. Averaging amplification uses Bayesian networks or ensembling to estimate amplification from the precision of integrals over given phase-space volumes. Differential amplification uses hypothesis testing to quantify amplification without any resolution loss. Applied to state-of-the-art event generators, both methods indicate that amplification is possible in specific regions of phase space, but not yet across the entire distribution.

cross SCA-LLM: Spectral-Attentive Channel Prediction with Large Language Models in MIMO-OFDM

Authors: Ke He, Le He, Lisheng Fan, Xianfu Lei, Thang X. Vu, George K. Karagiannidis, Symeon Chatzinotas

Abstract: In recent years, the success of large language models (LLMs) has inspired growing interest in exploring their potential applications in wireless communications, especially for channel prediction tasks. However, directly applying LLMs to channel prediction faces a domain mismatch issue stemming from their text-based pre-training. To mitigate this, the ``adapter + LLM" paradigm has emerged, where an adapter is designed to bridge the domain gap between the channel state information (CSI) data and LLMs. While showing initial success, existing adapters may not fully exploit the potential of this paradigm. To address this limitation, this work provides a key insight that learning representations from the spectral components of CSI features can more effectively help bridge the domain gap. Accordingly, we propose a spectral-attentive framework, named SCA-LLM, for channel prediction in multiple-input multiple-output orthogonal frequency division multiplexing (MIMO-OFDM) systems. Specifically, its novel adapter can capture finer spectral details and better adapt the LLM for channel prediction than previous methods. Extensive simulations show that SCA-LLM achieves state-of-the-art prediction performance and strong generalization, yielding up to $-2.4~\text{dB}$ normalized mean squared error (NMSE) advantage over the previous LLM based method. Ablation studies further confirm the superiority of SCA-LLM in mitigating domain mismatch.

cross Bias after Prompting: Persistent Discrimination in Large Language Models

Authors: Nivedha Sivakumar, Natalie Mackraz, Samira Khorshidi, Krishna Patel, Barry-John Theobald, Luca Zappella, Nicholas Apostoloff

Abstract: A dangerous assumption that can be made from prior work on the bias transfer hypothesis (BTH) is that biases do not transfer from pre-trained large language models (LLMs) to adapted models. We invalidate this assumption by studying the BTH in causal models under prompt adaptations, as prompting is an extremely popular and accessible adaptation strategy used in real-world applications. In contrast to prior work, we find that biases can transfer through prompting and that popular prompt-based mitigation methods do not consistently prevent biases from transferring. Specifically, the correlation between intrinsic biases and those after prompt adaptation remain moderate to strong across demographics and tasks -- for example, gender (rho >= 0.94) in co-reference resolution, and age (rho >= 0.98) and religion (rho >= 0.69) in question answering. Further, we find that biases remain strongly correlated when varying few-shot composition parameters, such as sample size, stereotypical content, occupational distribution and representational balance (rho >= 0.90). We evaluate several prompt-based debiasing strategies and find that different approaches have distinct strengths, but none consistently reduce bias transfer across models, tasks or demographics. These results demonstrate that correcting bias, and potentially improving reasoning ability, in intrinsic models may prevent propagation of biases to downstream tasks.

cross Contributions to Robust and Efficient Methods for Analysis of High Dimensional Data

Authors: Kai Yang

Abstract: A ubiquitous feature of data of our era is their extra-large sizes and dimensions. Analyzing such high-dimensional data poses significant challenges, since the feature dimension is often much larger than the sample size. This thesis introduces robust and computationally efficient methods to address several common challenges associated with high-dimensional data. In my first manuscript, I propose a coherent approach to variable screening that accommodates nonlinear associations. I develop a novel variable screening method that transcends traditional linear assumptions by leveraging mutual information, with an intended application in neuroimaging data. This approach allows for accurate identification of important variables by capturing nonlinear as well as linear relationships between the outcome and covariates. Building on this foundation, I develop new optimization methods for sparse estimation using nonconvex penalties in my second manuscript. These methods address notable challenges in current statistical computing practices, facilitating computationally efficient and robust analyses of complex datasets. The proposed method can be applied to a general class of optimization problems. In my third manuscript, I contribute to robust modeling of high-dimensional correlated observations by developing a mixed-effects model based on Tsallis power-law entropy maximization and discussed the theoretical properties of such distribution. This model surpasses the constraints of conventional Gaussian models by accommodating a broader class of distributions with enhanced robustness to outliers. Additionally, I develop a proximal nonlinear conjugate gradient algorithm that accelerates convergence while maintaining numerical stability, along with rigorous statistical properties for the proposed framework.

cross OCTANE -- Optimal Control for Tensor-based Autoencoder Network Emergence: Explicit Case

Authors: Ratna Khatri, Anthony Kolshorn, Colin Olson, Harbir Antil

Abstract: This paper presents a novel, mathematically rigorous framework for autoencoder-type deep neural networks that combines optimal control theory and low-rank tensor methods to yield memory-efficient training and automated architecture discovery. The learning task is formulated as an optimization problem constrained by differential equations representing the encoder and decoder components of the network and the corresponding optimality conditions are derived via a Lagrangian approach. Efficient memory compression is enabled by approximating differential equation solutions on low-rank tensor manifolds using an adaptive explicit integration scheme. These concepts are combined to form OCTANE (Optimal Control for Tensor-based Autoencoder Network Emergence) -- a unified training framework that yields compact autoencoder architectures, reduces memory usage, and enables effective learning, even with limited training data. The framework's utility is illustrated with application to image denoising and deblurring tasks and recommendations regarding governing hyperparameters are provided.

cross RAPID Quantum Detection and Demodulation of Covert Communications: Breaking the Noise Limit with Solid-State Spin Sensors

Authors: Amirhossein Taherpour, Abbas Taherpour, Tamer Khattab

Abstract: We introduce a comprehensive framework for the detection and demodulation of covert electromagnetic signals using solid-state spin sensors. Our approach, named RAPID, is a two-stage hybrid strategy that leverages nitrogen-vacancy (NV) centers to operate below the classical noise floor employing a robust adaptive policy via imitation and distillation. We first formulate the joint detection and estimation task as a unified stochastic optimal control problem, optimizing a composite Bayesian risk objective under realistic physical constraints. The RAPID algorithm solves this by first computing a robust, non-adaptive baseline protocol grounded in the quantum Fisher information matrix (QFIM), and then using this baseline to warm-start an online, adaptive policy learned via deep reinforcement learning (Soft Actor-Critic). This method dynamically optimizes control pulses, interrogation times, and measurement bases to maximize information gain while actively suppressing non-Markovian noise and decoherence. Numerical simulations demonstrate that the protocol achieves a significant sensitivity gain over static methods, maintains high estimation precision in correlated noise environments, and, when applied to sensor arrays, enables coherent quantum beamforming that achieves Heisenberg-like scaling in precision. This work establishes a theoretically rigorous and practically viable pathway for deploying quantum sensors in security-critical applications such as electronic warfare and covert surveillance.

cross Generative Quasi-Continuum Modeling of Confined Fluids at the Nanoscale

Authors: Bugra Yalcin, Ishan Nadkarni, Jinu Jeong, Chenxing Liang, Narayana R. Aluru

Abstract: We present a data-efficient, multiscale framework for predicting the density profiles of confined fluids at the nanoscale. While accurate density estimates require prohibitively long timescales that are inaccessible by ab initio molecular dynamics (AIMD) simulations, machine-learned molecular dynamics (MLMD) offers a scalable alternative, enabling the generation of force predictions at ab initio accuracy with reduced computational cost. However, despite their efficiency, MLMD simulations remain constrained by femtosecond timesteps, which limit their practicality for computing long-time averages needed for accurate density estimation. To address this, we propose a conditional denoising diffusion probabilistic model (DDPM) based quasi-continuum approach that predicts the long-time behavior of force profiles along the confinement direction, conditioned on noisy forces extracted from a limited AIMD dataset. The predicted smooth forces are then linked to continuum theory via the Nernst-Planck equation to reveal the underlying density behavior. We test the framework on water confined between two graphene nanoscale slits and demonstrate that density profiles for channel widths outside of the training domain can be recovered with ab initio accuracy. Compared to AIMD and MLMD simulations, our method achieves orders-of-magnitude speed-up in runtime and requires significantly less training data than prior works.

cross RepViT-CXR: A Channel Replication Strategy for Vision Transformers in Chest X-ray Tuberculosis and Pneumonia Classification

Authors: Faisal Ahmed

Abstract: Chest X-ray (CXR) imaging remains one of the most widely used diagnostic tools for detecting pulmonary diseases such as tuberculosis (TB) and pneumonia. Recent advances in deep learning, particularly Vision Transformers (ViTs), have shown strong potential for automated medical image analysis. However, most ViT architectures are pretrained on natural images and require three-channel inputs, while CXR scans are inherently grayscale. To address this gap, we propose RepViT-CXR, a channel replication strategy that adapts single-channel CXR images into a ViT-compatible format without introducing additional information loss. We evaluate RepViT-CXR on three benchmark datasets. On the TB-CXR dataset,our method achieved an accuracy of 99.9% and an AUC of 99.9%, surpassing prior state-of-the-art methods such as Topo-CXR (99.3% accuracy, 99.8% AUC). For the Pediatric Pneumonia dataset, RepViT-CXR obtained 99.0% accuracy, with 99.2% recall, 99.3% precision, and an AUC of 99.0%, outperforming strong baselines including DCNN and VGG16. On the Shenzhen TB dataset, our approach achieved 91.1% accuracy and an AUC of 91.2%, marking a performance improvement over previously reported CNN-based methods. These results demonstrate that a simple yet effective channel replication strategy allows ViTs to fully leverage their representational power on grayscale medical imaging tasks. RepViT-CXR establishes a new state of the art for TB and pneumonia detection from chest X-rays, showing strong potential for deployment in real-world clinical screening systems.

cross Retrieval-Augmented VLMs for Multimodal Melanoma Diagnosis

Authors: Jihyun Moon, Charmgil Hong

Abstract: Accurate and early diagnosis of malignant melanoma is critical for improving patient outcomes. While convolutional neural networks (CNNs) have shown promise in dermoscopic image analysis, they often neglect clinical metadata and require extensive preprocessing. Vision-language models (VLMs) offer a multimodal alternative but struggle to capture clinical specificity when trained on general-domain data. To address this, we propose a retrieval-augmented VLM framework that incorporates semantically similar patient cases into the diagnostic prompt. Our method enables informed predictions without fine-tuning and significantly improves classification accuracy and error correction over conventional baselines. These results demonstrate that retrieval-augmented prompting provides a robust strategy for clinical decision support.

cross Chordless cycle filtrations for dimensionality detection in complex networks via topological data analysis

Authors: Aina Ferr\`a Marc\'us, Robert Jankowski, Meritxell Vila Mi\~nana, Carles Casacuberta, M. \'Angeles Serrano

Abstract: Many complex networks, ranging from social to biological systems, exhibit structural patterns consistent with an underlying hyperbolic geometry. Revealing the dimensionality of this latent space can disentangle the structural complexity of communities, impact efficient network navigation, and fundamentally shape connectivity and system behavior. We introduce a novel topological data analysis weighting scheme for graphs, based on chordless cycles, aimed at estimating the dimensionality of networks in a data-driven way. We further show that the resulting descriptors can effectively estimate network dimensionality using a neural network architecture trained in a synthetic graph database constructed for this purpose, which does not need retraining to transfer effectively to real-world networks. Thus, by combining cycle-aware filtrations, algebraic topology, and machine learning, our approach provides a robust and effective method for uncovering the hidden geometry of complex networks and guiding accurate modeling and low-dimensional embedding.

cross kNNSampler: Stochastic Imputations for Recovering Missing Value Distributions

Authors: Parastoo Pashmchi, Jerome Benoit, Motonobu Kanagawa

Abstract: We study a missing-value imputation method, termed kNNSampler, that imputes a given unit's missing response by randomly sampling from the observed responses of the $k$ most similar units to the given unit in terms of the observed covariates. This method can sample unknown missing values from their distributions, quantify the uncertainties of missing values, and be readily used for multiple imputation. Unlike popular kNNImputer, which estimates the conditional mean of a missing response given an observed covariate, kNNSampler is theoretically shown to estimate the conditional distribution of a missing response given an observed covariate. Experiments demonstrate its effectiveness in recovering the distribution of missing values. The code for kNNSampler is made publicly available (https://github.com/SAP/knn-sampler).

URLs: https://github.com/SAP/knn-sampler).

cross Co-Investigator AI: The Rise of Agentic AI for Smarter, Trustworthy AML Compliance Narratives

Authors: Prathamesh Vasudeo Naik, Naresh Kumar Dintakurthi, Zhanghao Hu, Yue Wang, Robby Qiu

Abstract: Generating regulatorily compliant Suspicious Activity Report (SAR) remains a high-cost, low-scalability bottleneck in Anti-Money Laundering (AML) workflows. While large language models (LLMs) offer promising fluency, they suffer from factual hallucination, limited crime typology alignment, and poor explainability -- posing unacceptable risks in compliance-critical domains. This paper introduces Co-Investigator AI, an agentic framework optimized to produce Suspicious Activity Reports (SARs) significantly faster and with greater accuracy than traditional methods. Drawing inspiration from recent advances in autonomous agent architectures, such as the AI Co-Scientist, our approach integrates specialized agents for planning, crime type detection, external intelligence gathering, and compliance validation. The system features dynamic memory management, an AI-Privacy Guard layer for sensitive data handling, and a real-time validation agent employing the Agent-as-a-Judge paradigm to ensure continuous narrative quality assurance. Human investigators remain firmly in the loop, empowered to review and refine drafts in a collaborative workflow that blends AI efficiency with domain expertise. We demonstrate the versatility of Co-Investigator AI across a range of complex financial crime scenarios, highlighting its ability to streamline SAR drafting, align narratives with regulatory expectations, and enable compliance teams to focus on higher-order analytical work. This approach marks the beginning of a new era in compliance reporting -- bringing the transformative benefits of AI agents to the core of regulatory processes and paving the way for scalable, reliable, and transparent SAR generation.

cross LLM-Guided Ans\"atze Design for Quantum Circuit Born Machines in Financial Generative Modeling

Authors: Yaswitha Gujju, Romain Harang, Tetsuo Shibuya

Abstract: Quantum generative modeling using quantum circuit Born machines (QCBMs) shows promising potential for practical quantum advantage. However, discovering ans\"atze that are both expressive and hardware-efficient remains a key challenge, particularly on noisy intermediate-scale quantum (NISQ) devices. In this work, we introduce a prompt-based framework that leverages large language models (LLMs) to generate hardware-aware QCBM architectures. Prompts are conditioned on qubit connectivity, gate error rates, and hardware topology, while iterative feedback, including Kullback-Leibler (KL) divergence, circuit depth, and validity, is used to refine the circuits. We evaluate our method on a financial modeling task involving daily changes in Japanese government bond (JGB) interest rates. Our results show that the LLM-generated ans\"atze are significantly shallower and achieve superior generative performance compared to the standard baseline when executed on real IBM quantum hardware using 12 qubits. These findings demonstrate the practical utility of LLM-driven quantum architecture search and highlight a promising path toward robust, deployable generative models for near-term quantum devices.

cross Facet: highly efficient E(3)-equivariant networks for interatomic potentials

Authors: Nicholas Miklaucic, Lai Wei, Rongzhi Dong, Nihang Fu, Sadman Sadeed Omee, Qingyang Li, Sourin Dey, Victor Fung, Jianjun Hu

Abstract: Computational materials discovery is limited by the high cost of first-principles calculations. Machine learning (ML) potentials that predict energies from crystal structures are promising, but existing methods face computational bottlenecks. Steerable graph neural networks (GNNs) encode geometry with spherical harmonics, respecting atomic symmetries -- permutation, rotation, and translation -- for physically realistic predictions. Yet maintaining equivariance is difficult: activation functions must be modified, and each layer must handle multiple data types for different harmonic orders. We present Facet, a GNN architecture for efficient ML potentials, developed through systematic analysis of steerable GNNs. Our innovations include replacing expensive multi-layer perceptrons (MLPs) for interatomic distances with splines, which match performance while cutting computational and memory demands. We also introduce a general-purpose equivariant layer that mixes node information via spherical grid projection followed by standard MLPs -- faster than tensor products and more expressive than linear or gate layers. On the MPTrj dataset, Facet matches leading models with far fewer parameters and under 10% of their training compute. On a crystal relaxation task, it runs twice as fast as MACE models. We further show SevenNet-0's parameters can be reduced by over 25% with no accuracy loss. These techniques enable more than 10x faster training of large-scale foundation models for ML potentials, potentially reshaping computational materials discovery.

cross LD-ViCE: Latent Diffusion Model for Video Counterfactual Explanations

Authors: Payal Varshney, Adriano Lucieri, Christoph Balada, Sheraz Ahmed, Andreas Dengel

Abstract: Video-based AI systems are increasingly adopted in safety-critical domains such as autonomous driving and healthcare. However, interpreting their decisions remains challenging due to the inherent spatiotemporal complexity of video data and the opacity of deep learning models. Existing explanation techniques often suffer from limited temporal coherence, insufficient robustness, and a lack of actionable causal insights. Current counterfactual explanation methods typically do not incorporate guidance from the target model, reducing semantic fidelity and practical utility. We introduce Latent Diffusion for Video Counterfactual Explanations (LD-ViCE), a novel framework designed to explain the behavior of video-based AI models. Compared to previous approaches, LD-ViCE reduces the computational costs of generating explanations by operating in latent space using a state-of-the-art diffusion model, while producing realistic and interpretable counterfactuals through an additional refinement step. Our experiments demonstrate the effectiveness of LD-ViCE across three diverse video datasets, including EchoNet-Dynamic (cardiac ultrasound), FERV39k (facial expression), and Something-Something V2 (action recognition). LD-ViCE outperforms a recent state-of-the-art method, achieving an increase in R2 score of up to 68% while reducing inference time by half. Qualitative analysis confirms that LD-ViCE generates semantically meaningful and temporally coherent explanations, offering valuable insights into the target model behavior. LD-ViCE represents a valuable step toward the trustworthy deployment of AI in safety-critical domains.

cross Spherical Brownian Bridge Diffusion Models for Conditional Cortical Thickness Forecasting

Authors: Ivan Stoyanov, Fabian Bongratz, Christian Wachinger

Abstract: Accurate forecasting of individualized, high-resolution cortical thickness (CTh) trajectories is essential for detecting subtle cortical changes, providing invaluable insights into neurodegenerative processes and facilitating earlier and more precise intervention strategies. However, CTh forecasting is a challenging task due to the intricate non-Euclidean geometry of the cerebral cortex and the need to integrate multi-modal data for subject-specific predictions. To address these challenges, we introduce the Spherical Brownian Bridge Diffusion Model (SBDM). Specifically, we propose a bidirectional conditional Brownian bridge diffusion process to forecast CTh trajectories at the vertex level of registered cortical surfaces. Our technical contribution includes a new denoising model, the conditional spherical U-Net (CoS-UNet), which combines spherical convolutions and dense cross-attention to integrate cortical surfaces and tabular conditions seamlessly. Compared to previous approaches, SBDM achieves significantly reduced prediction errors, as demonstrated by our experiments based on longitudinal datasets from the ADNI and OASIS. Additionally, we demonstrate SBDM's ability to generate individual factual and counterfactual CTh trajectories, offering a novel framework for exploring hypothetical scenarios of cortical development.

cross Behind the Scenes: Mechanistic Interpretability of LoRA-adapted Whisper for Speech Emotion Recognition

Authors: Yujian Ma, Jinqiu Sang, Ruizhe Li

Abstract: Large pre-trained speech models such as Whisper offer strong generalization but pose significant challenges for resource-efficient adaptation. Low-Rank Adaptation (LoRA) has become a popular parameter-efficient fine-tuning method, yet its underlying mechanisms in speech tasks remain poorly understood. In this work, we conduct the first systematic mechanistic interpretability study of LoRA within the Whisper encoder for speech emotion recognition (SER). Using a suite of analytical tools, including layer contribution probing, logit-lens inspection, and representational similarity via singular value decomposition (SVD) and centered kernel alignment (CKA), we reveal two key mechanisms: a delayed specialization process that preserves general features in early layers before consolidating task-specific information, and a forward alignment, backward differentiation dynamic between LoRA's matrices. Our findings clarify how LoRA reshapes encoder hierarchies, providing both empirical insights and a deeper mechanistic understanding for designing efficient and interpretable adaptation strategies in large speech models. Our code is available at https://github.com/harryporry77/Behind-the-Scenes.

URLs: https://github.com/harryporry77/Behind-the-Scenes.

cross Gaussian Process Regression -- Neural Network Hybrid with Optimized Redundant Coordinates

Authors: Sergei Manzhos, Manabu Ihara

Abstract: Recently, a Gaussian Process Regression - neural network (GPRNN) hybrid machine learning method was proposed, which is based on additive-kernel GPR in redundant coordinates constructed by rules [J. Phys. Chem. A 127 (2023) 7823]. The method combined the expressive power of an NN with the robustness of linear regression, in particular, with respect to overfitting when the number of neurons is increased beyond optimal. We introduce opt-GPRNN, in which the redundant coordinates of GPRNN are optimized with a Monte Carlo algorithm and show that when combined with optimization of redundant coordinates, GPRNN attains the lowest test set error with much fewer terms / neurons and retains the advantage of avoiding overfitting when the number of neurons is increased beyond optimal value. The method, opt-GPRNN possesses an expressive power closer to that of a multilayer NN and could obviate the need for deep NNs in some applications. With optimized redundant coordinates, a dimensionality reduction regime is also possible. Examples of application to machine learning an interatomic potential and materials informatics are given.

cross HumanAgencyBench: Scalable Evaluation of Human Agency Support in AI Assistants

Authors: Benjamin Sturgeon, Daniel Samuelson, Jacob Haimes, Jacy Reese Anthis

Abstract: As humans delegate more tasks and decisions to artificial intelligence (AI), we risk losing control of our individual and collective futures. Relatively simple algorithmic systems already steer human decision-making, such as social media feed algorithms that lead people to unintentionally and absent-mindedly scroll through engagement-optimized content. In this paper, we develop the idea of human agency by integrating philosophical and scientific theories of agency with AI-assisted evaluation methods: using large language models (LLMs) to simulate and validate user queries and to evaluate AI responses. We develop HumanAgencyBench (HAB), a scalable and adaptive benchmark with six dimensions of human agency based on typical AI use cases. HAB measures the tendency of an AI assistant or agent to Ask Clarifying Questions, Avoid Value Manipulation, Correct Misinformation, Defer Important Decisions, Encourage Learning, and Maintain Social Boundaries. We find low-to-moderate agency support in contemporary LLM-based assistants and substantial variation across system developers and dimensions. For example, while Anthropic LLMs most support human agency overall, they are the least supportive LLMs in terms of Avoid Value Manipulation. Agency support does not appear to consistently result from increasing LLM capabilities or instruction-following behavior (e.g., RLHF), and we encourage a shift towards more robust safety and alignment targets.

cross Agents of Discovery

Authors: Sascha Diefenbacher, Anna Hallin, Gregor Kasieczka, Michael Kr\"amer, Anne Lauscher, Tim Lukas

Abstract: The substantial data volumes encountered in modern particle physics and other domains of fundamental physics research allow (and require) the use of increasingly complex data analysis tools and workflows. While the use of machine learning (ML) tools for data analysis has recently proliferated, these tools are typically special-purpose algorithms that rely, for example, on encoded physics knowledge to reach optimal performance. In this work, we investigate a new and orthogonal direction: Using recent progress in large language models (LLMs) to create a team of agents -- instances of LLMs with specific subtasks -- that jointly solve data analysis-based research problems in a way similar to how a human researcher might: by creating code to operate standard tools and libraries (including ML systems) and by building on results of previous iterations. If successful, such agent-based systems could be deployed to automate routine analysis components to counteract the increasing complexity of modern tool chains. To investigate the capabilities of current-generation commercial LLMs, we consider the task of anomaly detection via the publicly available and highly-studied LHC Olympics dataset. Several current models by OpenAI (GPT-4o, o4-mini, GPT-4.1, and GPT-5) are investigated and their stability tested. Overall, we observe the capacity of the agent-based system to solve this data analysis problem. The best agent-created solutions mirror the performance of human state-of-the-art results.

cross Motion-Based User Identification across XR and Metaverse Applications by Deep Classification and Similarity Learning

Authors: Lukas Schach, Christian Rack, Ryan P. McMahan, Marc Erich Latoschik

Abstract: This paper examines the generalization capacity of two state-of-the-art classification and similarity learning models in reliably identifying users based on their motions in various Extended Reality (XR) applications. We developed a novel dataset containing a wide range of motion data from 49 users in five different XR applications: four XR games with distinct tasks and action patterns, and an additional social XR application with no predefined task sets. The dataset is used to evaluate the performance and, in particular, the generalization capacity of the two models across applications. Our results indicate that while the models can accurately identify individuals within the same application, their ability to identify users across different XR applications remains limited. Overall, our results provide insight into current models generalization capabilities and suitability as biometric methods for user verification and identification. The results also serve as a much-needed risk assessment of hazardous and unwanted user identification in XR and Metaverse applications. Our cross-application XR motion dataset and code are made available to the public to encourage similar research on the generalization of motion-based user identification in typical Metaverse application use cases.

cross PEHRT: A Common Pipeline for Harmonizing Electronic Health Record data for Translational Research

Authors: Jessica Gronsbell, Vidul Ayakulangara Panickan, Chris Lin, Thomas Charlon, Chuan Hong, Doudou Zhou, Linshanshan Wang, Jianhui Gao, Shirley Zhou, Yuan Tian, Yaqi Shi, Ziming Gan, Tianxi Cai

Abstract: Integrative analysis of multi-institutional Electronic Health Record (EHR) data enhances the reliability and generalizability of translational research by leveraging larger, more diverse patient cohorts and incorporating multiple data modalities. However, harmonizing EHR data across institutions poses major challenges due to data heterogeneity, semantic differences, and privacy concerns. To address these challenges, we introduce $\textit{PEHRT}$, a standardized pipeline for efficient EHR data harmonization consisting of two core modules: (1) data pre-processing and (2) representation learning. PEHRT maps EHR data to standard coding systems and uses advanced machine learning to generate research-ready datasets without requiring individual-level data sharing. Our pipeline is also data model agnostic and designed for streamlined execution across institutions based on our extensive real-world experience. We provide a complete suite of open source software, accompanied by a user-friendly tutorial, and demonstrate the utility of PEHRT in a variety of tasks using data from diverse healthcare systems.

cross Implicit Shape-Prior for Few-Shot Assisted 3D Segmentation

Authors: Mathilde Monvoisin, Louise Piecuch, Blanche Texier, C\'edric H\'emon, Ana\"is Barateau, J\'er\'emie Huet, Antoine Nordez, Anne-Sophie Boureau, Jean-Claude Nunes, Diana Mateus

Abstract: The objective of this paper is to significantly reduce the manual workload required from medical professionals in complex 3D segmentation tasks that cannot be yet fully automated. For instance, in radiotherapy planning, organs at risk must be accurately identified in computed tomography (CT) or magnetic resonance imaging (MRI) scans to ensure they are spared from harmful radiation. Similarly, diagnosing age-related degenerative diseases such as sarcopenia, which involve progressive muscle volume loss and strength, is commonly based on muscular mass measurements often obtained from manual segmentation of medical volumes. To alleviate the manual-segmentation burden, this paper introduces an implicit shape prior to segment volumes from sparse slice manual annotations generalized to the multi-organ case, along with a simple framework for automatically selecting the most informative slices to guide and minimize the next interactions. The experimental validation shows the method's effectiveness on two medical use cases: assisted segmentation in the context of at risks organs for brain cancer patients, and acceleration of the creation of a new database with unseen muscle shapes for patients with sarcopenia.

cross MasconCube: Fast and Accurate Gravity Modeling with an Explicit Representation

Authors: Pietro Fanti, Dario Izzo

Abstract: The geodesy of irregularly shaped small bodies presents fundamental challenges for gravitational field modeling, particularly as deep space exploration missions increasingly target asteroids and comets. Traditional approaches suffer from critical limitations: spherical harmonics diverge within the Brillouin sphere where spacecraft typically operate, polyhedral models assume unrealistic homogeneous density distributions, and existing machine learning methods like GeodesyNets and Physics-Informed Neural Networks (PINN-GM) require extensive computational resources and training time. This work introduces MasconCubes, a novel self-supervised learning approach that formulates gravity inversion as a direct optimization problem over a regular 3D grid of point masses (mascons). Unlike implicit neural representations, MasconCubes explicitly model mass distributions while leveraging known asteroid shape information to constrain the solution space. Comprehensive evaluation on diverse asteroid models including Bennu, Eros, Itokawa, and synthetic planetesimals demonstrates that MasconCubes achieve superior performance across multiple metrics. Most notably, MasconCubes demonstrate computational efficiency advantages with training times approximately 40 times faster than GeodesyNets while maintaining physical interpretability through explicit mass distributions. These results establish MasconCubes as a promising approach for mission-critical gravitational modeling applications requiring high accuracy, computational efficiency, and physical insight into internal mass distributions of irregular celestial bodies.

cross A hierarchical entropy method for the delocalization of bias in high-dimensional Langevin Monte Carlo

Authors: Daniel Lacker, Fuzhong Zhou

Abstract: The unadjusted Langevin algorithm is widely used for sampling from complex high-dimensional distributions. It is well known to be biased, with the bias typically scaling linearly with the dimension when measured in squared Wasserstein distance. However, the recent paper of Chen et al. (2024) identifies an intriguing new delocalization effect: For a class of distributions with sparse interactions, the bias between low-dimensional marginals scales only with the lower dimension, not the full dimension. In this work, we strengthen the results of Chen et al. (2024) in the sparse interaction regime by removing a logarithmic factor, measuring distance in relative entropy (a.k.a. KL-divergence), and relaxing the strong log-concavity assumption. In addition, we expand the scope of the delocalization phenomenon by showing that it holds for a class of distributions with weak interactions. Our proofs are based on a hierarchical analysis of the marginal relative entropies, inspired by the authors' recent work on propagation of chaos.

cross Robust Belief-State Policy Learning for Quantum Network Routing Under Decoherence and Time-Varying Conditions

Authors: Amirhossein Taherpour, Abbas Taherpour, Tamer Khattab

Abstract: This paper presents a feature-based Partially Observable Markov Decision Process (POMDP) framework for quantum network routing, combining belief-state planning with Graph Neural Networks (GNNs) to address partial observability, decoherence, and scalability challenges in dynamic quantum systems. Our approach encodes complex quantum network dynamics, including entanglement degradation and time-varying channel noise, into a low-dimensional feature space, enabling efficient belief updates and scalable policy learning. The core of our framework is a hybrid GNN-POMDP architecture that processes graph-structured representations of entangled links to learn routing policies, coupled with a noise-adaptive mechanism that fuses POMDP belief updates with GNN outputs for robust decision making. We provide a theoretical analysis establishing guarantees for belief convergence, policy improvement, and robustness to noise. Experiments on simulated quantum networks with up to 100 nodes demonstrate significant improvements in routing fidelity and entanglement delivery rates compared to state-of-the-art baselines, particularly under high decoherence and nonstationary conditions.

cross Deep Unrolling of Sparsity-Induced RDO for 3D Point Cloud Attribute Coding

Authors: Tam Thuc Do, Philip A. Chou, Gene Cheung

Abstract: Given encoded 3D point cloud geometry available at the decoder, we study the problem of lossy attribute compression in a multi-resolution B-spline projection framework. A target continuous 3D attribute function is first projected onto a sequence of nested subspaces $\mathcal{F}^{(p)}_{l_0} \subseteq \cdots \subseteq \mathcal{F}^{(p)}_{L}$, where $\mathcal{F}^{(p)}_{l}$ is a family of functions spanned by a B-spline basis function of order $p$ at a chosen scale and its integer shifts. The projected low-pass coefficients $F_l^*$ are computed by variable-complexity unrolling of a rate-distortion (RD) optimization algorithm into a feed-forward network, where the rate term is the sparsity-promoting $\ell_1$-norm. Thus, the projection operation is end-to-end differentiable. For a chosen coarse-to-fine predictor, the coefficients are then adjusted to account for the prediction from a lower-resolution to a higher-resolution, which is also optimized in a data-driven manner.

cross TANGO: Traversability-Aware Navigation with Local Metric Control for Topological Goals

Authors: Stefan Podgorski, Sourav Garg, Mehdi Hosseinzadeh, Lachlan Mares, Feras Dayoub, Ian Reid

Abstract: Visual navigation in robotics traditionally relies on globally-consistent 3D maps or learned controllers, which can be computationally expensive and difficult to generalize across diverse environments. In this work, we present a novel RGB-only, object-level topometric navigation pipeline that enables zero-shot, long-horizon robot navigation without requiring 3D maps or pre-trained controllers. Our approach integrates global topological path planning with local metric trajectory control, allowing the robot to navigate towards object-level sub-goals while avoiding obstacles. We address key limitations of previous methods by continuously predicting local trajectory using monocular depth and traversability estimation, and incorporating an auto-switching mechanism that falls back to a baseline controller when necessary. The system operates using foundational models, ensuring open-set applicability without the need for domain-specific fine-tuning. We demonstrate the effectiveness of our method in both simulated environments and real-world tests, highlighting its robustness and deployability. Our approach outperforms existing state-of-the-art methods, offering a more adaptable and effective solution for visual navigation in open-set environments. The source code is made publicly available: https://github.com/podgorki/TANGO.

URLs: https://github.com/podgorki/TANGO.

cross Tokenizing Loops of Antibodies

Authors: Ada Fang, Robert G. Alberstein, Simon Kelow, Fr\'ed\'eric A. Dreyer

Abstract: The complementarity-determining regions of antibodies are loop structures that are key to their interactions with antigens, and of high importance to the design of novel biologics. Since the 1980s, categorizing the diversity of CDR structures into canonical clusters has enabled the identification of key structural motifs of antibodies. However, existing approaches have limited coverage and cannot be readily incorporated into protein foundation models. Here we introduce ImmunoGlobulin LOOp Tokenizer, Igloo, a multimodal antibody loop tokenizer that encodes backbone dihedral angles and sequence. Igloo is trained using a contrastive learning objective to map loops with similar backbone dihedral angles closer together in latent space. Igloo can efficiently retrieve the closest matching loop structures from a structural antibody database, outperforming existing methods on identifying similar H3 loops by 5.9\%. Igloo assigns tokens to all loops, addressing the limited coverage issue of canonical clusters, while retaining the ability to recover canonical loop conformations. To demonstrate the versatility of Igloo tokens, we show that they can be incorporated into protein language models with IglooLM and IglooALM. On predicting binding affinity of heavy chain variants, IglooLM outperforms the base protein language model on 8 out of 10 antibody-antigen targets. Additionally, it is on par with existing state-of-the-art sequence-based and multimodal protein language models, performing comparably to models with $7\times$ more parameters. IglooALM samples antibody loops which are diverse in sequence and more consistent in structure than state-of-the-art antibody inverse folding models. Igloo demonstrates the benefit of introducing multimodal tokens for antibody loops for encoding the diverse landscape of antibody loops, improving protein foundation models, and for antibody CDR design.

cross Explainability of CNN Based Classification Models for Acoustic Signal

Authors: Zubair Faruqui, Mackenzie S. McIntire, Rahul Dubey, Jay McEntee

Abstract: Explainable Artificial Intelligence (XAI) has emerged as a critical tool for interpreting the predictions of complex deep learning models. While XAI has been increasingly applied in various domains within acoustics, its use in bioacoustics, which involves analyzing audio signals from living organisms, remains relatively underexplored. In this paper, we investigate the vocalizations of a bird species with strong geographic variation throughout its range in North America. Audio recordings were converted into spectrogram images and used to train a deep Convolutional Neural Network (CNN) for classification, achieving an accuracy of 94.8\%. To interpret the model's predictions, we applied both model-agnostic (LIME, SHAP) and model-specific (DeepLIFT, Grad-CAM) XAI techniques. These techniques produced different but complementary explanations, and when their explanations were considered together, they provided more complete and interpretable insights into the model's decision-making. This work highlights the importance of using a combination of XAI techniques to improve trust and interoperability, not only in broader acoustics signal analysis but also argues for broader applicability in different domain specific tasks.

cross Decentralized Stochastic Nonconvex Optimization under the Relaxed Smoothness

Authors: Luo Luo, Xue Cui, Tingkai Jia, Cheng Chen

Abstract: This paper studies decentralized optimization problem $f(\mathbf{x})=\frac{1}{m}\sum_{i=1}^m f_i(\mathbf{x})$, where each local function has the form of $f_i(\mathbf{x}) = {\mathbb E}\left[F(\mathbf{x};{\xi}_i)\right]$ which is $(L_0,L_1)$-smooth but possibly nonconvex and the random variable ${\xi}_i$ follows distribution ${\mathcal D}_i$. We propose a novel algorithm called decentralized normalized stochastic gradient descent (DNSGD), which can achieve the $\epsilon$-stationary point on each local agent. We present a new framework for analyzing decentralized first-order methods in the relaxed smooth setting, based on the Lyapunov function related to the product of the gradient norm and the consensus error. The analysis shows upper bounds on sample complexity of ${\mathcal O}(m^{-1}(L_f\sigma^2\Delta_f\epsilon^{-4} + \sigma^2\epsilon^{-2} + L_f^{-2}L_1^3\sigma^2\Delta_f\epsilon^{-1} + L_f^{-2}L_1^2\sigma^2))$ per agent and communication complexity of $\tilde{\mathcal O}((L_f\epsilon^{-2} + L_1\epsilon^{-1})\gamma^{-1/2}\Delta_f)$, where $L_f=L_0 +L_1\zeta$, $\sigma^2$ is the variance of the stochastic gradient, $\Delta_f$ is the initial optimal function value gap, $\gamma$ is the spectral gap of the network, and $\zeta$ is the degree of the gradient dissimilarity. In the special case of $L_1=0$, the above results (nearly) match the lower bounds on decentralized nonconvex optimization in the standard smooth setting. We also conduct numerical experiments to show the empirical superiority of our method.

cross Bregman Douglas-Rachford Splitting Method

Authors: Shiqian Ma, Lin Xiao, Renbo Zhao

Abstract: In this paper, we propose the Bregman Douglas-Rachford splitting (BDRS) method and its variant Bregman Peaceman-Rachford splitting method for solving maximal monotone inclusion problem. We show that BDRS is equivalent to a Bregman alternating direction method of multipliers (ADMM) when applied to the dual of the problem. A special case of the Bregman ADMM is an alternating direction version of the exponential multiplier method. To the best of our knowledge, algorithms proposed in this paper are new to the literature. We also discuss how to use our algorithms to solve the discrete optimal transport (OT) problem. We prove the convergence of the algorithms under certain assumptions, though we point out that one assumption does not apply to the OT problem.

cross Learning Turbulent Flows with Generative Models: Super-resolution, Forecasting, and Sparse Flow Reconstruction

Authors: Vivek Oommen, Siavash Khodakarami, Aniruddha Bora, Zhicheng Wang, George Em Karniadakis

Abstract: Neural operators are promising surrogates for dynamical systems but when trained with standard L2 losses they tend to oversmooth fine-scale turbulent structures. Here, we show that combining operator learning with generative modeling overcomes this limitation. We consider three practical turbulent-flow challenges where conventional neural operators fail: spatio-temporal super-resolution, forecasting, and sparse flow reconstruction. For Schlieren jet super-resolution, an adversarially trained neural operator (adv-NO) reduces the energy-spectrum error by 15x while preserving sharp gradients at neural operator-like inference cost. For 3D homogeneous isotropic turbulence, adv-NO trained on only 160 timesteps from a single trajectory forecasts accurately for five eddy-turnover times and offers 114x wall-clock speed-up at inference than the baseline diffusion-based forecasters, enabling near-real-time rollouts. For reconstructing cylinder wake flows from highly sparse Particle Tracking Velocimetry-like inputs, a conditional generative model infers full 3D velocity and pressure fields with correct phase alignment and statistics. These advances enable accurate reconstruction and forecasting at low compute cost, bringing near-real-time analysis and control within reach in experimental and computational fluid mechanics. See our project page: https://vivekoommen.github.io/Gen4Turb/

URLs: https://vivekoommen.github.io/Gen4Turb/

cross PCGBandit: One-shot acceleration of transient PDE solvers via online-learned preconditioners

Authors: Mikhail Khodak, Min Ki Jung, Brian Wynne, Edmond chow, Egemen Kolemen

Abstract: Data-driven acceleration of scientific computing workflows has been a high-profile aim of machine learning (ML) for science, with numerical simulation of transient partial differential equations (PDEs) being one of the main applications. The focus thus far has been on methods that require classical simulations to train, which when combined with the data-hungriness and optimization challenges of neural networks has caused difficulties in demonstrating a convincing advantage against strong classical baselines. We consider an alternative paradigm in which the learner uses a classical solver's own data to accelerate it, enabling a one-shot speedup of the simulation. Concretely, since transient PDEs often require solving a sequence of related linear systems, the feedback from repeated calls to a linear solver such as preconditioned conjugate gradient (PCG) can be used by a bandit algorithm to online-learn an adaptive sequence of solver configurations (e.g. preconditioners). The method we develop, PCGBandit, is implemented directly on top of the popular open source software OpenFOAM, which we use to show its effectiveness on a set of fluid and magnetohydrodynamics (MHD) problems.

cross QCardEst/QCardCorr: Quantum Cardinality Estimation and Correction

Authors: Tobias Winker, Jinghua Groppe, Sven Groppe

Abstract: Cardinality estimation is an important part of query optimization in DBMS. We develop a Quantum Cardinality Estimation (QCardEst) approach using Quantum Machine Learning with a Hybrid Quantum-Classical Network. We define a compact encoding for turning SQL queries into a quantum state, which requires only qubits equal to the number of tables in the query. This allows the processing of a complete query with a single variational quantum circuit (VQC) on current hardware. In addition, we compare multiple classical post-processing layers to turn the probability vector output of VQC into a cardinality value. We introduce Quantum Cardinality Correction QCardCorr, which improves classical cardinality estimators by multiplying the output with a factor generated by a VQC to improve the cardinality estimation. With QCardCorr, we have an improvement over the standard PostgreSQL optimizer of 6.37 times for JOB-light and 8.66 times for STATS. For JOB-light we even outperform MSCN by a factor of 3.47.

cross Large Language Model Hacking: Quantifying the Hidden Risks of Using LLMs for Text Annotation

Authors: Joachim Baumann, Paul R\"ottger, Aleksandra Urman, Albert Wendsj\"o, Flor Miriam Plaza-del-Arco, Johannes B. Gruber, Dirk Hovy

Abstract: Large language models (LLMs) are rapidly transforming social science research by enabling the automation of labor-intensive tasks like data annotation and text analysis. However, LLM outputs vary significantly depending on the implementation choices made by researchers (e.g., model selection, prompting strategy, or temperature settings). Such variation can introduce systematic biases and random errors, which propagate to downstream analyses and cause Type I, Type II, Type S, or Type M errors. We call this LLM hacking. We quantify the risk of LLM hacking by replicating 37 data annotation tasks from 21 published social science research studies with 18 different models. Analyzing 13 million LLM labels, we test 2,361 realistic hypotheses to measure how plausible researcher choices affect statistical conclusions. We find incorrect conclusions based on LLM-annotated data in approximately one in three hypotheses for state-of-the-art models, and in half the hypotheses for small language models. While our findings show that higher task performance and better general model capabilities reduce LLM hacking risk, even highly accurate models do not completely eliminate it. The risk of LLM hacking decreases as effect sizes increase, indicating the need for more rigorous verification of findings near significance thresholds. Our extensive analysis of LLM hacking mitigation techniques emphasizes the importance of human annotations in reducing false positive findings and improving model selection. Surprisingly, common regression estimator correction techniques are largely ineffective in reducing LLM hacking risk, as they heavily trade off Type I vs. Type II errors. Beyond accidental errors, we find that intentional LLM hacking is unacceptably simple. With few LLMs and just a handful of prompt paraphrases, anything can be presented as statistically significant.

cross A Survey of Reinforcement Learning for Large Reasoning Models

Authors: Kaiyan Zhang, Yuxin Zuo, Bingxiang He, Youbang Sun, Runze Liu, Che Jiang, Yuchen Fan, Kai Tian, Guoli Jia, Pengfei Li, Yu Fu, Xingtai Lv, Yuchen Zhang, Sihang Zeng, Shang Qu, Haozhan Li, Shijie Wang, Yuru Wang, Xinwei Long, Fangfu Liu, Xiang Xu, Jiaze Ma, Xuekai Zhu, Ermo Hua, Yihao Liu, Zonglin Li, Huayu Chen, Xiaoye Qu, Yafu Li, Weize Chen, Zhenzhao Yuan, Junqi Gao, Dong Li, Zhiyuan Ma, Ganqu Cui, Zhiyuan Liu, Biqing Qi, Ning Ding, Bowen Zhou

Abstract: In this paper, we survey recent advances in Reinforcement Learning (RL) for reasoning with Large Language Models (LLMs). RL has achieved remarkable success in advancing the frontier of LLM capabilities, particularly in addressing complex logical tasks such as mathematics and coding. As a result, RL has emerged as a foundational methodology for transforming LLMs into LRMs. With the rapid progress of the field, further scaling of RL for LRMs now faces foundational challenges not only in computational resources but also in algorithm design, training data, and infrastructure. To this end, it is timely to revisit the development of this domain, reassess its trajectory, and explore strategies to enhance the scalability of RL toward Artificial SuperIntelligence (ASI). In particular, we examine research applying RL to LLMs and LRMs for reasoning abilities, especially since the release of DeepSeek-R1, including foundational components, core problems, training resources, and downstream applications, to identify future opportunities and directions for this rapidly evolving area. We hope this review will promote future research on RL for broader reasoning models. Github: https://github.com/TsinghuaC3I/Awesome-RL-for-LRMs

URLs: https://github.com/TsinghuaC3I/Awesome-RL-for-LRMs

replace FlexFringe: Modeling Software Behavior by Learning Probabilistic Automata

Authors: Sicco Verwer, Christian Hammerschmidt

Abstract: We present the efficient implementations of probabilistic deterministic finite automaton learning methods available in FlexFringe. These implement well-known strategies for state-merging including several modifications to improve their performance in practice. We show experimentally that these algorithms obtain competitive results and significant improvements over a default implementation. We also demonstrate how to use FlexFringe to learn interpretable models from software logs and use these for anomaly detection. Although less interpretable, we show that learning smaller more convoluted models improves the performance of FlexFringe on anomaly detection, outperforming an existing solution based on neural nets.

replace Joint Optimization of Energy Consumption and Completion Time in Federated Learning

Authors: Xinyu Zhou, Jun Zhao, Huimei Han, Claude Guet

Abstract: Federated Learning (FL) is an intriguing distributed machine learning approach due to its privacy-preserving characteristics. To balance the trade-off between energy and execution latency, and thus accommodate different demands and application scenarios, we formulate an optimization problem to minimize a weighted sum of total energy consumption and completion time through two weight parameters. The optimization variables include bandwidth, transmission power and CPU frequency of each device in the FL system, where all devices are linked to a base station and train a global model collaboratively. Through decomposing the non-convex optimization problem into two subproblems, we devise a resource allocation algorithm to determine the bandwidth allocation, transmission power, and CPU frequency for each participating device. We further present the convergence analysis and computational complexity of the proposed algorithm. Numerical results show that our proposed algorithm not only has better performance at different weight parameters (i.e., different demands) but also outperforms the state of the art.

replace Calibrating Transformers via Sparse Gaussian Processes

Authors: Wenlong Chen, Yingzhen Li

Abstract: Transformer models have achieved profound success in prediction tasks in a wide range of applications in natural language processing, speech recognition and computer vision. Extending Transformer's success to safety-critical domains requires calibrated uncertainty estimation which remains under-explored. To address this, we propose Sparse Gaussian Process attention (SGPA), which performs Bayesian inference directly in the output space of multi-head attention blocks (MHAs) in transformer to calibrate its uncertainty. It replaces the scaled dot-product operation with a valid symmetric kernel and uses sparse Gaussian processes (SGP) techniques to approximate the posterior processes of MHA outputs. Empirically, on a suite of prediction tasks on text, images and graphs, SGPA-based Transformers achieve competitive predictive accuracy, while noticeably improving both in-distribution calibration and out-of-distribution robustness and detection.

replace Adversarial Robustness of Link Sign Prediction in Signed Graphs

Authors: Jialong Zhou, Xing Ai, Yuni Lai, Tomasz Michalak, Gaolei Li, Jianhua Li, Di Tang, Xingxing Zhang, Mengpei Yang, Kai Zhou

Abstract: Signed graphs serve as fundamental data structures for representing positive and negative relationships in social networks, with signed graph neural networks (SGNNs) emerging as the primary tool for their analysis. Our investigation reveals that balance theory, while essential for modeling signed relationships in SGNNs, inadvertently introduces exploitable vulnerabilities to black-box attacks. To showcase this, we propose balance-attack, a novel adversarial strategy specifically designed to compromise graph balance degree, and develop an efficient heuristic algorithm to solve the associated NP-hard optimization problem. While existing approaches attempt to restore attacked graphs through balance learning techniques, they face a critical challenge we term "Irreversibility of Balance-related Information," as restored edges fail to align with original attack targets. To address this limitation, we introduce Balance Augmented-Signed Graph Contrastive Learning (BA-SGCL), an innovative framework that combines contrastive learning with balance augmentation techniques to achieve robust graph representations. By maintaining high balance degree in the latent space, BA-SGCL not only effectively circumvents the irreversibility challenge but also significantly enhances model resilience. Extensive experiments across multiple SGNN architectures and real-world datasets demonstrate both the effectiveness of our proposed balance-attack and the superior robustness of BA-SGCL, advancing the security and reliability of signed graph analysis in social networks. Datasets and codes of the proposed framework are at the github repository https://anonymous.4open.science/r/BA-SGCL-submit-DF41/.

URLs: https://anonymous.4open.science/r/BA-SGCL-submit-DF41/.

replace FedComLoc: Communication-Efficient Distributed Training of Sparse and Quantized Models

Authors: Kai Yi, Georg Meinhardt, Laurent Condat, Peter Richt\'arik

Abstract: Federated Learning (FL) has garnered increasing attention due to its unique characteristic of allowing heterogeneous clients to process their private data locally and interact with a central server, while being respectful of privacy. A critical bottleneck in FL is the communication cost. A pivotal strategy to mitigate this burden is Local Training, which involves running multiple local stochastic gradient descent iterations between communication phases. Our work is inspired by the innovative Scaffnew algorithm, which has considerably advanced the reduction of communication complexity in FL. We introduce FedComLoc (Federated Compressed and Local Training), integrating practical and effective compression into Scaffnew to further enhance communication efficiency. Extensive experiments, using the popular TopK compressor and quantization, demonstrate its prowess in substantially reducing communication overheads in heterogeneous settings.

replace A Transformer approach for Electricity Price Forecasting

Authors: Oscar Llorente, Jose Portela

Abstract: This paper presents a novel approach to electricity price forecasting (EPF) using a pure Transformer model. As opposed to other alternatives, no other recurrent network is used in combination to the attention mechanism. Hence, showing that the attention layer is enough for capturing the temporal patterns. The paper also provides fair comparison of the models using the open-source EPF toolbox and provide the code to enhance reproducibility and transparency in EPF research. The results show that the Transformer model outperforms traditional methods, offering a promising solution for reliable and sustainable power system operation.

replace The Quest for the Right Mediator: Surveying Mechanistic Interpretability Through the Lens of Causal Mediation Analysis

Authors: Aaron Mueller, Jannik Brinkmann, Millicent Li, Samuel Marks, Koyena Pal, Nikhil Prakash, Can Rager, Aruna Sankaranarayanan, Arnab Sen Sharma, Jiuding Sun, Eric Todd, David Bau, Yonatan Belinkov

Abstract: Interpretability provides a toolset for understanding how and why language models behave in certain ways. However, there is little unity in the field: most studies employ ad-hoc evaluations and do not share theoretical foundations, making it difficult to measure progress and compare the pros and cons of different techniques. Furthermore, while mechanistic understanding is frequently discussed, the basic causal units underlying these mechanisms are often not explicitly defined. In this article, we propose a perspective on interpretability research grounded in causal mediation analysis. Specifically, we describe the history and current state of interpretability taxonomized according to the types of causal units (mediators) employed, as well as methods used to search over mediators. We discuss the pros and cons of each mediator, providing insights as to when particular kinds of mediators and search methods are most appropriate. We argue that this framing yields a more cohesive narrative of the field and helps researchers select appropriate methods based on their research objective. Our analysis yields actionable recommendations for future work, including the discovery of new mediators and the development of standardized evaluations tailored to these goals.

replace Generative Example-Based Explanations: Bridging the Gap between Generative Modeling and Explainability

Authors: Philipp Vaeth, Alexander M. Fruehwald, Benjamin Paassen, Magda Gregorova

Abstract: Recently, several methods have leveraged deep generative modeling to produce example-based explanations of image classifiers. Despite producing visually stunning results, these methods are largely disconnected from classical explainability literature. This conceptual and communication gap leads to misunderstandings and misalignments in goals and expectations. In this paper, we bridge this gap by proposing a probabilistic framework for example-based explanations, formally defining the example-based explanations in a probabilistic manner amenable for modeling via deep generative models while coherent with the critical characteristics and desiderata widely accepted in the explainability community. Our aim is on one hand to provide a constructive framework for the development of well-grounded generative algorithms for example-based explanations and, on the other, to facilitate communication between the generative and explainability research communities, foster rigor and transparency, and improve the quality of peer discussion and research progress in this promising direction.

replace Symbolic regression via MDLformer-guided search: from minimizing prediction error to minimizing description length

Authors: Zihan Yu, Jingtao Ding, Yong Li, Depeng Jin

Abstract: Symbolic regression, a task discovering the formula best fitting the given data, is typically based on the heuristical search. These methods usually update candidate formulas to obtain new ones with lower prediction errors iteratively. However, since formulas with similar function shapes may have completely different symbolic forms, the prediction error does not decrease monotonously as the search approaches the target formula, causing the low recovery rate of existing methods. To solve this problem, we propose a novel search objective based on the minimum description length, which reflects the distance from the target and decreases monotonically as the search approaches the correct form of the target formula. To estimate the minimum description length of any input data, we design a neural network, MDLformer, which enables robust and scalable estimation through large-scale training. With the MDLformer's output as the search objective, we implement a symbolic regression method, SR4MDL, that can effectively recover the correct mathematical form of the formula. Extensive experiments illustrate its excellent performance in recovering formulas from data. Our method successfully recovers around 50 formulas across two benchmark datasets comprising 133 problems, outperforming state-of-the-art methods by 43.92%. Experiments on 122 unseen black-box problems further demonstrate its generalization performance. We release our code at https://github.com/tsinghua-fib-lab/SR4MDL .

URLs: https://github.com/tsinghua-fib-lab/SR4MDL

replace FAMES: Fast Approximate Multiplier Substitution for Mixed-Precision Quantized DNNs--Down to 2 Bits!

Authors: Yi Ren, Ruge Xu, Xinfei Guo, Weikang Qian

Abstract: A widely-used technique in designing energy-efficient deep neural network (DNN) accelerators is quantization. Recent progress in this direction has reduced the bitwidths used in DNN down to 2. Meanwhile, many prior works apply approximate multipliers (AppMuls) in designing DNN accelerators to lower their energy consumption. Unfortunately, these works still assume a bitwidth much larger than 2, which falls far behind the state-of-the-art in quantization area and even challenges the meaningfulness of applying AppMuls in DNN accelerators, since a high-bitwidth AppMul consumes much more energy than a low-bitwidth exact multiplier! Thus, an important problem to study is: Can approximate multipliers be effectively applied to quantized DNN models with very low bitwidths? In this work, we give an affirmative answer to this question and present a systematic solution that achieves the answer: FAMES, a fast approximate multiplier substitution method for mixed-precision DNNs. Our experiments demonstrate an average 28.67% energy reduction on state-of-the-art mixed-precision quantized models with bitwidths as low as 2 bits and accuracy losses kept under 1%. Additionally, our approach is up to 300x faster than previous genetic algorithm-based methods.

replace Differentially Private Random Feature Model

Authors: Chunyang Liao, Deanna Needell, Hayden Schaeffer, Alexander Xue

Abstract: Designing privacy-preserving machine learning algorithms has received great attention in recent years, especially in the setting when the data contains sensitive information. Differential privacy (DP) is a widely used mechanism for data analysis with privacy guarantees. In this paper, we produce a differentially private random feature model. Random features, which were proposed to approximate large-scale kernel machines, have been used to study privacy-preserving kernel machines as well. We consider the over-parametrized regime (more features than samples) where the non-private random feature model is learned via solving the min-norm interpolation problem, and then we apply output perturbation techniques to produce a private model. We show that our method preserves privacy and derive a generalization error bound for the method. To the best of our knowledge, we are the first to consider privacy-preserving random feature models in the over-parametrized regime and provide theoretical guarantees. We empirically compare our method with other privacy-preserving learning methods in the literature as well. Our results show that our approach is superior to the other methods in terms of generalization performance on synthetic data and benchmark data sets. Additionally, it was recently observed that DP mechanisms may exhibit and exacerbate disparate impact, which means that the outcomes of DP learning algorithms vary significantly among different groups. We show that both theoretically and empirically, random features have the potential to reduce disparate impact, and hence achieve better fairness.

replace HopCast: Calibration of Autoregressive Dynamics Models

Authors: Muhammad Bilal Shahid, Cody Fleming

Abstract: Deep learning models are often trained to approximate dynamical systems that can be modeled using differential equations. Many of these models are optimized to predict one step ahead; such approaches produce calibrated one-step predictions if the predictive model can quantify uncertainty, such as Deep Ensembles. At inference time, multi-step predictions are generated via autoregression, which needs a sound uncertainty propagation method to produce calibrated multi-step predictions. This work introduces an alternative Predictor-Corrector approach named \hop{} that uses Modern Hopfield Networks (MHN) to learn the errors of a deterministic Predictor that approximates the dynamical system. The Corrector predicts a set of errors for the Predictor's output based on a context state at any timestep during autoregression. The set of errors creates sharper and well-calibrated prediction intervals with higher predictive accuracy compared to baselines without uncertainty propagation. The calibration and prediction performances are evaluated across a set of dynamical systems. This work is also the first to benchmark existing uncertainty propagation methods based on calibration errors.

replace MDDM: A Molecular Dynamics Diffusion Model to Predict Particle Self-Assembly

Authors: Kevin Ferguson, Yu-hsuan Chen, Levent Burak Kara

Abstract: The discovery and study of new material systems rely on molecular simulations that often come with significant computational expense. We propose MDDM, a Molecular Dynamics Diffusion Model, which is capable of predicting a valid output conformation for a given input pair potential function. After training MDDM on a large dataset of molecular dynamics self-assembly results, the proposed model can convert uniform noise into a meaningful output particle structure corresponding to an arbitrary input potential. The model's architecture has domain-specific properties built-in, such as satisfying periodic boundaries and being invariant to translation. The model significantly outperforms the baseline point-cloud diffusion model for both unconditional and conditional generation tasks.

replace Investigating Compositional Reasoning in Time Series Foundation Models

Authors: Willa Potosnak, Cristian Challu, Mononito Goswami, Kin G. Olivares, Micha{\l} Wili\'nski, Nina \.Zukowska, Artur Dubrawski

Abstract: Large pre-trained time series foundation models (TSFMs) have demonstrated promising zero-shot performance across a wide range of domains. However, a question remains: Do TSFMs succeed by memorizing patterns in training data, or do they possess the ability to reason about such patterns? While reasoning is a topic of great interest in the study of Large Language Models (LLMs), it is undefined and largely unexplored in the context of TSFMs. In this work, inspired by language modeling literature, we formally define compositional reasoning in forecasting and distinguish it from in-distribution generalization. We evaluate the reasoning and generalization capabilities of 16 popular deep learning forecasting models on multiple synthetic and real-world datasets. Additionally, through controlled studies, we systematically examine which design choices in 7 popular open-source TSFMs contribute to improved reasoning capabilities. Our study yields key insights into the impact of TSFM architecture design on compositional reasoning and generalization. We find that patch-based Transformers have the best reasoning performance, closely followed by residualized MLP-based architectures, which are 97\% less computationally complex in terms of FLOPs and 86\% smaller in terms of the number of trainable parameters. Interestingly, in some zero-shot out-of-distribution scenarios, these models can outperform moving average and exponential smoothing statistical baselines trained on in-distribution data. Only a few design choices, such as the tokenization method, had a significant (negative) impact on Transformer model performance.

replace A general language model for peptide identification

Authors: Jixiu Zhai, Zikun Wang, Tianchi Lu, Haitian Zhong, Ziyang Xu, Yuhuan Liu, Shengrui Xu, Jingwan Wang, Dan Huang

Abstract: Accurate identification of bioactive peptides (BPs) and protein post-translational modifications (PTMs) is essential for understanding protein function and advancing therapeutic discovery. However, most computational methods remain limited in their generalizability across diverse peptide functions. Here, we present PDeepPP, a unified deep learning framework that integrates pretrained protein language models with a hybrid transformer-convolutional architecture, enabling robust identification across diverse peptide classes and PTM sites. We curated comprehensive benchmark datasets and implemented strategies to address data imbalance, allowing PDeepPP to systematically extract both global and local sequence features. Through extensive analyses-including dimensionality reduction and comparison studies-PDeepPP demonstrates strong, interpretable peptide representations and achieves state-of-the-art performance in 25 of the 33 biological identification tasks. Notably, PDeepPP attains high accuracy in antimicrobial (0.9726) and phosphorylation site (0.9984) identification, with 99.5% specificity in glycosylation site prediction and substantial reduction in false negatives in antimalarial tasks. By enabling large-scale, accurate peptide analysis, PDeepPP supports biomedical research and the discovery of novel therapeutic targets for disease treatment. All code, datasets, and pretrained models are publicly available via GitHub:https://github.com/fondress/PDeepPP and Hugging Face:https://huggingface.co/fondress/PDeppPP.

URLs: https://github.com/fondress/PDeepPP, https://huggingface.co/fondress/PDeppPP.

replace Cauchy Random Features for Operator Learning in Sobolev Space

Authors: Chunyang Liao, Deanna Needell, Hayden Schaeffer

Abstract: Operator learning is the approximation of operators between infinite dimensional Banach spaces using machine learning approaches. While most progress in this area has been driven by variants of deep neural networks such as the Deep Operator Network and Fourier Neural Operator, the theoretical guarantees are often in the form of a universal approximation property. However, the existence theorems do not guarantee that an accurate operator network is obtainable in practice. Motivated by the recent kernel-based operator learning framework, we propose a random feature operator learning method with theoretical guarantees and error bounds. The random feature method can be viewed as a randomized approximation of a kernel method, which significantly reduces the computation requirements for training. We provide a generalization error analysis for our proposed random feature operator learning method along with comprehensive numerical results. Compared to kernel-based method and neural network methods, the proposed method can obtain similar or better test errors across benchmarks examples with significantly reduced training times. An additional advantages it that our implementation is simple and does require costly computational resources, such as GPU.

replace Beyond Cosine Decay: On the effectiveness of Infinite Learning Rate Schedule for Continual Pre-training

Authors: Vaibhav Singh, Paul Janson, Paria Mehrbod, Adam Ibrahim, Irina Rish, Eugene Belilovsky, Benjamin Th\'erien

Abstract: The ever-growing availability of unlabeled data presents both opportunities and challenges for training artificial intelligence systems. While self-supervised learning (SSL) has emerged as a powerful paradigm for extracting meaningful representations from vast amounts of unlabeled data, existing methods still struggle to adapt to the non-stationary, non-IID nature of real-world data streams without forgetting previously learned knowledge. Recent works have adopted a repeated cosine annealing schedule for large-scale continual pre-training; however, these schedules (1) inherently cause forgetting during the re-warming phase and (2) have not been systematically compared to existing continual SSL methods. In this work, we systematically compare the widely used cosine schedule with the recently proposed infinite learning rate schedule and empirically find the latter to be a more effective alternative. Our extensive empirical evaluation across diverse image and language datasets demonstrates that the infinite learning rate schedule consistently enhances continual pre-training performance compared to a repeated cosine decay without being restricted to a fixed iteration budget. For instance, in a small-scale MAE pre-training setup, it outperforms several strong baselines from the literature. We then scale up our experiments to larger MAE pre-training and autoregressive language model pre-training. Our results show that the infinite learning rate schedule remains effective at scale, surpassing repeated cosine decay for both MAE pre-training and zero-shot LM benchmarks.

replace To See a World in a Spark of Neuron: Disentangling Multi-task Interference for Training-free Model Merging

Authors: Zitao Fang, Guodong DU, Shuyang Yu, Yifei Guo, Yiwei Zhang, Yiyao Cao, Jing Li, Ho-Kin Tang, Sim Kuan Goh

Abstract: Fine-tuning pre-trained models on targeted datasets enhances task-specific performance but often comes at the expense of generalization. Model merging techniques, which integrate multiple fine-tuned models into a single multi-task model through task arithmetic, offer a promising solution. However, task interference remains a fundamental challenge, leading to performance degradation and suboptimal merged models. Existing approaches largely overlooked the fundamental roles of neurons, their connectivity, and activation, resulting in a merging process and a merged model that does not consider how neurons relay and process information. In this work, we present the first study that relies on neuronal mechanisms for model merging. Specifically, we decomposed task-specific representations into two complementary neuronal subspaces that regulate input sensitivity and task adaptability. Leveraging this decomposition, we introduced NeuroMerging, a novel merging framework developed to mitigate task interference within neuronal subspaces, enabling training-free model fusion across diverse tasks. Through extensive experiments, we demonstrated that NeuroMerging achieved superior performance compared to existing methods on multi-task benchmarks across both natural language and vision domains. Our findings highlighted the importance of aligning neuronal mechanisms in model merging, offering new insights into mitigating task interference and improving knowledge fusion. Our project is available at https://ZzzitaoFang.github.io/projects/NeuroMerging/.

URLs: https://ZzzitaoFang.github.io/projects/NeuroMerging/.

replace VIPER: Visual Perception and Explainable Reasoning for Sequential Decision-Making

Authors: Mohamed Salim Aissi, Clemence Grislain, Mohamed Chetouani, Olivier Sigaud, Laure Soulier, Nicolas Thome

Abstract: While Large Language Models (LLMs) excel at reasoning on text and Vision-Language Models (VLMs) are highly effective for visual perception, applying those models for visual instruction-based planning remains a widely open problem. In this paper, we introduce VIPER, a novel framework for multimodal instruction-based planning that integrates VLM-based perception with LLM-based reasoning. Our approach uses a modular pipeline where a frozen VLM generates textual descriptions of image observations, which are then processed by an LLM policy to predict actions based on the task goal. We fine-tune the reasoning module using behavioral cloning and reinforcement learning, improving our agent's decision-making capabilities. Experiments on the ALFWorld benchmark show that VIPER significantly outperforms state-of-the-art visual instruction-based planners while narrowing the gap with purely text-based oracles. By leveraging text as an intermediate representation, VIPER also enhances explainability, paving the way for a fine-grained analysis of perception and reasoning components.

replace Recursive Training Loops in LLMs: How training data properties modulate distribution shift in generated data?

Authors: Grgur Kova\v{c}, J\'er\'emy Perez, R\'emy Portelas, Peter Ford Dominey, Pierre-Yves Oudeyer

Abstract: Large language models (LLMs) are increasingly used in the creation of online content, creating feedback loops as subsequent generations of models will be trained on this synthetic data. Such loops were shown to lead to distribution shifts - models misrepresenting the true underlying distributions of human data (also called model collapse). However, how human data properties affect such shifts remains poorly understood. In this paper, we provide the first empirical examination of the effect of such properties on the outcome of recursive training. We first confirm that using different human datasets leads to distribution shifts of different magnitudes. Through exhaustive manipulation of dataset properties combined with regression analyses, we then identify a set of properties predicting distribution shift magnitudes. Lexical diversity is found to amplify these shifts, while semantic diversity and data quality mitigate them. Furthermore, we find that these influences are highly modular: data scrapped from a given internet domain has little influence on the content generated for another domain. Finally, experiments on political bias reveal that human data properties affect whether the initial bias will be amplified or reduced. Overall, our results portray a novel view, where different parts of internet may undergo different types of distribution shift.

replace Task-based Loss Functions in Computer Vision: A Comprehensive Review

Authors: Omar Elharrouss, Yasir Mahmood, Yassine Bechqito, Mohamed Adel Serhani, Elarbi Badidi, Jamal Riffi, Hamid Tairi

Abstract: Loss functions are at the heart of deep learning, shaping how models learn and perform across diverse tasks. They are used to quantify the difference between predicted outputs and ground truth labels, guiding the optimization process to minimize errors. Selecting the right loss function is critical, as it directly impacts model convergence, generalization, and overall performance across various applications, from computer vision to time series forecasting. This paper presents a comprehensive review of loss functions, covering fundamental metrics like Mean Squared Error and Cross-Entropy to advanced functions such as Adversarial and Diffusion losses. We explore their mathematical foundations, impact on model training, and strategic selection for various applications, including computer vision (Discriminative and generative), tabular data prediction, and time series forecasting. For each of these categories, we discuss the most used loss functions in the recent advancements of deep learning techniques. Also, this review explore the historical evolution, computational efficiency, and ongoing challenges in loss function design, underlining the need for more adaptive and robust solutions. Emphasis is placed on complex scenarios involving multi-modal data, class imbalances, and real-world constraints. Finally, we identify key future directions, advocating for loss functions that enhance interpretability, scalability, and generalization, leading to more effective and resilient deep learning models.

replace Traversal Learning: A Lossless And Efficient Distributed Learning Framework

Authors: Erdenebileg Batbaatar, Jeonggeol Kim, Yongcheol Kim, Young Yoon

Abstract: In this paper, we introduce Traversal Learning (TL), a novel approach designed to address the problem of decreased quality encountered in popular distributed learning (DL) paradigms such as Federated Learning (FL), Split Learning (SL), and SplitFed Learning (SFL). Traditional FL experiences from an accuracy drop during aggregation due to its averaging function, while SL and SFL face increased loss due to the independent gradient updates on each split network. TL adopts a unique strategy where the model traverses the nodes during forward propagation (FP) and performs backward propagation (BP) on the orchestrator, effectively implementing centralized learning (CL) principles within a distributed environment. The orchestrator is tasked with generating virtual batches and planning the sequential node visits of the model during FP, aligning them with the ordered index of the data within these batches. We conducted experiments on six datasets representing diverse characteristics across various domains. Our evaluation demonstrates that TL is on par with classic CL approaches in terms of accurate inference, thereby offering a viable and robust solution for DL tasks. TL outperformed other DL methods and improved accuracy by 7.85% for independent and identically distributed (IID) datasets, macro F1-score by 1.06% for non-IID datasets, accuracy by 2.60% for text classification, and AUC by 3.88% and 4.54% for medical and financial datasets, respectively. By effectively preserving data privacy while maintaining performance, TL represents a significant advancement in DL methodologies. The implementation of TL is available at https://github.com/neouly-inc/Traversal-Learning

URLs: https://github.com/neouly-inc/Traversal-Learning

replace Training Deep Morphological Neural Networks as Universal Approximators

Authors: Konstantinos Fotopoulos, Petros Maragos

Abstract: We investigate deep morphological neural networks (DMNNs). We demonstrate that despite their inherent non-linearity, "linear" activations are essential for DMNNs. To preserve their inherent sparsity, we propose architectures that constraint the parameters of the "linear" activations: For the first (resp. second) architecture, we work under the constraint that the majority of parameters (resp. learnable parameters) should be part of morphological operations. We improve the generalization ability of our networks via residual connections and weight dropout. Our proposed networks can be successfully trained, and are more prunable than linear networks. To the best of our knowledge, we are the first to successfully train DMNNs under such constraints. Finally, we propose a hybrid network architecture combining linear and morphological layers, showing empirically that the inclusion of morphological layers significantly accelerates the convergence of gradient descent with large batches.

replace Reasoning Large Language Model Errors Arise from Hallucinating Critical Problem Features

Authors: Alex Heyman, Joel Zylberberg

Abstract: Large language models have recently made great strides in reasoning task performance through chain-of-thought (CoT) strategies trained via reinforcement learning; however, these "reasoning large language models" (RLLMs) remain imperfect reasoners, and understanding the frequencies and causes of their failure modes is important for both users and developers. We test o1-mini, o3-mini, DeepSeek-R1, Claude 3.7 Sonnet, Gemini 2.5 Pro Preview, and Grok 3 Mini Beta on graph coloring as a variable-complexity constraint-satisfaction logic problem, and find evidence from both error rate comparisons and CoT/explanation text analysis that RLLMs are prone to hallucinate graph edges not specified in the prompt. This phenomenon persists across multiple problem complexity levels and semantic frames, and it appears to account for a significant fraction of the incorrect answers from every tested model, and the vast majority of them for some models. We also validate the generalizability of this input-conflicting hallucination phenomenon with smaller-scale experiments on a type of stable matching problem. Our results indicate that RLLMs may possess broader issues with misrepresentation of problem specifics, and we offer suggestions for design choices to mitigate this weakness.

replace HOFT: Householder Orthogonal Fine-tuning

Authors: Alejandro Moreno Arcas, Albert Sanchis, Jorge Civera, Alfons Juan

Abstract: Adaptation of foundation models using low-rank methods is a widespread approach. Another way to adapt these models is to employ orthogonal fine-tuning methods, which are less time and memory efficient despite their good generalization properties. In this work, we propose Householder Orthogonal Fine-tuning (HOFT), a novel orthogonal fine-tuning method that aims to alleviate time and space complexity. Moreover, some theoretical properties of the orthogonal fine-tuning paradigm are explored. From this exploration, Scaled Householder Orthogonal Fine-tuning (SHOFT) is proposed. Both HOFT and SHOFT are evaluated in downstream tasks, namely commonsense reasoning, machine translation, subject-driven generation and mathematical reasoning. Compared with state-of-the-art adaptation methods, HOFT and SHOFT show comparable or better results.

replace Learning Fluid-Structure Interaction Dynamics with Physics-Informed Neural Networks and Immersed Boundary Methods

Authors: Afrah Farea, Saiful Khan, Reza Daryani, Emre Cenk Ersan, Mustafa Serdar Celebi

Abstract: Physics-informed neural networks (PINNs) have emerged as a promising approach for solving complex fluid dynamics problems, yet their application to fluid-structure interaction (FSI) problems with moving boundaries remains largely unexplored. This work addresses the critical challenge of modeling FSI systems with deformable interfaces, where traditional unified PINN architectures struggle to capture the distinct physics governing fluid and structural domains simultaneously. We present an innovative Eulerian-Lagrangian PINN architecture that integrates immersed boundary method (IBM) principles to solve FSI problems with moving boundary conditions. Our approach fundamentally departs from conventional unified architectures by introducing domain-specific neural networks: an Eulerian network for fluid dynamics and a Lagrangian network for structural interfaces, coupled through physics-based constraints. Additionally, we incorporate learnable B-spline activation functions with SiLU to capture both localized high-gradient features near interfaces and global flow patterns. Empirical studies on a 2D cavity flow problem involving a moving solid structure show that while baseline unified PINNs achieve reasonable velocity predictions, they suffer from substantial pressure errors (12.9%) in structural regions. Our Eulerian-Lagrangian architecture with learnable activations (EL-L) achieves better performance across all metrics, improving accuracy by 24.1-91.4% and particularly reducing pressure errors from 12.9% to 2.39%. These results demonstrate that domain decomposition aligned with physical principles, combined with locality-aware activation functions, is essential for accurate FSI modeling within the PINN framework.

replace A Certified Unlearning Approach without Access to Source Data

Authors: Umit Yigit Basaran, Sk Miraj Ahmed, Amit Roy-Chowdhury, Basak Guler

Abstract: With the growing adoption of data privacy regulations, the ability to erase private or copyrighted information from trained models has become a crucial requirement. Traditional unlearning methods often assume access to the complete training dataset, which is unrealistic in scenarios where the source data is no longer available. To address this challenge, we propose a certified unlearning framework that enables effective data removal \final{without access to the original training data samples}. Our approach utilizes a surrogate dataset that approximates the statistical properties of the source data, allowing for controlled noise scaling based on the statistical distance between the two. \updated{While our theoretical guarantees assume knowledge of the exact statistical distance, practical implementations typically approximate this distance, resulting in potentially weaker but still meaningful privacy guarantees.} This ensures strong guarantees on the model's behavior post-unlearning while maintaining its overall utility. We establish theoretical bounds, introduce practical noise calibration techniques, and validate our method through extensive experiments on both synthetic and real-world datasets. The results demonstrate the effectiveness and reliability of our approach in privacy-sensitive settings.

replace Rescaled Influence Functions: Accurate Data Attribution in High Dimension

Authors: Ittai Rubinstein, Samuel B. Hopkins

Abstract: How does the training data affect a model's behavior? This is the question we seek to answer with data attribution. The leading practical approaches to data attribution are based on influence functions (IF). IFs utilize a first-order Taylor approximation to efficiently predict the effect of removing a set of samples from the training set without retraining the model, and are used in a wide variety of machine learning applications. However, especially in the high-dimensional regime (# params $\geq \Omega($# samples$)$), they are often imprecise and tend to underestimate the effect of sample removals, even for simple models such as logistic regression. We present rescaled influence functions (RIF), a new tool for data attribution which can be used as a drop-in replacement for influence functions, with little computational overhead but significant improvement in accuracy. We compare IF and RIF on a range of real-world datasets, showing that RIFs offer significantly better predictions in practice, and present a theoretical analysis explaining this improvement. Finally, we present a simple class of data poisoning attacks that would fool IF-based detections but would be detected by RIF.

replace Discrete Diffusion in Large Language and Multimodal Models: A Survey

Authors: Runpeng Yu, Qi Li, Xinchao Wang

Abstract: In this work, we provide a systematic survey of Discrete Diffusion Language Models (dLLMs) and Discrete Diffusion Multimodal Language Models (dMLLMs). Unlike autoregressive (AR) models, dLLMs and dMLLMs adopt a multi-token, parallel decoding paradigm using full attention and a denoising-based generation strategy. This paradigm naturally enables parallel generation, fine-grained output control, and dynamic perception. These capabilities are previously difficult to achieve with AR models. A growing number of industrial-scale proprietary d(M)LLMs, as well as a large number of open-source academic d(M)LLMs, have demonstrated performance comparable to their autoregressive counterparts, while achieving up to \textit{10$\times$} acceleration in inference speed. These developments position discrete diffusion models as a promising alternative to intelligence based on the traditional autoregressive approach. In this work, we present a comprehensive overview of the research in the dLLM and dMLLM domains. We trace the historical development of dLLMs and dMLLMs, formalize the underlying mathematical frameworks, and categorize representative models. We further analyze key techniques for training and inference, and summarize emerging applications across language, vision-language, and biological domains and \textit{etc.}. We conclude by discussing future directions for research and deployment. Relative papers are collected in https://github.com/LiQiiiii/Awesome-Discrete-Diffusion-LLM_MLLM

URLs: https://github.com/LiQiiiii/Awesome-Discrete-Diffusion-LLM_MLLM

replace A Nonlinear Low-rank Representation Model with Convolutional Neural Network for Imputing Water Quality Data

Authors: Xin Liao, Bing Yang, Cai Yu

Abstract: The integrity of Water Quality Data (WQD) is critical in environmental monitoring for scientific decision-making and ecological protection. However, water quality monitoring systems are often challenged by large amounts of missing data due to unavoidable problems such as sensor failures and communication delays, which further lead to water quality data becoming High-Dimensional and Sparse (HDS). Traditional data imputation methods are difficult to depict the potential dynamics and fail to capture the deep data features, resulting in unsatisfactory imputation performance. To effectively address the above issues, this paper proposes a Nonlinear Low-rank Representation model (NLR) with Convolutional Neural Networks (CNN) for imputing missing WQD, which utilizes CNNs to implement two ideas: a) fusing temporal features to model the temporal dependence of data between time slots, and b) Extracting nonlinear interactions and local patterns to mine higher-order relationships features and achieve deep fusion of multidimensional information. Experimental studies on three real water quality datasets demonstrate that the proposed model significantly outperforms existing state-of-the-art data imputation models in terms of estimation accuracy. It provides an effective approach for handling water quality monitoring data in complex dynamic environments.

replace Comprehensive Evaluation of Prototype Neural Networks

Authors: Philipp Schlinge, Steffen Meinert, Martin Atzmueller

Abstract: Prototype models are an important method for explainable artificial intelligence (XAI) and interpretable machine learning. In this paper, we perform an in-depth analysis of a set of prominent prototype models including ProtoPNet, ProtoPool and PIPNet. For their assessment, we apply a comprehensive set of metrics. In addition to applying standard metrics from literature, we propose several new metrics to further complement the analysis of model interpretability. In our experimentation, we apply the set of prototype models on a diverse set of datasets including fine-grained classification, Non-IID settings and multi-label classification to further contrast the performance. Furthermore, we also provide our code as an open-source library (https://github.com/uos-sis/quanproto), which facilitates simple application of the metrics itself, as well as extensibility -- providing the option for easily adding new metrics and models.

URLs: https://github.com/uos-sis/quanproto),

replace How Should We Meta-Learn Reinforcement Learning Algorithms?

Authors: Alexander David Goldie, Zilin Wang, Jaron Cohen, Jakob Nicolaus Foerster, Shimon Whiteson

Abstract: The process of meta-learning algorithms from data, instead of relying on manual design, is growing in popularity as a paradigm for improving the performance of machine learning systems. Meta-learning shows particular promise for reinforcement learning (RL), where algorithms are often adapted from supervised or unsupervised learning despite their suboptimality for RL. However, until now there has been a severe lack of comparison between different meta-learning algorithms, such as using evolution to optimise over black-box functions or LLMs to propose code. In this paper, we carry out this empirical comparison of the different approaches when applied to a range of meta-learned algorithms which target different parts of the RL pipeline. In addition to meta-train and meta-test performance, we also investigate factors including the interpretability, sample cost and train time for each meta-learning algorithm. Based on these findings, we propose several guidelines for meta-learning new RL algorithms which will help ensure that future learned algorithms are as performant as possible.

replace KLLM: Fast LLM Inference with K-Means Quantization

Authors: Xueying Wu, Baijun Zhou, Zhihui Gao, Yuzhe Fu, Qilin Zheng, Yintao He, Hai Li

Abstract: Large language model (LLM) inference poses significant challenges due to its intensive memory and computation demands. Weight and activation quantization (WAQ) offers a promising solution by reducing both memory footprint and arithmetic complexity. Traditional WAQ designs rely on uniform integer quantization for hardware efficiency, but often suffer from significant model performance degradation at low precision. In contrast, K-Means quantization, a non-uniform technique, achieves higher accuracy by aligning with the Gaussian-like distributions of weights and activations in LLMs. However, two key challenges prevent the efficient deployment of K-Means-based WAQ designs for LLM inference: (1) The non-uniform structure of K-Means-quantized data precludes direct execution on low-precision compute units, necessitating dequantization and floating-point matrix multiplications (MatMuls) during inference. (2) Activation outliers hinder effective low-precision quantization. Offline thresholding methods for outlier detection degrade model performance substantially, while existing online detection techniques introduce significant runtime overhead. To address the aforementioned challenges and fully unleash the potential of K-Means-based WAQ for LLM inference, in this paper, we propose KLLM, an LLM inference accelerator for efficient execution with K-Means-quantized weights and activations. KLLM features an index-based computation scheme for efficient execution of MatMuls and nonlinear operations on K-Means-quantized data, which avoids most of the dequantization and full-precision computations. Moreover, KLLM incorporates a lightweight outlier detection engine, Orizuru, that efficiently identifies the top-$k$ largest and smallest elements in the activation data stream during online inference.

replace TweakLLM: A Routing Architecture for Dynamic Tailoring of Cached Responses

Authors: Muhammad Taha Cheema, Abeer Aamir, Khawaja Gul Muhammad, Naveed Anwar Bhatti, Ihsan Ayyub Qazi, Zafar Ayyub Qazi

Abstract: Large Language Models (LLMs) process millions of queries daily, making efficient response caching a compelling optimization for reducing cost and latency. However, preserving relevance to user queries using this approach proves difficult due to the personalized nature of chatbot interactions and the limited accuracy of semantic similarity search. To address this, we present TweakLLM, a novel routing architecture that employs a lightweight LLM to dynamically adapt cached responses to incoming prompts. Through comprehensive evaluation, including user studies with side-by-side comparisons, satisfaction voting, as well as multi-agent LLM debates, we demonstrate that TweakLLM maintains response quality comparable to frontier models while significantly improving cache effectiveness. Our results across real-world datasets highlight TweakLLM as a scalable, resource-efficient caching solution for high-volume LLM deployments without compromising user experience.

replace Self-Questioning Language Models

Authors: Lili Chen, Mihir Prabhudesai, Katerina Fragkiadaki, Hao Liu, Deepak Pathak

Abstract: Can large language models improve without external data -- by generating their own questions and answers? We hypothesize that a pre-trained language model can improve its reasoning skills given only a single prompt specifying the topic (e.g., algebra word problems) and asking the model to generate its own questions. To do this, we propose Self-Questioning Language Models (SQLM): an asymmetric self-play framework where a proposer is given the topic and generates a question for a solver, who tries to answer it. Both the proposer and solver are trained via reinforcement learning. The proposer receives a reward if the problem is not too easy or too difficult, and the solver receives a reward based on majority voting, a proxy for correctness in the absence of ground-truth answers. For coding, the proposer can instead generate unit tests which are used for verification. We study this asymmetric self-play framework on three benchmarks: three-digit multiplication, algebra problems from the OMEGA benchmark, and programming problems from Codeforces. By continually generating more interesting problems and attempting to solve them, language models can improve on downstream benchmarks without access to any curated training datasets.

replace Predicting the Performance of Graph Convolutional Networks with Spectral Properties of the Graph Laplacian

Authors: Shalima Binta Manir, Tim Oates

Abstract: A common observation in the Graph Convolutional Network (GCN) literature is that stacking GCN layers may or may not result in better performance on tasks like node classification and edge prediction. We have found empirically that a graph's algebraic connectivity, which is known as the Fiedler value, is a good predictor of GCN performance. Intuitively, graphs with similar Fiedler values have analogous structural properties, suggesting that the same filters and hyperparameters may yield similar results when used with GCNs, and that transfer learning may be more effective between graphs with similar algebraic connectivity. We explore this theoretically and empirically with experiments on synthetic and real graph data, including the Cora, CiteSeer and Polblogs datasets. We explore multiple ways of aggregating the Fiedler value for connected components in the graphs to arrive at a value for the entire graph, and show that it can be used to predict GCN performance. We also present theoretical arguments as to why the Fiedler value is a good predictor.

replace Metis: Training Large Language Models with Advanced Low-Bit Quantization

Authors: Hengjie Cao, Mengyi Chen, Yifeng Yang, Ruijun Huang, Fang Dong, Jixian Zhou, Anrui Chen, Mingzhi Dong, Yujiang Wang, Jinlong Hou, Yuan Cheng, Fan Wu, Fan Yang, Tun Lu, Ning Gu, Li Shang

Abstract: This work identifies anisotropic parameter distributions as a fundamental barrier to training large language models (LLMs) with low-bit quantization: a few dominant singular values create wide numerical ranges that conflict with the inherent bias of block-wise quantization. This bias disproportionately preserves high-magnitude values while discarding smaller ones, causing training instability and low model performance. This work introduces Metis, a training framework that combines (i) spectral decomposition with random embedding to efficiently disentangle dominant from long-tail components, compressing broad distributions into quantization-friendly narrow ranges; (ii) adaptive learning rates in the spectral domain to amplify underrepresented directions and better capture diverse features critical for performance; and (iii) a dual-range regularizer that jointly constrains numerical precision and parameter range distribution, ensuring stable, unbiased low-bit training. With Metis, FP8 training surpasses FP32 baselines, and FP4 training achieves accuracy comparable to FP32, paving the way for robust and scalable LLM training under advanced low-bit quantization. The code implementation for Metis is available at: https://github.com/sii-research/Metis.

URLs: https://github.com/sii-research/Metis.

replace RoFt-Mol: Benchmarking Robust Fine-Tuning with Molecular Graph Foundation Models

Authors: Shikun Liu, Deyu Zou, Nima Shoghi, Victor Fung, Kai Liu, Pan Li

Abstract: In the era of foundation models, fine-tuning pre-trained models for specific downstream tasks has become crucial. This drives the need for robust fine-tuning methods to address challenges such as model overfitting and sparse labeling. Molecular graph foundation models (MGFMs) face unique difficulties that complicate fine-tuning. These models are limited by smaller pre-training datasets and more severe data scarcity for downstream tasks, both of which require enhanced model generalization. Moreover, MGFMs must accommodate diverse objectives, including both regression and classification tasks. To better understand and improve fine-tuning techniques under these conditions, we classify eight fine-tuning methods into three mechanisms: weight-based, representation-based, and partial fine-tuning. We benchmark these methods on downstream regression and classification tasks across supervised and self-supervised pre-trained models in diverse labeling settings. This extensive evaluation provides valuable insights and informs the design of a refined robust fine-tuning method, ROFT-MOL. This approach combines the strengths of simple post-hoc weight interpolation with more complex weight ensemble fine-tuning methods, delivering improved performance across both task types while maintaining the ease of use inherent in post-hoc weight interpolation.

replace Second-Order Tensorial Partial Differential Equations on Graphs

Authors: Aref Einizade, Fragkiskos D. Malliaros, Jhony H. Giraldo

Abstract: Processing data on multiple interacting graphs is crucial for many applications, but existing approaches rely mostly on discrete filtering or first-order continuous models that dampen high frequencies and propagate information slowly. We introduce second-order tensorial partial differential equations on graphs (So-TPDEGs) and propose the first theoretically grounded framework for second-order continuous product graph neural networks. Our method exploits the separability of cosine kernels in Cartesian product graphs to enable efficient spectral decomposition while preserving high-frequency signals. We further provide rigorous analyses of stability under graph perturbations and over-smoothing, establishing a solid theoretical foundation for continuous graph learning.

replace CAME-AB: Cross-Modality Attention with Mixture-of-Experts for Antibody Binding Site Prediction

Authors: Hongzong Li, Jiahao Ma, Zhanpeng Shi, Rui Xiao, Fanming Jin, Ye-Fan Hu, Hangjun Che, Jian-Dong Huang

Abstract: Antibody binding site prediction plays a pivotal role in computational immunology and therapeutic antibody design. Existing sequence or structure methods rely on single-view features and fail to identify antibody-specific binding sites on the antigens. In this paper, we propose \textbf{CAME-AB}, a novel Cross-modality Attention framework with a Mixture-of-Experts (MoE) backbone for robust antibody binding site prediction. CAME-AB integrates five biologically grounded modalities, including raw amino acid encodings, BLOSUM substitution profiles, pretrained language model embeddings, structure-aware features, and GCN-refined biochemical graphs, into a unified multimodal representation. To enhance adaptive cross-modal reasoning, we propose an \emph{adaptive modality fusion} module that learns to dynamically weight each modality based on its global relevance and input-specific contribution. A Transformer encoder combined with an MoE module further promotes feature specialization and capacity expansion. We additionally incorporate a supervised contrastive learning objective to explicitly shape the latent space geometry, encouraging intra-class compactness and inter-class separability. To improve optimization stability and generalization, we apply stochastic weight averaging during training. Extensive experiments on benchmark antibody-antigen datasets demonstrate that CAME-AB consistently outperforms strong baselines on multiple metrics, including Precision, Recall, F1-score, AUC-ROC, and MCC. Ablation studies further validate the effectiveness of each architectural component and the benefit of multimodal feature integration. The model implementation details and the codes are available on https://anonymous.4open.science/r/CAME-AB-C525

URLs: https://anonymous.4open.science/r/CAME-AB-C525

replace RoseCDL: Robust and Scalable Convolutional Dictionary Learning for Rare-event Detection

Authors: Jad Yehya, Mansour Benbakoura, C\'edric Allain, Beno\^it Malezieux, Matthieu Kowalski, Thomas Moreau

Abstract: Identifying recurring patterns and rare events in large-scale signals is a fundamental challenge in fields such as astronomy, physical simulations, and biomedical science. Convolutional Dictionary Learning (CDL) offers a powerful framework for modeling local structures in signals, but its use for detecting rare or anomalous events remains largely unexplored. In particular, CDL faces two key challenges in this setting: high computational cost and sensitivity to artifacts and outliers. In this paper, we introduce RoseCDL, a scalable and robust CDL algorithm designed for unsupervised rare event detection in long signals. RoseCDL combines stochastic windowing for efficient training on large datasets with inline outlier detection to enhance robustness and isolate anomalous patterns. This reframes CDL as a practical tool for event discovery and characterization in real-world signals, extending its role beyond traditional tasks like compression or denoising.

replace-cross Maximizing Information in Domain-Invariant Representation Improves Transfer Learning

Authors: Adrian Shuai Li, Elisa Bertino, Xuan-Hong Dang, Ankush Singla, Yuhai Tu, Mark N Wegman

Abstract: We propose MaxDIRep, a domain adaptation method that improves the decomposition of data representations into domain-independent and domain-dependent components. Existing methods, such as Domain-Separation Networks (DSN), use a weak orthogonality constraint between these components, which can lead to label-relevant features being partially encoded in the domain-dependent representation (DDRep) rather than the domain-independent representation (DIRep). As a result, information crucial for target-domain classification may be missing from the DIRep. MaxDIRep addresses this issue by applying a Kullback-Leibler (KL) divergence constraint to minimize the information content of the DDRep, thereby encouraging the DIRep to retain features that are both domain-invariant and predictive of target labels. Through geometric analysis and an ablation study on synthetic datasets, we show why DSN's weaker constraint can lead to suboptimal adaptation. Experiments on standard image benchmarks and a network intrusion detection task demonstrate that MaxDIRep achieves strong performance, works with pretrained models, and generalizes to non-image classification tasks.

replace-cross From Channel Bias to Feature Redundancy: Uncovering the "Less is More" Principle in Few-Shot Learning

Authors: Ji Zhang, Xu Luo, Lianli Gao, Difan Zou, Hengtao Shen, Jingkuan Song

Abstract: Deep neural networks often fail to adapt representations to novel tasks under distribution shifts, especially when only a few examples are available. This paper identifies a core obstacle behind this failure: channel bias, where networks develop a rigid emphasis on feature dimensions that were discriminative for the source task, but this emphasis is misaligned and fails to adapt to the distinct needs of a novel task. This bias leads to a striking and detrimental consequence: feature redundancy. We demonstrate that for few-shot tasks, classification accuracy is significantly improved by using as few as 1-5% of the most discriminative feature dimensions, revealing that the vast majority are actively harmful. Our theoretical analysis confirms that this redundancy originates from confounding feature dimensions-those with high intra-class variance but low inter-class separability-which are especially problematic in low-data regimes. This "less is more" phenomenon is a defining characteristic of the few-shot setting, diminishing as more samples become available. To address this, we propose a simple yet effective soft-masking method, Augmented Feature Importance Adjustment (AFIA), which estimates feature importance from augmented data to mitigate the issue. By establishing the cohesive link from channel bias to its consequence of extreme feature redundancy, this work provides a foundational principle for few-shot representation transfer and a practical method for developing more robust few-shot learning algorithms.

replace-cross Damped Proximal Augmented Lagrangian Method for weakly-Convex Problems with Convex Constraints

Authors: Hari Dahal, Wei Liu, Yangyang Xu

Abstract: We give a damped proximal augmented Lagrangian method (DPALM) for solving problems with a weakly-convex objective and convex linear/nonlinear constraints. Instead of taking a full stepsize, DPALM adopts a damped dual stepsize to ensure the boundedness of dual iterates. We show that DPALM can produce a (near) $\vareps$-KKT point within $O(\vareps^{-2})$ outer iterations if each DPALM subproblem is solved to a proper accuracy. In addition, we establish overall iteration complexity of DPALM when the objective is either a regularized smooth function or in a regularized compositional form. For the former case, DPALM achieves the complexity of $\widetilde{\mathcal{O}}\left(\varepsilon^{-2.5} \right)$ to produce an $\varepsilon$-KKT point by applying an accelerated proximal gradient (APG) method to each DPALM subproblem. For the latter case, the complexity of DPALM is $\widetilde{\mathcal{O}}\left(\varepsilon^{-3} \right)$ to produce a near $\varepsilon$-KKT point by using an APG to solve a Moreau-envelope smoothed version of each subproblem. Our outer iteration complexity and the overall complexity either generalize existing best ones from unconstrained or linear-constrained problems to convex-constrained ones, or improve over the best-known results on solving the same-structured problems. Furthermore, numerical experiments on linearly/quadratically constrained non-convex quadratic programs and linear-constrained robust nonlinear least squares are conducted to demonstrate the empirical efficiency of the proposed DPALM over several state-of-the art methods.

replace-cross PQMass: Probabilistic Assessment of the Quality of Generative Models using Probability Mass Estimation

Authors: Pablo Lemos, Sammy Sharief, Nikolay Malkin, Salma Salhi, Connor Stone, Laurence Perreault-Levasseur, Yashar Hezaveh

Abstract: We propose a likelihood-free method for comparing two distributions given samples from each, with the goal of assessing the quality of generative models. The proposed approach, PQMass, provides a statistically rigorous method for assessing the performance of a single generative model or the comparison of multiple competing models. PQMass divides the sample space into non-overlapping regions and applies chi-squared tests to the number of data samples that fall within each region, giving a p-value that measures the probability that the bin counts derived from two sets of samples are drawn from the same multinomial distribution. PQMass does not depend on assumptions regarding the density of the true distribution, nor does it rely on training or fitting any auxiliary models. We evaluate PQMass on data of various modalities and dimensions, demonstrating its effectiveness in assessing the quality, novelty, and diversity of generated samples. We further show that PQMass scales well to moderately high-dimensional data and thus obviates the need for feature extraction in practical applications.

replace-cross Statistical-Computational Trade-offs for Recursive Adaptive Partitioning Estimators

Authors: Yan Shuo Tan, Jason M. Klusowski, Krishnakumar Balasubramanian

Abstract: Models based on recursive adaptive partitioning such as decision trees and their ensembles are popular for high-dimensional regression as they can potentially avoid the curse of dimensionality. Because empirical risk minimization (ERM) is computationally infeasible, these models are typically trained using greedy algorithms. Although effective in many cases, these algorithms have been empirically observed to get stuck at local optima. We explore this phenomenon in the context of learning sparse regression functions over $d$ binary features, showing that when the true regression function $f^*$ does not satisfy Abbe et al. (2022)'s Merged Staircase Property (MSP), greedy training requires $\exp(\Omega(d))$ to achieve low estimation error. Conversely, when $f^*$ does satisfy MSP, greedy training can attain small estimation error with only $O(\log d)$ samples. This dichotomy mirrors that of two-layer neural networks trained with stochastic gradient descent (SGD) in the mean-field regime, thereby establishing a head-to-head comparison between SGD-trained neural networks and greedy recursive partitioning estimators. Furthermore, ERM-trained recursive partitioning estimators achieve low estimation error with $O(\log d)$ samples irrespective of whether $f^*$ satisfies MSP, thereby demonstrating a statistical-computational trade-off for greedy training. Our proofs are based on a novel interpretation of greedy recursive partitioning using stochastic process theory and a coupling technique that may be of independent interest.

replace-cross Downlink MIMO Channel Estimation from Bits: Recoverability and Algorithm

Authors: Rajesh Shrestha, Mingjie Shao, Mingyi Hong, Wing-Kin Ma, Xiao Fu

Abstract: In frequency division duplex (FDD) massive MIMO systems, a major challenge lies in acquiring the downlink channel state information}\ (CSI) at the base station (BS) from limited feedback sent by the user equipment (UE). To tackle this fundamental task, our contribution is twofold: First, a simple feedback framework is proposed, where a compression and Gaussian dithering-based quantization strategy is adopted at the UE side, and then a maximum likelihood estimator (MLE) is formulated at the BS side. Recoverability of the MIMO channel under the widely used double directional model is established. Specifically, analyses are presented for two compression schemes -- showing one being more overhead-economical and the other computationally lighter at the UE side. Second, to realize the MLE, an alternating direction method of multipliers (ADMM) algorithm is proposed. The algorithm is carefully designed to integrate a sophisticated harmonic retrieval (HR) solver as subroutine, which turns out to be the key of effectively tackling this hard MLE problem.Extensive numerical experiments are conducted to validate the efficacy of our approach.

replace-cross A single-loop SPIDER-type stochastic subgradient method for expectation-constrained nonconvex nonsmooth optimization

Authors: Wei Liu, Yangyang Xu

Abstract: Many real-world problems, such as those with fairness constraints, involve complex expectation constraints and large datasets, necessitating the design of efficient stochastic methods to solve them. Most existing research focuses on cases with no {constraint} or easy-to-project constraints or deterministic constraints. In this paper, we consider nonconvex nonsmooth stochastic optimization problems with expectation constraints, for which we build a novel exact penalty model. We first show the relationship between the penalty model and the original problem. Then on solving the penalty problem, we present a single-loop SPIDER-type stochastic subgradient method, which utilizes the subgradients of both the objective and constraint functions, as well as the constraint function value at each iteration. Under certain regularity conditions (weaker than Slater-type constraint qualification or strong feasibility assumed in existing works), we establish an iteration complexity result of $O(\epsilon^{-4})$ to reach a near-$\epsilon$ stationary point of the penalized problem in expectation, matching the lower bound for such tasks. Building on the exact penalization, an $(\epsilon,\epsilon)$-KKT point of the original problem is obtained. For a few scenarios, our complexity of either the {objective} sample subgradient or the constraint sample function values can be lower than the state-of-the-art results by a factor of $\epsilon^{-2}$. Moreover, on solving two fairness-constrained problems and a multi-class Neyman-Pearson classification problem, our method is significantly (up to 466 times) faster than the state-of-the-art algorithms, including switching subgradient method and inexact proximal point methods.

replace-cross MPO: Boosting LLM Agents with Meta Plan Optimization

Authors: Weimin Xiong, Yifan Song, Qingxiu Dong, Bingchan Zhao, Feifan Song, Xun Wang, Sujian Li

Abstract: Recent advancements in large language models (LLMs) have enabled LLM-based agents to successfully tackle interactive planning tasks. However, despite their successes, existing approaches often suffer from planning hallucinations and require retraining for each new agent. To address these challenges, we propose the Meta Plan Optimization (MPO) framework, , which enhances agent planning capabilities by directly incorporating explicit guidance. Unlike previous methods that rely on complex knowledge, which either require significant human effort or lack quality assurance, MPO leverages high-level general guidance through meta plans to assist agent planning and enables continuous optimization of the meta plans based on feedback from the agent's task execution. Our experiments conducted on two representative tasks demonstrate that MPO significantly outperforms existing baselines. Moreover, our analysis indicates that MPO provides a plug-and-play solution that enhances both task completion efficiency and generalization capabilities in previous unseen scenarios.

replace-cross A Randomized Zeroth-Order Hierarchical Framework for Heterogeneous Federated Learning

Authors: Yuyang Qiu, Kibaek Kim, Farzad Yousefian

Abstract: Heterogeneity in federated learning (FL) is a critical and challenging aspect that significantly impacts model performance and convergence. In this paper, we propose a novel framework by formulating heterogeneous FL as a hierarchical optimization problem. This new framework captures both local and global training processes through a bilevel formulation and is capable of the following: (i) addressing client heterogeneity through a personalized learning framework; (ii) capturing the pre-training process on the server side; (iii) updating the global model through nonstandard aggregation; (iv) allowing for nonidentical local steps; and (v) capturing clients' local constraints. We design and analyze an implicit zeroth-order FL method (ZO-HFL), equipped with nonasymptotic convergence guarantees for both the server-agent and the individual client-agents, and asymptotic guarantees for both the server-agent and client-agents in an almost sure sense. Notably, our method does not rely on standard assumptions in heterogeneous FL, such as the bounded gradient dissimilarity condition. We implement our method on image classification tasks and compare with other methods under different heterogeneous settings.

replace-cross Dexterous Manipulation through Imitation Learning: A Survey

Authors: Shan An, Ziyu Meng, Chao Tang, Yuning Zhou, Tengyu Liu, Fangqiang Ding, Shufang Zhang, Yao Mu, Ran Song, Wei Zhang, Zeng-Guang Hou, Hong Zhang

Abstract: Dexterous manipulation, which refers to the ability of a robotic hand or multi-fingered end-effector to skillfully control, reorient, and manipulate objects through precise, coordinated finger movements and adaptive force modulation, enables complex interactions similar to human hand dexterity. With recent advances in robotics and machine learning, there is a growing demand for these systems to operate in complex and unstructured environments. Traditional model-based approaches struggle to generalize across tasks and object variations due to the high dimensionality and complex contact dynamics of dexterous manipulation. Although model-free methods such as reinforcement learning (RL) show promise, they require extensive training, large-scale interaction data, and carefully designed rewards for stability and effectiveness. Imitation learning (IL) offers an alternative by allowing robots to acquire dexterous manipulation skills directly from expert demonstrations, capturing fine-grained coordination and contact dynamics while bypassing the need for explicit modeling and large-scale trial-and-error. This survey provides an overview of dexterous manipulation methods based on imitation learning, details recent advances, and addresses key challenges in the field. Additionally, it explores potential research directions to enhance IL-driven dexterous manipulation. Our goal is to offer researchers and practitioners a comprehensive introduction to this rapidly evolving domain.

replace-cross Real Time Semantic Segmentation of High Resolution Automotive LiDAR Scans

Authors: Hannes Reichert, Benjamin Serfling, Elijah Sch\"ussler, Kerim Turacan, Konrad Doll, Bernhard Sick

Abstract: In recent studies, numerous previous works emphasize the importance of semantic segmentation of LiDAR data as a critical component to the development of driver-assistance systems and autonomous vehicles. However, many state-of-the-art methods are tested on outdated, lower-resolution LiDAR sensors and struggle with real-time constraints. This study introduces a novel semantic segmentation framework tailored for modern high-resolution LiDAR sensors that addresses both accuracy and real-time processing demands. We propose a novel LiDAR dataset collected by a cutting-edge automotive 128 layer LiDAR in urban traffic scenes. Furthermore, we propose a semantic segmentation method utilizing surface normals as strong input features. Our approach is bridging the gap between cutting-edge research and practical automotive applications. Additionaly, we provide a Robot Operating System (ROS2) implementation that we operate on our research vehicle. Our dataset and code are publicly available: https://github.com/kav-institute/SemanticLiDAR.

URLs: https://github.com/kav-institute/SemanticLiDAR.

replace-cross Meta-Semantics Augmented Few-Shot Relational Learning

Authors: Han Wu, Jie Yin

Abstract: Few-shot relational learning on knowledge graph (KGs) aims to perform reasoning over relations with only a few training examples. While existing methods have primarily focused on leveraging specific relational information, rich semantics inherent in KGs have been largely overlooked. To address this critical gap, we propose a novel prompted meta-learning (PromptMeta) framework that seamlessly integrates meta-semantics with relational information for few-shot relational learning. PromptMeta has two key innovations: (1) a Meta-Semantic Prompt (MSP) pool that learns and consolidates high-level meta-semantics, enabling effective knowledge transfer and adaptation to rare and newly emerging relations; and (2) a learnable fusion token that dynamically combines meta-semantics with task-specific relational information tailored to different few-shot tasks. Both components are optimized jointly with model parameters within a meta-learning framework. Extensive experiments and analyses on two real-world KG datasets demonstrate the effectiveness of PromptMeta in adapting to new relations with limited data.

replace-cross Linear Convergence of the Frank-Wolfe Algorithm over Product Polytopes

Authors: Gabriele Iommazzo, David Mart\'inez-Rubio, Francisco Criado, Elias Wirth, Sebastian Pokutta

Abstract: We study the linear convergence of Frank-Wolfe algorithms over product polytopes. We analyze two condition numbers for the product polytope, namely the \emph{pyramidal width} and the \emph{vertex-facet distance}, based on the condition numbers of individual polytope components. As a result, for convex objectives that are $\mu$-Polyak-{\L}ojasiewicz, we show linear convergence rates quantified in terms of the resulting condition numbers. We apply our results to the problem of approximately finding a feasible point in a polytope intersection in high-dimensions, and demonstrate the practical efficiency of our algorithms through empirical results.

replace-cross From Static to Adaptive Defense: Federated Multi-Agent Deep Reinforcement Learning-Driven Moving Target Defense Against DoS Attacks in UAV Swarm Networks

Authors: Yuyang Zhou, Guang Cheng, Kang Du, Zihan Chen, Tian Qin, Yuyu Zhao

Abstract: The proliferation of UAVs has enabled a wide range of mission-critical applications and is becoming a cornerstone of low-altitude networks, supporting smart cities, emergency response, and more. However, the open wireless environment, dynamic topology, and resource constraints of UAVs expose low-altitude networks to severe DoS threats. Traditional defense approaches, which rely on fixed configurations or centralized decision-making, cannot effectively respond to the rapidly changing conditions in UAV swarm environments. To address these challenges, we propose a novel federated multi-agent deep reinforcement learning (FMADRL)-driven moving target defense (MTD) framework for proactive DoS mitigation in low-altitude networks. Specifically, we design lightweight and coordinated MTD mechanisms, including leader switching, route mutation, and frequency hopping, to disrupt attacker efforts and enhance network resilience. The defense problem is formulated as a multi-agent partially observable Markov decision process, capturing the uncertain nature of UAV swarms under attack. Each UAV is equipped with a policy agent that autonomously selects MTD actions based on partial observations and local experiences. By employing a policy gradient-based algorithm, UAVs collaboratively optimize their policies via reward-weighted aggregation. Extensive simulations demonstrate that our approach significantly outperforms state-of-the-art baselines, achieving up to a 34.6% improvement in attack mitigation rate, a reduction in average recovery time of up to 94.6%, and decreases in energy consumption and defense cost by as much as 29.3% and 98.3%, respectively, under various DoS attack strategies. These results highlight the potential of intelligent, distributed defense mechanisms to protect low-altitude networks, paving the way for reliable and scalable low-altitude economy.

replace-cross Accelerating Hamiltonian Monte Carlo for Bayesian Inference in Neural Networks and Neural Operators

Authors: Ponkrshnan Thiagarajan, Tamer A. Zaki, Michael D. Shields

Abstract: Hamiltonian Monte Carlo (HMC) is a powerful and accurate method to sample from the posterior distribution in Bayesian inference. However, HMC techniques are computationally demanding for Bayesian neural networks due to the high dimensionality of the network's parameter space and the non-convexity of their posterior distributions. Therefore, various approximation techniques, such as variational inference (VI) or stochastic gradient MCMC, are often employed to infer the posterior distribution of the network parameters. Such approximations introduce inaccuracies in the inferred distributions, resulting in unreliable uncertainty estimates. In this work, we propose a hybrid approach that combines inexpensive VI and accurate HMC methods to efficiently and accurately quantify uncertainties in neural networks and neural operators. The proposed approach leverages an initial VI training on the full network. We examine the influence of individual parameters on the prediction uncertainty, which shows that a large proportion of the parameters do not contribute substantially to uncertainty in the network predictions. This information is then used to significantly reduce the dimension of the parameter space, and HMC is performed only for the subset of network parameters that strongly influence prediction uncertainties. This yields a framework for accelerating the full batch HMC for posterior inference in neural networks. We demonstrate the efficiency and accuracy of the proposed framework on deep neural networks and operator networks, showing that inference can be performed for large networks with tens to hundreds of thousands of parameters. We show that this method can effectively learn surrogates for complex physical systems by modeling the operator that maps from upstream conditions to wall-pressure data on a cone in hypersonic flow.

replace-cross MetaExplainer: A Framework to Generate Multi-Type User-Centered Explanations for AI Systems

Authors: Shruthi Chari, Oshani Seneviratne, Prithwish Chakraborty, Pablo Meyer, Deborah L. McGuinness

Abstract: Explanations are crucial for building trustworthy AI systems, but a gap often exists between the explanations provided by models and those needed by users. To address this gap, we introduce MetaExplainer, a neuro-symbolic framework designed to generate user-centered explanations. Our approach employs a three-stage process: first, we decompose user questions into machine-readable formats using state-of-the-art large language models (LLM); second, we delegate the task of generating system recommendations to model explainer methods; and finally, we synthesize natural language explanations that summarize the explainer outputs. Throughout this process, we utilize an Explanation Ontology to guide the language models and explainer methods. By leveraging LLMs and a structured approach to explanation generation, MetaExplainer aims to enhance the interpretability and trustworthiness of AI systems across various applications, providing users with tailored, question-driven explanations that better meet their needs. Comprehensive evaluations of MetaExplainer demonstrate a step towards evaluating and utilizing current state-of-the-art explanation frameworks. Our results show high performance across all stages, with a 59.06% F1-score in question reframing, 70% faithfulness in model explanations, and 67% context-utilization in natural language synthesis. User studies corroborate these findings, highlighting the creativity and comprehensiveness of generated explanations. Tested on the Diabetes (PIMA Indian) tabular dataset, MetaExplainer supports diverse explanation types, including Contrastive, Counterfactual, Rationale, Case-Based, and Data explanations. The framework's versatility and traceability from using ontology to guide LLMs suggest broad applicability beyond the tested scenarios, positioning MetaExplainer as a promising tool for enhancing AI explainability across various domains.

replace-cross A Survey on Training-free Alignment of Large Language Models

Authors: Birong Pan, Yongqi Li, Weiyu Zhang, Wenpeng Lu, Mayi Xu, Shen Zhou, Yuanyuan Zhu, Ming Zhong, Tieyun Qian

Abstract: The alignment of large language models (LLMs) aims to ensure their outputs adhere to human values, ethical standards, and legal norms. Traditional alignment methods often rely on resource-intensive fine-tuning (FT), which may suffer from knowledge degradation and face challenges in scenarios where the model accessibility or computational resources are constrained. In contrast, training-free (TF) alignment techniques--leveraging in-context learning, decoding-time adjustments, and post-generation corrections--offer a promising alternative by enabling alignment without heavily retraining LLMs, making them adaptable to both open-source and closed-source environments. This paper presents the first systematic review of TF alignment methods, categorizing them by stages of pre-decoding, in-decoding, and post-decoding. For each stage, we provide a detailed examination from the viewpoint of LLMs and multimodal LLMs (MLLMs), highlighting their mechanisms and limitations. Furthermore, we identify key challenges and future directions, paving the way for more inclusive and effective TF alignment techniques. By synthesizing and organizing the rapidly growing body of research, this survey offers a guidance for practitioners and advances the development of safer and more reliable LLMs.

replace-cross GRAM-R$^2$: Self-Training Generative Foundation Reward Models for Reward Reasoning

Authors: Chenglong Wang, Yongyu Mu, Hang Zhou, Yifu Huo, Ziming Zhu, Jiali Zeng, Murun Yang, Bei Li, Tong Xiao, Xiaoyang Hao, Chunliang Zhang, Fandong Meng, Jingbo Zhu

Abstract: Significant progress in reward modeling over recent years has been driven by a paradigm shift from task-specific designs towards generalist reward models. Despite this trend, developing effective reward models remains a fundamental challenge: the heavy reliance on large-scale labeled preference data. Pre-training on abundant unlabeled data offers a promising direction, but existing approaches fall short of instilling explicit reasoning into reward models. To bridge this gap, we propose a self-training approach that leverages unlabeled data to elicit reward reasoning in reward models. Based on this approach, we develop GRAM-R$^2$, a generative reward model trained to produce not only preference labels but also accompanying reward rationales. GRAM-R$^2$ can serve as a foundation model for reward reasoning and can be applied to a wide range of tasks with minimal or no additional fine-tuning. It can support downstream applications such as response ranking and task-specific reward tuning. Experiments on response ranking, task adaptation, and reinforcement learning from human feedback demonstrate that GRAM-R$^2$ consistently delivers strong performance, outperforming several strong discriminative and generative baselines.

replace-cross CURE: Controlled Unlearning for Robust Embeddings -- Mitigating Conceptual Shortcuts in Pre-Trained Language Models

Authors: Aysenur Kocak, Shuo Yang, Bardh Prenkaj, Gjergji Kasneci

Abstract: Pre-trained language models have achieved remarkable success across diverse applications but remain susceptible to spurious, concept-driven correlations that impair robustness and fairness. In this work, we introduce CURE, a novel and lightweight framework that systematically disentangles and suppresses conceptual shortcuts while preserving essential content information. Our method first extracts concept-irrelevant representations via a dedicated content extractor reinforced by a reversal network, ensuring minimal loss of task-relevant information. A subsequent controllable debiasing module employs contrastive learning to finely adjust the influence of residual conceptual cues, enabling the model to either diminish harmful biases or harness beneficial correlations as appropriate for the target task. Evaluated on the IMDB and Yelp datasets using three pre-trained architectures, CURE achieves an absolute improvement of +10 points in F1 score on IMDB and +2 points on Yelp, while introducing minimal computational overhead. Our approach establishes a flexible, unsupervised blueprint for combating conceptual biases, paving the way for more reliable and fair language understanding systems.

replace-cross Delta Velocity Rectified Flow for Text-to-Image Editing

Authors: Gaspard Beaudouin, Minghan Li, Jaeyeon Kim, Sung-Hoon Yoon, Mengyu Wang

Abstract: We propose Delta Velocity Rectified Flow (DVRF), a novel inversion-free, path-aware editing framework within rectified flow models for text-to-image editing. DVRF is a distillation-based method that explicitly models the discrepancy between the source and target velocity fields in order to mitigate over-smoothing artifacts rampant in prior distillation sampling approaches. We further introduce a time-dependent shift term to push noisy latents closer to the target trajectory, enhancing the alignment with the target distribution. We theoretically demonstrate that when this shift is disabled, DVRF reduces to Delta Denoising Score, thereby bridging score-based diffusion optimization and velocity-based rectified-flow optimization. Moreover, when the shift term follows a linear schedule under rectified-flow dynamics, DVRF generalizes the Inversion-free method FlowEdit and provides a principled theoretical interpretation for it. Experimental results indicate that DVRF achieves superior editing quality, fidelity, and controllability while requiring no architectural modifications, making it efficient and broadly applicable to text-to-image editing tasks. Code is available at https://github.com/Harvard-AI-and-Robotics-Lab/DeltaVelocityRectifiedFlow.

URLs: https://github.com/Harvard-AI-and-Robotics-Lab/DeltaVelocityRectifiedFlow.

replace-cross Murphys Laws of AI Alignment: Why the Gap Always Wins

Authors: Madhava Gaikwad

Abstract: We prove a formal impossibility result for reinforcement learning from human feedback (RLHF). In misspecified environments with bounded query budgets, any RLHF-style learner suffers an irreducible performance gap Omega(gamma) unless it has access to a calibration oracle. We give tight lower bounds via an information-theoretic proof and show that a minimal calibration oracle suffices to eliminate the gap. Small-scale empirical illustrations and a catalogue of alignment regularities (Murphy's Laws) indicate that many observed alignment failures are consistent with this structural mechanism. Our results position Murphys Gap as both a diagnostic limit of RLHF and a guide for future work on calibration and causal preference checks.

replace-cross Uncertainty Quantification in Probabilistic Machine Learning Models: Theory, Methods, and Insights

Authors: Marzieh Ajirak, Anand Ravishankar, Petar M. Djuric

Abstract: Uncertainty Quantification (UQ) is essential in probabilistic machine learning models, particularly for assessing the reliability of predictions. In this paper, we present a systematic framework for estimating both epistemic and aleatoric uncertainty in probabilistic models. We focus on Gaussian Process Latent Variable Models and employ scalable Random Fourier Features-based Gaussian Processes to approximate predictive distributions efficiently. We derive a theoretical formulation for UQ, propose a Monte Carlo sampling-based estimation method, and conduct experiments to evaluate the impact of uncertainty estimation. Our results provide insights into the sources of predictive uncertainty and illustrate the effectiveness of our approach in quantifying the confidence in the predictions.

replace-cross BranchGRPO: Stable and Efficient GRPO with Structured Branching in Diffusion Models

Authors: Yuming Li, Yikai Wang, Yuying Zhu, Zhongyu Zhao, Ming Lu, Qi She, Shanghang Zhang

Abstract: Recent advancements in aligning image and video generative models via GRPO have achieved remarkable gains in enhancing human preference alignment. However, these methods still face high computational costs from on-policy rollouts and excessive SDE sampling steps, as well as training instability due to sparse rewards. In this paper, we propose BranchGRPO, a novel method that introduces a branch sampling policy updating the SDE sampling process. By sharing computation across common prefixes and pruning low-reward paths and redundant depths, BranchGRPO substantially lowers the per-update compute cost while maintaining or improving exploration diversity. This work makes three main contributions: (1) a branch sampling scheme that reduces rollout and training cost; (2) a tree-based advantage estimator incorporating dense process-level rewards; and (3) pruning strategies exploiting path and depth redundancy to accelerate convergence and boost performance. Experiments on image and video preference alignment show that BranchGRPO improves alignment scores by 16% over strong baselines, while cutting training time by 50%.

replace-cross The Efficiency Frontier: Classical Shadows versus Quantum Footage

Authors: Shuowei Ma, Junyu Liu

Abstract: Interfacing quantum and classical processors is an important subroutine in full-stack quantum algorithms. The so-called "classical shadow" method efficiently extracts essential classical information from quantum states, enabling the prediction of many properties of a quantum system from only a few measurements. However, for a small number of highly non-local observables, or when classical post-processing power is limited, the classical shadow method is not always the most efficient choice. Here, we address this issue quantitatively by performing a full-stack resource analysis that compares classical shadows with "quantum footage," which refers to direct quantum measurement. Under certain assumptions, our analysis illustrates a boundary of download efficiency between classical shadows and quantum footage. For observables expressed as linear combinations of Pauli matrices, the classical shadow method outperforms direct measurement when the number of observables is large and the Pauli weight is small. For observables in the form of large Hermitian sparse matrices, the classical shadow method shows an advantage when the number of observables, the sparsity of the matrix, and the number of qubits fall within a certain range. The key parameters influencing this behavior include the number of qubits $n$, observables $M$, sparsity $k$, Pauli weight $w$, accuracy requirement $\epsilon$, and failure tolerance $\delta$. We also compare the resource consumption of the two methods on different types of quantum computers and identify break-even points where the classical shadow method becomes more efficient, which vary depending on the hardware. This paper opens a new avenue for quantitatively designing optimal strategies for hybrid quantum-classical tomography and provides practical insights for selecting the most suitable quantum measurement approach in real-world applications.

replace-cross Detection of trade in products derived from threatened species using machine learning and a smartphone

Authors: Ritwik Kulkarni, WU Hanqin, Enrico Di Minin

Abstract: Unsustainable trade in wildlife is a major threat to biodiversity and is now increasingly prevalent in digital marketplaces and social media. With the sheer volume of digital content, the need for automated methods to detect wildlife trade listings is growing. These methods are especially needed for the automatic identification of wildlife products, such as ivory. We developed machine learning-based object recognition models that can identify wildlife products within images and highlight them. The data consists of images of elephant, pangolin, and tiger products that were identified as being sold illegally or that were confiscated by authorities. Specifically, the wildlife products included elephant ivory and skins, pangolin scales, and claws (raw and crafted), and tiger skins and bones. We investigated various combinations of training strategies and two loss functions to identify the best model to use in the automatic detection of these wildlife products. Models were trained for each species while also developing a single model to identify products from all three species. The best model showed an overall accuracy of 84.2% with accuracies of 71.1%, 90.2% and 93.5% in detecting products derived from elephants, pangolins, and tigers, respectively. We further demonstrate that the machine learning model can be made easily available to stakeholders, such as government authorities and law enforcement agencies, by developing a smartphone-based application that had an overall accuracy of 91.3%. The application can be used in real time to click images and help identify potentially prohibited products of target species. Thus, the proposed method is not only applicable for monitoring trade on the web but can also be used e.g. in physical markets for monitoring wildlife trade.

replace-cross Reward function compression facilitates goal-dependent reinforcement learning

Authors: Gaia Molinaro, Anne G. E. Collins

Abstract: Reinforcement learning agents learn from rewards, but humans can uniquely assign value to novel, abstract outcomes in a goal-dependent manner. However, this flexibility is cognitively costly, making learning less efficient. Here, we propose that goal-dependent learning is initially supported by a capacity-limited working memory system. With consistent experience, learners create a "compressed" reward function (a simplified rule defining the goal) which is then transferred to long-term memory and applied automatically upon receiving feedback. This process frees up working memory resources, boosting learning efficiency. We test this theory across six experiments. Consistent with our predictions, our findings demonstrate that learning is parametrically impaired by the size of the goal space, but improves when the goal space structure allows for compression. We also find faster reward processing to correlate with better learning performance, supporting the idea that as goal valuation becomes more automatic, more resources are available for learning. We leverage computational modeling to support this interpretation. Our work suggests that efficient goal-directed learning relies on compressing complex goal information into a stable reward function, shedding light on the cognitive mechanisms of human motivation. These findings generate new insights into the neuroscience of intrinsic motivation and could help improve behavioral techniques that support people in achieving their goals.

replace-cross Moment- and Power-Spectrum-Based Gaussianity Regularization for Text-to-Image Models

Authors: Jisung Hwang, Jaihoon Kim, Minhyuk Sung

Abstract: We propose a novel regularization loss that enforces standard Gaussianity, encouraging samples to align with a standard Gaussian distribution. This facilitates a range of downstream tasks involving optimization in the latent space of text-to-image models. We treat elements of a high-dimensional sample as one-dimensional standard Gaussian variables and define a composite loss that combines moment-based regularization in the spatial domain with power spectrum-based regularization in the spectral domain. Since the expected values of moments and power spectrum distributions are analytically known, the loss promotes conformity to these properties. To ensure permutation invariance, the losses are applied to randomly permuted inputs. Notably, existing Gaussianity-based regularizations fall within our unified framework: some correspond to moment losses of specific orders, while the previous covariance-matching loss is equivalent to our spectral loss but incurs higher time complexity due to its spatial-domain computation. We showcase the application of our regularization in generative modeling for test-time reward alignment with a text-to-image model, specifically to enhance aesthetics and text alignment. Our regularization outperforms previous Gaussianity regularization, effectively prevents reward hacking and accelerates convergence.

replace-cross RINO: Renormalization Group Invariance with No Labels

Authors: Zichun Hao, Raghav Kansal, Abhijith Gandrakota, Chang Sun, Ngadiuba Jennifer, Javier Duarte, Maria Spiropulu

Abstract: A common challenge with supervised machine learning (ML) in high energy physics (HEP) is the reliance on simulations for labeled data, which can often mismodel the underlying collision or detector response. To help mitigate this problem of domain shift, we propose RINO (Renormalization Group Invariance with No Labels), a self-supervised learning approach that can instead pretrain models directly on collision data, learning embeddings invariant to renormalization group flow scales. In this work, we pretrain a transformer-based model on jets originating from quantum chromodynamic (QCD) interactions from the JetClass dataset, emulating real QCD-dominated experimental data, and then finetune on the JetNet dataset -- emulating simulations -- for the task of identifying jets originating from top quark decays. RINO demonstrates improved generalization from the JetNet training data to JetClass data compared to supervised training on JetNet from scratch, demonstrating the potential for RINO pretraining on real collision data followed by fine-tuning on small, high-quality MC datasets, to improve the robustness of ML models in HEP.

replace-cross Nearest Neighbor Projection Removal Adversarial Training

Authors: Himanshu Singh, A. V. Subramanyam, Shivank Rajput, Mohan Kankanhalli

Abstract: Deep neural networks have exhibited impressive performance in image classification tasks but remain vulnerable to adversarial examples. Standard adversarial training enhances robustness but typically fails to explicitly address inter-class feature overlap, a significant contributor to adversarial susceptibility. In this work, we introduce a novel adversarial training framework that actively mitigates inter-class proximity by projecting out inter-class dependencies from adversarial and clean samples in the feature space. Specifically, our approach first identifies the nearest inter-class neighbors for each adversarial sample and subsequently removes projections onto these neighbors to enforce stronger feature separability. Theoretically, we demonstrate that our proposed logits correction reduces the Lipschitz constant of neural networks, thereby lowering the Rademacher complexity, which directly contributes to improved generalization and robustness. Extensive experiments across standard benchmarks including CIFAR-10, CIFAR-100, and SVHN show that our method demonstrates strong performance that is competitive with leading adversarial training techniques, highlighting significant achievements in both robust and clean accuracy. Our findings reveal the importance of addressing inter-class feature proximity explicitly to bolster adversarial robustness in DNNs.