new Explainable Machine Learning: An Illustration of Kolmogorov-Arnold Network Model for Airfoil Lift Prediction

Authors: Sudhanva Kulkarni

Abstract: Data science has emerged as the fourth paradigm of scientific exploration. However, many machine learning models operate as black boxes, offering limited insight into the reasoning behind their predictions. This lack of transparency is one of the main obstacles to generating new knowledge from data. Recently, the Kolmogorov-Arnold Network (KAN) has been proposed as an alternative model which embeds explainable AI. This study demonstrates the potential of KAN for new scientific exploration. KAN, along with five other popular supervised machine learning models, is applied to the well-known problem of airfoil lift prediction in aerospace engineering. Standard data generated from an earlier study on 2900 different airfoils is used. KAN performed the best, with an R^2 score of 96.17 percent on the test data, surpassing both the baseline model and the Multi-Layer Perceptron. The explainability of KAN is shown by pruning and symbolizing the model, resulting in an equation for the coefficient of lift in terms of the input variables. The explainable information retrieved from the KAN model is found to be consistent with the known physics of lift generation by an airfoil, thus demonstrating its potential to aid in scientific exploration.
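
For readers who want to see the prune-and-symbolize workflow in practice, here is a minimal sketch using the open-source pykan package. The network width, basis library, and training settings are illustrative assumptions, and the method names (fit, prune, auto_symbolic, symbolic_formula) follow the pykan README and may differ between package versions; `dataset` stands for a dict of train/test tensors built from the airfoil data.

    from kan import KAN

    model = KAN(width=[5, 3, 1], grid=5, k=3)    # e.g. 5 airfoil inputs -> CL
    model.fit(dataset, opt="LBFGS", steps=50)    # dataset: dict of train/test tensors
    model = model.prune()                        # drop weakly-activated edges/nodes
    model.auto_symbolic(lib=['x', 'x^2', 'sin', 'exp', 'log'])
    print(model.symbolic_formula())              # closed-form expression for lift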

new Shared DIFF Transformer

Authors: Yueyang Cang, Yuhang Liu, Xiaoteng Zhang, Xiangju Wang

Abstract: DIFF Transformer improves attention allocation by enhancing focus on relevant context while suppressing noise. It introduces a differential attention mechanism that calculates the difference between two independently generated attention distributions, effectively reducing noise and promoting sparse attention patterns. However, the independent signal generation in DIFF Transformer results in parameter redundancy and suboptimal utilization of information. In this work, we propose Shared DIFF Transformer, which draws on the idea of a differential amplifier by introducing a shared base matrix to model global patterns and incorporating low-rank updates to enhance task-specific flexibility. This design significantly reduces parameter redundancy, improves efficiency, and retains strong noise suppression capabilities. Experimental results show that, compared to DIFF Transformer, our method achieves better performance in tasks such as long-sequence modeling, key information retrieval, and in-context learning. Our work provides a novel and efficient approach to optimizing differential attention mechanisms and advancing robust Transformer architectures.
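
To make the mechanism concrete, below is a hedged single-head sketch of differential attention with a shared base projection plus per-branch low-rank updates; the parameter shapes, the placement of the low-rank terms (passed here as pre-multiplied factor pairs), and the fixed lambda are illustrative assumptions rather than the paper's exact parameterization.

    import torch
    import torch.nn.functional as F

    def shared_diff_attention(x, Wq, Wk, Wv, dQ1, dK1, dQ2, dK2, lam=0.8):
        # x: (seq, d). Wq/Wk are shared base matrices modeling global patterns;
        # dQ*/dK* are low-rank updates giving each branch task-specific flexibility.
        d = x.shape[-1]
        a1 = F.softmax((x @ (Wq + dQ1)) @ (x @ (Wk + dK1)).T / d ** 0.5, dim=-1)
        a2 = F.softmax((x @ (Wq + dQ2)) @ (x @ (Wk + dK2)).T / d ** 0.5, dim=-1)
        return (a1 - lam * a2) @ (x @ Wv)  # difference of the two attention maps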

new DReSS: Data-driven Regularized Structured Streamlining for Large Language Models

Authors: Mingkuan Feng, Jinyang Wu, Shuai Zhang, Pengpeng Shao, Ruihan Jin, Zhengqi Wen, Jianhua Tao, Feihu Che

Abstract: Large language models (LLMs) have achieved significant progress across various domains, but their increasing scale results in high computational and memory costs. Recent studies have revealed that LLMs exhibit sparsity, providing the potential to reduce model size through pruning techniques. However, existing pruning methods typically follow a prune-then-finetune paradigm. Since the pruned components still contain valuable information, their direct removal often leads to irreversible performance degradation, imposing a substantial computational burden to recover performance during finetuning. In this paper, we propose a novel paradigm that first applies regularization, then prunes, and finally finetunes. Based on this paradigm, we introduce DReSS, a simple and effective Data-driven Regularized Structured Streamlining method for LLMs. By leveraging a small amount of data to regularize the components to be pruned, DReSS explicitly transfers the important information to the remaining parts of the model in advance. Compared to direct pruning, this can reduce the information loss caused by parameter removal, thereby enhancing its language modeling capabilities. Experimental results demonstrate that DReSS significantly outperforms existing pruning methods even under extreme pruning ratios, significantly reducing latency and increasing throughput.
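
The regularize-then-prune-then-finetune paradigm can be sketched in a few lines; the L2 penalty, optimizer settings, and 0/1 mask format below are assumptions (the paper's regularizer is data-driven and structured), and a standard finetuning pass would follow the removal step.

    import torch

    def regularize_then_prune(model, prune_mask, calib_batches, loss_fn, lam=1e-2):
        # prune_mask: parameter name -> 0/1 tensor marking components to remove.
        opt = torch.optim.AdamW(model.parameters(), lr=1e-5)
        for x, y in calib_batches:                   # small calibration set
            reg = sum((p * prune_mask[n]).pow(2).sum()
                      for n, p in model.named_parameters() if n in prune_mask)
            loss = loss_fn(model(x), y) + lam * reg  # drain info from doomed parts
            opt.zero_grad(); loss.backward(); opt.step()
        with torch.no_grad():                        # prune only after regularizing
            for n, p in model.named_parameters():
                if n in prune_mask:
                    p.mul_(1.0 - prune_mask[n])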

new Deep Ensembles Secretly Perform Empirical Bayes

Authors: Gabriel Loaiza-Ganem, Valentin Villecroze, Yixin Wang

Abstract: Quantifying uncertainty in neural networks is a highly relevant problem which is essential to many applications. The two predominant paradigms to tackle this task are Bayesian neural networks (BNNs) and deep ensembles. Despite some similarities between these two approaches, they are typically surmised to lack a formal connection and are thus understood as fundamentally different. BNNs are often touted as more principled due to their reliance on the Bayesian paradigm, whereas ensembles are perceived as more ad-hoc; yet, deep ensembles tend to empirically outperform BNNs, with no satisfying explanation as to why this is the case. In this work we bridge this gap by showing that deep ensembles perform exact Bayesian averaging with a posterior obtained with an implicitly learned data-dependent prior. In other words, deep ensembles are Bayesian, or more specifically, they implement an empirical Bayes procedure wherein the prior is learned from the data. This perspective offers two main benefits: (i) it theoretically justifies deep ensembles and thus provides an explanation for their strong empirical performance; and (ii) inspection of the learned prior reveals it is given by a mixture of point masses -- the use of such a strong prior helps elucidate observed phenomena about ensembles. Overall, our work delivers a newfound understanding of deep ensembles which is not only of interest in and of itself, but is also likely to generate future insights that drive empirical improvements for these models.
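
In symbols, the special case that makes the claim concrete: if the prior is a mixture of point masses at the trained members $\theta_1, \dots, \theta_M$, the posterior predictive is a reweighted ensemble (the paper's contribution is the data-dependent prior under which this recovers exactly the uniform ensemble average; the display below is only the illustrative computation):

\[ \pi_0 = \frac{1}{M}\sum_{m=1}^{M} \delta_{\theta_m} \;\Longrightarrow\; p(y \mid x, \mathcal{D}) = \int p(y \mid x, \theta)\, d\pi(\theta \mid \mathcal{D}) = \sum_{m=1}^{M} w_m\, p(y \mid x, \theta_m), \quad w_m \propto p(\mathcal{D} \mid \theta_m), \]

which reduces to the usual average $\frac{1}{M}\sum_m p(y \mid x, \theta_m)$ when the members fit the data equally well.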

new Variational Combinatorial Sequential Monte Carlo for Bayesian Phylogenetics in Hyperbolic Space

Authors: Alex Chen, Philipe Chlenski, Kenneth Munyuza, Antonio Khalil Moretti, Christian A. Naesseth, Itsik Pe'er

Abstract: Hyperbolic space naturally encodes hierarchical structures such as phylogenies (binary trees), where inward-bending geodesics reflect paths through least common ancestors, and the exponential growth of neighborhoods mirrors the super-exponential scaling of topologies. This scaling challenge limits the efficiency of Euclidean-based approximate inference methods. Motivated by the geometric connections between trees and hyperbolic space, we develop novel hyperbolic extensions of two sequential search algorithms: Combinatorial and Nested Combinatorial Sequential Monte Carlo (CSMC and NCSMC). Our approach introduces consistent and unbiased estimators, along with variational inference methods (H-VCSMC and H-VNCSMC), which outperform their Euclidean counterparts. Empirical results demonstrate improved speed, scalability and performance in high-dimensional phylogenetic inference tasks.

new KoopAGRU: A Koopman-based Anomaly Detection in Time-Series using Gated Recurrent Units

Authors: Issam Ait Yahia, Ismail Berrada

Abstract: Anomaly detection in real-world time-series data is a challenging task due to the complex and nonlinear temporal dynamics involved. This paper introduces KoopAGRU, a new deep learning model designed to tackle this problem by combining Fast Fourier Transform (FFT), Deep Dynamic Mode Decomposition (DeepDMD), and Koopman theory. FFT allows KoopAGRU to decompose temporal data into time-variant and time-invariant components, providing precise modeling of complex patterns. To better control these two components, KoopAGRU utilizes Gated Recurrent Unit (GRU) encoders to learn Koopman observables, enhancing the detection capability across multiple temporal scales. KoopAGRU is trained in a single process and offers fast inference times. Extensive tests on various benchmark datasets show that KoopAGRU outperforms other leading methods, achieving a new best average F1-score of 90.88% on well-known time-series anomaly detection benchmarks, and proves to be efficient and reliable in detecting anomalies in real-world scenarios.
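
As a rough illustration of the FFT stage (a hedged sketch: the choice of k and the exact split rule are assumptions), the function below separates a 1-D series into a component built from its dominant frequencies and a residual, which GRU encoders could then lift to Koopman observables.

    import numpy as np

    def fft_split(x, k=8):
        # Reconstruct the k dominant frequencies as the "time-invariant" part;
        # the remainder is treated as the "time-variant" part.
        spec = np.fft.rfft(x)
        keep = np.argsort(np.abs(spec))[-k:]
        filtered = np.zeros_like(spec)
        filtered[keep] = spec[keep]
        invariant = np.fft.irfft(filtered, n=len(x))
        return invariant, x - invariant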

new Topological Signatures of Adversaries in Multimodal Alignments

Authors: Minh Vu, Geigh Zollicoffer, Huy Mai, Ben Nebgen, Boian Alexandrov, Manish Bhattarai

Abstract: Multimodal Machine Learning systems, particularly those aligning text and image data like CLIP/BLIP models, have become increasingly prevalent, yet remain susceptible to adversarial attacks. While substantial research has addressed adversarial robustness in unimodal contexts, defense strategies for multimodal systems are underexplored. This work investigates the topological signatures that arise between image and text embeddings and shows how adversarial attacks disrupt their alignment, introducing distinctive signatures. We specifically leverage persistent homology and introduce two novel Topological-Contrastive losses based on Total Persistence and Multi-scale kernel methods to analyze the topological signatures introduced by adversarial perturbations. We observe a pattern of monotonic changes in the proposed topological losses emerging in a wide range of attacks on image-text alignments, as more adversarial samples are introduced in the data. By designing an algorithm to back-propagate these signatures to input samples, we are able to integrate these signatures into Maximum Mean Discrepancy tests, creating a novel class of tests that leverage topological signatures for better adversarial detection.

new When less is more: evolving large neural networks from small ones

Authors: Anil Radhakrishnan, John F. Lindner, Scott T. Miller, Sudeshna Sinha, William L. Ditto

Abstract: In contrast to conventional artificial neural networks, which are large and structurally static, we study feed-forward neural networks that are small and dynamic, whose nodes can be added (or subtracted) during training. A single neuronal weight in the network controls the network's size, while the weight itself is optimized by the same gradient-descent algorithm that optimizes the network's other weights and biases, but with a size-dependent objective or loss function. We train and evaluate such Nimble Neural Networks on nonlinear regression and classification tasks where they outperform the corresponding static networks. Growing networks to minimal, appropriate, or optimal sizes while training elucidates network dynamics and contrasts with pruning large networks after training but before deployment.
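
A minimal sketch of the idea, assuming a sigmoid gate so that a single trainable scalar controls the effective width and is trained by the same optimizer under a size-penalized loss; the paper's exact gating and penalty are not reproduced here.

    import torch

    class NimbleLayer(torch.nn.Module):
        def __init__(self, d_in, max_hidden, d_out):
            super().__init__()
            self.fc1 = torch.nn.Linear(d_in, max_hidden)
            self.fc2 = torch.nn.Linear(max_hidden, d_out)
            self.size = torch.nn.Parameter(torch.tensor(4.0))  # trainable width

        def forward(self, x):
            idx = torch.arange(self.fc1.out_features, device=x.device, dtype=x.dtype)
            gate = torch.sigmoid(self.size - idx)  # units beyond `size` fade out
            return self.fc2(torch.tanh(self.fc1(x)) * gate)

    def size_aware_loss(model, pred, target, lam=1e-3):
        # size-dependent objective: task loss plus a cost for growing the network
        return torch.nn.functional.mse_loss(pred, target) + lam * model.size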

new A Proximal Operator for Inducing 2:4-Sparsity

Authors: Jonas M. Kübler, Yu-Xiang Wang, Shoham Sabach, Navid Ansari, Matthäus Kleindessner, Kailash Budhathoki, Volkan Cevher, George Karypis

Abstract: Recent hardware advancements in AI Accelerators and GPUs allow sparse matrix multiplications to be computed efficiently, especially when 2 out of every 4 consecutive weights are set to zero. However, this so-called 2:4 sparsity usually comes at the cost of decreased model accuracy. We derive a regularizer that exploits the local correlation of features to find better sparsity masks in trained models. We minimize the regularizer jointly with a local squared loss by deriving the proximal operator, for which we show that it has an efficient solution in the 2:4-sparse case. After optimizing the mask, we use masked-gradient updates to further minimize the local squared loss. We illustrate our method on toy problems and apply it to pruning entire large language models up to 70B parameters. On models up to 13B we improve over previous state-of-the-art algorithms, whilst on 70B models we match their performance.
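
For orientation, the hard projection underlying the 2:4 pattern is shown below: in every group of four consecutive weights, keep the two largest in magnitude. The paper's proximal operator addresses the regularized (soft) version of this step; the sketch is only the limiting projection.

    import torch

    def project_2_4(w):
        # w: tensor whose total number of elements is divisible by 4
        g = w.reshape(-1, 4)
        idx = g.abs().topk(2, dim=1).indices              # two largest per group
        mask = torch.zeros_like(g).scatter_(1, idx, 1.0)  # 2:4 sparsity mask
        return (g * mask).reshape(w.shape)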

new KNN and K-means in Gini Prametric Spaces

Authors: Cassandra Mussard, Arthur Charpentier, Stéphane Mussard

Abstract: This paper introduces innovative enhancements to the K-means and K-nearest neighbors (KNN) algorithms based on the concept of Gini prametric spaces. Unlike traditional distance metrics, Gini-based measures incorporate both value-based and rank-based information, improving robustness to noise and outliers. The main contributions of this work include: proposing a Gini-based measure that captures both rank information and value distances; presenting a Gini K-means algorithm that is proven to converge and demonstrates resilience to noisy data; and introducing a Gini KNN method that performs competitively with state-of-the-art approaches such as Hassanat's distance in noisy environments. Experimental evaluations on 14 datasets from the UCI repository demonstrate the superior performance and efficiency of Gini-based algorithms in clustering and classification tasks. This work opens new avenues for leveraging rank-based measures in machine learning and statistical analysis.
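
To make the rank-plus-value idea concrete, here is an illustrative dissimilarity in that spirit; it is a stand-in, not the paper's Gini prametric, whose exact definition the abstract does not give. Each feature contributes its absolute value difference weighted by the normalized rank difference, so large value gaps matter less when the points are close in rank.

    import numpy as np
    from scipy.stats import rankdata

    def rank_value_dissimilarity(X):
        # X: (n, d) data matrix; returns an (n, n) pairwise dissimilarity.
        n, d = X.shape
        ranks = np.apply_along_axis(rankdata, 0, X) / n
        D = np.zeros((n, n))
        for j in range(d):
            dv = np.abs(X[:, None, j] - X[None, :, j])          # value gaps
            dr = np.abs(ranks[:, None, j] - ranks[None, :, j])  # rank gaps
            D += dv * dr
        return D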

new Joint Pricing and Resource Allocation: An Optimal Online-Learning Approach

Authors: Jianyu Xu, Xuan Wang, Yu-Xiang Wang, Jiashuo Jiang

Abstract: We study an online learning problem on dynamic pricing and resource allocation, where we make joint pricing and inventory decisions to maximize the overall net profit. We consider the stochastic dependence of demands on the price, which complicates the resource allocation process and introduces significant non-convexity and non-smoothness to the problem. To solve this problem, we develop an efficient algorithm that utilizes a "Lower-Confidence Bound (LCB)" meta-strategy over multiple online convex optimization (OCO) agents. Our algorithm achieves $\tilde{O}(\sqrt{Tmn})$ regret (for $m$ suppliers and $n$ consumers), which is optimal with respect to the time horizon $T$. Our results illustrate an effective integration of statistical learning methodologies with complex operations research problems.

new SAeUron: Interpretable Concept Unlearning in Diffusion Models with Sparse Autoencoders

Authors: Bartosz Cywiński, Kamil Deja

Abstract: Recent machine unlearning approaches offer a promising solution for removing unwanted concepts from diffusion models. However, traditional methods, which largely rely on fine-tuning, provide little insight into the changes they introduce to the base model, making it unclear whether concepts are truly removed or only masked. In this work, we introduce SAeUron, a novel method leveraging features learned by sparse autoencoders (SAEs) to unlearn unwanted concepts in text-to-image diffusion models. First, we demonstrate that SAEs, trained in an unsupervised manner on activations from multiple denoising timesteps of the diffusion model, capture sparse and interpretable features corresponding to specific concepts. Building on this, we propose a method for selecting concept-specific features. This enables precise interventions on the model's activations to block targeted content while preserving the model's overall performance. Evaluation on the competitive UnlearnCanvas benchmark on object and style unlearning highlights SAeUron's state-of-the-art performance. Moreover, we show that with a single SAE, we can remove multiple concepts simultaneously and that, in contrast to other methods, SAeUron mitigates the possibility of generating unwanted content, even under adversarial attack.
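
The intervention itself can be pictured in a few lines; `sae.encode`/`sae.decode` and the list of concept-attributed feature indices are assumed interfaces standing in for the trained SAE and the paper's feature-selection step.

    def ablate_concept(sae, activations, concept_features, scale=0.0):
        # Encode diffusion-model activations into sparse SAE features,
        # suppress those attributed to the target concept, and decode back.
        z = sae.encode(activations)
        z[..., concept_features] *= scale   # scale=0 blocks the concept
        return sae.decode(z)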

new Current Pathology Foundation Models are unrobust to Medical Center Differences

Authors: Edwin D. de Jong, Eric Marcus, Jonas Teuwen

Abstract: Pathology Foundation Models (FMs) hold great promise for healthcare. Before they can be used in clinical practice, it is essential to ensure they are robust to variations between medical centers. We measure whether pathology FMs focus on biological features like tissue and cancer type, or on the well-known confounding medical center signatures introduced by staining procedures and other differences. We introduce the Robustness Index, a novel metric that reflects the degree to which biological features dominate confounding features. Ten current publicly available pathology FMs are evaluated. We find that all evaluated models represent the medical center to a strong degree, with significant differences in the robustness index between them. Only one model so far has a robustness index greater than one, meaning biological features dominate confounding features, but only slightly. We describe a quantitative approach to measure the influence of medical center differences on FM-based prediction performance. Analyzing the impact of unrobustness on the classification performance of downstream models, we find that cancer-type classification errors are not random but specifically attributable to same-center confounders: images of other classes from the same medical center. Visualizing FM embedding spaces shows that these are organized more strongly by medical center than by biological factors. As a consequence, the medical center of origin is predicted more accurately than the tissue source and cancer type. The robustness index introduced here is provided with the aim of advancing progress towards the clinical adoption of robust and reliable pathology FMs.
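
One hedged reading of such an index, for intuition only (the paper's exact definition may differ): among each embedding's k nearest neighbors, compare how often the biological label matches versus how often the medical center matches; values above one mean biology dominates the confounder.

    import numpy as np

    def robustness_index(emb, bio, center, k=10):
        # emb: (n, d) embeddings; bio, center: (n,) integer labels
        d2 = ((emb[:, None, :] - emb[None, :, :]) ** 2).sum(-1)
        np.fill_diagonal(d2, np.inf)           # exclude self-matches
        nn = np.argsort(d2, axis=1)[:, :k]     # k nearest neighbors
        bio_match = (bio[nn] == bio[:, None]).mean()
        center_match = (center[nn] == center[:, None]).mean()
        return bio_match / center_match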

new Learning the Optimal Stopping for Early Classification within Finite Horizons via Sequential Probability Ratio Test

Authors: Akinori F. Ebihara, Taiki Miyagawa, Kazuyuki Sakurai, Hitoshi Imaoka

Abstract: Time-sensitive machine learning benefits from the Sequential Probability Ratio Test (SPRT), which provides an optimal stopping time for the early classification of time series. However, in finite-horizon scenarios, where input lengths are finite, determining the optimal stopping rule becomes computationally intensive due to the need for backward induction, limiting practical applicability. We thus introduce FIRMBOUND, an SPRT-based framework that efficiently estimates the solution to backward induction from training data, bridging the gap between optimal stopping theory and real-world deployment. It employs density ratio estimation and convex function learning to provide statistically consistent estimators for the sufficient statistic and the conditional expectation, both essential for solving backward induction; consequently, FIRMBOUND minimizes Bayes risk to reach optimality. Additionally, we present a faster alternative using Gaussian process regression, which significantly reduces training time while retaining low deployment overhead, albeit with a potential compromise in statistical consistency. Experiments across independent and identically distributed (i.i.d.), non-i.i.d., binary, multiclass, synthetic, and real-world datasets show that FIRMBOUND achieves optimality in the sense of both Bayes risk and the speed-accuracy tradeoff. Furthermore, it advances the tradeoff boundary toward optimality when possible and reduces decision-time variance, ensuring reliable decision-making. Code is publicly available at https://github.com/Akinori-F-Ebihara/FIRMBOUND

URLs: https://github.com/Akinori-F-Ebihara/FIRMBOUND

new FinanceQA: A Benchmark for Evaluating Financial Analysis Capabilities of Large Language Models

Authors: Spencer Mateega, Carlos Georgescu, Danny Tang

Abstract: FinanceQA is a testing suite that evaluates LLMs' performance on complex numerical financial analysis tasks that mirror real-world investment work. Despite recent advances, current LLMs fail to meet the strict accuracy requirements of financial institutions, with models failing approximately 60% of realistic tasks that mimic on-the-job analyses at hedge funds, private equity firms, investment banks, and other financial institutions. The primary challenges include hand-spreading metrics, adhering to standard accounting and corporate valuation conventions, and performing analysis under incomplete information - particularly in multi-step tasks requiring assumption generation. This performance gap highlights the disconnect between existing LLM capabilities and the demands of professional financial analysis that are inadequately tested by current testing architectures. Results show that higher-quality training data is needed to support such tasks, which we experiment with using OpenAI's fine-tuning API. FinanceQA is publicly released at https://huggingface.co/datasets/AfterQuery/FinanceQA.

URLs: https://huggingface.co/datasets/AfterQuery/FinanceQA

new Towards Transparent and Accurate Diabetes Prediction Using Machine Learning and Explainable Artificial Intelligence

Authors: Pir Bakhsh Khokhar, Viviana Pentangelo, Fabio Palomba, Carmine Gravino

Abstract: Diabetes mellitus (DM) is a significant global health issue that must be diagnosed as early as possible and managed well. This study presents a framework for diabetes prediction using Machine Learning (ML) models, complemented with eXplainable Artificial Intelligence (XAI) tools, to investigate both the predictive accuracy and the interpretability of the predictions from ML models. Data preprocessing applies the Synthetic Minority Oversampling Technique (SMOTE) and feature scaling to the Diabetes Binary Health Indicators dataset to deal with class imbalance and the variability of clinical features. The ensemble model provided high accuracy, with a test accuracy of 92.50% and an ROC-AUC of 0.975. BMI, Age, General Health, Income, and Physical Activity were the most influential predictors obtained from the model explanations. The results of this study suggest that ML combined with XAI is a promising means of developing accurate and computationally transparent tools for use in healthcare systems.
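
A minimal sketch of such a pipeline with scikit-learn and imbalanced-learn follows; the specific ensemble classifier and hyperparameters are assumptions, since the abstract does not name them. Note that the imblearn Pipeline applies SMOTE during fitting only, so the test set is never oversampled.

    from imblearn.over_sampling import SMOTE
    from imblearn.pipeline import Pipeline
    from sklearn.ensemble import GradientBoostingClassifier
    from sklearn.preprocessing import StandardScaler

    pipe = Pipeline([
        ("smote", SMOTE(random_state=0)),       # rebalance the minority class
        ("scale", StandardScaler()),            # normalize clinical features
        ("clf", GradientBoostingClassifier()),  # stand-in for the ensemble
    ])
    # pipe.fit(X_train, y_train); pipe.predict_proba(X_test)[:, 1]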

new DIAL: Distribution-Informed Adaptive Learning of Multi-Task Constraints for Safety-Critical Systems

Authors: Se-Wook Yoo, Seung-Woo Seo

Abstract: Safe reinforcement learning has traditionally relied on predefined constraint functions to ensure safety in complex real-world tasks, such as autonomous driving. However, defining these functions accurately for varied tasks is a persistent challenge. Recent research highlights the potential of leveraging pre-acquired task-agnostic knowledge to enhance both safety and sample efficiency in related tasks. Building on this insight, we propose a novel method to learn shared constraint distributions across multiple tasks. Our approach identifies the shared constraints through imitation learning and then adapts to new tasks by adjusting risk levels within these learned distributions. This adaptability addresses variations in risk sensitivity stemming from expert-specific biases, ensuring consistent adherence to general safety principles even with imperfect demonstrations. Our method can be applied to control and navigation domains, including multi-task and meta-task scenarios, accommodating constraints such as maintaining safe distances or adhering to speed limits. Experimental results validate the efficacy of our approach, demonstrating superior safety performance and success rates compared to baselines, all without requiring task-specific constraint definitions. These findings underscore the versatility and practicality of our method across a wide range of real-world tasks.

new Learning Provably Improves the Convergence of Gradient Descent

Authors: Qingyu Song, Wei Lin, Hong Xu

Abstract: As a specialized branch of deep learning, Learning to Optimize (L2O) tackles optimization problems by training DNN-based solvers. Despite achieving significant success in various scenarios, such as faster convergence in solving convex optimizations and improved optimality in addressing non-convex cases, there remains a deficiency in theoretical support. Current research heavily relies on stringent assumptions that do not align with the intricacies of the training process. To address this gap, our study aims to establish L2O's convergence through its training methodology. We demonstrate that learning an algorithm's hyperparameters significantly enhances its convergence. Focusing on the gradient descent (GD) algorithm for quadratic programming, we prove the convergence of L2O's training using neural tangent kernel theory. Moreover, we conduct empirical evaluations using synthetic datasets. Our findings indicate that the learned algorithm outperforms GD methods by more than 50%.
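
Concretely, in the quadratic setting analyzed above, learning the hyperparameters amounts to treating per-step GD step sizes as trainable and differentiating through the unrolled iterations; the horizon, meta-optimizer, and problem sampler below are illustrative assumptions.

    import torch

    def unrolled_gd(A, b, etas):
        # GD on f(x) = 0.5 x^T A x - b^T x with learned per-step step sizes.
        x = torch.zeros(b.shape[0])
        for eta in etas:
            x = x - eta * (A @ x - b)
        return 0.5 * x @ (A @ x) - b @ x       # final objective value

    etas = torch.nn.Parameter(0.1 * torch.ones(10))
    opt = torch.optim.Adam([etas], lr=1e-2)
    for _ in range(500):                        # meta-training on random QPs
        A = torch.randn(8, 8); A = A @ A.T + torch.eye(8)
        b = torch.randn(8)
        loss = unrolled_gd(A, b, etas)
        opt.zero_grad(); loss.backward(); opt.step()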

new Reward Prediction Error Prioritisation in Experience Replay: The RPE-PER Method

Authors: Hoda Yamani, Yuning Xing, Lee Violet C. Ong, Bruce A. MacDonald, Henry Williams

Abstract: Reinforcement Learning algorithms aim to learn optimal control strategies through iterative interactions with an environment. A critical element in this process is the experience replay buffer, which stores past experiences, allowing the algorithm to learn from a diverse range of interactions rather than just the most recent ones. This buffer is especially essential in dynamic environments with limited experiences. However, efficiently selecting high-value experiences to accelerate training remains a challenge. Drawing inspiration from the role of reward prediction errors (RPEs) in biological systems, where they are essential for adaptive behaviour and learning, we introduce Reward Predictive Error Prioritised Experience Replay (RPE-PER). This novel approach prioritises experiences in the buffer based on RPEs. Our method employs a critic network, EMCN, that predicts rewards in addition to the Q-values produced by standard critic networks. The discrepancy between these predicted and actual rewards is computed as RPE and utilised as a signal for experience prioritisation. Experimental evaluations across various continuous control tasks demonstrate RPE-PER's effectiveness in enhancing the learning speed and performance of off-policy actor-critic algorithms compared to baseline approaches.
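
The prioritization rule itself is compact. In the sketch below, `pred_r` would come from the reward head of the EMCN critic; the exponent and epsilon are assumed to mirror standard prioritized experience replay.

    import numpy as np

    def rpe_priorities(pred_r, actual_r, alpha=0.6, eps=1e-3):
        # Reward prediction error as the priority signal for sampling.
        rpe = np.abs(pred_r - actual_r) + eps
        p = rpe ** alpha
        return p / p.sum()   # sampling distribution over buffer transitions

    # idx = rng.choice(buffer_size, size=batch, p=rpe_priorities(pred_r, actual_r))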

new AlphaAdam: Asynchronous Masked Optimization with Dynamic Alpha for Selective Updates

Authors: Da Chang, Yu Li, Ganzhao Yuan

Abstract: In the training of large language models (LLMs), updating parameters more efficiently and stably has always been an important challenge. To achieve efficient parameter updates, existing methods usually achieve performance comparable to full parameter updates through methods such as low-dimensional decomposition or layer-wise selective updates. In this work, we propose AlphaAdam, an optimization framework for LLMs from the perspective of intra-layer parameter updates. By decoupling parameter updates and dynamically adjusting their strength, AlphaAdam accelerates convergence and improves training stability. We construct parameter masks based on the consistency of historical momentum and gradient direction and combine them with an adaptive mask-strength strategy to ensure efficient optimization and theoretical convergence guarantees, which is also applicable to most momentum-based optimizers. Extensive experiments show that AlphaAdam outperforms state-of-the-art methods such as AdamW in terms of convergence speed and computational efficiency across tasks, including GPT-2 pre-training and the fine-tuning of RoBERTa and Llama-7B. AlphaAdam implements an optimizer enhancement framework for LLMs through intra-layer asynchronous masked adaptive updates. Our code is available at https://github.com/MaeChd/AlphaAdam

URLs: https://github.com/MaeChd/AlphaAdam
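
A hedged sketch of the masking idea follows; the sign-agreement test and the form of the adaptive strength are assumptions, and the paper's schedule for alpha and its convergence analysis are its actual contributions.

    import torch

    def alpha_mask(momentum, grad, alpha=1.0):
        # Keep updates where historical momentum and the current gradient agree
        # in sign; alpha interpolates from no masking (0) to a hard mask (1).
        agree = (momentum.sign() == grad.sign()).float()
        return 1.0 - alpha * (1.0 - agree)

    # inside an Adam-style step:  p.data -= lr * alpha_mask(m, g) * adam_update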

new Scaling Inference-Efficient Language Models

Authors: Song Bian, Minghao Yan, Shivaram Venkataraman

Abstract: Scaling laws are powerful tools to predict the performance of large language models. However, current scaling laws fall short of accounting for inference costs. In this work, we first show that model architecture affects inference latency, where models of the same size can have up to a 3.5x difference in latency. To tackle this challenge, we modify the Chinchilla scaling laws to co-optimize the model parameter count, the number of training tokens, and the model architecture. Because models with similar training loss can exhibit gaps in downstream evaluation, we also propose a novel method to train inference-efficient models based on the revised scaling laws. We perform extensive empirical studies to fit and evaluate our inference-aware scaling laws. We vary model parameters from 80M to 1B, training tokens from 1.6B to 30B, and model shapes, training a total of 63 models. Guided by our inference-efficient scaling law and model selection method, we release the Morph-1B model, which improves inference latency by 1.8x while maintaining accuracy on downstream tasks compared to open-source models, pushing the Pareto frontier of the accuracy-latency tradeoff.
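
For reference, the Chinchilla form being revised expresses loss in terms of parameter count $N$ and training tokens $D$ as

\[ L(N, D) = E + \frac{A}{N^{\alpha}} + \frac{B}{D^{\beta}}, \]

and the modification described above additionally conditions on model shape (e.g. depth and width) so that, for a target loss, one can choose the architecture that minimizes inference latency. The exact form of the architecture-dependent term is the paper's contribution and is not reproduced here.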

new ACTGNN: Assessment of Clustering Tendency with Synthetically-Trained Graph Neural Networks

Authors: Yiran Luo, Evangelos E. Papalexakis

Abstract: Determining clustering tendency in datasets is a fundamental but challenging task, especially in noisy or high-dimensional settings where traditional methods, such as the Hopkins Statistic and Visual Assessment of Tendency (VAT), often struggle to produce reliable results. In this paper, we propose ACTGNN, a graph-based framework designed to assess clustering tendency by leveraging graph representations of data. Node features are constructed using Locality-Sensitive Hashing (LSH), which captures local neighborhood information, while edge features incorporate multiple similarity metrics, such as the Radial Basis Function (RBF) kernel, to model pairwise relationships. A Graph Neural Network (GNN) is trained exclusively on synthetic datasets, enabling robust learning of clustering structures under controlled conditions. Extensive experiments demonstrate that ACTGNN significantly outperforms baseline methods on both synthetic and real-world datasets, exhibiting superior performance in detecting faint clustering structures, even in high-dimensional or noisy data. Our results highlight the generalizability and effectiveness of the proposed approach, making it a promising tool for robust clustering tendency assessment.

new VQLTI: Long-Term Tropical Cyclone Intensity Forecasting with Physical Constraints

Authors: Xinyu Wang, Lei Liu, Kang Chen, Tao Han, Bin Li, Lei Bai

Abstract: Tropical cyclone (TC) intensity forecasting is crucial for early disaster warning and emergency decision-making. Numerous researchers have explored deep-learning methods to address computational and post-processing issues in operational forecasting. Regrettably, they exhibit subpar long-term forecasting capabilities. We use two strategies to enhance long-term forecasting. (1) By enhancing the matching between TC intensity and spatial information, we can improve long-term forecasting performance. (2) Incorporating physical knowledge and physical constraints can help mitigate the accumulation of forecasting errors. To achieve the above strategies, we propose the VQLTI framework. VQLTI transfers the TC intensity information to a discrete latent space while retaining the spatial information differences, using large-scale spatial meteorological data as conditions. Furthermore, we leverage the forecast from the weather prediction model FengWu to provide additional physical knowledge for VQLTI. Additionally, we calculate the potential intensity (PI) to impose physical constraints on the latent variables. In global long-term TC intensity forecasting, VQLTI achieves state-of-the-art results for lead times from 24h to 120h, with the MSW (Maximum Sustained Wind) forecast error reduced by 35.65%-42.51% compared to ECMWF-IFS.

new Battery State of Health Estimation Using LLM Framework

Authors: Aybars Yunusoglu, Dexter Le, Karn Tiwari, Murat Isik, I. Can Dikmen

Abstract: Battery health monitoring is critical for the efficient and reliable operation of electric vehicles (EVs). This study introduces a transformer-based framework for estimating the State of Health (SoH) and predicting the Remaining Useful Life (RUL) of lithium titanate (LTO) battery cells by utilizing both cycle-based and instantaneous discharge data. Testing eight LTO cells under various cycling conditions over 500 cycles, we demonstrate the impact of charge durations on energy storage trends and apply Differential Voltage Analysis (DVA) to monitor capacity changes (dQ/dV) across voltage ranges. Our LLM-based framework achieves superior performance, with a Mean Absolute Error (MAE) as low as 0.87% and latency characteristics that support efficient processing, demonstrating its strong potential for real-time integration into EVs. The framework effectively identifies early signs of degradation through anomaly detection in high-resolution data, facilitating predictive maintenance to prevent sudden battery failures and enhance energy efficiency.

new Tensor Completion for Surrogate Modeling of Material Property Prediction

Authors: Shaan Pakala, Dawon Ahn, Evangelos Papalexakis

Abstract: When designing materials to optimize certain properties, there are often many possible configurations of designs that need to be explored. For example, the materials' composition of elements will affect properties such as strength or conductivity, which are necessary to know when developing new materials. Exploring all combinations of elements to find optimal materials becomes very time consuming, especially when there are more design variables. For this reason, there is growing interest in using machine learning (ML) to predict a material's properties. In this work, we model the optimization of certain material properties as a tensor completion problem, to leverage the structure of our datasets and navigate the vast number of combinations of material configurations. Across a variety of material property prediction tasks, our experiments show tensor completion methods achieving 10-20% decreased error compared with baseline ML models such as GradientBoosting and Multilayer Perceptron (MLP), while maintaining similar training speed.
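
A minimal sketch of the casting: assume a 3-way design tensor (say, element A x element B x processing condition -> property) with a boolean mask of measured entries, and fit a plain CP factorization on the observed entries only; the actual tensor layout and the completion methods benchmarked in the paper may differ.

    import torch

    def cp_complete(T, observed, rank=8, steps=2000, lr=1e-2):
        # T: (I, J, K) property tensor; observed: boolean mask of measurements.
        factors = [torch.randn(n, rank, requires_grad=True) for n in T.shape]
        opt = torch.optim.Adam(factors, lr=lr)
        for _ in range(steps):
            approx = torch.einsum('ir,jr,kr->ijk', *factors)
            loss = ((approx - T)[observed] ** 2).mean()  # fit observed entries only
            opt.zero_grad(); loss.backward(); opt.step()
        return torch.einsum('ir,jr,kr->ijk', *factors).detach()  # filled-in tensor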

new B3C: A Minimalist Approach to Offline Multi-Agent Reinforcement Learning

Authors: Woojun Kim, Katia Sycara

Abstract: Overestimation arising from selecting unseen actions during policy evaluation is a major challenge in offline reinforcement learning (RL). A minimalist approach in the single-agent setting -- adding behavior cloning (BC) regularization to existing online RL algorithms -- has been shown to be effective; however, this approach is understudied in multi-agent settings. In particular, overestimation becomes worse in multi-agent settings due to the presence of multiple actions, resulting in the BC regularization-based approach easily suffering from either over-regularization or critic divergence. To address this, we propose a simple yet effective method, Behavior Cloning regularization with Critic Clipping (B3C), which clips the target critic value in policy evaluation based on the maximum return in the dataset and pushes the limit of the weight on the RL objective over BC regularization, thereby improving performance. Additionally, we leverage existing value factorization techniques, particularly non-linear factorization, which is understudied in offline settings. Integrated with non-linear value factorization, B3C outperforms state-of-the-art algorithms on various offline multi-agent benchmarks.
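
The critic-clipping component reduces to a one-line change in the Bellman target, sketched below with the dataset's maximum return as the cap; the discounting details are conventional assumptions.

    import torch

    def b3c_target(reward, gamma, next_q, done, max_dataset_return):
        # Clip the bootstrapped target so overestimation from unseen joint
        # actions cannot exceed any return actually observed in the dataset.
        target = reward + gamma * (1.0 - done) * next_q
        return torch.clamp(target, max=max_dataset_return)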

new Dual-Bounded Nonlinear Optimal Transport for Size Constrained Min Cut Clustering

Authors: Fangyuan Xie, Jinghui Yuan, Feiping Nie, Xuelong Li

Abstract: Min cut is an important graph partitioning method. However, current approaches to the min cut problem are slow, difficult to solve, and often converge to simple solutions. To address these issues, we relax the min cut problem into a dual-bounded constraint and, for the first time, treat the min cut problem as a dual-bounded nonlinear optimal transport problem. Additionally, we develop a method for solving dual-bounded nonlinear optimal transport based on the Frank-Wolfe method (abbreviated as DNF). Notably, DNF not only solves the size-constrained min cut problem but is also applicable to all dual-bounded nonlinear optimal transport problems. We prove that for convex problems satisfying Lipschitz smoothness, the DNF method can achieve a convergence rate of \(\mathcal{O}(\frac{1}{t})\). We apply the DNF method to the min cut problem and find that it achieves state-of-the-art performance in terms of both the loss function and clustering accuracy at the fastest speed, with a convergence rate of \(\mathcal{O}(\frac{1}{\sqrt{t}})\). Moreover, the DNF method for the size-constrained min cut problem requires no parameters and exhibits better stability.

new Faster Convergence of Riemannian Stochastic Gradient Descent with Increasing Batch Size

Authors: Kanata Oowada, Hideaki Iiduka

Abstract: Many models used in machine learning have become so large that even computing the full gradient of the loss function is impractical. This has made it necessary to efficiently train models using limited available information, such as the batch size and learning rate. We theoretically analyzed Riemannian stochastic gradient descent (RSGD) and found that using an increasing batch size leads to faster RSGD convergence than a constant batch size, not only with a constant learning rate but also with a decaying learning rate, such as cosine annealing decay or polynomial decay. In particular, RSGD has a better convergence rate $O(\frac{1}{\sqrt{T}})$ than the existing rate $O(\frac{\sqrt{\log T}}{\sqrt[4]{T}})$ with a diminishing learning rate, where $T$ is the number of iterations. Experiments on principal component analysis and low-rank matrix completion problems confirmed that, except for the MovieLens dataset with a constant learning rate, using a polynomial-growth or exponential-growth batch size results in better performance than a constant batch size.
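
The batch-size schedules compared above can be written down directly; the constants are placeholders, not the paper's tuned values.

    def batch_size(t, b0=32, kind="exp", rate=1.02, power=0.5, cap=4096):
        # Increasing schedules: exponential growth b0 * rate**t, or
        # polynomial growth b0 * (1 + t)**power, capped for memory.
        b = b0 * rate ** t if kind == "exp" else b0 * (1 + t) ** power
        return min(int(b), cap)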

new Continually Evolved Multimodal Foundation Models for Cancer Prognosis

Authors: Jie Peng, Shuang Zhou, Longwei Yang, Yiran Song, Mohan Zhang, Kaixiong Zhou, Feng Xie, Mingquan Lin, Rui Zhang, Tianlong Chen

Abstract: Cancer prognosis is a critical task that involves predicting patient outcomes and survival rates. To enhance prediction accuracy, previous studies have integrated diverse data modalities, such as clinical notes, medical images, and genomic data, leveraging their complementary information. However, existing approaches face two major limitations. First, they struggle to incorporate newly arrived data with varying distributions into training, such as patient records from different hospitals, thus rendering sub-optimal generalizability and limited utility in real-world applications. Second, most multimodal integration methods rely on simplistic concatenation or task-specific pipelines, which fail to capture the complex interdependencies across modalities. To address these limitations, we propose a continually evolving multimodal foundation model. Extensive experiments on the TCGA dataset demonstrate the effectiveness of our approach, highlighting its potential to advance cancer prognosis by enabling robust and adaptive multimodal integration.

new Advancing Personalized Federated Learning: Integrative Approaches with AI for Enhanced Privacy and Customization

Authors: Kevin Cooper, Michael Geller

Abstract: In the age of data-driven decision making, preserving privacy while providing personalized experiences has become paramount. Personalized Federated Learning (PFL) offers a promising framework by decentralizing the learning process, thus ensuring data privacy and reducing reliance on centralized data repositories. However, the integration of advanced Artificial Intelligence (AI) techniques within PFL remains underexplored. This paper proposes a novel approach that enhances PFL with cutting-edge AI methodologies including adaptive optimization, transfer learning, and differential privacy. We present a model that not only boosts the performance of individual client models but also ensures robust privacy-preserving mechanisms and efficient resource utilization across heterogeneous networks. Empirical results demonstrate significant improvements in model accuracy and personalization, along with stringent privacy adherence, as compared to conventional federated learning models. This work paves the way for a new era of truly personalized and privacy-conscious AI systems, offering significant implications for industries requiring compliance with stringent data protection regulations.

new Genetic Algorithm with Border Trades (GAB)

Authors: Qingchuan Lyu

Abstract: This paper introduces a novel approach to improving Genetic Algorithms (GA) in large or complex problem spaces by incorporating new chromosome patterns in the breeding process through border trade activities. These strategies increase chromosome diversity, preventing premature convergence and enhancing the GA's ability to explore the solution space more effectively. Empirical evidence demonstrates significant improvements in convergence behavior. This approach offers a promising pathway to addressing challenges in optimizing large or complex problem domains.

new In-Context Learning of Polynomial Kernel Regression in Transformers with GLU Layers

Authors: Haoyuan Sun, Ali Jadbabaie, Navid Azizan

Abstract: Transformer-based models have demonstrated remarkable ability in in-context learning (ICL), where they can adapt to unseen tasks from a prompt with a few examples, without requiring parameter updates. Recent research has provided insight into how linear Transformers can perform ICL by implementing gradient descent estimators. In particular, it has been shown that the optimal linear self-attention (LSA) mechanism can implement one step of gradient descent with respect to a linear least-squares objective when trained on random linear regression tasks. However, the theoretical understanding of ICL for nonlinear function classes remains limited. In this work, we address this gap by first showing that LSA is inherently restricted to solving linear least-squares objectives and thus, the solutions in prior works cannot readily extend to nonlinear ICL tasks. To overcome this limitation, drawing inspiration from modern architectures, we study a mechanism that combines LSA with GLU-like feed-forward layers and show that this allows the model to perform one step of gradient descent on a polynomial kernel regression. Further, we characterize the scaling behavior of the resulting Transformer model, highlighting the necessary model size to effectively handle quadratic ICL tasks. Our findings highlight the distinct roles of attention and feed-forward layers in nonlinear ICL and identify key challenges when extending ICL to nonlinear function classes.

new Neural Network Modeling of Microstructure Complexity Using Digital Libraries

Authors: Yingjie Zhao, Zhiping Xu

Abstract: Microstructure evolution in matter is often modeled numerically using field or level-set solvers, mirroring the dual representation of spatiotemporal complexity in terms of pixel or voxel data, and geometrical forms in vector graphics. Motivated by this analogy, as well as the structural and event-driven nature of artificial and spiking neural networks, respectively, we evaluate their performance in learning and predicting fatigue crack growth and Turing pattern development. Predictions are made based on digital libraries constructed from computer simulations, which can be replaced by experimental data to lift the mathematical overconstraints of physics. Our assessment suggests that the leaky integrate-and-fire neuron model offers superior predictive accuracy with fewer parameters and less memory usage, alleviating the accuracy-cost tradeoff in contrast to the common practices in computer vision tasks. Examination of network architectures shows that these benefits arise from its reduced weight range and sparser connections. The study highlights the capability of event-driven models in tackling problems with evolutionary bulk-phase and interface behaviors using the digital library approach.
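
For context, a common discrete-time form of the leaky integrate-and-fire update behind the spiking model is sketched below; the decay constant and soft-reset rule are conventional choices, not necessarily the paper's.

    import numpy as np

    def lif_step(v, x, beta=0.9, threshold=1.0):
        # v, x: arrays of membrane potentials and input currents.
        v = beta * v + x                          # leak, then integrate
        spike = (v >= threshold).astype(float)    # fire on threshold crossing
        v = v - threshold * spike                 # soft reset after a spike
        return v, spike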

new GDformer: Going Beyond Subsequence Isolation for Multivariate Time Series Anomaly Detection

Authors: Qingxiang Liu, Chenghao Liu, Sheng Sun, Di Yao, Yuxuan Liang

Abstract: Unsupervised anomaly detection of multivariate time series is a challenging task, given the requirement of deriving a compact detection criterion without access to the anomaly points. Existing methods are mainly based on reconstruction error or association divergence, which are both confined to isolated subsequences with limited horizons, hardly promising a unified series-level criterion. In this paper, we propose the Global Dictionary-enhanced Transformer (GDformer) with a renovated dictionary-based cross-attention mechanism to cultivate the global representations shared by all normal points in the entire series. Accordingly, the cross-attention maps reflect the correlation weights between the point and global representations, which naturally leads to a representation-wise similarity-based detection criterion. To foster a more compact detection boundary, prototypes are introduced to capture the distribution of normal point-global correlation weights. GDformer consistently achieves state-of-the-art unsupervised anomaly detection performance on five real-world benchmark datasets. Further experiments validate that the global dictionary has great transferability among various datasets. The code is available at https://github.com/yuppielqx/GDformer.

URLs: https://github.com/yuppielqx/GDformer
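
A single-head sketch of the dictionary-based cross attention (shapes and scaling assumed): time points query a small set of global dictionary entries shared across the entire series, and the attention map doubles as the point-to-global similarity used for detection.

    import torch
    import torch.nn.functional as F

    def dictionary_cross_attention(x, dictionary, Wq, Wk, Wv):
        # x: (T, d) series representation; dictionary: (m, d) with m << T.
        q = x @ Wq
        k, v = dictionary @ Wk, dictionary @ Wv
        attn = F.softmax(q @ k.T / k.shape[-1] ** 0.5, dim=-1)  # (T, m)
        return attn @ v, attn  # rows of attn: each point's global similarity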

new Fundamental Challenges in Evaluating Text2SQL Solutions and Detecting Their Limitations

Authors: Cedric Renggli, Ihab F. Ilyas, Theodoros Rekatsinas

Abstract: In this work, we dive into the fundamental challenges of evaluating Text2SQL solutions and highlight potential failure causes and the potential risks of relying on aggregate metrics in existing benchmarks. We identify two largely unaddressed limitations in current open benchmarks: (1) data quality issues in the evaluation data, mainly attributed to the lack of capturing the probabilistic nature of translating a natural language description into a structured query (e.g., NL ambiguity), and (2) the bias introduced by using different match functions as approximations for SQL equivalence. To put both limitations into context, we propose a unified taxonomy of all Text2SQL limitations that can lead to both prediction and evaluation errors. We then motivate the taxonomy by providing a survey of Text2SQL limitations using state-of-the-art Text2SQL solutions and benchmarks. We describe the causes of limitations with real-world examples and propose potential mitigation solutions for each category in the taxonomy. We conclude by highlighting the open challenges encountered when deploying such mitigation strategies or attempting to automatically apply the taxonomy.

new HKAN: Hierarchical Kolmogorov-Arnold Network without Backpropagation

Authors: Grzegorz Dudek, Tomasz Rodak

Abstract: This paper introduces the Hierarchical Kolmogorov-Arnold Network (HKAN), a novel network architecture that offers a competitive alternative to the recently proposed Kolmogorov-Arnold Network (KAN). Unlike KAN, which relies on backpropagation, HKAN adopts a randomized learning approach, where the parameters of its basis functions are fixed, and linear aggregations are optimized using least-squares regression. HKAN utilizes a hierarchical multi-stacking framework, with each layer refining the predictions from the previous one by solving a series of linear regression problems. This non-iterative training method simplifies computation and eliminates sensitivity to local minima in the loss function. Empirical results show that HKAN delivers comparable, if not superior, accuracy and stability relative to KAN across various regression tasks, while also providing insights into variable importance. The proposed approach seamlessly integrates theoretical insights with practical applications, presenting a robust and efficient alternative for neural network modeling.
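
The backpropagation-free recipe is easy to sketch: fix randomized basis functions, solve a least-squares problem for the linear read-out, and stack stages so each refines the previous predictions; the tanh basis and the particular stacking rule below are assumptions.

    import numpy as np

    def random_basis_stage(X, y, n_basis=64, seed=0):
        rng = np.random.default_rng(seed)
        W = rng.normal(size=(X.shape[1], n_basis))    # fixed, never trained
        b = rng.normal(size=n_basis)
        H = np.tanh(X @ W + b)                        # randomized basis outputs
        coef, *_ = np.linalg.lstsq(H, y, rcond=None)  # linear aggregation
        return H @ coef, (W, b, coef)

    X = np.random.default_rng(1).normal(size=(200, 5)); y = np.sin(X).sum(axis=1)
    pred1, _ = random_basis_stage(X, y)                                    # stage 1
    pred2, _ = random_basis_stage(np.column_stack([X, pred1]), y, seed=1)  # stage 2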

new Exploring Large Protein Language Models in Constrained Evaluation Scenarios within the FLIP Benchmark

Authors: Manuel F. Mollon, Joaquin Gonzalez-Rodriguez, Alicia Lozano-Diez, Daniel Ramos, Doroteo T. Toledano

Abstract: In this study, we expand upon the FLIP benchmark, designed for evaluating protein fitness prediction models on small, specialized prediction tasks, by assessing the performance of state-of-the-art large protein language models, including ESM-2 and SaProt, on the FLIP dataset. Unlike larger, more diverse benchmarks such as ProteinGym, which cover a broad spectrum of tasks, FLIP focuses on constrained settings where data availability is limited. This makes it an ideal framework for evaluating model performance in scenarios with scarce task-specific data. We investigate whether recent advances in protein language models lead to significant improvements in such settings. Our findings provide valuable insights into the performance of large-scale models in specialized protein prediction tasks.

new PDE-DKL: PDE-constrained deep kernel learning in high dimensionality

Authors: Weihao Yan, Christoph Brune, Mengwu Guo

Abstract: Many physics-informed machine learning methods for PDE-based problems rely on Gaussian processes (GPs) or neural networks (NNs). However, both face limitations when data are scarce and the dimensionality is high. Although GPs are known for their robust uncertainty quantification in low-dimensional settings, their computational complexity becomes prohibitive as the dimensionality increases. In contrast, while conventional NNs can accommodate high-dimensional input, they often require extensive training data and do not offer uncertainty quantification. To address these challenges, we propose a PDE-constrained Deep Kernel Learning (PDE-DKL) framework that combines deep learning and GPs under explicit PDE constraints. Specifically, NNs learn a low-dimensional latent representation of the high-dimensional PDE problem, reducing the complexity of the problem. GPs then perform kernel regression subject to the governing PDEs, ensuring accurate solutions and principled uncertainty quantification, even when available data are limited. This synergy unifies the strengths of both NNs and GPs, yielding high accuracy, robust uncertainty estimates, and computational efficiency for high-dimensional PDEs. Numerical experiments demonstrate that PDE-DKL achieves high accuracy with reduced data requirements. They highlight its potential as a practical, reliable, and scalable solver for complex PDE-based applications in science and engineering.

new Reducing Aleatoric and Epistemic Uncertainty through Multi-modal Data Acquisition

Authors: Arthur Hoarau, Benjamin Quost, Sébastien Destercke, Willem Waegeman

Abstract: To generate accurate and reliable predictions, modern AI systems need to combine data from multiple modalities, such as text, images, audio, spreadsheets, and time series. Multi-modal data introduces new opportunities and challenges for disentangling uncertainty: it is commonly assumed in the machine learning community that epistemic uncertainty can be reduced by collecting more data, while aleatoric uncertainty is irreducible. However, this assumption is challenged in modern AI systems when information is obtained from different modalities. This paper introduces an innovative data acquisition framework where uncertainty disentanglement leads to actionable decisions, allowing sampling in two directions: sample size and data modality. The main hypothesis is that aleatoric uncertainty decreases as the number of modalities increases, while epistemic uncertainty decreases by collecting more observations. We provide proof-of-concept implementations on two multi-modal datasets to showcase our data acquisition framework, which combines ideas from active learning, active feature acquisition and uncertainty quantification.

new Pre-Trained Vision-Language Model Selection and Reuse for Downstream Tasks

Authors: Hao-Zhe Tan, Zhi Zhou, Lan-Zhe Guo, Yu-Feng Li

Abstract: Pre-trained Vision-Language Models (VLMs) are becoming increasingly popular across various visual tasks, and several open-sourced VLM variants have been released. However, selecting the best-performing pre-trained VLM for a specific downstream task is challenging since no single VLM can achieve promising performance on all downstream tasks, and evaluating all available VLMs is impossible due to time and data limitations. To address this problem, this paper proposes a novel paradigm to select and reuse VLMs for downstream tasks, called Model Label Learning (MLL). The proposal contains three key modules: model labeling, which assigns labels to each VLM to describe their specialty and utility; model selection, which matches the requirements of the target task with model labels; and model reuse, which applies selected VLMs to the target task in an ensemble manner. The proposal is highly computationally efficient and growable, since the model labeling process is independent of the target task and the capability of the approach can grow with the number of candidate VLMs. We also introduce a new benchmark for evaluating VLM selection methods, including 49 VLMs and 17 target task datasets. Experimental results clearly demonstrate the effectiveness of the proposed method for selecting and reusing VLMs.

new Sebra: Debiasing Through Self-Guided Bias Ranking

Authors: Adarsh Kappiyath, Abhra Chaudhuri, Ajay Jaiswal, Ziquan Liu, Yunpeng Li, Xiatian Zhu, Lu Yin

Abstract: Ranking samples by fine-grained estimates of spuriosity (the degree to which spurious cues are present) has recently been shown to significantly benefit bias mitigation, over the traditional binary biased-vs-unbiased partitioning of train sets. However, this spuriosity ranking comes with the requirement of human supervision. In this paper, we propose a debiasing framework based on our novel Self-Guided Bias Ranking (Sebra), that mitigates biases (spurious correlations) via an automatic ranking of data points by spuriosity within their respective classes. Sebra leverages a key local symmetry in Empirical Risk Minimization (ERM) training: the ease of learning a sample via ERM correlates with its spuriosity; the fewer spurious correlations a sample exhibits, the harder it is to learn, and vice versa. However, globally across iterations, ERM tends to deviate from this symmetry. Sebra dynamically steers ERM to correct this deviation, facilitating the sequential learning of attributes in increasing order of difficulty, i.e., decreasing order of spuriosity. As a result, the sequence in which Sebra learns samples naturally provides spuriosity rankings. We use the resulting fine-grained bias characterization in a contrastive learning framework to mitigate biases from multiple sources. Extensive experiments show that Sebra consistently outperforms previous state-of-the-art unsupervised debiasing techniques across multiple standard benchmarks, including UrbanCars, BAR, CelebA, and ImageNet-1K. Code, pre-trained models, and training logs are available at https://kadarsh22.github.io/sebra_iclr25/.

URLs: https://kadarsh22.github.io/sebra_iclr25/

new ReactEmbed: A Cross-Domain Framework for Protein-Molecule Representation Learning via Biochemical Reaction Networks

Authors: Amitay Sicherman, Kira Radinsky

Abstract: The challenge in computational biology and drug discovery lies in creating comprehensive representations of proteins and molecules that capture their intrinsic properties and interactions. Traditional methods often focus on unimodal data, such as protein sequences or molecular structures, limiting their ability to capture complex biochemical relationships. This work enhances these representations by integrating biochemical reactions encompassing interactions between molecules and proteins. By leveraging reaction data alongside pre-trained embeddings from state-of-the-art protein and molecule models, we develop ReactEmbed, a novel method that creates a unified embedding space through contrastive learning. We evaluate ReactEmbed across diverse tasks, including drug-target interaction, protein-protein interaction, protein property prediction, and molecular property prediction, consistently surpassing all current state-of-the-art models. Notably, we showcase ReactEmbed's practical utility through successful implementation in lipid nanoparticle-based drug delivery, enabling zero-shot prediction of blood-brain barrier permeability for protein-nanoparticle complexes. The code and comprehensive database of reaction pairs are available for open use at https://github.com/amitaysicherman/ReactEmbed.

URLs: https://github.com/amitaysicherman/ReactEmbed

new Leveraging Sparsity for Sample-Efficient Preference Learning: A Theoretical Perspective

Authors: Yunzhen Yao, Lie He, Michael Gastpar

Abstract: This paper considers the sample-efficiency of preference learning, which models and predicts human choices based on comparative judgments. The minimax optimal estimation rate $\Theta(d/n)$ in traditional estimation theory requires that the number of samples $n$ scales linearly with the dimensionality of the feature space $d$. However, the high dimensionality of the feature space and the high cost of collecting human-annotated data challenge the efficiency of traditional estimation methods. To remedy this, we leverage sparsity in the preference model and establish sharp estimation rates. We show that under the sparse random utility model, where the parameter of the reward function is $k$-sparse, the minimax optimal rate can be reduced to $\Theta(k/n \log(d/k))$. Furthermore, we analyze the $\ell_{1}$-regularized estimator and show that it achieves near-optimal rate under mild assumptions on the Gram matrix. Experiments on synthetic data and LLM alignment data validate our theoretical findings, showing that sparsity-aware methods significantly reduce sample complexity and improve prediction accuracy.

new Update Estimation and Scheduling for Over-the-Air Federated Learning with Energy Harvesting Devices

Authors: Furkan Bagci, Busra Tegin, Mohammad Kazemi, Tolga M. Duman

Abstract: We study over-the-air (OTA) federated learning (FL) for energy harvesting devices with heterogeneous data distributions over a wireless fading multiple access channel (MAC). To address the impact of low energy arrivals and data heterogeneity on global learning, we propose user scheduling strategies. Specifically, we develop two approaches: 1) entropy-based scheduling for known data distributions and 2) least-squares-based user representation estimation for scheduling with unknown data distributions at the parameter server. Both methods aim to select diverse users, mitigating bias and enhancing convergence. Numerical and analytical results demonstrate improved learning performance by reducing redundancy and conserving energy.
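As an illustration of the first strategy, the following is a hedged sketch of one plausible form of entropy-based scheduling when per-user label distributions are known at the parameter server; the greedy rule, energy threshold, and all names are our assumptions, not the paper's algorithm:

```python
# Hedged sketch: greedily pick users whose combined label distribution has
# maximal entropy, among users with sufficient harvested energy.
import numpy as np

def entropy(p):
    p = p[p > 0]
    return -np.sum(p * np.log(p))

def schedule(user_dists, energy, m, e_min=0.1):
    """Pick m sufficiently-energized users maximizing mixture label entropy."""
    chosen, mix = [], np.zeros(user_dists.shape[1])
    candidates = [u for u in range(len(user_dists)) if energy[u] >= e_min]
    for _ in range(m):
        best = max(candidates,
                   key=lambda u: entropy((mix + user_dists[u]) /
                                         (mix + user_dists[u]).sum()))
        chosen.append(best); mix += user_dists[best]
        candidates.remove(best)
    return chosen

rng = np.random.default_rng(1)
dists = rng.dirichlet(np.full(10, 0.3), size=20)   # 20 users, 10 classes
picked = schedule(dists, energy=rng.random(20), m=5)
print("scheduled users:", picked)
```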

new Efficient Neural Theorem Proving via Fine-grained Proof Structure Analysis

Authors: Haoxiong Liu, Jiacheng Sun, Zhenguo Li, Andrew C Yao

Abstract: The synergy between deep learning models and traditional automation tools plays a pivotal role in developing robust neural theorem provers (NTPs). However, for proof synthesis with LLMs, previous work applies automation tools either only when the model explicitly calls the method, or only at a single granularity level, failing to fully exploit the power of built-in tactics and off-the-shelf automated theorem provers. In this work, we propose ProofAug, a novel theorem proving method that enjoys superior sample efficiency through equipping proof-generation LLMs with automation methods in different granularities via fine-grained structure analysis of model-generated proof proposals. Furthermore, ProofAug serves as a versatile plug-and-play module that seamlessly integrates with any tree-search algorithm, enabling our construction of an efficient recursive proving (ERP) module to further enhance performance. The superiority of our method is validated on the miniF2F-test benchmark using the open-source deepseek-math-7b-base model and the Isabelle proof assistant. Notably, by additionally employing a mixed prompting strategy, we achieve a cumulative pass rate of 66.0% after curation of the dataset (61.9% for the original version), setting a new SOTA across all proof languages with a total sample budget of only 2100. Our code is available at https://github.com/haoxiongliu/ProofAug.

URLs: https://github.com/haoxiongliu/ProofAug.

new A Unified Perspective on the Dynamics of Deep Transformers

Authors: Val\'erie Castin, Pierre Ablin, Jos\'e Antonio Carrillo, Gabriel Peyr\'e

Abstract: Transformers, which are state-of-the-art in most machine learning tasks, represent the data as sequences of vectors called tokens. This representation is then exploited by the attention function, which learns dependencies between tokens and is key to the success of Transformers. However, the iterative application of attention across layers induces complex dynamics that remain to be fully understood. To analyze these dynamics, we identify each input sequence with a probability measure and model its evolution as a Vlasov equation called the Transformer PDE, whose velocity field is non-linear in the probability measure. Our first set of contributions focuses on compactly supported initial data. We show the Transformer PDE is well-posed and is the mean-field limit of an interacting particle system, thus generalizing and extending previous analysis to several variants of self-attention: multi-head attention, L2 attention, Sinkhorn attention, Sigmoid attention, and masked attention, leveraging a conditional Wasserstein framework. In a second set of contributions, we are the first to study non-compactly supported initial conditions, by focusing on Gaussian initial data. Again for different types of attention, we show that the Transformer PDE preserves the space of Gaussian measures, which allows us to analyze the Gaussian case theoretically and numerically to identify typical behaviors. This Gaussian analysis captures the evolution of data anisotropy through a deep Transformer. In particular, we highlight a clustering phenomenon that parallels previous results in the non-normalized discrete case.

new Stream-Based Monitoring of Algorithmic Fairness

Authors: Jan Baumeister, Bernd Finkbeiner, Frederik Scheerer, Julian Siber, Tobias Wagenpfeil

Abstract: Automatic decision and prediction systems are increasingly deployed in applications where they significantly impact the livelihood of people, such as for predicting the creditworthiness of loan applicants or the recidivism risk of defendants. These applications have given rise to a new class of algorithmic-fairness specifications that require the systems to decide and predict without bias against social groups. Verifying these specifications statically is often out of reach for realistic systems, since the systems may, e.g., employ complex learning components, and reason over a large input space. In this paper, we therefore propose stream-based monitoring as a solution for verifying the algorithmic fairness of decision and prediction systems at runtime. Concretely, we present a principled way to formalize algorithmic fairness over temporal data streams in the specification language RTLola and demonstrate the efficacy of this approach on a number of benchmarks. Besides synthetic scenarios that particularly highlight its efficiency on streams with a scaling amount of data, we notably evaluate the monitor on real-world data from the recidivism prediction tool COMPAS.
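The monitoring idea can be illustrated with a small sketch. The paper writes specifications in RTLola; the Python below is not RTLola, but it mimics one such property, a streaming demographic-parity check with an alarm threshold (the group encoding and tolerance are illustrative assumptions):

```python
# Illustrative stream monitor: track the demographic parity gap between two
# groups over a stream of (group, decision) events and raise an alarm.
class ParityMonitor:
    def __init__(self, tolerance=0.1):
        self.pos = {0: 0, 1: 0}   # positive decisions per group
        self.tot = {0: 0, 1: 0}   # total decisions per group
        self.tolerance = tolerance

    def update(self, group, decision):
        self.tot[group] += 1
        self.pos[group] += int(decision)
        if min(self.tot.values()) == 0:
            return None                      # not enough data yet
        rates = {g: self.pos[g] / self.tot[g] for g in (0, 1)}
        gap = abs(rates[0] - rates[1])
        return gap > self.tolerance          # True triggers an alarm

monitor = ParityMonitor(tolerance=0.1)
stream = [(0, 1), (1, 0), (0, 1), (1, 0), (1, 0), (0, 1)]
for g, d in stream:
    alarm = monitor.update(g, d)
print("alarm raised:", alarm)
```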

new Transfer Learning of Surrogate Models: Integrating Domain Warping and Affine Transformations

Authors: Shuaiqun Pan, Diederick Vermetten, Manuel L\'opez-Ib\'a\~nez, Thomas B\"ack, Hao Wang

Abstract: Surrogate models provide efficient alternatives to computationally demanding real-world processes but often require large datasets for effective training. A promising solution to this limitation is the transfer of pre-trained surrogate models to new tasks. Previous studies have investigated the transfer of differentiable and non-differentiable surrogate models, typically assuming an affine transformation between the source and target functions. This paper extends previous research by addressing a broader range of transformations, including linear and nonlinear variations. Specifically, we consider the combination of an unknown input warping, such as one modelled by the beta cumulative distribution function, with an unspecified affine transformation. Our approach achieves transfer learning by employing a limited number of data points from the target task to optimize these transformations, minimizing empirical loss on the transfer dataset. We validate the proposed method on the widely used Black-Box Optimization Benchmark (BBOB) testbed and a real-world transfer learning task from the automobile industry. The results underscore the significant advantages of the approach, revealing that the transferred surrogate significantly outperforms both the original surrogate and the one built from scratch using the transfer dataset, particularly in data-scarce scenarios.
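A minimal sketch of the transfer step, under our own assumptions about the parameterization (a one-dimensional beta-CDF input warp composed with an affine map on the output), fitted to a handful of target points:

```python
# Hedged sketch: fit the few warp/affine parameters of a transferred surrogate
# by minimizing empirical loss on a small target-task dataset.
import numpy as np
from scipy.optimize import minimize
from scipy.stats import beta

source = lambda x: np.sin(3 * x)                  # stand-in pretrained surrogate
target = lambda x: 2.0 * np.sin(3 * beta.cdf(x, 2.0, 5.0)) + 1.0

def transferred(params, x):
    a, b, s, c = params
    return s * source(beta.cdf(x, a, b)) + c      # warp input, affine on output

x_few = np.linspace(0.05, 0.95, 8)                # small transfer dataset
y_few = target(x_few)
loss = lambda p: np.mean((transferred(p, x_few) - y_few) ** 2)
res = minimize(loss, x0=[1.0, 1.0, 1.0, 0.0],
               bounds=[(0.1, 10), (0.1, 10), (-5, 5), (-5, 5)])
print("fitted (a, b, scale, offset):", np.round(res.x, 2))
```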

new State Stream Transformer (SST) : Emergent Metacognitive Behaviours Through Latent State Persistence

Authors: Thea Aviss

Abstract: We introduce the State Stream Transformer (SST), a novel LLM architecture that reveals emergent reasoning behaviours and capabilities latent in pretrained weights through addressing a fundamental limitation in traditional transformer models: the lack of latent computational continuity across autoregressive generations in the state space. SST introduces a sliding window latent state (FFN) cache with weighted decay that maintains and evolves persistent latent processes throughout autoregressive generations. Through controlled experiments comparing base and SST architectures using the same frozen weights, we demonstrate that this architectural modification alone enables enhanced reasoning capabilities which appear best explained by some form of higher-order processing, as evidenced by emergent metacognitive behaviours. These behaviours persist under controlled conditions designed to eliminate confounding factors such as stochastic variation or learned response patterns. Analysis of latent state distributions and processing dynamics provides evidence that it is solely the 'state stream' that is responsible for these phenomena. In quantitative evaluations, the SST achieves substantial performance improvements over the base model on two reasoning benchmarks, reaching 89.01\% accuracy on GSM-8K (0-shot) and 91.04\% on ARC Challenge (0-shot CoT). These findings indicate that persistent computation in the latent state space enables fundamentally different information processing and internal reasoning strategies, with implications for our understanding of artificial intelligence systems.
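A minimal sketch of what a sliding-window latent state cache with weighted decay could look like (our reading of the mechanism, not the authors' implementation; window size, decay factor, and dimensions are illustrative):

```python
# Hedged sketch: blend each new FFN hidden state into a persistent latent
# state that is carried across autoregressive steps, with older states
# receiving exponentially smaller weights.
from collections import deque
import numpy as np

class LatentStateCache:
    def __init__(self, window=8, decay=0.9):
        self.decay = decay
        self.states = deque(maxlen=window)   # sliding window of FFN states

    def update(self, ffn_state):
        self.states.append(ffn_state)
        # Newest state gets weight decay**0, oldest gets the smallest weight.
        weights = np.array([self.decay ** i
                            for i in range(len(self.states) - 1, -1, -1)])
        weights /= weights.sum()
        return sum(w * s for w, s in zip(weights, self.states))

cache = LatentStateCache()
for step in range(5):                        # mock autoregressive generation
    persistent = cache.update(np.random.randn(16))
print("persistent state norm:", round(float(np.linalg.norm(persistent)), 3))
```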

new Contrastive Learning Meets Pseudo-label-assisted Mixup Augmentation: A Comprehensive Graph Representation Framework from Local to Global

Authors: Jinlu Wang, Yanfeng Sun, Jiapu Wang, Junbin Gao, Shaofan Wang, Jipeng Guo

Abstract: Graph Neural Networks (GNNs) have demonstrated remarkable effectiveness in various graph representation learning tasks. However, most existing GNNs focus primarily on capturing local information through explicit graph convolution, often neglecting global message-passing. This limitation hinders the establishment of a collaborative interaction between global and local information, which is crucial for comprehensively understanding graph data. To address these challenges, we propose a novel framework called Comprehensive Graph Representation Learning (ComGRL). ComGRL integrates local information into global information to derive powerful representations. It achieves this by implicitly smoothing local information through flexible graph contrastive learning, ensuring reliable representations for subsequent global exploration. Then ComGRL transfers the locally derived representations to a multi-head self-attention module, enhancing their discriminative ability by uncovering diverse and rich global correlations. To further optimize local information dynamically under the self-supervision of pseudo-labels, ComGRL employs a triple sampling strategy to construct mixed node pairs and applies reliable Mixup augmentation across attributes and structure for local contrastive learning. This approach broadens the receptive field and facilitates coordination between local and global representation learning, enabling them to reinforce each other. Experimental results across six widely used graph datasets demonstrate that ComGRL achieves excellent performance in node classification tasks. The code is available at https://github.com/JinluWang1002/ComGRL.

URLs: https://github.com/JinluWang1002/ComGRL.

new Robust Online Conformal Prediction under Uniform Label Noise

Authors: Huajun Xi, Kangdao Liu, Hao Zeng, Wenguang Sun, Hongxin Wei

Abstract: Conformal prediction is an emerging technique for uncertainty quantification that constructs prediction sets guaranteed to contain the true label with a predefined probability. Recent work develops online conformal prediction methods that adaptively construct prediction sets to accommodate distribution shifts. However, existing algorithms typically assume perfect label accuracy, which rarely holds in practice. In this work, we investigate the robustness of online conformal prediction under uniform label noise with a known noise rate, in both constant and dynamic learning rate schedules. We show that label noise causes a persistent gap between the actual mis-coverage rate and the desired rate $\alpha$, leading to either overestimated or underestimated coverage guarantees. To address this issue, we propose Noise Robust Online Conformal Prediction (dubbed NR-OCP) by updating the threshold with a novel robust pinball loss, which provides an unbiased estimate of the clean pinball loss without requiring ground-truth labels. Our theoretical analysis shows that NR-OCP eliminates the coverage gap in both constant and dynamic learning rate schedules, achieving a convergence rate of $\mathcal{O}(T^{-1/2})$ for both empirical and expected coverage errors under uniform label noise. Extensive experiments demonstrate the effectiveness of our method by achieving both precise coverage and improved efficiency.
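To illustrate the debiasing idea, here is a hedged sketch of an online threshold update with a noise-corrected miscoverage estimate. The paper's update uses a robust pinball loss; the simpler indicator-based (ACI-style) update below, the mock data, and the assumption that the noisy label is resampled uniformly over all $K$ classes are ours:

```python
# Hedged sketch: debias the observed miscoverage indicator under uniform
# label noise with known rate eps, then do an ACI-style threshold update.
import numpy as np

def nr_update(q, miss_noisy, set_size, K, eps, alpha, lr):
    # Under uniform resampling over all K classes:
    # E[miss_noisy] = (1 - eps) * miss_clean + eps * (1 - set_size / K),
    # so the clean miscoverage has the unbiased estimate below.
    miss_clean = (miss_noisy - eps * (1.0 - set_size / K)) / (1.0 - eps)
    return q + lr * (miss_clean - alpha)    # grow threshold when undercovering

rng = np.random.default_rng(0)
q, K, eps, alpha = 0.5, 10, 0.2, 0.1
for t in range(1000):
    scores = rng.random(K)                   # mock per-class conformity scores
    pred_set = np.flatnonzero(scores <= q)   # include classes below threshold
    y_noisy = rng.integers(K)                # mock noisy observed label
    miss = float(y_noisy not in pred_set)
    q = nr_update(q, miss, len(pred_set), K, eps, alpha, lr=0.01)
print("final threshold:", round(q, 3))
```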

new A Learnable Multi-views Contrastive Framework with Reconstruction Discrepancy for Medical Time-Series

Authors: Yifan Wang, Hongfeng Ai, Ruiqi Li, Maowei Jiang, Cheng Jiang, Chenzhong Li

Abstract: In medical time series disease diagnosis, two key challenges are identified. First, the high annotation cost of medical data leads to overfitting in models trained on label-limited, single-center datasets. To address this, we propose incorporating external data from related tasks and leveraging AE-GAN to extract prior knowledge, providing valuable references for downstream tasks. Second, many existing studies employ contrastive learning to derive more generalized medical sequence representations for diagnostic tasks, usually relying on manually designed diverse positive and negative sample pairs. However, these approaches are complex, lack generalizability, and fail to adaptively capture disease-specific features across different conditions. To overcome this, we introduce LMCF (Learnable Multi-views Contrastive Framework), a framework that integrates a multi-head attention mechanism and adaptively learns representations from different views through inter-view and intra-view contrastive learning strategies. Additionally, the pre-trained AE-GAN is used to reconstruct discrepancies in the target data as disease probabilities, which are then integrated into the contrastive learning process. Experiments on three target datasets demonstrate that our method consistently outperforms seven other baselines, highlighting its significant impact on healthcare applications such as the diagnosis of myocardial infarction, Alzheimer's disease, and Parkinson's disease.

new A Cartesian Encoding Graph Neural Network for Crystal Structures Property Prediction: Application to Thermal Ellipsoid Estimation

Authors: \`Alex Sol\'e, Albert Mosella-Montoro, Joan Cardona, Silvia G\'omez-Coca, Daniel Aravena, Eliseo Ruiz, Javier Ruiz-Hidalgo

Abstract: In diffraction-based crystal structure analysis, thermal ellipsoids, quantified via Anisotropic Displacement Parameters (ADPs), are critical yet challenging to determine. ADPs capture atomic vibrations, reflecting thermal and structural properties, but traditional computation is often expensive. This paper introduces CartNet, a novel graph neural network (GNN) for efficiently predicting crystal properties by encoding atomic geometry into Cartesian coordinates alongside the crystal temperature. CartNet integrates a neighbour equalization technique to emphasize covalent and contact interactions, and a Cholesky-based head to ensure valid ADP predictions. We also propose a rotational SO(3) data augmentation strategy during training to handle unseen orientations. An ADP dataset with over 200,000 experimental crystal structures from the Cambridge Structural Database (CSD) was curated to validate the approach. CartNet significantly reduces computational costs and outperforms existing methods in ADP prediction by 10.87%, while delivering a 34.77% improvement over theoretical approaches. We further evaluated CartNet on other datasets covering formation energy, band gap, total energy, energy above the convex hull, bulk moduli, and shear moduli, achieving 7.71% better results on the Jarvis Dataset and 13.16% on the Materials Project Dataset. These gains establish CartNet as a state-of-the-art solution for diverse crystal property predictions. Project website and online demo: https://www.ee.ub.edu/cartnet

URLs: https://www.ee.ub.edu/cartnet
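A minimal sketch of a Cholesky-based output head (an assumption about how such a head can be built, not the CartNet code): the network predicts the six free entries of a lower-triangular factor $L$, so the returned ADP $U = LL^\top$ is symmetric positive semi-definite by construction:

```python
# Hedged sketch: an output head that guarantees valid (SPD) ADP matrices.
import torch
import torch.nn as nn
import torch.nn.functional as F

class CholeskyHead(nn.Module):
    def __init__(self, hidden_dim=64):
        super().__init__()
        self.proj = nn.Linear(hidden_dim, 6)       # 3 diagonal + 3 off-diagonal
        self.tril_idx = torch.tril_indices(3, 3)   # lower-triangle positions

    def forward(self, h):
        raw = self.proj(h)                                   # (batch, 6)
        L = h.new_zeros(h.shape[0], 3, 3)
        L[:, self.tril_idx[0], self.tril_idx[1]] = raw
        diag = torch.arange(3)
        # Softplus keeps the diagonal positive, making L a valid Cholesky factor.
        L[:, diag, diag] = F.softplus(L[:, diag, diag]) + 1e-6
        return L @ L.transpose(1, 2)                         # SPD ADP matrices

head = CholeskyHead()
adp = head(torch.randn(4, 64))
print("symmetric:", torch.allclose(adp, adp.transpose(1, 2)))
```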

new Function Encoders: A Principled Approach to Transfer Learning in Hilbert Spaces

Authors: Tyler Ingebrand, Adam J. Thorpe, Ufuk Topcu

Abstract: A central challenge in transfer learning is designing algorithms that can quickly adapt and generalize to new tasks without retraining. Yet, the conditions under which algorithms can effectively transfer to new tasks are poorly characterized. We introduce a geometric characterization of transfer in Hilbert spaces and define three types of inductive transfer: interpolation within the convex hull, extrapolation to the linear span, and extrapolation outside the span. We propose a method grounded in the theory of function encoders to achieve all three types of transfer. Specifically, we introduce a novel training scheme for function encoders using least-squares optimization, prove a universal approximation theorem for function encoders, and provide a comprehensive comparison with existing approaches such as transformers and meta-learning on four diverse benchmarks. Our experiments demonstrate that the function encoder outperforms state-of-the-art methods on four benchmark tasks and on all three types of transfer.
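The least-squares idea can be illustrated with a small sketch: represent a new function as a linear combination of $k$ basis functions and solve for the coefficients in closed form from a few examples. Here the learned basis is replaced by a fixed random-feature stand-in, which is an illustrative assumption:

```python
# Hedged sketch: closed-form least-squares coefficients over a function basis.
import numpy as np

rng = np.random.default_rng(0)
k = 32
W, b = rng.normal(size=(k, 1)), rng.uniform(0, 2 * np.pi, size=k)
basis = lambda x: np.cos(x @ W.T + b)         # (n, k) basis evaluations

# A handful of (x, y) examples defines the new task.
x = rng.uniform(-3, 3, size=(20, 1))
y = np.sin(2 * x[:, 0]) + 0.05 * rng.normal(size=20)

G = basis(x)
coef, *_ = np.linalg.lstsq(G, y, rcond=None)  # least-squares representation

x_test = np.linspace(-3, 3, 5).reshape(-1, 1)
print("predictions:", np.round(basis(x_test) @ coef, 2))
```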

new Improved Replicable Boosting with Majority-of-Majorities

Authors: Kasper Green Larsen, Markus Engelund Mathiasen, Clement Svendsen

Abstract: We introduce a new replicable boosting algorithm which significantly improves the sample complexity compared to previous algorithms. The algorithm works by doing two layers of majority voting, using an improved version of the replicable boosting algorithm introduced by Impagliazzo et al. [2022] in the bottom layer.

new Segmentation of cracks in 3d images of fiber reinforced concrete using deep learning

Authors: Anna Nowacka, Katja Schladitz, Szymon Grzesiak, Matthias Pahn

Abstract: Cracks in concrete structures are very common and are an integral part of this heterogeneous material. Characteristics of cracks induced by standardized tests yield valuable information about the tested concrete formulation and its mechanical properties. Observing cracks only on the surface of the concrete structure leaves a wealth of structural information unused. Computed tomography enables looking into the sample without interfering with or destroying the microstructure. The reconstructed tomographic images are 3d images, consisting of voxels whose gray values represent local X-ray absorption. In order to identify the voxels belonging to a crack, that is, to segment the crack structure in the images, appropriate algorithms need to be developed. Convolutional neural networks are known to solve this type of task very well given enough and consistent training data. We adapted a 3d version of the well-known U-Net and trained it on semi-synthetic 3d images of real concrete samples equipped with simulated crack structures. Here, we explain the general approach. Moreover, we show how to teach the network to also detect real crack systems in 3d images of varying types of real concrete, in particular of fiber reinforced concrete.

new Exploring Potential Prompt Injection Attacks in Federated Military LLMs and Their Mitigation

Authors: Youngjoon Lee, Taehyun Park, Yunho Lee, Jinu Gong, Joonhyuk Kang

Abstract: Federated Learning (FL) is increasingly being adopted in military collaborations to develop Large Language Models (LLMs) while preserving data sovereignty. However, prompt injection attacks, malicious manipulations of input prompts, pose new threats that may undermine operational security, disrupt decision-making, and erode trust among allies. This perspective paper highlights four potential vulnerabilities in federated military LLMs: secret data leakage, free-rider exploitation, system disruption, and misinformation spread. To address these potential risks, we propose a human-AI collaborative framework that introduces both technical and policy countermeasures. On the technical side, our framework uses red/blue team wargaming and quality assurance to detect and mitigate adversarial behaviors in shared LLM weights. On the policy side, it promotes joint AI-human policy development and verification of security protocols. Our findings will guide future research and emphasize proactive strategies for emerging military contexts.

new Causal Inference Real-Time Anomaly Detection with Synthetic Anomaly Monitoring (SAM)

Authors: Emanuele Luzio, Moacir Antonelli Ponti

Abstract: Anomaly detection is essential for identifying rare and significant events across diverse domains such as finance, cybersecurity, and network monitoring. This paper presents Synthetic Anomaly Monitoring (SAM), an innovative approach that applies synthetic control methods from causal inference to improve both the accuracy and interpretability of anomaly detection processes. By modeling normal behavior through the treatment of each feature as a control unit, SAM identifies anomalies as deviations within this causal framework. We conducted extensive experiments comparing SAM with established benchmark models, including Isolation Forest, Local Outlier Factor (LOF), k-Nearest Neighbors (kNN), and One-Class Support Vector Machine (SVM), across five diverse datasets, including Credit Card Fraud, HTTP Dataset CSIC 2010, and KDD Cup 1999, among others. Our results demonstrate that SAM consistently delivers robust performance, highlighting its potential as a powerful tool for real-time anomaly detection in dynamic and complex environments.
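A hedged sketch of the synthetic-control intuition (our simplification, not the authors' code): fit a model of each feature from the remaining features on assumed-normal data, then score test points by their standardized deviation from the synthetic counterpart:

```python
# Hedged sketch: per-feature synthetic controls for anomaly scoring.
import numpy as np
from sklearn.linear_model import Ridge

rng = np.random.default_rng(0)
X_train = rng.normal(size=(500, 5))                 # assumed-normal data

models, scales = [], []
for j in range(X_train.shape[1]):
    donors = np.delete(X_train, j, axis=1)
    m = Ridge(alpha=1.0).fit(donors, X_train[:, j]) # synthetic control for j
    resid = X_train[:, j] - m.predict(donors)
    models.append(m); scales.append(resid.std() + 1e-9)

def anomaly_score(X):
    score = np.zeros(X.shape[0])
    for j, (m, s) in enumerate(zip(models, scales)):
        resid = X[:, j] - m.predict(np.delete(X, j, axis=1))
        score += (resid / s) ** 2                   # standardized deviations
    return score

X_test = rng.normal(size=(3, 5)); X_test[0, 2] += 6.0   # inject one anomaly
print("scores:", np.round(anomaly_score(X_test), 1))
```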

new Guaranteed confidence-band enclosures for PDE surrogates

Authors: Ander Gray, Vignesh Gopakumar, Sylvain Rousseau, S\'ebastien Destercke

Abstract: We propose a method for obtaining statistically guaranteed confidence bands for functional machine learning techniques: surrogate models which map between function spaces, motivated by the need to build reliable PDE emulators. The method constructs nested confidence sets on a low-dimensional representation (an SVD) of the surrogate model's prediction error, and then maps these sets to the prediction space using set-propagation techniques. The result is conformal-like, coverage-guaranteed prediction sets for functional surrogate models. We use zonotopes as the basis of the set construction, owing to their well-studied set-propagation and verification properties. The method is model-agnostic and can thus be applied to complex Sci-ML models, including Neural Operators, but also in simpler settings. We also present a technique to capture the truncation error of the SVD, ensuring the guarantees of the method.

new MolGraph-xLSTM: A graph-based dual-level xLSTM framework with multi-head mixture-of-experts for enhanced molecular representation and interpretability

Authors: Yan Sun, Yutong Lu, Yan Yi Li, Zihao Jing, Carson K. Leung, Pingzhao Hu

Abstract: Predicting molecular properties is essential for drug discovery, and computational methods can greatly enhance this process. Molecular graphs have become a focus for representation learning, with Graph Neural Networks (GNNs) widely used. However, GNNs often struggle with capturing long-range dependencies. To address this, we propose MolGraph-xLSTM, a novel graph-based xLSTM model that enhances feature extraction and effectively models long-range molecular interactions. Our approach processes molecular graphs at two scales: atom-level and motif-level. For atom-level graphs, a GNN-based xLSTM framework with jumping knowledge extracts local features and aggregates multilayer information to capture both local and global patterns effectively. Motif-level graphs provide complementary structural information for a broader molecular view. Embeddings from both scales are refined via a multi-head mixture of experts (MHMoE), further enhancing expressiveness and performance. We validate MolGraph-xLSTM on 10 molecular property prediction datasets, covering both classification and regression tasks. Our model demonstrates consistent performance across all datasets, with improvements of up to 7.03% on the BBBP dataset for classification and 7.54% on the ESOL dataset for regression compared to baselines. On average, MolGraph-xLSTM achieves an AUROC improvement of 3.18% for classification tasks and an RMSE reduction of 3.83% across regression datasets compared to the baseline methods. These results confirm the effectiveness of our model, offering a promising solution for molecular representation learning for drug discovery.

new Clustering Properties of Self-Supervised Learning

Authors: Xi Weng, Jianing An, Xudong Ma, Binhang Qi, Jie Luo, Xi Yang, Jin Song Dong, Lei Huang

Abstract: Self-supervised learning (SSL) methods via joint embedding architectures have proven remarkably effective at capturing semantically rich representations with strong clustering properties, even in the absence of label supervision. Despite this, few of them have explored leveraging these untapped properties to improve themselves. In this paper, we provide evidence through various metrics that the encoder's output $encoding$ exhibits superior and more stable clustering properties compared to other components. Building on this insight, we propose a novel positive-feedback SSL method, termed Representation Soft Assignment (ReSA), which leverages the model's clustering properties to promote learning in a self-guided manner. Extensive experiments on standard SSL benchmarks reveal that models pretrained with ReSA outperform other state-of-the-art SSL methods by a significant margin. Finally, we analyze how ReSA facilitates better clustering properties, demonstrating that it effectively enhances clustering performance at both fine-grained and coarse-grained levels, shaping representations that are inherently more structured and semantically meaningful.

new CLoQ: Enhancing Fine-Tuning of Quantized LLMs via Calibrated LoRA Initialization

Authors: Yanxia Deng, Aozhong Zhang, Naigang Wang, Selcuk Gurses, Zi Yang, Penghang Yin

Abstract: Fine-tuning large language models (LLMs) using low-rank adaptation (LoRA) has become a highly efficient approach for downstream tasks, particularly in scenarios with limited computational resources. However, applying LoRA techniques to quantized LLMs poses unique challenges due to the reduced representational precision of quantized weights. In this paper, we introduce CLoQ (Calibrated LoRA initialization for Quantized LLMs), a simple initialization strategy designed to overcome these challenges. Our approach focuses on minimizing the layer-wise discrepancy between the original LLM and its quantized counterpart with LoRA components during initialization. By leveraging a small calibration dataset, CLoQ quantizes a pre-trained LLM and determines the optimal LoRA components for each layer, ensuring a strong foundation for subsequent fine-tuning. A key contribution of this work is a novel theoretical result that enables the accurate and closed-form construction of these optimal LoRA components. We validate the efficacy of CLoQ across multiple tasks such as language generation, arithmetic reasoning, and commonsense reasoning, demonstrating that it consistently outperforms existing LoRA fine-tuning methods for quantized LLMs, especially at ultra-low bit widths.
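A hedged sketch of the closed-form construction in the unweighted special case: by Eckart-Young, the rank-$r$ pair that best reconstructs the quantization residual $W - Q(W)$ in Frobenius norm is a truncated SVD. CLoQ's calibrated objective additionally weights this discrepancy using activation statistics, which the sketch (with its toy quantizer) omits:

```python
# Hedged sketch: rank-r LoRA initialization from the quantization residual.
import numpy as np

rng = np.random.default_rng(0)
W = rng.normal(size=(256, 256))

def quantize(W, n_bits=2):
    # Toy symmetric uniform quantizer, standing in for a real scheme.
    scale = np.abs(W).max() / (2 ** (n_bits - 1) - 0.5)
    return np.round(W / scale) * scale

Q = quantize(W)
r = 16                                          # LoRA rank
U, S, Vt = np.linalg.svd(W - Q, full_matrices=False)
B = U[:, :r] * S[:r]                            # (out, r)
A = Vt[:r]                                      # (r, in)

err_q = np.linalg.norm(W - Q)
err_lora = np.linalg.norm(W - Q - B @ A)
print(f"residual norm: {err_q:.1f} -> {err_lora:.1f} after rank-{r} correction")
```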

new WILDCHAT-50M: A Deep Dive Into the Role of Synthetic Data in Post-Training

Authors: Benjamin Feuer, Chinmay Hegde

Abstract: Large language model (LLM) post-training, from DPO to distillation, can refine behaviors and unlock new skills, but the open science supporting these post-training techniques is still in its infancy. One limiting factor has been the difficulty of conducting large-scale comparative analyses of synthetic data generating models and LLM judges. To close this gap, we introduce WILDCHAT-50M, the largest public chat dataset to date. We extend the existing WildChat dataset to include responses not only from GPT, but from over 50 different open-weight models, ranging in size from 0.5B to 104B parameters. We conduct an extensive comparative analysis and demonstrate the potential of this dataset by creating RE-WILD, our own public SFT mix, which outperforms the recent Tulu-3 SFT mixture from Allen AI with only 40% as many samples. Our dataset, samples and code are available at https://github.com/penfever/wildchat-50m.

URLs: https://github.com/penfever/wildchat-50m.

new Neural Discovery in Mathematics: Do Machines Dream of Colored Planes?

Authors: Konrad Mundinger, Max Zimmer, Aldo Kiem, Christoph Spiegel, Sebastian Pokutta

Abstract: We demonstrate how neural networks can drive mathematical discovery through a case study of the Hadwiger-Nelson problem, a long-standing open problem from discrete geometry and combinatorics about coloring the plane while avoiding monochromatic unit-distance pairs. Using neural networks as approximators, we reformulate this mixed discrete-continuous geometric coloring problem as an optimization task with a probabilistic, differentiable loss function. This enables gradient-based exploration of admissible configurations and, most notably, led to the discovery of two novel six-colorings, providing the first improvements in thirty years to the off-diagonal variant of the original problem (Mundinger et al., 2024a). Here, we establish the underlying machine learning approach used to obtain these results and demonstrate its broader applicability through additional results and numerical insights.
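A minimal sketch of the differentiable relaxation (the architecture, box size, and sampling scheme are our assumptions): a small network maps each point of the plane to a distribution over six colors, and the loss is the probability that a random unit-distance pair receives the same color:

```python
# Hedged sketch: gradient-based search for six-colorings avoiding
# monochromatic unit-distance pairs.
import torch
import torch.nn as nn

net = nn.Sequential(nn.Linear(2, 64), nn.Tanh(),
                    nn.Linear(64, 64), nn.Tanh(),
                    nn.Linear(64, 6))            # logits over 6 colors
opt = torch.optim.Adam(net.parameters(), lr=1e-3)

for step in range(200):
    x = torch.rand(1024, 2) * 10                 # random points in a box
    theta = torch.rand(1024, 1) * 2 * torch.pi
    y = x + torch.cat([torch.cos(theta), torch.sin(theta)], dim=1)  # unit away
    px, py = net(x).softmax(-1), net(y).softmax(-1)
    loss = (px * py).sum(-1).mean()              # P[same color at distance 1]
    opt.zero_grad(); loss.backward(); opt.step()
print("monochromatic-pair probability:", round(loss.item(), 4))
```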

new Joint Learning of Energy-based Models and their Partition Function

Authors: Michael E. Sander, Vincent Roulet, Tianlin Liu, Mathieu Blondel

Abstract: Energy-based models (EBMs) offer a flexible framework for parameterizing probability distributions using neural networks. However, learning EBMs by exact maximum likelihood estimation (MLE) is generally intractable, due to the need to compute the partition function (normalization constant). In this paper, we propose a novel formulation for approximately learning probabilistic EBMs in combinatorially-large discrete spaces, such as sets or permutations. Our key idea is to jointly learn both an energy model and its log-partition, each parameterized as a neural network. Our approach not only provides a novel tractable objective criterion to learn EBMs by stochastic gradient descent (without relying on MCMC), but also a novel means to estimate the log-partition function on unseen data points. On the theoretical side, we show that our approach recovers the optimal MLE solution when optimizing in the space of continuous functions. Furthermore, we show that our approach naturally extends to the broader family of Fenchel-Young losses, allowing us to obtain the first tractable method for optimizing the sparsemax loss in combinatorially-large spaces. We demonstrate our approach on multilabel classification and label ranking.

new A Hybrid Data-Driven Approach For Analyzing And Predicting Inpatient Length Of Stay In Health Centre

Authors: Tasfia Noor Chowdhury, Sanjida Afrin Mou, Kazi Naimur Rahman

Abstract: Patient length of stay (LoS) is a critical metric for evaluating the efficacy of hospital management. The primary objectives include improving efficiency and reducing costs while enhancing patient outcomes and hospital capacity across the patient journey. By seamlessly merging data-driven techniques with simulation methodologies, the study proposes an all-encompassing framework for the optimization of patient flow. Using a comprehensive dataset of 2.3 million de-identified patient records, we analyzed demographics, diagnoses, treatments, services, costs, and charges with machine learning models (Decision Tree, Logistic Regression, Random Forest, Adaboost, LightGBM) and Python tools (Spark, AWS clusters, dimensionality reduction). Our model predicts patient length of stay (LoS) upon admission using supervised learning algorithms. This hybrid approach enables the identification of key factors influencing LoS, offering a robust framework for hospitals to streamline patient flow and resource utilization. The research focuses on patient flow and corroborates the efficacy of the approach, illustrating decreased patient length of stay in a real healthcare environment. The findings underscore the potential of hybrid data-driven models in transforming hospital management practices. This methodology supports flexible decision-making, staff training, and patient flow enhancement, with substantial implications for healthcare administration and overall patient satisfaction.

new Loss Functions and Operators Generated by f-Divergences

Authors: Vincent Roulet, Tianlin Liu, Nino Vieillard, Michael E. Sander, Mathieu Blondel

Abstract: The logistic loss (a.k.a. cross-entropy loss) is one of the most popular loss functions used for multiclass classification. It is also the loss function of choice for next-token prediction in language modeling. It is associated with the Kullback--Leibler (KL) divergence and the softargmax operator. In this work, we propose to construct new convex loss functions based on $f$-divergences. Our loss functions generalize the logistic loss in two directions: i) by replacing the KL divergence with $f$-divergences and ii) by allowing non-uniform reference measures. We instantiate our framework for numerous $f$-divergences, recovering existing losses and creating new ones. By analogy with the logistic loss, the loss function generated by an $f$-divergence is associated with an operator, that we dub $f$-softargmax. We derive a novel parallelizable bisection algorithm for computing the $f$-softargmax associated with any $f$-divergence. On the empirical side, one of the goals of this paper is to determine the effectiveness of loss functions beyond the classical cross-entropy in a language model setting, including on pre-training, post-training (SFT) and distillation. We show that the loss function generated by the $\alpha$-divergence (which is equivalent to Tsallis $\alpha$-negentropy in the case of unit reference measures) with $\alpha=1.5$ performs well across several tasks.
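To illustrate the bisection idea, here is a hedged sketch for the Tsallis/$\alpha$-divergence family, where the operator reduces to an entmax-style map: $p_i = [(\alpha-1)(s_i - \tau)]_+^{1/(\alpha-1)}$, with the normalizer $\tau$ found by monotone bisection ($\alpha \to 1$ recovers softargmax, $\alpha = 2$ sparsemax). The paper's general $f$-divergence link and reference measures are not reproduced here:

```python
# Hedged sketch: entmax-style operator via monotone bisection on the
# normalization constant tau (the mass sum is decreasing in tau).
import numpy as np

def entmax_bisect(s, alpha=1.5, n_iter=50):
    s = (alpha - 1.0) * np.asarray(s, dtype=float)
    lo, hi = s.max() - 1.0, s.max()    # bracket: sum >= 1 at lo, <= 1 at hi
    for _ in range(n_iter):
        tau = 0.5 * (lo + hi)
        p = np.clip(s - tau, 0.0, None) ** (1.0 / (alpha - 1.0))
        if p.sum() >= 1.0:
            lo = tau                   # too much mass: raise the threshold
        else:
            hi = tau
    return p / p.sum()                 # tidy up residual bisection error

scores = np.array([1.0, 2.0, 3.0, 2.5])
print("alpha=1.5:", np.round(entmax_bisect(scores, 1.5), 3))
print("alpha=2.0 (sparsemax-like):", np.round(entmax_bisect(scores, 2.0), 3))
```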

new Bandits with Anytime Knapsacks

Authors: Eray Can Elumar, Cem Tekin, Osman Yagan

Abstract: We consider bandits with anytime knapsacks (BwAK), a novel version of the BwK problem where there is an \textit{anytime} cost constraint instead of a total cost budget. This problem setting introduces additional complexities as it mandates adherence to the constraint throughout the decision-making process. We propose SUAK, an algorithm that utilizes upper confidence bounds to identify the optimal mixture of arms while maintaining a balance between exploration and exploitation. SUAK is an adaptive algorithm that strategically utilizes the available budget in each round of the decision-making process and skips a round when playing might violate the anytime cost constraint. In particular, SUAK slightly under-utilizes the available cost budget to reduce the need for skipping rounds. We show that SUAK attains the same problem-dependent regret upper bound of $O(K \log T)$ established in prior work under the simpler BwK framework. Finally, we provide simulations to verify the utility of SUAK in practical settings.

new No Equations Needed: Learning System Dynamics Without Relying on Closed-Form ODEs

Authors: Krzysztof Kacprzyk, Mihaela van der Schaar

Abstract: Data-driven modeling of dynamical systems is a crucial area of machine learning. In many scenarios, a thorough understanding of the model's behavior becomes essential for practical applications. For instance, understanding the behavior of a pharmacokinetic model, constructed as part of drug development, may allow us to both verify its biological plausibility (e.g., the drug concentration curve is non-negative and decays to zero) and to design dosing guidelines. Discovery of closed-form ordinary differential equations (ODEs) can be employed to obtain such insights by finding a compact mathematical equation and then analyzing it (a two-step approach). However, its widespread use is currently hindered because the analysis process may be time-consuming, requiring substantial mathematical expertise, or even impossible if the equation is too complex. Moreover, if the found equation's behavior does not satisfy the requirements, editing it or influencing the discovery algorithms to rectify it is challenging as the link between the symbolic form of an ODE and its behavior can be elusive. This paper proposes a conceptual shift to modeling low-dimensional dynamical systems by departing from the traditional two-step modeling process. Instead of first discovering a closed-form equation and then analyzing it, our approach, direct semantic modeling, predicts the semantic representation of the dynamical system (i.e., description of its behavior) directly from data, bypassing the need for complex post-hoc analysis. This direct approach also allows the incorporation of intuitive inductive biases into the optimization algorithm and editing the model's behavior directly, ensuring that the model meets the desired specifications. Our approach not only simplifies the modeling pipeline but also enhances the transparency and flexibility of the resulting models compared to traditional closed-form ODEs.

new Token-Hungry, Yet Precise: DeepSeek R1 Highlights the Need for Multi-Step Reasoning Over Speed in MATH

Authors: Evgenii Evstafev

Abstract: This study investigates the performance of the DeepSeek R1 language model on 30 challenging mathematical problems derived from the MATH dataset, problems that previously proved unsolvable by other models under time constraints. Unlike prior work, this research removes time limitations to explore whether DeepSeek R1's architecture, known for its reliance on token-based reasoning, can achieve accurate solutions through a multi-step process. The study compares DeepSeek R1 with four other models (gemini-1.5-flash-8b, gpt-4o-mini-2024-07-18, llama3.1:8b, and mistral-8b-latest) across 11 temperature settings. Results demonstrate that DeepSeek R1 achieves superior accuracy on these complex problems but generates significantly more tokens than other models, confirming its token-intensive approach. The findings highlight a trade-off between accuracy and efficiency in mathematical problem-solving with large language models: while DeepSeek R1 excels in accuracy, its reliance on extensive token generation may not be optimal for applications requiring rapid responses. The study underscores the importance of considering task-specific requirements when selecting an LLM and emphasizes the role of temperature settings in optimizing performance.

new Node Classification and Search on the Rubik's Cube Graph with GNNs

Authors: Alessandro Barro

Abstract: This study focuses on the application of deep geometric models to solve the 3x3x3 Rubik's Cube. We begin by discussing the cube's graph representation and defining distance as the model's optimization objective. The distance approximation task is reformulated as a node classification problem, effectively addressed using Graph Neural Networks (GNNs). After training the model on a random subgraph, the predicted classes are used to construct a heuristic for $A^*$ search. We conclude with experiments comparing our heuristic to that of the DeepCubeA model.

new Bias-variance decompositions: the exclusive privilege of Bregman divergences

Authors: Tom Heskes

Abstract: Bias-variance decompositions are widely used to understand the generalization performance of machine learning models. While the squared error loss permits a straightforward decomposition, other loss functions - such as zero-one loss or $L_1$ loss - either fail to sum bias and variance to the expected loss or rely on definitions that lack the essential properties of meaningful bias and variance. Recent research has shown that clean decompositions can be achieved for the broader class of Bregman divergences, with the cross-entropy loss as a special case. However, the necessary and sufficient conditions for these decompositions remain an open question. In this paper, we address this question by studying continuous, nonnegative loss functions that satisfy the identity of indiscernibles under mild regularity conditions. We prove that so-called $g$-Bregman divergences are the only such loss functions that have a clean bias-variance decomposition. A $g$-Bregman divergence can be transformed into a standard Bregman divergence through an invertible change of variables. This makes the squared Mahalanobis distance, up to such a variable transformation, the only symmetric loss function with a clean bias-variance decomposition. We also examine the impact of relaxing the restrictions on the loss functions and how this affects our results.

new Accuracy and Robustness of Weight-Balancing Methods for Training PINNs

Authors: Matthieu Barreau, Haoming Shen

Abstract: Physics-Informed Neural Networks (PINNs) have emerged as powerful tools for integrating physics-based models with data by minimizing both data and physics losses. However, this multi-objective optimization problem is notoriously challenging, with some benchmark problems leading to infeasible solutions. To address these issues, various strategies have been proposed, including adaptive weight adjustments in the loss function. In this work, we introduce clear definitions of accuracy and robustness in the context of PINNs and propose a novel training algorithm based on the Primal-Dual (PD) optimization framework. Our approach enhances the robustness of PINNs while maintaining comparable performance to existing weight-balancing methods. Numerical experiments demonstrate that the PD method consistently achieves reliable solutions across all investigated cases and can be easily implemented, facilitating its practical adoption. The code is available at https://github.com/haoming-SHEN/Accuracy-and-Robustness-of-Weight-Balancing-Methods-for-Training-PINNs.git.

URLs: https://github.com/haoming-SHEN/Accuracy-and-Robustness-of-Weight-Balancing-Methods-for-Training-PINNs.git.
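A hedged sketch of a primal-dual weight-balancing loop for a toy PINN, consistent with the idea described above but not necessarily the paper's exact algorithm: descend on the network weights for the Lagrangian, and ascend on the multiplier of the physics residual:

```python
# Hedged sketch: primal-dual balancing of data and physics losses for the
# ODE u' + u = 0 with u(0) = 1 (solution u = exp(-x)).
import torch
import torch.nn as nn

net = nn.Sequential(nn.Linear(1, 32), nn.Tanh(), nn.Linear(32, 1))
opt = torch.optim.Adam(net.parameters(), lr=1e-3)
lam, eta, slack = torch.tensor(1.0), 0.01, 1e-3   # multiplier, dual step, slack

x_data = torch.linspace(0, 1, 16).unsqueeze(1)
y_data = torch.exp(-x_data)                       # data consistent with the ODE

for step in range(500):
    x = torch.rand(64, 1, requires_grad=True)
    u = net(x)
    du = torch.autograd.grad(u.sum(), x, create_graph=True)[0]
    physics = ((du + u) ** 2).mean()              # residual of u' + u = 0
    data = ((net(x_data) - y_data) ** 2).mean()
    loss = data + lam * physics                   # primal: descend on weights
    opt.zero_grad(); loss.backward(); opt.step()
    lam = (lam + eta * (physics.detach() - slack)).clamp(min=0.0)  # dual ascent

print(f"data {data.item():.2e}, physics {physics.item():.2e}, lam {lam.item():.2f}")
```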

new DeltaLLM: Compress LLMs with Low-Rank Deltas between Shared Weights

Authors: Liana Mikaelyan, Ayyoob Imani, Mathew Salvaris, Parth Pathak, Mohsen Fayyaz

Abstract: We introduce DeltaLLM, a new post-training compression technique to reduce the memory footprint of LLMs. We propose an alternative way of structuring LLMs with weight sharing between layers in subsequent Transformer blocks, along with additional low-rank difference matrices between them. For training, we adopt the progressive module replacement method and show that lightweight training of the low-rank modules with approximately 30M-40M tokens is sufficient to achieve performance on par with LLMs of comparable sizes trained from scratch. We release the resultant models, DeltaLLAMA and DeltaPHI, with a 12% parameter reduction, retaining 90% of the performance of the base Llama and Phi models on common knowledge and reasoning benchmarks. Our method also outperforms compression techniques JointDrop, LaCo, ShortGPT and SliceGPT with the same number of parameters removed. For example, DeltaPhi 2.9B with a 24% reduction achieves similar average zero-shot accuracies as recovery fine-tuned SlicedPhi 3.3B with a 12% reduction, despite being approximately 400M parameters smaller with no fine-tuning applied. This work provides new insights into LLM architecture design and compression methods when storage space is critical.
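A minimal sketch of the structural idea (our reading, not the released code): consecutive blocks reuse one stored base weight, and each block adds its own small low-rank correction $B_l A_l$:

```python
# Hedged sketch: one shared weight matrix plus per-layer low-rank deltas.
import torch
import torch.nn as nn

class DeltaLinear(nn.Module):
    def __init__(self, shared_weight, rank=8):
        super().__init__()
        self.shared = shared_weight                       # shared, not copied
        out_f, in_f = shared_weight.shape
        self.A = nn.Parameter(torch.randn(rank, in_f) * 0.01)
        self.B = nn.Parameter(torch.zeros(out_f, rank))   # delta starts at zero

    def forward(self, x):
        return x @ (self.shared + self.B @ self.A).T

d = 128
base = nn.Parameter(torch.randn(d, d) * 0.02)             # one stored matrix
layers = nn.ModuleList(DeltaLinear(base, rank=8) for _ in range(4))

n_base = base.numel()
n_deltas = sum(l.A.numel() + l.B.numel() for l in layers)
print(f"4 full layers: {4 * n_base} params; shared + deltas: {n_base + n_deltas}")
```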

cross On the challenges of detecting MCI using EEG in the wild

Authors: Aayush Mishra, David Joffe, Sankara Surendra Telidevara, David S Oakley, Anqi Liu

Abstract: Recent studies have shown promising results in the detection of Mild Cognitive Impairment (MCI) using easily accessible Electroencephalogram (EEG) data which would help administer early and effective treatment for dementia patients. However, the reliability and practicality of such systems remains unclear. In this work, we investigate the potential limitations and challenges in developing a robust MCI detection method using two contrasting datasets: 1) CAUEEG, collected and annotated by expert neurologists in controlled settings and 2) GENEEG, a new dataset collected and annotated in general practice clinics, a setting where routine MCI diagnoses are typically made. We find that training on small datasets, as is done by most previous works, tends to produce high variance models that make overconfident predictions, and are unreliable in practice. Additionally, distribution shifts between datasets make cross-domain generalization challenging. Finally, we show that MCI detection using EEG may suffer from fundamental limitations because of the overlapping nature of feature distributions with control groups. We call for more effort in high-quality data collection in actionable settings (like general practice clinics) to make progress towards this salient goal of non-invasive MCI detection.

cross Performance Analysis of NR Sidelink and Wi-Fi Coexistence Networks in Unlicensed Spectrum

Authors: Zhuangzhuang Yan, Xinyu Gu, Zhenyu Liu

Abstract: With the rapid development of various internet of things (IoT) applications, including industrial IoT (IIoT) and visual IoT (VIoT), the demand for direct device-to-device communication to support high data rates continues to grow. To address this demand, 5G-Advanced has introduced sidelink communication over the unlicensed spectrum (SL-U) as a method to increase data rates. However, the primary challenge of SL-U in the unlicensed spectrum is ensuring fair coexistence with other incumbent systems, such as Wi-Fi. In this paper, we address the challenge by designing channel access mechanisms and power control strategies to mitigate interference and ensure fair coexistence. First, we propose a novel collaborative channel access (CCHA) mechanism that integrates channel access with resource allocation through collaborative interactions between base stations (BS) and SL-U users. This mechanism ensures fair coexistence with incumbent systems while improving resource utilization. Second, we mathematically model the joint channel access and power control problems, analyzing the trade-off between fairness and transmission rate to minimize interference and optimize performance in the coexistence system. Finally, we develop a collaborative subgoal-based hierarchical deep reinforcement learning (C-GHDRL) framework. This framework enables SL-U users to make globally optimal decisions by leveraging collaborative operations between the BS and SL-U users, effectively overcoming the limitations of traditional optimization methods in solving joint optimization problems with nonlinear constraints. Simulation results demonstrate that the proposed scheme significantly enhances the coexistence system's performance while ensuring fair coexistence between SL-U and Wi-Fi users.

cross Assessment of the January 2025 Los Angeles County wildfires: A multi-modal analysis of impact, response, and population exposure

Authors: Seyd Teymoor Seydi

Abstract: This study presents a comprehensive analysis of four significant California wildfires: Palisades, Eaton, Kenneth, and Hurst, examining their impacts through multiple dimensions, including land cover change, jurisdictional management, structural damage, and demographic vulnerability. Using the Chebyshev-Kolmogorov-Arnold network model applied to Sentinel-2 imagery, the extent of burned areas was mapped, ranging from 315.36 to 10,960.98 hectares. Our analysis revealed that shrubland ecosystems were consistently the most affected, comprising 57.4-75.8% of burned areas across all events. The jurisdictional assessment demonstrated varying management complexities, from singular authority (98.7% in the Palisades Fire) to distributed management across multiple agencies. A structural impact analysis revealed significant disparities between urban interface fires (Eaton: 9,869 structures; Palisades: 8,436 structures) and rural events (Kenneth: 24 structures; Hurst: 17 structures). The demographic analysis showed consistent gender distributions, with 50.9% of the population identified as female and 49.1% as male. Working-age populations made up the majority of the affected populations, ranging from 53.7% to 54.1%, with notable temporal shifts in post-fire periods. The study identified strong correlations between urban interface proximity, structural damage, and population exposure. The Palisades and Eaton fires affected over 20,000 people each, compared to fewer than 500 in rural events. These findings offer valuable insights for the development of targeted wildfire management strategies, particularly in wildland urban interface zones, and emphasize the need for age- and gender-conscious approaches in emergency response planning.

cross RayLoc: Wireless Indoor Localization via Fully Differentiable Ray-tracing

Authors: Xueqiang Han, Tianyue Zheng, Tony Xiao Han, Jun Luo

Abstract: Wireless indoor localization has been a pivotal area of research over the last two decades, becoming a cornerstone for numerous sensing applications. However, conventional wireless localization methods rely on channel state information (CSI) to perform blind modelling and estimation of a limited set of localization parameters. This oversimplification neglects many sensing scene details, resulting in suboptimal localization accuracy. To address this limitation, this paper presents a novel approach to wireless indoor localization by reformulating it as an inverse problem of wireless ray-tracing, inferring the scene parameters that generate the measured CSI. At the core of our solution is a fully differentiable ray-tracing simulator that enables backpropagation to comprehensive parameters of the sensing scene, allowing for precise localization. To establish a robust localization context, RayLoc constructs a high-fidelity sensing scene by refining a coarse-grained background model. Furthermore, RayLoc overcomes the challenges of sparse gradients and local minima by convolving the signal generation process with a Gaussian kernel. Extensive experiments showcase that RayLoc outperforms traditional localization baselines and is able to generalize to different sensing environments.

cross Heterogeneous Multi-Player Multi-Armed Bandits Robust To Adversarial Attacks

Authors: Akshayaa Magesh, Venugopal V. Veeravalli

Abstract: We consider a multi-player multi-armed bandit setting in the presence of adversaries that attempt to negatively affect the rewards received by the players in the system. The reward distributions for any given arm are heterogeneous across the players. In the event of a collision (more than one player choosing the same arm), all the colliding users receive zero rewards. The adversaries use collisions to affect the rewards received by the players, i.e., if an adversary attacks an arm, any player choosing that arm will receive zero reward. At any time step, the adversaries may attack more than one arm. It is assumed that the players in the system do not deviate from a pre-determined policy used by all the players, and that the probability that none of the arms face adversarial attacks is strictly positive at every time step. In order to combat the adversarial attacks, the players are allowed to communicate using a single bit for $O(\log T)$ time units, where $T$ is the time horizon, and each player can only observe their own actions and rewards at all time steps. We propose a policy, used by all the players, that achieves near order-optimal regret of order $O(\log^{1+\delta}T + W)$, where $W$ is the total number of time units for which there was an adversarial attack on at least one arm.

cross RadioLLM: Introducing Large Language Model into Cognitive Radio via Hybrid Prompt and Token Reprogrammings

Authors: Shuai Chen, Yong Zu, Zhixi Feng, Shuyuan Yang, Mengchang Li, Yue Ma, Jun Liu, Qiukai Pan, Xinlei Zhang, Changjun Sun

Abstract: The increasing scarcity of spectrum resources and the rapid growth of wireless devices have made efficient management of radio networks a critical challenge. Cognitive Radio Technology (CRT), when integrated with deep learning (DL), offers promising solutions for tasks such as radio signal classification (RSC), signal denoising, and spectrum allocation. However, existing DL-based CRT frameworks are often task-specific and lack scalability to diverse real-world scenarios. Meanwhile, Large Language Models (LLMs) have demonstrated exceptional generalization capabilities across multiple domains, making them a potential candidate for advancing CRT technologies. In this paper, we introduce RadioLLM, a novel framework that incorporates Hybrid Prompt and Token Reprogramming (HPTR) and a Frequency Attuned Fusion (FAF) module to enhance LLMs for CRT tasks. HPTR enables the integration of radio signal features with expert knowledge, while FAF improves the modeling of high-frequency features critical for precise signal processing. These innovations allow RadioLLM to handle diverse CRT tasks, bridging the gap between LLMs and traditional signal processing methods. Extensive empirical studies on multiple benchmark datasets demonstrate that the proposed RadioLLM achieves superior performance over current baselines.

cross Knoop: Practical Enhancement of Knockoff with Over-Parameterization for Variable Selection

Authors: Xiaochen Zhang, Yunfeng Cai, Haoyi Xiong

Abstract: Variable selection plays a crucial role in enhancing modeling effectiveness across diverse fields, addressing the challenges posed by high-dimensional datasets of correlated variables. This work introduces a novel approach, Knockoff with over-parameterization (Knoop), to enhance Knockoff filters for variable selection. Specifically, Knoop first generates multiple knockoff variables for each original variable and integrates them with the original variables into an over-parameterized Ridgeless regression model. For each original variable, Knoop evaluates the coefficient distribution of its knockoffs and compares these with the original coefficients to conduct an anomaly-based significance test, ensuring robust variable selection. Extensive experiments demonstrate superior performance compared to existing methods in both simulation and real-world datasets. Knoop achieves a notably higher Area under the Curve (AUC) of the Receiver Operating Characteristic (ROC) curve for effectively identifying relevant variables against the ground truth in controlled simulations, while showcasing enhanced predictive accuracy across diverse regression and classification tasks. The analytical results further back up our observations.

cross Language Modelling for Speaker Diarization in Telephonic Interviews

Authors: Miquel India, Javier Hernando, Jos\'e A. R. Fonollosa

Abstract: The aim of this paper is to investigate the benefit of combining both language and acoustic modelling for speaker diarization. Although conventional systems only use acoustic features, in some scenarios linguistic data contain highly discriminative speaker information, even more reliable than the acoustic ones. In this study we analyze how an appropriate fusion of both kinds of features is able to obtain good results in these cases. The proposed system is based on an iterative algorithm where an LSTM network is used as a speaker classifier. The network is fed with character-level word embeddings and a GMM-based acoustic score created with the output labels from previous iterations. The presented algorithm has been evaluated on a call-center database, which is composed of telephone interview audios. The combination of acoustic features and linguistic content shows an 84.29% improvement in terms of word-level DER as compared to a HMM/VB baseline system. The results of this study confirm that linguistic content can be efficiently used for some speaker recognition tasks.

cross Progress in Artificial Intelligence and its Determinants

Authors: Michael R. Douglas, Sergiy Verstyuk

Abstract: We study long-run progress in artificial intelligence in a quantitative way. Many measures, including traditional ones such as patents and publications, machine learning benchmarks, and a new Aggregate State of the Art in ML (or ASOTA) Index we have constructed from these, show exponential growth at roughly constant rates over long periods. Production of patents and publications doubles every ten years, by contrast with the growth of computing resources driven by Moore's Law, roughly a doubling every two years. We argue that the input of AI researchers is also crucial and its contribution can be objectively estimated. Consequently, we give a simple argument that explains the 5:1 relation between these two rates. We then discuss the application of this argument to different output measures and compare our analyses with predictions based on machine learning scaling laws proposed in existing literature. Our quantitative framework facilitates understanding, predicting, and modulating the development of these important technologies.

cross Distilling Knowledge for Designing Computational Imaging Systems

Authors: Leon Suarez-Rodriguez, Roman Jacome, Henry Arguello

Abstract: Designing the physical encoder is crucial for accurate image reconstruction in computational imaging (CI) systems. Currently, these systems are designed via end-to-end (E2E) optimization, where the encoder is modeled as a neural network layer and is jointly optimized with the decoder. However, the performance of E2E optimization is significantly reduced by the physical constraints imposed on the encoder. Also, since E2E optimization learns the parameters of the encoder by backpropagating the reconstruction error, it does not promote optimal intermediate outputs and suffers from gradient vanishing. To address these limitations, we reinterpret the concept of knowledge distillation (KD) for designing a physically constrained CI system by transferring the knowledge of a pretrained, less-constrained CI system. Our approach involves three steps: (1) Given the original CI system (student), a teacher system is created by relaxing the constraints on the student's encoder. (2) The teacher is optimized to solve a less-constrained version of the student's problem. (3) The teacher guides the training of the student through two proposed knowledge transfer functions, targeting both the encoder and the decoder feature space. The proposed method can be applied to any imaging modality, since the relaxation scheme and the loss functions can be adapted according to the physical acquisition and the employed decoder. This approach was validated on three representative CI modalities: magnetic resonance, single-pixel, and compressive spectral imaging. Simulations show that a teacher system with an encoder that has a structure similar to that of the student encoder provides effective guidance. Our approach achieves significantly improved reconstruction performance and encoder design, outperforming both E2E optimization and traditional non-data-driven encoder designs.

cross Molecular Fingerprints Are Strong Models for Peptide Function Prediction

Authors: Jakub Adamczyk, Piotr Ludynia, Wojciech Czech

Abstract: We study the effectiveness of molecular fingerprints for peptide property prediction and demonstrate that domain-specific feature extraction from molecular graphs can outperform complex and computationally expensive models such as GNNs, pretrained sequence-based transformers and multimodal ensembles, even without hyperparameter tuning. To this end, we perform a thorough evaluation on 126 datasets, achieving state-of-the-art results on LRGB and 5 other peptide function prediction benchmarks. We show that models based on count variants of ECFP, Topological Torsion, and RDKit molecular fingerprints and LightGBM as classification head are remarkably robust. The strong performance of molecular fingerprints, which are intrinsically very short-range feature encoders, challenges the presumed importance of long-range interactions in peptides. Our conclusion is that the use of molecular fingerprints for larger molecules, such as peptides, can be a computationally feasible, low-parameter, and versatile alternative to sophisticated deep learning models.
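
A minimal sketch of the winning recipe (count-variant ECFP features into LightGBM), assuming RDKit and LightGBM are installed and that smiles_train / y_train hold the peptide SMILES strings and labels; hyperparameters are left at their defaults, in the spirit of the paper's no-tuning setting.

import numpy as np
from rdkit import Chem
from rdkit.Chem import AllChem
from lightgbm import LGBMClassifier

def ecfp_count(smiles, radius=2, n_bits=2048):
    # Hashed count-variant Morgan/ECFP fingerprint for one molecule.
    mol = Chem.MolFromSmiles(smiles)
    fp = AllChem.GetHashedMorganFingerprint(mol, radius, nBits=n_bits)
    arr = np.zeros(n_bits, dtype=np.int32)
    for idx, count in fp.GetNonzeroElements().items():
        arr[idx] = count
    return arr

X_train = np.stack([ecfp_count(s) for s in smiles_train])  # smiles_train assumed
clf = LGBMClassifier().fit(X_train, y_train)               # y_train assumed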

cross A Robust Support Vector Machine Approach for Raman COVID-19 Data Classification

Authors: Marco Piazza, Andrea Spinelli, Francesca Maggioni, Marzia Bedoni, Enza Messina

Abstract: Recent advances in healthcare technologies have led to the availability of large amounts of biological samples across several techniques and applications. In particular, in the last few years, Raman spectroscopy analysis of biological samples has been successfully applied for early-stage diagnosis. However, the spectra's inherent complexity and variability make manual analysis challenging, even for domain experts. For the same reason, traditional Statistical and Machine Learning (ML) techniques cannot guarantee accurate and reliable results. ML models, combined with robust optimization techniques, offer the possibility to improve the classification accuracy and enhance the resilience of predictive models. In this paper, we investigate the performance of a novel robust formulation for Support Vector Machine (SVM) in classifying COVID-19 samples obtained from Raman Spectroscopy. Given the noisy and perturbed nature of biological samples, we protect the classification process against uncertainty through the application of robust optimization techniques. Specifically, we derive robust counterpart models of deterministic formulations using bounded-by-norm uncertainty sets around each observation. We explore the cases of both linear and kernel-induced classifiers to address binary and multiclass classification tasks. The effectiveness of our approach is validated on real-world COVID-19 datasets provided by Italian hospitals by comparing the results of our simulations with a state-of-the-art classifier.
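
For the linear case, a robust counterpart with an l2-norm-bounded uncertainty ball of radius rho around each observation tightens every margin constraint by rho times the norm of the weight vector; a cvxpy sketch under that assumption (labels in {-1, +1}), illustrating the construction rather than reproducing the paper's exact formulations:

import cvxpy as cp
import numpy as np

def robust_linear_svm(X, y, C=1.0, rho=0.1):
    n, p = X.shape
    w, b, xi = cp.Variable(p), cp.Variable(), cp.Variable(n)
    constraints = [
        # Worst case over the uncertainty ball: margin shrinks by rho*||w||_2.
        cp.multiply(y, X @ w + b) >= 1 - xi + rho * cp.norm(w, 2),
        xi >= 0,
    ]
    problem = cp.Problem(cp.Minimize(0.5 * cp.sum_squares(w) + C * cp.sum(xi)),
                         constraints)
    problem.solve()
    return w.value, b.value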

cross VoD-3DGS: View-opacity-Dependent 3D Gaussian Splatting

Authors: Nowak Mateusz, Jarosz Wojciech, Chin Peter

Abstract: Reconstructing a 3D scene from images is challenging due to the different ways light interacts with surfaces depending on the viewer's position and the surface's material. In classical computer graphics, materials can be classified as diffuse or specular, interacting with light differently. The standard 3D Gaussian Splatting model struggles to represent view-dependent content, since it cannot differentiate an object within the scene from the light interacting with its specular surfaces, which produce highlights or reflections. In this paper, we propose to extend the 3D Gaussian Splatting model by introducing an additional symmetric matrix to enhance the opacity representation of each 3D Gaussian. This improvement allows certain Gaussians to be suppressed based on the viewer's perspective, resulting in a more accurate representation of view-dependent reflections and specular highlights without compromising the scene's integrity. By allowing the opacity to be view-dependent, our enhanced model achieves state-of-the-art performance on the Mip-Nerf, Tanks\&Temples, Deep Blending, and Nerf-Synthetic datasets without a significant loss in rendering speed, achieving >60 FPS and incurring only a minimal increase in memory use.

cross Reinforcement-Learning Portfolio Allocation with Dynamic Embedding of Market Information

Authors: Jinghai He, Cheng Hua, Chunyang Zhou, Zeyu Zheng

Abstract: We develop a portfolio allocation framework that leverages deep learning techniques to address challenges arising from high-dimensional, non-stationary, and low-signal-to-noise market information. Our approach includes a dynamic embedding method that reduces the non-stationary, high-dimensional state space into a lower-dimensional representation. We design a reinforcement learning (RL) framework that integrates generative autoencoders and online meta-learning to dynamically embed market information, enabling the RL agent to focus on the most impactful parts of the state space for portfolio allocation decisions. Empirical analysis based on the top 500 U.S. stocks demonstrates that our framework outperforms common portfolio benchmarks and the predict-then-optimize (PTO) approach using machine learning, particularly during periods of market stress. Traditional factor models do not fully explain this superior performance. The framework's ability to time volatility reduces its market exposure during turbulent times. Ablation studies confirm the robustness of this performance across various reinforcement learning algorithms. Additionally, the embedding and meta-learning techniques effectively manage the complexities of high-dimensional, noisy, and non-stationary financial data, enhancing both portfolio performance and risk management.

cross InnerThoughts: Disentangling Representations and Predictions in Large Language Models

Authors: Didier Ch\'etelat, Joseph Cotnareanu, Rylee Thompson, Yingxue Zhang, Mark Coates

Abstract: Large language models (LLMs) contain substantial factual knowledge which is commonly elicited by multiple-choice question-answering prompts. Internally, such models process the prompt through multiple transformer layers, building varying representations of the problem within their hidden states. Ultimately, however, only the hidden state corresponding to the final layer and token position is used to predict the answer label. In this work, we propose instead to learn a small separate neural network predictor module on a collection of training questions, one that takes the hidden states from all the layers at the last temporal position as input and outputs predictions. In effect, such a framework disentangles the representational abilities of LLMs from their predictive abilities. On a collection of hard benchmarks, our method achieves considerable improvements in performance, sometimes comparable to supervised fine-tuning procedures, but at a fraction of the computational cost.
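
A minimal PyTorch sketch of such a predictor module, assuming the hidden states have already been extracted from a frozen LLM; the sizes and two-layer architecture are illustrative choices, not the paper's exact configuration.

import torch
import torch.nn as nn

class InnerThoughtsProbe(nn.Module):
    # Small predictor over the hidden states of *all* layers at the last
    # token position, instead of only the final layer's state.
    def __init__(self, n_layers, d_model, n_choices, d_hidden=256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(n_layers * d_model, d_hidden),
            nn.ReLU(),
            nn.Linear(d_hidden, n_choices),
        )

    def forward(self, hidden_states):  # (batch, n_layers, d_model)
        return self.net(hidden_states.flatten(1))

# With a Hugging Face-style model, the input would come from something like:
#   out = model(**batch, output_hidden_states=True)
#   h = torch.stack(out.hidden_states, dim=1)[:, :, -1, :]  # all layers, last token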

cross Fault Localization via Fine-tuning Large Language Models with Mutation Generated Stack Traces

Authors: Neetha Jambigi, Bartosz Bogacz, Moritz Mueller, Thomas Bach, Michael Felderer

Abstract: Abrupt and unexpected terminations of software are termed software crashes. They can be challenging to analyze. Finding the root cause requires extensive manual effort and expertise to connect information sources like stack traces, source code, and logs. Typical approaches to fault localization require either test failures or source code. Crashes occurring in production environments, such as that of SAP HANA, provide solely crash logs and stack traces. We present a novel approach to localize faults based only on the stack trace information and no additional runtime information, by fine-tuning large language models (LLMs). We address complex cases where the root cause of a crash differs from the technical cause, and is not located in the innermost frame of the stack trace. As the number of historic crashes is insufficient to fine-tune LLMs, we augment our dataset by leveraging code mutators to inject synthetic crashes into the code base. By fine-tuning on 64,369 crashes resulting from 4.1 million mutations of the HANA code base, we can correctly predict the root cause location of a crash with an accuracy of 66.9%, while baselines only achieve 12.6% and 10.6%. We substantiate the generalizability of our approach by evaluating on two additional open-source databases, SQLite and DuckDB, achieving accuracies of 63% and 74%, respectively. Across all our experiments, fine-tuning consistently outperformed prompting non-finetuned LLMs for localizing faults in our datasets.

cross Digital Twin-Enabled Real-Time Control in Robotic Additive Manufacturing via Soft Actor-Critic Reinforcement Learning

Authors: Matsive Ali, Sandesh Giri, Sen Liu, Qin Yang

Abstract: Smart manufacturing systems increasingly rely on adaptive control mechanisms to optimize complex processes. This research presents a novel approach integrating Soft Actor-Critic (SAC) reinforcement learning with digital twin technology to enable real-time process control in robotic additive manufacturing. We demonstrate our methodology using a Viper X300s robot arm, implementing two distinct control scenarios: static target acquisition and dynamic trajectory following. The system architecture combines Unity's simulation environment with ROS2 for seamless digital twin synchronization, while leveraging transfer learning to efficiently adapt trained models across tasks. Our hierarchical reward structure addresses common reinforcement learning challenges including local minima avoidance, convergence acceleration, and training stability. Experimental results show rapid policy convergence and robust task execution in both simulated and physical environments, with performance metrics including cumulative reward, value prediction accuracy, policy loss, and discrete entropy coefficient demonstrating the effectiveness of our approach. This work advances the integration of reinforcement learning with digital twins for industrial robotics applications, providing a framework for enhanced adaptive real-time control of smart additive manufacturing processes.

cross Perforated Backpropagation: A Neuroscience Inspired Extension to Artificial Neural Networks

Authors: Rorry Brenner, Laurent Itti

Abstract: The neurons of artificial neural networks were originally invented when much less was known about biological neurons than is known today. Our work explores a modification to the core neuron unit to make it more parallel to a biological neuron. The modification is made with the knowledge that biological dendrites are not simply passive activation funnels, but also compute complex non-linear functions as they transmit activation to the cell body. The paper explores a novel system of "Perforated" backpropagation empowering the artificial neurons of deep neural networks to achieve better performance while coding for the same features they coded for in the original architecture. After an initial network training phase, additional "Dendrite Nodes" are added to the network and separately trained with a different objective: to correlate their output with the remaining error of the original neurons. The trained Dendrite Nodes are then frozen, and the original neurons are further trained, now taking into account the additional error signals provided by the Dendrite Nodes. The cycle of training the original neurons and then adding and training Dendrite Nodes can be repeated several times until satisfactory performance is achieved. Our algorithm was successfully added to modern state-of-the-art PyTorch networks across multiple domains, improving upon original accuracies and allowing for significant model compression without a loss in accuracy.

cross Noise-Adaptive Conformal Classification with Marginal Coverage

Authors: Teresa Bortolotti, Y. X. Rachel Wang, Xin Tong, Alessandra Menafoglio, Simone Vantini, Matteo Sesia

Abstract: Conformal inference provides a rigorous statistical framework for uncertainty quantification in machine learning, enabling well-calibrated prediction sets with precise coverage guarantees for any classification model. However, its reliance on the idealized assumption of perfect data exchangeability limits its effectiveness in the presence of real-world complications, such as low-quality labels -- a widespread issue in modern large-scale data sets. This work tackles this open problem by introducing an adaptive conformal inference method capable of efficiently handling deviations from exchangeability caused by random label noise, leading to informative prediction sets with tight marginal coverage guarantees even in those challenging scenarios. We validate our method through extensive numerical experiments demonstrating its effectiveness on synthetic and real data sets, including CIFAR-10H and BigEarthNet.
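
For context, the plain split-conformal baseline that such methods build on fits in a few lines; the paper's contribution is making the calibration step robust to label noise, which this sketch does not attempt.

import numpy as np

def conformal_sets(cal_probs, cal_labels, test_probs, alpha=0.1):
    # Nonconformity score: 1 - predicted probability of the true class.
    n = len(cal_labels)
    scores = 1.0 - cal_probs[np.arange(n), cal_labels]
    level = min(1.0, np.ceil((n + 1) * (1 - alpha)) / n)  # finite-sample correction
    q = np.quantile(scores, level, method="higher")
    # Prediction set: every label whose score falls below the threshold.
    return [np.where(1.0 - p <= q)[0] for p in test_probs]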

cross U-aggregation: Unsupervised Aggregation of Multiple Learning Algorithms

Authors: Rui Duan

Abstract: Across various domains, the growing advocacy for open science and open-source machine learning has made an increasing number of models publicly available. These models allow practitioners to integrate them into their own contexts, reducing the need for extensive data labeling, training, and calibration. However, selecting the best model for a specific target population remains challenging due to issues like limited transferability, data heterogeneity, and the difficulty of obtaining true labels or outcomes in real-world settings. In this paper, we propose an unsupervised model aggregation method, U-aggregation, designed to integrate multiple pre-trained models for enhanced and robust performance in new populations. Unlike existing supervised model aggregation or super learner approaches, U-aggregation assumes no observed labels or outcomes in the target population. Our method addresses limitations in existing unsupervised model aggregation techniques by accommodating more realistic settings, including heteroskedasticity at both the model and individual levels, and the presence of adversarial models. Drawing on insights from random matrix theory, U-aggregation incorporates a variance stabilization step and an iterative sparse signal recovery process. These steps improve the estimation of individuals' true underlying risks in the target population and evaluate the relative performance of candidate models. We provide a theoretical investigation and systematic numerical experiments to elucidate the properties of U-aggregation. We demonstrate its potential real-world application by using U-aggregation to enhance genetic risk prediction of complex traits, leveraging publicly available models from the PGS Catalog.

cross ISAM-MTL: Cross-subject multi-task learning model with identifiable spikes and associative memory networks

Authors: Junyan Li, Bin Hu, Zhi-Hong Guan

Abstract: Cross-subject variability in EEG degrades performance of current deep learning models, limiting the development of brain-computer interfaces (BCI). This paper proposes ISAM-MTL, which is a multi-task learning (MTL) EEG classification model based on identifiable spiking (IS) representations and associative memory (AM) networks. The proposed model treats EEG classification of each subject as an independent task and leverages cross-subject data training to facilitate feature sharing across subjects. ISAM-MTL consists of a spiking feature extractor that captures shared features across subjects and a subject-specific bidirectional associative memory network that is trained by Hebbian learning for efficient and fast within-subject EEG classification. ISAM-MTL integrates learned spiking neural representations with bidirectional associative memory for cross-subject EEG classification. The model employs label-guided variational inference to construct identifiable spike representations, enhancing classification accuracy. Experimental results on two BCI Competition datasets demonstrate that ISAM-MTL improves the average accuracy of cross-subject EEG classification while reducing performance variability among subjects. The model further exhibits the characteristics of few-shot learning and identifiable neural activity beneath EEG, enabling rapid and interpretable calibration for BCI systems.

cross LLMs can see and hear without any training

Authors: Kumar Ashutosh, Yossi Gandelsman, Xinlei Chen, Ishan Misra, Rohit Girdhar

Abstract: We present MILS: Multimodal Iterative LLM Solver, a surprisingly simple, training-free approach, to imbue multimodal capabilities into your favorite LLM. Leveraging their innate ability to perform multi-step reasoning, MILS prompts the LLM to generate candidate outputs, each of which is scored and fed back iteratively, eventually generating a solution to the task. This enables various applications that typically require training specialized models on task-specific data. In particular, we establish a new state-of-the-art on emergent zero-shot image, video and audio captioning. MILS seamlessly applies to media generation as well, discovering prompt rewrites to improve text-to-image generation, and even edit prompts for style transfer! Finally, being a gradient-free optimization approach, MILS can invert multimodal embeddings into text, enabling applications like cross-modal arithmetic.

cross Disentangling Safe and Unsafe Corruptions via Anisotropy and Locality

Authors: Ramchandran Muthukumar, Ambar Pal, Jeremias Sulam, Rene Vidal

Abstract: State-of-the-art machine learning systems are vulnerable to small perturbations to their input, where ``small'' is defined according to a threat model that assigns a positive threat to each perturbation. Most prior works define a task-agnostic, isotropic, and global threat, like the $\ell_p$ norm, where the magnitude of the perturbation fully determines the degree of the threat and neither the direction of the attack nor its position in space matter. However, common corruptions in computer vision, such as blur, compression, or occlusions, are not well captured by such threat models. This paper proposes a novel threat model called \texttt{Projected Displacement} (PD) to study robustness beyond existing isotropic and global threat models. The proposed threat model measures the threat of a perturbation via its alignment with \textit{unsafe directions}, defined as directions in the input space along which a perturbation of sufficient magnitude changes the ground truth class label. Unsafe directions are identified locally for each input based on observed training data. In this way, the PD threat model exhibits anisotropy and locality. Experiments on Imagenet-1k data indicate that, for any input, the set of perturbations with small PD threat includes \textit{safe} perturbations of large $\ell_p$ norm that preserve the true label, such as noise, blur and compression, while simultaneously excluding \textit{unsafe} perturbations that alter the true label. Unlike perceptual threat models based on embeddings of large-vision models, the PD threat model can be readily computed for arbitrary classification tasks without pre-training or finetuning. Furthermore, additional task annotations, such as sensitivity to image regions or concept hierarchies, can be easily integrated into the assessment of threat; the PD threat model thus presents practitioners with a flexible, task-driven threat specification.
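
The core scoring rule admits a compact sketch: given unit-norm unsafe directions identified for an input, the threat of a perturbation is its largest projection onto any of them. This is an assumption-level reading of the abstract, not the paper's full definition.

import numpy as np

def pd_threat(delta, unsafe_dirs):
    # unsafe_dirs: (k, d) array of locally identified unit vectors;
    # delta: (d,) perturbation. Perturbations orthogonal to all unsafe
    # directions get zero threat, however large their l_p norm -- the
    # anisotropy and locality described above.
    return max(0.0, float((unsafe_dirs @ delta).max()))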

cross Investigating an Intelligent System to Monitor \& Explain Abnormal Activity Patterns of Older Adults

Authors: Min Hun Lee, Daniel P. Siewiorek, Alexandre Bernardino

Abstract: Despite the growing potential of older adult care technologies, the adoption of these technologies remains challenging. In this work, we conducted a focus-group session with family caregivers to scope designs of older adult care technology. We then developed a high-fidelity prototype and conducted a qualitative study with professional caregivers and older adults to understand their perspectives on the system functionalities. This system monitors abnormal activity patterns of older adults using wireless motion sensors and machine learning models, supports interactive dialogue responses that explain abnormal activity patterns to caregivers, and allows older adults to proactively share their status with caregivers for adequate intervention. Both older adults and professional caregivers appreciated that our system can provide a faster, personalized service while proactively controlling what information is shared through interactive dialogue responses. We further discuss other considerations for realizing older adult care technology in practice.

cross DCatalyst: A Unified Accelerated Framework for Decentralized Optimization

Authors: Tianyu Cao, Xiaokai Chen, Gesualdo Scutari

Abstract: We study decentralized optimization over a network of agents, modeled as graphs, with no central server. The goal is to minimize $f+r$, where $f$ represents a (strongly) convex function averaging the local agents' losses, and $r$ is a convex, extended-value function. We introduce DCatalyst, a unified black-box framework that integrates Nesterov acceleration into decentralized optimization algorithms. At its core, DCatalyst operates as an \textit{inexact}, \textit{momentum-accelerated} proximal method (forming the outer loop) that seamlessly incorporates any selected decentralized algorithm (as the inner loop). We demonstrate that DCatalyst achieves optimal communication and computational complexity (up to log-factors) across various decentralized algorithms and problem instances. Notably, it extends acceleration capabilities to problem classes previously lacking accelerated solution methods, thereby broadening the effectiveness of decentralized methods. On the technical side, our framework introduces the {\it inexact estimating sequences}--a novel extension of Nesterov's well-known estimating sequences, tailored for the minimization of composite losses in decentralized settings. This method adeptly handles consensus errors and inexact solutions of agents' subproblems, challenges not addressed by existing models.

cross A spectral clustering-type algorithm for the consistent estimation of the Hurst distribution in moderately high dimensions

Authors: Patrice Abry, Gustavo Didier, Oliver Orejola, Herwig Wendt

Abstract: Scale invariance (fractality) is a prominent feature of the large-scale behavior of many stochastic systems. In this work, we construct an algorithm for the statistical identification of the Hurst distribution (in particular, the scaling exponents) undergirding a high-dimensional fractal system. The algorithm is based on wavelet random matrices, modified spectral clustering and a model selection step for picking the value of the clustering precision hyperparameter. In a moderately high-dimensional regime where the dimension, the sample size and the scale go to infinity, we show that the algorithm consistently estimates the Hurst distribution. Monte Carlo simulations show that the proposed methodology is efficient for realistic sample sizes and outperforms another popular clustering method based on mixed-Gaussian modeling. We apply the algorithm in the analysis of real-world macroeconomic time series to unveil evidence for cointegration.

cross DeepFRC: An End-to-End Deep Learning Model for Functional Registration and Classification

Authors: Siyuan Jiang, Yihan Hu, Wenjie Li, Pengcheng Zeng

Abstract: Functional data analysis (FDA) is essential for analyzing continuous, high-dimensional data, yet existing methods often decouple functional registration and classification, limiting their efficiency and performance. We present DeepFRC, an end-to-end deep learning framework that unifies these tasks within a single model. Our approach incorporates an alignment module that learns time warping functions via elastic function registration and a learnable basis representation module for dimensionality reduction on aligned data. This integration enhances both alignment accuracy and predictive performance. Theoretical analysis establishes that DeepFRC achieves low misalignment and generalization error, while simulations elucidate the progression of registration, reconstruction, and classification during training. Experiments on real-world datasets demonstrate that DeepFRC consistently outperforms state-of-the-art methods, particularly in addressing complex registration challenges. Code is available at: https://github.com/Drivergo-93589/DeepFRC.

URLs: https://github.com/Drivergo-93589/DeepFRC.

cross Optimal Survey Design for Private Mean Estimation

Authors: Yu-Wei Chen, Raghu Pasupathy, Jordan A. Awan

Abstract: This work identifies the first privacy-aware stratified sampling scheme that minimizes the variance for general private mean estimation under the Laplace, Discrete Laplace (DLap) and Truncated-Uniform-Laplace (TuLap) mechanisms within the framework of differential privacy (DP). We view stratified sampling as a subsampling operation, which amplifies the privacy guarantee; however, to have the same final privacy guarantee for each group, different nominal privacy budgets need to be used depending on the subsampling rate. Ignoring the effect of DP, traditional stratified sampling strategies risk significant variance inflation. We phrase our optimal survey design as an optimization problem, where we determine the optimal subsampling sizes for each group with the goal of minimizing the variance of the resulting estimator. We establish strong convexity of the variance objective, propose an efficient algorithm to identify the integer-optimal design, and offer insights on the structure of the optimal design.

cross HyperZero: A Customized End-to-End Auto-Tuning System for Recommendation with Hourly Feedback

Authors: Xufeng Cai, Ziwei Guan, Lei Yuan, Ali Selman Aydin, Tengyu Xu, Boying Liu, Wenbo Ren, Renkai Xiang, Songyi He, Haichuan Yang, Serena Li, Mingze Gao, Yue Weng, Ji Liu

Abstract: Modern recommendation systems can be broadly divided into two key stages: the ranking stage, where the system predicts various user engagements (e.g., click-through rate, like rate, follow rate, watch time), and the value model stage, which aggregates these predictive scores through a function (e.g., a linear combination defined by a weight vector) to measure the value of each content by a single numerical score. Both stages play roughly equally important roles in real industrial systems; however, how to optimize the model weights for the second stage still lacks systematic study. This paper focuses on optimizing the second stage through auto-tuning technology. Although general auto-tuning systems and solutions - both from established production practices and open-source solutions - can address this problem, they typically require weeks or even months to identify a feasible solution. Such prolonged tuning processes are unacceptable in production environments for recommendation systems, as suboptimal value models can severely degrade user experience. An effective auto-tuning solution is required to identify a viable model within 2-3 days, rather than the extended timelines typically associated with existing approaches. In this paper, we introduce a practical auto-tuning system named HyperZero that addresses these time constraints while effectively solving the unique challenges inherent in modern recommendation systems. Moreover, this framework has the potential to be expanded to broader tuning tasks within recommendation systems.
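
The value-model stage being tuned here is just a weighted aggregation of the ranking stage's predictive scores; a toy illustration (the weights and score columns are invented for the example):

import numpy as np

# Columns: [p_click, p_like, p_follow, expected_watch_time] for two candidates.
scores = np.array([[0.12, 0.05, 0.01, 30.0],
                   [0.08, 0.11, 0.02, 45.0]])
w = np.array([1.0, 2.0, 5.0, 0.05])  # the weight vector an auto-tuner adjusts

values = scores @ w            # one scalar value per candidate content
ranking = np.argsort(-values)  # serve contents in decreasing value order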

cross Large Language Models for Cryptocurrency Transaction Analysis: A Bitcoin Case Study

Authors: Yuchen Lei, Yuexin Xiang, Qin Wang, Rafael Dowsley, Tsz Hon Yuen, Jiangshan Yu

Abstract: Cryptocurrencies are widely used, yet current methods for analyzing transactions heavily rely on opaque, black-box models. These lack interpretability and adaptability, failing to effectively capture behavioral patterns. Many researchers, including us, believe that Large Language Models (LLMs) could bridge this gap due to their robust reasoning abilities for complex tasks. In this paper, we test this hypothesis by applying LLMs to real-world cryptocurrency transaction graphs, specifically within the Bitcoin network. We introduce a three-tiered framework to assess LLM capabilities: foundational metrics, characteristic overview, and contextual interpretation. This includes a new, human-readable graph representation format, LLM4TG, and a connectivity-enhanced sampling algorithm, CETraS, which simplifies larger transaction graphs. Experimental results show that LLMs excel at foundational metrics and offer detailed characteristic overviews. Their effectiveness in contextual interpretation suggests they can provide useful explanations of transaction behaviors, even with limited labeled data.

cross Estimating Multi-chirp Parameters using Curvature-guided Langevin Monte Carlo

Authors: Sattwik Basu, Debottam Dutta, Yu-Lin Wei, Romit Roy Choudhury

Abstract: This paper considers the problem of estimating chirp parameters from a noisy mixture of chirps. While a rich body of work exists in this area, challenges remain when extending these techniques to chirps of higher order polynomials. We formulate this as a non-convex optimization problem and propose a modified Langevin Monte Carlo (LMC) sampler that exploits the average curvature of the objective function to reliably find the minimizer. Results show that our Curvature-guided LMC (CG-LMC) algorithm is robust and succeeds even in low SNR regimes, making it viable for practical applications.
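
A sketch of the idea: an overdamped Langevin iteration whose step size shrinks where the objective's curvature is large. The scaling rule below is an illustrative assumption, not the paper's exact schedule; grad and curv are user-supplied callables.

import numpy as np

def curvature_guided_lmc(grad, curv, x0, n_steps=5000, base_step=1e-3, seed=None):
    rng = np.random.default_rng(seed)
    x = np.asarray(x0, dtype=float)
    for _ in range(n_steps):
        step = base_step / (1.0 + curv(x))  # larger curvature -> smaller step
        noise = rng.standard_normal(x.shape)
        x = x - step * grad(x) + np.sqrt(2.0 * step) * noise
    return x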

cross Decentralized Projection-free Online Upper-Linearizable Optimization with Applications to DR-Submodular Optimization

Authors: Yiyang Lu, Mohammad Pedramfar, Vaneet Aggarwal

Abstract: We introduce a novel framework for decentralized projection-free optimization, extending projection-free methods to a broader class of upper-linearizable functions. Our approach leverages decentralized optimization techniques with the flexibility of upper-linearizable function frameworks, effectively generalizing traditional DR-submodular function optimization. We obtain the regret of $O(T^{1-\theta/2})$ with communication complexity of $O(T^{\theta})$ and number of linear optimization oracle calls of $O(T^{2\theta})$ for decentralized upper-linearizable function optimization, for any $0\le \theta \le 1$. This approach allows for the first results for monotone up-concave optimization with general convex constraints and non-monotone up-concave optimization with general convex constraints. Further, the above results for first order feedback are extended to zeroth order, semi-bandit, and bandit feedback.

cross Machine Learning Fairness for Depression Detection using EEG Data

Authors: Angus Man Ho Kwok, Jiaee Cheong, Sinan Kalkan, Hatice Gunes

Abstract: This paper presents the very first attempt to evaluate machine learning fairness for depression detection using electroencephalogram (EEG) data. We conduct experiments using different deep learning architectures such as Convolutional Neural Networks (CNN), Long Short-Term Memory (LSTM) networks, and Gated Recurrent Unit (GRU) networks across three EEG datasets: Mumtaz, MODMA and Rest. We employ five different bias mitigation strategies at the pre-, in- and post-processing stages and evaluate their effectiveness. Our experimental results show that bias exists in existing EEG datasets and algorithms for depression detection, and different bias mitigation methods address bias at different levels across different fairness measures.

cross Revisiting $\Psi$DONet: microlocally inspired filters for incomplete-data tomographic reconstructions

Authors: Tatiana A. Bubba, Luca Ratti, Andrea Sebastiani

Abstract: In this paper, we revisit a supervised learning approach based on unrolling, known as $\Psi$DONet, by providing a deeper microlocal interpretation for its theoretical analysis, and extending its study to the case of sparse-angle tomography. Furthermore, we refine the implementation of the original $\Psi$DONet considering special filters whose structure is specifically inspired by the streak artifact singularities characterizing tomographic reconstructions from incomplete data. This allows us to considerably lower the number of (learnable) parameters while preserving (or even slightly improving) reconstruction quality for limited-angle data, and provides a proof-of-concept for the case of sparse-angle tomographic data.

cross Statistical multi-metric evaluation and visualization of LLM system predictive performance

Authors: Samuel Ackerman, Eitan Farchi, Orna Raz, Assaf Toledo

Abstract: The evaluation of generative or discriminative large language model (LLM)-based systems is often a complex multi-dimensional problem. Typically, a set of system configuration alternatives are evaluated on one or more benchmark datasets, each with one or more evaluation metrics, which may differ between datasets. We often want to evaluate -- with a statistical measure of significance -- whether systems perform differently either on a given dataset according to a single metric, on aggregate across metrics on a dataset, or across datasets. Such evaluations can be done to support decision-making, such as deciding whether a particular system component change (e.g., choice of LLM or hyperparameter values) significantly improves performance over the current system configuration, or, more generally, whether a fixed set of system configurations (e.g., a leaderboard list) have significantly different performances according to metrics of interest. We present a framework implementation that automatically performs the correct statistical tests, properly aggregates the statistical results across metrics and datasets (a nontrivial task), and can visualize the results. The framework is demonstrated on the multi-lingual code generation benchmark CrossCodeEval, for several state-of-the-art LLMs.

cross Jailbreaking LLMs' Safeguard with Universal Magic Words for Text Embedding Models

Authors: Haoyu Liang, Youran Sun, Yunfeng Cai, Jun Zhu, Bo Zhang

Abstract: The security issue of large language models (LLMs) has gained significant attention recently, with various defense mechanisms developed to prevent harmful outputs, among which safeguards based on text embedding models serve as a fundamental defense. Through testing, we discover that the distribution of text embedding model outputs is significantly biased with a large mean. Inspired by this observation, we propose novel efficient methods to search for universal magic words that can attack text embedding models. The universal magic words as suffixes can move the embedding of any text towards the bias direction, thereby manipulating the similarity of any text pair and misleading safeguards. By appending magic words to user prompts and requiring LLMs to end answers with magic words, attackers can jailbreak the safeguard. To eradicate this security risk, we also propose defense mechanisms against such attacks, which can correct the biased distribution of text embeddings in a train-free manner.

cross Random Feature Representation Boosting

Authors: Nikita Zozoulenko, Thomas Cass, Lukas Gonon

Abstract: We introduce Random Feature Representation Boosting (RFRBoost), a novel method for constructing deep residual random feature neural networks (RFNNs) using boosting theory. RFRBoost uses random features at each layer to learn the functional gradient of the network representation, enhancing performance while preserving the convex optimization benefits of RFNNs. In the case of MSE loss, we obtain closed-form solutions to greedy layer-wise boosting with random features. For general loss functions, we show that fitting random feature residual blocks reduces to solving a quadratically constrained least squares problem. We demonstrate, through numerical experiments on 91 tabular datasets for regression and classification, that RFRBoost significantly outperforms traditional RFNNs and end-to-end trained MLP ResNets, while offering substantial computational advantages and theoretical guarantees stemming from boosting theory.
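
For MSE loss, one greedy RFRBoost-style block reduces to a ridge least-squares fit of the current residual on frozen random features; a numpy sketch (the tanh feature map and regularization constant are illustrative assumptions):

import numpy as np

def rfrboost_block(H, residual, n_features=512, reg=1e-6, seed=None):
    rng = np.random.default_rng(seed)
    # Frozen random projection of the current representation H (n, d).
    W = rng.standard_normal((H.shape[1], n_features)) / np.sqrt(H.shape[1])
    Phi = np.tanh(H @ W)
    # Closed-form ridge solution fitting the block output to the residual.
    A = Phi.T @ Phi + reg * np.eye(n_features)
    beta = np.linalg.solve(A, Phi.T @ residual)
    return Phi @ beta  # added to the representation as a residual block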

cross Unfaithful Probability Distributions in Binary Triple of Causality Directed Acyclic Graph

Authors: Jingwei Liu

Abstract: Faithfulness is the foundation of the correspondence between probability distributions and graphs in causal discovery and causal inference. In this paper, several unfaithful probability distribution examples are constructed for the three-vertex binary causal directed acyclic graph (DAG) structure, which are not faithful to the causal DAGs described in J. M. Robins et al., "Uniform consistency in causal inference", Biometrika (2003), 90(3): 491--515. We also give a general unfaithful probability distribution with multiple independences and conditional independences in the binary triple causal DAG.

cross Contextual Online Decision Making with Infinite-Dimensional Functional Regression

Authors: Haichen Hu, Rui Ai, Stephen Bates, David Simchi-Levi

Abstract: Contextual sequential decision-making problems play a crucial role in machine learning, encompassing a wide range of downstream applications such as bandits, sequential hypothesis testing and online risk control. These applications often require different statistical measures, including expectation, variance and quantiles. In this paper, we provide a universal admissible algorithm framework for dealing with all kinds of contextual online decision-making problems that directly learns the whole underlying unknown distribution instead of focusing on individual statistics. This is much more difficult because the dimension of the regression is uncountably infinite, and any existing linear contextual bandits algorithm will result in infinite regret. To overcome this issue, we propose an efficient infinite-dimensional functional regression oracle for contextual cumulative distribution functions (CDFs), where each data point is modeled as a combination of context-dependent CDF basis functions. Our analysis reveals that the decay rate of the eigenvalue sequence of the design integral operator governs the regression error rate and, consequently, the utility regret rate. Specifically, when the eigenvalue sequence exhibits a polynomial decay of order $\frac{1}{\gamma}\ge 1$, the utility regret is bounded by $\tilde{\mathcal{O}}\Big(T^{\frac{3\gamma+2}{2(\gamma+2)}}\Big)$. By setting $\gamma=0$, this recovers the existing optimal regret rate for contextual bandits with finite-dimensional regression and is optimal under a stronger exponential decay assumption. Additionally, we provide a numerical method to compute the eigenvalue sequence of the integral operator, enabling the practical implementation of our framework.

cross MedXpertQA: Benchmarking Expert-Level Medical Reasoning and Understanding

Authors: Yuxin Zuo, Shang Qu, Yifei Li, Zhangren Chen, Xuekai Zhu, Ermo Hua, Kaiyan Zhang, Ning Ding, Bowen Zhou

Abstract: We introduce MedXpertQA, a highly challenging and comprehensive benchmark to evaluate expert-level medical knowledge and advanced reasoning. MedXpertQA includes 4,460 questions spanning 17 specialties and 11 body systems. It includes two subsets, Text for text evaluation and MM for multimodal evaluation. Notably, MM introduces expert-level exam questions with diverse images and rich clinical information, including patient records and examination results, setting it apart from traditional medical multimodal benchmarks with simple QA pairs generated from image captions. MedXpertQA applies rigorous filtering and augmentation to address the insufficient difficulty of existing benchmarks like MedQA, and incorporates specialty board questions to improve clinical relevance and comprehensiveness. We perform data synthesis to mitigate data leakage risk and conduct multiple rounds of expert reviews to ensure accuracy and reliability. We evaluate 16 leading models on MedXpertQA. Moreover, medicine is deeply connected to real-world decision-making, providing a rich and representative setting for assessing reasoning abilities beyond mathematics and code. To this end, we develop a reasoning-oriented subset to facilitate the assessment of o1-like models.

cross Implicit Riemannian Optimism with Applications to Min-Max Problems

Authors: Christophe Roux, David Mart\'inez-Rubio, Sebastian Pokutta

Abstract: We introduce a Riemannian optimistic online learning algorithm for Hadamard manifolds based on inexact implicit updates. Unlike prior work, our method can handle in-manifold constraints, and matches the best known regret bounds in the Euclidean setting with no dependence on geometric constants, like the minimum curvature. Building on this, we develop algorithms for g-convex, g-concave smooth min-max problems on Hadamard manifolds. Notably, one method nearly matches the gradient oracle complexity of the lower bound for Euclidean problems, for the first time.

cross GBFRS: Robust Fuzzy Rough Sets via Granular-ball Computing

Authors: Shuyin Xia, Xiaoyu Lian, Binbin Sang, Guoyin Wang, Xinbo Gao

Abstract: Fuzzy rough set theory is effective for processing datasets with complex attributes, supported by a solid mathematical foundation and closely linked to kernel methods in machine learning. Attribute reduction algorithms and classifiers based on fuzzy rough set theory exhibit promising performance in the analysis of high-dimensional multivariate complex data. However, most existing models operate at the finest granularity, rendering them inefficient and sensitive to noise, especially for high-dimensional big data. Thus, enhancing the robustness of fuzzy rough set models is crucial for effective feature selection. Multi-granularity granular-ball computing, a recent development, uses granular-balls of different sizes to adaptively represent and cover the sample space, performing learning based on these granular-balls. This paper proposes integrating multi-granularity granular-ball computing into fuzzy rough set theory, using granular-balls to replace sample points. The coarse-grained characteristics of granular-balls make the model more robust. Additionally, we propose a new method for generating granular-balls that is scalable to any supervised method based on granular-ball computing. A forward search algorithm is used to select feature sequences by defining the correlation between features and categories through dependence functions. Experiments demonstrate the proposed model's effectiveness and superiority over baseline methods.

cross Consensus statement on the credibility assessment of ML predictors

Authors: Alessandra Aldieri, Thiranja Prasad Babarenda Gamage, Antonino Amedeo La Mattina, Yi Li, Axel Loewe, Francesco Pappalardo, Marco Viceconti

Abstract: The rapid integration of machine learning (ML) predictors into in silico medicine has revolutionized the estimation of quantities of interest (QIs) that are otherwise challenging to measure directly. However, the credibility of these predictors is critical, especially when they inform high-stakes healthcare decisions. This position paper presents a consensus statement developed by experts within the In Silico World Community of Practice. We outline twelve key statements forming the theoretical foundation for evaluating the credibility of ML predictors, emphasizing the necessity of causal knowledge, rigorous error quantification, and robustness to biases. By comparing ML predictors with biophysical models, we highlight unique challenges associated with implicit causal knowledge and propose strategies to ensure reliability and applicability. Our recommendations aim to guide researchers, developers, and regulators in the rigorous assessment and deployment of ML predictors in clinical and biomedical contexts.

cross DeepExtractor: Time-domain reconstruction of signals and glitches in gravitational wave data with deep learning

Authors: Tom Dooney, Harsh Narola, Stefano Bromuri, R. Lyana Curier, Chris Van Den Broeck, Sarah Caudill, Daniel Stanley Tan

Abstract: Gravitational wave (GW) interferometers detect faint signals from distant astrophysical events, such as binary black hole mergers. However, their high sensitivity also makes them susceptible to background noise, which can obscure these signals. This noise often includes transient artifacts called "glitches" that can mimic astrophysical signals or mask their characteristics. Fast and accurate reconstruction of both signals and glitches is crucial for reliable scientific inference. In this study, we present DeepExtractor, a deep learning framework designed to reconstruct signals and glitches with power exceeding interferometer noise, regardless of their source. We design DeepExtractor to model the inherent noise distribution of GW interferometers, following conventional assumptions that the noise is Gaussian and stationary over short time scales. It operates by predicting and subtracting the noise component of the data, retaining only the clean reconstruction. Our approach achieves superior generalization capabilities for arbitrary signals and glitches compared to methods that directly map inputs to the clean training waveforms. We validate DeepExtractor's effectiveness through three experiments: (1) reconstructing simulated glitches injected into simulated detector noise, (2) comparing performance with the state-of-the-art BayesWave algorithm, and (3) analyzing real data from the Gravity Spy dataset to demonstrate effective glitch subtraction from LIGO strain data. DeepExtractor achieves a median mismatch of only 0.9% for simulated glitches, outperforming several deep learning baselines. Additionally, DeepExtractor surpasses BayesWave in glitch recovery, offering a dramatic computational speedup by reconstructing one glitch sample in approx. 0.1 seconds on a CPU, compared to BayesWave's processing time of approx. one hour per glitch.
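
The inference-time recipe stated above (predict the noise component, subtract it, keep the remainder) is compact; a PyTorch sketch assuming a trained noise-prediction model:

import torch

def clean_reconstruction(model, strain):
    # model maps a strain segment to an estimate of its noise component;
    # the signal/glitch reconstruction is whatever power remains.
    with torch.no_grad():
        predicted_noise = model(strain)
    return strain - predicted_noise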

cross adabmDCA 2.0 -- a flexible but easy-to-use package for Direct Coupling Analysis

Authors: Lorenzo Rosset, Roberto Netti, Anna Paola Muntoni, Martin Weigt, Francesco Zamponi

Abstract: In this methods article, we provide a flexible but easy-to-use implementation of Direct Coupling Analysis (DCA) based on Boltzmann machine learning, together with a tutorial on how to use it. The package \texttt{adabmDCA 2.0} is available in different programming languages (C++, Julia, Python) usable on different architectures (single-core and multi-core CPU, GPU) using a common front-end interface. In addition to several learning protocols for dense and sparse generative DCA models, it allows users to directly address common downstream tasks like residue-residue contact prediction, mutational-effect prediction, scoring of sequence libraries, and generation of artificial sequences for sequence design. It is readily applicable to protein and RNA sequence data.

cross Resampling Filter Design for Multirate Neural Audio Effect Processing

Authors: Alistair Carson, Vesa V\"alim\"aki, Alec Wright, Stefan Bilbao

Abstract: Neural networks have become ubiquitous in audio effects modelling, especially for guitar amplifiers and distortion pedals. One limitation of such models is that the sample rate of the training data is implicitly encoded in the model weights and therefore not readily adjustable at inference. Recent work explored modifications to recurrent neural network architecture to approximate a sample rate independent system, enabling audio processing at a rate that differs from the original training rate. This method works well for integer oversampling and can reduce aliasing caused by nonlinear activation functions. For small fractional changes in sample rate, fractional delay filters can be used to approximate sample rate independence, but in some cases this method fails entirely. Here, we explore the use of signal resampling at the input and output of the neural network as an alternative solution. We investigate several resampling filter designs and show that a two-stage design consisting of a half-band IIR filter cascaded with a Kaiser window FIR filter can give similar or better results than the previously proposed model adjustment method with many fewer operations per sample and less than one millisecond of latency at typical audio rates. Furthermore, we investigate interpolation and decimation filters for the task of integer oversampling and show that cascaded half-band IIR and FIR designs can be used in conjunction with the model adjustment method to reduce aliasing in a range of distortion effect models.
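
A rough scipy illustration of a two-stage resampler in the spirit described above: an IIR lowpass first stage (standing in for the half-band IIR) followed by Kaiser-window FIR polyphase resampling. The filter orders, band edges, and Kaiser beta are assumptions, not the paper's designs.

from scipy import signal

def two_stage_resample(x, up, down):
    # Stage 1: low-order elliptic IIR lowpass (cutoff normalized to Nyquist).
    sos = signal.ellip(4, 0.1, 80, 0.45, output="sos")
    y = signal.sosfilt(sos, x)
    # Stage 2: rational polyphase resampling with a Kaiser-window FIR.
    return signal.resample_poly(y, up, down, window=("kaiser", 8.0))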

cross GuardReasoner: Towards Reasoning-based LLM Safeguards

Authors: Yue Liu, Hongcheng Gao, Shengfang Zhai, Jun Xia, Tianyi Wu, Zhiwei Xue, Yulin Chen, Kenji Kawaguchi, Jiaheng Zhang, Bryan Hooi

Abstract: As LLMs increasingly impact safety-critical applications, ensuring their safety using guardrails remains a key challenge. This paper proposes GuardReasoner, a new safeguard for LLMs, by guiding the guard model to learn to reason. Concretely, we first create the GuardReasonerTrain dataset, which consists of 127K samples with 460K detailed reasoning steps. Then, we introduce reasoning SFT to unlock the reasoning capability of guard models. In addition, we present hard sample DPO to further strengthen their reasoning ability. In this manner, GuardReasoner achieves better performance, explainability, and generalizability. Extensive experiments and analyses on 13 benchmarks of 3 guardrail tasks demonstrate its superiority. Remarkably, GuardReasoner 8B surpasses GPT-4o+CoT by 5.74% and LLaMA Guard 3 8B by 20.84% F1 score on average. We release the training data, code, and models with different scales (1B, 3B, 8B) of GuardReasoner : https://github.com/yueliu1999/GuardReasoner/.

URLs: https://github.com/yueliu1999/GuardReasoner/.

cross Runway vs. Taxiway: Challenges in Automated Line Identification and Notation Approaches

Authors: Parth Ganeriwala, Amy Alvarez, Abdullah AlQahtani, Siddhartha Bhattacharyya, Mohammed Abdul Hafeez Khan, Natasha Neogi

Abstract: The increasing complexity of autonomous systems has amplified the need for accurate and reliable labeling of runway and taxiway markings to ensure operational safety. Precise detection and labeling of these markings are critical for tasks such as navigation, landing assistance, and ground control automation. Existing labeling algorithms, like the Automated Line Identification and Notation Algorithm (ALINA), have demonstrated success in identifying taxiway markings but encounter significant challenges when applied to runway markings. This limitation arises due to notable differences in line characteristics, environmental context, and interference from elements such as shadows, tire marks, and varying surface conditions. To address these challenges, we modified ALINA by adjusting color thresholds and refining region of interest (ROI) selection to better suit runway-specific contexts. While these modifications yielded limited improvements, the algorithm still struggled with consistent runway identification, often mislabeling elements such as the horizon or non-relevant background features. This highlighted the need for a more robust solution capable of adapting to diverse visual interferences. In this paper, we propose integrating a classification step using a Convolutional Neural Network (CNN) named AssistNet. By incorporating this classification step, the detection pipeline becomes more resilient to environmental variations and misclassifications. This work not only identifies the challenges but also outlines solutions, paving the way for improved automated labeling techniques essential for autonomous aviation systems.

cross Beyond Prior Limits: Addressing Distribution Misalignment in Particle Filtering

Authors: Yiwei Shi, Jingyu Hu, Yu Zhang, Mengyue Yang, Weinan Zhang, Cunjia Liu, Weiru Liu

Abstract: Particle filtering is a Bayesian inference method and a fundamental tool in state estimation for dynamic systems, but its effectiveness is often limited by the constraints of the initial prior distribution, a phenomenon we define as the Prior Boundary Phenomenon. This challenge arises when target states lie outside the prior's support, rendering traditional particle filtering methods inadequate for accurate estimation. Although techniques like unbounded priors and larger particle sets have been proposed, they remain computationally prohibitive and lack adaptability in dynamic scenarios. To systematically overcome these limitations, we propose the Diffusion-Enhanced Particle Filtering Framework, which introduces three key innovations: adaptive diffusion through exploratory particles, entropy-driven regularisation to prevent weight collapse, and kernel-based perturbations for dynamic support expansion. These mechanisms collectively enable particle filtering to explore beyond prior boundaries, ensuring robust state estimation for out-of-boundary targets. Theoretical analysis and extensive experiments validate the framework's effectiveness, indicating significant improvements in success rates and estimation accuracy across high-dimensional and non-convex scenarios.
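
A toy single-step sketch of the mechanisms listed above (exploratory diffusion and a kernel perturbation after resampling); the noise scales are illustrative constants, and the entropy-driven regularisation is omitted.

import numpy as np

def diffusion_enhanced_pf_step(particles, weights, transition, likelihood,
                               obs, explore_frac=0.1, bandwidth=0.05, seed=None):
    rng = np.random.default_rng(seed)
    n = len(particles)
    particles = transition(particles)
    # Exploratory subset: extra diffusion to escape the prior's support.
    k = int(explore_frac * n)
    particles[:k] += 10 * bandwidth * rng.standard_normal(particles[:k].shape)
    # Reweight by the observation likelihood.
    weights = weights * likelihood(obs, particles)
    weights = weights / weights.sum()
    # Resample, then jitter with a Gaussian kernel to expand support.
    idx = rng.choice(n, size=n, p=weights)
    particles = particles[idx] + bandwidth * rng.standard_normal(particles.shape)
    return particles, np.full(n, 1.0 / n)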

cross Optimal generalisation and learning transition in extensive-width shallow neural networks near interpolation

Authors: Jean Barbier, Francesco Camilli, Minh-Toan Nguyen, Mauro Pastore, Rudy Skerk

Abstract: We consider a teacher-student model of supervised learning with a fully-trained 2-layer neural network whose width $k$ and input dimension $d$ are large and proportional. We compute the Bayes-optimal generalisation error of the network for any activation function in the regime where the number of training data $n$ scales quadratically with the input dimension, i.e., around the interpolation threshold where the number of trainable parameters $kd+k$ and of data points $n$ are comparable. Our analysis tackles generic weight distributions. Focusing on binary weights, we uncover a discontinuous phase transition separating a "universal" phase from a "specialisation" phase. In the first, the generalisation error is independent of the weight distribution and decays slowly with the sampling rate $n/d^2$, with the student learning only some non-linear combinations of the teacher weights. In the latter, the error is weight distribution-dependent and decays faster due to the alignment of the student towards the teacher network. We thus unveil the existence of a highly predictive solution near interpolation, which is however potentially hard to find.

cross Graph Learning for Bidirectional Disease Contact Tracing on Real Human Mobility Data

Authors: Sofia Hurtado, Radu Marculescu

Abstract: For rapidly spreading diseases where many cases show no symptoms, swift and effective contact tracing is essential. While exposure notification applications provide alerts on potential exposures, a fully automated system is needed to track the infectious transmission routes. To this end, our research leverages large-scale contact networks from real human mobility data to identify the path of transmission. More precisely, we introduce a new Infectious Path Centrality network metric that informs a graph learning edge classifier to identify important transmission events, achieving an F1-score of 94%. Additionally, we explore bidirectional contact tracing, which quarantines individuals both retroactively and proactively, and compare its effectiveness against traditional forward tracing, which only isolates individuals after testing positive. Our results indicate that when only 30% of symptomatic individuals are tested, bidirectional tracing can reduce the infectious effective reproduction rate by 71%, thus significantly controlling the outbreak.

cross Differentially Private Steering for Large Language Model Alignment

Authors: Anmol Goel, Yaxi Hu, Iryna Gurevych, Amartya Sanyal

Abstract: Aligning Large Language Models (LLMs) with human values and away from undesirable behaviors (such as hallucination) has become increasingly important. Recently, steering LLMs towards a desired behavior via activation editing has emerged as an effective method to mitigate harmful generations at inference-time. Activation editing modifies LLM representations by preserving information from positive demonstrations (e.g., truthful) and minimising information from negative demonstrations (e.g., hallucinations). When these demonstrations come from a private dataset, the aligned LLM may leak private information contained in those private samples. In this work, we present the first study of aligning LLM behavior with private datasets. Our work proposes the \textit{\underline{P}rivate \underline{S}teering for LLM \underline{A}lignment (PSA)} algorithm to edit LLM activations with differential privacy (DP) guarantees. We conduct extensive experiments on seven different benchmarks with open-source LLMs of different sizes (0.5B to 7B) and model families (LlaMa, Qwen, Mistral and Gemma). Our results show that PSA achieves DP guarantees for LLM alignment with minimal loss in performance, including alignment metrics, open-ended text generation quality, and general-purpose reasoning. We also develop the first Membership Inference Attack (MIA) for evaluating and auditing the empirical privacy for the problem of LLM steering via activation editing. Our attack is tailored for activation editing and relies solely on the generated texts without their associated probabilities. Our experiments support the theoretical guarantees by showing stronger empirical privacy for our \textit{PSA} algorithm compared to several existing non-private techniques.
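
The abstract does not spell out the PSA mechanism, so the following is only a generic sketch of differentially private activation steering: a difference-of-means steering vector with per-sample norm clipping and Gaussian-mechanism noise. The function name, clipping rule, and noise calibration are assumptions, not the paper's algorithm.

```python
import numpy as np

def dp_steering_vector(pos_acts, neg_acts, clip_norm=1.0, sigma=0.5, rng=None):
    """Difference-of-means steering vector with a Gaussian mechanism.
    pos_acts / neg_acts: (n, d) activations from positive / negative demos."""
    rng = rng or np.random.default_rng()
    def clipped_mean(acts):
        # Clip each sample's norm so one demonstration has bounded influence.
        norms = np.linalg.norm(acts, axis=1, keepdims=True)
        return (acts * np.minimum(1.0, clip_norm / norms)).mean(axis=0)
    v = clipped_mean(pos_acts) - clipped_mean(neg_acts)
    # Calibrate noise to the sensitivity of the clipped difference of means.
    n = min(len(pos_acts), len(neg_acts))
    noise = rng.normal(scale=sigma * 2 * clip_norm / n, size=v.shape)
    return v + noise
```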

cross Prediction-Powered Inference with Imputed Covariates and Nonuniform Sampling

Authors: Dan M. Kluger, Kerri Lu, Tijana Zrnic, Sherrie Wang, Stephen Bates

Abstract: Machine learning models are increasingly used to produce predictions that serve as input data in subsequent statistical analyses. For example, computer vision predictions of economic and environmental indicators based on satellite imagery are used in downstream regressions; similarly, language models are widely used to approximate human ratings and opinions in social science research. However, failure to properly account for errors in the machine learning predictions renders standard statistical procedures invalid. Prior work uses what we call the Predict-Then-Debias estimator to give valid confidence intervals when machine learning algorithms impute missing variables, assuming a small complete sample from the population of interest. We expand the scope by introducing bootstrap confidence intervals that apply when the complete data is a nonuniform (i.e., weighted, stratified, or clustered) sample and when an arbitrary subset of features is imputed. Importantly, the method can be applied to many settings without requiring additional calculations. We prove that these confidence intervals are valid under no assumptions on the quality of the machine learning model and are no wider than the intervals obtained by methods that do not use machine learning predictions.
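
As a hedged illustration of the Predict-Then-Debias idea for the simplest estimand (a mean), the sketch below corrects the prediction-based estimate with the prediction error observed on the complete subsample and bootstraps a confidence interval. The paper's estimator covers general estimands and nonuniform sampling designs; this toy version only gestures at the latter via an optional weight vector and uses simplified uniform resampling.

```python
import numpy as np

def predict_then_debias_mean(pred_full, pred_complete, gold_complete,
                             weights=None, n_boot=1000, alpha=0.05, rng=None):
    """Point estimate and bootstrap CI for a population mean.
    pred_full: ML predictions on the full sample; pred_complete / gold_complete:
    predictions and gold labels on the complete subsample."""
    rng = rng or np.random.default_rng(0)
    w = np.ones(len(gold_complete)) if weights is None else np.asarray(weights)
    # Prediction-based estimate plus a debiasing correction from the
    # (possibly weighted) complete subsample.
    point = pred_full.mean() + np.average(gold_complete - pred_complete, weights=w)
    boots = []
    for _ in range(n_boot):
        i = rng.integers(len(pred_full), size=len(pred_full))
        j = rng.integers(len(gold_complete), size=len(gold_complete))
        boots.append(pred_full[i].mean() +
                     np.average(gold_complete[j] - pred_complete[j], weights=w[j]))
    lo, hi = np.quantile(boots, [alpha / 2, 1 - alpha / 2])
    return point, (lo, hi)
```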

cross R.I.P.: Better Models by Survival of the Fittest Prompts

Authors: Ping Yu, Weizhe Yuan, Olga Golovneva, Tianhao Wu, Sainbayar Sukhbaatar, Jason Weston, Jing Xu

Abstract: Training data quality is one of the most important drivers of final model quality. In this work, we introduce a method for evaluating data integrity based on the assumption that low-quality input prompts result in high-variance and low-quality responses. This is achieved by measuring the rejected response quality and the reward gap between the chosen and rejected preference pair. Our method, Rejecting Instruction Preferences (RIP), can be used to filter prompts from existing training sets, or to make high-quality synthetic datasets, yielding large performance gains across various benchmarks compared to unfiltered data. Using Llama 3.1-8B-Instruct, RIP improves AlpacaEval2 LC Win Rate by 9.4%, Arena-Hard by 8.7%, and WildBench by 9.9%. Using Llama 3.3-70B-Instruct, RIP improves Arena-Hard from 67.5 to 82.9, moving it from 18th to 6th place overall on the leaderboard.
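
The filtering step reduces to a scoring rule over preference pairs. The sketch below keeps prompts whose rejected responses still score well and whose chosen-rejected gap is small, per the signals the abstract names; the exact way RIP combines the two signals is not given there, so the linear score is an assumption.

```python
def rip_filter(pairs, keep_frac=0.5):
    """pairs: list of (prompt_str, chosen_reward, rejected_reward).
    Returns the kept prompts. Higher rejected-response quality and a smaller
    reward gap are treated as evidence that the prompt itself is good."""
    scored = [(rej - (cho - rej), prompt) for prompt, cho, rej in pairs]
    scored.sort(reverse=True)                    # best-scoring prompts first
    return [prompt for _, prompt in scored[: int(len(scored) * keep_frac)]]
```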

cross Advances in Multimodal Adaptation and Generalization: From Traditional Approaches to Foundation Models

Authors: Hao Dong, Moru Liu, Kaiyang Zhou, Eleni Chatzi, Juho Kannala, Cyrill Stachniss, Olga Fink

Abstract: In real-world scenarios, achieving domain adaptation and generalization poses significant challenges, as models must adapt to or generalize across unknown target distributions. Extending these capabilities to unseen multimodal distributions, i.e., multimodal domain adaptation and generalization, is even more challenging due to the distinct characteristics of different modalities. Significant progress has been made over the years, with applications ranging from action recognition to semantic segmentation. Besides, the recent advent of large-scale pre-trained multimodal foundation models, such as CLIP, has inspired works leveraging these models to enhance adaptation and generalization performances or adapting them to downstream tasks. This survey provides the first comprehensive review of recent advances from traditional approaches to foundation models, covering: (1) Multimodal domain adaptation; (2) Multimodal test-time adaptation; (3) Multimodal domain generalization; (4) Domain adaptation and generalization with the help of multimodal foundation models; and (5) Adaptation of multimodal foundation models. For each topic, we formally define the problem and thoroughly review existing methods. Additionally, we analyze relevant datasets and applications, highlighting open challenges and potential future research directions. We maintain an active repository that contains up-to-date literature at https://github.com/donghao51/Awesome-Multimodal-Adaptation.

URLs: https://github.com/donghao51/Awesome-Multimodal-Adaptation.

cross Diffusion Autoencoders are Scalable Image Tokenizers

Authors: Yinbo Chen, Rohit Girdhar, Xiaolong Wang, Sai Saketh Rambhatla, Ishan Misra

Abstract: Tokenizing images into compact visual representations is a key step in learning efficient and high-quality image generative models. We present a simple diffusion tokenizer (DiTo) that learns compact visual representations for image generation models. Our key insight is that a single learning objective, diffusion L2 loss, can be used for training scalable image tokenizers. Since diffusion is already widely used for image generation, our insight greatly simplifies training such tokenizers. In contrast, current state-of-the-art tokenizers rely on an empirically found combination of heuristics and losses, thus requiring a complex training recipe that relies on non-trivially balancing different losses and pretrained supervised models. We show design decisions, along with theoretical grounding, that enable us to scale DiTo for learning competitive image representations. Our results show that DiTo is a simpler, scalable, and self-supervised alternative to the current state-of-the-art image tokenizer, which is supervised. DiTo achieves competitive or better quality than state-of-the-art in image reconstruction and downstream image generation tasks.

replace Beyond Predictions in Neural ODEs: Identification and Interventions

Authors: Hananeh Aliee, Fabian J. Theis, Niki Kilbertus

Abstract: Spurred by tremendous success in pattern matching and prediction tasks, researchers increasingly resort to machine learning to aid original scientific discovery. Given large amounts of observational data about a system, can we uncover the rules that govern its evolution? Solving this task holds the great promise of fully understanding the causal interactions and being able to make reliable predictions about the system's behavior under interventions. We take a step towards answering this question for time-series data generated from systems of ordinary differential equations (ODEs). While the governing ODEs might not be identifiable from data alone, we show that combining simple regularization schemes with flexible neural ODEs can robustly recover the dynamics and causal structures from time-series data. Our results on a variety of (non)-linear first and second order systems as well as real data validate our method. We conclude by showing that we can also make accurate predictions under interventions on variables or the system itself.

replace Adaptive Client Sampling in Federated Learning via Online Learning with Bandit Feedback

Authors: Boxin Zhao, Lingxiao Wang, Ziqi Liu, Zhiqiang Zhang, Jun Zhou, Chaochao Chen, Mladen Kolar

Abstract: Due to the high cost of communication, federated learning (FL) systems need to sample a subset of clients that are involved in each round of training. As a result, client sampling plays an important role in FL systems as it affects the convergence rate of optimization algorithms used to train machine learning models. Despite its importance, there is limited work on how to sample clients effectively. In this paper, we cast client sampling as an online learning task with bandit feedback, which we solve with an online stochastic mirror descent (OSMD) algorithm designed to minimize the sampling variance. We then theoretically show how our sampling method can improve the convergence speed of federated optimization algorithms over the widely used uniform sampling. Through both simulated and real data experiments, we empirically illustrate the advantages of the proposed client sampling algorithm over uniform sampling and existing online learning-based sampling strategies. The proposed adaptive sampling procedure is applicable beyond the FL problem studied here and can be used to improve the performance of stochastic optimization procedures such as stochastic gradient descent and stochastic coordinate descent.
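
A minimal sketch of the variance-reducing OSMD update described above, assuming the per-round bandit feedback is the sampled client's squared gradient norm; the variance surrogate, step size, and clipping below are illustrative choices rather than the paper's exact algorithm.

```python
import numpy as np

def osmd_update(p, sampled, grad_norm_sq, eta=0.1, p_min=1e-3):
    """One entropic mirror-descent step on the client-sampling distribution p.
    Shifting mass toward clients with large gradient norms lowers the variance
    of the importance-weighted global update."""
    # Unbiased bandit estimate of the gradient of sum_i a_i^2 / p_i:
    # d/dp_i = -a_i^2 / p_i^2, importance-weighted by 1/p_i, observed only
    # for the sampled client.
    g = np.zeros_like(p)
    g[sampled] = -grad_norm_sq / p[sampled] ** 3
    g = np.clip(g, -50.0, 0.0)           # numerical safety for tiny p
    p = p * np.exp(-eta * g)             # multiplicative-weights update
    p = np.maximum(p / p.sum(), p_min)   # keep probabilities bounded away from 0
    return p / p.sum()
```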

replace HPSCAN: Human Perception-Based Scattered Data Clustering

Authors: Sebastian Hartwig, Christian van Onzenoodt, Dominik Engel, Pedro Hermosilla, Timo Ropinski

Abstract: Cluster separation is a task typically tackled by widely used clustering techniques, such as k-means or DBSCAN. However, these algorithms are based on non-perceptual metrics, and our experiments demonstrate that their output does not reflect human cluster perception. To bridge the gap between human cluster perception and machine-computed clusters, we propose HPSCAN, a learning strategy that operates directly on scattered data. To learn perceptual cluster separation on such data, we crowdsourced the labeling of 7,320 bivariate (scatterplot) datasets to 384 human participants. We train our HPSCAN model on these human-annotated data. Instead of rendering these data as scatterplot images, we used their x and y point coordinates as input to a modified PointNet++ architecture, enabling direct inference on point clouds. In this work, we provide details on how we collected our dataset, report statistics of the resulting annotations, and investigate the perceptual agreement of cluster separation for real-world data. We also report the training and evaluation protocol for HPSCAN and introduce a novel metric that measures the agreement between a clustering technique and a group of human annotators. We explore predicting point-wise human agreement to detect ambiguities. Finally, we compare our approach to ten established clustering techniques and demonstrate that HPSCAN is capable of generalizing to unseen and out-of-scope data.

replace MINN: Learning the dynamics of differential-algebraic equations and application to battery modeling

Authors: Yicun Huang, Changfu Zou, Yang Li, Torsten Wik

Abstract: The concept of integrating physics-based and data-driven approaches has become popular for modeling sustainable energy systems. However, the existing literature mainly focuses on the data-driven surrogates generated to replace physics-based models. These models often trade accuracy for speed but lack the generalizability, adaptability, and interpretability inherent in physics-based models, which are often indispensable in modeling real-world dynamic systems for optimization and control purposes. We propose a novel machine learning architecture, termed model-integrated neural networks (MINN), that can learn the physics-based dynamics of general autonomous or non-autonomous systems consisting of partial differential-algebraic equations (PDAEs). The obtained architecture systematically solves an unsettled research problem in control-oriented modeling, i.e., how to obtain optimally simplified models that are physically insightful, numerically accurate, and computationally tractable simultaneously. We apply the proposed neural network architecture to model the electrochemical dynamics of lithium-ion batteries and show that MINN is extremely data-efficient to train while being sufficiently generalizable to previously unseen input data, owing to its underlying physical invariants. The MINN battery model has an accuracy comparable to the first-principles model in predicting both the system outputs and any locally distributed electrochemical behaviors but achieves two orders of magnitude reduction in the solution time.

replace Active Learning For Contextual Linear Optimization: A Margin-Based Approach

Authors: Mo Liu, Paul Grigas, Heyuan Liu, Zuo-Jun Max Shen

Abstract: We develop the first active learning method for contextual linear optimization. Specifically, we introduce a label acquisition algorithm that sequentially decides whether to request the ``labels'' of feature samples from an unlabeled data stream, where the labels correspond to the coefficients of the objective in the linear optimization. Our method is the first to be directly informed by the decision loss induced by the predicted coefficients, referred to as the Smart Predict-then-Optimize (SPO) loss. Motivated by the structure of the SPO loss, our algorithm adopts a margin-based criterion utilizing the concept of distance to degeneracy. In particular, we design an efficient active learning algorithm with theoretical excess risk (i.e., generalization) guarantees. We derive upper bounds on the label complexity, defined as the number of samples whose labels are acquired to achieve a desired small level of SPO risk. These bounds show that our algorithm has a much smaller label complexity than the naive supervised learning approach that labels all samples, particularly when the SPO loss is minimized directly on the collected data. To address the discontinuity and nonconvexity of the SPO loss, we derive label complexity bounds under tractable surrogate loss functions. Under natural margin conditions, these bounds also outperform naive supervised learning. Using the SPO+ loss, a specialized surrogate of the SPO loss, we establish even tighter bounds under separability conditions. Finally, we present numerical evidence showing the practical value of our algorithms in settings such as personalized pricing and the shortest path problem.

replace Generative Adversarial Reduced Order Modelling

Authors: Dario Coscia, Nicola Demo, Gianluigi Rozza

Abstract: In this work, we present GAROM, a new approach for reduced order modelling (ROM) based on generative adversarial networks (GANs). GANs have the potential to learn data distribution and generate more realistic data. While widely applied in many areas of deep learning, little research has been done on their application to ROM, i.e. approximating a high-fidelity model with a simpler one. In this work, we combine the GAN and ROM framework, by introducing a data-driven generative adversarial model able to learn solutions to parametric differential equations. The latter is achieved by modelling the discriminator network as an autoencoder, extracting relevant features of the input, and applying a conditioning mechanism to the generator and discriminator networks specifying the differential equation parameters. We show how to apply our methodology for inference, provide experimental evidence of the model generalisation, and perform a convergence study of the method.

replace Necessary and Sufficient Conditions for Optimal Decision Trees using Dynamic Programming

Authors: Jacobus G. M. van der Linden, Mathijs M. de Weerdt, Emir Demirovi\'c

Abstract: Global optimization of decision trees has been shown to be promising in terms of accuracy, size, and consequently human comprehensibility. However, many of the methods used rely on general-purpose solvers for which scalability remains an issue. Dynamic programming methods have been shown to scale much better because they exploit the tree structure by solving subtrees as independent subproblems. However, this only works when an objective can be optimized separately for subtrees. We explore this relationship in detail and show the necessary and sufficient conditions for such separability and generalize previous dynamic programming approaches into a framework that can optimize any combination of separable objectives and constraints. Experiments on five application domains show the general applicability of this framework, while outperforming the scalability of general-purpose solvers by a large margin.

replace Virtual Human Generative Model: Masked Modeling Approach for Learning Human Characteristics

Authors: Kenta Oono, Nontawat Charoenphakdee, Kotatsu Bito, Zhengyan Gao, Hideyoshi Igata, Masashi Yoshikawa, Yoshiaki Ota, Hiroki Okui, Kei Akita, Shoichiro Yamaguchi, Yohei Sugawara, Shin-ichi Maeda, Kunihiko Miyoshi, Yuki Saito, Koki Tsuda, Hiroshi Maruyama, Kohei Hayashi

Abstract: Identifying the relationship between healthcare attributes, lifestyles, and personality is vital for understanding and improving physical and mental well-being. Machine learning approaches are promising for modeling their relationships and offering actionable suggestions. In this paper, we propose the Virtual Human Generative Model (VHGM), a novel deep generative model capable of estimating over 2,000 attributes across healthcare, lifestyle, and personality domains. VHGM leverages masked modeling to learn the joint distribution of attributes, enabling accurate predictions and robust conditional sampling. We deploy VHGM as a web service, showcasing its versatility in driving diverse healthcare applications aimed at improving user well-being. Through extensive quantitative evaluations, we demonstrate VHGM's superior performance in attribute imputation and high-quality sample generation compared to existing baselines. This work highlights VHGM as a powerful tool for personalized healthcare and lifestyle management, with broad implications for data-driven health solutions.

replace Enhancing Hyperedge Prediction with Context-Aware Self-Supervised Learning

Authors: Yunyong Ko, Hanghang Tong, Sang-Wook Kim

Abstract: Hypergraphs can naturally model group-wise relations (e.g., a group of users who co-purchase an item) as hyperedges. Hyperedge prediction is to predict future or unobserved hyperedges, which is a fundamental task in many real-world applications (e.g., group recommendation). Despite the recent breakthrough of hyperedge prediction methods, the following challenges have been rarely studied: (C1) How to aggregate the nodes in each hyperedge candidate for accurate hyperedge prediction? and (C2) How to mitigate the inherent data sparsity problem in hyperedge prediction? To tackle both challenges together, in this paper, we propose a novel hyperedge prediction framework (CASH) that employs (1) context-aware node aggregation to precisely capture complex relations among nodes in each hyperedge for (C1) and (2) self-supervised contrastive learning in the context of hyperedge prediction to enhance hypergraph representations for (C2). Furthermore, as for (C2), we propose a hyperedge-aware augmentation method to fully exploit the latent semantics behind the original hypergraph and consider both node-level and group-level contrasts (i.e., dual contrasts) for better node and hyperedge representations. Extensive experiments on six real-world hypergraphs reveal that CASH consistently outperforms all competing methods in terms of the accuracy in hyperedge prediction and each of the proposed strategies is effective in improving the model accuracy of CASH. For detailed information on CASH, we provide the code and datasets at: https://github.com/yy-ko/cash.

URLs: https://github.com/yy-ko/cash.

replace Efficient Methods for Non-stationary Online Learning

Authors: Peng Zhao, Yan-Feng Xie, Lijun Zhang, Zhi-Hua Zhou

Abstract: Non-stationary online learning has drawn much attention in recent years. In particular, dynamic regret and adaptive regret are proposed as two principled performance measures for online convex optimization in non-stationary environments. To optimize them, a two-layer online ensemble is usually deployed due to the inherent uncertainty of the non-stationarity, in which a group of base-learners are maintained and a meta-algorithm is employed to track the best one on the fly. However, the two-layer structure raises the concern about the computational complexity -- those methods typically maintain $\mathcal{O}(\log T)$ base-learners simultaneously for a $T$-round online game and thus perform multiple projections onto the feasible domain per round, which becomes the computational bottleneck when the domain is complicated. In this paper, we first present efficient methods for optimizing dynamic regret and adaptive regret, which reduce the number of projections per round from $\mathcal{O}(\log T)$ to $1$. The obtained algorithms require only one gradient query and one function evaluation at each round. Our technique hinges on the reduction mechanism developed in parameter-free online learning and requires non-trivial twists on non-stationary online methods. Furthermore, we study an even strengthened measure, namely the ``interval dynamic regret'', and reduce the number of projections per round from $\mathcal{O}(\log^2 T)$ to $1$ to minimize it. Our reduction demonstrates great generalizability and can be applied to two important applications: online stochastic control and online principal component analysis, resulting in methods that are both efficient and optimal. Finally, empirical studies verify our theoretical findings.

replace Effective Learning with Node Perturbation in Multi-Layer Neural Networks

Authors: Sander Dalm, Marcel van Gerven, Nasir Ahmad

Abstract: Backpropagation (BP) remains the dominant and most successful method for training parameters of deep neural network models. However, BP relies on two computationally distinct phases, does not provide a satisfactory explanation of biological learning, and can be challenging to apply for training of networks with discontinuities or noisy node dynamics. By comparison, node perturbation (NP) proposes learning by the injection of noise into network activations, and subsequent measurement of the induced loss change. NP relies on two forward (inference) passes, does not make use of network derivatives, and has been proposed as a model for learning in biological systems. However, standard NP is highly data inefficient and unstable due to its unguided noise-based search process. In this work, we investigate different formulations of NP and relate it to the concept of directional derivatives as well as combining it with a decorrelating mechanism for layer-wise inputs. We find that a closer alignment with directional derivatives together with input decorrelation at every layer strongly enhances performance of NP learning with large improvements in parameter convergence and much higher performance on the test data, approaching that of BP. Furthermore, our novel formulation allows for application to noisy systems in which the noise process itself is inaccessible.
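
Since the abstract turns on how node perturbation estimates updates from two forward passes, here is a minimal single-layer sketch of that estimator; the layer, loss, and hyperparameters are illustrative, and the per-layer input decorrelation the authors add is omitted for brevity.

```python
import numpy as np

rng = np.random.default_rng(0)

def np_update(W, x, target, sigma=0.01, lr=0.1):
    """One node-perturbation step for a single tanh layer: inject noise into
    the pre-activations, measure the induced loss change with a second
    forward pass, and project it back onto the noise direction (the
    directional-derivative view discussed in the abstract)."""
    a = W @ x                                    # clean pre-activations
    xi = rng.normal(scale=sigma, size=a.shape)   # injected node noise
    loss_clean = np.sum((np.tanh(a) - target) ** 2)
    loss_noisy = np.sum((np.tanh(a + xi) - target) ** 2)
    # Derivative-free gradient estimate: loss change scaled by the noise,
    # correlated with the layer input.
    delta = (loss_noisy - loss_clean) / sigma ** 2
    return W - lr * delta * np.outer(xi, x)
```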

replace Fold Bifurcation Identification through Scientific Machine Learning

Authors: Giuseppe Habib, \'Ad\'am Horv\'ath

Abstract: This study employs scientific machine learning to identify transient time series of dynamical systems near a fold bifurcation of periodic solutions. The unique aspect of this work is that a convolutional neural network (CNN) is trained with a relatively small amount of data and on a single, very simple system, yet it is tested on much more complicated systems. This task requires strong generalization capabilities, which are achieved by incorporating physics-based information. This information is provided through a specific pre-processing of the input data, which includes transformation into polar coordinates, normalization, transformation into the logarithmic scale, and filtering through a moving mean. The results demonstrate that such data pre-processing enables the CNN to grasp the important features related to transient time-series near a fold bifurcation, namely, the trend of the oscillation amplitude, and disregard other characteristics that are not particularly relevant, such as the vibration frequency. The developed CNN was able to correctly classify transient trajectories near a fold for a mass-on-moving-belt system, a van der Pol-Duffing oscillator with an attached tuned mass damper, and a pitch-and-plunge wing profile. The results contribute to the progress towards the development of similar CNNs effective in real-life applications such as safety monitoring of dynamical systems.

replace Efficient Algorithms for Regularized Nonnegative Scale-invariant Low-rank Approximation Models

Authors: Jeremy E. Cohen, Valentin Leplat

Abstract: Regularized nonnegative low-rank approximations, such as sparse Nonnegative Matrix Factorization or sparse Nonnegative Tucker Decomposition, form an important branch of dimensionality reduction models known for their enhanced interpretability. From a practical perspective, however, selecting appropriate regularizers and regularization coefficients, as well as designing efficient algorithms, remains challenging due to the multifactor nature of these models and the limited theoretical guidance available. This paper addresses these challenges by studying a more general model, the Homogeneous Regularized Scale-Invariant model. We prove that the scale-invariance inherent to low-rank approximation models induces an implicit regularization effect that balances solutions. This insight provides a deeper understanding of the role of regularization functions in low-rank approximation models, informs the selection of regularization hyperparameters, and enables the design of balancing strategies to accelerate the empirical convergence of optimization algorithms. Additionally, we propose a generic Majorization-Minimization (MM) algorithm capable of handling $\ell_p^p$-regularized nonnegative low-rank approximations with non-Euclidean loss functions, with convergence guarantees. Our contributions are demonstrated on sparse Nonnegative Matrix Factorization, ridge-regularized Nonnegative Canonical Polyadic Decomposition, and sparse Nonnegative Tucker Decomposition.

replace Efficient Learning With Sine-Activated Low-rank Matrices

Authors: Yiping Ji, Hemanth Saratchandran, Cameron Gordon, Zeyu Zhang, Simon Lucey

Abstract: Low-rank decomposition has emerged as a vital tool for enhancing parameter efficiency in neural network architectures, gaining traction across diverse applications in machine learning. These techniques significantly lower the number of parameters, striking a balance between compactness and performance. However, a common challenge has been the compromise between parameter efficiency and the accuracy of the model, where reduced parameters often lead to diminished accuracy compared to their full-rank counterparts. In this work, we propose a novel theoretical framework that integrates a sinusoidal function within the low-rank decomposition process. This approach not only preserves the benefits of the parameter efficiency characteristic of low-rank methods but also increases the decomposition's rank, thereby enhancing model performance. Our method proves to be a plug-in enhancement for existing low-rank models, as evidenced by its successful application in Vision Transformers (ViT), Large Language Models (LLMs), Neural Radiance Fields (NeRF) and 3D shape modelling.
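
The claimed rank-raising effect is easy to demonstrate numerically. The sketch below compares the effective rank (participation ratio of singular values) of a plain rank-8 product with a sine-activated counterpart of the form sin(omega * U V); this parameterisation and the frequency value are assumptions based on the abstract's description, not the paper's exact construction.

```python
import numpy as np

d, r, omega = 256, 8, 100.0
rng = np.random.default_rng(0)
U = rng.normal(size=(d, r)) / np.sqrt(r)
V = rng.normal(size=(r, d))

plain = U @ V                      # rank <= r by construction
sine = np.sin(omega * (U @ V))     # the nonlinearity raises the effective rank

for name, M in [("plain", plain), ("sine", sine)]:
    s = np.linalg.svd(M, compute_uv=False)
    eff_rank = (s.sum() ** 2) / (s ** 2).sum()   # participation-ratio measure
    print(name, round(eff_rank, 1))              # sine version is far larger
```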

replace Probabilistic Verification of Neural Networks using Branch and Bound

Authors: David Boetius, Stefan Leue, Tobias Sutter

Abstract: Probabilistic verification of neural networks is concerned with formally analysing the output distribution of a neural network under a probability distribution of the inputs. Examples of probabilistic verification include verifying the demographic parity fairness notion or quantifying the safety of a neural network. We present a new algorithm for the probabilistic verification of neural networks based on an algorithm for computing and iteratively refining lower and upper bounds on probabilities over the outputs of a neural network. By applying state-of-the-art bound propagation and branch and bound techniques from non-probabilistic neural network verification, our algorithm significantly outpaces existing probabilistic verification algorithms, reducing solving times for various benchmarks from the literature from tens of minutes to tens of seconds. Furthermore, our algorithm compares favourably even to dedicated algorithms for restricted subsets of probabilistic verification. We complement our empirical evaluation with a theoretical analysis, proving that our algorithm is sound and, under mildly restrictive conditions, also complete when using a suitable set of heuristics.

replace Overcoming Uncertain Incompleteness for Robust Multimodal Sequential Diagnosis Prediction via Curriculum Data Erasing Guided Knowledge Distillation

Authors: Heejoon Koo

Abstract: In this paper, we present NECHO v2, a novel framework designed to enhance the predictive accuracy of multimodal sequential patient diagnoses under uncertain missing visit sequences, a common challenge in real clinical settings. Firstly, we modify NECHO, designed in a diagnosis code-centric fashion, to handle uncertain modality representation dominance under the imperfect data. Secondly, we develop a systematic knowledge distillation by employing the modified NECHO as both teacher and student. It encompasses modality-wise contrastive and hierarchical distillation, random distillation of transformer representations, and other distillations to align representations between teacher and student tightly and effectively. We also propose curriculum learning guided random data erasing within sequences during both training and distillation of the teacher to lightly simulate scenarios with missing visit information, thereby fostering effective knowledge transfer. As a result, NECHO v2 demonstrates robust superiority in multimodal sequential diagnosis prediction under both balanced and imbalanced incomplete settings on multimodal healthcare data.
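
A hedged sketch of the curriculum-guided random erasing: visits are dropped with a probability that ramps up over training, lightly simulating missing-visit scenarios. The linear schedule, the probability cap, and keeping the final visit are assumptions made for illustration.

```python
import numpy as np

def curriculum_erase(visits, epoch, max_epochs, p_max=0.3, rng=None):
    """Randomly drop visits from one patient sequence. The drop probability
    grows linearly with training progress (the curriculum); the last visit is
    always kept so the prediction target retains context."""
    rng = rng or np.random.default_rng()
    p = p_max * min(1.0, epoch / max_epochs)
    return [v for v in visits[:-1] if rng.random() > p] + [visits[-1]]
```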

replace Can Optimization Trajectories Explain Multi-Task Transfer?

Authors: David Mueller, Mark Dredze, Nicholas Andrews

Abstract: Despite the widespread adoption of multi-task training in deep learning, little is understood about how multi-task learning (MTL) affects generalization. Prior work has conjectured that the negative effects of MTL are due to optimization challenges that arise during training, and many optimization methods have been proposed to improve multi-task performance. However, recent work has shown that these methods fail to consistently improve multi-task generalization. In this work, we seek to improve our understanding of these failures by empirically studying how MTL impacts the optimization of tasks, and whether this impact can explain the effects of MTL on generalization. We show that MTL results in a generalization gap (a gap in generalization at comparable training loss) between single-task and multi-task trajectories early into training. However, we find that factors of the optimization trajectory previously proposed to explain generalization gaps in single-task settings cannot explain the generalization gaps between single-task and multi-task models. Moreover, we show that the amount of gradient conflict between tasks is correlated with negative effects on task optimization, but is not predictive of generalization. Our work sheds light on the underlying causes for failures in MTL and, importantly, raises questions about the role of general purpose multi-task optimization algorithms.

replace CHIRPs: Change-Induced Regret Proxy metrics for Lifelong Reinforcement Learning

Authors: John Birkbeck, Adam Sobey, Federico Cerutti, Katherine Heseltine Hurley Flynn, Timothy J. Norman

Abstract: Reinforcement learning (RL) agents are costly to train and fragile to environmental changes. They often perform poorly when there are many changing tasks, prohibiting their widespread deployment in the real world. Many Lifelong RL agent designs have been proposed to mitigate issues such as catastrophic forgetting or demonstrate positive characteristics like forward transfer when change occurs. However, no prior work has established whether the impact on agent performance can be predicted from the change itself. Understanding this relationship will help agents proactively mitigate a change's impact for improved learning performance. We propose Change-Induced Regret Proxy (CHIRP) metrics to link change to agent performance drops and use two environments to demonstrate a CHIRP's utility in lifelong learning. A simple CHIRP-based agent achieved $48\%$ higher performance than the next best method in one benchmark and attained the best success rates in 8 of 10 tasks in a second benchmark which proved difficult for existing lifelong RL agents.

replace LLMs & XAI for Water Sustainability: Seasonal Water Quality Prediction with LIME Explainable AI and a RAG-based Chatbot for Insights

Authors: Biplov Paneru, Bishwash Paneru

Abstract: Ensuring safe water supplies requires effective water quality monitoring, especially in developing countries like Nepal, where contamination risks are high. This paper introduces a hybrid deep learning model to predict Nepal's seasonal water quality using a small dataset with multiple water quality parameters. Models such as CatBoost, XGBoost, Extra Trees, and LightGBM, along with a neural network combining CNN and RNN layers, are used to capture temporal and spatial patterns in the data. The model demonstrated notable accuracy improvements, aiding proactive water quality control. CatBoost, XGBoost, and Extra Trees Regressor predicted Water Quality Index (WQI) values with an average RMSE of 1.2 and an R2 score of 0.99. Additionally, classifiers achieved 99 percent accuracy, cross-validated across models. LIME analysis highlighted the importance of indicators like EC and DO levels in XGBoost classification decisions. The neural network model achieved 92 percent classification accuracy and an R2 score of 0.97, with an RMSE of 2.87 in regression analysis. Furthermore, a multifunctional application was developed to predict WQI values using both regression and classification methods.

replace Remaining Useful Life Prediction for Batteries Utilizing an Explainable AI Approach with a Predictive Application for Decision-Making

Authors: Biplov Paneru, Bipul Thapa, Durga Prasad Mainali, Bishwash Paneru, Krishna Bikram Shah

Abstract: Accurately estimating the Remaining Useful Life (RUL) of a battery is essential for determining its lifespan and recharge requirements. In this work, we develop machine learning-based models to predict and classify battery RUL. We introduce a two-level ensemble learning (TLE) framework and a CNN+MLP hybrid model for RUL prediction, comparing their performance against traditional, deep, and hybrid machine learning models. Our analysis evaluates various models for both prediction and classification while incorporating interpretability through SHAP. The proposed TLE model consistently outperforms baseline models in RMSE, MAE, and R-squared, demonstrating its superior predictive capabilities. Additionally, the XGBoost classifier achieves an impressive 99% classification accuracy, validated through cross-validation techniques. The models effectively predict relay-based charging triggers, enabling automated and energy-efficient charging processes. This automation reduces energy consumption and enhances battery performance by optimizing charging cycles. SHAP interpretability analysis highlights the cycle index and charging parameters as the most critical factors influencing RUL. To improve accessibility, we developed a Tkinter-based GUI that allows users to input new data and predict RUL in real time. This practical solution supports sustainable battery management by enabling data-driven decisions about battery usage and maintenance, contributing to energy-efficient and innovative battery life prediction.

replace Revisiting Prefix-tuning: Statistical Benefits of Reparameterization among Prompts

Authors: Minh Le, Chau Nguyen, Huy Nguyen, Quyen Tran, Trung Le, Nhat Ho

Abstract: Prompt-based techniques, such as prompt-tuning and prefix-tuning, have gained prominence for their efficiency in fine-tuning large pre-trained models. Despite their widespread adoption, the theoretical foundations of these methods remain limited. For instance, in prefix-tuning, we observe that a key factor in achieving performance parity with full fine-tuning lies in the reparameterization strategy. However, the theoretical principles underpinning the effectiveness of this approach have yet to be thoroughly examined. Our study demonstrates that reparameterization is not merely an engineering trick but is grounded in deep theoretical foundations. Specifically, we show that the reparameterization strategy implicitly encodes a shared structure between prefix key and value vectors. Building on recent insights into the connection between prefix-tuning and mixture of experts models, we further illustrate that this shared structure significantly improves sample efficiency in parameter estimation compared to non-shared alternatives. Extensive experiments in both visual and language domains empirically confirm that the shared structure enhances the effectiveness of prefix-tuning across diverse tasks. Additionally, we uncover similar structural benefits in prompt-tuning, offering new perspectives on its success. Our findings provide theoretical and empirical contributions, advancing the understanding of prompt-based methods and their underlying mechanisms. Our code is publicly available at https://github.com/Minhchuyentoancbn/ReparamPrefix

URLs: https://github.com/Minhchuyentoancbn/ReparamPrefix
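
For readers unfamiliar with the strategy being analyzed, here is a minimal PyTorch sketch of prefix-tuning's standard reparameterization: a small learned embedding mapped through a shared MLP to the prefix keys and values, which is exactly the key-value tying the paper studies. The dimensions, MLP shape, and single-head simplification are illustrative.

```python
import torch
import torch.nn as nn

class ReparamPrefix(nn.Module):
    """Prefix key/value vectors generated from one shared embedding via a
    shared MLP, implicitly coupling keys and values."""
    def __init__(self, n_prefix=10, d_model=768, d_hidden=512):
        super().__init__()
        self.embed = nn.Parameter(torch.randn(n_prefix, d_hidden))
        self.mlp = nn.Sequential(
            nn.Linear(d_hidden, d_hidden), nn.Tanh(),
            nn.Linear(d_hidden, 2 * d_model))   # produces keys and values jointly
    def forward(self):
        kv = self.mlp(self.embed)               # (n_prefix, 2 * d_model)
        k, v = kv.chunk(2, dim=-1)
        return k, v

k, v = ReparamPrefix()()   # prepend k, v to the attention keys/values
```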

replace How Much Can We Forget about Data Contamination?

Authors: Sebastian Bordt, Suraj Srinivas, Valentyn Boreiko, Ulrike von Luxburg

Abstract: The leakage of benchmark data into the training data has emerged as a significant challenge for evaluating the capabilities of large language models (LLMs). In this work, we challenge the common assumption that small-scale contamination renders benchmark evaluations invalid. First, we experimentally quantify the magnitude of benchmark overfitting based on scaling along three dimensions: The number of model parameters (up to 1.6B), the number of times an example is seen (up to 144), and the number of training tokens (up to 40B). If model and data follow the Chinchilla scaling laws, minor contamination indeed leads to overfitting. At the same time, even contamination repeated 144 times can be forgotten if the training data is scaled beyond five times Chinchilla, a regime characteristic of many modern LLMs. Continual pre-training of OLMo-7B corroborates these results. Next, we study the impact of the weight decay parameter on example forgetting, showing that empirical forgetting occurs faster than the cumulative weight decay. This allows us to gauge the degree of example forgetting in large-scale training runs, indicating that many LLMs, including Llama 3 405B, have forgotten the data seen at the beginning of training.
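
The cumulative weight decay the authors compare against is a simple multiplicative bound; the abstract's point is that measured forgetting outpaces it. A worked example under assumed (typical AdamW-style) hyperparameters:

```python
# Decoupled weight decay shrinks weights by (1 - lr * wd) per step, so after
# T steps at most (1 - lr * wd)**T of an early update survives in the weights.
# The hyperparameter values here are illustrative, not taken from the paper.
lr, wd, steps = 3e-4, 0.1, 100_000
print((1 - lr * wd) ** steps)   # ~0.05: under 5% of an early update survives
```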

replace Adaptive Guidance for Local Training in Heterogeneous Federated Learning

Authors: Jianqing Zhang, Yang Liu, Yang Hua, Jian Cao, Qiang Yang

Abstract: Model heterogeneity poses a significant challenge in Heterogeneous Federated Learning (HtFL). In scenarios with diverse model architectures, directly aggregating model parameters is impractical, leading HtFL methods to incorporate an extra objective alongside the original local objective on each client to facilitate collaboration. However, this often results in a mismatch between the extra and local objectives. To resolve this, we propose Federated Learning-to-Guide (FedL2G), a method that adaptively learns to guide local training in a federated manner, ensuring the added objective aligns with each client's original goal. With theoretical guarantees, FedL2G utilizes only first-order derivatives w.r.t. model parameters, achieving a non-convex convergence rate of O(1/T). We conduct extensive experiments across two data heterogeneity and six model heterogeneity settings, using 14 heterogeneous model architectures (e.g., CNNs and ViTs). The results show that FedL2G significantly outperforms seven state-of-the-art methods.

replace Distillation of Discrete Diffusion through Dimensional Correlations

Authors: Satoshi Hayakawa, Yuhta Takida, Masaaki Imaizumi, Hiromi Wakaki, Yuki Mitsufuji

Abstract: Diffusion models have demonstrated exceptional performance in various fields of generative modeling, but suffer from slow sampling speed due to their iterative nature. While this issue is being addressed in continuous domains, discrete diffusion models face unique challenges, particularly in capturing dependencies between elements (e.g., pixel relationships in images, sequential dependencies in language), mainly due to the computational cost of processing high-dimensional joint distributions. In this paper, (i) we propose "mixture" models for discrete diffusion that are capable of treating dimensional correlations while remaining scalable, and (ii) we provide a set of loss functions for distilling the iterations of existing models. Two primary theoretical insights underpin our approach: First, conventional models with element-wise independence can well approximate the data distribution, but essentially require many sampling steps. Second, our loss functions enable the mixture models to distill such many-step conventional models into just a few steps by learning the dimensional correlations. Our experimental results show the effectiveness of the proposed method in distilling pretrained discrete diffusion models across image and language domains.

replace Toward Understanding In-context vs. In-weight Learning

Authors: Bryan Chan, Xinyi Chen, Andr\'as Gy\"orgy, Dale Schuurmans

Abstract: It has recently been demonstrated empirically that in-context learning emerges in transformers when certain distributional properties are present in the training data, but this ability can also diminish upon further training. We provide a new theoretical understanding of these phenomena by identifying simplified distributional properties that give rise to the emergence and eventual disappearance of in-context learning. We do so by first analyzing a simplified model that uses a gating mechanism to choose between an in-weight and an in-context predictor. Through a combination of a generalization error and regret analysis we identify conditions where in-context and in-weight learning emerge. These theoretical findings are then corroborated experimentally by comparing the behaviour of a full transformer on the simplified distributions to that of the stylized model, demonstrating aligned results. We then extend the study to a full large language model, showing how fine-tuning on various collections of natural language prompts can elicit similar in-context and in-weight learning behaviour.

replace LSEAttention is All You Need for Time Series Forecasting

Authors: Dizhen Liang

Abstract: Transformer-based architectures have achieved remarkable success in natural language processing and computer vision. However, their performance in multivariate long-term forecasting often falls short compared to simpler linear baselines. Previous research has identified the traditional attention mechanism as a key factor limiting their effectiveness in this domain. To bridge this gap, we introduce LATST, a novel approach designed to mitigate entropy collapse and training instability, two common challenges in Transformer-based time series forecasting. We rigorously evaluate LATST across multiple real-world multivariate time series datasets, demonstrating its ability to outperform existing state-of-the-art Transformer models. Notably, LATST manages to achieve competitive performance with fewer parameters than some linear models on certain datasets, highlighting its efficiency and effectiveness.
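
The abstract does not give LATST's formula, but the title points at the log-sum-exp stabilization of softmax attention, sketched below; treating this standard trick as the relevant mechanism is an assumption of this sketch.

```python
import numpy as np

def lse_attention(Q, K, V):
    """Softmax attention computed through a max-shifted log-sum-exp, the
    standard numerically stable form: rows of `weights` sum to one and the
    exponentials cannot overflow."""
    scores = Q @ K.T / np.sqrt(Q.shape[-1])
    m = scores.max(axis=-1, keepdims=True)
    lse = m + np.log(np.exp(scores - m).sum(axis=-1, keepdims=True))
    weights = np.exp(scores - lse)    # softmax via the LSE identity
    return weights @ V
```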

replace An information-matching approach to optimal experimental design and active learning

Authors: Yonatan Kurniawan (Brigham Young University, Provo, UT, USA), Tracianne B. Neilsen (Brigham Young University, Provo, UT, USA), Benjamin L. Francis (Achilles Heel Technologies, Orem, UT, USA), Alex M. Stankovic (SLAC National Accelerator Laboratory, Menlo Park, CA, USA), Mingjian Wen (University of Houston, Houston, TX, USA), Ilia Nikiforov (University of Minnesota, Minneapolis, MN, USA), Ellad B. Tadmor (University of Minnesota, Minneapolis, MN, USA), Vasily V. Bulatov (Lawrence Livermore National Laboratory), Vincenzo Lordi (Lawrence Livermore National Laboratory), Mark K. Transtrum (Brigham Young University, Provo, UT, USA, SLAC National Accelerator Laboratory, Menlo Park, CA, USA)

Abstract: The efficacy of mathematical models heavily depends on the quality of the training data, yet collecting sufficient data is often expensive and challenging. Many modeling applications require inferring parameters only as a means to predict other quantities of interest (QoI). Because models often contain many unidentifiable (sloppy) parameters, QoIs often depend on a relatively small number of parameter combinations. Therefore, we introduce an information-matching criterion based on the Fisher Information Matrix to select the most informative training data from a candidate pool. This method ensures that the selected data contain sufficient information to learn only those parameters that are needed to constrain downstream QoIs. It is formulated as a convex optimization problem, making it scalable to large models and datasets. We demonstrate the effectiveness of this approach across various modeling problems in diverse scientific fields, including power systems and underwater acoustics. Finally, we use information-matching as a query function within an Active Learning loop for material science applications. In all these applications, we find that a relatively small set of optimal training data can provide the necessary information for achieving precise predictions. These results are encouraging for diverse future applications, particularly active learning in large machine learning models.
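
The information-matching selection can be phrased as a small convex program. Below is a toy sketch (requires cvxpy) on synthetic sensitivities: find nonnegative weights over candidate measurements whose combined Fisher information dominates the information the QoIs require. The data are random and the exact objective in the paper may differ; this only illustrates the convex-selection idea the abstract describes.

```python
import cvxpy as cp
import numpy as np

rng = np.random.default_rng(0)
n_candidates, n_params = 30, 4
# J[i]: sensitivity (Jacobian row) of candidate measurement i w.r.t. parameters.
J = rng.normal(size=(n_candidates, n_params))
# Information needed to constrain the downstream QoIs (assumed given here).
J_qoi = rng.normal(size=(2, n_params))
I_qoi = J_qoi.T @ J_qoi

w = cp.Variable(n_candidates, nonneg=True)
total_fim = sum(w[i] * np.outer(J[i], J[i]) for i in range(n_candidates))
# Minimize total measurement weight subject to the combined Fisher information
# dominating what the QoIs require: a convex semidefinite program.
prob = cp.Problem(cp.Minimize(cp.sum(w)), [total_fim >> I_qoi])
prob.solve()
print("selected weights:", np.round(w.value, 3))   # most weights land at zero
```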

replace Fitting Multiple Machine Learning Models with Performance Based Clustering

Authors: Mehmet Efe Lorasdagi, Ahmet Berker Koc, Ali Taha Koc, Suleyman Serdar Kozat

Abstract: Traditional machine learning approaches assume that data comes from a single generating mechanism, which may not hold for most real-life data. In these cases, the single-mechanism assumption can result in suboptimal performance. We introduce a clustering framework that eliminates this assumption by grouping the data according to the relations between the features and the target values, and we obtain multiple separate models to learn different parts of the data. We further extend our framework to applications with streaming data, where we produce outcomes using an ensemble of models. For this, the ensemble weights are updated based on the incoming data batches. We demonstrate the performance of our approach on widely studied real-life datasets, showing significant improvements over traditional single-model approaches.
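
A hedged sketch of the grouping idea: alternate between assigning each sample to the model that predicts it best and refitting one model per cluster, an EM-like loop over feature-target relations. The linear base learner and assignment rule are illustrative; the paper's exact procedure may differ.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

def performance_clustering(X, y, k=2, n_iter=10, rng=None):
    """Fit k models, each on the subset of data it predicts best.
    Assumes each initial random cluster receives at least two samples
    (true for any reasonably sized dataset)."""
    rng = rng or np.random.default_rng(0)
    labels = rng.integers(k, size=len(X))
    models = [LinearRegression() for _ in range(k)]
    for _ in range(n_iter):
        for j in range(k):                      # M-step: refit per-cluster models
            mask = labels == j
            if mask.sum() >= 2:
                models[j].fit(X[mask], y[mask])
        # E-step: reassign each sample to its best-performing model.
        errs = np.stack([(m.predict(X) - y) ** 2 for m in models], axis=1)
        labels = errs.argmin(axis=1)
    return models, labels
```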

replace Zeroth-Order Adaptive Neuron Alignment Based Pruning without Re-Training

Authors: Elia Cunegatti, Leonardo Lucio Custode, Giovanni Iacca

Abstract: Network pruning focuses on computational techniques that aim to reduce a given model's computational cost by removing a subset of its parameters while having minimal impact on performance. Throughout the last decade, the most widely used pruning paradigm has been pruning and re-training, which nowadays is inconvenient due to the vast amount of pre-trained models, which are in any case too expensive to re-train. In this paper, we exploit functional information from dense pre-trained models, i.e., their activations, to obtain sparse models that maximize the activations' alignment w.r.t. their corresponding dense models. Hence, we propose \textsc{NeuroAL}, a \emph{top-up} algorithm that can be used on top of any given pruning algorithm for LLMs, which modifies the block-wise and row-wise sparsity exploiting information from both the dense model and its sparse version to maximize the \emph{neuron alignment} among activations. Differently from existing methods, our approach adaptively selects the best hyperparameters for the block-wise and row-wise sparsity ratios w.r.t. the model and the desired sparsity, and requires \emph{no re-training}. We test our method over 276 cases combining four LLM families, three sparsity ratios, and ten language tasks (three language modeling and seven zero-shot datasets), showing how it consistently outperforms the latest state-of-the-art methods in terms of performance-runtime trade-off. The code is available at \href{https://github.com/eliacunegatti/NeuroAL}{https://github.com/eliacunegatti/NeuroAL}.

URLs: https://github.com/eliacunegatti/NeuroAL
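
As a hedged illustration of the alignment objective, the snippet below scores how well a sparse model's activations match its dense parent via mean cosine similarity; the exact alignment measure NeuroAL optimizes is not specified in the abstract, so this scoring function is an assumption.

```python
import numpy as np

def neuron_alignment(dense_acts, sparse_acts):
    """Mean cosine similarity between dense- and sparse-model activations,
    one plausible score to maximize when searching block- and row-wise
    sparsity ratios. Both inputs: (n_samples, n_neurons) arrays."""
    num = (dense_acts * sparse_acts).sum(axis=1)
    den = (np.linalg.norm(dense_acts, axis=1) *
           np.linalg.norm(sparse_acts, axis=1) + 1e-12)
    return (num / den).mean()
```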

replace General Geospatial Inference with a Population Dynamics Foundation Model

Authors: Mohit Agarwal, Mimi Sun, Chaitanya Kamath, Arbaaz Muslim, Prithul Sarker, Joydeep Paul, Hector Yee, Marcin Sieniek, Kim Jablonski, Yael Mayer, David Fork, Sheila de Guia, Jamie McPike, Adam Boulanger, Tomer Shekel, David Schottlander, Yao Xiao, Manjit Chakravarthy Manukonda, Yun Liu, Neslihan Bulut, Sami Abu-el-haija, Bryan Perozzi, Monica Bharel, Von Nguyen, Luke Barrington, Niv Efron, Yossi Matias, Greg Corrado, Krish Eswaran, Shruthi Prabhakara, Shravya Shetty, Gautam Prasad

Abstract: Supporting the health and well-being of dynamic populations around the world requires governmental agencies, organizations and researchers to understand and reason over complex relationships between human behavior and local contexts in order to identify high-risk groups and strategically allocate limited resources. Traditional approaches to these classes of problems often entail developing manually curated, task-specific features and models to represent human behavior and the natural and built environment, which can be challenging to adapt to new, or even, related tasks. To address this, we introduce a Population Dynamics Foundation Model (PDFM) that aims to capture the relationships between diverse data modalities and is applicable to a broad range of geospatial tasks. We first construct a geo-indexed dataset for postal codes and counties across the United States, capturing rich aggregated information on human behavior from maps, busyness, and aggregated search trends, and environmental factors such as weather and air quality. We then model this data and the complex relationships between locations using a graph neural network, producing embeddings that can be adapted to a wide range of downstream tasks using relatively simple models. We evaluate the effectiveness of our approach by benchmarking it on 27 downstream tasks spanning three distinct domains: health indicators, socioeconomic factors, and environmental measurements. The approach achieves state-of-the-art performance on all 27 geospatial interpolation tasks, and on 25 out of the 27 extrapolation and super-resolution tasks. We combined the PDFM with a state-of-the-art forecasting foundation model, TimesFM, to predict unemployment and poverty, achieving performance that surpasses fully supervised forecasting. The full set of embeddings and sample code are publicly available for researchers.

replace Compute Optimal Inference and Provable Amortisation Gap in Sparse Autoencoders

Authors: Charles O'Neill, Alim Gumran, David Klindt

Abstract: A recent line of work has shown promise in using sparse autoencoders (SAEs) to uncover interpretable features in neural network representations. However, the simple linear-nonlinear encoding mechanism in SAEs limits their ability to perform accurate sparse inference. Using compressed sensing theory, we prove that an SAE encoder is inherently insufficient for accurate sparse inference, even in solvable cases. We then decouple encoding and decoding processes to empirically explore conditions where more sophisticated sparse inference methods outperform traditional SAE encoders. Our results reveal substantial performance gains with minimal compute increases in correct inference of sparse codes. We demonstrate this generalises to SAEs applied to large language models, where more expressive encoders achieve greater interpretability. This work opens new avenues for understanding neural network representations and analysing large language model activations.
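
The encoder/decoder decoupling is easy to picture: compare the one-shot SAE encoder with an iterative sparse-inference routine run against the same decoder dictionary. The ISTA loop below is a standard stand-in for the "more sophisticated sparse inference methods" the abstract mentions, not necessarily the one the authors use.

```python
import numpy as np

def sae_encode(x, W_enc, b):
    """One-shot linear-nonlinear SAE encoder (the amortised mechanism the
    paper argues is insufficient for exact sparse inference)."""
    return np.maximum(0.0, W_enc @ x + b)

def ista_encode(x, D, lam=0.1, n_iter=200):
    """Iterative soft-thresholding (ISTA) for the lasso objective
    0.5 * ||D z - x||^2 + lam * ||z||_1 against decoder dictionary D."""
    L = np.linalg.norm(D, 2) ** 2              # Lipschitz constant of the gradient
    z = np.zeros(D.shape[1])
    for _ in range(n_iter):
        z = z - (D.T @ (D @ z - x)) / L        # gradient step on reconstruction
        z = np.sign(z) * np.maximum(np.abs(z) - lam / L, 0.0)   # soft threshold
    return z
```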

replace Mean-Field Sampling for Cooperative Multi-Agent Reinforcement Learning

Authors: Emile Anand, Ishani Karmarkar, Guannan Qu

Abstract: Designing efficient algorithms for multi-agent reinforcement learning (MARL) is fundamentally challenging because the size of the joint state and action spaces grows exponentially in the number of agents. These difficulties are exacerbated when balancing sequential global decision-making with local agent interactions. In this work, we propose a new algorithm $\texttt{SUBSAMPLE-MFQ}$ ($\textbf{Subsample}$-$\textbf{M}$ean-$\textbf{F}$ield-$\textbf{Q}$-learning) and a decentralized randomized policy for a system with $n$ agents. For $k\leq n$, our algorithm learns a policy for the system in time polynomial in $k$. We show that this learned policy converges to the optimal policy on the order of $\tilde{O}(1/\sqrt{k})$ as the number of subsampled agents $k$ increases. We empirically validate our method in Gaussian squeeze and global exploration settings.

replace Bayesian Flow Is All You Need to Sample Out-of-Distribution Chemical Spaces

Authors: Nianze Tao

Abstract: Generating novel molecules with properties beyond the training space, namely out-of-distribution generation, is important for \textit{de novo} drug design. However, this challenge is not easy for distribution-learning-based models, such as diffusion models, to solve, as these methods are designed to fit the distribution of the training data as closely as possible. In this paper, we show that the Bayesian flow network is capable of effortlessly generating high-quality out-of-distribution samples across several scenarios. We introduce a semi-autoregressive training/sampling method that helps to enhance model performance and surpass state-of-the-art models.

replace The Meta-Representation Hypothesis

Authors: Zhengpeng Xie, Jiahang Cao, Qiang Zhang, Jianxiong Zhang, Changwei Wang, Renjing Xu

Abstract: Humans rely on high-level understandings of things, i.e., meta-representations, to engage in abstract reasoning. In complex cognitive tasks, these meta-representations help individuals abstract general rules from experience. However, constructing such meta-representations from high-dimensional observations remains a longstanding challenge for reinforcement learning (RL) agents. For instance, a well-trained agent often fails to generalize to even minor variations of the same task, such as changes in background color, which humans handle easily. In this paper, we theoretically investigate how meta-representations contribute to the generalization ability of RL agents, demonstrating that learning meta-representations from high-dimensional observations enhances an agent's ability to generalize across varied environments. We further hypothesize that deep mutual learning (DML) among agents can help them learn meta-representations that capture the underlying essence of the task. Empirical results provide strong support for both our theory and hypothesis. Overall, this work provides a new perspective on the generalization of deep reinforcement learning.

replace AlgoRxplorers | Precision in Mutation: Enhancing Drug Design with Advanced Protein Stability Prediction Tools

Authors: Karishma Thakrar, Jiangqin Ma, Max Diamond, Akash Patel

Abstract: Predicting the impact of single-point amino acid mutations on protein stability is essential for understanding disease mechanisms and advancing drug development. Protein stability, quantified by changes in Gibbs free energy ($\Delta\Delta G$), is influenced by these mutations. However, the scarcity of data and the complexity of model interpretation pose challenges in accurately predicting stability changes. This study proposes the application of deep neural networks, leveraging transfer learning and fusing complementary information from different models, to create a feature-rich representation of the protein stability landscape. We developed four models, with our third model, ThermoMPNN+, demonstrating the best performance in predicting $\Delta\Delta G$ values. This approach, which integrates diverse feature sets and embeddings through latent transfusion techniques, aims to refine $\Delta\Delta G$ predictions and contribute to a deeper understanding of protein dynamics, potentially leading to advancements in disease research and drug discovery.

replace Real-Time Decision-Making for Digital Twin in Additive Manufacturing with Model Predictive Control using Time-Series Deep Neural Networks

Authors: Yi-Ping Chen, Vispi Karkaria, Ying-Kuan Tsai, Faith Rolark, Daniel Quispe, Robert X. Gao, Jian Cao, Wei Chen

Abstract: A Digital Twin, a virtual replica of a physical system enabling real-time monitoring, model updating, prediction, and decision-making, combined with recent advances in machine learning (ML), offers new opportunities for proactive control strategies in autonomous manufacturing. However, achieving real-time decision-making with Digital Twins requires efficient optimization driven by accurate predictions of highly nonlinear manufacturing systems. This paper presents a simultaneous multi-step Model Predictive Control (MPC) framework for real-time decision-making, using a multi-variate deep neural network (DNN), named Time-Series Dense Encoder (TiDE), as the surrogate model. Different from the models in conventional MPC, which only provide one-step-ahead predictions, TiDE is capable of predicting future states within the prediction horizon in one shot (multi-step), significantly accelerating MPC. Using Directed Energy Deposition additive manufacturing as a case study, we demonstrate the effectiveness of the proposed MPC in achieving melt pool temperature tracking to ensure part quality, while reducing porosity defects by regulating laser power to maintain melt pool depth constraints. In this work, we first show that TiDE is capable of accurately predicting melt pool temperature and depth. Second, we demonstrate that the proposed MPC achieves precise temperature tracking while satisfying melt pool depth constraints within a targeted dilution range (10%-30%), reducing potential porosity defects. Compared to the PID controller, MPC results in smoother and less fluctuating laser power profiles with competitive or superior melt pool temperature control performance. This demonstrates MPC's proactive control capabilities, leveraging time-series prediction and real-time optimization, positioning it as a powerful tool for future Digital Twin applications and real-time process optimization in manufacturing.

replace Stronger Than You Think: Benchmarking Weak Supervision on Realistic Tasks

Authors: Tianyi Zhang, Linrong Cai, Jeffrey Li, Nicholas Roberts, Neel Guha, Jinoh Lee, Frederic Sala

Abstract: Weak supervision (WS) is a popular approach for label-efficient learning, leveraging diverse sources of noisy but inexpensive weak labels to automatically annotate training data. Despite its wide usage, WS and its practical value are challenging to benchmark due to the many knobs in its setup, including: data sources, labeling functions (LFs), aggregation techniques (called label models), and end model pipelines. Existing evaluation suites tend to be limited, focusing on particular components or specialized use cases. Moreover, they often involve simplistic benchmark tasks or de-facto LF sets that are suboptimally written, producing insights that may not generalize to real-world settings. We address these limitations by introducing a new benchmark, BOXWRENCH, designed to more accurately reflect real-world usages of WS. This benchmark features tasks with (1) higher class cardinality and imbalance, (2) notable domain expertise requirements, and (3) opportunities to re-use LFs across parallel multilingual corpora. For all tasks, LFs are written using a careful procedure aimed at mimicking real-world settings. In contrast to existing WS benchmarks, we show that supervised learning requires substantial amounts (1000+) of labeled examples to match WS in many settings.
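
For readers unfamiliar with the WS setup being benchmarked, the sketch below is a toy illustration with hypothetical labeling functions, not BOXWRENCH code: labeling functions vote or abstain on each example, and a label model (here a simple majority vote) aggregates the votes into training labels.

```python
# Toy weak-supervision pipeline: labeling functions (LFs) vote or abstain,
# and a simple majority-vote label model aggregates them (illustrative only).
import numpy as np

ABSTAIN = -1
lfs = [
    lambda text: 1 if "refund" in text else ABSTAIN,   # hypothetical LF: complaint
    lambda text: 1 if "broken" in text else ABSTAIN,   # hypothetical LF: complaint
    lambda text: 0 if "love" in text else ABSTAIN,     # hypothetical LF: praise
]

def majority_vote(votes):
    votes = [v for v in votes if v != ABSTAIN]
    if not votes:
        return ABSTAIN                      # no LF fired: leave unlabeled
    return int(np.argmax(np.bincount(votes)))

corpus = ["i want a refund, it arrived broken", "love this product"]
for text in corpus:
    print(text, "->", majority_vote([lf(text) for lf in lfs]))
```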

replace Control LLM: Controlled Evolution for Intelligence Retention in LLM

Authors: Haichao Wei, Yunxiang Ren, Zhoutong Fu, Aman Lunia, Yi-Lin Chen, Alice Leung, Ya Xu

Abstract: Large Language Models (LLMs) demand significant computational resources, making it essential to enhance their capabilities without retraining from scratch. A key challenge in this domain is \textit{catastrophic forgetting} (CF), which hampers performance during Continuous Pre-training (CPT) and Continuous Supervised Fine-Tuning (CSFT). We propose \textbf{Control LLM}, a novel approach that leverages parallel pre-trained and expanded transformer blocks, aligning their hidden states through interpolation strategies. This method effectively preserves performance on existing tasks while seamlessly integrating new knowledge. Extensive experiments demonstrate the effectiveness of Control LLM in both CPT and CSFT. On Llama3.1-8B-Instruct, it achieves significant improvements in mathematical reasoning ($+14.4\%$ on Math-Hard) and coding performance ($+10\%$ on MBPP-PLUS). On Llama3.1-8B, it enhances multilingual capabilities ($+10.6\%$ on C-Eval, $+6.8\%$ on CMMLU, and $+30.2\%$ on CMMLU-0shot-CoT). It surpasses existing methods and achieves SOTA among open-source models tuned from the same base model, using substantially less data and compute. Crucially, these gains are realized while preserving strong original capabilities, with minimal degradation ($<4.3\%$ on MMLU) compared to $>35\%$ in open-source Math and Coding models. This approach has been successfully deployed in LinkedIn's GenAI-powered job seeker and Ads unit products. To support further research, we release the training and evaluation code (https://github.com/linkedin/ControlLLM) along with models trained on public datasets (https://huggingface.co/ControlLLM) to the community.

URLs: https://github.com/linkedin/ControlLLM, https://huggingface.co/ControlLLM

replace Unsupervised Learning in Echo State Networks for Input Reconstruction

Authors: Taiki Yamada, Yuichi Katori, Kantaro Fujiwara

Abstract: Conventional echo state networks (ESNs) require supervised learning to train the readout layer, using the desired outputs as training data. In this study, we focus on input reconstruction (IR), which refers to training the readout layer to reproduce the input time series in its output. We reformulate the learning algorithm of the ESN readout layer to perform IR using unsupervised learning (UL). By conducting theoretical analysis and numerical experiments, we demonstrate that IR in ESNs can be effectively implemented under realistic conditions without explicitly using the desired outputs as training data; in this way, UL is enabled. Furthermore, we demonstrate that applications relying on IR, such as dynamical system replication and noise filtering, can be reformulated within the UL framework. Our findings establish a theoretically sound and universally applicable IR formulation, along with its related tasks in ESNs. This work paves the way for novel predictions and highlights unresolved theoretical challenges in ESNs, particularly in the context of time-series processing methods and computational models of the brain.

replace S-LoRA: Scalable Low-Rank Adaptation for Class Incremental Learning

Authors: Yichen Wu, Hongming Piao, Long-Kai Huang, Renzhen Wang, Wanhua Li, Hanspeter Pfister, Deyu Meng, Kede Ma, Ying Wei

Abstract: Continual Learning with foundation models has recently emerged as a promising approach to harnessing the power of pre-trained models for sequential tasks. Existing prompt-based methods generally use a gating mechanism to select relevant prompts aligned with the test query for further processing. However, the success of these methods largely depends on the precision of the gating mechanism, which becomes less scalable and incurs additional computational overhead as the number of tasks increases. To overcome these issues, we propose a Scalable Low-Rank Adaptation (S-LoRA) method for CL (in particular class incremental learning), which incrementally decouples the learning of the direction and magnitude of LoRA parameters. S-LoRA supports efficient inference by employing the last-stage trained model for direct testing without a gating process. Our theoretical and empirical analysis demonstrates that S-LoRA tends to follow a low-loss trajectory that converges to an overlapped low-loss region, resulting in an excellent stability-plasticity trade-off in CL. Furthermore, based on our findings, we develop variants of S-LoRA with further improved scalability. Extensive experiments across multiple CL benchmarks and various foundation models consistently validate the effectiveness of S-LoRA.
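
A minimal numpy sketch of the direction/magnitude decoupling as we read the abstract; this is not the authors' implementation, and the unit-norm-direction-plus-scalar-magnitude parameterization is our illustrative choice.

```python
# Sketch: a LoRA update W0 + B@A whose direction (low-rank factors) and
# magnitude (a separate scalar) are decoupled, per our reading of the abstract.
import numpy as np

rng = np.random.default_rng(1)
d_out, d_in, r = 32, 64, 4
W0 = rng.normal(size=(d_out, d_in))        # frozen pre-trained weight
A = rng.normal(size=(r, d_in)) * 0.01      # low-rank factors: the direction
B = np.zeros((d_out, r))                   # standard LoRA zero-init
g = 1.0                                    # scalar magnitude, learned separately

def effective_weight():
    delta = B @ A
    norm = np.linalg.norm(delta) + 1e-8
    return W0 + g * delta / norm           # g scales a unit-norm direction

x = rng.normal(size=d_in)
y = effective_weight() @ x
print(y.shape)                             # (32,)
```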

replace An Attentive Graph Agent for Topology-Adaptive Cyber Defence

Authors: Ilya Orson Sandoval, Isaac Symes Thompson, Vasilios Mavroudis, Chris Hicks

Abstract: As cyber threats grow increasingly sophisticated, reinforcement learning (RL) is emerging as a promising technique to create intelligent and adaptive cyber defense systems. However, most existing autonomous defensive agents have overlooked the inherent graph structure of computer networks subject to cyber attacks, potentially missing critical information and constraining their adaptability. To overcome these limitations, we developed a custom version of the Cyber Operations Research Gym (CybORG) environment, encoding network state as a directed graph with realistic low-level features. We employ a Graph Attention Network (GAT) architecture to process node, edge, and global features, and adapt its output to be compatible with policy gradient methods in RL. Our GAT-based approach offers key advantages over flattened alternatives: robust policies capable of handling dynamic network topology changes, generalisation to networks of varying sizes beyond the training distribution, and interpretable defensive actions grounded in tangible network properties. We demonstrate the effectiveness of our low-level directed graph observations by training GAT defensive policies that successfully adapt to changing network topologies. Evaluations across networks of different sizes, but consistent subnetwork structure, show our policies achieve comparable performance to policies trained specifically for each network configuration. Our study contributes to the development of robust cyber defence systems that can better adapt to real-world network security challenges.

replace LemmaHead: RAG Assisted Proof Generation Using Large Language Models

Authors: Tianbo Yang, Mingqi Yang, Hongyi Zhao, Tianshuo Yang

Abstract: Developing the logic necessary to solve mathematical problems or write mathematical proofs is one of the more difficult objectives for large language models (LLMs). Currently, the most popular methods in the literature consist of fine-tuning the model on written mathematical content such as academic publications and textbooks, so that the model can learn to emulate the style of mathematical writing. In this project, we explore the effectiveness of using retrieval augmented generation (RAG) to address gaps in the mathematical reasoning of LLMs. We develop LemmaHead, a RAG knowledge base that supplements queries to the model with relevant mathematical context, with particular focus on context from published textbooks. To measure our model's performance in mathematical reasoning, our testing paradigm focuses on the task of automated theorem proving via generating proofs to a given mathematical claim in the Lean formal language.

replace ScaDyG: A New Paradigm for Large-scale Dynamic Graph Learning

Authors: Xiang Wu, Xunkai Li, Rong-Hua Li, Kangfei Zhao, Guoren Wang

Abstract: Dynamic graphs (DGs), which capture time-evolving relationships between graph entities, have widespread real-world applications. To efficiently encode DGs for downstream tasks, most dynamic graph neural networks follow the traditional message-passing mechanism and extend it with time-based techniques. Despite their effectiveness, the growth of historical interactions introduces significant scalability issues, particularly in industry scenarios. To address this limitation, we propose ScaDyG, with the core idea of designing a time-aware scalable learning paradigm as follows: 1) Time-aware Topology Reformulation: ScaDyG first segments historical interactions into time steps (intra and inter) based on dynamic modeling, enabling weight-free and time-aware graph propagation within pre-processing. 2) Dynamic Temporal Encoding: To further achieve fine-grained graph propagation within time steps, ScaDyG integrates temporal encoding through a combination of exponential functions in a scalable manner. 3) Hypernetwork-driven Message Aggregation: After obtaining the propagated features (i.e., messages), ScaDyG utilizes a hypernetwork to analyze historical dependencies, implementing node-wise representations through adaptive temporal fusion. Extensive experiments on 12 datasets demonstrate that ScaDyG matches or outperforms other SOTA methods in both node- and link-level downstream tasks, with fewer learnable parameters and higher efficiency.

replace ReFill: Reinforcement Learning for Fill-In Minimization

Authors: Elfarouk Harb, Ho Shan Lam

Abstract: Efficiently solving sparse linear systems $Ax=b$, where $A$ is a large, sparse, symmetric positive semi-definite matrix, is a core challenge in scientific computing, machine learning, and optimization. A major bottleneck in Gaussian elimination for these systems is fill-in, the creation of non-zero entries that increase memory and computational cost. Minimizing fill-in is NP-hard, and existing heuristics like Minimum Degree and Nested Dissection offer limited adaptability across diverse problem instances. We introduce \textit{ReFill}, a reinforcement learning framework enhanced by Graph Neural Networks (GNNs) to learn adaptive ordering strategies for fill-in minimization. ReFill trains a GNN-based heuristic to predict efficient elimination orders, outperforming traditional heuristics by dynamically adapting to the structure of input matrices. Experiments demonstrate that ReFill outperforms strong heuristics in reducing fill-in, highlighting the untapped potential of learning-based methods for this well-studied classical problem.
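
The objective ReFill learns to minimize is easy to state concretely. The sketch below is ours (a plain symbolic elimination, not the paper's code): it counts the fill edges created by eliminating the vertices of a sparse symmetric matrix's graph in a given order.

```python
# Sketch: fill-in of an elimination ordering, computed by symbolic Gaussian
# elimination on the graph of a sparse symmetric matrix (illustrative only).
def fill_in(adj, order):
    """adj: dict mapping vertex -> set of neighbors; order: elimination order."""
    adj = {v: set(ns) for v, ns in adj.items()}      # work on a copy
    fill = 0
    for v in order:
        nbrs = list(adj[v])
        for i in range(len(nbrs)):
            for j in range(i + 1, len(nbrs)):
                a, b = nbrs[i], nbrs[j]
                if b not in adj[a]:                  # eliminating v joins its neighbors
                    adj[a].add(b); adj[b].add(a)
                    fill += 1
        for u in nbrs:                               # remove v from the graph
            adj[u].discard(v)
        del adj[v]
    return fill

# Path graph 0-1-2-3: eliminating interior vertices first creates fill,
# eliminating the endpoints first creates none.
path = {0: {1}, 1: {0, 2}, 2: {1, 3}, 3: {2}}
print(fill_in(path, [1, 2, 0, 3]), fill_in(path, [0, 3, 1, 2]))  # 2 0
```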

replace Sparse Autoencoders Trained on the Same Data Learn Different Features

Authors: Gon\c{c}alo Paulo, Nora Belrose

Abstract: Sparse autoencoders (SAEs) are a useful tool for uncovering human-interpretable features in the activations of large language models (LLMs). While some expect SAEs to find the true underlying features used by a model, our research shows that SAEs trained on the same model and data, differing only in the random seed used to initialize their weights, identify different sets of features. For example, in an SAE with 131K latents trained on a feedforward network in Llama 3 8B, only 30% of the features were shared across different seeds. We observed this phenomenon across multiple layers of three different LLMs, two datasets, and several SAE architectures. While ReLU SAEs trained with the L1 sparsity loss showed greater stability across seeds, SAEs using the state-of-the-art TopK activation function were more seed-dependent, even when controlling for the level of sparsity. Our results suggest that the set of features uncovered by an SAE should be viewed as a pragmatically useful decomposition of activation space, rather than an exhaustive and universal list of features "truly used" by the model.
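
The seed-overlap statistic can be illustrated with a simple matching procedure. This is our sketch; the paper's exact matching criterion may differ, and the 0.9 cosine cutoff is an assumption.

```python
# Sketch: fraction of one SAE's decoder features that have a near-duplicate
# (by max cosine similarity) in another SAE trained from a different seed.
import numpy as np

def shared_fraction(D1, D2, tau=0.9):
    """D1, D2: (d_model, n_latents) decoder matrices; tau: similarity cutoff."""
    U1 = D1 / np.linalg.norm(D1, axis=0, keepdims=True)
    U2 = D2 / np.linalg.norm(D2, axis=0, keepdims=True)
    sims = U1.T @ U2                        # cosine similarity of all feature pairs
    return float(np.mean(sims.max(axis=1) >= tau))

rng = np.random.default_rng(0)
D1 = rng.normal(size=(128, 512))
# Synthetic "second seed": half the features shared, half independent.
D2 = np.concatenate([D1[:, :256], rng.normal(size=(128, 256))], axis=1)
print(shared_fraction(D1, D2))              # ~0.5
```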

replace Multi-Physics Simulations via Coupled Fourier Neural Operator

Authors: Shibo Li, Tao Wang, Yifei Sun, Hewei Tang

Abstract: Physical simulations are essential tools across critical fields such as mechanical and aerospace engineering, chemistry, meteorology, etc. While neural operators, particularly the Fourier Neural Operator (FNO), have shown promise in predicting simulation results with impressive performance and efficiency, they face limitations when handling real-world scenarios involving coupled multi-physics outputs. Current neural operator methods either overlook the correlations between multiple physical processes or employ simplistic architectures that inadequately capture these relationships. To overcome these challenges, we introduce a novel coupled multi-physics neural operator learning (COMPOL) framework that extends the capabilities of Fourier operator layers to model interactions among multiple physical processes. Our approach implements feature aggregation through recurrent and attention mechanisms, enabling comprehensive modeling of coupled interactions. Our method's core is an innovative system for aggregating latent features from multi-physics processes. These aggregated features serve as enriched information sources for neural operator layers, allowing our framework to capture complex physical relationships accurately. We evaluated our coupled multi-physics neural operator across diverse physical simulation tasks, including biological systems, fluid mechanics, and multiphase flow in porous media. Our proposed model demonstrates a two to three-fold improvement in predictive performance compared to existing approaches.

replace-cross Dynamic treatment effects: high-dimensional inference under model misspecification

Authors: Yuqian Zhang, Weijie Ji, Jelena Bradic

Abstract: Estimating dynamic treatment effects is crucial across various disciplines, providing insights into the time-dependent causal impact of interventions. However, this estimation poses challenges due to time-varying confounding, leading to potentially biased estimates. Furthermore, accurately specifying the growing number of treatment assignments and outcome models with multiple exposures appears increasingly challenging to accomplish. Double robustness, which permits model misspecification, holds great value in addressing these challenges. This paper introduces a novel "sequential model doubly robust" estimator. We develop novel moment-targeting estimates to account for confounding effects and establish that root-$N$ inference can be achieved as long as at least one nuisance model is correctly specified at each exposure time, despite the presence of high-dimensional covariates. Although the nuisance estimates themselves do not achieve root-$N$ rates, the carefully designed loss functions in our framework ensure final root-$N$ inference for the causal parameter of interest. Unlike off-the-shelf high-dimensional methods, which fail to deliver robust inference under model misspecification even within the doubly robust framework, our newly developed loss functions address this limitation effectively.

replace-cross A conditional gradient homotopy method with applications to Semidefinite Programming

Authors: Pavel Dvurechensky, Gabriele Iommazzo, Shimrit Shtern, Mathias Staudigl

Abstract: We propose a new homotopy-based conditional gradient method for solving convex optimization problems with a large number of simple conic constraints. Instances of this template naturally appear in semidefinite programming problems arising as convex relaxations of combinatorial optimization problems. Our method is a double-loop algorithm in which the conic constraint is treated via a self-concordant barrier, and the inner loop employs a conditional gradient algorithm to approximate the analytic central path, while the outer loop updates the accuracy imposed on the intermediate solution and the homotopy parameter. Our theoretical iteration complexity is competitive with that of state-of-the-art SDP solvers, with the decisive advantage of cheap projection-free subroutines. Preliminary numerical experiments are provided to illustrate the practical performance of the method.

replace-cross Computing the gradients with respect to all parameters of a quantum neural network using a single circuit

Authors: Guang Ping He

Abstract: Finding gradients is a crucial step in training machine learning models. For quantum neural networks, computing gradients using the parameter-shift rule requires calculating the cost function twice for each adjustable parameter in the network. When the total number of parameters is large, the quantum circuit must be repeatedly adjusted and executed, leading to significant computational overhead. Here we propose an approach to compute all gradients using a single circuit only, significantly reducing both the circuit depth and the number of classical registers required. We experimentally validate our approach on both quantum simulators and IBM's real quantum hardware, demonstrating that our method significantly reduces circuit compilation time compared to the conventional approach, resulting in a substantial speedup in total runtime.
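
For context, the conventional baseline the paper improves on is the parameter-shift rule, which needs two circuit executions per parameter. Below is a sketch with a classical stand-in for the measured expectation value; the specific cost function is an illustrative assumption.

```python
# Sketch of the conventional parameter-shift rule: 2 * n_params evaluations.
import numpy as np

def cost(thetas):
    # Stand-in for a hardware-measured expectation value; circuits built from
    # gates exp(-i*theta*P/2) with P^2 = I produce such sinusoidal dependence.
    return np.cos(thetas[0]) * np.sin(thetas[1])

def parameter_shift_grad(cost, thetas, shift=np.pi / 2):
    grad = np.zeros_like(thetas)
    for i in range(len(thetas)):
        plus, minus = thetas.copy(), thetas.copy()
        plus[i] += shift; minus[i] -= shift
        grad[i] = (cost(plus) - cost(minus)) / 2    # exact for such generators
    return grad

th = np.array([0.3, 1.1])
print(parameter_shift_grad(cost, th))
print([-np.sin(th[0]) * np.sin(th[1]), np.cos(th[0]) * np.cos(th[1])])  # analytic
```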

replace-cross Density Matrix Emulation of Quantum Recurrent Neural Networks for Multivariate Time Series Prediction

Authors: Jos\'e Daniel Viqueira, Daniel Fa\'ilde, Mariamo M. Juane, Andr\'es G\'omez, David Mera

Abstract: Quantum Recurrent Neural Networks (QRNNs) are robust candidates for modelling and predicting future values in multivariate time series. However, the effective implementation of some QRNN models is limited by the need for mid-circuit measurements. Those increase the requirements for quantum hardware, which in the current NISQ era does not allow reliable computations. Emulation arises as the main near-term alternative to explore the potential of QRNNs, but existing quantum emulators are not dedicated to circuits with multiple intermediate measurements. In this context, we design a specific emulation method that relies on density matrix formalism. Using a compact tensor notation, we provide the mathematical formulation of the operator-sum representation involved. This allows us to show how the present and past information from a time series is transmitted through the circuit, and how to reduce the computational cost in every time step of the emulated network. In addition, we derive the analytical gradient and the Hessian of the network outputs with respect to its trainable parameters, which are needed when the outputs have stochastic noise due to hardware errors and a finite number of circuit shots (sampling). We finally test the presented methods using a hardware-efficient ansatz and four diverse datasets that include univariate and multivariate time series, with and without sampling noise. In addition, we compare the model with other existing quantum and classical approaches. Our results show how QRNNs can be trained with numerical and analytical gradients to make accurate predictions of future values by capturing non-trivial patterns of input series with different complexities.

replace-cross Uncertainty quantification in automated valuation models with spatially weighted conformal prediction

Authors: Anders Hjort, Gudmund Horn Hermansen, Johan Pensar, Jonathan P. Williams

Abstract: Non-parametric machine learning models, such as random forests and gradient boosted trees, are frequently used to estimate house prices due to their predictive accuracy, but a main drawback of such methods is their limited ability to quantify prediction uncertainty. Conformal prediction (CP) is a model-agnostic framework for constructing confidence sets around predictions of machine learning models with minimal assumptions. However, due to the spatial dependencies observed in house prices, direct application of CP leads to confidence sets that are not calibrated everywhere, i.e., the confidence sets will be too large in certain geographical regions and too small in others. We survey various approaches to adjust the CP confidence set to account for this and demonstrate their performance on a data set from the housing market in Oslo, Norway. Our findings indicate that calibrating the confidence sets on a spatially weighted version of the non-conformity scores makes the coverage more consistently calibrated across geographical regions. We also perform a simulation study on synthetically generated sale prices to empirically explore the performance of CP on housing market data under idealized conditions with known data-generating mechanisms.
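
A minimal sketch of the spatially weighted calibration idea follows. It is ours, not the paper's procedure: the Gaussian kernel, its bandwidth, and the omission of finite-sample quantile corrections are all simplifications.

```python
# Sketch: conformal intervals whose calibration scores are weighted by
# spatial proximity to the test location (illustrative simplifications).
import numpy as np

def weighted_cp_interval(y_pred, cal_scores, cal_locs, test_loc, alpha=0.1, h=1.0):
    """cal_scores: |y - y_hat| on a calibration set; locations are 2-D coords."""
    d = np.linalg.norm(cal_locs - test_loc, axis=1)
    w = np.exp(-(d / h) ** 2)               # Gaussian kernel weights
    w = w / w.sum()
    order = np.argsort(cal_scores)
    cum = np.cumsum(w[order])
    q = cal_scores[order][np.searchsorted(cum, 1 - alpha)]   # weighted quantile
    return y_pred - q, y_pred + q

rng = np.random.default_rng(0)
locs = rng.uniform(0, 10, size=(500, 2))
scores = 0.5 + 0.3 * locs[:, 0] + rng.exponential(0.2, size=500)  # spatial noise
print(weighted_cp_interval(100.0, scores, locs, np.array([1.0, 5.0])))
print(weighted_cp_interval(100.0, scores, locs, np.array([9.0, 5.0])))  # wider
```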

replace-cross Semi-Supervised Deep Sobolev Regression: Estimation and Variable Selection by ReQU Neural Network

Authors: Zhao Ding, Chenguang Duan, Yuling Jiao, Jerry Zhijian Yang

Abstract: We propose SDORE, a Semi-supervised Deep Sobolev Regressor, for the nonparametric estimation of the underlying regression function and its gradient. SDORE employs deep ReQU neural networks to minimize the empirical risk with gradient norm regularization, allowing the approximation of the regularization term by unlabeled data. Our study includes a thorough analysis of the convergence rates of SDORE in $L^{2}$-norm, achieving the minimax optimality. Further, we establish a convergence rate for the associated plug-in gradient estimator, even in the presence of significant domain shift. These theoretical findings offer valuable insights for selecting regularization parameters and determining the size of the neural network, while showcasing the provable advantage of leveraging unlabeled data in semi-supervised learning. To the best of our knowledge, SDORE is the first provable neural network-based approach that simultaneously estimates the regression function and its gradient, with diverse applications such as nonparametric variable selection. The effectiveness of SDORE is validated through an extensive range of numerical simulations.

replace-cross A distribution-guided Mapper algorithm

Authors: Yuyang Tao, Shufei Ge

Abstract: Motivation: The Mapper algorithm is an essential tool to explore shape of data in topology data analysis. With a dataset as an input, the Mapper algorithm outputs a graph representing the topological features of the whole dataset. This graph is often regarded as an approximation of a reeb graph of data. The classic Mapper algorithm uses fixed interval lengths and overlapping ratios, which might fail to reveal subtle features of data, especially when the underlying structure is complex. Results: In this work, we introduce a distribution guided Mapper algorithm named D-Mapper, that utilizes the property of the probability model and data intrinsic characteristics to generate density guided covers and provides enhanced topological features. Our proposed algorithm is a probabilistic model-based approach, which could serve as an alternative to non-prababilistic ones. Moreover, we introduce a metric accounting for both the quality of overlap clustering and extended persistence homology to measure the performance of Mapper type algorithm. Our numerical experiments indicate that the D-Mapper outperforms the classical Mapper algorithm in various scenarios. We also apply the D-Mapper to a SARS-COV-2 coronavirus RNA sequences dataset to explore the topological structure of different virus variants. The results indicate that the D-Mapper algorithm can reveal both vertical and horizontal evolution processes of the viruses. Availability: Our package is available at https://github.com/ShufeiGe/D-Mapper.

URLs: https://github.com/ShufeiGe/D-Mapper.

replace-cross Retrieve, Merge, Predict: Augmenting Tables with Data Lakes

Authors: Riccardo Cappuzzo (SODA Team - Inria Saclay), Aimee Coelho (Dataiku), Felix Lefebvre (SODA Team - Inria Saclay), Paolo Papotti (EURECOM), Gael Varoquaux (SODA Team - Inria Saclay)

Abstract: Machine-learning from a disparate set of tables, a data lake, requires assembling features by merging and aggregating tables. Data discovery can extend autoML to data tables by automating these steps. We present an in-depth analysis of such automated table augmentation for machine learning tasks, analyzing different methods for the three main steps: retrieving joinable tables, merging information, and predicting with the resultant table. We use two data lakes: Open Data US, a well-referenced real data lake, and a novel semi-synthetic dataset, YADL (Yet Another Data Lake), which we developed as a tool for benchmarking this data discovery task. Systematic exploration on both lakes outlines 1) the importance of accurately retrieving join candidates, 2) the efficiency of simple merging methods, and 3) the resilience of tree-based learners to noisy conditions. Our experimental environment is easily reproducible and based on open data, to foster more research on feature engineering, autoML, and learning in data lakes.

replace-cross Improving Model's Interpretability and Reliability using Biomarkers

Authors: Gautam Rajendrakumar Gare, Tom Fox, Beam Chansangavej, Amita Krishnan, Ricardo Luis Rodriguez, Bennett P deBoisblanc, Deva Kannan Ramanan, John Michael Galeotti

Abstract: Accurate and interpretable diagnostic models are crucial in the safety-critical field of medicine. We investigate the interpretability of our proposed biomarker-based lung ultrasound diagnostic pipeline to enhance clinicians' diagnostic capabilities. The objective of this study is to assess whether explanations from a decision tree classifier, utilizing biomarkers, can improve users' ability to identify inaccurate model predictions compared to conventional saliency maps. Our findings demonstrate that decision tree explanations, based on clinically established biomarkers, can assist clinicians in detecting false positives, thus improving the reliability of diagnostic models in medicine.

replace-cross Clipped SGD Algorithms for Performative Prediction: Tight Bounds for Clipping Bias and Remedies

Authors: Qiang Li, Michal Yemini, Hoi-To Wai

Abstract: This paper studies the convergence of clipped stochastic gradient descent (SGD) algorithms with decision-dependent data distributions. Our setting is motivated by privacy-preserving optimization algorithms that interact with performative data, where the prediction models can influence future outcomes. This challenging setting involves the non-smooth clipping operator and non-gradient dynamics due to distribution shifts. We make two contributions in pursuit of a performatively stable solution using clipped SGD algorithms. First, we characterize the clipping bias of the projected clipped SGD (PCSGD) algorithm, which is caused by the clipping operator preventing PCSGD from reaching a stable solution. When the loss function is strongly convex, we quantify the lower and upper bounds for this clipping bias and demonstrate a bias amplification phenomenon with the sensitivity of the data distribution. When the loss function is non-convex, we bound the magnitude of the stationarity bias. Second, we propose remedies to mitigate the bias, either by utilizing an optimal step size design for PCSGD or by applying the recent DiceSGD algorithm [Zhang et al., 2024]. Our analysis is also extended to show that the latter algorithm is free from clipping bias in the performative setting. Numerical experiments verify our findings.
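
A toy sketch of the PCSGD iteration under decision-dependent data follows. The construction is ours: the quadratic loss, the linear shift model, and all constants are illustrative, not the paper's setting.

```python
# Sketch: projected clipped SGD where the sampled data distribution shifts
# with the current decision x (a toy performative-prediction setting).
import numpy as np

def clip(g, c):
    n = np.linalg.norm(g)
    return g if n <= c else g * (c / n)

def project_ball(x, radius=5.0):
    n = np.linalg.norm(x)
    return x if n <= radius else x * (radius / n)

rng = np.random.default_rng(0)
x, c, step = np.zeros(2), 1.0, 0.05
for _ in range(2000):
    # Decision-dependent data: the sample distribution's mean shifts with x.
    z = np.array([2.0, -1.0]) + 0.5 * x + rng.normal(scale=0.5, size=2)
    grad = x - z                        # stochastic gradient of 0.5*||x - z||^2
    x = project_ball(x - step * clip(grad, c))
print(x)  # settles near the performative stable point x* = (4, -2),
          # offset by the kind of clipping bias the paper analyzes
```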

replace-cross \copyright Plug-in Authorization for Human Content Copyright Protection in Text-to-Image Model

Authors: Chao Zhou, Huishuai Zhang, Jiang Bian, Weiming Zhang, Nenghai Yu

Abstract: This paper addresses the contentious issue of copyright infringement in images generated by text-to-image models, sparking debates among AI developers, content creators, and legal entities. State-of-the-art models create high-quality content without crediting original creators, causing concern in the artistic community. To mitigate this, we propose the \copyright Plug-in Authorization framework, introducing three operations: addition, extraction, and combination. Addition involves training a \copyright plug-in for specific copyright, facilitating proper credit attribution. Extraction allows creators to reclaim copyright from infringing models, and combination enables users to merge different \copyright plug-ins. These operations act as permits, incentivizing fair use and providing flexibility in authorization. We present innovative approaches, "Reverse LoRA" for extraction and "EasyMerge" for seamless combination. Experiments in artist-style replication and cartoon IP recreation demonstrate \copyright plug-ins' effectiveness, offering a valuable solution for human copyright protection in the age of generative AIs. The code is available at https://github.com/zc1023/-Plug-in-Authorization.git.

URLs: https://github.com/zc1023/-Plug-in-Authorization.git.

replace-cross Zero-Shot Medical Phrase Grounding with Off-the-shelf Diffusion Models

Authors: Konstantinos Vilouras, Pedro Sanchez, Alison Q. O'Neil, Sotirios A. Tsaftaris

Abstract: Localizing the exact pathological regions in a given medical scan is an important imaging problem that traditionally requires a large amount of bounding box ground truth annotations to be accurately solved. However, there exist alternative, potentially weaker, forms of supervision, such as accompanying free-text reports, which are readily available. The task of performing localization with textual guidance is commonly referred to as phrase grounding. In this work, we use a publicly available Foundation Model, namely the Latent Diffusion Model, to perform this challenging task. This choice is supported by the fact that the Latent Diffusion Model, despite being generative in nature, contains cross-attention mechanisms that implicitly align visual and textual features, thus leading to intermediate representations that are suitable for the task at hand. In addition, we aim to perform this task in a zero-shot manner, i.e., without any training on the target task, meaning that the model's weights remain frozen. To this end, we devise strategies to select features and also refine them via post-processing without extra learnable parameters. We compare our proposed method with state-of-the-art approaches which explicitly enforce image-text alignment in a joint embedding space via contrastive learning. Results on a popular chest X-ray benchmark indicate that our method is competitive with SOTA on different types of pathology, and even outperforms them on average in terms of two metrics (mean IoU and AUC-ROC). Source code will be released upon acceptance at https://github.com/vios-s.

URLs: https://github.com/vios-s.

replace-cross Beyond Statistical Estimation: Differentially Private Individual Computation via Shuffling

Authors: Shaowei Wang, Changyu Dong, Xiangfu Song, Jin Li, Zhili Zhou, Di Wang, Han Wu

Abstract: In data-driven applications, preserving user privacy while enabling valuable computations remains a critical challenge. Technologies like differential privacy have been pivotal in addressing these concerns. The shuffle model of DP requires no trusted curators and can achieve high utility by leveraging the privacy amplification effect yielded from shuffling. These benefits have led to significant interest in the shuffle model. However, the computation tasks in the shuffle model are limited to statistical estimation, making it inapplicable to real-world scenarios in which each user requires a personalized output. This paper introduces a novel paradigm termed Private Individual Computation (PIC), expanding the shuffle model to support a broader range of permutation-equivariant computations. PIC enables personalized outputs while preserving privacy, and enjoys privacy amplification through shuffling. We propose a concrete protocol that realizes PIC. By using one-time public keys, our protocol enables users to receive their outputs without compromising anonymity, which is essential for privacy amplification. Additionally, we present an optimal randomizer, the Minkowski Response, designed for the PIC model to enhance utility. We formally prove the security and privacy properties of the PIC protocol. Theoretical analysis and empirical evaluations demonstrate PIC's capability in handling non-statistical computation tasks, and the efficacy of PIC and the Minkowski randomizer in achieving superior utility compared to existing solutions.

replace-cross LLaRA: Supercharging Robot Learning Data for Vision-Language Policy

Authors: Xiang Li, Cristina Mata, Jongwoo Park, Kumara Kahatapitiya, Yoo Sung Jang, Jinghuan Shang, Kanchana Ranasinghe, Ryan Burgert, Mu Cai, Yong Jae Lee, Michael S. Ryoo

Abstract: Vision Language Models (VLMs) have recently been leveraged to generate robotic actions, forming Vision-Language-Action (VLA) models. However, directly adapting a pretrained VLM for robotic control remains challenging, particularly when constrained by a limited number of robot demonstrations. In this work, we introduce LLaRA: Large Language and Robotics Assistant, a framework that formulates robot action policy as visuo-textual conversations and enables an efficient transfer of a pretrained VLM into a powerful VLA, motivated by the success of visual instruction tuning in Computer Vision. First, we present an automated pipeline to generate conversation-style instruction tuning data for robots from existing behavior cloning datasets, aligning robotic actions with image pixel coordinates. Further, we enhance this dataset in a self-supervised manner by defining six auxiliary tasks, without requiring any additional action annotations. We show that a VLM finetuned with a limited amount of such datasets can produce meaningful action decisions for robotic control. Through experiments across multiple simulated and real-world tasks, we demonstrate that LLaRA achieves state-of-the-art performance while preserving the generalization capabilities of large language models. The code, datasets, and pretrained models are available at https://github.com/LostXine/LLaRA.

URLs: https://github.com/LostXine/LLaRA.

replace-cross Beyond the Veil of Similarity: Quantifying Semantic Continuity in Explainable AI

Authors: Qi Huang, Emanuele Mezzi, Osman Mutlu, Miltiadis Kofinas, Vidya Prasad, Shadnan Azwad Khan, Elena Ranguelova, Niki van Stein

Abstract: We introduce a novel metric for measuring semantic continuity in Explainable AI methods and machine learning models. We posit that for models to be truly interpretable and trustworthy, similar inputs should yield similar explanations, reflecting a consistent semantic understanding. By leveraging XAI techniques, we assess semantic continuity in the task of image recognition. We conduct experiments to observe how incremental changes in input affect the explanations provided by different XAI methods. Through this approach, we aim to evaluate the models' capability to generalize and abstract semantic concepts accurately and to evaluate different XAI methods in correctly capturing the model behaviour. This paper contributes to the broader discourse on AI interpretability by proposing a quantitative measure for semantic continuity for XAI methods, offering insights into the models' and explainers' internal reasoning processes, and promoting more reliable and transparent AI systems.
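
One way to operationalize the proposed measurement is sketched below. The sketch is ours: the toy model, the saliency-style explainer, and cosine similarity as the comparison are all illustrative assumptions.

```python
# Sketch: semantic continuity as mean similarity between explanations of an
# input and its incrementally perturbed versions (illustrative toy setup).
import numpy as np

def semantic_continuity(explain, x, direction, steps=10, eps=0.05):
    e0 = explain(x)
    sims = []
    for i in range(1, steps + 1):
        e = explain(x + i * eps * direction)
        sims.append(e0 @ e / (np.linalg.norm(e0) * np.linalg.norm(e) + 1e-12))
    return float(np.mean(sims))            # near 1.0 = explanations stay consistent

w = np.array([1.0, -2.0, 0.5])
explain = lambda x: 2 * (w @ x) * w        # saliency: gradient of the score (w@x)^2
x0 = np.array([0.5, 0.1, -0.2])
print(semantic_continuity(explain, x0, np.array([1.0, 0.0, 0.0])))  # 1.0 here
```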

replace-cross Time Makes Space: Emergence of Place Fields in Networks Encoding Temporally Continuous Sensory Experiences

Authors: Zhaoze Wang, Ronald W. Di Tullio, Spencer Rooke, Vijay Balasubramanian

Abstract: The vertebrate hippocampus is believed to use recurrent connectivity in area CA3 to support episodic memory recall from partial cues. This brain area also contains place cells, whose location-selective firing fields implement maps supporting spatial memory. Here we show that place cells emerge in networks trained to remember temporally continuous sensory episodes. We model CA3 as a recurrent autoencoder that recalls and reconstructs sensory experiences from noisy and partially occluded observations by agents traversing simulated rooms. The agents move in realistic trajectories modeled from rodents and environments are modeled as high-dimensional sensory experience maps. Training our autoencoder to pattern-complete and reconstruct experiences with a constraint on total activity causes spatially localized firing fields, i.e., place cells, to emerge in the encoding layer. The emergent place fields reproduce key aspects of hippocampal phenomenology: a) remapping (maintenance of and reversion to distinct learned maps in different environments), implemented via repositioning of experience manifolds in the network's hidden layer, b) orthogonality of spatial representations in different arenas, c) robust place field emergence in differently shaped rooms, with single units showing multiple place fields in large or complex spaces, and d) slow representational drift of place fields. We argue that these results arise because continuous traversal of space makes sensory experience temporally continuous. We make testable predictions: a) rapidly changing sensory context will disrupt place fields, b) place fields will form even if recurrent connections are blocked, but reversion to previously learned representations upon remapping will be abolished, c) the dimension of temporally smooth experience sets the dimensionality of place fields, including during virtual navigation of abstract spaces.

replace-cross The NP-hardness of the Gromov-Wasserstein distance

Authors: Natalia Kravtsova

Abstract: This note addresses the property frequently mentioned in the literature that the Gromov-Wasserstein (GW) distance is NP-hard. We provide the details on the non-convex nature of the GW optimization problem that imply NP-hardness of the GW distance between finite spaces for any instance of input data. We further illustrate the non-convexity of the problem with several explicit examples.
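
For reference, the optimization problem whose non-convexity is at issue is the standard GW formulation between finite metric-measure spaces $(X, d_X, \mu)$ and $(Y, d_Y, \nu)$:

$$\mathrm{GW}^2(X, Y) = \min_{\pi \in \Pi(\mu, \nu)} \sum_{i,j,k,l} \big( d_X(x_i, x_k) - d_Y(y_j, y_l) \big)^2 \, \pi_{ij} \, \pi_{kl},$$

a quadratic program over the transportation polytope $\Pi(\mu, \nu)$ whose quadratic form is in general indefinite; this indefiniteness underlies the non-convexity, and hence the NP-hardness, discussed in the note.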

replace-cross Generating Synthetic Free-text Medical Records with Low Re-identification Risk using Masked Language Modeling

Authors: Samuel Belkadi, Libo Ren, Nicolo Micheletti, Lifeng Han, Goran Nenadic

Abstract: The vast amount of available medical records has the potential to improve healthcare and biomedical research. However, privacy restrictions make these data accessible for internal use only. Recent works have addressed this problem by generating synthetic data using Causal Language Modeling. Unfortunately, by taking this approach, it is often impossible to guarantee patient privacy while offering the ability to control the diversity of generations without increasing the cost of generating such data. In contrast, we present a system for generating synthetic free-text medical records using Masked Language Modeling. The system preserves critical medical information while introducing diversity in the generations and minimising re-identification risk. The system's size is about 120M parameters, minimising inference cost. The results demonstrate high-quality synthetic data with a HIPAA-compliant PHI recall rate of 96% and a re-identification risk of 3.5%. Moreover, downstream evaluations show that the generated data can effectively train a model with performance comparable to real data.

replace-cross MusicLIME: Explainable Multimodal Music Understanding

Authors: Theodoros Sotirou, Vassilis Lyberatos, Orfeas Menis Mastromichalakis, Giorgos Stamou

Abstract: Multimodal models are critical for music understanding tasks, as they capture the complex interplay between audio and lyrics. However, as these models become more prevalent, the need for explainability grows: understanding how these systems make decisions is vital for ensuring fairness, reducing bias, and fostering trust. In this paper, we introduce MusicLIME, a model-agnostic feature importance explanation method designed for multimodal music models. Unlike traditional unimodal methods, which analyze each modality separately without considering the interaction between them, often leading to incomplete or misleading explanations, MusicLIME reveals how audio and lyrical features interact and contribute to predictions, providing a holistic view of the model's decision-making. Additionally, we enhance local explanations by aggregating them into global explanations, giving users a broader perspective of model behavior. Through this work, we contribute to improving the interpretability of multimodal music models, empowering users to make informed choices, and fostering more equitable, fair, and transparent music understanding systems.

replace-cross Transfer Learning in $\ell_1$ Regularized Regression: Hyperparameter Selection Strategy based on Sharp Asymptotic Analysis

Authors: Koki Okajima, Tomoyuki Obuchi

Abstract: Transfer learning techniques aim to leverage information from multiple related datasets to enhance prediction quality against a target dataset. Such methods have been adopted in the context of high-dimensional sparse regression, and some Lasso-based algorithms have been invented: Trans-Lasso and Pretraining Lasso are such examples. These algorithms require the statistician to select hyperparameters that control the extent and type of information transfer from related datasets. However, selection strategies for these hyperparameters, as well as the impact of these choices on the algorithm's performance, have been largely unexplored. To address this, we conduct a thorough, precise study of the algorithm in a high-dimensional setting via an asymptotic analysis using the replica method. Our approach reveals a surprisingly simple behavior of the algorithm: Ignoring one of the two types of information transferred to the fine-tuning stage has little effect on generalization performance, implying that efforts for hyperparameter selection can be significantly reduced. Our theoretical findings are also empirically supported by applications on real-world and semi-artificial datasets using the IMDb and MNIST datasets, respectively.

replace-cross Safety challenges of AI in medicine in the era of large language models

Authors: Xiaoye Wang, Nicole Xi Zhang, Hongyu He, Trang Nguyen, Kun-Hsing Yu, Hao Deng, Cynthia Brandt, Danielle S. Bitterman, Ling Pan, Ching-Yu Cheng, James Zou, Dianbo Liu

Abstract: Recent advancements in artificial intelligence (AI), particularly in large language models (LLMs), have unlocked significant potential to enhance the quality and efficiency of medical care. By introducing a novel way to interact with AI and data through natural language, LLMs offer new opportunities for medical practitioners, patients, and researchers. However, as AI and LLMs become more powerful and especially achieve superhuman performance in some medical tasks, public concerns over their safety have intensified. These concerns about AI safety have emerged as the most significant obstacles to the adoption of AI in medicine. In response, this review examines emerging risks in AI utilization during the LLM era. First, we explore LLM-specific safety challenges from functional and communication perspectives, addressing issues across data collection, model training, and real-world application. We then consider inherent safety problems shared by all AI systems, along with additional complications introduced by LLMs. Last, we discuss how safety issues arising from the use of AI in clinical practice and healthcare system operation can undermine trust among patients, clinicians, and the public, and how confidence in these systems can be built. By emphasizing the development of safe AI, we believe these technologies can be more rapidly and reliably integrated into everyday medical practice to benefit both patients and clinicians.

replace-cross ROK Defense M&S in the Age of Hyperscale AI: Concepts, Challenges, and Future Directions

Authors: Youngjoon Lee, Taehyun Park, Yeongjoon Kang, Jonghoe Kim, Joonhyuk Kang

Abstract: Integrating hyperscale AI into national defense M&S (Modeling and Simulation), under the expanding IoMDT (Internet of Military Defense Things) framework, is crucial for boosting strategic and operational readiness. We examine how IoMDT-driven hyperscale AI can provide high accuracy, speed, and the ability to simulate complex, interconnected battlefield scenarios in defense M&S. Countries like the United States and China are leading the adoption of these technologies, with varying levels of success. However, realizing the full potential of hyperscale AI requires overcoming challenges such as closed networks, sparse or long-tail data, complex decision-making processes, and a shortage of experts. Future directions highlight the need to adopt domestic foundation models, expand GPU/NPU investments, leverage large tech services, and employ open source solutions. These efforts will enhance national security, maintain a competitive edge, and spur broader technological and economic growth. With this blueprint, the Republic of Korea can strengthen its defense posture and stay ahead of emerging threats in modern warfare.

replace-cross Large Language Models Reflect the Ideology of their Creators

Authors: Maarten Buyl, Alexander Rogiers, Sander Noels, Guillaume Bied, Iris Dominguez-Catena, Edith Heiter, Iman Johary, Alexandru-Cristian Mara, Rapha\"el Romero, Jefrey Lijffijt, Tijl De Bie

Abstract: Large language models (LLMs) are trained on vast amounts of data to generate natural language, enabling them to perform tasks like text summarization and question answering. These models have become popular in artificial intelligence (AI) assistants like ChatGPT and already play an influential role in how humans access information. However, the behavior of LLMs varies depending on their design, training, and use. In this paper, we prompt a diverse panel of popular LLMs to describe a large number of prominent personalities with political relevance, in all six official languages of the United Nations. By identifying and analyzing moral assessments reflected in their responses, we find normative differences between LLMs from different geopolitical regions, as well as between the responses of the same LLM when prompted in different languages. Among models from the United States alone, we find that popularly hypothesized disparities in political views are reflected in significant normative differences related to progressive values. Among Chinese models, we characterize a division between internationally- and domestically-focused models. Our results show that the ideological stance of an LLM appears to reflect the worldview of its creators. This poses the risk of political instrumentalization and raises concerns around technological and regulatory efforts with the stated aim of making LLMs ideologically 'unbiased'.

replace-cross How to Backdoor Consistency Models?

Authors: Chengen Wang, Murat Kantarcioglu

Abstract: Consistency models are a new class of models that generate images by directly mapping noise to data, allowing for one-step generation and significantly accelerating the sampling process. However, their robustness against adversarial attacks has not yet been thoroughly investigated. In this work, we conduct the first study on the vulnerability of consistency models to backdoor attacks. While previous research has explored backdoor attacks on diffusion models, those studies have primarily focused on conventional diffusion models, employing a customized backdoor training process and objective, whereas consistency models have distinct training processes and objectives. Our proposed framework demonstrates the vulnerability of consistency models to backdoor attacks. During image generation, poisoned consistency models produce images with a Fr\'echet Inception Distance (FID) comparable to that of a clean model when sampling from Gaussian noise. However, once the trigger is activated, they generate backdoor target images. We explore various trigger and target configurations to evaluate the vulnerability of consistency models, including the use of random noise as a trigger. This novel trigger is visually inconspicuous, more challenging to detect, and aligns well with the sampling process of consistency models. Across all configurations, our framework successfully compromises the consistency models while maintaining high utility and specificity. We also examine the stealthiness of our proposed attack, which is attributed to the unique properties of consistency models and the elusive nature of the Gaussian noise trigger.

replace-cross Prospective Learning: Learning for a Dynamic Future

Authors: Ashwin De Silva, Rahul Ramesh, Rubing Yang, Siyu Yu, Joshua T Vogelstein, Pratik Chaudhari

Abstract: In real-world applications, the distribution of the data, and our goals, evolve over time. The prevailing theoretical framework for studying machine learning, namely probably approximately correct (PAC) learning, largely ignores time. As a consequence, existing strategies to address the dynamic nature of data and goals exhibit poor real-world performance. This paper develops a theoretical framework called "Prospective Learning" that is tailored for situations when the optimal hypothesis changes over time. In PAC learning, empirical risk minimization (ERM) is known to be consistent. We develop a learner called Prospective ERM, which returns a sequence of predictors that make predictions on future data. We prove that the risk of prospective ERM converges to the Bayes risk under certain assumptions on the stochastic process generating the data. Prospective ERM, roughly speaking, incorporates time as an input in addition to the data. We show that standard ERM as done in PAC learning, without incorporating time, can result in failure to learn when distributions are dynamic. Numerical experiments illustrate that prospective ERM can learn synthetic and visual recognition problems constructed from MNIST and CIFAR-10. Code at https://github.com/neurodata/prolearn.

URLs: https://github.com/neurodata/prolearn.
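
A toy sketch of the paper's central move, predictors that take time as an input, follows. The illustration is ours: the drifting label rule and the time features handed to the learner are contrived so the effect is visible.

```python
# Sketch: ERM that ignores time vs. "prospective" ERM whose features include
# time, on a toy stream whose label rule drifts (all choices illustrative).
import numpy as np

rng = np.random.default_rng(0)
T = 500
t = np.arange(T) / T
x = rng.normal(size=T)
y = (x * np.sin(4 * np.pi * t) > 0).astype(float)   # label rule drifts with time

def fit_logreg(F, y, iters=500, lr=0.5):
    w = np.zeros(F.shape[1])
    for _ in range(iters):
        p = 1.0 / (1.0 + np.exp(-F @ w))
        w -= lr * F.T @ (p - y) / len(y)             # gradient of logistic loss
    return w

static = np.stack([x, np.ones(T)], axis=1)                    # time is ignored
prospective = np.stack([x, x * np.sin(4 * np.pi * t), np.ones(T)], axis=1)
for name, F in [("static ERM     ", static), ("prospective ERM", prospective)]:
    w = fit_logreg(F, y)
    print(name, "train acc =", np.mean(((F @ w) > 0) == (y > 0.5)))
```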

replace-cross Energy-based physics-informed neural network for frictionless contact problems under large deformation

Authors: Jinshuai Bai, Zhongya Lin, Yizheng Wang, Jiancong Wen, Yinghua Liu, Timon Rabczuk, YuanTong Gu, Xi-Qiao Feng

Abstract: Numerical methods for contact mechanics are of great importance in engineering applications, enabling the prediction and analysis of complex surface interactions under various conditions. In this work, we propose an energy-based physics-informed neural network (PINN) framework for solving frictionless contact problems under large deformation. Inspired by the microscopic Lennard-Jones potential, a surface contact energy is used to describe the contact phenomena. To ensure the robustness of the proposed PINN framework, relaxation, gradual loading, and output scaling techniques are introduced. In the numerical examples, the well-known Hertz contact benchmark problem is conducted, demonstrating the effectiveness and robustness of the proposed PINN framework. Moreover, challenging contact problems with the consideration of geometrical and material nonlinearities are tested. It has been shown that the proposed PINN framework provides a reliable and powerful tool for nonlinear contact mechanics. More importantly, the proposed PINN framework exhibits competitive computational efficiency to commercial FEM software when dealing with those complex contact problems. The codes used in this manuscript are available at https://github.com/JinshuaiBai/energy_PINN_Contact (the code will be made available after acceptance).

URLs: https://github.com/JinshuaiBai/energy_PINN_Contact

replace-cross Bounded Rationality Equilibrium Learning in Mean Field Games

Authors: Yannick Eich, Christian Fabian, Kai Cui, Heinz Koeppl

Abstract: Mean field games (MFGs) tractably model behavior in large agent populations. The literature on learning MFG equilibria typically focuses on finding Nash equilibria (NE), which assume perfectly rational agents and are hence implausible in many realistic situations. To overcome these limitations, we incorporate bounded rationality into MFGs by leveraging the well-known concept of quantal response equilibria (QRE). Two novel types of MFG QRE enable the modeling of large agent populations where individuals only noisily estimate the true objective. We also introduce a second source of bounded rationality to MFGs by restricting agents' planning horizon. The resulting novel receding horizon (RH) MFGs are combined with QRE and existing approaches to model different aspects of bounded rationality in MFGs. We formally define MFG QRE and RH MFGs and compare them to existing equilibrium concepts such as entropy-regularized NE. Subsequently, we design generalized fixed point iteration and fictitious play algorithms to learn QRE and RH equilibria. After a theoretical analysis, we give different examples to evaluate the capabilities of our learning algorithms and outline practical differences between the equilibrium concepts.
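
The quantal response at the heart of MFG QRE is a logit (softmax) choice rule, sketched below; the inverse-temperature $\beta$ plays the role of the rationality parameter, and the specific values are illustrative.

```python
# Sketch: quantal (logit) response. beta -> infinity recovers the Nash best
# response (argmax); beta -> 0 approaches uniformly random play.
import numpy as np

def quantal_response(q_values, beta=2.0):
    z = beta * (q_values - q_values.max())  # shift for numerical stability
    p = np.exp(z)
    return p / p.sum()

q = np.array([1.0, 0.8, 0.1])
for beta in [0.1, 2.0, 50.0]:
    print(beta, quantal_response(q, beta).round(3))
```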

replace-cross More Expressive Attention with Negative Weights

Authors: Ang Lv, Ruobing Xie, Shuaipeng Li, Jiayi Liao, Xingwu Sun, Zhanhui Kang, Di Wang, Rui Yan

Abstract: We propose a novel attention mechanism, named Cog Attention, that enables attention weights to be negative for enhanced expressiveness. Its advantages stem from two key factors: (1) Cog Attention enhances parameter flexibility. For example, unlike traditional softmax attention heads, which use a static output-value (OV) matrix to delete or copy inputs the heads attend to, Cog Attention naturally learns to use the sign of dynamic query-key (QK) inner products to represent these operations. This enables Cog Attention to perform multiple operations simultaneously within a single head, while its OV matrix can focus more on refinement or modification. (2) Cog Attention enhances the model's robustness against representational collapse by preventing the ``over-squashing'' of earlier tokens into later positions. We develop Transformer-like models that use Cog Attention as their attention modules, including decoder-only models at various scales for language modeling and U-ViT diffusion models for image generation. Experiments show that models using Cog Attention outperform those employing traditional softmax attention modules. Our approach suggests a promising research direction for rethinking and breaking the entrenched constraints of traditional softmax attention, such as the requirement for non-negative weights.
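
One plausible way to realize signed attention weights is to keep a softmax normalization over score magnitudes while preserving the sign of the query-key products; this is a hypothetical sketch, not necessarily the paper's exact formulation.

    import torch
    import torch.nn.functional as F

    def signed_attention(q, k, v):
        """Hypothetical signed attention: softmax over |scores| keeps the
        usual normalization, while sign(scores) lets a head copy (positive)
        or delete (negative) the tokens it attends to. Cog Attention's
        actual normalization may differ."""
        d = q.shape[-1]
        scores = q @ k.transpose(-2, -1) / d ** 0.5        # (..., Tq, Tk)
        weights = torch.sign(scores) * F.softmax(scores.abs(), dim=-1)
        return weights @ v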

replace-cross Optimal Decentralized Smoothed Online Convex Optimization

Authors: Neelkamal Bhuyan, Debankur Mukherjee, Adam Wierman

Abstract: We study the multi-agent Smoothed Online Convex Optimization (SOCO) problem, where $N$ agents interact through a communication graph. In each round, each agent $i$ receives a strongly convex hitting cost function $f^i_t$ in an online fashion and selects an action $x^i_t \in \mathbb{R}^d$. The objective is to minimize the global cumulative cost, which includes the sum of individual hitting costs $f^i_t(x^i_t)$, a temporal "switching cost" for changing decisions, and a spatial "dissimilarity cost" that penalizes deviations in decisions among neighboring agents. We propose ACORD, the first truly decentralized algorithm for multi-agent SOCO that provably exhibits asymptotic optimality. Our approach allows each agent to operate using only local information from its immediate neighbors in the graph. For finite-time performance, we establish that the optimality gap in the competitive ratio decreases with the time horizon $T$ and can be conveniently tuned based on the per-round computation available to each agent. Our algorithm enjoys a provably scalable computational complexity that depends only logarithmically on the number of agents and almost linearly on their degree within the graph. Moreover, our results hold even when the communication graph changes arbitrarily and adaptively over time. Finally, ACORD, by virtue of its asymptotic optimality, is shown to be provably superior to the state-of-the-art LPC algorithm, while exhibiting much lower computational complexity. Extensive numerical experiments across various network topologies further corroborate our theoretical claims.
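
Written out, one standard instantiation of the global objective (with quadratic switching and dissimilarity costs; the abstract states only the general form) is

$$
\min_{\{x^i_t\}} \; \sum_{t=1}^{T}\sum_{i=1}^{N} \Big[ f^i_t(x^i_t) + \frac{\alpha}{2}\,\lVert x^i_t - x^i_{t-1}\rVert^2 \Big] + \sum_{t=1}^{T}\sum_{(i,j)\in E} \frac{\beta}{2}\,\lVert x^i_t - x^j_t\rVert^2,
$$

where $E$ is the edge set of the communication graph and $\alpha, \beta > 0$ trade off temporal and spatial smoothness.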

replace-cross Universal Rates of Empirical Risk Minimization

Authors: Steve Hanneke, Mingyue Xu

Abstract: The well-known empirical risk minimization (ERM) principle is the basis of many widely used machine learning algorithms, and plays an essential role in the classical PAC theory. A common description of a learning algorithm's performance is its so-called "learning curve", that is, the decay of the expected error as a function of the input sample size. As the PAC model fails to explain the behavior of learning curves, recent research has explored an alternative universal learning model and has ultimately revealed a distinction between optimal universal and uniform learning rates (Bousquet et al., 2021). However, a basic understanding of such differences with a particular focus on the ERM principle has yet to be developed. In this paper, we consider the problem of universal learning by ERM in the realizable case and study the possible universal rates. Our main result is a fundamental tetrachotomy: there are only four possible universal learning rates by ERM; namely, the learning curves of any concept class learnable by ERM decay at either $e^{-n}$, $1/n$, or $\log(n)/n$ rates, or arbitrarily slowly. Moreover, we provide a complete characterization of which concept classes fall into each of these categories, via new complexity structures. We also develop new combinatorial dimensions which supply sharp asymptotically valid constant factors for these rates, whenever possible.

replace-cross A Hype-Adjusted Probability Measure for NLP Stock Return Forecasting

Authors: Zheng Cao, Helyette Geman

Abstract: This article introduces a Hype-Adjusted Probability Measure in the context of a new Natural Language Processing (NLP) approach for stock return and volatility forecasting. A novel sentiment score equation is proposed to represent the impact of intraday news on forecasting next-period stock return and volatility for selected U.S. semiconductor tickers, a very vibrant industry sector. This work improves forecast accuracy by addressing news bias, memory, and weight, and by incorporating shifts in sentiment direction. More importantly, it extends the change-of-probability-measure technique, a remarkable tool from asset pricing, to NLP forecasting by constructing a Hype-Adjusted Probability Measure, obtained from a redistribution of the weights in the probability space and meant to correct for excessive or insufficient news coverage.
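
In the language of a change of probability measure, the hype adjustment can be viewed as reweighting news events; under assumed notation (not the paper's exact construction), a hype-adjusted measure $\mathbb{Q}$ is obtained from the physical measure $\mathbb{P}$ via a Radon-Nikodym density

$$
\frac{\mathrm{d}\mathbb{Q}}{\mathrm{d}\mathbb{P}}(\omega) = w(\omega), \qquad \mathbb{E}_{\mathbb{P}}[w] = 1,
$$

with $w(\omega) > 1$ upweighting under-covered events and $w(\omega) < 1$ damping over-hyped ones.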

replace-cross Swin fMRI Transformer Predicts Early Neurodevelopmental Outcomes from Neonatal fMRI

Authors: Patrick Styll, Dowon Kim, Jiook Cha

Abstract: Brain development in the first few months of human life is a critical phase characterized by rapid structural growth and functional organization. Accurately predicting developmental outcomes during this time is crucial for identifying delays and enabling timely interventions. This study introduces the SwiFT (Swin 4D fMRI Transformer) model, designed to predict Bayley-III composite scores using neonatal fMRI from the Developing Human Connectome Project (dHCP). To enhance predictive accuracy, we apply dimensionality reduction via group independent component analysis (ICA) and pretrain SwiFT on large adult fMRI datasets to address the challenges of limited neonatal data. Our analysis shows that SwiFT significantly outperforms baseline models in predicting cognitive, motor, and language outcomes, leveraging both single-label and multi-label prediction strategies. The model's attention-based architecture processes spatiotemporal data end-to-end, delivering superior predictive performance. Additionally, we use Integrated Gradients with Smoothgrad sQuare (IG-SQ) to interpret predictions, identifying neural spatial representations linked to early cognitive and behavioral development. These findings underscore the potential of Transformer models to advance neurodevelopmental research and clinical practice.

replace-cross FLRONet: Deep Operator Learning for High-Fidelity Fluid Flow Field Reconstruction from Sparse Sensor Measurements

Authors: Hiep Vo Dang, Joseph B. Choi, Phong C. H. Nguyen

Abstract: Reconstructing high-fidelity fluid flow fields from sparse sensor measurements is vital for many science and engineering applications but remains challenging because of the dimensional disparity between the state and observational spaces, which renders the measurement operator ill-conditioned and non-invertible and makes reconstructing flow fields from sensor measurements extremely difficult. Although sparse optimization and machine learning address these problems to some extent, questions about their generalization and efficiency remain, particularly regarding the discretization dependence of these models. In this context, deep operator learning offers a better solution, as this approach models mappings between infinite-dimensional function spaces, enabling superior generalization and discretization-independent reconstruction. We introduce FLRONet, a deep operator learning framework that is trained to reconstruct fluid flow fields from sparse sensor measurements. FLRONet employs a branch-trunk network architecture to represent the inverse measurement operator that maps sensor observations to the original flow field, a continuous function of both space and time. Validation on the CFDBench dataset demonstrates that FLRONet consistently achieves high reconstruction accuracy and robustness, even in scenarios where sensor measurements are inaccurate or missing. Furthermore, the operator-learning approach endows FLRONet with the capability to perform zero-shot super-resolution in both the spatial and temporal domains, offering a solution for rapid reconstruction of high-fidelity flow fields.
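
A minimal branch-trunk (DeepONet-style) sketch of the idea, assuming a fixed number of point sensors and dot-product fusion; FLRONet's actual architecture and training setup are more elaborate, and all names here are illustrative.

    import torch
    import torch.nn as nn

    class BranchTrunkNet(nn.Module):
        """Toy operator model: the branch encodes sensor readings, the trunk
        encodes query coordinates (x, y, t), and their dot product gives the
        reconstructed field value at that coordinate."""
        def __init__(self, n_sensors, coord_dim=3, p=64):
            super().__init__()
            self.branch = nn.Sequential(
                nn.Linear(n_sensors, 128), nn.GELU(), nn.Linear(128, p))
            self.trunk = nn.Sequential(
                nn.Linear(coord_dim, 128), nn.GELU(), nn.Linear(128, p))

        def forward(self, sensors, coords):
            # sensors: (batch, n_sensors); coords: (batch, n_queries, coord_dim)
            b = self.branch(sensors)             # (batch, p)
            t = self.trunk(coords)               # (batch, n_queries, p)
            return (t * b.unsqueeze(1)).sum(-1)  # (batch, n_queries)

Because the trunk accepts arbitrary space-time queries, a trained model can be evaluated on grids finer than those seen in training, which is the mechanism behind the zero-shot super-resolution claim.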

replace-cross LMFusion: Adapting Pretrained Language Models for Multimodal Generation

Authors: Weijia Shi, Xiaochuang Han, Chunting Zhou, Weixin Liang, Xi Victoria Lin, Luke Zettlemoyer, Lili Yu

Abstract: We present LMFusion, a framework for empowering pretrained text-only large language models (LLMs) with multimodal generative capabilities, enabling them to understand and generate both text and images in arbitrary sequences. LMFusion leverages existing Llama-3's weights for processing texts autoregressively while introducing additional and parallel transformer modules for processing images with diffusion. During training, the data from each modality is routed to its dedicated modules: modality-specific feedforward layers, query-key-value projections, and normalization layers process each modality independently, while the shared self-attention layers allow interactions across text and image features. By freezing the text-specific modules and only training the image-specific modules, LMFusion preserves the language capabilities of text-only LLMs while developing strong visual understanding and generation abilities. Compared to methods that pretrain multimodal generative models from scratch, our experiments demonstrate that LMFusion improves image understanding by 20% and image generation by 3.6% using only 50% of the FLOPs, while maintaining Llama-3's language capabilities. We also demonstrate that this framework can equip existing vision-language models with multimodal generation ability. Overall, this framework not only leverages existing computational investments in text-only LLMs but also enables the parallel development of language and vision capabilities, presenting a promising direction for efficient multimodal model development.
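
A schematic of the routing described above, assuming a single layer and a boolean modality mask; the actual implementation (parallel diffusion modules for images, frozen text weights, etc.) is substantially more involved.

    import torch
    import torch.nn as nn

    class ModalityRoutedBlock(nn.Module):
        """Toy layer: shared self-attention with modality-specific projections
        and FFNs (layer norms omitted for brevity). A sketch of the routing
        idea, not LMFusion's actual code."""
        def __init__(self, d=256, heads=8):
            super().__init__()
            self.attn = nn.MultiheadAttention(d, heads, batch_first=True)
            self.proj = nn.ModuleDict({m: nn.Linear(d, d) for m in ("text", "image")})
            self.ffn = nn.ModuleDict({
                m: nn.Sequential(nn.Linear(d, 4 * d), nn.GELU(), nn.Linear(4 * d, d))
                for m in ("text", "image")})

        def route(self, h, is_image, mods):
            # Send each token through the module of its own modality.
            out = torch.zeros_like(h)
            for name, mask in (("text", ~is_image), ("image", is_image)):
                out[mask] = mods[name](h[mask])
            return out

        def forward(self, h, is_image):
            # h: (batch, seq, d); is_image: (batch, seq) boolean mask
            x = self.route(h, is_image, self.proj)        # modality-specific QKV
            a, _ = self.attn(x, x, x)                     # shared attention
            h = h + a
            return h + self.route(h, is_image, self.ffn)  # modality-specific FFN

Freezing the "text" entries of each ModuleDict while training the "image" entries mirrors the preservation argument in the abstract.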

replace-cross SpectralKD: A Unified Framework for Interpreting and Distilling Vision Transformers via Spectral Analysis

Authors: Huiyuan Tian, Bonan Xu, Shijian Li, Gang Pan

Abstract: Knowledge Distillation (KD) has achieved widespread success in compressing large Vision Transformers (ViTs), but a unified theoretical framework for both ViTs and KD is still lacking. In this paper, we propose SpectralKD, a novel unified analytical framework that offers deeper insights into ViTs and optimizes KD via spectral analysis. Our model-wise analysis reveals that CaiT concentrates information in its first and last few layers, informing optimal layer selection for KD. Surprisingly, our layer-wise analysis discovers that Swin Transformer and CaiT exhibit similar spectral encoding patterns despite their architectural differences, leading to a feature-map alignment guideline. Building on these insights, we propose a simple yet effective spectral alignment method for KD. Thanks to the deeper understanding provided by these analyses, even such a simple strategy achieves state-of-the-art performance on ImageNet-1K without introducing any trainable parameters, improving DeiT-Tiny by $+5.2\%$ and Swin-Tiny by $+1.4\%$ in top-1 accuracy. Furthermore, our post-training analysis reveals that distilled students can reproduce spectral patterns similar to their teachers', opening a new area we term ``distillation dynamics''. Code and experimental logs are available at https://github.com/thy960112/SpectralKD.

URLs: https://github.com/thy960112/SpectralKD.
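
A sketch of the kind of spectral signature such an analysis rests on -- the magnitude spectrum of intermediate feature maps -- together with a simple alignment loss; the transform axis, averaging, and function names are assumptions, not the paper's specification.

    import torch
    import torch.nn.functional as F

    def spectral_signature(feat):
        """Magnitude spectrum of a feature map (batch, tokens, channels),
        averaged over batch and tokens. Hypothetical definition."""
        return torch.fft.rfft(feat, dim=-1).abs().mean(dim=(0, 1))

    def spectral_alignment_loss(student_feat, teacher_feat):
        # Match student and teacher spectra at the selected layers
        # (assumes matching channel widths; otherwise project first).
        return F.mse_loss(spectral_signature(student_feat),
                          spectral_signature(teacher_feat))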

replace-cross Revisiting LocalSGD and SCAFFOLD: Improved Rates and Missing Analysis

Authors: Ruichen Luo, Sebastian U Stich, Samuel Horv\'ath, Martin Tak\'a\v{c}

Abstract: LocalSGD and SCAFFOLD are widely used methods in distributed stochastic optimization, with numerous applications in machine learning, large-scale data processing, and federated learning. However, rigorously establishing their theoretical advantages over simpler methods, such as minibatch SGD (MbSGD), has proven challenging, as existing analyses often rely on strong assumptions, unrealistic premises, or overly restrictive scenarios. In this work, we revisit the convergence properties of LocalSGD and SCAFFOLD under a variety of existing or weaker conditions, including gradient similarity, Hessian similarity, weak convexity, and Lipschitz continuity of the Hessian. Our analysis shows that (i) LocalSGD achieves faster convergence compared to MbSGD for weakly convex functions without requiring stronger gradient similarity assumptions; (ii) LocalSGD benefits significantly from higher-order similarity and smoothness; and (iii) SCAFFOLD demonstrates faster convergence than MbSGD for a broader class of non-quadratic functions. These theoretical insights provide a clearer understanding of the conditions under which LocalSGD and SCAFFOLD outperform MbSGD.
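
For reference, the LocalSGD template under discussion in a minimal single-process form (each worker runs K local steps, then all workers average parameters); data handling is elided and each loader is assumed to yield at least K batches per round.

    import copy
    import torch

    def local_sgd(model, worker_loaders, loss_fn, rounds, local_steps, lr=0.1):
        """Minimal LocalSGD simulation: local SGD steps, then averaging."""
        for _ in range(rounds):
            workers = [copy.deepcopy(model) for _ in worker_loaders]
            for m, loader in zip(workers, worker_loaders):
                opt = torch.optim.SGD(m.parameters(), lr=lr)
                batches = iter(loader)
                for _ in range(local_steps):
                    x, y = next(batches)
                    opt.zero_grad()
                    loss_fn(m(x), y).backward()
                    opt.step()
            with torch.no_grad():  # synchronize: average worker parameters
                for p, *ps in zip(model.parameters(),
                                  *(m.parameters() for m in workers)):
                    p.copy_(torch.stack(ps).mean(0))
        return model

Setting local_steps = 1 recovers (a serialized form of) the minibatch SGD baseline that the convergence comparisons are made against; SCAFFOLD additionally maintains control variates to correct client drift.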

replace-cross Neural Network Verification is a Programming Language Challenge

Authors: Lucas C. Cordeiro, Matthew L. Daggitt, Julien Girard-Satabin, Omri Isac, Taylor T. Johnson, Guy Katz, Ekaterina Komendantskaya, Augustin Lemesle, Edoardo Manino, Artjoms \v{S}inkarovs, Haoze Wu

Abstract: Neural network verification is a new and rapidly developing field of research. So far, the main priority has been establishing efficient verification algorithms and tools, while proper support from the programming language perspective has been considered secondary or unimportant. Yet, there is mounting evidence that insights from the programming language community may make a difference in the future development of this domain. In this paper, we formulate neural network verification challenges as programming language challenges and suggest possible future solutions.

replace-cross SLIM: Sim-to-Real Legged Instructive Manipulation via Long-Horizon Visuomotor Learning

Authors: Haichao Zhang, Haonan Yu, Le Zhao, Andrew Choi, Qinxun Bai, Break Yang, Wei Xu

Abstract: We present a low-cost legged mobile manipulation system that solves long-horizon real-world tasks, trained by reinforcement learning purely in simulation. This system is made possible by 1) a hierarchical design of a high-level policy for visual-mobile manipulation following task instructions and a low-level quadruped locomotion policy, 2) a teacher-student training pipeline for the high level, which trains a teacher to tackle long-horizon tasks using privileged task decomposition and target object information, and then trains a student for visual-mobile manipulation via RL guided by the teacher's behavior, and 3) a suite of techniques for minimizing the sim-to-real gap. In contrast to many previous works that use high-end equipment, our system demonstrates effective performance with more accessible hardware -- specifically, a Unitree Go1 quadruped, a WidowX-250S arm, and a single wrist-mounted RGB camera -- despite the increased challenges of sim-to-real transfer. Trained fully in simulation, a single policy autonomously solves long-horizon tasks involving search, move-to, grasp, transport, and drop-into skills, achieving nearly 80% real-world success. This performance is comparable to that of expert human teleoperation on the same tasks, while the robot operates more efficiently, at about 1.5x the speed of teleoperation. Finally, we perform extensive ablations on key techniques for efficient RL training and effective sim-to-real transfer, and demonstrate effective deployment across diverse indoor and outdoor scenes under various lighting conditions.

replace-cross Bayesian Despeckling of Structured Sources

Authors: Ali Zafari, Shirin Jalali

Abstract: Speckle noise is a fundamental challenge in coherent imaging systems, significantly degrading image quality. Over the past decades, numerous despeckling algorithms have been developed for applications such as Synthetic Aperture Radar (SAR) and digital holography. In this paper, we aim to establish a theoretically grounded approach to despeckling. We propose a method applicable to general structured stationary stochastic sources. We demonstrate the effectiveness of the proposed method on piecewise constant sources. Additionally, we theoretically derive a lower bound on the despeckling performance for such sources. The proposed despeckler, applied to 1-Markov structured sources, achieves better reconstruction performance without strongly simplifying the ground-truth signal model or the speckle noise.
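
Under assumed notation, the standard multiplicative model for fully developed speckle that frames this reconstruction problem is

$$
y_i = x_i\, w_i, \qquad i = 1, \dots, n,
$$

where $x$ is the underlying structured source, $w$ is speckle noise independent of $x$, and the despeckler estimates $x$ from $y$ by exploiting a prior on the source structure (e.g., piecewise constancy).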

replace-cross Temporal Preference Optimization for Long-Form Video Understanding

Authors: Rui Li, Xiaohan Wang, Yuhui Zhang, Zeyu Wang, Serena Yeung-Levy

Abstract: Despite significant advancements in video large multimodal models (video-LMMs), achieving effective temporal grounding in long-form videos remains a challenge for existing models. To address this limitation, we propose Temporal Preference Optimization (TPO), a novel post-training framework designed to enhance the temporal grounding capabilities of video-LMMs through preference learning. TPO adopts a self-training approach that enables models to differentiate between well-grounded and less accurate temporal responses by leveraging curated preference datasets at two granularities: localized temporal grounding, which focuses on specific video segments, and comprehensive temporal grounding, which captures extended temporal dependencies across entire video sequences. By optimizing on these preference datasets, TPO significantly enhances temporal understanding while reducing reliance on manually annotated data. Extensive experiments on three long-form video understanding benchmarks--LongVideoBench, MLVU, and Video-MME--demonstrate the effectiveness of TPO across two state-of-the-art video-LMMs. Notably, LLaVA-Video-TPO establishes itself as the leading 7B model on the Video-MME benchmark, underscoring the potential of TPO as a scalable and efficient solution for advancing temporal reasoning in long-form video understanding. Project page: https://ruili33.github.io/tpo_website.

URLs: https://ruili33.github.io/tpo_website.
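
The abstract does not spell out TPO's objective; preference learning of this kind is commonly instantiated with a DPO-style loss over preferred and dispreferred responses, sketched here under that assumption:

$$
\mathcal{L} = -\,\mathbb{E}_{(v,\, q,\, y^{+},\, y^{-})}\left[\log \sigma\!\left(\beta \log \frac{\pi_\theta(y^{+} \mid v, q)}{\pi_{\mathrm{ref}}(y^{+} \mid v, q)} - \beta \log \frac{\pi_\theta(y^{-} \mid v, q)}{\pi_{\mathrm{ref}}(y^{-} \mid v, q)}\right)\right],
$$

where $v$ is the video, $q$ the query, and $y^{+}$/$y^{-}$ the well-grounded and less accurate temporal responses from the curated preference data.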

replace-cross Verify with Caution: The Pitfalls of Relying on Imperfect Factuality Metrics

Authors: Ameya Godbole, Robin Jia

Abstract: Improvements in large language models have led to increasing optimism that they can serve as reliable evaluators of natural language generation outputs. In this paper, we challenge this optimism by thoroughly re-evaluating five state-of-the-art factuality metrics on a collection of 11 datasets for summarization, retrieval-augmented generation, and question answering. We find that these evaluators are inconsistent with each other and often misestimate system-level performance, both of which can lead to a variety of pitfalls. We further show that these metrics exhibit biases against highly paraphrased outputs and outputs that draw upon faraway parts of the source documents. We urge users of these factuality metrics to proceed with caution and manually validate the reliability of these metrics in their domain of interest before proceeding.

replace-cross Context is Key for Agent Security

Authors: Lillian Tsai, Eugene Bagdasarian

Abstract: Judging the safety of an action, whether taken by a human or a system, must take into account the context in which the action takes place. For example, deleting an email from a user's mailbox may or may not be appropriate depending on the email's content, the user's goals, or even available space. Systems today that make these judgements -- providing security against harmful or inappropriate actions -- rely on manually-crafted policies or user confirmation for each relevant context. With the upcoming deployment of systems like generalist agents, we argue that we must rethink security designs to adapt to the scale of contexts and capabilities of these systems. As a first step, this paper explores contextual security in the domain of agents and proposes contextual security for agents (Conseca), a framework to generate just-in-time, contextual, and human-verifiable security policies.

replace-cross Solving Roughly Forced Nonlinear PDEs via Misspecified Kernel Methods and Neural Networks

Authors: Ricardo Baptista, Edoardo Calvello, Matthieu Darcy, Houman Owhadi, Andrew M. Stuart, Xianjin Yang

Abstract: We consider the use of Gaussian Processes (GPs) or Neural Networks (NNs) to numerically approximate the solutions to nonlinear partial differential equations (PDEs) with rough forcing or source terms, which commonly arise as pathwise solutions to stochastic PDEs. Kernel methods have recently been generalized to solve nonlinear PDEs by approximating their solutions as the maximum a posteriori estimator of GPs that are conditioned to satisfy the PDE at a finite set of collocation points. The convergence and error guarantees of these methods, however, rely on the PDE being defined in a classical sense and its solution possessing sufficient regularity to belong to the associated reproducing kernel Hilbert space. We propose a generalization of these methods to handle roughly forced nonlinear PDEs while preserving convergence guarantees with an oversmoothing GP kernel that is misspecified relative to the true solution's regularity. This is achieved by conditioning a regular GP to satisfy the PDE with a modified source term in a weak sense (when integrated against a finite number of test functions). This is equivalent to replacing the empirical $L^2$-loss on the PDE constraint by an empirical negative-Sobolev norm. We further show that this loss function can be used to extend physics-informed neural networks (PINNs) to stochastic equations, thereby resulting in a new NN-based variant termed Negative Sobolev Norm-PINN (NeS-PINN).
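
Concretely, the change of loss described above replaces pointwise PDE residuals with residuals integrated against test functions (notation assumed):

$$
\sum_{m=1}^{M} \big|\mathcal{P}(u)(x_m) - f(x_m)\big|^2 \;\longrightarrow\; \sum_{j=1}^{J} \Big|\int_{\Omega} \big(\mathcal{P}(u) - f\big)\,\varphi_j \,\mathrm{d}x\Big|^2,
$$

where $\mathcal{P}$ is the nonlinear differential operator and $\{\varphi_j\}_{j=1}^{J}$ are smooth test functions. The right-hand side is an empirical negative-Sobolev norm of the residual, which remains meaningful even when the forcing $f$ is too rough for pointwise evaluation.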

replace-cross Complete Chess Games Enable LLM Become A Chess Master

Authors: Yinqi Zhang, Xintian Han, Haolong Li, Kedi Chen, Shaohui Lin

Abstract: Large language models (LLMs) have shown remarkable abilities in text generation, question answering, language translation, reasoning, and many other tasks. They continue to advance rapidly and are becoming increasingly influential in various fields, from technology and business to education and entertainment. Despite LLMs' success in multiple areas, their ability to play abstract games, such as chess, is underexplored. Chess-playing requires a language model to output legal and reasonable moves from textual inputs. Here, we propose ChessLLM, a large language model that plays full chess games. We transform the game into a textual format with the best move represented in the Forsyth-Edwards Notation. We show that with simple supervised fine-tuning, our model achieves a professional-level Elo rating of 1788 in matches against the standard Elo-rated Stockfish when permitted to sample 10 times. We further show that data quality is important: long-round data supervision yields a 350-point Elo improvement over short-round data.
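
A sketch of the kind of text serialization involved, pairing board states with moves via python-chess; the paper's exact prompt and target format is an assumption here.

    import chess
    import chess.pgn

    def game_to_examples(pgn_path):
        """Yield (FEN board state, next move in SAN) text pairs from a PGN
        file. Hypothetical serialization; ChessLLM's format may differ."""
        with open(pgn_path) as f:
            while (game := chess.pgn.read_game(f)) is not None:
                board = game.board()
                for move in game.mainline_moves():
                    yield board.fen(), board.san(move)
                    board.push(move)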

replace-cross Near-Optimal Algorithms for Omniprediction

Authors: Princewill Okoroafor, Robert Kleinberg, Michael P. Kim

Abstract: Omnipredictors are simple prediction functions that encode loss-minimizing predictions with respect to a hypothesis class $\mathcal{H}$, simultaneously for every loss function within a class of losses $\mathcal{L}$. In this work, we give near-optimal learning algorithms for omniprediction, in both the online and offline settings. To begin, we give an oracle-efficient online learning algorithm that achieves $(\mathcal{L},\mathcal{H})$-omniprediction with $\tilde{O}(\sqrt{T \log |\mathcal{H}|})$ regret for any class of Lipschitz loss functions $\mathcal{L} \subseteq \mathcal{L}_\mathrm{Lip}$. Quite surprisingly, this regret bound matches the optimal regret for \emph{minimization of a single loss function} (up to a $\sqrt{\log(T)}$ factor). Given this online algorithm, we develop an online-to-offline conversion that achieves near-optimal complexity across a number of measures. In particular, for all bounded loss functions within the class of Bounded Variation losses $\mathcal{L}_\mathrm{BV}$ (which include all convex, all Lipschitz, and all proper losses) and any (possibly-infinite) $\mathcal{H}$, we obtain an offline learning algorithm that, leveraging an (offline) ERM oracle and $m$ samples from $\mathcal{D}$, returns an efficient $(\mathcal{L}_{\mathrm{BV}},\mathcal{H},\varepsilon(m))$-omnipredictor for $\varepsilon(m)$ scaling near-linearly in the Rademacher complexity of $\mathrm{Th} \circ \mathcal{H}$.

replace-cross Boosting Weak Positives for Text Based Person Search

Authors: Akshay Modi, Ashhar Aziz, Nilanjana Chatterjee, A V Subramanyam

Abstract: Large vision-language models have revolutionized cross-modal object retrieval, but text-based person search (TBPS) remains a challenging task due to limited data and the fine-grained nature of the task. Existing methods primarily focus on aligning image-text pairs into a common representation space, often disregarding the fact that real-world positive image-text pairs share varying degrees of similarity. This leads models to prioritize easy pairs, and in some recent approaches, challenging samples are discarded as noise during training. In this work, we introduce a boosting technique that dynamically identifies and emphasizes these challenging samples during training. Our approach is motivated by classical boosting and dynamically updates the weights of weak positives, i.e., pairs whose rank-1 match does not share the identity of the query. These weights let misranked pairs contribute more to the loss, so the network must pay more attention to such samples. Our method achieves improved performance across four pedestrian datasets, demonstrating the effectiveness of the proposed module.
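
A sketch of the boosting step, assuming a similarity matrix and a simple multiplicative up-weighting of weak positives (queries whose rank-1 match has the wrong identity); the actual update rule in the paper is dynamic and may differ.

    import torch

    def weak_positive_weights(sim, query_ids, gallery_ids, gamma=2.0):
        """sim: (Q, G) similarity matrix. Up-weight queries whose top-1
        gallery match does not share the query identity. Hypothetical rule."""
        top1 = sim.argmax(dim=1)                     # (Q,) best gallery index
        mismatched = gallery_ids[top1] != query_ids  # the weak positives
        weights = torch.ones(sim.shape[0])
        weights[mismatched] = gamma                  # boost their loss share
        return weights

These per-query weights then multiply the usual image-text alignment loss, so misranked pairs contribute more to the gradient.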

replace-cross Improving Privacy Benefits of Redaction

Authors: Vaibhav Gusain, Douglas Leith

Abstract: We propose a novel redaction methodology that can be used to sanitize natural-text data. Our technique provides better privacy benefits than other state-of-the-art techniques while maintaining lower redaction levels.