Authors: Lucas N. Alegre, Ana L. C. Bazzan, André Barreto, Bruno C. da Silva
Abstract: Multi-task reinforcement learning aims to quickly identify solutions for new tasks with minimal or no additional interaction with the environment. Generalized Policy Improvement (GPI) addresses this by combining a set of base policies to produce a new one that is at least as good -- though not necessarily optimal -- as any individual base policy. Optimality can be ensured, particularly in the linear-reward case, via techniques that compute a Convex Coverage Set (CCS). However, these are computationally expensive and do not scale to complex domains. The Option Keyboard (OK) improves upon GPI by producing policies that are at least as good -- and often better. It achieves this through a learned meta-policy that dynamically combines base policies. However, its performance critically depends on the choice of base policies. This raises a key question: is there an optimal set of base policies -- an optimal behavior basis -- that enables zero-shot identification of optimal solutions for any linear task? We solve this open problem by introducing a novel method that efficiently constructs such an optimal behavior basis. We show that it significantly reduces the number of base policies needed to ensure optimality in new tasks. We also prove that it is strictly more expressive than a CCS, enabling particular classes of non-linear tasks to be solved optimally. We empirically evaluate our technique in challenging domains and show that it outperforms state-of-the-art approaches, increasingly so as task complexity increases.
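As a rough illustration of the GPI rule this abstract builds on: with successor features $\psi^{\pi_i}$ for the base policies, a new linear task $w$ yields $Q^{\pi_i}(s,a)=\psi^{\pi_i}(s,a)^\top w$, and GPI acts greedily with respect to the maximum over base policies. A minimal numpy sketch (shapes and names are illustrative, not the paper's implementation):

```python
# Minimal sketch of Generalized Policy Improvement (GPI) with successor
# features, assuming the base policies' successor features at the current
# state are precomputed. Shapes and names are illustrative.
import numpy as np

def gpi_action(psi: np.ndarray, w: np.ndarray) -> int:
    """Select an action by GPI.

    psi: (n_policies, n_actions, d) successor features of the base
         policies at the current state.
    w:   (d,) reward weights defining the new linear task.
    """
    q = psi @ w                          # (n_policies, n_actions) values per base policy
    return int(q.max(axis=0).argmax())   # greedy w.r.t. the max over policies

# Toy usage: 3 base policies, 4 actions, 5-dimensional features.
rng = np.random.default_rng(0)
psi = rng.normal(size=(3, 4, 5))
w = rng.normal(size=5)
print(gpi_action(psi, w))
```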
Authors: Tam Nguyen, Ngoc N. Tran, Khai Nguyen, Richard G. Baraniuk
Abstract: Sparse Mixture of Experts (SMoE) has emerged as a key to achieving unprecedented scalability in deep learning. By activating only a small subset of parameters per sample, SMoE achieves an exponential increase in parameter counts while maintaining a constant computational overhead. However, SMoE models are susceptible to routing fluctuations--changes in the routing of a given input to its target expert--at the late stage of model training, leading to model non-robustness. In this work, we unveil the limitation of SMoE through the perspective of the probabilistic graphical model (PGM). Through this PGM framework, we highlight the independence in the expert selection of tokens, which exposes the model to routing fluctuations and non-robustness. Alleviating this independence, we propose the novel Similarity-Aware (S)MoE, which considers interactions between tokens during expert selection. We then derive a new PGM underlying an (S)MoE-Attention block, going beyond just a single (S)MoE layer. Leveraging the token similarities captured by the attention matrix, we propose the innovative Attention-Aware (S)MoE, which employs the attention matrix to guide the routing of tokens to appropriate experts in (S)MoE. We theoretically prove that Similarity/Attention-Aware routing helps reduce the entropy of expert selection, resulting in more stable token routing mechanisms. We empirically validate our models on various tasks and domains, showing significant improvements in reducing routing fluctuations, enhancing accuracy, and increasing model robustness over the baseline MoE-Transformer with token routing via softmax gating.
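A hedged sketch of the core routing idea: instead of scoring each token independently, mix a token's router logits with those of similar tokens (e.g., via a row-stochastic attention matrix). The mixing rule and the `alpha` weight below are illustrative assumptions, not the paper's exact formulation:

```python
import torch

def attention_aware_routing(x, attn, router, top_k=2, alpha=0.5):
    """Sketch of similarity/attention-aware expert routing.

    x:      (n_tokens, d) token representations
    attn:   (n_tokens, n_tokens) row-stochastic attention/similarity matrix
    router: linear layer mapping d -> n_experts
    alpha:  mixing weight between a token's own logits and its neighbours'
    """
    logits = router(x)                                    # independent routing logits
    mixed = alpha * logits + (1 - alpha) * attn @ logits  # borrow neighbours' evidence
    gates = torch.softmax(mixed, dim=-1)
    topv, topi = gates.topk(top_k, dim=-1)                # sparse expert assignment
    return topv, topi

router = torch.nn.Linear(16, 4)
x = torch.randn(8, 16)
attn = torch.softmax(torch.randn(8, 8), dim=-1)
print(attention_aware_routing(x, attn, router)[1])
```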
Authors: Iurii Kemaev, Dan A Calian, Luisa M Zintgraf, Gregory Farquhar, Hado van Hasselt
Abstract: Gradient-based bilevel optimisation is a powerful technique with applications in hyperparameter optimisation, task adaptation, algorithm discovery, meta-learning more broadly, and beyond. It often requires differentiating through the gradient-based optimisation process itself, leading to "gradient-of-a-gradient" calculations with computationally expensive second-order and mixed derivatives. While modern automatic differentiation libraries provide a convenient way to write programs for calculating these derivatives, they oftentimes cannot fully exploit the specific structure of these problems out-of-the-box, leading to suboptimal performance. In this paper, we analyse such cases and propose Mixed-Flow Meta-Gradients, or MixFlow-MG -- a practical algorithm that uses mixed-mode differentiation to construct more efficient and scalable computational graphs yielding over 10x memory and up to 25% wall-clock time improvements over standard implementations in modern meta-learning setups.
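The structure such methods exploit shows up already in a Hessian-vector product: reverse-over-reverse differentiation builds and stores a second backward graph, whereas forward-over-reverse (mixed mode) pushes a cheap JVP through the gradient function. A minimal PyTorch sketch of the two modes (MixFlow-MG itself restructures full meta-gradient graphs, which this toy does not capture):

```python
import torch
from torch.func import grad, jvp

def loss(w):
    return (w ** 4).sum()  # toy inner objective

w = torch.randn(1000)
v = torch.randn(1000)

# Reverse-over-reverse: keeps a graph over the whole backward pass.
w_req = w.clone().requires_grad_(True)
(g,) = torch.autograd.grad(loss(w_req), w_req, create_graph=True)
(hvp_rr,) = torch.autograd.grad(g, w_req, grad_outputs=v)

# Forward-over-reverse (mixed mode): differentiate the gradient function
# with a forward-mode JVP, avoiding a second stored backward graph.
_, hvp_fr = jvp(grad(loss), (w,), (v,))

print(torch.allclose(hvp_rr, hvp_fr, atol=1e-5))
```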
Authors: Kola Ayonrinde, Louis Jaburi
Abstract: Mechanistic Interpretability aims to understand neural networks through causal explanations. We argue for the Explanatory View Hypothesis: that Mechanistic Interpretability research is a principled approach to understanding models because neural networks contain implicit explanations which can be extracted and understood. We hence show that Explanatory Faithfulness, an assessment of how well an explanation fits a model, is well-defined. We propose a definition of Mechanistic Interpretability (MI) as the practice of producing Model-level, Ontic, Causal-Mechanistic, and Falsifiable explanations of neural networks, allowing us to distinguish MI from other interpretability paradigms and detail MI's inherent limits. We formulate the Principle of Explanatory Optimism, a conjecture which we argue is a necessary precondition for the success of Mechanistic Interpretability.
Authors: Jordi de la Torre
Abstract: Objective: To develop and evaluate a scalable methodology for harmonizing inconsistent units in large-scale clinical datasets, addressing a key barrier to data interoperability. Materials and Methods: We designed a novel unit harmonization system combining BM25, sentence embeddings, Bayesian optimization, and a bidirectional transformer-based binary classifier for retrieving and matching laboratory test entries. The system was evaluated using the Optum Clinformatics Datamart dataset (7.5 billion entries). We implemented a multi-stage pipeline: filtering, identification, harmonization proposal generation, automated re-ranking, and manual validation. Performance was assessed using Mean Reciprocal Rank (MRR) and other standard information retrieval metrics. Results: Our hybrid retrieval approach combining BM25 and sentence embeddings (MRR: 0.8833) significantly outperformed both lexical-only (MRR: 0.7985) and embedding-only (MRR: 0.5277) approaches. The transformer-based reranker further improved performance (absolute MRR improvement: 0.10), bringing the final system MRR to 0.9833. The system achieved 83.39% precision at rank 1 and 94.66% recall at rank 5. Discussion: The hybrid architecture effectively leverages the complementary strengths of lexical and semantic approaches. The reranker addresses cases where initial retrieval components make errors due to complex semantic relationships in medical terminology. Conclusion: Our framework provides an efficient, scalable solution for unit harmonization in clinical datasets, reducing manual effort while improving accuracy. Once harmonized, data can be reused seamlessly in different analyses, ensuring consistency across healthcare systems and enabling more reliable multi-institutional studies and meta-analyses.
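A minimal sketch of the hybrid retrieval stage, assuming the `rank_bm25` and `sentence-transformers` packages; the encoder choice and fusion weight below are guesses, and the actual pipeline additionally tunes weights with Bayesian optimization and reranks with a transformer classifier:

```python
import numpy as np
from rank_bm25 import BM25Okapi
from sentence_transformers import SentenceTransformer

corpus = ["glucose serum mg/dL", "glucose serum mmol/L", "hemoglobin g/dL"]
query = "glucose mmol/L"

# Lexical scores from BM25 over whitespace-tokenized entries.
bm25 = BM25Okapi([doc.split() for doc in corpus])
lex = np.array(bm25.get_scores(query.split()))

# Semantic scores from normalized sentence embeddings (encoder is an assumption).
encoder = SentenceTransformer("all-MiniLM-L6-v2")
emb = encoder.encode(corpus + [query], normalize_embeddings=True)
sem = emb[:-1] @ emb[-1]  # cosine similarity of each entry to the query

def minmax(s):
    return (s - s.min()) / (s.max() - s.min() + 1e-9)

hybrid = 0.5 * minmax(lex) + 0.5 * minmax(sem)  # fusion weight is a guess
print(corpus[int(hybrid.argmax())])
```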
Authors: Kuan Zhang, Chengliang Chai, Jingzhe Xu, Chi Zhang, Ye Yuan, Guoren Wang, Lei Cao
Abstract: Recent studies indicate that deep neural networks degrade in generalization performance under noisy supervision. Existing methods focus on isolating clean subsets or correcting noisy labels, facing limitations such as high computational costs, heavy hyperparameter tuning, and coarse-grained optimization. To address these challenges, we propose a novel two-stage noisy learning framework that enables instance-level optimization through a dynamically weighted loss function, avoiding hyperparameter tuning. To obtain stable and accurate information about noise modeling, we introduce a simple yet effective metric, termed wrong event, which dynamically models the cleanliness and difficulty of individual samples while keeping computational costs low. Our framework first collects wrong event information and builds a strong base model. Then we perform noise-robust training on the base model, using a probabilistic model to handle the wrong event information of samples. Experiments on five synthetic and real-world LNL benchmarks demonstrate our method surpasses state-of-the-art methods in performance, achieves a nearly 75% reduction in computational time, and improves model scalability.
Authors: Heng-Sheng Chang, Prashant G. Mehta
Abstract: This paper presents a mathematical framework for causal nonlinear prediction in settings where observations are generated from an underlying hidden Markov model (HMM). Both the problem formulation and the proposed solution are motivated by the decoder-only transformer architecture, in which a finite sequence of observations (tokens) is mapped to the conditional probability of the next token. Our objective is not to construct a mathematical model of a transformer. Rather, our interest lies in deriving, from first principles, transformer-like architectures that solve the prediction problem for which the transformer is designed. The proposed framework is based on an original optimal control approach, where the prediction objective (MMSE) is reformulated as an optimal control problem. An analysis of the optimal control problem is presented leading to a fixed-point equation on the space of probability measures. To solve the fixed-point equation, we introduce the dual filter, an iterative algorithm that closely parallels the architecture of decoder-only transformers. These parallels are discussed in detail along with the relationship to prior work on mathematical modeling of transformers as transport on the space of probability measures. Numerical experiments are provided to illustrate the performance of the algorithm using parameter values from research-scale transformer models.
Authors: Qianxi Fu, Youngjoon Suh, Xiaojing Zhang, Yoonjin Won
Abstract: Phase change plays a critical role in thermal management systems, yet quantitative characterization of multiphase heat transfer remains limited by the challenges of measuring temperature fields in chaotic, rapidly evolving flow regimes. While computational methods offer spatiotemporal resolution in idealized cases, replicating complex experimental conditions remains prohibitively difficult. Here, we present a data-driven framework that leverages a conditional generative adversarial network (CGAN) to infer temperature fields from geometric phase contours in a canonical pool boiling configuration where advanced data collection techniques are restricted. Using high-speed imaging data and simulation-informed training, our model demonstrates the ability to reconstruct temperature fields with errors below 6%. We further show that standard data augmentation strategies are effective in enhancing both accuracy and physical plausibility of the predicted maps across both simulation and experimental datasets when precise physical constraints are not applicable. Our results highlight the potential of deep generative models to bridge the gap between observable multiphase phenomena and underlying thermal transport, offering a powerful approach to augment and interpret experimental measurements in complex two-phase systems.
Authors: Joe Germino, Nuno Moniz, Nitesh V. Chawla
Abstract: Research on fairness in machine learning has been mainly framed in the context of classification tasks, leaving critical gaps in regression. In this paper, we propose a seminal approach to measure intersectional fairness in regression tasks, going beyond the focus on single protected attributes from existing work to consider combinations of all protected attributes. Furthermore, we contend that it is insufficient to measure the average error of groups without regard for imbalanced domain preferences. To this end, we propose Intersectional Divergence (ID) as the first fairness measure for regression tasks that 1) describes fair model behavior across multiple protected attributes and 2) differentiates the impact of predictions in target ranges most relevant to users. We extend our proposal by demonstrating how ID can be adapted into a loss function, IDLoss, and used in optimization problems. Through an extensive experimental evaluation, we demonstrate how ID allows unique insights into model behavior and fairness, and how incorporating IDLoss into optimization can considerably improve single-attribute and intersectional model fairness while maintaining a competitive balance in predictive performance.
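To make the intersectional setting concrete, here is a toy computation of per-intersection regression errors and their spread; the paper's Intersectional Divergence additionally differentiates target ranges by user relevance, which this sketch omits:

```python
# Sketch of measuring error disparities across intersections of protected
# attributes in a regression task: per-group mean absolute error and the
# spread between groups. Data and column names are invented for illustration.
import pandas as pd

df = pd.DataFrame({
    "sex": ["F", "F", "M", "M", "F", "M"],
    "race": ["A", "B", "A", "B", "B", "A"],
    "y_true": [3.0, 2.5, 4.0, 1.0, 2.0, 3.5],
    "y_pred": [2.8, 3.1, 3.9, 1.6, 2.2, 3.4],
})

df["abs_err"] = (df["y_true"] - df["y_pred"]).abs()
group_err = df.groupby(["sex", "race"])["abs_err"].mean()  # one value per intersection
print(group_err)
print("disparity (max - min):", group_err.max() - group_err.min())
```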
Authors: Julen Ercibengoa, Meritxell G\'omez-Omella, Izaro Goienetxea
Abstract: Wildfires pose a critical environmental issue to ecosystems, economies, and public safety, particularly in Mediterranean regions such as Spain. Accurate predictive models rely on high-resolution spatio-temporal data to capture the complex interplay of environmental and anthropogenic factors. To address the lack of localised and fine-grained datasets in Spain, this work introduces IberFire, a spatio-temporal datacube at 1 km x 1 km x 1-day resolution covering mainland Spain and the Balearic Islands from December 2007 to December 2024. IberFire integrates 260 features across eight main categories: auxiliary features, fire history, geography, topography, meteorology, vegetation indices, human activity, and land cover. All features are derived from open-access sources, ensuring transparency and real-time applicability. The data processing pipeline was implemented entirely using open-source tools, and the codebase has been made publicly available. This work not only enhances spatio-temporal granularity and feature diversity compared to existing European datacubes but also provides a reproducible methodology for constructing similar datasets. IberFire supports advanced wildfire risk modelling through Machine Learning (ML) and Deep Learning (DL) techniques, enables climate pattern analysis and informs strategic planning in fire prevention and land management. The dataset is publicly available on Zenodo to promote open research and collaboration.
Authors: Xinlin Li, Osama Hanna, Christina Fragouli, Suhas Diggavi
Abstract: The rapid deployment of Large Language Models (LLMs) highlights the need for efficient low-bit post-training quantization (PTQ), due to their high memory costs. A key challenge in weight quantization is the presence of outliers, which inflate quantization ranges and lead to large errors. While a number of outlier suppression techniques have been proposed, they either fail to effectively shrink the quantization range or incur (relatively) high bit overhead. In this paper, we present ICQuant, a novel framework that leverages outlier statistics to design an efficient index coding scheme for outlier-aware weight-only quantization. Compared to existing outlier suppression techniques requiring $\approx 1$ bit overhead to halve the quantization range, ICQuant requires only $\approx 0.3$ bits; a significant saving in extreme compression regimes (e.g., 2-3 bits per weight). ICQuant can be used on top of any existing quantizers to eliminate outliers, improving the quantization quality. Using just 2.3 bits per weight and simple scalar quantizers, ICQuant improves the zero-shot accuracy of the 2-bit Llama3-70B model by up to 130% and 150% relative to QTIP and QuIP#; and it achieves comparable performance to the best-known fine-tuned quantizer (PV-tuning) without fine-tuning.
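A simplified numpy illustration of the outlier-indexing idea: split weights at a quantile threshold, quantize inliers over the tight range and outliers separately, and pay only the entropy-coded cost of the sparse index. Thresholds and bit-widths are illustrative, not ICQuant's actual scheme:

```python
import numpy as np

def quantize(x, bits, lo, hi):
    """Uniform scalar quantization of x to 2^bits levels over [lo, hi]."""
    levels = 2 ** bits - 1
    q = np.round((np.clip(x, lo, hi) - lo) / (hi - lo) * levels)
    return q / levels * (hi - lo) + lo

rng = np.random.default_rng(0)
w = rng.normal(size=4096)
w[rng.choice(4096, 40, replace=False)] *= 8.0  # inject rare outliers

thr = np.quantile(np.abs(w), 0.99)             # ~1% of weights are outliers
out = np.abs(w) > thr
w_hat = np.empty_like(w)
# Inliers get the full bit budget over a tight range...
w_hat[~out] = quantize(w[~out], bits=2, lo=-thr, hi=thr)
# ...outliers are quantized over their own range and located via a sparse
# index; entropy-coding a 1%-sparse indicator costs roughly 0.08 bits/weight.
w_hat[out] = quantize(w[out], bits=4, lo=w[out].min(), hi=w[out].max())
print("MSE with outlier index:", np.mean((w - w_hat) ** 2))
```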
Authors: Xi Chen, Yateng Tang, Jiarong Xu, Jiawei Zhang, Siwei Zhang, Sijia Peng, Xuehao Zheng, Yun Xiong
Abstract: Effectively modeling time information and incorporating it into applications or models involving chronologically occurring events is crucial. Real-world scenarios often involve diverse and complex time patterns, which pose significant challenges for time encoding methods. While previous methods focus on capturing time patterns, many rely on specific inductive biases, such as using trigonometric functions to model periodicity. This narrow focus on single-pattern modeling makes them less effective in handling the diversity and complexities of real-world time patterns. In this paper, we investigate how to improve existing, commonly used time encoding methods and introduce Learnable Transformation-based Generalized Time Encoding (LeTE). We propose using deep function learning techniques to parameterize non-linear transformations in time encoding, making them learnable and capable of modeling generalized time patterns, including diverse and complex temporal dynamics. By enabling learnable transformations, LeTE encompasses previous methods as specific cases and allows seamless integration into a wide range of tasks. Through extensive experiments across diverse domains, we demonstrate the versatility and effectiveness of LeTE.
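One hedged way to realize "learnable transformations" over a sinusoidal basis (a sketch; LeTE's exact parameterization may differ):

```python
# PyTorch sketch of a learnable time encoding: sinusoidal features whose
# output passes through a learned non-linear map, so the fixed trigonometric
# inductive bias becomes one special case (identity transform).
import torch
import torch.nn as nn

class LearnableTimeEncoding(nn.Module):
    def __init__(self, dim: int):
        super().__init__()
        self.freq = nn.Parameter(torch.randn(dim))   # learnable frequencies
        self.phase = nn.Parameter(torch.zeros(dim))  # learnable phases
        self.transform = nn.Sequential(              # learnable non-linear map
            nn.Linear(dim, dim), nn.GELU(), nn.Linear(dim, dim)
        )

    def forward(self, t: torch.Tensor) -> torch.Tensor:
        # t: (batch,) timestamps -> (batch, dim) encodings
        basis = torch.cos(t.unsqueeze(-1) * self.freq + self.phase)
        return self.transform(basis)

enc = LearnableTimeEncoding(32)
print(enc(torch.tensor([0.0, 1.5, 100.0])).shape)  # torch.Size([3, 32])
```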
Authors: Daria Gitman, Igor Gitman, Evelina Bakhturina
Abstract: Adapting Large Language Models (LLMs) to novel tasks and enhancing their overall capabilities often requires large, high-quality training datasets. Synthetic data, generated at scale, serves as a valuable alternative when real-world data is scarce or difficult to obtain. However, ensuring the quality of synthetic datasets is challenging, as developers must manually inspect and refine numerous samples to identify errors and areas for improvement. This process is time-consuming and requires specialized tools. We introduce NeMo-Inspector, an open-source tool designed to simplify the analysis of synthetic datasets with integrated inference capabilities. We demonstrate its effectiveness through two real-world cases. Analysis and cleaning of the synthetically generated GSM-Plus dataset with NeMo-Inspector led to a significant decrease in low-quality samples from 46.99% to 19.51%. The tool also helped identify and correct generation errors in OpenMath models, improving accuracy by 1.92% on the MATH dataset and by 4.17% on the GSM8K dataset for a Meta-Llama-3-8B model fine-tuned on synthetic data generated from Nemotron-4-340B.
Authors: Ihab Tabbara, Hussein Sibai
Abstract: Safety filters, particularly those based on control barrier functions, have gained increased interest as effective tools for safe control of dynamical systems. Existing correct-by-construction synthesis algorithms, however, suffer from the curse of dimensionality. Deep learning approaches have been proposed in recent years to address this challenge. In this paper, we contribute to this line of work by proposing an algorithm for training control barrier functions from offline datasets. Our algorithm trains the filter to prevent the system from reaching not only unsafe states but also out-of-distribution ones, at which the filter would be unreliable. It is inspired by Conservative Q-learning, an offline reinforcement learning algorithm. We call its outputs Conservative Control Barrier Functions (CCBFs). Our empirical results demonstrate that CCBFs outperform existing methods in maintaining safety and out-of-distribution avoidance while minimally affecting task performance.
Authors: Xianjin Yang, Jingguo Zhang
Abstract: We propose a Gaussian Process (GP)-based policy iteration framework for addressing both forward and inverse problems in Hamilton--Jacobi--Bellman (HJB) equations and mean field games (MFGs). Policy iteration is formulated as an alternating procedure between solving the value function under a fixed control policy and updating the policy based on the resulting value function. By exploiting the linear structure of GPs for function approximation, each policy evaluation step admits an explicit closed-form solution, eliminating the need for numerical optimization. To improve convergence, we incorporate the additive Schwarz acceleration as a preconditioning step following each policy update. Numerical experiments demonstrate the effectiveness of Schwarz acceleration in improving computational efficiency.
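In sketch form, the alternation this abstract describes is standard policy iteration for a discounted HJB equation; the paper's contribution is to represent the value function as a Gaussian process so that the (linear) evaluation step has a closed-form solution, with Schwarz acceleration applied between iterations:

```latex
% Generic policy iteration for a discounted HJB equation (a sketch, not
% necessarily the paper's exact formulation), with generator \mathcal{L}^a
% and running reward r:
\begin{align*}
  \text{(evaluation)} \quad & \rho\, V^{(k)}(x)
      = \mathcal{L}^{\alpha^{(k)}} V^{(k)}(x) + r\bigl(x, \alpha^{(k)}(x)\bigr), \\
  \text{(improvement)} \quad & \alpha^{(k+1)}(x)
      \in \arg\max_{a} \bigl\{ \mathcal{L}^{a} V^{(k)}(x) + r(x, a) \bigr\}.
\end{align*}
% Representing V^{(k)} as a GP posterior makes the evaluation step a
% linear system solvable in closed form.
```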
Authors: Han Wang, Adam White, Martha White
Abstract: Fine-tuning policies learned offline remains a major challenge in many application domains. Monotonic performance improvement during \emph{fine-tuning} is often challenging, as agents typically experience performance degradation at the early fine-tuning stage. The community has identified multiple difficulties in fine-tuning a learned network online; however, the majority of progress has focused on improving learning efficiency during fine-tuning. In practice, this comes at a serious cost during fine-tuning: initially, agent performance degrades as the agent explores and effectively overrides the policy learned offline. We show that, across a range of settings, many offline-to-online algorithms exhibit either (1) performance degradation or (2) slow learning (sometimes effectively no improvement) during fine-tuning. We introduce a new fine-tuning algorithm, based on an algorithm called Jump Start, that gradually allows more exploration based on online estimates of performance. Empirically, this approach achieves fast fine-tuning and significantly reduces performance degradation compared with existing algorithms designed to do the same.
Authors: Ruiquan Huang, Yingbin Liang, Jing Yang
Abstract: Language recognition tasks are fundamental in natural language processing (NLP) and have been widely used to benchmark the performance of large language models (LLMs). These tasks also play a crucial role in explaining the working mechanisms of transformers. In this work, we focus on two representative tasks in the category of regular language recognition, known as `even pairs' and `parity check', the aim of which is to determine whether the occurrences of certain subsequences in a given sequence are even. Our goal is to explore how a one-layer transformer, consisting of an attention layer followed by a linear layer, learns to solve these tasks by theoretically analyzing its training dynamics under gradient descent. While even pairs can be solved directly by a one-layer transformer, parity check needs to be solved by integrating Chain-of-Thought (CoT) reasoning, either into the inference stage of a transformer well-trained for the even pairs task, or into the training of a one-layer transformer. For both problems, our analysis shows that the joint training of attention and linear layers exhibits two distinct phases. In the first phase, the attention layer grows rapidly, mapping data sequences into separable vectors. In the second phase, the attention layer becomes stable, while the linear layer grows logarithmically and converges in direction to a max-margin hyperplane that correctly separates the attention layer outputs into positive and negative samples, and the loss decreases at a rate of $O(1/t)$. Our experiments validate these theoretical results.
Authors: Edison Mucllari, Zachary Daniels, David Zhang, Qiang Ye
Abstract: The Transformer architecture has shown significant success in many language processing and visual tasks. However, the method faces challenges in efficiently scaling to long sequences because the self-attention computation is quadratic with respect to the input length. To overcome this limitation, several approaches scale to longer sequences by breaking long sequences into a series of segments, restricting self-attention to local dependencies between tokens within each segment and using a memory mechanism to manage information flow between segments. However, these approaches generally introduce additional computational overhead that restricts their use in applications where compute, memory, and power are limited (such as edge computing). We propose a novel and efficient Compact Recurrent Transformer (CRT), which combines shallow Transformer models that process short local segments with recurrent neural networks to compress and manage a single persistent memory vector that summarizes long-range global information between segments. We evaluate CRT on WordPTB and WikiText-103 for next-token-prediction tasks, as well as on the Toyota Smarthome video dataset for classification. CRT achieves comparable or superior prediction results to full-length Transformers in the language datasets while using significantly shorter segments (half or quarter size) and substantially reduced FLOPs. Our approach also demonstrates state-of-the-art performance on the Toyota Smarthome video dataset.
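A rough sketch of the segment-plus-memory pattern: a shallow Transformer reads each short segment together with one persistent memory token, and a recurrent cell updates that memory between segments. Layer sizes and the memory read-out below are illustrative assumptions, not CRT's exact design:

```python
import torch
import torch.nn as nn

d = 64
encoder = nn.TransformerEncoder(
    nn.TransformerEncoderLayer(d_model=d, nhead=4, batch_first=True),
    num_layers=2,
)
memory_rnn = nn.GRUCell(d, d)  # compresses each segment into the memory

def process(segments, memory):
    """segments: list of (batch, seg_len, d) tensors; memory: (batch, d)."""
    outputs = []
    for seg in segments:
        inp = torch.cat([memory.unsqueeze(1), seg], dim=1)  # prepend memory token
        out = encoder(inp)
        outputs.append(out[:, 1:])                  # per-token outputs
        memory = memory_rnn(out[:, 0], memory)      # recurrent memory update
    return outputs, memory

segs = [torch.randn(2, 16, d) for _ in range(3)]
outs, mem = process(segs, torch.zeros(2, d))
print(outs[-1].shape, mem.shape)  # (2, 16, 64) (2, 64)
```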
Authors: Lokesh Nagalapatti, Ashutosh Srivastava, Sunita Sarawagi, Amit Sharma
Abstract: Diagnosing the root cause of an anomaly in a complex interconnected system is a pressing problem in today's cloud services and industrial operations. We propose In-Distribution Interventions (IDI), a novel algorithm that predicts root causes as the nodes that meet two criteria: 1) **Anomaly:** root cause nodes should take on anomalous values; 2) **Fix:** had the root cause nodes assumed usual values, the target node would not have been anomalous. Prior methods of assessing the fix condition rely on counterfactuals inferred from a Structural Causal Model (SCM) trained on historical data. But since anomalies are rare and fall outside the training distribution, the fitted SCMs yield unreliable counterfactual estimates. IDI overcomes this by relying on interventional estimates obtained by solely probing the fitted SCM at in-distribution inputs. We present a theoretical analysis comparing and bounding the errors in assessing the fix condition using interventional and counterfactual estimates. We then conduct experiments by systematically varying the SCM's complexity to demonstrate the cases where IDI's interventional approach outperforms the counterfactual approach and vice versa. Experiments on both synthetic and PetShop RCD benchmark datasets demonstrate that IDI consistently identifies true root causes more accurately and robustly than nine existing state-of-the-art RCD baselines. Code is released at https://github.com/nlokeshiisc/IDI_release.
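A toy illustration of the interventional "fix" check on a fitted chain SCM (the SCM, thresholds, and typical values are invented for illustration):

```python
# Sketch of the "fix" criterion with interventional (not counterfactual)
# estimates: set a candidate root-cause node to a typical value, push it
# through the fitted SCM, and test whether the target stops being anomalous.

f_b = lambda a: 2.0 * a        # fitted mechanisms of an A -> B -> C chain
f_c = lambda b: b + 1.0

a_obs = 6.0                    # observed anomalous value of A
c_obs = f_c(f_b(a_obs))        # observed target value
a_typical = 1.0                # e.g., historical median of A

def is_anomalous(c):           # simplistic anomaly test on the target
    return abs(c - 3.0) > 2.0

# Intervention do(A := a_typical): recompute downstream via the SCM at an
# in-distribution input, rather than abducting noise for a counterfactual.
c_fixed = f_c(f_b(a_typical))
print("A is a root cause:", is_anomalous(c_obs) and not is_anomalous(c_fixed))
```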
Authors: Yin Huang, Yongqi Dong, Youhua Tang, Alvaro García Hernandez
Abstract: The rapid expansion of bike-sharing systems (BSS) has greatly improved urban "last-mile" connectivity, yet large-scale deployments face escalating operational challenges, particularly in detecting faulty bikes. Existing detection approaches either rely on static model-based thresholds that overlook dynamic spatiotemporal (ST) usage patterns or employ supervised learning methods that struggle with label scarcity and class imbalance. To address these limitations, this paper proposes a novel Self-Supervised Transformer (SSTransformer) framework for automatically detecting unusable shared bikes, leveraging ST features extracted from GPS trajectories and trip records. The model incorporates a self-supervised pre-training strategy to enhance its feature extraction capabilities, followed by fine-tuning for efficient status recognition. In the pre-training phase, the Transformer encoder learns generalized representations of bike movement via a self-supervised objective; in the fine-tuning phase, the encoder is adapted to a downstream binary classification task. Comprehensive experiments on a real-world dataset of 10,730 bikes (1,870 unusable, 8,860 normal) from Chengdu, China, demonstrate that SSTransformer significantly outperforms traditional machine learning, ensemble learning, and deep learning baselines, achieving the best accuracy (97.81%), precision (0.8889), and F1-score (0.9358). This work highlights the effectiveness of self-supervised Transformers on ST data for capturing complex anomalies in BSS, paving the way toward more reliable and scalable maintenance solutions for shared mobility.
Authors: A. H. Abbas
Abstract: Hybrid quantum-classical neural networks (HQCNNs) represent a promising frontier in machine learning, leveraging the complementary strengths of both models. In this work, we propose TunnElQNN, a non-sequential architecture composed of alternating classical and quantum layers. Within the classical component, we employ the Tunnelling Diode Activation Function (TDAF), inspired by the I-V characteristics of quantum tunnelling. We evaluate the performance of this hybrid model on a synthetic dataset of interleaving half-circles for multi-class classification tasks with varying degrees of class overlap. The model is compared against a baseline hybrid architecture that uses the conventional ReLU activation function (ReLUQNN). Our results show that the TunnElQNN model consistently outperforms the ReLUQNN counterpart. Furthermore, we analyse the decision boundaries generated by TunnElQNN under different levels of class overlap and compare them to those produced by a neural network implementing TDAF within a fully classical architecture. These findings highlight the potential of integrating physics-inspired activation functions with quantum components to enhance the expressiveness and robustness of hybrid quantum-classical machine learning architectures.
Authors: Zhenyu Wang, Molei Liu, Jing Lei, Francis Bach, Zijian Guo
Abstract: When synthesizing multisource high-dimensional data, a key objective is to extract low-dimensional feature representations that effectively approximate the original features across different sources. Such general feature extraction facilitates the discovery of transferable knowledge, mitigates systematic biases such as batch effects, and promotes fairness. In this paper, we propose Stable Principal Component Analysis (StablePCA), a novel method for group distributionally robust learning of latent representations from high-dimensional multi-source data. A primary challenge in generalizing PCA to the multi-source regime lies in the nonconvexity of the fixed rank constraint, rendering the minimax optimization nonconvex. To address this challenge, we employ the Fantope relaxation, reformulating the problem as a convex minimax optimization, with the objective defined as the maximum loss across sources. To solve the relaxed formulation, we devise an optimistic-gradient Mirror Prox algorithm with explicit closed-form updates. Theoretically, we establish the global convergence of the Mirror Prox algorithm, with the convergence rate provided from the optimization perspective. Furthermore, we offer practical criteria to assess how closely the solution approximates the original nonconvex formulation. Through extensive numerical experiments, we demonstrate StablePCA's high accuracy and efficiency in extracting robust low-dimensional representations across various finite-sample scenarios.
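For concreteness, the relaxed problem can be sketched as a convex minimax over the Fantope (notation follows standard Fantope usage and may differ from the paper's):

```latex
% Sketch of the Fantope-relaxed group-DRO objective: the worst-case
% reconstruction loss over M sources, with the nonconvex rank-k projection
% constraint relaxed to the (convex) Fantope.
\begin{equation*}
  \min_{F \in \mathcal{F}^k} \; \max_{1 \le m \le M} \;
    \mathbb{E}_{X \sim \mathbb{P}^{(m)}} \bigl[ \lVert X - F X \rVert_2^2 \bigr],
  \qquad
  \mathcal{F}^k = \bigl\{ F : 0 \preceq F \preceq I,\ \operatorname{tr}(F) = k \bigr\}.
\end{equation*}
```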
Authors: Wenxin Zhang, Ding Xu, Guangzhen Yao, Xiaojian Lin, Renxiang Guan, Chengze Du, Renda Han, Xi Xuan, Cuicui Luo
Abstract: Time series anomaly detection is critical for system monitoring and risk identification, across various domains, such as finance and healthcare. However, for most reconstruction-based approaches, detecting anomalies remains a challenge due to the complexity of sequential patterns in time series data. On the one hand, reconstruction-based techniques are susceptible to computational deviation stemming from anomalies, which can lead to impure representations of normal sequence patterns. On the other hand, they often focus on the time-domain dependencies of time series, while ignoring the alignment of frequency information beyond the time domain. To address these challenges, we propose a novel Frequency-augmented Convolutional Transformer (FreCT). FreCT utilizes patch operations to generate contrastive views and employs an improved Transformer architecture integrated with a convolution module to capture long-term dependencies while preserving local topology information. The introduced frequency analysis based on Fourier transformation could enhance the model's ability to capture crucial characteristics beyond the time domain. To protect the training quality from anomalies and improve the robustness, FreCT deploys stop-gradient Kullback-Leibler (KL) divergence and absolute error to optimize consistency information in both time and frequency domains. Extensive experiments on four public datasets demonstrate that FreCT outperforms existing methods in identifying anomalies.
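A hedged sketch of the dual-domain objective: a stop-gradient KL term between two views plus an absolute error on FFT magnitudes; FreCT's patching and convolutional Transformer encoder are omitted:

```python
import torch
import torch.nn.functional as F

def consistency_loss(z1, z2, x_rec, x):
    """z1, z2: (batch, seq, d) logits from two contrastive views;
    x_rec, x: (batch, seq) reconstruction and input series."""
    p1, p2 = F.log_softmax(z1, -1), F.log_softmax(z2, -1)
    # Stop-gradient KL: each view is pulled toward a detached target.
    kl = 0.5 * (
        F.kl_div(p1, p2.detach(), log_target=True, reduction="batchmean")
        + F.kl_div(p2, p1.detach(), log_target=True, reduction="batchmean")
    )
    # Absolute error between frequency magnitudes, beyond the time domain.
    freq = (torch.fft.rfft(x_rec, dim=-1).abs()
            - torch.fft.rfft(x, dim=-1).abs()).abs().mean()
    return kl + freq

z1, z2 = torch.randn(4, 128, 16), torch.randn(4, 128, 16)
x, x_rec = torch.randn(4, 128), torch.randn(4, 128)
print(consistency_loss(z1, z2, x_rec, x))
```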
Authors: Wenxin Zhang, Ding Xu, Xi Xuan, Lei Jiang, Guangzhen Yao, Renda Han, Xiangxiang Lang, Cuicui Luo
Abstract: Fraud detection is crucial in social service networks to maintain user trust and improve service network security. Existing spectral graph-based methods address this challenge by leveraging different graph filters to capture signals with different frequencies in service networks. However, most graph filter-based methods struggle with deriving clean and discriminative graph signals. On the one hand, they overlook the noise in the information propagation process, resulting in degradation of filtering ability. On the other hand, they fail to discriminate the frequency-specific characteristics of graph signals, leading to distortion of signals fusion. To address these issues, we develop a novel spectral graph network based on information bottleneck theory (SGNN-IB) for fraud detection in service networks. SGNN-IB splits the original graph into homophilic and heterophilic subgraphs to better capture the signals at different frequencies. For the first limitation, SGNN-IB applies information bottleneck theory to extract key characteristics of encoded representations. For the second limitation, SGNN-IB introduces prototype learning to implement signal fusion, preserving the frequency-specific characteristics of signals. Extensive experiments on three real-world datasets demonstrate that SGNN-IB outperforms state-of-the-art fraud detection methods.
Authors: Kota Fukuda, Guanqin Zhang, Zhenya Zhang, Yulei Sui, Jianjun Zhao
Abstract: Formal verification is a rigorous approach that can provably ensure the quality of neural networks, and to date, Branch and Bound (BaB) is the state-of-the-art that performs verification by splitting the problem as needed and applying off-the-shelf verifiers to sub-problems for improved performance. However, existing BaB may not be efficient, due to its naive way of exploring the space of sub-problems that ignores the \emph{importance} of different sub-problems. To bridge this gap, we first introduce a notion of ``importance'' that reflects how likely a counterexample can be found with a sub-problem, and then we devise a novel verification approach, called ABONN, that explores the sub-problem space of BaB adaptively, in a Monte-Carlo tree search (MCTS) style. The exploration is guided by the ``importance'' of different sub-problems, so it favors the sub-problems that are more likely to find counterexamples. As soon as it finds a counterexample, it can immediately terminate; even if it cannot find one, after visiting all the sub-problems, it can still verify the problem. We evaluate ABONN with 552 verification problems from commonly-used datasets and neural network models, and compare it with the state-of-the-art verifiers as baseline approaches. Experimental evaluation shows that ABONN demonstrates speedups of up to $15.2\times$ on MNIST and $24.7\times$ on CIFAR-10. We further study the influence of hyperparameters on the performance of ABONN, and the effectiveness of our adaptive tree exploration.
Authors: Thanh Tran, Viet-Hoang Tran, Thanh Chu, Trang Pham, Laurent El Ghaoui, Tam Le, Tan M. Nguyen
Abstract: Tree-Sliced methods have recently emerged as an alternative to the traditional Sliced Wasserstein (SW) distance, replacing one-dimensional lines with tree-based metric spaces and incorporating a splitting mechanism for projecting measures. This approach enhances the ability to capture the topological structures of integration domains in Sliced Optimal Transport while maintaining low computational costs. Building on this foundation, we propose a novel nonlinear projectional framework for the Tree-Sliced Wasserstein (TSW) distance, substituting the linear projections in earlier versions with general projections, while ensuring the injectivity of the associated Radon Transform and preserving the well-definedness of the resulting metric. By designing appropriate projections, we construct efficient metrics for measures on both Euclidean spaces and spheres. Finally, we validate our proposed metric through extensive numerical experiments for Euclidean and spherical datasets. Applications include gradient flows, self-supervised learning, and generative models, where our methods demonstrate significant improvements over recent SW and TSW variants.
Authors: Xin Chen, Yuze Chen, Yuan Zhou
Abstract: We study a class of sequential decision-making problems with augmented predictions, potentially provided by a machine learning algorithm. In this setting, the decision-maker receives prediction intervals for unknown parameters that become progressively refined over time, and seeks decisions that are competitive with the hindsight optimal under all possible realizations of both parameters and predictions. We propose a minimax Markov Decision Process (minimax-MDP) framework, where the system state consists of an adversarially evolving environment state and an internal state controlled by the decision-maker. We introduce a set of future-imposed conditions that characterize the feasibility of minimax-MDPs and enable the design of efficient, often closed-form, robustly competitive policies. We illustrate the framework through three applications: multi-period inventory ordering with refining demand predictions, resource allocation with uncertain utility functions, and a multi-phase extension of the minimax-MDP applied to the inventory problem with time-varying ordering costs. Our results provide a tractable and versatile approach to robust online decision-making under predictive uncertainty.
Authors: Shunxian Gu, Chaoqun You, Bangbang Ren, Lailong Luo, Junxu Xia, Deke Guo
Abstract: Scaling deep neural network (DNN) training to more devices can reduce time-to-solution. However, it is impractical for users with limited computing resources. FOSI, as a hybrid order optimizer, converges faster than conventional optimizers by taking advantage of both gradient information and curvature information when updating the DNN model. Therefore, it provides a new chance for accelerating DNN training in the resource-constrained setting. In this paper, we explore its distributed design, namely DHO$_2$, including distributed calculation of curvature information and model update with partial curvature information to accelerate DNN training with a low memory burden. To further reduce the training time, we design a novel strategy to parallelize the calculation of curvature information and the model update on different devices. Experimentally, our distributed design can achieve an approximately linear reduction of memory burden on each device as the device number increases. Meanwhile, it achieves $1.4\times\sim2.1\times$ speedup in the total training time compared with other distributed designs based on conventional first- and second-order optimizers.
Authors: Xunkai Li, Zhengyu Wu, Kaichi Yu, Hongchao Qin, Guang Zeng, Rong-Hua Li, Guoren Wang
Abstract: The directed graph (digraph), as a generalization of undirected graphs, exhibits superior representation capability in modeling complex topology systems and has garnered considerable attention in recent years. Despite the notable efforts made by existing DiGraph Neural Networks (DiGNNs) to leverage directed edges, they still fail to comprehensively delve into the abundant data knowledge concealed in the digraphs. This data-level limitation results in model-level sub-optimal predictive performance and underscores the necessity of further exploring the potential correlations between the directed edges (topology) and node profiles (feature and labels) from a data-centric perspective, thereby empowering model-centric neural networks with stronger encoding capabilities. In this paper, we propose \textbf{E}ntropy-driven \textbf{D}igraph knowl\textbf{E}dge distillatio\textbf{N} (EDEN), which can serve as a data-centric digraph learning paradigm or a model-agnostic plug-and-play data-centric Knowledge Distillation (KD) module. The core idea is to achieve data-centric ML, guided by our proposed hierarchical encoding theory for structured data. Specifically, EDEN first utilizes directed structural measurements from a topology perspective to construct a coarse-grained Hierarchical Knowledge Tree (HKT). Subsequently, EDEN quantifies the mutual information of node profiles to refine knowledge flow in the HKT, enabling data-centric KD supervision within model training. As a general framework, EDEN can also naturally extend to undirected scenarios and demonstrate satisfactory performance. In our experiments, EDEN has been widely evaluated on 14 (di)graph datasets (homophily and heterophily) and across 4 downstream tasks. The results demonstrate that EDEN attains SOTA performance and exhibits strong improvements for prevalent (Di)GNNs.
Authors: Xiao Ma, Young D. Kwon, Dong Ma
Abstract: Continual Test-time adaptation (CTTA) continuously adapts the deployed model on every incoming batch of data. While achieving optimal accuracy, existing CTTA approaches present poor real-world applicability on resource-constrained edge devices, due to the substantial memory overhead and energy consumption. In this work, we first introduce a novel paradigm -- on-demand TTA -- which triggers adaptation only when a significant domain shift is detected. Then, we present OD-TTA, an on-demand TTA framework for accurate and efficient adaptation on edge devices. OD-TTA comprises three innovative techniques: 1) a lightweight domain shift detection mechanism to activate TTA only when it is needed, drastically reducing the overall computation overhead, 2) a source domain selection module that chooses an appropriate source model for adaptation, ensuring high and robust accuracy, 3) a decoupled Batch Normalization (BN) update scheme to enable memory-efficient adaptation with small batch sizes. Extensive experiments show that OD-TTA achieves comparable and even better performance while reducing the energy and computation overhead remarkably, making TTA a practical reality.
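A minimal sketch of what a lightweight shift detector can look like: compare a batch's feature statistics against a BatchNorm layer's source running statistics and trigger adaptation only above a threshold. The symmetric-KL score and threshold are illustrative choices, not necessarily OD-TTA's exact rule:

```python
import torch

def bn_shift_score(bn: torch.nn.BatchNorm2d, feats: torch.Tensor) -> float:
    """feats: (N, C, H, W) activations entering this BN layer."""
    mu_b = feats.mean(dim=(0, 2, 3))
    var_b = feats.var(dim=(0, 2, 3)) + 1e-5
    mu_s, var_s = bn.running_mean, bn.running_var + 1e-5
    # Symmetric KL between per-channel Gaussians, averaged over channels.
    kl = 0.5 * ((var_b / var_s + var_s / var_b)
                + (mu_b - mu_s) ** 2 * (1 / var_s + 1 / var_b) - 2)
    return kl.mean().item()

bn = torch.nn.BatchNorm2d(8)
feats = torch.randn(16, 8, 4, 4) + 2.0   # shifted batch
if bn_shift_score(bn, feats) > 1.0:      # threshold is a tunable guess
    print("significant shift detected: trigger adaptation")
```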
Authors: Ling Tang, Yuefeng Chen, Hui Xue, Quanshi Zhang
Abstract: This paper proposes a new watermarking method to embed the ownership information into a deep neural network (DNN), which is robust to fine-tuning. Specifically, we prove that when the input feature of a convolutional layer only contains low-frequency components, specific frequency components of the convolutional filter will not be changed by gradient descent during the fine-tuning process, where we propose a revised Fourier transform to extract frequency components from the convolutional filter. Additionally, we also prove that these frequency components are equivariant to weight scaling and weight permutations. In this way, we design a watermark module to encode the watermark information to specific frequency components in a convolutional filter. Preliminary experiments demonstrate the effectiveness of our method.
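The basic object of study can be reproduced in a few lines: the frequency components of a convolutional filter under a plain 2D DFT, which are trivially equivariant to weight scaling (the paper's revised Fourier transform and watermark encoding are more involved):

```python
import torch

conv = torch.nn.Conv2d(3, 16, kernel_size=5, bias=False)
spectrum = torch.fft.fft2(conv.weight)   # (16, 3, 5, 5) complex spectrum
low_freq = spectrum[..., 0, 0]           # DC component per filter/channel
print(spectrum.shape, low_freq.shape)

# Equivariance to weight scaling: scaling the filter scales every component.
scaled = torch.fft.fft2(2.0 * conv.weight)
print(torch.allclose(scaled, 2.0 * spectrum))
```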
Authors: Haoyue Bai, Yiyou Sun, Wei Cheng, Haifeng Chen
Abstract: The recent proliferation of photorealistic images created by generative models has sparked both excitement and concern, as these images are increasingly indistinguishable from real ones to the human eye. While offering new creative and commercial possibilities, the potential for misuse, such as in misinformation and fraud, highlights the need for effective detection methods. Current detection approaches often rely on access to model weights or require extensive collections of real image datasets, limiting their scalability and practical application in real-world scenarios. In this work, we introduce a novel black-box detection framework that requires only API access, sidestepping the need for model weights or large auxiliary datasets. Our approach leverages a corrupt-and-recover strategy: by masking part of an image and assessing the model's ability to reconstruct it, we measure the likelihood that the image was generated by the model itself. For black-box models that do not support masked image inputs, we incorporate a cost-efficient surrogate model trained to align with the target model's distribution, enhancing detection capability. Our framework demonstrates strong performance, outperforming baseline methods by 4.31% in mean average precision across eight diffusion model variant datasets.
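A pseudo-sketch of the corrupt-and-recover score; `inpaint_api` stands in for a black-box generator's masked-image endpoint and is hypothetical, so only the scoring logic is meaningful:

```python
import numpy as np

def corrupt_and_recover_score(image: np.ndarray, inpaint_api, mask_frac=0.25):
    """image: (H, W, C) float array in [0, 1]. Lower reconstruction error
    suggests the image lies on the model's own manifold (model-generated)."""
    h, w, _ = image.shape
    mask = np.zeros((h, w), dtype=bool)
    mask[: int(h * np.sqrt(mask_frac)), : int(w * np.sqrt(mask_frac))] = True

    corrupted = image.copy()
    corrupted[mask] = 0.0
    recovered = inpaint_api(corrupted, mask)   # hypothetical API call
    return float(np.mean((recovered[mask] - image[mask]) ** 2))

# A trivial stand-in "model" that inpaints with the unmasked mean.
fake_api = lambda img, m: np.where(m[..., None], img[~m].mean(), img)
img = np.random.default_rng(0).random((64, 64, 3))
print(corrupt_and_recover_score(img, fake_api))
```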
Authors: Xiaojun Zhou
Abstract: In the evolutionary computation community, it is widely believed that stagnation impedes convergence in evolutionary algorithms, and that convergence inherently indicates optimality. However, this perspective is misleading. This study is the first to highlight that the stagnation of an individual can actually facilitate the convergence of the entire population, and that convergence does not necessarily imply optimality, not even local optimality. Convergence alone is insufficient to ensure the effectiveness of evolutionary algorithms. Several counterexamples are provided to illustrate this argument.
Authors: Xuyang Chen, Jingliang Duan, Lin Zhao
Abstract: Actor-critic methods have achieved state-of-the-art performance in various challenging tasks. However, theoretical understandings of their performance remain elusive and challenging. Existing studies mostly focus on practically uncommon variants such as double-loop or two-timescale stepsize actor-critic algorithms for simplicity. These results certify local convergence on finite state- or action-space only. We push the boundary to investigate the classic single-sample single-timescale actor-critic on continuous (infinite) state-action space, where we employ the canonical linear quadratic regulator (LQR) problem as a case study. We show that the popular single-timescale actor-critic can attain an $\epsilon$-optimal solution with $O(\epsilon^{-2})$ sample complexity for solving LQR on the demanding continuous state-action space. Our work provides new insights into the performance of single-timescale actor-critic, which further bridges the gap between theory and practice.
Authors: Zhiwei Hao, Jianyuan Guo, Li Shen, Yong Luo, Han Hu, Guoxia Wang, Dianhai Yu, Yonggang Wen, Dacheng Tao
Abstract: Large language models (LLMs) have achieved impressive performance across various domains. However, the substantial hardware resources required for their training present a significant barrier to efficiency and scalability. To mitigate this challenge, low-precision training techniques have been widely adopted, leading to notable advancements in training efficiency. Despite these gains, low-precision training involves several components, such as weights, activations, and gradients, each of which can be represented in different numerical formats. The resulting diversity has created a fragmented landscape in low-precision training research, making it difficult for researchers to gain a unified overview of the field. This survey provides a comprehensive review of existing low-precision training methods. To systematically organize these approaches, we categorize them into three primary groups based on their underlying numerical formats, a key factor influencing hardware compatibility, computational efficiency, and ease of reference for readers. The categories are: (1) fixed-point and integer-based methods, (2) floating-point-based methods, and (3) customized format-based methods. Additionally, we discuss quantization-aware training approaches, which share key similarities with low-precision training during forward propagation. Finally, we highlight several promising research directions to advance this field. A collection of papers discussed in this survey is provided at https://github.com/Hao840/Awesome-Low-Precision-Training.
URLs: https://github.com/Hao840/Awesome-Low-Precision-Training.
Authors: Nishant Jain, Xunpeng Huang, Yian Ma, Tong Zhang
Abstract: Consistency models have recently emerged as a compelling alternative to traditional SDE-based diffusion models, offering a significant acceleration in generation by producing high quality samples in very few steps. Despite their empirical success, a proper theoretical justification for their speed-up is still lacking. In this work, we provide the analysis which bridges this gap, showing that given a consistency model which can map the input at a given time to arbitrary timestamps along the reverse trajectory, one can achieve KL divergence of order $ O(\varepsilon^2) $ using only $ O\left(\log\left(\frac{d}{\varepsilon}\right)\right) $ iterations with constant step size, where $d$ is the data dimension. Additionally, under minimal assumptions on the data distribution (an increasingly common setting in recent diffusion model analyses), we show that a similar KL convergence guarantee can be obtained, with the number of steps scaling as $ O\left(d \log\left(\frac{d}{\varepsilon}\right)\right) $. Going further, we also provide a theoretical analysis for estimation of such consistency models, concluding that accurate learning is feasible using small discretization steps, in both smooth and non-smooth settings. Notably, our results for the non-smooth case yield best-in-class convergence rates compared to existing SDE- or ODE-based analyses under minimal assumptions.
Authors: Jihong Wang, Xiaochuan Tian, Zhongqiang Zhang, Stewart Silling, Siavash Jafarzadeh, Yue Yu
Abstract: Data-driven methods have emerged as powerful tools for modeling the responses of complex nonlinear materials directly from experimental measurements. Among these methods, the data-driven constitutive models present advantages in physical interpretability and generalizability across different boundary conditions/domain settings. However, the well-posedness of these learned models is generally not guaranteed a priori, which makes the models prone to non-physical solutions in downstream simulation tasks. In this study, we introduce monotone peridynamic neural operator (MPNO), a novel data-driven nonlocal constitutive model learning approach based on neural operators. Our approach learns a nonlocal kernel together with a nonlinear constitutive relation, while ensuring solution uniqueness through a monotone gradient network. This architectural constraint on the gradient induces convexity of the learned energy density function, thereby guaranteeing solution uniqueness of MPNO in small deformation regimes. To validate our approach, we evaluate MPNO's performance on both synthetic and real-world datasets. On synthetic datasets with manufactured kernel and constitutive relation, we show, both theoretically and numerically, that the learned model converges to the ground truth as the measurement grid size decreases. Additionally, our MPNO exhibits superior generalization capabilities compared to conventional neural networks: it yields smaller displacement solution errors in downstream tasks with new and unseen loadings. Finally, we showcase the practical utility of our approach through applications in learning a homogenized model from molecular dynamics data, highlighting its expressivity and robustness in real-world scenarios.
Authors: Edvin Fasth, Sagar Singh
Abstract: Knowledge distillation (KD) has become a powerful tool for training compact student models using larger, pretrained teacher models, often requiring less data and computational resources. Teacher models typically possess more layers and thus exhibit richer feature representations compared to their student counterparts. Furthermore, student models tend to learn simpler, surface-level features in their early layers. This discrepancy can increase errors in groups where labels spuriously correlate with specific input attributes, leading to a decline in group fairness even when overall accuracy remains comparable to the teacher. To mitigate these challenges, Early-Exit Neural Networks (EENNs), which enable predictions at multiple intermediate layers, have been employed. Confidence margins derived from these early exits have been utilized to reweight both cross-entropy and distillation losses on a per-instance basis. In this paper, we propose that leveraging Laplace approximation-based methods to obtain well-calibrated uncertainty estimates can also effectively reweight challenging instances and improve group fairness. We hypothesize that Laplace approximation offers a more robust identification of difficult or ambiguous instances compared to margin-based approaches. To validate our claims, we benchmark our approach using a BERT-based model on the MultiNLI dataset.
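A hedged numpy sketch of instance reweighting via a last-layer Laplace approximation for binary classification: fit a MAP last layer, form the Gaussian posterior from the Hessian, and upweight instances whose predictive uncertainty is high. The reweighting rule at the end is an illustrative choice, not the paper's:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))            # penultimate-layer features
y = (X[:, 0] + 0.5 * rng.normal(size=200) > 0).astype(float)

w = np.zeros(5)
for _ in range(200):                      # MAP fit by gradient descent
    p = 1 / (1 + np.exp(-X @ w))
    w -= 0.01 * (X.T @ (p - y) + 1e-2 * w)

p = 1 / (1 + np.exp(-X @ w))
H = X.T @ (X * (p * (1 - p))[:, None]) + 1e-2 * np.eye(5)  # Hessian at MAP
cov = np.linalg.inv(H)                    # Laplace posterior covariance

# Predictive variance of the logit per instance -> per-instance loss weights.
var = np.einsum("ni,ij,nj->n", X, cov, X)
weights = 1 + var / var.mean()            # upweight ambiguous instances
print(weights.min(), weights.max())
```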
Authors: Yiyuan Yang, Guodong Long, Tianyi Zhou, Qinghua Lu, Shanshan Ye, Jing Jiang
Abstract: As foundation models gain prominence, Federated Foundation Models (FedFM) have emerged as a privacy-preserving approach to collaboratively fine-tune models in federated learning (FL) frameworks using distributed datasets across clients. A key challenge for FedFM, given the versatile nature of foundation models, is addressing out-of-distribution (OOD) generalization, where unseen tasks or clients may exhibit distribution shifts leading to suboptimal performance. Although numerous studies have explored OOD generalization in conventional FL, these methods are inadequate for FedFM due to the challenges posed by large parameter scales and increased data heterogeneity. To address these, we propose FedOA, which employs adapter-based parameter-efficient fine-tuning methods for efficacy and introduces personalized adapters with feature distance-based regularization to align distributions and guarantee OOD generalization for each client. Theoretically, we demonstrate that the conventional aggregated global model in FedFM inherently retains OOD generalization capabilities, and our proposed method enhances the personalized model's OOD generalization through regularization informed by the global model, with proven convergence under general non-convex settings. Empirically, the effectiveness of the proposed method is validated on benchmark datasets across various NLP tasks.
Authors: Sungje Park, Stephen Tu
Abstract: Backward stochastic differential equation (BSDE)-based deep learning methods provide an alternative to Physics-Informed Neural Networks (PINNs) for solving high-dimensional partial differential equations (PDEs), offering algorithmic advantages in settings such as stochastic optimal control, where the PDEs of interest are tied to an underlying dynamical system. However, existing BSDE-based solvers have empirically been shown to underperform relative to PINNs in the literature. In this paper, we identify the root cause of this performance gap as a discretization bias introduced by the standard Euler-Maruyama (EM) integration scheme applied to short-horizon self-consistency BSDE losses, which shifts the optimization landscape off target. We find that this bias cannot be satisfactorily addressed through finer step sizes or longer self-consistency horizons. To properly handle this issue, we propose a Stratonovich-based BSDE formulation, which we implement with stochastic Heun integration. We show that our proposed approach completely eliminates the bias issues faced by EM integration. Furthermore, our empirical results show that our Heun-based BSDE method consistently outperforms EM-based variants and achieves competitive results with PINNs across multiple high-dimensional benchmarks. Our findings highlight the critical role of integration schemes in BSDE-based PDE solvers, an algorithmic detail that has received little attention thus far in the literature.
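The distinction at the heart of the paper can be seen in a single integration step: Euler-Maruyama uses the drift once, while stochastic Heun averages the drift at a predictor point using the same Brownian increment. A numpy sketch (the paper embeds Heun inside BSDE self-consistency losses, not shown here):

```python
import numpy as np

def em_step(x, t, drift, sigma, dt, dw):
    """One Euler-Maruyama step with additive noise."""
    return x + drift(x, t) * dt + sigma * dw

def heun_step(x, t, drift, sigma, dt, dw):
    """One stochastic Heun step: predictor, then averaged-drift corrector."""
    x_pred = x + drift(x, t) * dt + sigma * dw          # predictor
    avg = 0.5 * (drift(x, t) + drift(x_pred, t + dt))   # corrected drift
    return x + avg * dt + sigma * dw

drift = lambda x, t: -x           # Ornstein-Uhlenbeck drift
rng = np.random.default_rng(0)
x, dt = 1.0, 0.1
dw = rng.normal(scale=np.sqrt(dt))
print(em_step(x, 0.0, drift, 0.5, dt, dw), heun_step(x, 0.0, drift, 0.5, dt, dw))
```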
Authors: Zuzanna Osika, Roxana Radelescu, Jazmin Zatarain Salazar, Frans Oliehoek, Pradeep K. Murukannaiah
Abstract: Many real-world problems (e.g., resource management, autonomous driving, drug discovery) require optimizing multiple, conflicting objectives. Multi-objective reinforcement learning (MORL) extends classic reinforcement learning to handle multiple objectives simultaneously, yielding a set of policies that capture various trade-offs. However, the MORL field lacks complex, realistic environments and benchmarks. We introduce a water resource (Nile river basin) management case study and model it as a MORL environment. We then benchmark existing MORL algorithms on this task. Our results show that specialized water management methods outperform state-of-the-art MORL approaches, underscoring the scalability challenges MORL algorithms face in real-world scenarios.
Authors: Thalaiyasingam Ajanthan, Sameera Ramasinghe, Yan Zuo, Gil Avraham, Alexander Long
Abstract: Pipeline Parallelism (PP) enables large neural network training on small, interconnected devices by splitting the model into multiple stages. To maximize pipeline utilization, asynchronous optimization is appealing as it offers 100% pipeline utilization by construction. However, it is inherently challenging as the weights and gradients are no longer synchronized, leading to stale (or delayed) gradients. To alleviate this, we introduce a variant of Nesterov Accelerated Gradient (NAG) for asynchronous optimization in PP. Specifically, we modify the look-ahead step in NAG to effectively address the staleness in gradients. We theoretically prove that our approach converges at a sublinear rate in the presence of fixed delay in gradients. Our experiments on large-scale language modelling tasks, using decoder-only architectures with up to 1B parameters, demonstrate that our approach significantly outperforms existing asynchronous methods, even surpassing the synchronous baseline.
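To see the staleness problem such methods must handle, here is a toy numpy sketch applying plain NAG with delayed gradients on a quadratic; it shows vanilla NAG, not the paper's modified look-ahead, and the look-ahead on stale weights is a simplification:

```python
import numpy as np

def train(delay, steps=200, lr=0.02, mu=0.9):
    """Minimize f(w) = w^2 with NAG when the applied gradient is `delay`
    steps stale, as in asynchronous pipeline-parallel training."""
    w, v = np.array([5.0]), np.zeros(1)
    history = [w.copy()]
    grad = lambda u: 2 * u                        # gradient of f(w) = w^2
    for t in range(steps):
        stale_w = history[max(0, t - delay)]      # weights the gradient saw
        g = grad(stale_w + mu * v)                # NAG look-ahead (simplified)
        v = mu * v - lr * g
        w = w + v
        history.append(w.copy())
    return abs(w[0])

print("no delay:", train(0), " delay=8:", train(8))
```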
Authors: Aurelio Raffa Ugolini, Mara Tanelli, Valentina Breschi
Abstract: Many learning problems involve multiple patterns and varying degrees of uncertainty dependent on the covariates. Advances in Deep Learning (DL) have addressed these issues by learning highly nonlinear input-output dependencies. However, model interpretability and Uncertainty Quantification (UQ) have often lagged behind. In this context, we introduce the Competitive/Collaborative Fusion of Experts (CoCoAFusE), a novel, Bayesian Covariates-Dependent Modeling technique. CoCoAFusE builds on the very philosophy behind Mixtures of Experts (MoEs), blending predictions from several simple sub-models (or "experts") to achieve high levels of expressiveness while retaining a substantial degree of local interpretability. Our formulation extends that of a classical Mixture of Experts by contemplating the fusion of the experts' distributions in addition to their more usual mixing (i.e., superimposition). Through this additional feature, CoCoAFusE better accommodates different scenarios for the intermediate behavior between generating mechanisms, resulting in tighter credible bounds on the response variable. Indeed, resorting only to mixing, as in classical MoEs, may lead to multimodality artifacts, especially over smooth transitions. Instead, CoCoAFusE can avoid these artifacts even under the same structure and priors for the experts, leading to greater expressiveness and flexibility in modeling. This new approach is showcased extensively on a suite of motivating numerical examples and a collection of real-data ones, demonstrating its efficacy in tackling complex regression problems where uncertainty is a key quantity of interest.
Authors: Yukun Li, Li-Ping Liu
Abstract: With the advent of score-matching techniques for model training and Langevin dynamics for sample generation, energy-based models (EBMs) have gained renewed interest as generative models. Recent EBMs usually use neural networks to define their energy functions. In this work, we introduce a novel hybrid approach that combines an EBM with an exponential family model to incorporate inductive bias into data modeling. Specifically, we augment the energy term with a parameter-free statistic function to help the model capture key data statistics. Like an exponential family model, the hybrid model aims to align the distribution statistics with data statistics during model training, even when it only approximately maximizes the data likelihood. This property enables us to impose constraints on the hybrid model. Our empirical study validates the hybrid model's ability to match statistics. Furthermore, experimental results show that data fitting and generation improve when suitable informative statistics are incorporated into the hybrid model.
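A schematic of the hybrid construction described above, under illustrative assumptions: the neural energy is tilted by a parameter-free statistic function $T(x)$, giving $E(x) = f_\theta(x) - \eta^\top T(x)$. The choice of per-dimension mean and second moment as statistics, and the learnable natural-parameter vector $\eta$, are placeholders rather than the paper's exact design.

```python
import torch
import torch.nn as nn

class HybridEBM(nn.Module):
    """Neural energy f_theta(x) tilted by exponential-family statistics T(x)."""
    def __init__(self, net: nn.Module, num_stats: int = 2):
        super().__init__()
        self.net = net                                   # neural energy term
        self.eta = nn.Parameter(torch.zeros(num_stats))  # natural parameters

    @staticmethod
    def stats(x):
        # Parameter-free statistic function T(x): mean and second moment
        return torch.cat([x.mean(dim=1, keepdim=True),
                          (x ** 2).mean(dim=1, keepdim=True)], dim=1)

    def energy(self, x):
        return self.net(x).squeeze(-1) - self.stats(x) @ self.eta
```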
Authors: Palok Biswas, Zuzanna Osika, Isidoro Tamassia, Adit Whorra, Jazmin Zatarain-Salazar, Jan Kwakkel, Frans A. Oliehoek, Pradeep K. Murukannaiah
Abstract: Addressing climate change requires coordinated policy efforts of nations worldwide. These efforts are informed by scientific reports, which rely in part on Integrated Assessment Models (IAMs), prominent tools used to assess the economic impacts of climate policies. However, traditional IAMs optimize policies based on a single objective, limiting their ability to capture the trade-offs among economic growth, temperature goals, and climate justice. As a result, policy recommendations have been criticized for perpetuating inequalities, fueling disagreements during policy negotiations. We introduce Justice, the first framework integrating IAM with Multi-Objective Multi-Agent Reinforcement Learning (MOMARL). By incorporating multiple objectives, Justice generates policy recommendations that shed light on equity while balancing climate and economic goals. Further, using multiple agents can provide a realistic representation of the interactions among the diverse policy actors. We identify equitable Pareto-optimal policies using our framework, which facilitates deliberative decision-making by presenting policymakers with the inherent trade-offs in climate and economic policy.
Authors: Marco C. Campi, Algo Car\`e, Luis G. Crespo, Simone Garatti, Federico A. Ramponi
Abstract: Learning models capable of providing reliable predictions in the face of adversarial actions has become a central focus of the machine learning community in recent years. This challenge arises from observing that data encountered at deployment time often deviate from the conditions under which the model was trained. In this paper, we address deployment-time adversarial actions and propose a versatile, well-principled framework to evaluate the model's robustness against attacks of diverse types and intensities. While we initially focus on Support Vector Regression (SVR), the proposed approach extends naturally to the broad domain of learning via relaxed optimization techniques. Our results enable an assessment of the model vulnerability without requiring additional test data and operate in a distribution-free setup. These results not only provide a tool to enhance trust in the model's applicability but also aid in selecting among competing alternatives. Later in the paper, we show that our findings also offer useful insights for establishing new results within the out-of-distribution framework.
Authors: Rogelio A Mancisidor, Robert Jenssen, Shujian Yu, Michael Kampffmeyer
Abstract: Multimodal learning with variational autoencoders (VAEs) requires estimating joint distributions to evaluate the evidence lower bound (ELBO). Current methods, the product and mixture of experts, aggregate single-modality distributions assuming independence for simplicity, which is an overoptimistic assumption. This research introduces a novel methodology for aggregating single-modality distributions by exploiting the principle of consensus of dependent experts (CoDE), which circumvents the aforementioned assumption. Utilizing the CoDE method, we propose a novel ELBO that approximates the joint likelihood of the multimodal data by learning the contribution of each subset of modalities. The resulting CoDE-VAE model demonstrates better performance in terms of balancing the trade-off between generative coherence and generative quality, as well as generating more precise log-likelihood estimations. CoDE-VAE further minimizes the generative quality gap as the number of modalities increases. In certain cases, it reaches a generative quality similar to that of unimodal VAEs, which is a desirable property that is lacking in most current methods. Finally, the classification accuracy achieved by CoDE-VAE is comparable to that of state-of-the-art multimodal VAE models.
Authors: Wenfa Wu, Guanyu Zhang, Zheng Tan, Yi Wang, Hongsheng Qi
Abstract: Most existing single-modal time series models rely solely on numerical series, which suffer from the limitations imposed by insufficient information. Recent studies have revealed that multimodal models can address the core issue by integrating textual information. However, these models focus on either historical or future textual information, overlooking the unique contribution each plays in time series forecasting. Besides, these models fail to grasp the intricate relationships between textual and time series data, constrained by their moderate capacity for multimodal comprehension. To tackle these challenges, we propose Dual-Forecaster, a pioneering multimodal time series model that combines both descriptive historical textual information and predictive textual insights, leveraging advanced multimodal comprehension capability empowered by three well-designed cross-modality alignment techniques. Our comprehensive evaluations on fifteen multimodal time series datasets demonstrate that Dual-Forecaster is a distinctly effective multimodal time series model that outperforms or is comparable to other state-of-the-art models, highlighting the superiority of integrating textual information for time series forecasting. This work opens new avenues in the integration of textual information with numerical time series data for multimodal time series analysis.
Authors: Milad Leyli-Abadi, J\'er\^ome Picault, Antoine Marot, Jean-Patrick Brunet, Agathe Gilain, Amarsagar Reddy Ramapuram Matavalam, Shaban Ghias Satti, Quingbin Jiang, Yang Liu, Dean Justin Ninalga
Abstract: This paper addresses the growing computational challenges of power grid simulations, particularly with the increasing integration of renewable energy sources like wind and solar. As grid operators must analyze significantly more scenarios in near real-time to prevent failures and ensure stability, traditional physics-based simulations become computationally impractical. To tackle this, a competition was organized to develop AI-driven methods that accelerate power flow simulations by at least an order of magnitude while maintaining operational reliability. This competition utilized a regional-scale grid model with a 30\% renewable energy mix, mirroring the anticipated near-future composition of the French power grid. A key contribution of this work is the use of LIPS (Learning Industrial Physical Systems), a benchmarking framework that evaluates solutions based on four critical dimensions: machine learning performance, physical compliance, industrial readiness, and generalization to out-of-distribution scenarios. The paper provides a comprehensive overview of the Machine Learning for Physical Simulation (ML4PhySim) competition, detailing the benchmark suite, analyzing top-performing solutions that outperformed traditional simulation methods, and sharing key organizational insights and best practices for running large-scale AI competitions. Given the promising results achieved, the study aims to inspire further research into more efficient, scalable, and sustainable power network simulation methodologies.
Authors: Massimo Pavan, Claudio Galimberti, Manuel Roveri
Abstract: Tiny Machine Learning (TinyML) algorithms have seen extensive use in recent years, enabling wearable devices to be not only connected but also genuinely intelligent by running machine learning (ML) computations directly on-device. Among such devices, smart glasses have particularly benefited from TinyML advancements. TinyML facilitates the on-device execution of the inference phase of ML algorithms on embedded and wearable devices, and more recently, it has expanded into On-device Learning (ODL), which allows both inference and learning phases to occur directly on the device. The application of ODL techniques to wearable devices is particularly compelling, as it enables the development of more personalized models that adapt based on the data of the user. However, one of the major challenges of ODL algorithms is the scarcity of labeled data collected on-device. In smart wearable contexts, requiring users to manually label large amounts of data is often impractical and could lead to user disengagement with the technology. To address this issue, this paper explores the application of Active Learning (AL) techniques, i.e., techniques that aim to minimize the labeling effort by actively selecting, from a large quantity of unlabeled data, only a small subset to be labeled and added to the training set of the algorithm. In particular, we propose TActiLE, a novel AL algorithm that selects from the stream of on-device sensor data the samples that would help the ML algorithm improve the most once coupled with labels provided by the user. TActiLE is the first Active Learning technique specifically designed for the TinyML context. We evaluate its effectiveness and efficiency through experiments on multiple image classification datasets. The results demonstrate its suitability for tiny and wearable devices.
Authors: Thanh Son Nguyen, Dang Minh Duc Nguyen, Van Thanh Nguyen
Abstract: Accurate time series forecasting is essential in many real-time applications that demand both high predictive accuracy and computational efficiency. This study provides an empirical comparison between a Polynomial Classifier (PC) and a Radial Basis Function Neural Network (RBFNN) across four real-world time series datasets (weather conditions, gold prices, crude oil prices, and beer production volumes) that cover both seasonal and nonseasonal patterns. Model performance is evaluated by forecasting accuracy (using Mean Absolute Error, Root Mean Squared Error, and Coefficient of Variation of Root Mean Squared Error) and computational time to assess each model's viability for real-time forecasting. The results show that the PC yields more accurate and faster forecasts for non-seasonal series, whereas the RBFNN performs better on series with pronounced seasonal patterns. From an interpretability standpoint, the polynomial model offers a simpler, more transparent structure (in contrast to the black-box nature of neural networks), which is advantageous for understanding and trust in real-time decision making. The performance differences between the PC and the RBFNN are statistically significant, as confirmed by paired t-tests and Wilcoxon signed-rank tests. These findings provide practical guidance for model selection in time series forecasting, indicating that the PC may be preferable for quick, interpretable forecasts in non-seasonal contexts, whereas the RBFNN is superior for capturing complex seasonal behaviors.
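As a concrete (toy) instance of the comparison's polynomial side, the sketch below fits a degree-2 polynomial model on lagged values of a synthetic non-seasonal series; the lag count, degree, and data are placeholder choices, not the study's setup.

```python
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression

def make_lagged(series, lags=4):
    # Supervised pairs: previous `lags` values -> next value
    X = np.column_stack([series[i:len(series) - lags + i] for i in range(lags)])
    return X, series[lags:]

series = np.cumsum(np.random.default_rng(0).normal(size=500))  # toy non-seasonal series
X, y = make_lagged(series)
pc = make_pipeline(PolynomialFeatures(degree=2), LinearRegression()).fit(X, y)
print(pc.predict(series[None, -4:]))  # one-step-ahead forecast from the last 4 values
```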
Authors: Zhaoyang Ma, Zhihao Wu, Wang Lu, Xin Gao, Jinghang Yue, Taolin Zhang, Lipo Wang, Youfang Lin, Jing Wang
Abstract: The development of model ensemble attacks has significantly improved the transferability of adversarial examples, but this progress also poses severe threats to the security of deep neural networks. Existing methods, however, face two critical challenges: insufficient capture of shared gradient directions across models and a lack of adaptive weight allocation mechanisms. To address these issues, we propose a novel method, Harmonized Ensemble for Adversarial Transferability (HEAT), which introduces domain generalization into adversarial example generation for the first time. HEAT consists of two key modules: the Consensus Gradient Direction Synthesizer, which uses Singular Value Decomposition to synthesize shared gradient directions, and the Dual-Harmony Weight Orchestrator, which dynamically balances intra-domain coherence (stabilizing gradients within individual models) and inter-domain diversity (enhancing transferability across models). Experimental results demonstrate that HEAT significantly outperforms existing methods across various datasets and settings, offering a new perspective and direction for adversarial attack research.
Authors: Pramook Khungurn, Pratch Piyawongwisal, Sira Sriswadi, Supasorn Suwajanakorn
Abstract: A flow matching model learns a time-dependent vector field $v_t(x)$ that generates a probability path $\{ p_t \}_{0 \leq t \leq 1}$ that interpolates between a well-known noise distribution ($p_0$) and the data distribution ($p_1$). It can be distilled into a \emph{two-timed flow model} (TTFM) $\phi_{s,x}(t)$ that can transform a sample belonging to the distribution at an initial time $s$ to another belonging to the distribution at a terminal time $t$ in one function evaluation. We present a new loss function for TTFM distillation called the \emph{initial/terminal velocity matching} (ITVM) loss that extends the Lagrangian Flow Map Distillation (LFMD) loss proposed by Boffi et al. by adding redundant terms to match the initial velocities at time $s$, removing the derivative from the terminal velocity term at time $t$, and using a version of the model under training, stabilized by exponential moving averaging (EMA), to compute the target terminal average velocity. Preliminary experiments show that our loss leads to better few-step generation performance on multiple types of datasets and model architectures over baselines.
Authors: Najmus Sakib Sizan, Md. Abu Layek, Khondokar Fida Hasan
Abstract: To improve crop forecasting and provide farmers with actionable data-driven insights, we propose a novel approach integrating IoT, machine learning, and blockchain technologies. Using IoT sensor networks, we continuously monitor environmental conditions and soil nutrient levels in real time, significantly improving our understanding of crop growth dynamics. Our study demonstrates the exceptional accuracy of the Random Forest model, achieving a 99.45\% accuracy rate in predicting optimal crop types and yields, thereby offering precise crop projections and customized recommendations. To ensure the security and integrity of the sensor data used for these forecasts, we integrate the Ethereum blockchain, which provides a robust and secure platform. This ensures that the forecasted data remain tamper-proof and reliable. Stakeholders can access real-time and historical crop projections through an intuitive online interface, enhancing transparency and facilitating informed decision-making. By presenting multiple predicted crop scenarios, our system enables farmers to optimize production strategies effectively. This integrated approach promises significant advances in precision agriculture, making crop forecasting more accurate, secure, and user-friendly.
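An illustrative sketch of the forecasting component described: a Random Forest classifier over IoT sensor features. The CSV file and column names are hypothetical placeholders; the 99.45% figure is the paper's reported result, not something this toy snippet reproduces.

```python
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Hypothetical sensor log with e.g. N, P, K, pH, humidity, temperature columns
df = pd.read_csv("sensor_readings.csv")
X, y = df.drop(columns=["crop"]), df["crop"]
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)

clf = RandomForestClassifier(n_estimators=200, random_state=0).fit(X_tr, y_tr)
print(clf.score(X_te, y_te))  # held-out accuracy
```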
Authors: Tsai-Ning Wang, Lin-Lin Chen, Neil Zeghidour, Aaqib Saeed
Abstract: Medical audio signals, such as heart and lung sounds, play a crucial role in clinical diagnosis. However, analyzing these signals remains challenging: traditional methods rely on handcrafted features or supervised deep learning models that demand extensive labeled datasets, limiting their scalability and applicability. To address these issues, we propose CaReAQA, an audio-language model that integrates a foundation audio model with the reasoning capabilities of large language models, enabling clinically relevant, open-ended diagnostic responses. Alongside CaReAQA, we introduce CaReSound, a benchmark dataset of annotated medical audio recordings enriched with metadata and paired question-answer examples, intended to drive progress in diagnostic reasoning research. Evaluation results show that CaReAQA achieves 86.2% accuracy on open-ended diagnostic reasoning tasks, outperforming baseline models. It also generalizes well to closed-ended classification tasks, achieving an average accuracy of 56.9% on unseen datasets. Our findings show how audio-language integration and reasoning advance medical diagnostics, enabling efficient AI systems for clinical decision support.
Authors: Simar Ghumman, Fabio Di Troia, William Andreopoulos, Mark Stamp, Sanjit Rai
Abstract: Unmanned Ground Vehicles (UGVs) are emerging as a crucial tool in the world of precision agriculture. The combination of UGVs with machine learning allows us to find solutions for a range of complex agricultural problems. This research focuses on developing a UGV capable of autonomously traversing agricultural fields and capturing data. The project, known as AGRO (Autonomous Ground Rover Observer), leverages machine learning, computer vision, and other sensor technologies. AGRO uses its capabilities to determine pistachio yields, performing self-localization and real-time environmental mapping while avoiding obstacles. The main objective of this research work is to automate resource-consuming operations so that AGRO can support farmers in making data-driven decisions. Furthermore, AGRO provides a foundation for advanced machine learning techniques as it captures the world around it.
Authors: Akira Tamamori
Abstract: Traditional Hopfield networks, using Hebbian learning, face severe storage capacity limits ($\approx 0.14$ P/N) and spurious attractors. Kernel Logistic Regression (KLR) offers a non-linear approach, mapping patterns to high-dimensional feature spaces for improved separability. Our previous work showed that KLR dramatically improves capacity and noise robustness over conventional methods. This paper quantitatively analyzes the attractor structures in KLR-trained networks via extensive simulations. We evaluated recall from diverse initial states across a wide range of storage loads (up to 4.0 P/N) and noise levels, and quantified convergence rates and speed. Our analysis confirms KLR's superior performance: high capacity (up to 4.0 P/N) and robustness. The attractor landscape is remarkably "clean," with near-zero spurious fixed points. Recall failures under high load/noise are primarily due to convergence to other learned patterns, not spurious ones. Dynamics are exceptionally fast (typically 1-2 steps for high-similarity states). This characterization reveals how KLR reshapes network dynamics for high-capacity associative memory, highlighting its effectiveness and contributing to our understanding of associative memories.
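For readers unfamiliar with KLR-based associative memory, the sketch below shows the basic recipe under simplifying assumptions (RBF kernel, one logistic classifier per neuron, synchronous updates); it is not the paper's exact training or simulation protocol.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics.pairwise import rbf_kernel

rng = np.random.default_rng(0)
P, N = 40, 100  # patterns, neurons (0.4 P/N load)
patterns = rng.choice([-1.0, 1.0], size=(P, N))

# One kernel logistic classifier per neuron, trained autoassociatively
K = rbf_kernel(patterns, patterns, gamma=1.0 / N)
clfs = [LogisticRegression(C=100.0).fit(K, patterns[:, i] > 0) for i in range(N)]

def recall(x, steps=5):
    for _ in range(steps):  # synchronous state updates
        k = rbf_kernel(x[None, :], patterns, gamma=1.0 / N)
        x = np.array([1.0 if c.predict_proba(k)[0, 1] > 0.5 else -1.0
                      for c in clfs])
    return x

noisy = patterns[0] * np.where(rng.random(N) < 0.15, -1.0, 1.0)  # flip 15% of bits
print(np.mean(recall(noisy) == patterns[0]))  # overlap with the stored pattern
```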
Authors: Evelyn Chapuma, Grey Mengezi, Lewis Msasa, Amelia Taylor
Abstract: This paper describes the mwBTFreddy dataset, a resource developed to support flash flood damage assessment in urban Malawi, specifically focusing on the impacts of Cyclone Freddy in 2023. The dataset comprises paired pre- and post-disaster satellite images sourced from Google Earth Pro, accompanied by JSON files containing labelled building annotations with geographic coordinates and damage levels (no damage, minor, major, or destroyed). Developed by the Kuyesera AI Lab at the Malawi University of Business and Applied Sciences, this dataset is intended to facilitate the development of machine learning models tailored to building detection and damage classification in African urban contexts. It also supports flood damage visualisation and spatial analysis to inform decisions on relocation, infrastructure planning, and emergency response in climate-vulnerable regions.
Authors: Elie Saad, Mariem Besbes, Marc Zolghadri, Victor Czmil, Claude Baron, Vincent Bourgeois
Abstract: The challenge of electronic component obsolescence is particularly critical in systems with long life cycles. Various obsolescence management methods are employed to mitigate its impact, with obsolescence forecasting being a highly sought-after and prominent approach. As a result, numerous machine learning-based forecasting methods have been proposed. However, machine learning models require a substantial amount of relevant data to achieve high precision, and such data are in some situations lacking in the current obsolescence landscape. This work introduces a novel framework for obsolescence forecasting based on deep learning. The proposed framework solves the lack of available data through deep generative modeling, where new obsolescence cases are generated and used to augment the training dataset. The augmented dataset is then used to train a classical machine learning-based obsolescence forecasting model. To train classical forecasting models on augmented datasets, existing classical supervised-learning classifiers are adapted for semi-supervised learning within this framework. The proposed framework demonstrates state-of-the-art results on benchmarking datasets.
Authors: Zhaoyan Wang, Xiangchi Song, In-Young Ko
Abstract: Accurate traffic forecasting and swift inference provision are essential for intelligent transportation systems. However, present Graph Convolutional Network (GCN)-based approaches cannot sufficiently extract and fuse multi-granular spatiotemporal features across various spatial and temporal scales, which has been shown to yield less accurate forecasts. Besides, the additional feature extraction branches introduced in prior studies critically increase model complexity and extend inference time, making it challenging to provide fast inference for traffic forecasting. In this paper, we propose MultiGran-STGCNFog, an efficient fog distributed inference system with a novel traffic forecasting model that employs multi-granular spatiotemporal feature fusion on generated dynamic traffic graphs to fully capture interdependent traffic dynamics. The proposed scheduling algorithm GA-DPHDS, which optimizes the layer execution order and the layer-device scheduling scheme simultaneously, contributes to considerable inference throughput improvement by leveraging heterogeneous fog devices in a pipelined manner. Extensive experiments on real-world datasets demonstrate the superiority of the proposed method over selected baselines.
Authors: Hao-Ran Yang, Chuan-Xian Ren
Abstract: While data-driven methods such as neural operators have achieved great success in solving differential equations (DEs), they suffer from domain shift problems caused by different learning environments (with data bias or equation changes), which can be alleviated by transfer learning (TL). However, existing TL methods adopted for DE problems lack either generalizability to general DE problems or physics preservation during training. In this work, we focus on a general transfer learning method that adaptively corrects the domain shift and preserves physical information. Mathematically, we characterize the data domain as a product distribution and the essential problems as distribution bias and operator bias. A Physics-preserved Optimal Tensor Transport (POTT) method, which simultaneously admits generalizability to common DEs and physics preservation of the specific problem, is proposed to adapt the data-driven model to the target domain utilizing the push-forward distribution induced by the POTT map. Extensive experiments demonstrate the superior performance, generalizability, and physics preservation of the proposed POTT method.
Authors: Yajuan Zhang, Jiahai Jiang, Yule Yan, Liang Yang, Ping Zhang
Abstract: Accurate wind power forecasting can help formulate scientific dispatch plans, which is of great significance for maintaining the safety, stability, and efficient operation of the power system. In recent years, wind power forecasting methods based on deep learning have focused on extracting the spatiotemporal correlations among data, achieving significant improvements in forecasting accuracy. However, they exhibit two limitations. First, there is a lack of modeling of the inter-variable relationships, which limits the accuracy of the forecasts. Second, treating endogenous and exogenous variables equally leads to unnecessary interactions between them, increasing the complexity of the model. In this paper, we propose the 2DXformer, which, building upon prior work's focus on spatiotemporal correlations, addresses the aforementioned two limitations. Specifically, we classify the inputs of the model into three types: exogenous static variables, exogenous dynamic variables, and endogenous variables. First, we embed these variables as variable tokens in a channel-independent manner. Then, we use the attention mechanism to capture the correlations among exogenous variables. Finally, we employ a multi-layer perceptron with residual connections to model the impact of exogenous variables on endogenous variables. Experimental results on two real-world large-scale datasets indicate that our proposed 2DXformer can further improve the performance of wind power forecasting. The code is available in this repository: \href{https://github.com/jseaj/2DXformer}{https://github.com/jseaj/2DXformer}.
URLs: https://github.com/jseaj/2DXformer
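A condensed sketch of the variable handling just described, with illustrative sizes (see the linked repository for the actual model): each variable's series becomes one token via a channel-independent embedding, attention runs over the exogenous tokens, and a residual MLP maps the exogenous context onto the endogenous token.

```python
import torch.nn as nn

class ExoToEndoBlock(nn.Module):
    def __init__(self, seq_len=96, dim=128, heads=4):
        super().__init__()
        self.embed = nn.Linear(seq_len, dim)  # channel-independent: one token per variable
        self.exo_attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.mlp = nn.Sequential(nn.Linear(dim, dim), nn.GELU(), nn.Linear(dim, dim))

    def forward(self, endo, exo):
        # endo: (B, L) endogenous series; exo: (B, V, L) exogenous series
        endo_tok = self.embed(endo).unsqueeze(1)               # (B, 1, D)
        exo_tok = self.embed(exo)                              # (B, V, D)
        exo_ctx, _ = self.exo_attn(exo_tok, exo_tok, exo_tok)  # exogenous correlations
        # Residual MLP: exogenous context modulates the endogenous token
        return endo_tok + self.mlp(exo_ctx.mean(dim=1, keepdim=True))
```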
Authors: Mohammed Sumayli, Olugbenga Moses Anubi
Abstract: Home Energy Management Systems (HEMS) have emerged as a pivotal tool in the smart home ecosystem, aiming to enhance energy efficiency, reduce costs, and improve user comfort. By enabling intelligent control and optimization of household energy consumption, HEMS plays a significant role in bridging the gap between consumer needs and energy utility objectives. However, much of the existing literature construes consumer comfort as a mere deviation from the standard appliance settings. Such deviations are typically incorporated into optimization objectives via static weighting factors. These factors often overlook the dynamic nature of consumer behaviors and preferences. Addressing this oversight, our paper introduces a multi-mode Deep Reinforcement Learning-based HEMS (DRL-HEMS) framework, meticulously designed to optimize based on dynamic, consumer-defined preferences. Our primary goal is to augment consumer involvement in Demand Response (DR) programs by embedding dynamic multi-mode preferences tailored to individual appliances. In this study, we leverage a model-free, single-agent DRL algorithm to deliver a HEMS framework that is not only dynamic but also user-friendly. To validate its efficacy, we employed real-world data at 15-minute intervals, including metrics such as electricity price, ambient temperature, and appliances' power consumption. Our results show that the model performs exceptionally well in optimizing energy consumption within different preference modes. Furthermore, when compared to traditional algorithms based on Mixed-Integer Linear Programming (MILP), our model achieves nearly optimal performance while being superior in computational efficiency.
Authors: Vincenzo De Paola, Riccardo Zamboni, Mirco Mutti, Marcello Restelli
Abstract: Parallel data collection has redefined Reinforcement Learning (RL), unlocking unprecedented efficiency and powering breakthroughs in large-scale real-world applications. In this paradigm, $N$ identical agents operate in $N$ replicas of an environment simulator, accelerating data collection by a factor of $N$. A critical question arises: \textit{Does specializing the policies of the parallel agents hold the key to surpass the $N$ factor acceleration?} In this paper, we introduce a novel learning framework that maximizes the entropy of collected data in a parallel setting. Our approach carefully balances the entropy of individual agents with inter-agent diversity, effectively minimizing redundancies. The latter idea is implemented with a centralized policy gradient method, which shows promise when evaluated empirically against systems of identical agents, as well as synergy with batch RL techniques that can exploit data diversity. Finally, we provide an original concentration analysis that shows faster rates for specialized parallel sampling distributions, which supports our methodology and may be of independent interest.
Authors: Marie-Charlotte Brandenburg, Katharina Jochemko
Abstract: We consider binary classification restricted to a class of continuous piecewise linear functions whose decision boundaries are (possibly nonconvex) starshaped polyhedral sets, supported on a fixed polyhedral simplicial fan. We investigate the expressivity of these function classes and describe the combinatorial and geometric structure of the loss landscape, most prominently the sublevel sets, for two loss functions: the 0/1-loss (discrete loss) and an exponential loss function. In particular, we give explicit bounds on the VC dimension of this model, and concretely describe the sublevel sets of the discrete loss as chambers in a hyperplane arrangement. For the exponential loss, we give sufficient conditions for the optimum to be unique, and describe the geometry of the optimum when varying the rate parameter of the underlying exponential probability distribution.
Authors: Leonardo F. Toso, Lintao Ye, James Anderson
Abstract: We study the problem of learning to stabilize (LTS) a linear time-invariant (LTI) system. Policy gradient (PG) methods for control assume access to an initial stabilizing policy. However, designing such a policy for an unknown system is one of the most fundamental problems in control, and it may be as hard as learning the optimal policy itself. Existing work on the LTS problem requires a large amount of data, as its sample complexity scales quadratically with the ambient dimension. We propose a two-phase approach that first learns the left unstable subspace of the system and then solves a series of discounted linear quadratic regulator (LQR) problems on the learned unstable subspace, targeting to stabilize only the system's unstable dynamics and reduce the effective dimension of the control space. We provide non-asymptotic guarantees for both phases and demonstrate that operating on the unstable subspace reduces sample complexity. In particular, when the number of unstable modes is much smaller than the state dimension, our analysis reveals that LTS on the unstable subspace substantially speeds up the stabilization process. Numerical experiments are provided to support this sample complexity reduction achieved by our approach.
Authors: Hwanwoo Kim, Panos Toulis, Eric Laber
Abstract: Temporal Difference (TD) learning is a foundational algorithm in reinforcement learning (RL). For nearly forty years, TD learning has served as a workhorse for applied RL as well as a building block for more complex and specialized algorithms. However, despite its widespread use, it is not without drawbacks, the most prominent being its sensitivity to step size. A poor choice of step size can dramatically inflate the error of value estimates and slow convergence. Consequently, in practice, researchers must use trial and error in order to identify a suitable step size -- a process that can be tedious and time consuming. As an alternative, we propose implicit TD algorithms that reformulate TD updates into fixed-point equations. These updates are more stable and less sensitive to step size without sacrificing computational efficiency. Moreover, our theoretical analysis establishes asymptotic convergence guarantees and finite-time error bounds. Our results demonstrate their robustness and practicality for modern RL tasks, establishing implicit TD as a versatile tool for policy evaluation and value approximation.
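As a minimal illustration of the fixed-point idea, the snippet below gives one common implicit variant of TD(0) with linear function approximation: the bootstrapped prediction at the current state uses the new iterate, which yields a closed-form update that divides the usual TD step by $1 + \alpha \lVert x \rVert^2$. The paper's algorithms may differ in detail.

```python
def implicit_td0_step(theta, x, r, x_next, alpha, gamma=0.99):
    """One implicit TD(0) step for linear values V(s) = theta @ phi(s).

    Solves theta' = theta + alpha * (r + gamma * theta @ x_next
                                     - theta' @ x) * x for theta',
    whose closed form shrinks the effective step for large alpha,
    improving robustness to the step-size choice.
    """
    td_err = r + gamma * theta @ x_next - theta @ x
    return theta + alpha / (1.0 + alpha * x @ x) * td_err * x
```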
Authors: Kola Ayonrinde, Louis Jaburi
Abstract: Mechanistic Interpretability (MI) aims to understand neural networks through causal explanations. Though MI has many explanation-generating methods, progress has been limited by the lack of a universal approach to evaluating explanations. Here we analyse the fundamental question "What makes a good explanation?" We introduce a pluralist Explanatory Virtues Framework drawing on four perspectives from the Philosophy of Science - the Bayesian, Kuhnian, Deutschian, and Nomological - to systematically evaluate and improve explanations in MI. We find that Compact Proofs consider many explanatory virtues and are hence a promising approach. Fruitful research directions implied by our framework include (1) clearly defining explanatory simplicity, (2) focusing on unifying explanations and (3) deriving universal principles for neural networks. Improved MI methods enhance our ability to monitor, predict, and steer AI systems.
Authors: Irene Wang, Newsha Ardalani, Mostafa Elhoushi, Daniel Jiang, Samuel Hsia, Ekin Sumbul, Divya Mahajan, Carole-Jean Wu, Bilge Acun
Abstract: The rapid growth of machine learning (ML) systems necessitates a more comprehensive evaluation of their environmental impact, particularly their carbon footprint, which comprises operational carbon from training and inference execution and embodied carbon from hardware manufacturing and its entire life-cycle. Despite the increasing importance of embodied emissions, there is a lack of tools and frameworks to holistically quantify and optimize the total carbon footprint of ML systems. To address this, we propose CATransformers, a carbon-aware architecture search framework that enables sustainability-driven co-optimization of ML models and hardware architectures. By incorporating both operational and embodied carbon metrics into early design space exploration of domain-specific hardware accelerators, CATransformers demonstrates that optimizing for carbon yields design choices distinct from those optimized solely for latency or energy efficiency. We apply our framework to multi-modal CLIP-based models, producing CarbonCLIP, a family of CLIP models achieving up to 17% reduction in total carbon emissions while maintaining accuracy and latency compared to state-of-the-art edge small CLIP baselines. This work underscores the need for holistic optimization methods to design high-performance, environmentally sustainable AI systems.
Authors: Alessandro Trenta, Andrea Cossu, Davide Bacciu
Abstract: We propose Derivative Learning (DERL), a supervised approach that models physical systems by learning their partial derivatives. We also leverage DERL to build physical models incrementally, by designing a distillation protocol that effectively transfers knowledge from a pre-trained model to a student model. We provide theoretical guarantees that our approach can learn the true physical system, being consistent with the underlying physical laws, even when using empirical derivatives. DERL outperforms state-of-the-art methods in generalizing an ODE to unseen initial conditions and a parametric PDE to unseen parameters. We finally propose a method based on DERL to transfer physical knowledge across models by extending them to new portions of the physical domain and new ranges of PDE parameters. We believe this is the first attempt at building physical models incrementally in multiple stages.
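A minimal sketch of derivative-matching supervision in the spirit described, assuming a scalar-output network and target derivative data (possibly empirical finite differences); all names are placeholders.

```python
import torch

def derl_loss(u_theta, x, du_dx_target):
    """Fit u_theta by matching its input gradients to target derivatives."""
    x = x.requires_grad_(True)
    u = u_theta(x)                      # (B, 1) predicted field values
    du_dx = torch.autograd.grad(u.sum(), x, create_graph=True)[0]  # (B, d)
    return ((du_dx - du_dx_target) ** 2).mean()
```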
Authors: Mohammadhossein Rashidi, Mohammad Modarres
Abstract: Predicting prices with the least error and the highest accuracy has long been one of the most challenging issues and most critical concerns among capital market participants and researchers. A model that can solve this problem and provide highly accurate results is therefore a topic of great interest. In this project, we use time series prediction models such as ARIMA to estimate the price, together with variables and indicators from technical analysis that capture the behavior of traders and the psychological factors involved. By feeding all of these variables into a stepwise regression, we identify the variables that most influence the prediction. Finally, we enter the selected variables as inputs to an artificial neural network. We call this whole prediction process the "ARIMA_Stepwise Regression_Neural Network" model and use it to predict the price of gold in international financial markets. This approach is expected to be applicable to predicting stocks, commodities, currency pairs, financial market indicators, and other instruments traded in local and international financial markets. Moreover, we compare the results of this method with those of pure time series methods. Based on the results, the resulting hybrid model achieves the highest accuracy compared with the time series, regression, and stepwise regression methods.
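An illustrative end-to-end pipeline in the spirit of the hybrid just described: an in-sample ARIMA forecast and simple technical indicators form a candidate feature pool, a forward (stepwise-style) selector over a linear model picks the most informative variables, and a neural network is trained on the selection. Library choices, the indicator set, and the synthetic data are assumptions.

```python
import numpy as np
import pandas as pd
from statsmodels.tsa.arima.model import ARIMA
from sklearn.linear_model import LinearRegression
from sklearn.feature_selection import SequentialFeatureSelector
from sklearn.neural_network import MLPRegressor

prices = pd.Series(np.cumsum(np.random.default_rng(0).normal(size=300)) + 100)
arima_fc = ARIMA(prices, order=(1, 1, 1)).fit().predict(start=1, end=len(prices) - 1)

feats = pd.DataFrame({
    "arima": arima_fc,
    "sma_5": prices.rolling(5).mean(),  # simple technical indicators
    "momentum": prices.diff(5),
}).dropna()
target = prices.shift(-1).loc[feats.index].dropna()  # next-step price
feats = feats.loc[target.index]

# Forward selection as a stand-in for stepwise regression
sel = SequentialFeatureSelector(LinearRegression(), n_features_to_select=2).fit(feats, target)
mlp = MLPRegressor(hidden_layer_sizes=(32,), max_iter=1000, random_state=0)
mlp.fit(feats.loc[:, sel.get_support()], target)
```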
Authors: Rahuul Rangaraj, Jimeng Shi, Azam Shirali, Rajendra Paudel, Yanzhao Wu, Giri Narasimhan
Abstract: The Everglades play a crucial role in flood and drought regulation, water resource planning, and ecosystem management in the surrounding regions. However, traditional physics-based and statistical methods for predicting water levels often face significant challenges, including high computational costs and limited adaptability to diverse or unforeseen conditions. Recent advancements in large time series models have demonstrated the potential to address these limitations, with state-of-the-art deep learning and foundation models achieving remarkable success in time series forecasting across various domains. Despite this progress, their application to critical environmental systems, such as the Everglades, remains underexplored. In this study, we fill the gap by investigating twelve task-specific models and five time series foundation models across six categories for a real-world application focused on water level prediction in the Everglades. Our primary results show that the foundation model, Chronos, significantly outperforms all other models while the remaining foundation models exhibit relatively poor performance. Moreover, the performance of task-specific models varies with the model architectures. Lastly, we discuss the possible reasons for the varying performance of models.
Authors: Mary Phuong, Roland S. Zimmermann, Ziyue Wang, David Lindner, Victoria Krakovna, Sarah Cogan, Allan Dafoe, Lewis Ho, Rohin Shah
Abstract: Recent work has demonstrated the plausibility of frontier AI models scheming -- knowingly and covertly pursuing an objective misaligned with their developers' intentions. Such behavior could be very hard to detect, and if present in future advanced systems, could pose a severe loss-of-control risk. It is therefore important for AI developers to rule out harm from scheming prior to model deployment. In this paper, we present a suite of scheming reasoning evaluations measuring two types of reasoning capabilities that we believe are prerequisites for successful scheming: First, we propose five evaluations of a model's ability to reason about and circumvent oversight (stealth). Second, we present eleven evaluations for measuring a model's ability to instrumentally reason about itself, its environment and its deployment (situational awareness). We demonstrate how these evaluations can be used as part of a scheming inability safety case: a model that does not succeed on these evaluations is almost certainly incapable of causing severe harm via scheming in real deployment. We run our evaluations on current frontier models and find that none of them show concerning levels of either situational awareness or stealth.
Authors: D. Patel, R. Sharma, Y. B. Guo
Abstract: Metal additive manufacturing enables unprecedented design freedom and the production of customized, complex components. However, the rapid melting and solidification dynamics inherent to metal AM processes generate heterogeneous, non-equilibrium microstructures that significantly impact mechanical properties and subsequent functionality. Predicting microstructure and its evolution across spatial and temporal scales remains a central challenge for process optimization and defect mitigation. While conventional experimental techniques and physics-based simulations provide a physical foundation and valuable insights, they face critical limitations. In contrast, data-driven machine learning offers an alternative prediction approach and powerful pattern recognition, but it often operates as a black box, lacking generalizability and physical consistency. To overcome these limitations, physics-informed machine learning (PIML), including physics-informed neural networks, has emerged as a promising paradigm by embedding governing physical laws into neural network architectures, thereby enhancing accuracy, transparency, data efficiency, and extrapolation capabilities. This work presents a comprehensive evaluation of modeling strategies for microstructure prediction in metal AM. The strengths and limitations of experimental, computational, and data-driven methods are analyzed in depth, and recent advances in hybrid PIML frameworks that integrate physical knowledge with ML are highlighted. Key challenges, such as data scarcity, multi-scale coupling, and uncertainty quantification, are discussed alongside future directions. Ultimately, this assessment underscores the importance of PIML-based hybrid approaches in enabling predictive, scalable, and physically consistent microstructure modeling for site-specific, microstructure-aware process control and the reliable production of high-performance AM components.
Authors: Sabbir M. Saleh, Ibrahim Mohammed Sayem, Nazim Madhavji, John Steinbacher
Abstract: Continuous Integration/Continuous Deployment (CI/CD) is fundamental for advanced software development, supporting faster and more efficient delivery of code changes into cloud environments. However, security issues in the CI/CD pipeline remain challenging, and incidents (e.g., DDoS, bot, and Log4j attacks) keep occurring in cloud environments. While plenty of literature discusses static security testing and CI/CD practices, only a few works deal with network traffic pattern analysis to detect different cyberattacks. This research aims to enhance CI/CD pipeline security by implementing AI-supported anomaly detection. The goal is to identify unusual behaviour or variations from network traffic patterns in pipeline and cloud platforms. The system shall integrate into the workflow to continuously monitor pipeline activities and cloud infrastructure. Additionally, it aims to explore adaptive response mechanisms to mitigate the detected anomalies or security threats. This research employed two popular network traffic datasets, CSE-CIC-IDS2018 and CSE-CIC-IDS2017. We implemented a combination of a Convolutional Neural Network (CNN) and Long Short-Term Memory (LSTM) to detect unusual traffic patterns, achieving accuracies of 98.69% and 98.30% on the two datasets, and generated log files at different CI/CD pipeline stages that reflect the detected network anomalies. This addresses security challenges in modern DevOps practices, contributing to advancing software security and reliability.
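A minimal Keras sketch of the kind of CNN+LSTM detector described: Conv1D layers extract local patterns over per-flow feature vectors and an LSTM models their sequential structure. The input shape, layer sizes, and binary head are illustrative, not the paper's exact architecture.

```python
from tensorflow.keras import layers, models

model = models.Sequential([
    layers.Input(shape=(78, 1)),            # e.g. 78 CIC-IDS flow features
    layers.Conv1D(64, 3, activation="relu"),
    layers.MaxPooling1D(2),
    layers.Conv1D(128, 3, activation="relu"),
    layers.MaxPooling1D(2),
    layers.LSTM(64),
    layers.Dense(1, activation="sigmoid"),  # benign vs. anomalous traffic
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
```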
Authors: Mihai Nadas, Laura Diosan, Andrei Piscoran, Andreea Tomescu
Abstract: Moral stories are a time-tested vehicle for transmitting values, yet modern NLP lacks a large, structured corpus that couples coherent narratives with explicit ethical lessons. We close this gap with TF1-EN-3M, the first open dataset of three million English-language fables generated exclusively by instruction-tuned models no larger than 8B parameters. Each story follows a six-slot scaffold (character -> trait -> setting -> conflict -> resolution -> moral), produced through a combinatorial prompt engine that guarantees genre fidelity while covering a broad thematic space. A hybrid evaluation pipeline blends (i) a GPT-based critic that scores grammar, creativity, moral clarity, and template adherence with (ii) reference-free diversity and readability metrics. Among ten open-weight candidates, an 8B-parameter Llama-3 variant delivers the best quality-speed trade-off, producing high-scoring fables on a single consumer GPU (<24 GB VRAM) at approximately 13.5 cents per 1,000 fables. We release the dataset, generation code, evaluation scripts, and full metadata under a permissive license, enabling exact reproducibility and cost benchmarking. TF1-EN-3M opens avenues for research in instruction following, narrative intelligence, value alignment, and child-friendly educational AI, demonstrating that large-scale moral storytelling no longer requires proprietary giant models.
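A toy version of the six-slot combinatorial prompt engine: Cartesian products over slot values yield distinct, template-faithful prompts. The slot values and template below are tiny placeholders, not the dataset's actual thematic space.

```python
from itertools import product

slots = {
    "character": ["a proud fox", "a patient tortoise"],
    "trait": ["greedy", "kind"],
    "setting": ["a drought-stricken farm", "a busy harbor"],
    "conflict": ["a broken promise", "a contested prize"],
    "resolution": ["an act of honesty", "a clever compromise"],
    "moral": ["greed undoes its owner", "patience is rewarded"],
}

TEMPLATE = ("Write a short fable about {character} who is {trait}, set in "
            "{setting}, facing {conflict}, resolved through {resolution}. "
            "End with the moral: '{moral}'.")

prompts = [TEMPLATE.format(**dict(zip(slots, combo)))
           for combo in product(*slots.values())]
print(len(prompts))  # 2^6 = 64 distinct prompts
```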
Authors: Bithiah Yuan
Abstract: Motivated by the emerging demand in the financial industry for the automatic analysis of unstructured and structured data at scale, Question Answering (QA) systems can provide lucrative and competitive advantages to companies by facilitating the decision making of financial advisers. Consequently, we propose a novel financial QA system using the transformer-based pre-trained BERT language model to address the limitations of data scarcity and language specificity in the financial domain. Our system focuses on financial non-factoid answer selection, which retrieves a set of passage-level texts and selects the most relevant as the answer. To increase efficiency, we formulate the answer selection task as a re-ranking problem, in which our system consists of an Answer Retriever using BM25, a simple information retrieval approach, to first return a list of candidate answers, and an Answer Re-ranker built with variants of pre-trained BERT language models to re-rank and select the most relevant answers. We investigate various learning, further pre-training, and fine-tuning approaches for BERT. Our experiments suggest that FinBERT-QA, a model built from applying the Transfer and Adapt further fine-tuning and pointwise learning approach, is the most effective, improving the state-of-the-art results of task 2 of the FiQA dataset by 16% on MRR, 17% on NDCG, and 21% on Precision@1.
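A hedged sketch of the two-stage retrieve-then-re-rank pipeline described: BM25 returns candidate passages, then a BERT cross-encoder scores each (question, candidate) pair. The library choices (rank_bm25, transformers) and the untuned bert-base-uncased checkpoint are stand-ins, not FinBERT-QA's actual models.

```python
import torch
from rank_bm25 import BM25Okapi
from transformers import AutoTokenizer, AutoModelForSequenceClassification

passages = ["placeholder passage one ...", "placeholder passage two ..."]
bm25 = BM25Okapi([p.split() for p in passages])          # stage 1: retriever

tok = AutoTokenizer.from_pretrained("bert-base-uncased")  # assumed checkpoint
reranker = AutoModelForSequenceClassification.from_pretrained(
    "bert-base-uncased", num_labels=1)                    # stage 2: re-ranker

def answer(question, k=50, top=5):
    cands = bm25.get_top_n(question.split(), passages, n=k)
    enc = tok([question] * len(cands), cands, padding=True,
              truncation=True, return_tensors="pt")
    with torch.no_grad():
        scores = reranker(**enc).logits.squeeze(-1)       # relevance scores
    order = scores.argsort(descending=True)[:top]
    return [cands[int(i)] for i in order]
```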
Authors: Marius-Constantin Dinu
Abstract: This paper presents a novel primality test based on the eigenvalue structure of circulant matrices constructed from roots of unity. We prove that an integer $n > 2$ is prime if and only if the minimal polynomial of the circulant matrix $C_n = W_n + W_n^2$ has exactly two irreducible factors over $\mathbb{Q}$. This characterization connects cyclotomic field theory with matrix algebra, providing both theoretical insights and practical applications. We demonstrate that the eigenvalue patterns of these matrices reveal fundamental distinctions between prime and composite numbers, leading to a deterministic primality test. Our approach leverages the relationship between primitive roots of unity, Galois theory, and the factorization of cyclotomic polynomials. We provide comprehensive experimental validation across various ranges of integers, discuss practical implementation considerations, and analyze the computational complexity of our method in comparison with established primality tests. The visual interpretation of our mathematical framework provides intuitive understanding of the algebraic structures that distinguish prime numbers. Our experimental validation demonstrates that our approach offers a deterministic alternative to existing methods, with performance characteristics reflecting its algebraic foundations.
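The stated characterization is easy to probe numerically. The sketch below builds the integer circulant matrix $C_n = W_n + W_n^2$ from the cyclic-shift matrix $W_n$ (our reading of the construction) and counts the distinct irreducible factors of its characteristic polynomial over $\mathbb{Q}$; since $C_n$ is diagonalizable, these are exactly the irreducible factors of its minimal polynomial.

```python
import sympy as sp

def circulant_prime_test(n: int) -> bool:
    """n > 2 is reported prime iff the minimal polynomial of C_n = W_n + W_n^2
    has exactly two irreducible factors over Q (per the stated theorem)."""
    W = sp.zeros(n, n)
    for i in range(n):
        W[i, (i + 1) % n] = 1              # cyclic-shift (circulant) matrix W_n
    C = W + W * W                          # integer circulant C_n
    x = sp.symbols("x")
    charpoly = C.charpoly(x).as_expr()
    _, factors = sp.factor_list(charpoly)  # distinct irreducible factors over Q
    return len(factors) == 2

for n in range(3, 20):
    print(n, circulant_prime_test(n), sp.isprime(n))
```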
Authors: Yuxi Li, Lu Si, Yujie Hou, Chengaung Liu, Bin Li, Hongjian Fang, Jun Zhang
Abstract: Advancements in remote sensing (RS) imagery have provided high-resolution detail and vast coverage, yet existing methods, such as image-level captioning/retrieval and object-level detection/segmentation, often fail to capture mid-scale semantic entities essential for interpreting large-scale scenes. To address this, we propose the conteXtual referring Map (XeMap) task, which focuses on contextual, fine-grained localization of text-referred regions in large-scale RS scenes. Unlike traditional approaches, XeMap enables precise mapping of mid-scale semantic entities that are often overlooked in image-level or object-level methods. To achieve this, we introduce XeMap-Network, a novel architecture designed to handle the complexities of pixel-level cross-modal contextual referring mapping in RS. The network includes a fusion layer that applies self- and cross-attention mechanisms to enhance the interaction between text and image embeddings. Furthermore, we propose a Hierarchical Multi-Scale Semantic Alignment (HMSA) module that aligns multiscale visual features with the text semantic vector, enabling precise multimodal matching across large-scale RS imagery. To support the XeMap task, we provide a novel annotated dataset, XeMap-set, specifically tailored for this task, overcoming the lack of XeMap datasets in RS imagery. XeMap-Network is evaluated in a zero-shot setting against state-of-the-art methods, demonstrating superior performance. This highlights its effectiveness in accurately mapping referring regions and providing valuable insights for interpreting large-scale RS environments.
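A compact sketch of the kind of fusion layer described, with self-attention over image tokens followed by cross-attention from image tokens to text embeddings; dimensions, normalization placement, and the overall wiring are illustrative assumptions rather than XeMap-Network's exact design.

```python
import torch.nn as nn

class FusionLayer(nn.Module):
    def __init__(self, dim=256, heads=8):
        super().__init__()
        self.self_attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.cross_attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm1, self.norm2 = nn.LayerNorm(dim), nn.LayerNorm(dim)

    def forward(self, img_tokens, txt_tokens):
        # Self-attention refines image-token interactions
        h, _ = self.self_attn(img_tokens, img_tokens, img_tokens)
        img_tokens = self.norm1(img_tokens + h)
        # Cross-attention: image tokens query the text embeddings
        h, _ = self.cross_attn(img_tokens, txt_tokens, txt_tokens)
        return self.norm2(img_tokens + h)
```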
Authors: Srinivas Kanakala, Sneha Ningappa
Abstract: Plant diseases pose a serious challenge to agriculture by reducing crop yield and affecting food quality. Early detection and classification of these diseases are essential for minimising losses and improving crop management practices. This study applies Convolutional Neural Networks (CNN) and Long Short-Term Memory (LSTM) models to classify plant leaf diseases using a dataset containing 70,295 training images and 17,572 validation images across 38 disease classes. The CNN model was trained using the Adam optimiser with a learning rate of 0.0001 and categorical cross-entropy as the loss function. After 10 training epochs, the model achieved a training accuracy of 99.1% and a validation accuracy of 96.4%. The LSTM model reached a validation accuracy of 93.43%. Performance was evaluated using precision, recall, F1-score, and confusion matrix, confirming the reliability of the CNN-based approach. The results suggest that deep learning models, particularly CNNs, provide an effective solution for accurate and scalable plant disease classification, supporting practical applications in agricultural monitoring.
Authors: Maozhe Zhao, Shengzhong Liu, Fan Wu, Guihai Chen
Abstract: Mobile video analysis systems often encounter various deploying environments, where environment shifts present greater demands for responsiveness in adaptations of deployed "expert DNN models". Existing model adaptation frameworks primarily operate in a cloud-centric way, exhibiting degraded performance during adaptation and delayed reactions to environment shifts. Instead, this paper proposes MOCHA, a novel framework optimizing the responsiveness of continuous model adaptation through hierarchical collaborations between mobile and cloud resources. Specifically, MOCHA (1) reduces adaptation response delays by performing on-device model reuse and fast fine-tuning before requesting cloud model retrieval and end-to-end retraining; (2) accelerates history expert model retrieval by organizing them into a structured taxonomy utilizing domain semantics analyzed by a cloud foundation model as indices; (3) enables efficient local model reuse by maintaining onboard expert model caches for frequent scenes, which proactively prefetch model weights from the cloud model database. Extensive evaluations with real-world videos on three DNN tasks show that MOCHA improves model accuracy during adaptation by up to 6.8% while reducing response delay and retraining time by up to 35.5x and 3.0x, respectively.
Authors: Henry Peng Zou, Wei-Chieh Huang, Yaozu Wu, Yankai Chen, Chunyu Miao, Hoang Nguyen, Yue Zhou, Weizhi Zhang, Liancheng Fang, Langzhou He, Yangning Li, Yuwei Cao, Dongyuan Li, Renhe Jiang, Philip S. Yu
Abstract: Recent advances in large language models (LLMs) have sparked growing interest in building fully autonomous agents. However, fully autonomous LLM-based agents still face significant challenges, including limited reliability due to hallucinations, difficulty in handling complex tasks, and substantial safety and ethical risks, all of which limit their feasibility and trustworthiness in real-world applications. To overcome these limitations, LLM-based human-agent systems (LLM-HAS) incorporate human-provided information, feedback, or control into the agent system to enhance system performance, reliability and safety. This paper provides the first comprehensive and structured survey of LLM-HAS. It clarifies fundamental concepts, systematically presents core components shaping these systems, including environment & profiling, human feedback, interaction types, orchestration and communication, explores emerging applications, and discusses unique challenges and opportunities. By consolidating current knowledge and offering a structured overview, we aim to foster further research and innovation in this rapidly evolving interdisciplinary field. Paper lists and resources are available at https://github.com/HenryPengZou/Awesome-LLM-Based-Human-Agent-System-Papers.
URLs: https://github.com/HenryPengZou/Awesome-LLM-Based-Human-Agent-System-Papers
Authors: Sung Hak Lim, Kohei Hayashi, Shun'ichi Horigome, Shigeki Matsumoto, Mihoko M. Nojiri
Abstract: The kinematics of stars in dwarf spheroidal galaxies have been studied to understand the structure of dark matter halos. However, the kinematic information of these stars is often limited to celestial positions and line-of-sight velocities, making full phase space analysis challenging. Conventional methods rely on projected analytic phase space density models with several parameters and infer dark matter halo structures by solving the spherical Jeans equation. In this paper, we introduce an unsupervised machine learning method for solving the spherical Jeans equation in a model-independent way as a first step toward model-independent analysis of dwarf spheroidal galaxies. Using equivariant continuous normalizing flows, we demonstrate that spherically symmetric stellar phase space densities and velocity dispersions can be estimated without model assumptions. As a proof of concept, we apply our method to Gaia challenge datasets for spherical models and measure dark matter mass densities given velocity anisotropy profiles. Our method can identify halo structures accurately, even with a small number of tracer stars.
Authors: Junwon Seo, Kensuke Nakamura, Andrea Bajcsy
Abstract: Recent advances in generative world models have enabled classical safe control methods, such as Hamilton-Jacobi (HJ) reachability, to generalize to complex robotic systems operating directly from high-dimensional sensor observations. However, obtaining comprehensive coverage of all safety-critical scenarios during world model training is extremely challenging. As a result, latent safety filters built on top of these models may miss novel hazards and even fail to prevent known ones, overconfidently misclassifying risky out-of-distribution (OOD) situations as safe. To address this, we introduce an uncertainty-aware latent safety filter that proactively steers robots away from both known and unseen failures. Our key idea is to use the world model's epistemic uncertainty as a proxy for identifying unseen potential hazards. We propose a principled method to detect OOD world model predictions by calibrating an uncertainty threshold via conformal prediction. By performing reachability analysis in an augmented state space, spanning both the latent representation and the epistemic uncertainty, we synthesize a latent safety filter that can reliably safeguard arbitrary policies from both known and unseen safety hazards. In simulation and hardware experiments on vision-based control tasks with a Franka manipulator, we show that our uncertainty-aware safety filter preemptively detects potential unsafe scenarios and reliably proposes safe, in-distribution actions. Video results can be found on the project website at https://cmu-intentlab.github.io/UNISafe
Authors: Max M. Chumley, Firas A. Khasawneh
Abstract: Nonlinear dynamical systems are complex, and typically only simple systems can be studied analytically. In applications, these systems are usually defined with a set of tunable parameters, and as the parameters are varied, the system response undergoes significant topological changes or bifurcations. In a high-dimensional parameter space, it is difficult to determine which direction to vary the system parameters to achieve a desired system response or state. In this paper, we introduce a new approach for optimally navigating a dynamical system parameter space that is rooted in topological data analysis. Specifically, we use the differentiability of persistence diagrams to define a topological language for intuitively promoting or deterring different topological features in the state space response of a dynamical system, and use gradient descent to optimally move from one point in the parameter space to another. The end result is a path in this space that guides the system to a set of parameters yielding the desired topological features defined by the loss function. We show a number of examples by applying the methods to different dynamical systems and scenarios, demonstrating how to promote different features and how to choose the hyperparameters to achieve different outcomes.
Authors: Vasiliki Papanikou, Danae Pla Karidi, Evaggelia Pitoura, Emmanouil Panagiotou, Eirini Ntoutsi
Abstract: As Artificial Intelligence (AI) is increasingly used in areas that significantly impact human lives, concerns about fairness and transparency have grown, especially regarding their impact on protected groups. Recently, the intersection of explainability and fairness has emerged as an important area for promoting responsible AI systems. This paper explores how explainability methods can be leveraged to detect and interpret unfairness. We propose a pipeline that integrates local post-hoc explanation methods to derive fairness-related insights. During the pipeline design, we identify and address critical questions arising from the use of explanations as bias detectors, such as the relationship between distributive and procedural fairness, the effect of removing the protected attribute, the consistency and quality of results across different explanation methods, the impact of various aggregation strategies of local explanations on group fairness evaluations, and the overall trustworthiness of explanations as bias detectors. Our results show the potential of explanation methods used for fairness while highlighting the need to carefully consider the aforementioned critical aspects.
Authors: Santiago del Rey, Paulo S\'ergio Medeiros dos Santos, Guilherme Horta Travassos, Xavier Franch, Silverio Mart\'inez-Fern\'andez
Abstract: Background: As empirical software engineering evolves, more studies adopt data strategies -- approaches that investigate digital artifacts such as models, source code, or system logs rather than relying on human subjects. Synthesizing results from such studies introduces new methodological challenges. Aims: This study assesses the effects of model quantization on correctness and resource efficiency in deep learning (DL) systems. Additionally, it explores the methodological implications of aggregating evidence from empirical studies that adopt data strategies. Method: We conducted a research synthesis of six primary studies that empirically evaluate model quantization. We applied the Structured Synthesis Method (SSM) to aggregate the findings, which combines qualitative and quantitative evidence through diagrammatic modeling. A total of 19 evidence models were extracted and aggregated. Results: The aggregated evidence indicates that model quantization weakly negatively affects correctness metrics while consistently improving resource efficiency metrics, including storage size, inference latency, and GPU energy consumption -- a manageable trade-off for many DL deployment contexts. Evidence across quantization techniques remains fragmented, underscoring the need for more focused empirical studies per technique. Conclusions: Model quantization offers substantial efficiency benefits with minor trade-offs in correctness, making it a suitable optimization strategy for resource-constrained environments. This study also demonstrates the feasibility of using SSM to synthesize findings from data strategy-based research.
Authors: Yao Song, Kelly Speth, Amy Kilbourne, Andrew Quanbeck, Daniel Almirall, Lu Wang
Abstract: A clustered adaptive intervention (cAI) is a pre-specified sequence of decision rules that guides practitioners on how best -- and based on which measures -- to tailor cluster-level intervention to improve outcomes at the level of individuals within the clusters. A clustered sequential multiple assignment randomized trial (cSMART) is a type of trial that is used to inform the empirical development of a cAI. The most common type of secondary aim in a cSMART focuses on assessing causal effect moderation by candidate tailoring variables. We introduce a clustered Q-learning framework with the M-out-of-N Cluster Bootstrap using data from a cSMART to evaluate whether a set of candidate tailoring variables may be useful in defining an optimal cAI. This approach can construct confidence intervals (CIs) with near-nominal coverage to assess parameters indexing the causal effect moderation function. Specifically, it allows reliable inferences concerning the utility of candidate tailoring variables in constructing a cAI that maximizes a mean end-of-study outcome, even when "non-regularity", a well-known challenge, exists. Simulations demonstrate the numerical performance of the proposed method across varying non-regularity conditions and investigate the impact of varying numbers of clusters and intra-cluster correlation coefficients on CI coverage. Methods are applied to the ADEPT dataset to inform the construction of a clinic-level cAI for improving evidence-based practice in treating mood disorders.
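For readers unfamiliar with the M-out-of-N cluster bootstrap, the following Python sketch shows the generic resampling scheme with percentile intervals on a toy cluster mean; the paper's Q-learning estimand and calibration details are not reproduced:

```python
import numpy as np

def m_out_of_n_cluster_bootstrap(clusters, stat_fn, m, n_boot=2000,
                                 alpha=0.05, seed=0):
    """Percentile CI from an m-out-of-n cluster bootstrap.

    clusters: list of per-cluster data arrays (the resampling unit is the
    whole cluster); stat_fn maps a list of clusters to a scalar estimate;
    m < len(clusters) is the subsample size that restores valid inference
    under non-regularity.
    """
    rng = np.random.default_rng(seed)
    n = len(clusters)
    stats = []
    for _ in range(n_boot):
        idx = rng.integers(0, n, size=m)   # draw m clusters with replacement
        stats.append(stat_fn([clusters[i] for i in idx]))
    return np.percentile(stats, [100 * alpha / 2, 100 * (1 - alpha / 2)])

# Toy usage: CI for a grand mean over 40 clusters of varying size.
rng = np.random.default_rng(1)
data = [rng.normal(0.5, 1.0, size=rng.integers(5, 15)) for _ in range(40)]
print(m_out_of_n_cluster_bootstrap(data, lambda cs: np.mean(np.concatenate(cs)), m=25))
```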
Authors: Nathan Huet, Philippe Naveau, Anne Sabourin
Abstract: Appropriate modelling of extreme skew surges is crucial, particularly for coastal risk management. Our study focuses on modelling extreme skew surges along the French Atlantic coast, with a particular emphasis on investigating the extremal dependence structure between stations. We employ the peak-over-threshold framework, where a multivariate extreme event is defined whenever at least one location records a large value, though not necessarily all stations simultaneously. A novel method for determining an appropriate level (threshold) above which observations can be classified as extreme is proposed. Two complementary approaches are explored. First, the multivariate generalized Pareto distribution is employed to model extremes, leveraging its properties to derive a generative model that predicts extreme skew surges at one station based on observed extremes at nearby stations. Second, a novel extreme regression framework is assessed for point predictions. This specific regression framework enables accurate point predictions using only the "angle" of input variables, i.e. input variables divided by their norms. The ultimate objective is to reconstruct historical skew surge time series at stations with limited data. This is achieved by integrating extreme skew surge data from stations with longer records, such as Brest and Saint-Nazaire, which provide over 150 years of observations.
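As context for the peaks-over-threshold framework, here is a minimal univariate sketch using scipy's generalized Pareto fit; the threshold rule, the simulated skew-surge data, and the assumed record length are placeholders, and the paper's multivariate GPD model and threshold-selection method are not reproduced:

```python
import numpy as np
from scipy.stats import genpareto

rng = np.random.default_rng(0)
surge = rng.gumbel(0.0, 0.3, size=20000)     # placeholder skew-surge series
u = np.quantile(surge, 0.98)                 # simple high-quantile threshold
exc = surge[surge > u] - u                   # peaks over threshold

# Fit a generalized Pareto distribution to the exceedances (location fixed at 0).
xi, _, beta = genpareto.fit(exc, floc=0)

# Return level exceeded on average once per 100 years, assuming the series
# spans 50 years (so the exceedance rate is len(exc)/50 per year).
rate, years = len(exc) / 50.0, 100.0
print(u + genpareto.ppf(1 - 1 / (rate * years), xi, scale=beta))
```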
Authors: Edmilson Roque dos Santos, Erik Bollt
Abstract: Next Generation Reservoir Computing (NGRC) is a low-cost machine learning method for forecasting chaotic time series from data. However, ensuring the dynamical stability of NGRC models during autonomous prediction remains a challenge. In this work, we uncover a key connection between the numerical conditioning of the NGRC feature matrix -- formed by polynomial evaluations on time-delay coordinates -- and the long-term NGRC dynamics. Merging tools from numerical linear algebra and ergodic theory of dynamical systems, we systematically study how the feature matrix conditioning varies across hyperparameters. We demonstrate that the NGRC feature matrix tends to be ill-conditioned for short time lags and high-degree polynomials. Ill-conditioning amplifies sensitivity to training data perturbations, which can produce unstable NGRC dynamics. We evaluate the impact of different numerical algorithms (Cholesky, SVD, and LU) for solving the regularized least-squares problem.
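To make the conditioning issue concrete, the following sketch builds an NGRC-style feature matrix (monomials of time-delay coordinates) on a placeholder scalar series and reports its condition number across lags and degrees; all hyperparameter values are illustrative, not the paper's:

```python
import numpy as np
from itertools import combinations_with_replacement

def ngrc_features(x, k, s, degree):
    """NGRC-style feature matrix: monomials of time-delay coordinates.

    x: scalar time series; k: number of delays; s: lag spacing between
    delays; degree: maximum monomial degree.
    """
    T = len(x) - (k - 1) * s
    delays = np.column_stack([x[i * s : i * s + T] for i in range(k)])
    cols = [np.ones(T)]
    for d in range(1, degree + 1):
        for idx in combinations_with_replacement(range(k), d):
            cols.append(np.prod(delays[:, idx], axis=1))
    return np.column_stack(cols)

# Conditioning tends to worsen for short lags and high polynomial degree.
rng = np.random.default_rng(0)
x = np.cumsum(rng.normal(size=2000))         # placeholder trajectory
for s, deg in [(1, 2), (1, 4), (5, 2)]:
    Phi = ngrc_features(x, k=3, s=s, degree=deg)
    print(f"lag={s}, degree={deg}: cond = {np.linalg.cond(Phi):.2e}")
```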
Authors: Sahar Torkhesari, Behnam Yousefimehr, Mehdi Ghatee
Abstract: Driver assistance systems provide a wide range of crucial services, including closely monitoring the condition of vehicles. This paper presents a sensor health monitoring system for the automotive industry. The system processes data collected from various vehicle sensors and compares their outputs within the Electronic Control Unit (ECU) to evaluate the health of each sensor. To uncover the correlations between sensor data, a broad range of machine learning and deep learning methodologies was explored, and the most strongly correlated sensor data were identified. These insights were then used to produce accurate estimates of sensor values. Among the learning methods examined, the combination of autoencoders for detecting sensor failures and random forest regression for estimating sensor values yielded the best results. A statistical model based on the normal distribution was developed to identify possible sensor failures proactively. By comparing the actual values of the sensors with their estimated values based on correlated sensors, faulty sensors can be detected early. When a defective sensor is detected, both the driver and the maintenance department are promptly alerted, and the system replaces the value of the faulty sensor with the estimated value obtained through analysis. This approach was evaluated using data from twenty essential sensors in the ECU of Saipa's Quick vehicle, achieving an accuracy rate of 99\%.
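A minimal sketch of the two-part scheme described above (autoencoder-based fault flagging plus random-forest value estimation), using scikit-learn and synthetic data in place of the Saipa Quick ECU measurements; all model sizes and thresholds are illustrative assumptions:

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(0)
X = rng.normal(size=(5000, 20))                        # 20 sensors, healthy data
X[:, 5] = 0.8 * X[:, 2] + 0.2 * rng.normal(size=5000)  # sensor 5 tracks sensor 2

# (1) Autoencoder: an MLP trained to reproduce its input through a bottleneck;
#     large reconstruction error flags an anomalous snapshot.
ae = MLPRegressor(hidden_layer_sizes=(8,), max_iter=300).fit(X, X)
recon_err = np.mean((ae.predict(X) - X) ** 2, axis=1)
threshold = recon_err.mean() + 3 * recon_err.std()     # normal-distribution cutoff

# (2) Random forest estimates sensor 5 from the other sensors; the estimate
#     substitutes the reading once a fault is flagged.
others = np.delete(np.arange(20), 5)
rf = RandomForestRegressor(n_estimators=100).fit(X[:, others], X[:, 5])

faulty = X[0].copy()
faulty[5] += 10.0                                      # inject a stuck-offset fault
if np.mean((ae.predict(faulty[None]) - faulty) ** 2) > threshold:
    faulty[5] = rf.predict(faulty[None][:, others])[0]
print("replaced value:", faulty[5])
```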
Authors: Tianya Zhao, Ningning Wang, Junqing Zhang, Xuyu Wang
Abstract: While supervised deep neural networks (DNNs) have proven effective for device authentication via radio frequency (RF) fingerprinting, they are hindered by domain shift issues and the scarcity of labeled data. The success of large language models has led to increased interest in unsupervised pre-trained models (PTMs), which offer better generalization and do not require labeled datasets, potentially addressing the issues mentioned above. However, the inherent vulnerabilities of PTMs in RF fingerprinting remain insufficiently explored. In this paper, we thoroughly investigate data-free backdoor attacks on such PTMs in RF fingerprinting, focusing on a practical scenario where attackers lack access to downstream data, label information, and training processes. To realize the backdoor attack, we carefully design a set of triggers and predefined output representations (PORs) for the PTMs. By mapping triggers and PORs through backdoor training, we can implant backdoor behaviors into the PTMs, thereby introducing vulnerabilities across different downstream RF fingerprinting tasks without requiring prior knowledge. Extensive experiments demonstrate the wide applicability of our proposed attack to various input domains, protocols, and PTMs. Furthermore, we explore potential detection and defense methods, demonstrating the difficulty of fully safeguarding against our proposed backdoor attack.
Authors: Tian Bai, Yue Zhao, Xiang Yu, Archer Y. Yang
Abstract: Selecting high-quality candidates from large datasets is critical in applications such as drug discovery, precision medicine, and alignment of large language models (LLMs). While Conformal Selection (CS) provides rigorous uncertainty quantification, it is limited to univariate responses and scalar criteria. To address this issue, we propose Multivariate Conformal Selection (mCS), a generalization of CS designed for multivariate response settings. Our method introduces regional monotonicity and employs multivariate nonconformity scores to construct conformal p-values, enabling finite-sample False Discovery Rate (FDR) control. We present two variants: mCS-dist, using distance-based scores, and mCS-learn, which learns optimal scores via differentiable optimization. Experiments on simulated and real-world datasets demonstrate that mCS significantly improves selection power while maintaining FDR control, establishing it as a robust framework for multivariate selection tasks.
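The conformal p-value plus FDR-control machinery underlying CS-style selection can be sketched generically; the scores and direction convention below are placeholders, not the paper's mCS-dist or mCS-learn scores:

```python
import numpy as np

def conformal_pvalues(cal_scores, test_scores):
    """Conformal p-values by ranking each test score among calibration scores.

    Convention here: a larger score means stronger evidence that the
    candidate meets the criterion (generic scores, not the paper's).
    """
    cal = np.sort(cal_scores)
    # p = (1 + #{calibration scores >= test score}) / (n + 1)
    ge = len(cal) - np.searchsorted(cal, test_scores, side="left")
    return (1 + ge) / (len(cal) + 1)

def benjamini_hochberg(p, q=0.1):
    """Indices selected at FDR level q."""
    order = np.argsort(p)
    thresh = q * np.arange(1, len(p) + 1) / len(p)
    below = np.nonzero(np.sort(p) <= thresh)[0]
    return order[: below[-1] + 1] if below.size else np.array([], dtype=int)

rng = np.random.default_rng(0)
p = conformal_pvalues(rng.normal(size=500), rng.normal(0.8, 1.0, size=50))
print("selected candidates:", benjamini_hochberg(p, q=0.1))
```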
Authors: Shubham Vaishnav, Praveen Kumar Donta, Sindri Magn\'usson
Abstract: The last few decades have witnessed a rapid increase in IoT devices owing to their wide range of applications, such as smart healthcare monitoring systems, smart cities, and environmental monitoring. A critical task in IoT networks is sensing and transmitting information over the network. The IoT nodes gather data by sensing the environment and then transmit this data to a destination node via multi-hop communication, following some routing protocols. These protocols are usually designed to optimize possibly contradictory objectives, such as maximizing packet delivery ratio and energy efficiency. While most literature has focused on optimizing a static objective that remains unchanged, many real-world IoT applications require adapting to rapidly shifting priorities. For example, in monitoring systems, some transmissions are time-critical and require a high priority on low latency, while other transmissions are less urgent and instead prioritize energy efficiency. To meet such dynamic demands, we propose a novel dynamic and distributed routing scheme based on multi-objective Q-learning that can adapt to changes in preferences in real time. Our algorithm builds on ideas from both multi-objective optimization and Q-learning. We also propose a novel greedy interpolation policy scheme to make near-optimal decisions for unexpected preference changes. The proposed scheme can approximate and utilize the Pareto-efficient solutions for dynamic preferences, thus utilizing past knowledge to adapt to unpredictable preferences quickly during runtime. Simulation results show that the proposed scheme outperforms state-of-the-art algorithms for various exploration strategies, preference variation patterns, and important metrics like overall reward, energy efficiency, and packet delivery ratio.
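A minimal sketch (not the paper's algorithm) of multi-objective Q-learning with linear preference scalarization, the general mechanism that lets a routing policy adapt when the preference weights shift; state, action, and objective counts are arbitrary placeholders:

```python
import numpy as np

n_states, n_actions, n_obj = 10, 4, 2        # e.g., objectives: latency, energy
Q = np.zeros((n_states, n_actions, n_obj))   # one value vector per (s, a)
alpha, gamma = 0.1, 0.95
rng = np.random.default_rng(0)

def select_action(s, w, eps=0.1):
    """Epsilon-greedy on the preference-scalarized value Q[s] @ w."""
    if rng.random() < eps:
        return int(rng.integers(n_actions))
    return int(np.argmax(Q[s] @ w))

def update(s, a, r_vec, s_next, w):
    """Vector-valued Q-update; the greedy next action uses current weights w."""
    a_next = int(np.argmax(Q[s_next] @ w))
    Q[s, a] += alpha * (r_vec + gamma * Q[s_next, a_next] - Q[s, a])

# One illustrative transition with preferences favoring low latency.
w = np.array([0.8, 0.2])
update(s=0, a=select_action(0, w), r_vec=np.array([1.0, -0.3]), s_next=1, w=w)
```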
Authors: Akhiad Bercovich, Itay Levy, Izik Golan, Mohammad Dabbah, Ran El-Yaniv, Omri Puny, Ido Galil, Zach Moshe, Tomer Ronen, Najeeb Nabwani, Ido Shahaf, Oren Tropp, Ehud Karpas, Ran Zilberstein, Jiaqi Zeng, Soumye Singhal, Alexander Bukharin, Yian Zhang, Tugrul Konuk, Gerald Shen, Ameya Sunil Mahabaleshwarkar, Bilal Kartal, Yoshi Suhara, Olivier Delalleau, Zijia Chen, Zhilin Wang, David Mosallanezhad, Adi Renduchintala, Haifeng Qian, Dima Rekesh, Fei Jia, Somshubra Majumdar, Vahid Noroozi, Wasi Uddin Ahmad, Sean Narenthiran, Aleksander Ficek, Mehrzad Samadi, Jocelyn Huang, Siddhartha Jain, Igor Gitman, Ivan Moshkov, Wei Du, Shubham Toshniwal, George Armstrong, Branislav Kisacanin, Matvei Novikov, Daria Gitman, Evelina Bakhturina, Jane Polak Scowcroft, John Kamalu, Dan Su, Kezhi Kong, Markus Kliegl, Rabeeh Karimi, Ying Lin, Sanjeev Satheesh, Jupinder Parmar, Pritam Gundecha, Brandon Norick, Joseph Jennings, Shrimai Prabhumoye, Syeda Nahida Akter, Mostofa Patwary, Abhinav Khattar, Deepak Narayanan, Roger Waleffe, Jimmy Zhang, Bor-Yiing Su, Guyue Huang, Terry Kong, Parth Chadha, Sahil Jain, Christine Harvey, Elad Segal, Jining Huang, Sergey Kashirsky, Robert McQueen, Izzy Putterman, George Lam, Arun Venkatesan, Sherry Wu, Vinh Nguyen, Manoj Kilaru, Andrew Wang, Anna Warno, Abhilash Somasamudramath, Sandip Bhaskar, Maka Dong, Nave Assaf, Shahar Mor, Omer Ullman Argov, Scot Junkin, Oleksandr Romanenko, Pedro Larroy, Monika Katariya, Marco Rovinelli, Viji Balas, Nicholas Edelman, Anahita Bhiwandiwalla, Muthu Subramaniam, Smita Ithape, Karthik Ramamoorthy, Yuting Wu, Suguna Varshini Velury, Omri Almog, Joyjit Daw, Denys Fridman, Erick Galinkin, Michael Evans, Katherine Luna, Leon Derczynski, Nikki Pope, Eileen Long, Seth Schneider, Guillermo Siman, Tomasz Grzegorzek, Pablo Ribalta, Monika Katariya, Joey Conway, Trisha Saar, Ann Guan, Krzysztof Pawelec, Shyamala Prayaga, Oleksii Kuchaiev, Boris Ginsburg, Oluwatobi Olabiyi, Kari Briski, Jonathan Cohen, Bryan Catanzaro, Jonah Alben, Yonatan Geifman, Eric Chung
Abstract: We introduce the Llama-Nemotron series of models, an open family of heterogeneous reasoning models that deliver exceptional reasoning capabilities, inference efficiency, and an open license for enterprise use. The family comes in three sizes -- Nano (8B), Super (49B), and Ultra (253B) -- and performs competitively with state-of-the-art reasoning models such as DeepSeek-R1 while offering superior inference throughput and memory efficiency. In this report, we discuss the training procedure for these models, which entails using neural architecture search from Llama 3 models for accelerated inference, knowledge distillation, and continued pretraining, followed by a reasoning-focused post-training stage consisting of two main parts: supervised fine-tuning and large-scale reinforcement learning. Llama-Nemotron models are the first open-source models to support a dynamic reasoning toggle, allowing users to switch between standard chat and reasoning modes during inference. To further support open research and facilitate model development, we provide the following resources: 1. We release the Llama-Nemotron reasoning models -- LN-Nano, LN-Super, and LN-Ultra -- under the commercially permissive NVIDIA Open Model License Agreement. 2. We release the complete post-training dataset: Llama-Nemotron-Post-Training-Dataset. 3. We also release our training codebases: NeMo, NeMo-Aligner, and Megatron-LM.
Authors: Tina Khezresmaeilzadeh, Jiang Zhang, Dimitrios Andreadis, Konstantinos Psounis
Abstract: Large Language Model (LLM)-based recommendation systems leverage powerful language models to generate personalized suggestions by processing user interactions and preferences. Unlike traditional recommendation systems that rely on structured data and collaborative filtering, LLM-based models process textual and contextual information, often using cloud-based infrastructure. This raises privacy concerns, as user data is transmitted to remote servers, increasing the risk of exposure and reducing control over personal information. To address this, we propose a hybrid privacy-preserving recommendation framework which separates sensitive from nonsensitive data and only shares the latter with the cloud to harness LLM-powered recommendations. To restore lost recommendations related to obfuscated sensitive data, we design a de-obfuscation module that reconstructs sensitive recommendations locally. Experiments on real-world e-commerce datasets show that our framework achieves almost the same recommendation utility as a system which shares all data with an LLM, while preserving privacy to a large extent. Compared to obfuscation-only techniques, our approach improves HR@10 scores and category distribution alignment, offering a better balance between privacy and recommendation quality. Furthermore, our method runs efficiently on consumer-grade hardware, making privacy-aware LLM-based recommendation systems practical for real-world use.
Authors: Yuhan Liu, Lin Ning, Neo Wu, Karan Singhal, Philip Andrew Mansfield, Devora Berlowitz, Sushant Prakash, Bradley Green
Abstract: User sequence modeling is crucial for modern large-scale recommendation systems, as it enables the extraction of informative representations of users and items from their historical interactions. These user representations are widely used for a variety of downstream tasks to enhance users' online experience. A key challenge for learning these representations is the lack of labeled training data. While self-supervised learning (SSL) methods have emerged as a promising solution for learning representations from unlabeled data, many existing approaches rely on extensive negative sampling, which can be computationally expensive and may not always be feasible in real-world scenarios. In this work, we propose an adaptation of Barlow Twins, a state-of-the-art SSL method, to user sequence modeling by incorporating suitable augmentation methods. Our approach aims to mitigate the need for large negative sample batches, enabling effective representation learning with smaller batch sizes and limited labeled data. We evaluate our method on the MovieLens-1M, MovieLens-20M, and Yelp datasets, demonstrating that our method consistently outperforms the widely-used dual encoder model across three downstream tasks, achieving an 8%-20% improvement in accuracy. Our findings underscore the effectiveness of our approach in extracting valuable sequence-level information for user modeling, particularly in scenarios where labeled data is scarce and negative examples are limited.
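The Barlow Twins objective itself is compact; here is a standard PyTorch sketch of the loss, with the sequence-encoder outputs stubbed by random tensors (the paper's augmentations and encoder are not reproduced, and the weight lam is the commonly used default, not necessarily the paper's):

```python
import torch

def barlow_twins_loss(z1, z2, lam=5e-3):
    """Barlow Twins objective on embeddings of two augmented views.

    Pushes the cross-correlation matrix of the two views toward the
    identity: invariance on the diagonal, redundancy reduction off it.
    No negative samples are needed, which is why small batches suffice.
    """
    n, d = z1.shape
    z1 = (z1 - z1.mean(0)) / z1.std(0)   # per-dimension standardization
    z2 = (z2 - z2.mean(0)) / z2.std(0)
    c = (z1.T @ z2) / n                  # d x d cross-correlation matrix
    on_diag = (torch.diagonal(c) - 1).pow(2).sum()
    off_diag = (c - torch.diag(torch.diagonal(c))).pow(2).sum()
    return on_diag + lam * off_diag

# z1, z2 stand in for sequence-encoder outputs of two augmentations
# (e.g., masked or cropped interaction histories) of the same users.
z1, z2 = torch.randn(64, 128), torch.randn(64, 128)
print(barlow_twins_loss(z1, z2))
```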
Authors: Shu Tamano, Masanori Nojima
Abstract: Off-policy evaluation (OPE) and off-policy learning (OPL) for contextual bandit policies leverage historical data to evaluate and optimize a target policy. Most existing OPE/OPL methods -- based on importance weighting or imputation -- assume common support between the target and logging policies. When this assumption is violated, these methods typically require unstable extrapolation, truncation, or conservative strategies for individuals outside the common support. However, such approaches can be inadequate in settings where explicit evaluation or optimization for such individuals is required. To address this issue, we propose DOLCE: Decomposing Off-policy evaluation/learning into Lagged and Current Effects, a novel estimator that leverages contextual information from multiple time points to decompose rewards into lagged and current effects. By incorporating both past and present contexts, DOLCE effectively handles individuals who violate the common support assumption. We show that the proposed estimator is unbiased under two assumptions -- local correctness and conditional independence. Our experiments demonstrate that DOLCE achieves substantial improvements in OPE and OPL, particularly as the proportion of individuals outside the common support increases.
Authors: Zhiyu Liao, Kang Chen, Yuanguo Lin, Kangkang Li, Yunxuan Liu, Hefeng Chen, Xingwang Huang, Yuanhui Yu
Abstract: Large Language Models (LLMs) have become central to numerous natural language processing tasks, but their vulnerabilities present significant security and ethical challenges. This systematic survey explores the evolving landscape of attack and defense techniques in LLMs. We classify attacks into adversarial prompt attacks, optimized attacks, model theft, and attacks on applications of LLMs, detailing their mechanisms and implications. We then analyze defense strategies, including prevention-based and detection-based defense methods. Although advances have been made, challenges remain in adapting to the dynamic threat landscape, balancing usability with robustness, and addressing resource constraints in defense implementation. We highlight open problems, including the need for adaptive scalable defenses, explainable security techniques, and standardized evaluation frameworks. This survey provides actionable insights and directions for developing secure and resilient LLMs, emphasizing the importance of interdisciplinary collaboration and ethical considerations to mitigate risks in real-world applications.
Authors: Kilian Tscharke, Maximilian Wendlinger, Sebastian Issel, Pascal Debus
Abstract: Anomaly Detection (AD) is critical in data analysis, particularly within the domain of IT security. In recent years, Machine Learning (ML) algorithms have emerged as a powerful tool for AD in large-scale data. In this study, we explore the potential of quantum ML approaches, specifically quantum kernel methods, for the application to robust AD. We build upon previous work on Quantum Support Vector Regression (QSVR) for semisupervised AD by conducting a comprehensive benchmark on IBM quantum hardware using eleven datasets. Our results demonstrate that QSVR achieves strong classification performance and even outperforms the noiseless simulation on two of these datasets. Moreover, we investigate the influence of quantum noise, which is inevitable in the NISQ era, on the performance of the QSVR. Our findings reveal that the model exhibits robustness to depolarizing, phase damping, phase flip, and bit flip noise, while amplitude damping and miscalibration noise prove to be more disruptive. Finally, we explore the domain of Quantum Adversarial Machine Learning and demonstrate that QSVR is highly vulnerable to adversarial attacks and that noise does not improve the adversarial robustness of the model.
Authors: Zihan Zhou, Muhammad Qasim Elahi, Murat Kocaoglu
Abstract: A fundamental challenge in the empirical sciences involves uncovering causal structure through observation and experimentation. Causal discovery entails linking the conditional independence (CI) invariances in observational data to their corresponding graphical constraints via d-separation. In this paper, we consider a general setting where we have access to data from multiple experimental distributions resulting from hard interventions, as well as potentially from an observational distribution. By comparing different interventional distributions, we propose a set of graphical constraints that are fundamentally linked to Pearl's do-calculus within the framework of hard interventions. These graphical constraints associate each graphical structure with a set of interventional distributions that are consistent with the rules of do-calculus. We characterize the interventional equivalence class of causal graphs with latent variables and introduce a graphical representation that can be used to determine whether two causal graphs are interventionally equivalent, i.e., whether they are associated with the same family of hard interventional distributions, where the elements of the family are indistinguishable using the invariances from do-calculus. We also propose a learning algorithm to integrate multiple datasets from hard interventions, introducing new orientation rules. The learning objective is a tuple of augmented graphs which entails a set of causal graphs. We also prove the soundness of the proposed algorithm.
Authors: Kai Hu, Weichen Yu, Li Zhang, Alexander Robey, Andy Zou, Chengming Xu, Haoqi Hu, Matt Fredrikson
Abstract: Vision Large Language Models (VLLMs) are increasingly deployed to offer advanced capabilities on inputs comprising both text and images. While prior research has shown that adversarial attacks can transfer from open-source to proprietary black-box models in text-only and vision-only contexts, the extent and effectiveness of such vulnerabilities remain underexplored for VLLMs. We present a comprehensive analysis demonstrating that targeted adversarial examples are highly transferable to widely-used proprietary VLLMs such as GPT-4o, Claude, and Gemini. We show that attackers can craft perturbations to induce specific attacker-chosen interpretations of visual information, such as misinterpreting hazardous content as safe, overlooking sensitive or restricted material, or generating detailed incorrect responses aligned with the attacker's intent. Furthermore, we discover that universal perturbations -- modifications applicable to a wide set of images -- can consistently induce these misinterpretations across multiple proprietary VLLMs. Our experimental results on object recognition, visual question answering, and image captioning show that this vulnerability is common across current state-of-the-art models, and underscore an urgent need for robust mitigations to ensure the safe and secure deployment of VLLMs.
Authors: An T. Le, Khai Nguyen, Minh Nhat Vu, Jo\~ao Carvalho, Jan Peters
Abstract: Sampling-based model predictive control (MPC) offers strong performance in nonlinear and contact-rich robotic tasks, yet often suffers from poor exploration due to locally greedy sampling schemes. We propose \emph{Model Tensor Planning} (MTP), a novel sampling-based MPC framework that introduces high-entropy control trajectory generation through structured tensor sampling. By sampling over randomized multipartite graphs and interpolating control trajectories with B-splines and Akima splines, MTP ensures smooth and globally diverse control candidates. We further propose a simple $\beta$-mixing strategy that blends local exploitative and global exploratory samples within the modified Cross-Entropy Method (CEM) update, balancing control refinement and exploration. Theoretically, we show that MTP achieves asymptotic path coverage and maximum entropy in the control trajectory space in the limit of infinite tensor depth and width. Our implementation is fully vectorized using JAX and compatible with MuJoCo XLA, supporting \emph{Just-in-time} (JIT) compilation and batched rollouts for real-time control with online domain randomization. Through experiments on various challenging robotic tasks, ranging from dexterous in-hand manipulation to humanoid locomotion, we demonstrate that MTP outperforms standard MPC and evolutionary strategy baselines in task success and control robustness. Design and sensitivity ablations confirm the effectiveness of MTP tensor sampling structure, spline interpolation choices, and mixing strategy. Altogether, MTP offers a scalable framework for robust exploration in model-based planning and control.
Authors: Hari Chandana Kuchibhotla, Sai Srinivas Kancheti, Abbavaram Gowtham Reddy, Vineeth N Balasubramanian
Abstract: Fine-grained Visual Recognition (FGVR) involves distinguishing between visually similar categories, which is inherently challenging due to subtle inter-class differences and the need for large, expert-annotated datasets. In domains like medical imaging, such curated datasets are unavailable due to issues like privacy concerns and high annotation costs. In such scenarios lacking labeled data, an FGVR model cannot rely on a predefined set of training labels, and hence has an unconstrained output space for predictions. We refer to this task as Vocabulary-Free FGVR (VF-FGVR), where a model must predict labels from an unconstrained output space without prior label information. While recent Multimodal Large Language Models (MLLMs) show potential for VF-FGVR, querying these models for each test input is impractical because of high costs and prohibitive inference times. To address these limitations, we introduce \textbf{Nea}rest-Neighbor Label \textbf{R}efinement (NeaR), a novel approach that fine-tunes a downstream CLIP model using labels generated by an MLLM. Our approach constructs a weakly supervised dataset from a small, unlabeled training set, leveraging MLLMs for label generation. NeaR is designed to handle the noise, stochasticity, and open-endedness inherent in labels generated by MLLMs, and establishes a new benchmark for efficient VF-FGVR.
Authors: Yingjie Qi, Jianlei Yang, Yiou Wang, Yikun Wang, Dayu Wang, Ling Tang, Cenlin Duan, Xiaolin He, Weisheng Zhao
Abstract: Digital Compute-in-Memory (CIM) architectures have shown great promise in Deep Neural Network (DNN) acceleration by effectively addressing the "memory wall" bottleneck. However, the development and optimization of digital CIM accelerators are hindered by the lack of comprehensive tools that encompass both software and hardware design spaces. Moreover, existing design and evaluation frameworks often lack support for the capacity constraints inherent in digital CIM architectures. In this paper, we present CIMFlow, an integrated framework that provides an out-of-the-box workflow for implementing and evaluating DNN workloads on digital CIM architectures. CIMFlow bridges the compilation and simulation infrastructures with a flexible instruction set architecture (ISA) design, and addresses the constraints of digital CIM through advanced partitioning and parallelism strategies in the compilation flow. Our evaluation demonstrates that CIMFlow enables systematic prototyping and optimization of digital CIM architectures across diverse configurations, providing researchers and designers with an accessible platform for extensive design space exploration.
Authors: Riccardo Busetto, Manas Mejari, Marco Forgione, Alberto Bemporad, Dario Piga
Abstract: When gradient-based methods are impractical, black-box optimization (BBO) provides a valuable alternative. However, BBO often struggles with high-dimensional problems and limited trial budgets. In this work, we propose a novel approach based on meta-learning to pre-compute a reduced-dimensional manifold where optimal points lie for a specific class of optimization problems. When optimizing a new problem instance sampled from the class, black-box optimization is carried out in the reduced-dimensional space, effectively reducing the effort required for finding near-optimal solutions.
Authors: S. Kondati Natarajan, J. Schneider, N. Pandey, J. Wellendorff, S. Smidstrup
Abstract: Atomistic modeling of thin-film processes provides an avenue not only for discovering key chemical mechanisms of the processes but also for extracting quantitative metrics on the events and reactions taking place at the gas-surface interface. Molecular dynamics (MD) is a powerful computational method to study the evolution of a process at the atomic scale, but studies of industrially relevant processes usually require suitable force fields, which are in general not available for all processes of interest. However, machine-learned force fields (MLFFs) are transforming the field of computational materials and surface science. In this paper, we demonstrate how to efficiently build MLFFs suitable for process simulations and provide two examples for technologically relevant processes: the precursor pulse in the atomic layer deposition of HfO2 and the atomic layer etching of MoS2.
Authors: Phuoc Pham, Murali Sridharan, Matteo Esposito, Valentina Lenarduzzi
Abstract: In software development, technical debt (TD) refers to suboptimal implementation choices made by the developers to meet urgent deadlines and limited resources, posing challenges for future maintenance. Self-Admitted Technical Debt (SATD) is a sub-type of TD, representing specific TD instances ``openly admitted'' by the developers and often expressed through source code comments. Previous research on SATD has focused predominantly on the Java programming language, revealing a significant gap in cross-language SATD research. Such a narrow focus limits the generalizability of existing findings as well as SATD detection techniques across multiple programming languages. Our work addresses this limitation by introducing CppSATD, a dedicated C++ SATD dataset, comprising over 531,000 annotated comments and their source code contexts. Our dataset can serve as a foundation for future studies that aim to develop SATD detection methods in C++, generalize the existing findings to other languages, or contribute novel insights to cross-language SATD research.
Authors: Francisco Aguilera-Mart\'inez, Fernando Berzal
Abstract: As large language models (LLMs) continue to evolve, it is critical to assess the security threats and vulnerabilities that may arise both during their training phase and after models have been deployed. This survey seeks to define and categorize the various attacks targeting LLMs, distinguishing between those that occur during the training phase and those that affect already trained models. A thorough analysis of these attacks is presented, alongside an exploration of defense mechanisms designed to mitigate such threats. Defenses are classified into two primary categories: prevention-based and detection-based defenses. Furthermore, our survey summarizes possible attacks and their corresponding defense strategies. It also provides an evaluation of the effectiveness of the known defense mechanisms for the different security threats. Our survey aims to offer a structured framework for securing LLMs, while also identifying areas that require further research to improve and strengthen defenses against emerging security challenges.
Authors: Puria Radmard, Paul M. Bays, M\'at\'e Lengyel
Abstract: Human behavioural data in psychophysics have been used to elucidate the underlying mechanisms of many cognitive processes, such as attention, sensorimotor integration, and perceptual decision making. Visual working memory (VWM) has particularly benefited from this approach: analyses of VWM errors have proven crucial for understanding VWM capacity and coding schemes, in turn constraining neural models of both. One poorly understood class of VWM errors is swap errors, whereby participants recall an uncued item from memory. Swap errors could arise from erroneous memory encoding, noisy storage, or errors at retrieval time -- previous research has mostly implicated the latter two. However, these studies made strong a priori assumptions on the detailed mechanisms and/or parametric form of errors contributed by these sources. Here, we pursue a data-driven approach instead, introducing a Bayesian non-parametric mixture model of swap errors (BNS) which provides a flexible descriptive model of swapping behaviour, such that swaps are allowed to depend on both the probed and reported features of every stimulus item. We fit BNS to the trial-by-trial behaviour of human participants and show that it recapitulates the strong dependence of swaps on cue similarity in multiple datasets. Critically, BNS reveals that this dependence coexists with a non-monotonic modulation in the report feature dimension for a random dot motion direction-cued, location-reported dataset. The form of the modulation inferred by BNS opens new questions about the importance of memory encoding in causing swap errors in VWM, a distinct source to the previously suggested binding and cueing errors. Our analyses, combining qualitative comparisons of the highly interpretable BNS parameter structure with rigorous quantitative model comparison and recovery methods, show that previous interpretations of swap errors may have been incomplete.
Authors: Nahshon Mokua Obiri, Kristof Van Laerhoven
Abstract: LoRaWAN technology's extensive coverage positions it as a strong contender for large-scale IoT deployments. However, achieving sub-10 m accuracy in indoor localization remains challenging due to complex environmental conditions, multipath fading, and transient obstructions. This paper proposes a lightweight but robust approach combining adaptive filtering with an extended log-distance, multi-wall path loss and shadowing (PLS) model. Our methodology augments conventional models with critical LoRaWAN parameters (received signal strength indicator (RSSI), frequency, and signal-to-noise ratio (SNR)) and dynamic environmental indicators (temperature, humidity, carbon dioxide, particulate matter, and barometric pressure). An adaptive Kalman filter reduces RSSI fluctuations, isolating persistent trends from momentary noise. Using a six-month dataset of 1,328,334 field measurements, we evaluate three models: the baseline COST 231 multi-wall model (MWM), the baseline model augmented with environmental parameters (MWM-EP), and a forward-only adaptive Kalman-filtered RSSI version of the latter (MWM-EP-KF). Results confirm that the MWM-EP-KF achieves a mean absolute error (MAE) of 5.81 m, outperforming both the MWM-EP (10.56 m) and the baseline MWM framework (17.98 m). Environmental augmentation reduces systematic errors by 41.22%, while Kalman filtering significantly enhances robustness under high RSSI volatility by 42.63%, on average across all devices. These findings present an interpretable, efficient solution for precise indoor LoRaWAN localization in dynamically changing environments.
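Two ingredients of this pipeline, the forward-only Kalman filter on RSSI and the log-distance path loss inversion, can be sketched in a few lines of Python; all parameter values below (noise variances, reference power, path loss exponent) are illustrative placeholders, not the calibrated MWM-EP-KF values:

```python
import numpy as np

def kalman_rssi(rssi, q=0.05, r=4.0):
    """Forward-only scalar Kalman filter on an RSSI stream (dBm).

    q: process noise (drift speed of the underlying signal);
    r: measurement noise variance. Both values are illustrative.
    """
    x, p = float(rssi[0]), 1.0
    out = np.empty(len(rssi))
    for i, z in enumerate(rssi):
        p = p + q                  # predict
        k = p / (p + r)            # Kalman gain
        x = x + k * (z - x)        # correct with measurement z
        p = (1 - k) * p
        out[i] = x
    return out

def rssi_to_distance(rssi, p0=-40.0, n_path=2.7, wall_loss=0.0):
    """Invert a log-distance model: RSSI = p0 - 10*n*log10(d) - wall_loss."""
    return 10 ** ((p0 - wall_loss - rssi) / (10 * n_path))

rng = np.random.default_rng(0)
raw = -70 + rng.normal(0, 3, size=200)     # noisy RSSI at a fixed position
print("distance estimate (m):", rssi_to_distance(kalman_rssi(raw)[-1]))
```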
Authors: M. Saeid HaghighiFard, Sinem Coleri
Abstract: Hierarchical Federated Learning (HFL) has recently emerged as a promising solution for intelligent decision-making in vehicular networks, helping to address challenges such as limited communication resources, high vehicle mobility, and data heterogeneity. However, HFL remains vulnerable to adversarial and unreliable vehicles, whose misleading updates can significantly compromise the integrity and convergence of the global model. To address these challenges, we propose a novel defense framework that integrates dynamic vehicle selection with robust anomaly detection within a cluster-based HFL architecture, specifically designed to counter Gaussian noise and gradient ascent attacks. The framework performs a comprehensive reliability assessment for each vehicle by evaluating historical accuracy, contribution frequency, and anomaly records. Anomaly detection combines Z-score and cosine similarity analyses on model updates to identify both statistical outliers and directional deviations in model updates. To further refine detection, an adaptive thresholding mechanism is incorporated into the cosine similarity metric, dynamically adjusting the threshold based on the historical accuracy of each vehicle to enforce stricter standards for consistently high-performing vehicles. In addition, a weighted gradient averaging mechanism is implemented, which assigns higher weights to gradient updates from more trustworthy vehicles. To defend against coordinated attacks, a cross-cluster consistency check is applied to identify collaborative attacks in which multiple compromised clusters coordinate misleading updates. Together, these mechanisms form a multi-level defense strategy to filter out malicious contributions effectively. Simulation results show that the proposed algorithm significantly reduces convergence time compared to benchmark methods across both 1-hop and 3-hop topologies.
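A stripped-down sketch of the Z-score plus cosine-similarity screening described above, with flat update vectors, a fixed reference direction, and illustrative thresholds; the adaptive, reliability-weighted version from the paper is not reproduced:

```python
import numpy as np

def flag_updates(updates, ref, z_cut=2.5, cos_cuts=None):
    """Z-score + cosine-similarity screening of vehicle model updates.

    updates:  (n_vehicles, dim) flattened gradient updates;
    ref:      reference direction (e.g., the previous global update);
    cos_cuts: per-vehicle cosine thresholds -- an adaptive scheme would
              raise them for historically high-accuracy vehicles.
    """
    norms = np.linalg.norm(updates, axis=1)
    z = (norms - norms.mean()) / (norms.std() + 1e-12)       # magnitude outliers
    cos = updates @ ref / (norms * np.linalg.norm(ref) + 1e-12)
    if cos_cuts is None:
        cos_cuts = np.zeros(len(updates))                    # flat threshold
    return (np.abs(z) > z_cut) | (cos < cos_cuts)            # direction deviation

rng = np.random.default_rng(0)
good = rng.normal(0.1, 0.02, size=(9, 100))
bad = -50 * good[:1]                       # gradient-ascent-style attacker
mask = flag_updates(np.vstack([good, bad]), ref=good.mean(axis=0))
print("flagged vehicles:", np.nonzero(mask)[0])
```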
Authors: Holger Dette, Carina Graw
Abstract: Bootstrap is a common tool for quantifying uncertainty in data analysis. However, besides the additional computational costs of applying the bootstrap to massive data, a challenging problem in bootstrap-based inference under Differential Privacy is that it requires repeated access to the data. As a consequence, bootstrap-based differentially private inference requires a significant increase of the privacy budget, which on the other hand comes with a substantial loss in statistical accuracy. A potential solution to reconcile the conflicting goals of statistical accuracy and privacy is to analyze the data under parametric model assumptions, and in the last decade, several parametric bootstrap methods for inference under privacy have been investigated. However, uncertainty quantification by parametric bootstrap is only valid if the quantities of interest can be identified as the parameters of a statistical model and the imposed model assumptions are (at least approximately) satisfied. An alternative to parametric methods is the empirical bootstrap, which is a widely used tool for non-parametric inference and well studied in the non-private regime. However, under privacy, less insight is available. In this paper, we propose a private empirical $m$ out of $n$ bootstrap and validate its consistency and privacy guarantees under Gaussian Differential Privacy. Compared to the private $n$ out of $n$ bootstrap, our approach has several advantages. First, it comes with lower computational costs, in particular for massive data. Second, the proposed procedure needs less additional noise in the bootstrap iterations, which leads to improved statistical accuracy while asymptotically guaranteeing the same level of privacy. Third, we demonstrate much better finite sample properties compared to the currently available procedures.
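The basic shape of a privatized m out of n bootstrap is easy to sketch. The noise calibration below is a loud simplification (a per-replicate Gaussian scaled to a user-supplied sensitivity, with the accounting over all releases omitted), not the paper's Gaussian-DP analysis:

```python
import numpy as np

def private_m_of_n_bootstrap(x, stat_fn, m, sens, sigma, n_boot=500, seed=0):
    """m-out-of-n bootstrap with per-replicate Gaussian noise.

    sens:  sensitivity of stat_fn on a subsample of size m;
    sigma: noise multiplier, to be chosen from the target privacy level
    (the full accounting over n_boot releases is omitted in this sketch).
    """
    rng = np.random.default_rng(seed)
    reps = []
    for _ in range(n_boot):
        sub = rng.choice(x, size=m, replace=True)
        reps.append(stat_fn(sub) + rng.normal(0.0, sens * sigma))
    return np.percentile(reps, [2.5, 97.5])

# Clipped data bound the sensitivity: a mean of m values in [-1, 1]
# changes by at most 2/m when one record is replaced.
x = np.clip(np.random.default_rng(1).normal(0.3, 1.0, size=5000), -1, 1)
print(private_m_of_n_bootstrap(x, np.mean, m=500, sens=2 / 500, sigma=1.0))
```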
Authors: Mahdi Dhaini, Ege Erdogan, Nils Feldhus, Gjergji Kasneci
Abstract: While research on applications and evaluations of explanation methods continues to expand, fairness of the explanation methods concerning disparities in their performance across subgroups remains an often overlooked aspect. In this paper, we address this gap by showing that, across three tasks and five language models, widely used post-hoc feature attribution methods exhibit significant gender disparity with respect to their faithfulness, robustness, and complexity. These disparities persist even when the models are pre-trained or fine-tuned on particularly unbiased datasets, indicating that the disparities we observe are not merely consequences of biased training data. Our results highlight the importance of addressing disparities in explanations when developing and applying explainability methods, as these can lead to biased outcomes against certain subgroups, with particularly critical implications in high-stakes contexts. Furthermore, our findings underscore the importance of incorporating the fairness of explanations, alongside overall model fairness and explainability, as a requirement in regulatory frameworks.
Authors: Mahdi Dhaini, Kafaite Zahra Hussain, Efstratios Zaradoukas, Gjergji Kasneci
Abstract: As Natural Language Processing (NLP) models continue to evolve and become integral to high-stakes applications, ensuring their interpretability remains a critical challenge. Given the growing variety of explainability methods and diverse stakeholder requirements, frameworks that help stakeholders select appropriate explanations tailored to their specific use cases are increasingly important. To address this need, we introduce EvalxNLP, a Python framework for benchmarking state-of-the-art feature attribution methods for transformer-based NLP models. EvalxNLP integrates eight widely recognized explainability techniques from the Explainable AI (XAI) literature, enabling users to generate and evaluate explanations based on key properties such as faithfulness, plausibility, and complexity. Our framework also provides interactive, LLM-based textual explanations, facilitating user understanding of the generated explanations and evaluation outcomes. Human evaluation results indicate high user satisfaction with EvalxNLP, suggesting it is a promising framework for benchmarking explanation methods across diverse user groups. By offering a user-friendly and extensible platform, EvalxNLP aims at democratizing explainability tools and supporting the systematic comparison and advancement of XAI techniques in NLP.
Authors: Christopher K. I. Williams
Abstract: Humans (and many vertebrates) face the problem of fusing together multiple fixations of a scene in order to obtain a representation of the whole, where each fixation uses a high-resolution fovea and decreasing resolution in the periphery. In this paper we explicitly represent the retinal transformation of a fixation as a linear downsampling of a high-resolution latent image of the scene, exploiting the known geometry. This linear transformation allows us to carry out exact inference for the latent variables in factor analysis (FA) and mixtures of FA models of the scene. Further, this allows us to formulate and solve the choice of "where to look next" as a Bayesian experimental design problem using the Expected Information Gain criterion. Experiments on the Frey faces and MNIST datasets demonstrate the effectiveness of our models.
Authors: Vladimir Somers, Baptiste Standaert, Victor Joos, Alexandre Alahi, Christophe De Vleeschouwer
Abstract: Online multi-object tracking has been recently dominated by tracking-by-detection (TbD) methods, where recent advances rely on increasingly sophisticated heuristics for tracklet representation, feature fusion, and multi-stage matching. The key strength of TbD lies in its modular design, enabling the integration of specialized off-the-shelf models like motion predictors and re-identification. However, the extensive usage of human-crafted rules for temporal associations makes these methods inherently limited in their ability to capture the complex interplay between various tracking cues. In this work, we introduce CAMEL, a novel association module for Context-Aware Multi-Cue ExpLoitation, that learns resilient association strategies directly from data, breaking free from hand-crafted heuristics while maintaining TbD's valuable modularity. At its core, CAMEL employs two transformer-based modules and relies on a novel association-centric training scheme to effectively model the complex interactions between tracked targets and their various association cues. Unlike end-to-end detection-by-tracking approaches, our method remains lightweight and fast to train while being able to leverage external off-the-shelf models. Our proposed online tracking pipeline, CAMELTrack, achieves state-of-the-art performance on multiple tracking benchmarks. Our code is available at https://github.com/TrackingLaboratory/CAMELTrack.
Authors: Tianshu Chu, Dachuan Xu, Wei Yao, Chengming Yu, Jin Zhang
Abstract: Bilevel optimization has recently attracted significant attention in machine learning due to its wide range of applications and advanced hierarchical optimization capabilities. In this paper, we propose a plug-and-play framework, named PnPBO, for developing and analyzing stochastic bilevel optimization methods. This framework integrates both modern unbiased and biased stochastic estimators into the single-loop bilevel optimization framework introduced in [9], with several improvements. In the implementation of PnPBO, all stochastic estimators for different variables can be independently incorporated, and an additional moving average technique is applied when using an unbiased estimator for the upper-level variable. In the theoretical analysis, we provide a unified convergence and complexity analysis for PnPBO, demonstrating that the adaptation of various stochastic estimators (including PAGE, ZeroSARAH, and mixed strategies) within the PnPBO framework achieves optimal sample complexity, comparable to that of single-level optimization. This resolves the open question of whether the optimal complexity bounds for solving bilevel optimization are identical to those for single-level optimization. Finally, we empirically validate our framework, demonstrating its effectiveness on several benchmark problems and confirming our theoretical findings.
Authors: Hooman Danesh, Maruthi Annamaraju, Tim Brepols, Stefanie Reese, Surya R. Kalidindi
Abstract: The capabilities of additive manufacturing have facilitated the design and production of mechanical metamaterials with diverse unit cell geometries. Establishing linkages between the vast design space of unit cells and their effective mechanical properties is critical for the efficient design and performance evaluation of such metamaterials. However, physics-based simulations of metamaterial unit cells across the entire design space are computationally expensive, necessitating a materials informatics framework to efficiently capture complex structure-property relationships. In this work, principal component analysis of 2-point correlation functions is performed to extract the salient features from a large dataset of randomly generated 2D metamaterials. Physics-based simulations are performed using a fast Fourier transform (FFT)-based homogenization approach to efficiently compute the homogenized effective elastic stiffness across the extensive unit cell designs. Subsequently, Gaussian process regression is used to generate reduced-order surrogates, mapping unit cell designs to their homogenized effective elastic constant. It is demonstrated that the adopted workflow enables a high-value low-dimensional representation of the voluminous stochastic metamaterial dataset, facilitating the construction of robust structure-property maps. Finally, an uncertainty-based active learning framework is utilized to train a surrogate model with a significantly smaller number of data points compared to the original full dataset. It is shown that a dataset as small as $0.61\%$ of the entire dataset is sufficient to generate accurate and robust structure-property maps.
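The feature-extraction and surrogate steps can be sketched end to end with numpy and scikit-learn; the microstructures and property values below are random placeholders, not the paper's FFT-homogenized dataset:

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.gaussian_process import GaussianProcessRegressor

def two_point_autocorr(micro):
    """FFT-based 2-point autocorrelation of a 2D microstructure image."""
    F = np.fft.fft2(micro)
    return np.real(np.fft.ifft2(F * np.conj(F))) / micro.size

# Placeholder dataset: random binary unit cells with a stand-in property.
rng = np.random.default_rng(0)
cells = (rng.random((200, 32, 32)) < 0.5).astype(float)
y = cells.mean(axis=(1, 2)) + 0.05 * rng.normal(size=200)   # fake stiffness

stats = np.stack([two_point_autocorr(c).ravel() for c in cells])
z = PCA(n_components=5).fit_transform(stats)   # low-dimensional salient features
gp = GaussianProcessRegressor().fit(z[:150], y[:150])
pred, std = gp.predict(z[150:], return_std=True)
print("held-out RMSE:", np.sqrt(np.mean((pred - y[150:]) ** 2)))
```

The per-point standard deviation returned by the GP is the kind of uncertainty signal an active learning loop, like the one described above, would query when choosing which designs to simulate next.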
Authors: Yifang Pan, Karan Singh, Luiz Gustavo Hafemann
Abstract: Speech-driven 3D facial animation plays a key role in applications such as virtual avatars, gaming, and digital content creation. While existing methods have made significant progress in achieving accurate lip synchronization and generating basic emotional expressions, they often struggle to capture and effectively transfer nuanced performance styles. We propose a novel example-based generation framework that conditions a latent diffusion model on a reference style clip to produce highly expressive and temporally coherent facial animations. To address the challenge of accurately adhering to the style reference, we introduce a novel conditioning mechanism called style basis, which extracts key poses from the reference and additively guides the diffusion generation process to fit the style without compromising lip synchronization quality. This approach enables the model to capture subtle stylistic cues while ensuring that the generated animations align closely with the input speech. Extensive qualitative, quantitative, and perceptual evaluations demonstrate the effectiveness of our method in faithfully reproducing the desired style while achieving superior lip synchronization across various speech scenarios.
Authors: Satvik Venkatesh, Philip Coleman, Arthur Benilov, Simon Brown, Selim Sheta, Frederic Roskam
Abstract: Dereverberation is an important sub-task of Speech Enhancement (SE) to improve the signal's intelligibility and quality. However, it remains challenging because the reverberation is highly correlated with the signal. Furthermore, the single-channel SE literature has predominantly focused on rooms with short reverb times (typically under 1 second), smaller rooms (volumes under 1000 cubic meters), and relatively short distances (up to 2 meters). In this paper, we explore real-time low-latency single-channel SE under distant microphone scenarios, such as 5 to 10 meters, and focus on conference rooms and theatres, with larger room dimensions and reverberation times. Such a setup is useful for applications such as lecture demonstrations, drama, and stage acoustics enhancement. First, we show that single-channel SE in such challenging scenarios is feasible. Second, we investigate the relationship between room volume and reverberation time, and demonstrate its importance when randomly simulating room impulse responses. Lastly, we show that for dereverberation with short decay times, preserving early reflections before decaying the transfer function of the room improves overall signal quality.
Authors: Jonathan Frey, Katrin Baumg\"artner, Gianluca Frison, Dirk Reinhardt, Jasper Hoffmann, Leonard Fichtner, Sebastien Gros, Moritz Diehl
Abstract: The efficient computation of parametric solution sensitivities is a key challenge in the integration of learning-enhanced methods with nonlinear model predictive control (MPC), as their availability is crucial for many learning algorithms. While approaches presented in the machine learning community are limited to convex or unconstrained formulations, this paper discusses the computation of solution sensitivities of general nonlinear programs (NLPs) using the implicit function theorem (IFT) and smoothed optimality conditions treated in interior-point methods (IPM). We detail sensitivity computation within a sequential quadratic programming (SQP) method which employs an IPM for the quadratic subproblems. The publication is accompanied by an efficient open-source implementation providing both forward and adjoint sensitivities for general optimal control problems, achieving speedups exceeding 3x over the state-of-the-art solver mpc.pytorch.
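The implicit function theorem argument at the core of such sensitivity computations is easy to illustrate on an unconstrained parametric problem (general NLPs add KKT conditions, but the mechanics are the same); the quadratic example below is purely illustrative and unrelated to the paper's SQP/IPM pipeline:

```python
import numpy as np

# f(z, p) = 0.5 z'Az - p'z has stationarity A z*(p) - p = 0, i.e. z*(p) = A^{-1} p.
# The IFT gives dz*/dp = -[d2f/dz2]^{-1} [d2f/dz dp] at the solution,
# without differentiating through the iterations of the solver itself.
A = np.array([[3.0, 1.0], [1.0, 2.0]])

def solve(p):
    return np.linalg.solve(A, p)          # stand-in for an NLP solver

H = A                                     # d2f/dz2 at the solution
J = -np.eye(2)                            # d2f/dz dp
dz_dp = -np.linalg.solve(H, J)            # IFT sensitivity, equals A^{-1}

# Finite-difference check against re-solving the perturbed problem.
p, eps = np.array([1.0, -2.0]), 1e-6
fd = (solve(p + eps * np.array([1.0, 0.0])) - solve(p)) / eps
print(np.allclose(dz_dp[:, 0], fd, atol=1e-5))
```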
Authors: Gen Li, Yuchen Jiao
Abstract: Diffusion models have emerged as a powerful framework for generative modeling, with guidance techniques playing a crucial role in enhancing sample quality. Despite their empirical success, a comprehensive theoretical understanding of the guidance effect remains limited. Existing studies focus only on special cases, where the distribution conditioned on each class is either isotropic Gaussian or supported on a one-dimensional interval with some extra conditions. How to analyze the guidance effect beyond these case studies remains an open question. Towards closing this gap, we analyze diffusion guidance under general data distributions. Rather than demonstrating uniform sample quality improvement, which does not hold in some distributions, we prove that guidance can improve the overall sample quality, in the sense that the average reciprocal of the classifier probability decreases with the existence of guidance. This aligns with the motivation of introducing guidance.
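The claim about the average reciprocal classifier probability can be illustrated on a toy one-dimensional example with two Gaussian classes and Langevin sampling under classifier guidance; this is a numerical illustration of the mechanism under assumed parameters, not the paper's proof:

```python
import numpy as np

# Two Gaussian classes at -2 and +2; guidance adds w * d/dx log p(y=1|x)
# to the class-1 score. We Langevin-sample with and without guidance and
# compare the average reciprocal classifier probability E[1/p(y|x)].
mu0, mu1, s = -2.0, 2.0, 1.0

def log_p1(x):                             # Bayes posterior for class 1
    a = np.exp(-(x - mu1) ** 2 / (2 * s * s))
    b = np.exp(-(x - mu0) ** 2 / (2 * s * s))
    return np.log(a / (a + b))

def num_grad(f, x, h=1e-4):
    return (f(x + h) - f(x - h)) / (2 * h)

def sample(w, n=20000, steps=300, eta=0.01, seed=0):
    rng = np.random.default_rng(seed)
    x = rng.normal(mu1, s, size=n)
    for _ in range(steps):
        score = -(x - mu1) / (s * s) + w * num_grad(log_p1, x)
        x += eta * score + np.sqrt(2 * eta) * rng.normal(size=n)
    return np.mean(1.0 / np.exp(log_p1(x)))

# The guided sampler (w > 0) yields a smaller average reciprocal probability.
print("E[1/p] w=0:", sample(0.0), " w=4:", sample(4.0))
```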
Authors: Fahong Zhang, Yilei Shi, Xiao Xiang Zhu
Abstract: This paper addresses the challenge of mapping polygonal buildings from remote sensing images and introduces a novel algorithm, the Global Collinearity-aware Polygonizer (GCP). GCP, built upon an instance segmentation framework, processes binary masks produced by any instance segmentation model. The algorithm begins by collecting polylines sampled along the contours of the binary masks. These polylines undergo a refinement process using a transformer-based regression module to ensure they accurately fit the contours of the targeted building instances. Subsequently, a collinearity-aware polygon simplification module simplifies these refined polylines and generates the final polygon representation. This module employs a dynamic programming technique to optimize an objective function that balances the simplicity and fidelity of the polygons, achieving globally optimal solutions. Furthermore, the optimized collinearity-aware objective is seamlessly integrated into network training, enhancing the cohesiveness of the entire pipeline. The effectiveness of GCP has been validated on two public benchmarks for polygonal building mapping. Further experiments reveal that applying the collinearity-aware polygon simplification module to arbitrary polylines, without prior knowledge, enhances accuracy over traditional methods such as the Douglas-Peucker algorithm. This finding underscores the broad applicability of GCP. The code for the proposed method will be made available at https://github.com/zhu-xlab.
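A rough Python sketch of globally optimal, dynamic-programming-based polyline simplification; the paper's collinearity-aware objective is more elaborate, and the fidelity term and per-vertex penalty lam below are illustrative stand-ins.

    import numpy as np

    def chord_error(pts, i, j):
        # max distance from pts[i..j] to the chord pts[i] -> pts[j]
        a, b = pts[i], pts[j]
        d = b - a
        length = np.hypot(d[0], d[1]) + 1e-12
        return max(abs(d[0] * (p[1] - a[1]) - d[1] * (p[0] - a[0])) / length
                   for p in pts[i:j + 1])

    def simplify(pts, lam=1.0):
        # pts: (n, 2) numpy array. DP over vertices with cost =
        # fidelity error + lam * (#kept vertices); globally optimal,
        # unlike the greedy Douglas-Peucker heuristic (O(n^3) here,
        # fine for a sketch).
        n = len(pts)
        cost, prev = [np.inf] * n, [0] * n
        cost[0] = lam
        for j in range(1, n):
            for i in range(j):
                c = cost[i] + chord_error(pts, i, j) + lam
                if c < cost[j]:
                    cost[j], prev[j] = c, i
        keep, j = [n - 1], n - 1
        while j != 0:
            j = prev[j]
            keep.append(j)
        return pts[np.array(keep[::-1])]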
Authors: Alice Natalina Caragliano, Claudia Tacconi, Carlo Greco, Lorenzo Nibid, Edy Ippolito, Michele Fiore, Giuseppe Perrone, Sara Ramella, Paolo Soda, Valerio Guarrasi
Abstract: This study proposes a novel approach combining Multimodal Deep Learning with intrinsic eXplainable Artificial Intelligence techniques to predict pathological response in non-small cell lung cancer patients undergoing neoadjuvant therapy. Due to the limitations of existing radiomics and unimodal deep learning approaches, we introduce an intermediate fusion strategy that integrates imaging and clinical data, enabling efficient interaction between data modalities. The proposed Multimodal Doctor-in-the-Loop method further enhances clinical relevance by embedding clinicians' domain knowledge directly into the training process, guiding the model's focus gradually from broader lung regions to specific lesions. Results demonstrate improved predictive accuracy and explainability, providing insights into optimal data integration strategies for clinical applications.
Authors: Yang Jin, Jun Lv, Wenye Yu, Hongjie Fang, Yong-Lu Li, Cewu Lu
Abstract: Self-improvement requires robotic systems to initially learn from human-provided data and then gradually enhance their capabilities through interaction with the environment. This is similar to how humans improve their skills through continuous practice. However, achieving effective self-improvement is challenging, primarily because robots tend to repeat their existing abilities during interactions, often failing to generate new, valuable data for learning. In this paper, we identify the key to successful self-improvement: modal-level exploration and data selection. By incorporating a modal-level exploration mechanism during policy execution, the robot can produce more diverse and multi-modal interactions. At the same time, we select the most valuable trials and high-quality segments from these interactions for learning. We successfully demonstrate effective robot self-improvement on both simulation benchmarks and real-world experiments. The capability for self-improvement will enable us to develop more robust and high-success-rate robotic control strategies at a lower cost. Our code and experiment scripts are available at https://ericjin2002.github.io/SIME/
Authors: Mohammadreza Teymoorianfard, Shiqing Ma, Amir Houmansadr
Abstract: The rapid rise of video diffusion models has enabled the generation of highly realistic and temporally coherent videos, raising critical concerns about content authenticity, provenance, and misuse. Existing watermarking approaches, whether passive, post-hoc, or adapted from image-based techniques, often struggle to withstand video-specific manipulations such as frame insertion, dropping, or reordering, and typically degrade visual quality. In this work, we introduce VIDSTAMP, a watermarking framework that embeds per-frame or per-segment messages directly into the latent space of temporally-aware video diffusion models. By fine-tuning the model's decoder through a two-stage pipeline, first on static image datasets to promote spatial message separation, and then on synthesized video sequences to restore temporal consistency, VIDSTAMP learns to embed high-capacity, flexible watermarks with minimal perceptual impact. Leveraging architectural components such as 3D convolutions and temporal attention, our method imposes no additional inference cost and offers better perceptual quality than prior methods, while maintaining comparable robustness against common distortions and tampering. VIDSTAMP embeds 768 bits per video (48 bits per frame) with a bit accuracy of 95.0%, achieves a log P-value of -166.65 (lower is better), and maintains a video quality score of 0.836, comparable to unwatermarked outputs (0.838) and surpassing prior methods in capacity-quality tradeoffs. Code: \url{https://github.com/SPIN-UMass/VidStamp}
Authors: Henry Shugart, Jason M. Altschuler
Abstract: Efficient computation of min-max problems is a central question in optimization, learning, games, and controls. Arguably the most natural algorithm is gradient-descent-ascent (GDA). However, since the 1970s, conventional wisdom has argued that GDA fails to converge even on simple problems. This failure spurred an extensive literature on modifying GDA with additional building blocks such as extragradients, optimism, momentum, anchoring, etc. In contrast, we show that GDA converges in its original form by simply using a judicious choice of stepsizes. The key innovation is the proposal of unconventional stepsize schedules (dubbed slingshot stepsize schedules) that are time-varying, asymmetric, and periodically negative. We show that all three properties are necessary for convergence, and that altogether this enables GDA to converge on the classical counterexamples (e.g., unconstrained convex-concave problems). All of our results apply to the last iterate of GDA, as is typically desired in practice. The core algorithmic intuition is that although negative stepsizes make backward progress, they de-synchronize the min and max variables (overcoming the cycling issue of GDA), and lead to a slingshot phenomenon in which the forward progress in the other iterations is overwhelmingly larger. This results in fast overall convergence. Geometrically, the slingshot dynamics leverage the non-reversibility of gradient flow: positive/negative steps cancel to first order, yielding a second-order net movement in a new direction that leads to convergence and is otherwise impossible for GDA to move in. We interpret this as a second-order finite-differencing algorithm and show that, intriguingly, it approximately implements consensus optimization, an empirically popular algorithm for min-max problems involving deep neural networks (e.g., training GANs).
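A toy Python sketch of the algorithmic skeleton: last-iterate GDA on the classical counterexample $f(x,y) = xy$ with a pluggable stepsize schedule. The periodic schedule below is a hypothetical stand-in that merely exhibits the three properties named above (time-varying, asymmetric, periodically negative); the paper's certified slingshot schedules are derived analytically and differ.

    def gda(schedule, steps=300, x=1.0, y=1.0):
        # simultaneous descent on x and ascent on y for f(x, y) = x * y
        for t in range(steps):
            a_t, b_t = schedule(t)   # per-player stepsizes, possibly negative
            x, y = x - a_t * y, y + b_t * x
        return x, y                  # last iterate

    constant = lambda t: (0.1, 0.1)  # classical choice: known to cycle/diverge here
    periodic = lambda t: [(0.3, 0.05), (0.05, 0.3), (-0.1, -0.1)][t % 3]  # toy stand-in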
Authors: Jiefeng Li, Jinkun Cao, Haotian Zhang, Davis Rempe, Jan Kautz, Umar Iqbal, Ye Yuan
Abstract: Human motion modeling traditionally separates motion generation and estimation into distinct tasks with specialized models. Motion generation models focus on creating diverse, realistic motions from inputs like text, audio, or keyframes, while motion estimation models aim to reconstruct accurate motion trajectories from observations like videos. Despite sharing underlying representations of temporal dynamics and kinematics, this separation limits knowledge transfer between tasks and requires maintaining separate models. We present GENMO, a unified Generalist Model for Human Motion that bridges motion estimation and generation in a single framework. Our key insight is to reformulate motion estimation as constrained motion generation, where the output motion must precisely satisfy observed conditioning signals. Leveraging the synergy between regression and diffusion, GENMO achieves accurate global motion estimation while enabling diverse motion generation. We also introduce an estimation-guided training objective that exploits in-the-wild videos with 2D annotations and text descriptions to enhance generative diversity. Furthermore, our novel architecture handles variable-length motions and mixed multimodal conditions (text, audio, video) at different time intervals, offering flexible control. This unified approach creates synergistic benefits: generative priors improve estimated motions under challenging conditions like occlusions, while diverse video data enhances generation capabilities. Extensive experiments demonstrate GENMO's effectiveness as a generalist framework that successfully handles multiple human motion tasks within a single model.
Authors: Ayon Ghosh, L. A. Prashanth, Dipayan Sen, Aditya Gopalan
Abstract: We consider the problem of sequentially learning to estimate, in the mean squared error (MSE) sense, a Gaussian $K$-vector of unknown covariance by observing only $m < K$ of its entries in each round. We propose two MSE estimators, and analyze their concentration properties. The first estimator is non-adaptive, as it is tied to a predetermined $m$-subset and lacks the flexibility to transition to alternative subsets. The second estimator, which is derived using a regression framework, is adaptive and exhibits better concentration bounds in comparison to the first estimator. We frame the MSE estimation problem with bandit feedback, where the objective is to find the MSE-optimal subset with high confidence. We propose a variant of the successive elimination algorithm to solve this problem. We also derive a minimax lower bound to understand the fundamental limit on the sample complexity of this problem.
Authors: Ming Zhu, Xiao-Yang Liu, Sem Borst, Anwar Walid
Abstract: Smart traffic lights in intelligent transportation systems (ITSs) are envisioned to greatly increase traffic efficiency and reduce congestion. Deep reinforcement learning (DRL) is a promising approach to adaptively control traffic lights based on the real-time traffic situation in a road network. However, conventional methods may suffer from poor scalability. In this paper, we investigate deep reinforcement learning to control traffic lights, and both theoretical analysis and numerical experiments show that the intelligent behavior ``greenwave'' (i.e., a vehicle will see a progressive cascade of green lights, and not have to brake at any intersection) emerges naturally in a grid road network and is proved to be the optimal policy in an avenue with multiple cross streets. As a first step, we use two DRL algorithms for the traffic light control problems in two scenarios. In a single road intersection, we verify that the deep Q-network (DQN) algorithm delivers a thresholding policy; and in a grid road network, we adopt the deep deterministic policy gradient (DDPG) algorithm. Secondly, numerical experiments show that the DQN algorithm delivers the optimal control, and the DDPG algorithm with passive observations is capable of producing, on its own, a high-level intelligent behavior in a grid road network, namely the emergence of the ``greenwave'' policy. We also verify the ``greenwave'' patterns in a $5 \times 10$ grid road network. Thirdly, the ``greenwave'' patterns demonstrate that DRL algorithms produce favorable solutions, since the ``greenwave'' policy shown in the experimental results is proved to be optimal in a specified traffic model (an avenue with multiple cross streets). The delivered policies in both a single road intersection and a grid road network demonstrate the scalability of DRL algorithms.
Authors: Michael I. Jordan, Guy Kornowski, Tianyi Lin, Ohad Shamir, Manolis Zampetakis
Abstract: We study the complexity of optimizing nonsmooth nonconvex Lipschitz functions by producing $(\delta,\epsilon)$-stationary points. Several recent works have presented randomized algorithms that produce such points using $\tilde O(\delta^{-1}\epsilon^{-3})$ first-order oracle calls, independent of the dimension $d$. It has been an open problem as to whether a similar result can be obtained via a deterministic algorithm. We resolve this open problem, showing that randomization is necessary to obtain a dimension-free rate. In particular, we prove a lower bound of $\Omega(d)$ for any deterministic algorithm. Moreover, we show that unlike smooth or convex optimization, access to function values is required for any deterministic algorithm to halt within any finite time. On the other hand, we prove that if the function is even slightly smooth, then the dimension-free rate of $\tilde O(\delta^{-1}\epsilon^{-3})$ can be obtained by a deterministic algorithm with merely a logarithmic dependence on the smoothness parameter. Motivated by these findings, we turn to study the complexity of deterministically smoothing Lipschitz functions. Though there are efficient black-box randomized smoothings, we start by showing that no such deterministic procedure can smooth functions in a meaningful manner, resolving an open question. We then bypass this impossibility result for the structured case of ReLU neural networks. To that end, in a practical white-box setting in which the optimizer is granted access to the network's architecture, we propose a simple, dimension-free, deterministic smoothing that provably preserves $(\delta,\epsilon)$-stationary points. Our method applies to a variety of architectures of arbitrary depth, including ResNets and ConvNets. Combined with our algorithm, this yields the first deterministic dimension-free algorithm for optimizing ReLU networks, circumventing our lower bound.
Authors: Alessio Mazzetto, Reza Esfandiarpoor, Akash Singirikonda, Eli Upfal, Stephen H. Bach
Abstract: We introduce an adaptive method with formal quality guarantees for weak supervision in a non-stationary setting. Our goal is to infer the unknown labels of a sequence of data by using weak supervision sources that provide independent noisy signals of the correct classification for each data point. This setting includes crowdsourcing and programmatic weak supervision. We focus on the non-stationary case, where the accuracy of the weak supervision sources can drift over time, e.g., because of changes in the underlying data distribution. Due to the drift, older data could provide misleading information to infer the label of the current data point. Previous work relied on a priori assumptions on the magnitude of the drift to decide how much data to use from the past. In contrast, our algorithm does not require any assumptions on the drift, and it adapts based on the input by dynamically varying its window size. In particular, at each step, our algorithm estimates the current accuracies of the weak supervision sources by identifying a window of past observations that guarantees a near-optimal minimization of the trade-off between the error due to the variance of the estimation and the error due to the drift. Experiments on synthetic and real-world labelers show that our approach adapts to the drift.
Authors: Erisa Hasani, Rachel A. Ward
Abstract: Recent advances in the literature show promising potential of deep learning methods, particularly neural operators, in obtaining numerical solutions to partial differential equations (PDEs) beyond the reach of current numerical solvers. However, existing data-driven approaches often rely on training data produced by numerical PDE solvers (e.g., finite difference or finite element methods). We introduce a "backward" data generation method that avoids solving the PDE numerically: by randomly sampling candidate solutions $u_j$ from the appropriate solution space (e.g., $H_0^1(\Omega)$), we compute the corresponding right-hand side $f_j$ directly from the equation by differentiation. This produces training pairs $\{(f_j, u_j)\}$ by computing derivatives rather than solving a PDE numerically for each data point, enabling fast, large-scale data generation consisting of exact solutions. Experiments indicate that models trained on this synthetic data generalize well when tested on data produced by standard solvers. While the idea is simple, we hope this method will expand the potential of neural PDE solvers that do not rely on classical numerical solvers to generate their data.
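A minimal Python sketch of this backward generation for the 1D Poisson problem $-u'' = f$ on $(0,1)$: sample $u$ as a random sine series in $H_0^1$ and obtain $f$ exactly by term-wise differentiation (the mode count and coefficient decay are arbitrary choices).

    import numpy as np

    def sample_pair(n_modes=8, n_grid=256, rng=None):
        rng = np.random.default_rng(0) if rng is None else rng
        x = np.linspace(0.0, 1.0, n_grid)
        k = np.arange(1, n_modes + 1)
        c = rng.normal(size=n_modes) / k**2           # smooth random coefficients
        basis = np.sin(np.pi * np.outer(k, x))        # sin(k pi x), zero on boundary
        u = c @ basis                                 # candidate solution u
        f = (c * (np.pi * k) ** 2) @ basis            # f = -u'' by differentiation
        return f, u                                   # exact training pair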
Authors: Filip Szatkowski, Yaoyue Zheng, Fei Yang, Bart{\l}omiej Twardowski, Tomasz Trzci\'nski, Joost van de Weijer
Abstract: Continual learning is crucial for applying machine learning in challenging, dynamic, and often resource-constrained environments. However, catastrophic forgetting - overwriting previously learned knowledge when new information is acquired - remains a major challenge. In this work, we examine the intermediate representations in neural network layers during continual learning and find that such representations are less prone to forgetting, highlighting their potential to accelerate computation. Motivated by these findings, we propose to use auxiliary classifiers (ACs) to enhance performance and demonstrate that integrating ACs into various continual learning methods consistently improves accuracy across diverse evaluation settings, yielding an average 10% relative gain. We also leverage the ACs to reduce the average inference cost by 10-60% without compromising accuracy, enabling the model to return predictions before computing all layers. Our approach provides a scalable and efficient solution for continual learning.
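A minimal PyTorch-style sketch of auxiliary-classifier early exit (batch size one for simplicity; the block and head modules and the confidence threshold are placeholders, not the paper's architecture).

    import torch.nn as nn

    class EarlyExitNet(nn.Module):
        def __init__(self, blocks, heads, threshold=0.9):
            super().__init__()
            self.blocks = nn.ModuleList(blocks)   # backbone stages
            self.heads = nn.ModuleList(heads)     # one auxiliary classifier per stage
            self.threshold = threshold

        def forward(self, x):
            for block, head in zip(self.blocks, self.heads):
                x = block(x)
                probs = head(x).softmax(dim=-1)
                if probs.max() >= self.threshold:  # confident enough: exit early
                    return probs
            return probs  # otherwise, fall through to the deepest head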
Authors: Viet-Hoang Tran, Trang Pham, Tho Tran, Minh Khoi Nguyen Nhat, Thanh Chu, Tam Le, Tan M. Nguyen
Abstract: Many variants of Optimal Transport (OT) have been developed to address its heavy computation. Among them, notably, Sliced Wasserstein (SW) is widely used across application domains: it projects the OT problem onto one-dimensional lines and leverages the closed-form expression of univariate OT to reduce the computational burden. However, projecting measures onto low-dimensional spaces can lead to a loss of topological information. To mitigate this issue, in this work, we propose to replace one-dimensional lines with a more intricate structure, called tree systems. This structure is metrizable by a tree metric, which yields a closed-form expression for OT problems on tree systems. We provide an extensive theoretical analysis to formally define tree systems with their topological properties, introduce the concept of splitting maps, which operate as the projection mechanism onto these structures, and finally propose a novel variant of the Radon transform for tree systems and verify its injectivity. This framework leads to an efficient metric between measures, termed the Tree-Sliced Wasserstein distance on Systems of Lines (TSW-SL). By conducting a variety of experiments on gradient flows, image style transfer, and generative models, we illustrate that our proposed approach performs favorably compared to SW and its variants.
Authors: Chi-Heng Lin, Shangqian Gao, James Seale Smith, Abhishek Patel, Shikhar Tuli, Yilin Shen, Hongxia Jin, Yen-Chang Hsu
Abstract: Large Language Models (LLMs) have reshaped the landscape of artificial intelligence by demonstrating exceptional performance across various tasks. However, substantial computational requirements make their deployment challenging on devices with limited resources. Recently, compression methods using low-rank matrix techniques have shown promise, yet these often lead to degraded accuracy or introduce significant overhead in parameters and inference latency. This paper introduces \textbf{Mo}dular \textbf{De}composition (MoDeGPT), a novel structured compression framework that does not need recovery fine-tuning while resolving the above drawbacks. MoDeGPT partitions the Transformer block into modules comprised of matrix pairs and reduces the hidden dimensions via reconstructing the module-level outputs. MoDeGPT is developed based on a theoretical framework that utilizes three well-established matrix decomposition algorithms -- Nystr\"om approximation, CR decomposition, and SVD -- and applies them to our redefined transformer modules. Our comprehensive experiments show MoDeGPT, without backward propagation, matches or surpasses previous structured compression methods that rely on gradient information, and saves 98% of compute costs on compressing a 13B model. On \textsc{Llama}-2/3 and OPT models, MoDeGPT maintains 90-95% zero-shot performance with 25-30% compression rates. Moreover, the compression can be done on a single GPU within a few hours and increases the inference throughput by up to 46%.
Authors: Haoxuan Ji, Zheng Lin, Yuyao Sun, Gao Fei, Yuhang Wang, Haichang Gao, Zhenxing Niu
Abstract: Modern privacy regulations have spurred the evolution of machine unlearning, a technique that enables the removal of data from an already trained ML model without requiring retraining from scratch. Previous unlearning methods tend to induce the model to achieve the lowest classification accuracy on the removed data. Nonetheless, the authentic objective of machine unlearning is to align the unlearned model with the gold model, i.e., to achieve the same classification accuracy as the gold model. For this purpose, we present a Twin Machine Unlearning (TMU) approach, where a twin unlearning problem is defined corresponding to the original unlearning problem. As a result, the generalization-label predictor trained on the twin problem can be transferred to the original problem, facilitating aligned data removal. Comprehensive empirical experiments illustrate that our approach significantly enhances the alignment between the unlearned model and the gold model. Meanwhile, our method allows data removal without compromising model accuracy.
Authors: Huizhen Yu, Yi Wan, Richard S. Sutton
Abstract: This paper studies asynchronous stochastic approximation (SA) algorithms and their theoretical application to reinforcement learning in semi-Markov decision processes (SMDPs) with an average-reward criterion. We first extend Borkar and Meyn's stability proof method to accommodate more general noise conditions, yielding broader convergence guarantees for asynchronous SA. To sharpen the convergence analysis, we further examine shadowing properties in the asynchronous setting, building on a dynamical-systems approach of Hirsch and Bena\"{i}m. Leveraging these SA results, we establish the convergence of an asynchronous SA analogue of Schweitzer's classical relative value iteration algorithm, RVI Q-learning, for finite-space, weakly communicating SMDPs. Moreover, to make full use of these SA results in this application, we introduce new monotonicity conditions for estimating the optimal reward rate in RVI Q-learning. These conditions substantially expand the previously considered algorithmic framework, and we address them with novel arguments in the stability and convergence analysis of RVI Q-learning.
Authors: Yongxin Deng, Xihe Qiu, Jue Chen, Xiaoyu Tan
Abstract: The inherent uncertainty in the environmental transition model of Reinforcement Learning (RL) necessitates a delicate balance between exploration and exploitation. This balance is crucial for optimizing computational resources to accurately estimate expected rewards for the agent. In scenarios with sparse rewards, such as robotic control systems, achieving this balance is particularly challenging. However, given that many environments possess extensive prior knowledge, learning from the ground up in such contexts may be redundant. To address this issue, we propose Language Model Guided reward Tuning (LMGT), a novel, sample-efficient framework. LMGT leverages the comprehensive prior knowledge embedded in Large Language Models (LLMs) and their proficiency in processing non-standard data forms, such as wiki tutorials. By utilizing LLM-guided reward shifts, LMGT adeptly balances exploration and exploitation, thereby guiding the agent's exploratory behavior and enhancing sample efficiency. We rigorously evaluated LMGT across various RL tasks, including the embodied robotic environment Housekeep. Our results demonstrate that LMGT consistently outperforms baseline methods. Furthermore, the findings suggest that our framework can substantially reduce the computational resources required during the RL training phase.
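A minimal Python sketch of LLM-guided reward shifting as a gym-style wrapper; llm_score is a hypothetical callable standing in for the LLM's prior-knowledge assessment of a transition, and beta is an assumed mixing weight.

    class LLMShapedEnv:
        def __init__(self, env, llm_score, beta=0.1):
            self.env, self.llm_score, self.beta = env, llm_score, beta

        def reset(self, **kwargs):
            return self.env.reset(**kwargs)

        def step(self, action):
            obs, reward, done, info = self.env.step(action)
            # shift the environment reward by the LLM's judgment of the transition
            shaped = reward + self.beta * self.llm_score(obs, action)
            return obs, shaped, done, info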
Authors: Shenao Zhang, Zhihan Liu, Boyi Liu, Yufeng Zhang, Yingxiang Yang, Yongfei Liu, Liyu Chen, Tao Sun, Zhaoran Wang
Abstract: Preference alignment in Large Language Models (LLMs) has significantly improved their ability to adhere to human instructions and intentions. However, existing direct alignment algorithms primarily focus on relative preferences and often overlook the qualitative aspects of responses, despite having access to preference data that includes reward scores from judge models during AI feedback. Striving to maximize the implicit reward gap between the chosen and the slightly inferior rejected responses can cause overfitting and unnecessary unlearning of the high-quality rejected responses. The unawareness of the reward scores also drives the LLM to indiscriminately favor the low-quality chosen responses and fail to generalize to optimal responses that are sparse in data. To overcome these shortcomings, our study introduces reward-conditioned LLM policies that discern and learn from the entire spectrum of response quality within the dataset, helping extrapolate to more optimal regions. We propose an effective yet simple data relabeling method that conditions the preference pairs on quality scores to construct a reward-augmented dataset. The experiments across various benchmarks and diverse models demonstrate that our approach consistently boosts DPO by a considerable margin. Through comprehensive ablation studies, we demonstrate that our method not only maximizes the utility of preference data but also mitigates the issue of unlearning, demonstrating its broad effectiveness beyond mere data expansion. Our code is available at https://github.com/shenao-zhang/reward-augmented-preference.
Authors: Rong-Xi Tan, Ke Xue, Shen-Huan Lyu, Haopu Shang, Yao Wang, Yaoyuan Wang, Sheng Fu, Chao Qian
Abstract: Offline model-based optimization (MBO) aims to identify a design that maximizes a black-box function using only a fixed, pre-collected dataset of designs and their corresponding scores. A common approach in offline MBO is to train a regression-based surrogate model by minimizing mean squared error (MSE) and then find the best design within this surrogate model using different optimizers (e.g., gradient ascent). However, a critical challenge is the risk of out-of-distribution errors, i.e., the surrogate model may overestimate the scores and mislead the optimizers into suboptimal regions. Prior works have attempted to address this issue in various ways, such as using regularization techniques and ensemble learning to enhance the robustness of the model, but the issue persists. In this paper, we argue that regression models trained with MSE are not well-aligned with the primary goal of offline MBO, which is to select promising designs rather than to predict their scores precisely. Notably, if a surrogate model can maintain the order of candidate designs based on their relative score relationships, it can produce the best designs even without precise predictions. To validate this, we conduct experiments examining the relationship between the quality of the final designs and MSE, and find that the correlation is remarkably weak. In contrast, a metric that measures order-maintaining quality shows a significantly stronger correlation. Based on this observation, we propose learning a ranking-based model that leverages learning-to-rank techniques to prioritize promising designs based on their relative scores. We show that the generalization error on the ranking loss can be well bounded. Empirical results across diverse tasks demonstrate the superior performance of our proposed ranking-based models compared with twenty existing methods.
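A minimal Python sketch of a pairwise ranking objective of the kind advocated above; the logistic pairwise loss is one standard learning-to-rank choice, not necessarily the paper's exact loss.

    import numpy as np

    def pairwise_logistic_loss(pred, score):
        # penalize the surrogate whenever it mis-orders a pair of designs,
        # rather than asking it to match scores exactly as MSE does
        loss, count = 0.0, 0
        for i in range(len(score)):
            for j in range(len(score)):
                if score[i] > score[j]:
                    loss += np.log1p(np.exp(-(pred[i] - pred[j])))
                    count += 1
        return loss / max(count, 1)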
Authors: Weiqin Chen, Santiago Paternain
Abstract: Pretrained foundation models have exhibited extraordinary in-context learning performance, allowing zero-shot generalization to new tasks not encountered during pretraining. In the case of reinforcement learning (RL), in-context RL (ICRL) emerges when pretraining FMs on decision-making problems in an autoregressive-supervised manner. Nevertheless, current state-of-the-art ICRL algorithms, like Algorithm Distillation, Decision Pretrained Transformer and Decision Importance Transformer, impose stringent requirements on the pretraining dataset concerning the source policies, context information, and action labels. Notably, these algorithms either demand optimal policies or require varying degrees of well-trained behavior policies for all pretraining environments. This significantly hinders the application of ICRL to real-world scenarios, where acquiring optimal or well-trained policies for a substantial volume of real-world training environments can be intractable. To overcome this challenge, we introduce a novel approach, termed State-Action Distillation (SAD), that generates an effective pretraining dataset guided solely by random policies. In particular, SAD selects query states and corresponding action labels by distilling outstanding state-action pairs from the entire state and action spaces using random policies within a trust horizon, and then inherits the classical autoregressive-supervised mechanism during pretraining. To the best of our knowledge, this is the first work that enables effective ICRL under random policies and random contexts. We also provide a quantitative analysis of the trustworthiness as well as the performance guarantees of SAD. Moreover, our empirical results across multiple popular ICRL benchmark environments demonstrate that, on average, SAD outperforms the best baseline by 236.3% in the offline evaluation and by 135.2% in the online evaluation.
Authors: Mohsen Dehghankar, Mahdi Erfanian, Abolfazl Asudeh
Abstract: Despite their tremendous success and versatility, Deep Neural Networks (DNNs) such as Large Language Models (LLMs) suffer from inference inefficiency and rely on advanced computational infrastructure. To address these challenges and make these models more accessible and cost-effective, in this paper, we propose algorithms to improve the inference time and memory efficiency of DNNs with binary and ternary weight matrices. Particularly focusing on matrix multiplication as the bottleneck operation of inference, we observe that, once trained, the weight matrices of a model no longer change. This allows us to preprocess these matrices and create indices that help reduce the storage requirements by a logarithmic factor while enabling our efficient inference algorithms. Specifically, for an $n\times n$ weight matrix, our efficient algorithm guarantees a time complexity of $O(\frac{n^2}{\log n})$, a logarithmic factor improvement over the standard vector-matrix multiplication. Besides theoretical analysis, we conduct extensive experiments to evaluate the practical efficiency of our algorithms. Our results confirm the superiority of our approach both with respect to time and memory, as we observed a reduction in the multiplication time up to 29x and memory usage up to 6x. When applied to LLMs, our experiments show up to a 5.24x speedup in the inference time.
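The stated $O(\frac{n^2}{\log n})$ bound is reminiscent of table-lookup ("Four Russians"-style) preprocessing; the Python sketch below illustrates that idea for a 0/1 weight matrix with block width b roughly log2(n), though the paper's actual index structure may differ.

    import numpy as np

    def pack_rows(W, b):
        # encode each row's bits, block by block, as integer patterns
        n = W.shape[1]
        nblocks = (n + b - 1) // b
        patterns = np.zeros((W.shape[0], nblocks), dtype=np.int64)
        for j in range(nblocks):
            block = W[:, j * b:(j + 1) * b]
            patterns[:, j] = block @ (1 << np.arange(block.shape[1]))
        return patterns

    def matvec(patterns, x, b):
        # per block: build all 2^b subset sums of x's segment once,
        # then each row costs a single table lookup per block
        y = np.zeros(patterns.shape[0])
        for j in range(patterns.shape[1]):
            seg = x[j * b:(j + 1) * b]
            table = np.zeros(1 << len(seg))
            for p in range(1, 1 << len(seg)):
                low = p & -p  # lowest set bit of the pattern
                table[p] = table[p ^ low] + seg[low.bit_length() - 1]
            y += table[patterns[:, j]]
        return y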
Authors: Behnoosh Zamanlooy, Mario Diaz, Shahab Asoodeh
Abstract: Local differential privacy (LDP) is increasingly employed in privacy-preserving machine learning to protect user data before sharing it with an untrusted aggregator. Most LDP methods assume that users possess only a single data record, which is a significant limitation since users often gather extensive datasets (e.g., images, text, time-series data) and frequently have access to public datasets. To address this limitation, we propose a locally private sampling framework that leverages both the private and public datasets of each user. Specifically, we assume each user has two distributions: $p$ and $q$ that represent their private dataset and the public dataset, respectively. The objective is to design a mechanism that generates a private sample approximating $p$ while simultaneously preserving $q$. We frame this objective as a minimax optimization problem using $f$-divergence as the utility measure. We fully characterize the minimax optimal mechanisms for general $f$-divergences provided that $p$ and $q$ are discrete distributions. Remarkably, we demonstrate that this optimal mechanism is universal across all $f$-divergences. Experiments validate the effectiveness of our minimax optimal sampler compared to the state-of-the-art locally private sampler.
Authors: Core Francisco Park, Ekdeep Singh Lubana, Itamar Pres, Hidenori Tanaka
Abstract: In-Context Learning (ICL) has significantly expanded the general-purpose nature of large language models, allowing them to adapt to novel tasks using merely the inputted context. This has motivated a series of papers that analyze tractable synthetic domains and postulate precise mechanisms that may underlie ICL. However, the use of relatively distinct setups that often lack a sequence-modeling nature makes it unclear how general the reported insights from such studies are. Motivated by this, we propose a synthetic sequence modeling task that involves learning to simulate a finite mixture of Markov chains. As we show, models trained on this task reproduce most well-known results on ICL, hence offering a unified setting for studying the concept. Building on this setup, we demonstrate that we can explain a model's behavior by decomposing it into four broad algorithms that combine a fuzzy retrieval vs. inference approach with either unigram or bigram statistics of the context. These algorithms engage in competition dynamics to dominate model behavior, with the precise experimental conditions dictating which algorithm ends up superseding others: e.g., we find that merely varying context size or the amount of training yields (at times sharp) transitions between which algorithm dictates the model behavior, revealing a mechanism that explains the transient nature of ICL. In this sense, we argue ICL is best thought of as a mixture of different algorithms, each with its own peculiarities, instead of a monolithic capability. This also implies that making general claims about ICL that hold universally across all settings may be infeasible.
Authors: Xin Huang, Eric M. Wolff, Paul Vernaza, Tung Phan-Minh, Hongge Chen, David S. Hayden, Mark Edmonds, Brian Pierce, Xinxin Chen, Pratik Elias Jacob, Xiaobai Chen, Chingiz Tairbekov, Pratik Agarwal, Tianshi Gao, Yuning Chai, Siddhartha Srinivasa
Abstract: We present DriveGPT, a scalable behavior model for autonomous driving. We model driving as a sequential decision-making task, and learn a transformer model to predict future agent states as tokens in an autoregressive fashion. We scale up our model parameters and training data by multiple orders of magnitude, enabling us to explore the scaling properties in terms of dataset size, model parameters, and compute. We evaluate DriveGPT across different scales in a planning task, through both quantitative metrics and qualitative examples, including closed-loop driving in complex real-world scenarios. In a separate prediction task, DriveGPT outperforms state-of-the-art baselines and exhibits improved performance by pretraining on a large-scale dataset, further validating the benefits of data scaling.
Authors: Ke Alexander Wang, Jiaxin Shi, Emily B. Fox
Abstract: Sequence models lie at the heart of modern deep learning. However, rapid advancements have produced a diversity of seemingly unrelated architectures, such as Transformers and recurrent alternatives. In this paper, we introduce a unifying framework to understand and derive these sequence models, inspired by the empirical importance of associative recall, the capability to retrieve contextually relevant tokens. We formalize associative recall as a two-step process, memorization and retrieval, casting memorization as a regression problem. Layers that combine these two steps perform associative recall via ``test-time regression'' over its input tokens. Prominent layers, including linear attention, state-space models, fast-weight programmers, online learners, and softmax attention, arise as special cases defined by three design choices: the regression weights, the regressor function class, and the test-time optimization algorithm. Our approach clarifies how linear attention fails to capture inter-token correlations and offers a mathematical justification for the empirical effectiveness of query-key normalization in softmax attention. Further, it illuminates unexplored regions within the design space, which we use to derive novel higher-order generalizations of softmax attention. Beyond unification, our work bridges sequence modeling with classic regression methods, a field with extensive literature, paving the way for developing more powerful and theoretically principled architectures.
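A minimal numpy sketch of the regression view (our illustration, not the paper's code): softmax attention acts like Nadaraya-Watson kernel regression over (key, value) pairs, while linear attention applies a one-shot unnormalized regressor, with no $(K^\top K)^{-1}$-style correction, which is one way to see why it misses inter-token correlations.

    import numpy as np

    def softmax_attention(q, K, V, tau=1.0):
        # Nadaraya-Watson kernel regression: retrieve a value for query q
        # as a similarity-weighted average of stored values
        w = np.exp(K @ q / tau)
        return (w / w.sum()) @ V

    def linear_attention(q, K, V):
        # one-step "memorization" W = V^T K applied to the query; the
        # missing (K^T K)^{-1} factor is what drops key-key correlations
        return (V.T @ K) @ q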
Authors: Shuaiqun Pan, Diederick Vermetten, Manuel L\'opez-Ib\'a\~nez, Thomas B\"ack, Hao Wang
Abstract: Surrogate models are frequently employed as efficient substitutes for the costly execution of real-world processes. However, constructing a high-quality surrogate model often demands extensive data acquisition. A solution to this issue is to transfer pre-trained surrogate models to new tasks, provided that certain invariances exist between tasks. This study focuses on transferring non-differentiable surrogate models (e.g., random forests) from a source function to a target function, where we assume their domains are related by an unknown affine transformation, using only a limited amount of transfer data points evaluated on the target. Previous research tackles this challenge for differentiable models, e.g., Gaussian process regression, by minimizing the empirical loss on the transfer data while tuning the affine transformation. In this paper, we extend the previous work to random forests and assess its effectiveness on the widely used Black-Box Optimization Benchmark (BBOB) testbed and on four real-world transfer learning problems. The results highlight the significant practical advantages of the proposed method, particularly in reducing both the data requirements and computational costs of training surrogate models for complex real-world scenarios.
Authors: Harshwardhan Praveen, Jacob Brown, Christopher Earls
Abstract: In this work, we present a mesh-independent, data-driven library, chebgreen, to mathematically model one-dimensional systems, possessing an associated control parameter, and whose governing partial differential equation is unknown. The proposed method learns an Empirical Green's Function for the associated, but hidden, boundary value problem, in the form of a Rational Neural Network from which we subsequently construct a bivariate representation in a Chebyshev basis. We uncover the Green's function, at an unseen control parameter value, by interpolating the left and right singular functions within a suitable library, expressed as points on a manifold of Quasimatrices, while the associated singular values are interpolated with Lagrange polynomials.
Authors: Alessandro Licciardi, Davide Leo, Eros Fan\`i, Barbara Caputo, Marco Ciccone
Abstract: Federated Learning (FL) emerged as a decentralized paradigm to train models while preserving privacy. However, conventional FL struggles with data heterogeneity and class imbalance, which degrade model performance. Clustered FL balances personalization and decentralized training by grouping clients with analogous data distributions, enabling improved accuracy while adhering to privacy constraints. This approach effectively mitigates the adverse impact of heterogeneity in FL. In this work, we propose a novel clustered FL method, FedGWC (Federated Gaussian Weighting Clustering), which groups clients based on their data distribution, allowing training of a more robust and personalized model on the identified clusters. FedGWC identifies homogeneous clusters by transforming individual empirical losses to model client interactions with a Gaussian reward mechanism. Additionally, we introduce the Wasserstein Adjusted Score, a new clustering metric for FL to evaluate cluster cohesion with respect to the individual class distribution. Our experiments on benchmark datasets show that FedGWC outperforms existing FL algorithms in cluster quality and classification accuracy, validating the efficacy of our approach.
Authors: Keon Vin Park
Abstract: Physics-Informed Neural Networks (PINNs) have emerged as a promising approach for solving Partial Differential Equations (PDEs) by incorporating physical constraints into deep learning models. However, standard PINNs often require a large number of training samples to achieve high accuracy, leading to increased computational costs. To address this issue, we propose Active Learning-Driven PINNs (AL-PINN), which integrates Uncertainty Quantification (UQ) and Active Learning (AL) strategies to optimize sample selection dynamically. AL-PINN utilizes Monte Carlo Dropout to estimate epistemic uncertainty in the model predictions, enabling the adaptive selection of high-uncertainty regions for additional training. This approach significantly enhances learning efficiency by focusing computational resources on the most informative data points. We evaluate AL-PINN on benchmark PDE problems with known analytical solutions and real-world WeatherBench climate data. Our results demonstrate that AL-PINN achieves comparable or superior accuracy compared to traditional PINNs while reducing the number of required training samples. The proposed framework is particularly beneficial for scientific and engineering applications where data collection is expensive or limited, such as climate modeling, medical simulations, and material science. Our findings highlight the potential of active learning in accelerating PINN-based PDE solvers while maintaining high accuracy and computational efficiency.
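A minimal PyTorch sketch of the acquisition step described above: keep dropout active at inference, estimate predictive variance over T stochastic passes, and select the k most uncertain candidate points (T, k, and the candidate pool are assumed hyperparameters).

    import torch

    def mc_dropout_uncertainty(model, candidates, T=20):
        model.train()  # keep dropout layers active at inference time
        with torch.no_grad():
            preds = torch.stack([model(candidates) for _ in range(T)])
        return preds.var(dim=0).mean(dim=-1)  # epistemic uncertainty per point

    def select_training_points(model, candidates, k=64):
        u = mc_dropout_uncertainty(model, candidates)
        return candidates[torch.topk(u, k).indices]  # most informative points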
Authors: Keon Vin Park
Abstract: Physics-Informed Neural Networks (PINNs) have emerged as a powerful framework for solving partial differential equations (PDEs) by embedding physical laws into neural network training. However, traditional PINN models are typically designed for single PDEs, limiting their generalizability across different physical systems. In this work, we explore the potential of a foundation PINN model capable of solving multiple PDEs within a unified architecture. We investigate the efficacy of a single PINN framework trained on four distinct PDEs-the Simple Harmonic Oscillator (SHO), the 1D Heat Equation, the 1D Wave Equation, and the 2D Laplace Equation, demonstrating its ability to learn diverse physical dynamics. To enhance sample efficiency, we incorporate Active Learning (AL) using Monte Carlo (MC) Dropout-based uncertainty estimation, selecting the most informative training samples iteratively. We evaluate different active learning strategies, comparing models trained on 10%, 20%, 30%, 40%, and 50% of the full dataset, and analyze their impact on solution accuracy. Our results indicate that targeted uncertainty sampling significantly improves performance with fewer training samples, leading to efficient learning across multiple PDEs. This work highlights the feasibility of a generalizable PINN-based foundation model, capable of adapting to different physics-based problems without redesigning network architectures. Our findings suggest that multi-PDE PINNs with active learning can serve as an effective approach for reducing computational costs while maintaining high accuracy in physics-based deep learning applications.
Authors: Shashank Kirtania
Abstract: Large Language Models (LLMs) have shown promise in proving formal theorems using proof assistants like Lean. However, current state-of-the-art language models struggle to predict the next step in a proof, leading practitioners to use different sampling techniques to improve LLM capabilities. We observe that the LLM is capable of predicting the correct tactic; however, it faces challenges in ranking it appropriately within the set of candidate tactics, affecting the overall selection process. To overcome this hurdle, we use activation steering to guide LLM responses, improving generations at inference time. Our results suggest that activation steering offers a promising, lightweight alternative to specialized fine-tuning for enhancing theorem-proving capabilities in LLMs, particularly valuable in resource-constrained environments.
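A minimal PyTorch sketch of inference-time activation steering via a forward hook; the steering direction v (for instance, a difference of mean activations between prompts where the correct tactic is ranked well versus poorly) and the layer path are hypothetical.

    import torch

    def add_steering_hook(layer, v, alpha=1.0):
        # shift the layer's hidden states along a precomputed direction v
        def hook(module, inputs, output):
            hidden = output[0] if isinstance(output, tuple) else output
            steered = hidden + alpha * v
            return (steered,) + output[1:] if isinstance(output, tuple) else steered
        return layer.register_forward_hook(hook)

    # handle = add_steering_hook(model.transformer.h[12], v)  # hypothetical layer path
    # ... generate tactics ...
    # handle.remove()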
Authors: Yuxiao Wen
Abstract: In combinatorial semi-bandits, a learner repeatedly selects from a combinatorial decision set of arms, receives the realized sum of rewards, and observes the rewards of the individual selected arms as feedback. In this paper, we extend this framework to include \emph{graph feedback}, where the learner observes the rewards of all neighboring arms of the selected arms in a feedback graph $G$. We establish that the optimal regret over a time horizon $T$ scales as $\widetilde{\Theta}(S\sqrt{T}+\sqrt{\alpha ST})$, where $S$ is the size of the combinatorial decisions and $\alpha$ is the independence number of $G$. This result interpolates between the known regrets $\widetilde\Theta(S\sqrt{T})$ under full information (i.e., $G$ is complete) and $\widetilde\Theta(\sqrt{KST})$ under the semi-bandit feedback (i.e., $G$ has only self-loops), where $K$ is the total number of arms. A key technical ingredient is to realize a convexified action using a random decision vector with negative correlations. We also show that online stochastic mirror descent (OSMD) that only realizes convexified actions in expectation is suboptimal.
Authors: Vrushank Ahire, Kunal Shah, Mudasir Nazir Khan, Nikhil Pakhale, Lownish Rai Sookha, M. A. Ganaie, Abhinav Dhall
Abstract: Dynamic emotion recognition in the wild remains challenging due to the transient nature of emotional expressions and temporal misalignment of multi-modal cues. Traditional approaches predict valence and arousal and often overlook the inherent correlation between these two dimensions. The proposed Multi-modal Attention for Valence-Arousal Emotion Network (MAVEN) integrates visual, audio, and textual modalities through a bi-directional cross-modal attention mechanism. MAVEN uses modality-specific encoders to extract features from synchronized video frames, audio segments, and transcripts, predicting emotions in polar coordinates following Russell's circumplex model. On the Aff-Wild2 dataset, MAVEN achieved a concordance correlation coefficient (CCC) of 0.3061, surpassing the ResNet-50 baseline model (CCC of 0.22). The multistage architecture captures the subtle and transient nature of emotional expressions in conversational videos and improves emotion recognition in real-world situations. The code is available at: https://github.com/Vrushank-Ahire/MAVEN_8th_ABAW
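For reference, the CCC metric reported above has a standard closed form; a minimal Python sketch:

    import numpy as np

    def ccc(pred, target):
        # concordance correlation coefficient between two 1D arrays
        mp, mt = pred.mean(), target.mean()
        vp, vt = pred.var(), target.var()
        cov = ((pred - mp) * (target - mt)).mean()
        return 2 * cov / (vp + vt + (mp - mt) ** 2)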
Authors: Akira Tamamori
Abstract: Hopfield networks using Hebbian learning suffer from limited storage capacity. While supervised methods like Linear Logistic Regression (LLR) offer some improvement, kernel methods like Kernel Logistic Regression (KLR) significantly enhance capacity and noise robustness. However, KLR requires computationally expensive iterative learning. We propose Kernel Ridge Regression (KRR) as an efficient kernel-based alternative for learning high-capacity Hopfield networks. KRR utilizes the kernel trick and predicts bipolar states via regression, crucially offering a non-iterative, closed-form solution for learning dual variables. We evaluate KRR and compare its performance against Hebbian, LLR, and KLR. Our results demonstrate that KRR achieves state-of-the-art storage capacity (reaching $\beta$=1.5) and noise robustness, comparable to KLR. Crucially, KRR drastically reduces training time, being orders of magnitude faster than LLR and significantly faster than KLR, especially at higher storage loads. This establishes KRR as a potent and highly efficient method for building high-performance associative memories, providing comparable performance to KLR with substantial training speed advantages. This work provides the first empirical comparison between KRR and KLR in the context of Hopfield network learning.
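A minimal numpy sketch of the closed-form KRR learning and iterative recall described above; the RBF kernel, its width gamma, and the ridge parameter lam are illustrative choices, not the paper's exact settings.

    import numpy as np

    def train_krr(X, lam=1e-3, gamma=0.1):
        # X: (m, n) bipolar (+1/-1) patterns to store
        K = np.exp(-gamma * np.square(X[:, None] - X[None]).sum(-1))  # RBF Gram
        alpha = np.linalg.solve(K + lam * np.eye(len(X)), X)  # closed-form duals
        return alpha

    def recall(x, X, alpha, gamma=0.1, steps=10):
        # iterate: regress the next bipolar state via KRR, then binarize
        for _ in range(steps):
            k = np.exp(-gamma * np.square(X - x).sum(-1))
            x = np.sign(k @ alpha)
        return x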
Authors: Wenxin Zhang, Xiaojian Lin, Wenjun Yu, Guangzhen Yao, Jingxiang Zhong, Yu Li, Renda Han, Songcheng Xu, Hao Shi, Cuicui Luo
Abstract: Time series anomaly detection holds notable importance for risk identification and fault detection across diverse application domains. Unsupervised learning methods have become popular because they require no labels. However, due to the challenges posed by the multiplicity of abnormal patterns, the sparsity of anomalies, and the growth of data scale and complexity, these methods often fail to capture robust and representative dependencies within the time series for identifying anomalies. To enhance the ability of models to capture normal patterns of time series and to avoid the degradation of modeling ability caused by reliance on high-quality prior knowledge, we propose a differencing-based contrastive representation learning framework for time series anomaly detection (DConAD). Specifically, DConAD generates differential data to provide additional information about the time series and utilizes a transformer-based architecture to capture spatiotemporal dependencies, which enhances the robustness of unbiased representation learning. Furthermore, DConAD implements a novel KL-divergence-based contrastive learning paradigm that uses only positive samples to avoid deviation from reconstruction, and deploys a stop-gradient strategy to enforce convergence. Extensive experiments on five public datasets show the superiority and effectiveness of DConAD compared with nine baselines. The code is available at https://github.com/shaieesss/DConAD.
Authors: Abhishek Jana, Moeumu Uili, James Atherton, Mark O'Brien, Joe Wood, Leandra Brickson
Abstract: This paper presents an automated one-shot bird call classification pipeline designed for rare species absent from large publicly available classifiers like BirdNET and Perch. While these models excel at detecting common birds with abundant training data, they lack options for species with only 1-3 known recordings -- a critical limitation for conservationists monitoring the last remaining individuals of endangered birds. To address this, we leverage the embedding space of large bird classification networks and develop a classifier using cosine similarity, combined with filtering and denoising preprocessing techniques, to optimize detection with minimal training data. We evaluate various embedding spaces using clustering metrics and validate our approach in both a simulated scenario with Xeno-Canto recordings and a real-world test on the critically endangered tooth-billed pigeon (Didunculus strigirostris), which has no existing classifiers and only three confirmed recordings. The final model achieved 1.0 recall and 0.95 accuracy in detecting tooth-billed pigeon calls, making it practical for use in the field. This open-source system provides a practical tool for conservationists seeking to detect and monitor rare species on the brink of extinction.
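A minimal Python sketch of the cosine-similarity matcher at the core of such a pipeline; the embeddings are assumed to come from a frozen encoder (e.g., a BirdNET- or Perch-style network), and the detection threshold is an assumed tuning knob.

    import numpy as np

    def detect(query_emb, ref_embs, threshold=0.8):
        # compare a candidate call against the 1-3 reference recordings;
        # fire a detection if the best cosine similarity clears the threshold
        q = query_emb / np.linalg.norm(query_emb)
        R = ref_embs / np.linalg.norm(ref_embs, axis=1, keepdims=True)
        return float((R @ q).max()) >= threshold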
Authors: Yun Qu, Qi Cheems Wang, Yixiu Mao, Yiqin Lv, Xiangyang Ji
Abstract: Task-robust adaptation is a long-standing pursuit in sequential decision-making. Some risk-averse strategies, e.g., the conditional value-at-risk principle, are incorporated into domain randomization or meta reinforcement learning to prioritize difficult tasks in optimization, which demands costly intensive evaluations. The efficiency issue prompts the development of robust active task sampling to train adaptive policies, where risk-predictive models are used as surrogates for policy evaluation. This work characterizes the optimization pipeline of robust active task sampling as a Markov decision process, offers theoretical and practical insights, and formalizes robustness concepts in risk-averse scenarios. Importantly, we propose an easy-to-implement method, referred to as Posterior and Diversity Synergized Task Sampling (PDTS), to accommodate fast and robust sequential decision-making. Extensive experiments show that PDTS unlocks the potential of robust active task sampling, significantly improves zero-shot and few-shot adaptation robustness on challenging tasks, and even accelerates the learning process under certain scenarios. Our project website is at https://thu-rllab.github.io/PDTS_project_page.
Authors: Vishnu Sarukkai, Zhiqiang Xie, Kayvon Fatahalian
Abstract: Many methods for improving Large Language Model (LLM) agents for sequential decision-making tasks depend on task-specific knowledge engineering--such as prompt tuning, curated in-context examples, or customized observation and action spaces. Using these approaches, agent performance improves with the quality or amount of knowledge engineering invested. Instead, we investigate how LLM agents can automatically improve their performance by learning in-context from their own successful experiences on similar tasks. Rather than relying on task-specific knowledge engineering, we focus on constructing and refining a database of self-generated examples. We demonstrate that even a naive accumulation of successful trajectories across training tasks boosts test performance on three benchmarks: ALFWorld (73% to 89%), Wordcraft (55% to 64%), and InterCode-SQL (75% to 79%)--matching the performance the initial agent achieves if allowed two to three attempts per task. We then introduce two extensions: (1) database-level selection through population-based training to identify high-performing example collections, and (2) exemplar-level selection that retains individual trajectories based on their empirical utility as in-context examples. These extensions further enhance performance, achieving 91% on ALFWorld--matching more complex approaches that employ task-specific components and prompts. Our results demonstrate that automatic trajectory database construction offers a compelling alternative to labor-intensive knowledge engineering.
Authors: Haozheng Luo, Chenghao Qiu, Maojiang Su, Zhihan Zhou, Zoe Mehta, Guo Ye, Jerry Yao-Chieh Hu, Han Liu
Abstract: To address the challenge of scarce computational resources in genomic modeling, we introduce GERM, a genomic foundation model with strong compression performance and fast adaptability. GERM improves upon models like DNABERT-2 by eliminating outliers that hinder low-rank adaptation and post-training quantization, enhancing both efficiency and robustness. We replace the vanilla attention layer with an outlier-free mechanism inspired by associative memory models. By removing outliers during both pre-training and fine-tuning, this approach accelerates adaptation, reduces computational costs, and enhances quantization robustness within acceptable loss margins. Additionally, we propose GERM-T, a strategy that employs small-step continual learning within the outlier-free framework, leveraging original checkpoints to avoid retraining from scratch. Empirically, GERM improves fine-tuning performance by 37.98% and quantization by 64.34% over the baseline model. It also reduces average kurtosis by 92.14% and maximum infinity norm by 82.77%. Compared to leading methods, GERM consistently delivers superior performance, offering a practical solution for genomic modeling in resource-constrained settings. Code is available at https://github.com/MAGICS-LAB/GERM.
Authors: Yikun Zhang, Yen-Chi Chen
Abstract: The set of local modes and density ridge lines are important summary characteristics of the data-generating distribution. In this work, we focus on estimating local modes and density ridges from point cloud data in a product space combining two or more Euclidean and/or directional metric spaces. Specifically, our approach extends the (subspace constrained) mean shift algorithm to such product spaces, addressing potential challenges in the generalization process. We establish the algorithmic convergence of the proposed methods, along with practical implementation guidelines. Experiments on simulated and real-world datasets demonstrate the effectiveness of our proposed methods.
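For orientation, the plain Euclidean mean shift iteration that the paper generalizes to product spaces can be sketched as follows (Gaussian kernel and bandwidth are illustrative choices).

    import numpy as np

    def mean_shift(x, data, bandwidth=0.5, iters=100):
        # iterate toward a local mode of the kernel density estimate:
        # replace x by the kernel-weighted mean of the data around it
        for _ in range(iters):
            w = np.exp(-0.5 * np.square((data - x) / bandwidth).sum(-1))
            x = (w[:, None] * data).sum(0) / w.sum()
        return x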
Authors: Junwen Qiu, Li Jiang, Andre Milzarek
Abstract: The proximal stochastic gradient method (PSGD) is one of the state-of-the-art approaches for stochastic composite-type problems. In contrast to its deterministic counterpart, PSGD has been found to have difficulties with the correct identification of underlying substructures (such as supports, low rank patterns, or active constraints) and it does not possess a finite-time manifold identification property. Existing solutions rely on convexity assumptions or on the additional usage of variance reduction techniques. In this paper, we address these limitations and present a simple variant of PSGD based on Robinson's normal map. The proposed normal map-based proximal stochastic gradient method (NSGD) is shown to converge globally, i.e., accumulation points of the generated iterates correspond to stationary points almost surely. In addition, we establish complexity bounds for NSGD that match the known results for PSGD and we prove that NSGD can almost surely identify active manifolds in finite-time in a general nonconvex setting. Our derivations are built on almost sure iterate convergence guarantees and utilize analysis techniques based on the Kurdyka-Lojasiewicz inequality.
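To make the normal-map construction concrete, the following sketches one common form of a normal map-based proximal stochastic gradient step, applied to $\ell_1$-regularized least squares. The problem instance, step sizes, and variable names are illustrative assumptions, not the paper's exact scheme.

```python
# Sketch of NSGD-style updates for lasso: min_x 0.5*||Ax-b||^2/n + reg*||x||_1.
# The iterate z follows the normal-map direction
#   grad f(prox(z)) + (z - prox(z)) / lam,  with x = prox(z).
import numpy as np

def soft_threshold(z, tau):
    return np.sign(z) * np.maximum(np.abs(z) - tau, 0.0)

def nsgd_lasso(A, b, reg=0.1, lam=1.0, alpha=0.01, iters=2000, batch=16,
               seed=0):
    rng = np.random.default_rng(seed)
    n, d = A.shape
    z = np.zeros(d)  # auxiliary normal-map variable
    for _ in range(iters):
        x = soft_threshold(z, lam * reg)      # x = prox_{lam*reg*||.||_1}(z)
        idx = rng.integers(0, n, batch)       # minibatch stochastic gradient
        g = A[idx].T @ (A[idx] @ x - b[idx]) / batch
        z = z - alpha * (g + (z - x) / lam)   # normal-map step
    return soft_threshold(z, lam * reg)
```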
Authors: Chengjing Wang, Peipei Tang, Wenling He, Meixia Lin
Abstract: Graphical models have demonstrated their effectiveness in numerous tasks ranging from biological analysis to recommender systems. However, graphical models with hub nodes are computationally difficult to fit, particularly when the dimension of the data is large. To efficiently estimate the hub graphical models, we introduce a two-phase algorithm. The proposed algorithm first generates a good initial point via a dual alternating direction method of multipliers (ADMM), and then warm starts a semismooth Newton (SSN) based augmented Lagrangian method (ALM) to compute a solution that is accurate enough for practical tasks. We fully exploit the sparsity structure of the generalized Jacobian arising from the hubs in the graphical models, which ensures that the algorithm obtains an accurate solution very efficiently. Comprehensive experiments on both synthetic and real data show that it clearly outperforms existing state-of-the-art algorithms. In particular, in some high-dimensional tasks, it can save more than 70\% of the execution time while still achieving a high-quality estimation.
Authors: Francisca Vasconcelos, Emmanouil-Vasileios Vlatakis-Gkaragkounis, Panayotis Mertikopoulos, Georgios Piliouras, Michael I. Jordan
Abstract: Recent developments in domains such as non-local games, quantum interactive proofs, and quantum generative adversarial networks have renewed interest in quantum game theory and, specifically, quantum zero-sum games. Central to classical game theory is the efficient algorithmic computation of Nash equilibria, which represent optimal strategies for both players. In 2008, Jain and Watrous proposed the first classical algorithm for computing equilibria in quantum zero-sum games using the Matrix Multiplicative Weight Updates (MMWU) method to achieve a convergence rate of $\mathcal{O}(d/\epsilon^2)$ iterations to $\epsilon$-Nash equilibria in the $4^d$-dimensional spectraplex. In this work, we propose a hierarchy of quantum optimization algorithms that generalize MMWU via an extra-gradient mechanism. Notably, within this proposed hierarchy, we introduce the Optimistic Matrix Multiplicative Weights Update (OMMWU) algorithm and establish its average-iterate convergence complexity as $\mathcal{O}(d/\epsilon)$ iterations to $\epsilon$-Nash equilibria. This quadratic speed-up relative to Jain and Watrous' original algorithm sets a new benchmark for computing $\epsilon$-Nash equilibria in quantum zero-sum games.
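For orientation, a schematic form of the matrix multiplicative weights iterate over density matrices is shown below; the notation is ours and may differ from the paper's. The optimistic (OMMWU) variant can be viewed as additionally counting the most recent payoff matrix once more, as a prediction of the next one.

```latex
% Schematic MMWU iterate on the spectraplex, with M_s the observed payoff
% matrices for one player and \eta a step size.
X_{t+1} \;=\; \frac{\exp\!\big(-\eta \sum_{s=1}^{t} M_s\big)}
                   {\operatorname{Tr}\exp\!\big(-\eta \sum_{s=1}^{t} M_s\big)}
```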
Authors: Yifan Peng, Yuehaw Khoo, Daren Wang
Abstract: Multivariate density estimation is of great interest in various scientific and engineering disciplines. In this work, we introduce a new framework called Variance-Reduced Sketching (VRS), specifically designed to estimate multivariate density functions with a reduced curse of dimensionality. Our VRS framework conceptualizes multivariate functions as infinite-size matrices/tensors, and facilitates a new sketching technique motivated by the numerical linear algebra literature to reduce the variance in density estimation problems. We demonstrate the robust numerical performance of VRS through a series of simulated experiments and real-world data applications. Notably, VRS shows remarkable improvement over existing neural network density estimators and classical kernel methods in numerous distribution models. Additionally, we offer theoretical justifications for VRS to support its ability to deliver density estimation with a reduced curse of dimensionality.
Authors: Wong Kam-Kwai, Yan Luo, Xuanwu Yue, Wei Chen, Huamin Qu
Abstract: Financial cluster analysis allows investors to discover investment alternatives and avoid undertaking excessive risks. However, this analytical task faces substantial challenges arising from many pairwise comparisons, the dynamic correlations across time spans, and the ambiguity in deriving implications from business relational knowledge. We propose Prismatic, a visual analytics system that integrates quantitative analysis of historical performance and qualitative analysis of business relational knowledge to cluster correlated businesses interactively. Prismatic features three clustering processes: dynamic cluster generation, knowledge-based cluster exploration, and correlation-based cluster validation. Utilizing a multi-view clustering approach, it enriches data-driven clusters with knowledge-driven similarity, providing a nuanced understanding of business correlations. Through well-coordinated visual views, Prismatic facilitates a comprehensive interpretation of intertwined quantitative and qualitative features, demonstrating its usefulness and effectiveness via case studies on formulating concept stocks and extensive interviews with domain experts.
Authors: Gabriele Oliaro, Xupeng Miao, Xinhao Cheng, Vineeth Kada, Ruohan Gao, Yingyi Huang, Remi Delacourt, April Yang, Yingcheng Wang, Mengdi Wu, Colin Unger, Zhihao Jia
Abstract: Finetuning large language models (LLMs) is essential for task adaptation, yet serving stacks today isolate inference and finetuning on separate GPU clusters -- wasting resources and under-utilizing hardware. We introduce FlexLLM, the first system to co-serve LLM inference and PEFT-based finetuning on shared GPUs by fusing computation at the token level. The static compilation optimizations in FlexLLM (dependent parallelization and graph pruning) significantly shrink activation memory, leading to end-to-end GPU memory savings of up to 80%. At runtime, a novel token-level finetuning mechanism paired with a hybrid token scheduler dynamically interleaves inference and training tokens within each co-serving iteration, meeting strict latency SLOs while maximizing utilization. In end-to-end benchmarks on LLaMA-3.1-8B, Qwen-2.5-14B, and Qwen-2.5-32B, FlexLLM sustains inference SLO requirements at up to 20 req/s, and improves finetuning throughput by 1.9-4.8x under heavy inference workloads and 2.5-6.8x under light loads, preserving over 76% of peak finetuning progress even at peak demand. The source code of FlexLLM is publicly available at https://github.com/flexflow/FlexFlow/.
Authors: JooHyun Park, YuJin Jeon, HuiYong Kim, SeungHwan Baek, HyeongYeop Kang
Abstract: Holography stands at the forefront of visual technology, offering immersive, three-dimensional visualizations through the manipulation of light wave amplitude and phase. Although generative models have been extensively explored in the image domain, their application to holograms remains relatively underexplored due to the inherent complexity of phase learning. Exploiting generative models for holograms offers exciting opportunities for advancing innovation and creativity, such as semantic-aware hologram generation and editing. Currently, the most viable approach for utilizing generative models in the hologram domain involves integrating an image-based generative model with an image-to-hologram conversion model, which comes at the cost of increased computational complexity and inefficiency. To tackle this problem, we introduce P-Hologen, the first end-to-end generative framework designed for phase-only holograms (POHs). P-Hologen employs vector quantized variational autoencoders to capture the complex distributions of POHs. It also integrates the angular spectrum method into the training process, constructing latent spaces for complex phase data using strategies from the image processing domain. Extensive experiments demonstrate that P-Hologen achieves superior quality and computational efficiency compared to the existing methods. Furthermore, our model generates high-quality unseen, diverse holographic content from its learned latent space without requiring pre-existing images. Our work paves the way for new applications and methodologies in holographic content creation, opening a new era in the exploration of generative holographic content. The code for our paper is publicly available on https://github.com/james0223/P-Hologen.
Authors: Letitia Parcalabescu, Anette Frank
Abstract: Vision and language model (VLM) decoders are currently the best-performing architectures on multimodal tasks. In addition to answers, they can produce natural language explanations, either in post-hoc or CoT settings. However, it is not clear to what extent they are using the input vision and text modalities when generating answers or explanations. In this work, we investigate if VLMs rely on their input modalities differently when they produce explanations as opposed to answers. We also evaluate the self-consistency of VLM decoders in both post-hoc and CoT explanation settings by extending existing unimodal tests and measures to VLM decoders. We find that most tested VLMs are less self-consistent than LLMs. Text contributions in all tested VL decoders are more important than image contributions in all examined tasks. However, when comparing explanation generation to answer generation, the contributions of images are significantly stronger for generating explanations compared to answers. This difference is even larger in CoT compared to post-hoc explanations. Lastly, we provide an up-to-date benchmarking of state-of-the-art VL decoders on the VALSE benchmark, which was previously restricted to VL encoders. We find that the tested VL decoders still struggle with most phenomena tested by VALSE.
Authors: Fredrik Hagstr\"om, Vikas Garg, Fabricio Oliveira
Abstract: Buildings account for 40% of global energy consumption. A considerable portion of building energy consumption stems from heating, ventilation, and air conditioning (HVAC), and thus implementing smart, energy-efficient HVAC systems has the potential to significantly impact the course of climate change. In recent years, model-free reinforcement learning algorithms have been increasingly assessed for this purpose due to their ability to learn and adapt purely from experience. They have been shown to outperform classical controllers in terms of energy cost and consumption, as well as thermal comfort. However, their weakness lies in their relatively poor data efficiency, requiring long periods of training to reach acceptable policies, making them inapplicable to real-world controllers directly. In this paper, we demonstrate that using federated learning to train the reinforcement learning controller of HVAC systems can improve learning speed and generalization, which in turn facilitates transfer learning to unseen building environments. In our setting, a global control policy is learned by aggregating local policies trained on multiple data centers located in different climate zones. The goal of the policy is to minimize energy consumption and maximize thermal comfort. We perform experiments evaluating three different optimizers for local policy training, as well as three different federated learning algorithms against two alternative baselines. Our experiments show that federated training leads to faster learning and greater generalization in the federated policy compared to any individually trained policy. Furthermore, the learning stability is significantly improved, with the learning process and performance of the federated policy being less sensitive to the choice of parameters and the inherent randomness of reinforcement learning.
Authors: Aditya Ravuri, Neil D. Lawrence
Abstract: This paper shows that dimensionality reduction methods such as UMAP and t-SNE can be approximately recast as MAP inference methods corresponding to a model introduced in Ravuri et al. (2023), which describes the graph Laplacian (an estimate of the data precision matrix) using a Wishart distribution, with a mean given by a non-linear covariance function evaluated on the latents. This interpretation offers deeper theoretical and semantic insights into such algorithms, and forges a connection to Gaussian process latent variable models by showing that well-known kernels can be used to describe covariances implied by graph Laplacians. We also introduce tools with which similar dimensionality reduction methods can be studied.
Authors: Markus Heinonen, Ba-Hien Tran, Michael Kampffmeyer, Maurizio Filippone
Abstract: Introducing training-time augmentations is a key technique to enhance generalization and prepare deep neural networks against test-time corruptions. Inspired by the success of generative diffusion models, we propose a novel approach of coupling data mollification, in the form of image noising and blurring, with label smoothing to align predicted label confidences with image degradation. The method is simple to implement, introduces negligible overheads, and can be combined with existing augmentations. We demonstrate improved robustness and uncertainty quantification on the corrupted image benchmarks of CIFAR, TinyImageNet and ImageNet datasets.
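A minimal sketch of the coupling described above, assuming Gaussian noising and a linear confidence-decay schedule (both are our placeholder choices, not necessarily the paper's schedules):

```python
# Hypothetical data mollification with degradation-aware label smoothing:
# the more an image is corrupted, the closer its label is pushed to uniform.
import numpy as np

def mollify(image, label_onehot, num_classes, rng, sigma_max=1.0):
    t = rng.uniform(0.0, 1.0)  # per-sample corruption level in [0, 1]
    noisy = image + t * sigma_max * rng.standard_normal(image.shape)
    conf = 1.0 - t             # label confidence decays with corruption
    smoothed = conf * label_onehot + (1.0 - conf) / num_classes
    return noisy, smoothed     # smoothed label still sums to 1

rng = np.random.default_rng(0)
x_aug, y_aug = mollify(rng.random((32, 32, 3)), np.eye(10)[3],
                       num_classes=10, rng=rng)
```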
Authors: Ismail Alkhouri, Cedric Le Denmat, Yingjie Li, Cunxi Yu, Jia Liu, Rongrong Wang, Alvaro Velasquez
Abstract: Combinatorial Optimization (CO) addresses many important problems, including the challenging Maximum Independent Set (MIS) problem. Alongside exact and heuristic solvers, differentiable approaches have emerged, often using continuous relaxations of ReLU-based or quadratic objectives. Noting that an MIS in a graph is a Maximum Clique (MC) in its complement, we propose a new quadratic formulation for MIS by incorporating an MC term, improving convergence and exploration. We show that every maximal independent set corresponds to a local minimizer, derive conditions with respect to the MIS size, and characterize stationary points. To tackle the non-convexity of the objective, we propose optimizing several initializations in parallel using momentum-based gradient descent, complemented by an efficient MIS checking criterion derived from our theory. We dub our method **p**arallelized **C**lique-Informed **Q**uadratic **O**ptimization for MIS (**pCQO-MIS**). Our experimental results demonstrate the effectiveness of the proposed method compared to exact, heuristic, sampling, and data-centric approaches. Notably, our method avoids the out-of-distribution tuning and reliance on (un)labeled data required by data-centric methods, while achieving superior MIS sizes and competitive runtime relative to their inference time. Additionally, a key advantage of pCQO-MIS is that, unlike exact and heuristic solvers, the runtime scales only with the number of nodes in the graph, not the number of edges. Our code is available at the GitHub repository \href{https://github.com/ledenmat/pCQO-mis-benchmark/tree/refactor}{pCQO-MIS}
URLs: https://github.com/ledenmat/pCQO-mis-benchmark/tree/refactor
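The following sketches the overall recipe summarized in the abstract above: a box-constrained quadratic objective with a complement-graph (clique) term, optimized from several initializations with momentum, followed by an independence check on the rounded solution. The coefficients and the exact objective form are plausible stand-ins rather than the paper's.

```python
# Sketch of clique-informed quadratic optimization for MIS on an
# adjacency matrix `adj` (0/1 entries, zero diagonal). All hyperparameters
# are illustrative assumptions.
import numpy as np

def pcqo_mis_sketch(adj, gamma=2.0, gamma_c=0.5, lr=0.05, beta=0.9,
                    steps=500, restarts=8, seed=0):
    n = adj.shape[0]
    comp = (1 - adj) - np.eye(n)          # adjacency of the complement graph
    rng = np.random.default_rng(seed)
    best = np.zeros(n, dtype=int)
    for _ in range(restarts):             # parallelizable in practice
        x = rng.uniform(0, 1, n)
        v = np.zeros(n)
        for _ in range(steps):
            # grad of f(x) = -sum(x) + (gamma/2) x'Ax - (gamma_c/2) x'A_c x
            g = -np.ones(n) + gamma * adj @ x - gamma_c * comp @ x
            v = beta * v - lr * g         # momentum step
            x = np.clip(x + v, 0.0, 1.0)  # project onto the box [0, 1]^n
        cand = (x > 0.5).astype(int)
        # keep cand only if it is an independent set larger than the best
        if cand @ adj @ cand == 0 and cand.sum() > best.sum():
            best = cand
    return best
```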
Authors: Shivam Barwey, Pinaki Pal, Saumil Patel, Riccardo Balin, Bethany Lusch, Venkatram Vishwanath, Romit Maulik, Ramesh Balakrishnan
Abstract: A graph neural network (GNN) approach is introduced in this work which enables mesh-based three-dimensional super-resolution of fluid flows. In this framework, the GNN is designed to operate not on the full mesh-based field at once, but on localized meshes of elements (or cells) directly. To facilitate mesh-based GNN representations in a manner similar to spectral (or finite) element discretizations, a baseline GNN layer (termed a message passing layer, which updates local node properties) is modified to account for synchronization of coincident graph nodes, rendering compatibility with commonly used element-based mesh connectivities. The architecture is multiscale in nature, combining coarse-scale and fine-scale message passing layer sequences (termed processors) separated by a graph unpooling layer. The coarse-scale processor embeds a query element (alongside a set number of neighboring coarse elements) into a single latent graph representation using coarse-scale synchronized message passing over the element neighborhood, and the fine-scale processor leverages additional message passing operations on this latent graph to correct for interpolation errors. Demonstration studies are performed using hexahedral mesh-based data from Taylor-Green Vortex and backward-facing step flow simulations at Reynolds numbers of 1600 and 3200. Through analysis of both global and local errors, the results ultimately show how the GNN is able to produce accurate super-resolved fields compared to targets in both coarse-scale and multiscale model configurations. Reconstruction errors for fixed architectures were found to increase in proportion to the Reynolds number. Geometry extrapolation studies on a separate cavity flow configuration show promising cross-mesh capabilities of the super-resolution strategy.
Authors: Jake Welde, Nishanth Rao, Pratik Kunapuli, Dinesh Jayaraman, Vijay Kumar
Abstract: Tracking controllers enable robotic systems to accurately follow planned reference trajectories. In particular, reinforcement learning (RL) has shown promise in the synthesis of controllers for systems with complex dynamics and modest online compute budgets. However, the poor sample efficiency of RL and the challenges of reward design make training slow and sometimes unstable, especially for high-dimensional systems. In this work, we leverage the inherent Lie group symmetries of robotic systems with a floating base to mitigate these challenges when learning tracking controllers. We model a general tracking problem as a Markov decision process (MDP) that captures the evolution of both the physical and reference states. Next, we prove that symmetry in the underlying dynamics and running costs leads to an MDP homomorphism, a mapping that allows a policy trained on a lower-dimensional "quotient" MDP to be lifted to an optimal tracking controller for the original system. We compare this symmetry-informed approach to an unstructured baseline, using Proximal Policy Optimization (PPO) to learn tracking controllers for three systems: the Particle (a forced point mass), the Astrobee (a fully-actuated space robot), and the Quadrotor (an underactuated system). Results show that a symmetry-aware approach both accelerates training and reduces tracking error at convergence.
Authors: Jorge Lor\'ia, Anindya Bhadra
Abstract: Neal (1996) proved that infinitely wide shallow Bayesian neural networks (BNN) converge to Gaussian processes (GP) when the network weights have bounded prior variance. Cho & Saul (2009) provided a useful recursive formula for deep kernel processes for relating the covariance kernel of each layer to the layer immediately below. Moreover, they worked out the form of the layer-wise covariance kernel in an explicit manner for several common activation functions. Recent works, including Aitchison et al. (2021), have highlighted that the covariance kernels obtained in this manner are deterministic and hence preclude any possibility of representation learning, which amounts to learning a non-degenerate posterior of a random kernel given the data. To address this, they propose adding artificial noise to the kernel to retain stochasticity, and develop deep kernel inverse Wishart processes. Nonetheless, this artificial noise injection could be critiqued in that it would not naturally emerge in a classic BNN architecture under an infinite-width limit. To address this issue, we show that a Bayesian deep neural network, where each layer width approaches infinity, and all network weights are elliptically distributed with infinite variance, converges to a process with $\alpha$-stable marginals in each layer that has a conditionally Gaussian representation. These conditional random covariance kernels could be recursively linked in the manner of Cho & Saul (2009), even though marginally the process exhibits stable behavior, and hence covariances are not even necessarily defined. We also provide useful generalizations of the recent results of Lor\'ia & Bhadra (2024) on shallow networks to multi-layer networks, and remedy the computational burden of their approach. The computational and statistical benefits over competing approaches stand out in simulations and in demonstrations on benchmark data sets.
Authors: Jinghan Li, Zhicheng Sun, Yadong Mu
Abstract: In the endeavor to make autonomous robots take actions, task planning is a major challenge that requires translating high-level task descriptions to long-horizon action sequences. Despite recent advances in language model agents, they remain prone to planning errors and limited in their ability to plan ahead. To address these limitations in robotic planning, we advocate a self-refining scheme that iteratively refines a draft plan until an equilibrium is reached. Remarkably, this process can be optimized end-to-end from an analytical perspective without the need to curate additional verifiers or reward models, allowing us to train self-refining planners in a simple supervised learning fashion. Meanwhile, a nested equilibrium sequence modeling procedure is devised for efficient closed-loop planning that incorporates useful feedback from the environment (or an internal world model). Our method is evaluated on the VirtualHome-Env benchmark, showing advanced performance with improved scaling w.r.t. inference-time computation. Code is available at https://github.com/Singularity0104/equilibrium-planner.
URLs: https://github.com/Singularity0104/equilibrium-planner
Authors: Wenqing Wang, Yun Fu
Abstract: Audio-driven talking-head generation is a crucial and useful technology for virtual human interaction and film-making. While recent advances have focused on improving image fidelity and lip synchronization, generating accurate emotional expressions remains underexplored. In this paper, we introduce EmoGene, a novel framework for synthesizing high-fidelity, audio-driven video portraits with accurate emotional expressions. Our approach employs a variational autoencoder (VAE)-based audio-to-motion module to generate facial landmarks, which are concatenated with emotional embedding in a motion-to-emotion module to produce emotional landmarks. These landmarks drive a Neural Radiance Fields (NeRF)-based emotion-to-video module to render realistic emotional talking-head videos. Additionally, we propose a pose sampling method to generate natural idle-state (non-speaking) videos for silent audio inputs. Extensive experiments demonstrate that EmoGene outperforms previous methods in generating high-fidelity emotional talking-head videos.
Authors: Akash Kundu, Leopoldo Sarra
Abstract: Designing quantum circuits for specific tasks is challenging due to the exponential growth of the state space. We introduce gadget reinforcement learning (GRL), which integrates reinforcement learning with program synthesis to automatically generate and incorporate composite gates (gadgets) into the action space. This enhances the exploration of parameterized quantum circuits (PQCs) for complex tasks like approximating ground states of quantum Hamiltonians, an NP-hard problem. We evaluate GRL using the transverse field Ising model under typical computational budgets (e.g., 2-3 days of GPU runtime). Our results show improved accuracy, hardware compatibility, and scalability. GRL exhibits robust performance as the size and complexity of the problem increase, even with constrained computational resources. By integrating gadget extraction, GRL facilitates the discovery of reusable circuit components tailored for specific hardware, bridging the gap between algorithmic design and practical implementation. This makes GRL a versatile framework for optimizing quantum circuits with applications in hardware-specific optimizations and variational quantum algorithms. The code is available at: https://github.com/Aqasch/Gadget_RL
Authors: Shuijing Liu, Haochen Xia, Fatemeh Cheraghi Pouria, Kaiwen Hong, Neeloy Chakraborty, Zichao Hu, Joydeep Biswas, Katherine Driggs-Campbell
Abstract: We study the problem of robot navigation in dense and interactive crowds with environmental constraints such as corridors and furniture. Previous methods fail to consider all types of interactions among agents and obstacles, leading to unsafe and inefficient robot paths. In this article, we leverage a graph-based representation of crowded and constrained scenarios and propose a structured framework to learn robot navigation policies with deep reinforcement learning. We first split the representations of different components in the environment and propose a heterogeneous spatio-temporal (st) graph to model distinct interactions among humans, robots, and obstacles. Based on the heterogeneous st-graph, we propose HEIGHT, a novel navigation policy network architecture with different components to capture heterogeneous interactions among entities through space and time. HEIGHT utilizes attention mechanisms to prioritize important interactions and a recurrent network to track changes in the dynamic scene over time, encouraging the robot to avoid collisions adaptively. Through extensive simulation and real-world experiments, we demonstrate that HEIGHT outperforms state-of-the-art baselines in terms of success and efficiency in challenging navigation scenarios. Furthermore, we demonstrate that our pipeline achieves better zero-shot generalization capability than previous works when the densities of humans and obstacles change. More videos are available at https://sites.google.com/view/crowdnav-height/home.
Authors: Andrea Bulgarelli, Elia Cellini, Alessandro Nada
Abstract: Non-equilibrium Markov Chain Monte Carlo (NE-MCMC) simulations provide a well-understood framework based on Jarzynski's equality to sample from a target probability distribution. By driving a base probability distribution out of equilibrium, observables are computed without the need to thermalize. If the base distribution is characterized by mild autocorrelations, this approach provides a way to mitigate critical slowing down. Out-of-equilibrium evolutions share the same framework of flow-based approaches and they can be naturally combined into a novel architecture called Stochastic Normalizing Flows (SNFs). In this work we present the first implementation of SNFs for $\mathrm{SU}(3)$ lattice gauge theory in 4 dimensions, defined by introducing gauge-equivariant layers between out-of-equilibrium Monte Carlo updates. The core of our analysis is focused on the promising scaling properties of this architecture with the degrees of freedom of the system, which are directly inherited from NE-MCMC. Finally, we discuss how systematic improvements of this approach can realistically lead to a general and yet efficient sampling strategy at fine lattice spacings for observables affected by long autocorrelation times.
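For reference, the Jarzynski identity underlying NE-MCMC reweighting, and the resulting estimator of target-distribution observables, can be written schematically (in units where $k_B T = 1$) as:

```latex
% W is the work accumulated along a non-equilibrium trajectory driving the
% base distribution toward the target; angle brackets denote averages over
% such trajectories.
\big\langle e^{-W} \big\rangle = e^{-\Delta F},
\qquad
\langle O \rangle_{\mathrm{target}}
  = \frac{\big\langle O \, e^{-W} \big\rangle}{\big\langle e^{-W} \big\rangle}
```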
Authors: Bharath Raj, Garvit Suri, Vikrant Dewangan, Raghav Sonavane
Abstract: Traditional greedy tokenization methods have been a critical step in Natural Language Processing (NLP), influencing how text is converted into tokens and directly impacting model performance. While subword tokenizers like Byte-Pair Encoding (BPE) are widely used, questions remain about their optimality across model scales and languages. In this work, we demonstrate through extensive experiments that an optimal BPE configuration significantly reduces token count compared to greedy segmentation, yielding substantial token savings and performance benefits, particularly for smaller models. We evaluate tokenization performance across various intrinsic and extrinsic tasks, including generation and classification. Our findings suggest that compression-optimized tokenization strategies could provide substantial advantages for multilingual and low-resource language applications, highlighting a promising direction for further research and inclusive NLP.
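To illustrate the greedy-versus-optimal gap at the segmentation level, here is a toy contrast between longest-match tokenization and a dynamic program that minimizes token count over a fixed vocabulary. This illustrates the phenomenon only; it is not the paper's BPE training or evaluation procedure.

```python
# Greedy longest-match vs. minimum-token segmentation over a toy vocabulary.
def greedy_tokenize(text, vocab):
    tokens, i = [], 0
    while i < len(text):
        for j in range(len(text), i, -1):      # try the longest match first
            if text[i:j] in vocab:
                tokens.append(text[i:j]); i = j; break
        else:
            tokens.append(text[i]); i += 1     # fall back to a single char
    return tokens

def optimal_tokenize(text, vocab):
    n = len(text)
    best = [0] + [n + 1] * n                   # best[i]: min tokens for text[:i]
    back = [0] * (n + 1)
    for i in range(1, n + 1):
        for j in range(max(0, i - 12), i):     # cap candidate piece length
            piece = text[j:i]
            if (piece in vocab or len(piece) == 1) and best[j] + 1 < best[i]:
                best[i], back[i] = best[j] + 1, j
    tokens, i = [], n
    while i > 0:                               # recover the segmentation
        tokens.append(text[back[i]:i]); i = back[i]
    return tokens[::-1]

vocab = {"t", "th", "the", "heir", "h", "e", "i", "r"}
print(greedy_tokenize("their", vocab))   # ['the', 'i', 'r']  -> 3 tokens
print(optimal_tokenize("their", vocab))  # ['t', 'heir']      -> 2 tokens
```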
Authors: Raanan Y. Rohekar, Yaniv Gurwicz, Sungduk Yu, Estelle Aflalo, Vasudev Lal
Abstract: Do generative pre-trained transformer (GPT) models, trained only to predict the next token, implicitly learn a world model from which a sequence is generated one token at a time? We address this question by deriving a causal interpretation of the attention mechanism in GPT, and suggesting a causal world model that arises from this interpretation. Furthermore, we propose that GPT models, at inference time, can be utilized for zero-shot causal structure learning for input sequences and present a confidence score. Empirical evaluation is conducted in a controlled environment using the setup and rules of the Othello and Chess strategy games. A GPT, pre-trained on real-world games played with the intention of winning, is tested on out-of-distribution synthetic data consisting of sequences of random legal moves. We find that the GPT model is likely to generate legal next moves for out-of-distribution sequences for which a causal structure is encoded in the attention mechanism with high confidence. In cases for which the GPT model generates illegal moves it also fails to capture any causal structure.
Authors: Core Francisco Park, Andrew Lee, Ekdeep Singh Lubana, Yongyi Yang, Maya Okawa, Kento Nishi, Martin Wattenberg, Hidenori Tanaka
Abstract: Recent work has demonstrated that semantics specified by pretraining data influence how representations of different concepts are organized in a large language model (LLM). However, given the open-ended nature of LLMs, e.g., their ability to in-context learn, we can ask whether models alter these pretraining semantics to adopt alternative, context-specified ones. Specifically, if we provide in-context exemplars wherein a concept plays a different role than what the pretraining data suggests, do models reorganize their representations in accordance with these novel semantics? To answer this question, we take inspiration from the theory of conceptual role semantics and define a toy "graph tracing" task wherein the nodes of the graph are referenced via concepts seen during training (e.g., apple, bird, etc.) and the connectivity of the graph is defined via some predefined structure (e.g., a square grid). Given exemplars that indicate traces of random walks on the graph, we analyze intermediate representations of the model and find that as the amount of context is scaled, there is a sudden re-organization from pretrained semantic representations to in-context representations aligned with the graph structure. Further, we find that when reference concepts have correlations in their semantics (e.g., Monday, Tuesday, etc.), the context-specified graph structure is still present in the representations, but is unable to dominate the pretrained structure. To explain these results, we analogize our task to energy minimization for a predefined graph topology, providing evidence towards an implicit optimization process to infer context-specified semantics. Overall, our findings indicate scaling context-size can flexibly re-organize model representations, possibly unlocking novel capabilities.
Authors: Enes Arda, Aylin Yener
Abstract: This paper introduces an information-theoretic framework for text summarization. We define the summarizer rate-distortion function and show that it provides a fundamental lower bound on summarizer performance. We describe an iterative procedure, similar to the Blahut-Arimoto algorithm, for computing this function. To handle real-world text datasets, we also propose a practical method that can calculate the summarizer rate-distortion function with limited data. Finally, we empirically confirm our theoretical results by comparing the summarizer rate-distortion function with the performances of different summarizers used in practice.
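For context, the classical rate-distortion function that the summarizer analogue adapts reads as follows, with $X$ a source document, $\hat{Y}$ a summary, and $d$ a distortion measure; the notation here is ours, not necessarily the paper's.

```latex
R(D) \;=\; \min_{\,p(\hat{y} \mid x)\;:\;\mathbb{E}\left[d(X,\hat{Y})\right] \,\le\, D}\; I(X; \hat{Y})
```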
Authors: Ritwik Vashistha, Arya Farahi
Abstract: As probabilistic models continue to permeate various facets of our society and contribute to scientific advancements, it becomes necessary to go beyond traditional metrics such as predictive accuracy and error rates and assess their trustworthiness. Grounded in the competence-based theory of trust, this work formalizes the I-trustworthy framework -- a novel framework for assessing the trustworthiness of probabilistic classifiers for inference tasks by linking local calibration to trustworthiness. To assess I-trustworthiness, we use the local calibration error (LCE) and develop a hypothesis-testing method. This method utilizes a kernel-based test statistic, the Kernel Local Calibration Error (KLCE), to test the local calibration of a probabilistic classifier. This study provides theoretical guarantees by offering convergence bounds for an unbiased estimator of KLCE. Additionally, we present a diagnostic tool designed to identify and measure biases in cases of miscalibration. The effectiveness of the proposed test statistic is demonstrated through its application to both simulated and real-world datasets. Finally, the LCE of related recalibration methods is studied, and we provide evidence that existing methods are insufficient to achieve I-trustworthiness.
Authors: Yilin Wu, Ran Tian, Gokul Swamy, Andrea Bajcsy
Abstract: While generative robot policies have demonstrated significant potential in learning complex, multimodal behaviors from demonstrations, they still exhibit diverse failures at deployment-time. Policy steering offers an elegant solution to reducing the chance of failure by using an external verifier to select from low-level actions proposed by an imperfect generative policy. Here, one might hope to use a Vision Language Model (VLM) as a verifier, leveraging its open-world reasoning capabilities. However, off-the-shelf VLMs struggle to understand the consequences of low-level robot actions as they are represented fundamentally differently than the text and images the VLM was trained on. In response, we propose FOREWARN, a novel framework to unlock the potential of VLMs as open-vocabulary verifiers for runtime policy steering. Our key idea is to decouple the VLM's burden of predicting action outcomes (foresight) from evaluation (forethought). For foresight, we leverage a latent world model to imagine future latent states given diverse low-level action plans. For forethought, we align the VLM with these predicted latent states to reason about the consequences of actions in its native representation--natural language--and effectively filter proposed plans. We validate our framework across diverse robotic manipulation tasks, demonstrating its ability to bridge representational gaps and provide robust, generalizable policy steering. Videos can be found on the project website: https://yilin-wu98.github.io/forewarn/.
Authors: Wenhao Zheng, Yixiao Chen, Weitong Zhang, Souvik Kundu, Yun Li, Zhengzhong Liu, Eric P. Xing, Hongyi Wang, Huaxiu Yao
Abstract: Large language models have achieved remarkable success in various tasks but suffer from high computational costs during inference, limiting their deployment in resource-constrained applications. To address this issue, we propose a novel Collaborative Inference with Token-lEvel Routing (CITER) framework that enables efficient collaboration between small and large language models (SLMs \& LLMs) through a token-level routing strategy. Specifically, CITER routes non-critical tokens to an SLM for efficiency and routes critical tokens to an LLM for generalization quality. We formulate router training as a policy optimization problem, where the router receives rewards based on both the quality of predictions and the inference costs of generation. This allows the router to learn to predict token-level routing scores and make routing decisions based on both the current token and the future impact of its decisions. To further accelerate the reward evaluation process, we introduce a shortcut that significantly reduces the cost of reward estimation and improves the practicality of our approach. Extensive experiments on five benchmark datasets demonstrate that CITER reduces the inference costs while preserving high-quality generation, offering a promising solution for real-time and resource-constrained applications. Our data and code are available at https://github.com/aiming-lab/CITER.
Authors: Shuanghao Bai, Wanqi Zhou, Pengxiang Ding, Wei Zhao, Donglin Wang, Badong Chen
Abstract: Behavior Cloning (BC) is a widely adopted visual imitation learning method in robot manipulation. Current BC approaches often enhance generalization by leveraging large datasets and incorporating additional visual and textual modalities to capture more diverse information. However, these methods overlook whether the learned representations contain redundant information and lack a solid theoretical foundation to guide the learning process. To address these limitations, we adopt an information-theoretic perspective and introduce mutual information to quantify and mitigate redundancy in latent representations. Building on this, we incorporate the Information Bottleneck (IB) principle into BC, which extends the idea of reducing redundancy by providing a structured framework for compressing irrelevant information while preserving task-relevant features. This work presents the first comprehensive study on redundancy in latent representations across various methods, backbones, and experimental settings, while extending the generalizability of the IB to BC. Extensive experiments and analyses on the CortexBench and LIBERO benchmarks demonstrate significant performance improvements with IB, underscoring the importance of reducing input data redundancy and highlighting its practical value for real-world applications. Project Page: https://baishuanghao.github.io/BC-IB.github.io.
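Schematically, the Information Bottleneck objective adapted to behavior cloning can be written as below, with $X$ the observation, $Z$ the learned latent representation, $A$ the expert action, and $\beta$ trading compression against task-relevant information; the precise variational form optimized in the paper may differ.

```latex
\min_{p(z \mid x)} \; I(X; Z) \;-\; \beta\, I(Z; A)
```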
Authors: Xiaole Zhang, Peiyu Zhang, Xiongye Xiao, Shixuan Li, Vasileios Tzoumas, Vijay Gupta, Paul Bogdan
Abstract: Integer-order calculus often falls short in capturing the long-range dependencies and memory effects found in many real-world processes. Fractional calculus addresses these gaps via fractional-order integrals and derivatives, but fractional-order dynamical systems pose substantial challenges in system identification and optimal control due to the lack of standard control methodologies. In this paper, we theoretically derive the optimal control via linear quadratic regulator (LQR) for fractional-order linear time-invariant (FOLTI) systems and develop an end-to-end deep learning framework based on this theoretical foundation. Our approach establishes a rigorous mathematical model, derives analytical solutions, and incorporates deep learning to achieve data-driven optimal control of FOLTI systems. Our key contributions include: (i) proposing an innovative system identification and control strategy for FOLTI systems, (ii) developing the first end-to-end data-driven learning framework, Fractional-Order Learning for Optimal Control (FOLOC), that learns control policies from observed trajectories, and (iii) deriving a theoretical analysis of sample complexity to quantify the number of samples required for accurate optimal control in complex real-world problems. Experimental results indicate that our method accurately approximates fractional-order system behaviors without relying on Gaussian noise assumptions, pointing to promising avenues for advanced optimal control.
Authors: Dinesh Kumar, Amin Ghorbanpour, Kin Yen, Iman Soltani
Abstract: Bathymetry, the study of underwater topography, relies on sonar mapping of submerged structures. These measurements, critical for infrastructure health monitoring, often require expensive instrumentation. The high financial risk associated with sensor damage or vessel loss creates a reluctance to deploy uncrewed surface vessels (USVs) for bathymetry. However, crewed-boat bathymetry operations are costly, pose hazards to personnel, and frequently fail to achieve the stable conditions necessary for bathymetry data collection, especially under high currents. Further research is essential to advance autonomous control, navigation, and data processing technologies, with a particular focus on bathymetry. There is a notable lack of accessible hardware platforms that allow for integrated research in both bathymetry-focused autonomous control and navigation, as well as data evaluation and processing. This paper addresses this gap through the design and implementation of two complementary USV systems tailored for uncrewed bathymetry research. This includes a low-cost USV for Navigation And Control research (NAC-USV) and a second, high-end USV equipped with a high-resolution multi-beam sonar and the associated hardware for Bathymetry data quality Evaluation and Post-processing research (BEP-USV). The NAC-USV facilitates the investigation of autonomous, fail-safe navigation and control, emphasizing the stability requirements for high-quality bathymetry data collection while minimizing the risk to equipment. The BEP-USV, which mirrors the NAC-USV hardware, is then used for additional control validation and in-depth exploration of bathymetry data evaluation and post-processing methodologies. We detail the design and implementation of both systems, and open-source the design. Furthermore, we demonstrate the system's effectiveness in a range of operational scenarios.
Authors: Ching-Hua Lee, Chouchang Yang, Jaejin Cho, Yashas Malur Saidutta, Rakshith Sharma Srinivasa, Yilin Shen, Hongxia Jin
Abstract: Denoising diffusion probabilistic models (DDPMs) can be utilized for recovering a clean signal from its degraded observation(s) by conditioning the model on the degraded signal. The degraded signals are themselves contaminated versions of the clean signals; due to this correlation, they may encompass certain useful information about the target clean data distribution. However, existing adoption of the standard Gaussian as the prior distribution in turn discards such information, resulting in sub-optimal performance. In this paper, we propose to improve conditional DDPMs for signal restoration by leveraging a more informative prior that is jointly learned with the diffusion model. The proposed framework, called RestoreGrad, seamlessly integrates DDPMs into the variational autoencoder framework and exploits the correlation between the degraded and clean signals to encode a better diffusion prior. On speech and image restoration tasks, we show that RestoreGrad demonstrates faster convergence (5-10 times fewer training steps) to achieve better quality of restored signals over existing DDPM baselines, and improved robustness to using fewer sampling steps at inference time (2-2.5 times fewer), demonstrating the advantages of leveraging a jointly learned prior for efficiency improvements in the diffusion process.
Authors: Md Abrar Jahin, Soudeep Shahriar, M. F. Mridha, Md. Jakir Hossen, Nilanjan Dey
Abstract: Soybean leaf disease detection is critical for agricultural productivity but faces challenges due to visually similar symptoms and limited interpretability in conventional methods. While Convolutional Neural Networks (CNNs) excel in spatial feature extraction, they often neglect inter-image relational dependencies, leading to misclassifications. This paper proposes an interpretable hybrid Sequential CNN-Graph Neural Network (GNN) framework that synergizes MobileNetV2 for localized feature extraction and GraphSAGE for relational modeling. The framework constructs a graph where nodes represent leaf images, with edges defined by cosine similarity-based adjacency matrices and adaptive neighborhood sampling. This design captures fine-grained lesion features and global symptom patterns, addressing inter-class similarity challenges. Cross-modal interpretability is achieved via Grad-CAM and Eigen-CAM visualizations, generating heatmaps to highlight disease-influential regions. Evaluated on a dataset of ten soybean leaf diseases, the model achieves $97.16\%$ accuracy, surpassing standalone CNNs ($\le95.04\%$) and traditional machine learning models ($\le77.05\%$). Ablation studies validate the sequential architecture's superiority over parallel or single-model configurations. With only 2.3 million parameters, the lightweight MobileNetV2-GraphSAGE combination ensures computational efficiency, enabling real-time deployment in resource-constrained environments. The proposed approach bridges the gap between accurate classification and practical applicability, offering a robust, interpretable tool for agricultural diagnostics while advancing CNN-GNN integration in plant pathology research.
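One plausible construction of the cosine similarity-based adjacency with neighborhood sampling described above, operating on per-image CNN embeddings; the choice of k and the similarity threshold are our assumptions.

```python
# Build a symmetric image-level graph: connect each image to its top-k most
# cosine-similar neighbors whose similarity clears a threshold.
import numpy as np

def cosine_adjacency(embeddings, k=8, threshold=0.7):
    """embeddings: (num_images, feat_dim) array of CNN features."""
    z = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    sim = z @ z.T
    np.fill_diagonal(sim, -np.inf)          # exclude self-loops
    n = sim.shape[0]
    adj = np.zeros((n, n), dtype=bool)
    for i in range(n):
        nbrs = np.argsort(sim[i])[-k:]      # k most similar images
        keep = nbrs[sim[i, nbrs] >= threshold]
        adj[i, keep] = True
    return adj | adj.T                      # symmetrize for message passing
```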
Authors: Yufei Wang, Ziyu Wang, Mino Nakura, Pratik Bhowal, Chia-Liang Kuo, Yi-Ting Chen, Zackory Erickson, David Held
Abstract: This paper presents ArticuBot, in which a single learned policy enables a robotics system to open diverse categories of unseen articulated objects in the real world. This task has long been challenging for robotics due to the large variations in the geometry, size, and articulation types of such objects. Our system, ArticuBot, consists of three parts: generating a large number of demonstrations in physics-based simulation, distilling all generated demonstrations into a point cloud-based neural policy via imitation learning, and performing zero-shot sim2real transfer to real robotics systems. Utilizing sampling-based grasping and motion planning, our demonstration generation pipeline is fast and effective, generating a total of 42.3k demonstrations over 322 training articulated objects. For policy learning, we propose a novel hierarchical policy representation, in which the high-level policy learns the sub-goal for the end-effector, and the low-level policy learns how to move the end-effector conditioned on the predicted goal. We demonstrate that this hierarchical approach achieves much better object-level generalization compared to the non-hierarchical version. We further propose a novel weighted displacement model for the high-level policy that grounds the prediction into the existing 3D structure of the scene, outperforming alternative policy representations. We show that our learned policy can zero-shot transfer to three different real robot settings: a fixed table-top Franka arm across two different labs, and an X-Arm on a mobile base, opening multiple unseen articulated objects across two labs, real lounges, and kitchens. Videos and code can be found on our project website: https://articubot.github.io/.
Authors: Daniele Malpetti, Marco Scutari, Francesco Gualdi, Jessica van Setten, Sander van der Laan, Saskia Haitjema, Aaron Mark Lee, Isabelle Hering, Francesca Mangili
Abstract: Federated learning leverages data across institutions to improve clinical discovery while complying with data-sharing restrictions and protecting patient privacy. As the evolution of biobanks in genetics and systems biology has shown, accessing more extensive and varied data pools leads to faster and more robust exploration and translation of results. More widespread use of federated learning may have the same impact in bioinformatics, allowing access to many combinations of genotypic, phenotypic and environmental information that are under-covered or not included in existing biobanks. This paper reviews the methodological, infrastructural and legal issues that academic and clinical institutions must address before implementing federated learning. Finally, we provide recommendations for the reliable use of federated learning and its effective translation into clinical practice.
Authors: Anush Lakshman Sivaraman, Kojo Adu-Gyamfi, Ibne Farabi Shihab, Anuj Sharma
Abstract: Adverse weather conditions challenge safe transportation, necessitating robust real-time weather detection from traffic camera imagery. We propose a novel framework combining CycleGAN-based domain adaptation with efficient contrastive learning to enhance weather classification, particularly in low-light nighttime conditions. Our approach leverages the lightweight SigLIP-2 model, which employs pairwise sigmoid loss to reduce computational demands, integrated with CycleGAN to transform nighttime images into day-like representations while preserving weather cues. Evaluated on an Iowa Department of Transportation dataset, the baseline EVA-02 model with CLIP achieves a per-class overall accuracy of 96.55\% across three weather conditions (No Precipitation, Rain, Snow) and a day/night overall accuracy of 96.55\%, but shows a significant day-night gap (97.21\% day vs.\ 63.40\% night). With CycleGAN, EVA-02 improves to 97.01\% per-class accuracy and 96.85\% day/night accuracy, boosting nighttime performance to 82.45\%. Our Vision-SigLIP-2 + Text-SigLIP-2 + CycleGAN + Contrastive configuration excels in nighttime scenarios, achieving the highest nighttime accuracy of 85.90\%, with 94.00\% per-class accuracy and 93.35\% day/night accuracy. This model reduces training time by 89\% (from 6 hours to 40 minutes) and inference time by 80\% (from 15 seconds to 3 seconds) compared to EVA-02. By narrowing the day-night performance gap from 33.81 to 8.90 percentage points, our framework provides a scalable, efficient solution for all-weather classification using existing camera infrastructure.
Authors: Tyler Chen, Archan Ray, Akshay Seshadri, Dylan Herman, Bao Bach, Pranav Deshpande, Abhishek Som, Niraj Kumar, Marco Pistoia
Abstract: The $k$-means algorithm (Lloyd's algorithm) is a widely used method for clustering unlabeled data. A key bottleneck of the $k$-means algorithm is that each iteration requires time linear in the number of data points, which can be expensive in big data applications. This was improved in recent works proposing quantum and quantum-inspired classical algorithms to approximate the $k$-means algorithm locally, in time depending only logarithmically on the number of data points (along with data dependent parameters) [$q$-means: A quantum algorithm for unsupervised machine learning; Kerenidis, Landman, Luongo, and Prakash, NeurIPS 2019; Do you know what $q$-means?, Doriguello, Luongo, Tang]. In this work, we describe a simple randomized mini-batch $k$-means algorithm and a quantum algorithm inspired by the classical algorithm. We prove worst-case guarantees that significantly improve upon the bounds for previous algorithms. Our improvements are due to a careful use of uniform sampling, which preserves certain symmetries of the $k$-means problem that are not preserved in previous algorithms that use data norm-based sampling.
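As background, a minimal classical mini-batch $k$-means with uniform sampling looks as follows (Sculley-style per-center step sizes); this is illustrative background only, not the paper's algorithm or its quantum counterpart.

```python
# Mini-batch k-means: each iteration touches only a uniform sample of the
# data, so per-iteration cost is independent of the dataset size.
import numpy as np

def minibatch_kmeans(X, k, batch=64, iters=200, seed=0):
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), k, replace=False)].astype(float)
    counts = np.zeros(k)
    for _ in range(iters):
        S = X[rng.integers(0, len(X), batch)]           # uniform mini-batch
        d2 = ((S[:, None, :] - centers[None]) ** 2).sum(-1)
        assign = d2.argmin(axis=1)                      # nearest center
        for j in range(k):
            pts = S[assign == j]
            if len(pts):
                counts[j] += len(pts)
                eta = len(pts) / counts[j]              # decaying step size
                centers[j] += eta * (pts.mean(axis=0) - centers[j])
    return centers
```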
Authors: Rui Xin, Niloofar Mireshghallah, Shuyue Stella Li, Michael Duan, Hyunwoo Kim, Yejin Choi, Yulia Tsvetkov, Sewoong Oh, Pang Wei Koh
Abstract: Sanitizing sensitive text data typically involves removing personally identifiable information (PII) or generating synthetic data under the assumption that these methods adequately protect privacy; however, their effectiveness is often assessed only by measuring the leakage of explicit identifiers, ignoring nuanced textual markers that can lead to re-identification. We challenge the above illusion of privacy by proposing a new framework that evaluates re-identification attacks to quantify individual privacy risks upon data release. Our approach shows that seemingly innocuous auxiliary information -- such as routine social activities -- can be used to infer sensitive attributes like age or substance use history from sanitized data. For instance, we demonstrate that Azure's commercial PII removal tool fails to protect 74\% of information in the MedQA dataset. Although differential privacy mitigates these risks to some extent, it significantly reduces the utility of the sanitized text for downstream tasks. Our findings indicate that current sanitization techniques offer a \textit{false sense of privacy}, highlighting the need for more robust methods that protect against semantic-level information leakage.
Authors: Tygo Bloem, Filip Ilievski
Abstract: Meme clustering is critical for toxicity detection, virality modeling, and typing, but it has received little attention in previous research. Clustering similar Internet memes is challenging due to their multimodality, cultural context, and adaptability. Existing approaches rely on databases, overlook semantics, and struggle to handle diverse dimensions of similarity. This paper introduces a novel method that uses template-based matching with multi-dimensional similarity features, thus eliminating the need for predefined databases and supporting adaptive matching. Memes are clustered using local and global features across similarity categories such as form, visual content, text, and identity. Our combined approach outperforms existing clustering methods, producing more consistent and coherent clusters, while similarity-based feature sets enable adaptability and align with human intuition. We make all supporting code publicly available to support subsequent research.
Authors: Liu Junchi, Tang Ying, Tretiak Sergei, Duan Wenhui, Zhou Liujiang
Abstract: Recent advances in machine learning have demonstrated the enormous utility of deep learning approaches, particularly Graph Neural Networks (GNNs), for materials science. These methods have emerged as powerful tools for high-throughput prediction of material properties, offering a compelling enhancement and alternative to traditional first-principles calculations. While the community has predominantly focused on developing increasingly complex and universal models to enhance predictive accuracy, such approaches often lack physical interpretability and insights into materials behavior. Here, we introduce a novel computational paradigm, Self-Adaptable Graph Attention Networks integrated with Symbolic Regression (SA-GAT-SR), that synergistically combines the predictive capability of GNNs with the interpretative power of symbolic regression. Our framework employs a self-adaptable encoding algorithm that automatically identifies and adjusts attention weights to screen critical features from an expansive 180-dimensional feature space while maintaining O(n) computational scaling. The integrated SR module subsequently distills these features into compact analytical expressions that explicitly reveal quantum-mechanically meaningful relationships, achieving a 23-fold acceleration compared to conventional SR implementations that rely heavily on features derived from first-principles calculations as input. This work suggests a new framework in computational materials science, bridging the gap between predictive accuracy and physical interpretability, offering valuable physical insights into material behavior.