new Why does Knowledge Distillation Work? Rethink its Attention and Fidelity Mechanism

Authors: Chenqi Guo, Shiwei Zhong, Xiaofeng Liu, Qianli Feng, Yinglong Ma

Abstract: Does Knowledge Distillation (KD) really work? Conventional wisdom viewed it as a knowledge transfer procedure where a perfect mimicry of the student to its teacher is desired. However, paradoxical studies indicate that closely replicating the teacher's behavior does not consistently improve student generalization, posing questions on its possible causes. Confronted with this gap, we hypothesize that diverse attentions in teachers contribute to better student generalization at the expense of reduced fidelity in ensemble KD setups. By increasing data augmentation strengths, our key findings reveal a decrease in the Intersection over Union (IoU) of attentions between teacher models, leading to reduced student overfitting and decreased fidelity. We propose this low-fidelity phenomenon as an underlying characteristic rather than a pathology when training KD. This suggests that stronger data augmentation fosters a broader perspective provided by the divergent teacher ensemble and lower student-teacher mutual information, benefiting generalization performance. These insights clarify the mechanism on low-fidelity phenomenon in KD. Thus, we offer new perspectives on optimizing student model performance, by emphasizing increased diversity in teacher attentions and reduced mimicry behavior between teachers and student.

new On the weight dynamics of learning networks

Authors: Nahal Sharafi, Christoph Martin, Sarah Hallerberg

Abstract: Neural networks have become a widely adopted tool for tackling a variety of problems in machine learning and artificial intelligence. In this contribution we use the mathematical framework of local stability analysis to gain a deeper understanding of the learning dynamics of feed forward neural networks. Therefore, we derive equations for the tangent operator of the learning dynamics of three-layer networks learning regression tasks. The results are valid for an arbitrary numbers of nodes and arbitrary choices of activation functions. Applying the results to a network learning a regression task, we investigate numerically, how stability indicators relate to the final training-loss. Although the specific results vary with different choices of initial conditions and activation functions, we demonstrate that it is possible to predict the final training loss, by monitoring finite-time Lyapunov exponents or covariant Lyapunov vectors during the training process.

new Leveraging Sub-Optimal Data for Human-in-the-Loop Reinforcement Learning

Authors: Calarina Muslimani, Matthew E. Taylor

Abstract: To create useful reinforcement learning (RL) agents, step zero is to design a suitable reward function that captures the nuances of the task. However, reward engineering can be a difficult and time-consuming process. Instead, human-in-the-loop (HitL) RL allows agents to learn reward functions from human feedback. Despite recent successes, many of the HitL RL methods still require numerous human interactions to learn successful reward functions. To improve the feedback efficiency of HitL RL methods (i.e., require less feedback), this paper introduces Sub-optimal Data Pre-training, SDP, an approach that leverages reward-free, sub-optimal data to improve scalar- and preference-based HitL RL algorithms. In SDP, we start by pseudo-labeling all low-quality data with rewards of zero. Through this process, we obtain free reward labels to pre-train our reward model. This pre-training phase provides the reward model a head start in learning, whereby it can identify that low-quality transitions should have a low reward, all without any actual feedback. Through extensive experiments with a simulated teacher, we demonstrate that SDP can significantly improve or achieve competitive performance with state-of-the-art (SOTA) HitL RL algorithms across nine robotic manipulation and locomotion tasks.

new Soft Preference Optimization: Aligning Language Models to Expert Distributions

Authors: Arsalan Sharifnassab, Sina Ghiassian, Saber Salehkaleybar, Surya Kanoria, Dale Schuurmans

Abstract: We propose Soft Preference Optimization (SPO), a method for aligning generative models, such as Large Language Models (LLMs), with human preferences, without the need for a reward model. SPO optimizes model outputs directly over a preference dataset through a natural loss function that integrates preference loss with a regularization term across the model's entire output distribution rather than limiting it to the preference dataset. Although SPO does not require the assumption of an existing underlying reward model, we demonstrate that, under the Bradley-Terry (BT) model assumption, it converges to a softmax of scaled rewards, with the distribution's "softness" adjustable via the softmax exponent, an algorithm parameter. We showcase SPO's methodology, its theoretical foundation, and its comparative advantages in simplicity, computational efficiency, and alignment precision.

new Error Exponent in Agnostic PAC Learning

Authors: Adi Hendel, Meir Feder

Abstract: Statistical learning theory and the Probably Approximately Correct (PAC) criterion are the common approach to mathematical learning theory. PAC is widely used to analyze learning problems and algorithms, and have been studied thoroughly. Uniform worst case bounds on the convergence rate have been well established using, e.g., VC theory or Radamacher complexity. However, in a typical scenario the performance could be much better. In this paper, we consider PAC learning using a somewhat different tradeoff, the error exponent - a well established analysis method in Information Theory - which describes the exponential behavior of the probability that the risk will exceed a certain threshold as function of the sample size. We focus on binary classification and find, under some stability assumptions, an improved distribution dependent error exponent for a wide range of problems, establishing the exponential behavior of the PAC error probability in agnostic learning. Interestingly, under these assumptions, agnostic learning may have the same error exponent as realizable learning. The error exponent criterion can be applied to analyze knowledge distillation, a problem that so far lacks a theoretical analysis.

new ICU Bloodstream Infection Prediction: A Transformer-Based Approach for EHR Analysis

Authors: Ortal Hirszowicz, Dvir Aran

Abstract: We introduce RatchetEHR, a novel transformer-based framework designed for the predictive analysis of electronic health records (EHR) data in intensive care unit (ICU) settings, with a specific focus on bloodstream infection (BSI) prediction. Leveraging the MIMIC-IV dataset, RatchetEHR demonstrates superior predictive performance compared to other methods, including RNN, LSTM, and XGBoost, particularly due to its advanced handling of sequential and temporal EHR data. A key innovation in RatchetEHR is the integration of the Graph Convolutional Transformer (GCT) component, which significantly enhances the ability to identify hidden structural relationships within EHR data, resulting in more accurate clinical predictions. Through SHAP value analysis, we provide insights into influential features for BSI prediction. RatchetEHR integrates multiple advancements in deep learning which together provide accurate predictions even with a relatively small sample size and highly imbalanced dataset. This study contributes to medical informatics by showcasing the application of advanced AI techniques in healthcare and sets a foundation for further research to optimize these capabilities in EHR data analysis.

new Locality Regularized Reconstruction: Structured Sparsity and Delaunay Triangulations

Authors: Marshall Mueller, James M. Murphy, Abiy Tasissa

Abstract: Linear representation learning is widely studied due to its conceptual simplicity and empirical utility in tasks such as compression, classification, and feature extraction. Given a set of points $[\mathbf{x}_1, \mathbf{x}_2, \ldots, \mathbf{x}_n] = \mathbf{X} \in \mathbb{R}^{d \times n}$ and a vector $\mathbf{y} \in \mathbb{R}^d$, the goal is to find coefficients $\mathbf{w} \in \mathbb{R}^n$ so that $\mathbf{X} \mathbf{w} \approx \mathbf{y}$, subject to some desired structure on $\mathbf{w}$. In this work we seek $\mathbf{w}$ that forms a local reconstruction of $\mathbf{y}$ by solving a regularized least squares regression problem. We obtain local solutions through a locality function that promotes the use of columns of $\mathbf{X}$ that are close to $\mathbf{y}$ when used as a regularization term. We prove that, for all levels of regularization and under a mild condition that the columns of $\mathbf{X}$ have a unique Delaunay triangulation, the optimal coefficients' number of non-zero entries is upper bounded by $d+1$, thereby providing local sparse solutions when $d \ll n$. Under the same condition we also show that for any $\mathbf{y}$ contained in the convex hull of $\mathbf{X}$ there exists a regime of regularization parameter such that the optimal coefficients are supported on the vertices of the Delaunay simplex containing $\mathbf{y}$. This provides an interpretation of the sparsity as having structure obtained implicitly from the Delaunay triangulation of $\mathbf{X}$. We demonstrate that our locality regularized problem can be solved in comparable time to other methods that identify the containing Delaunay simplex.

new Communication-Efficient Training Workload Balancing for Decentralized Multi-Agent Learning

Authors: Seyed Mahmoud Sajjadi Mohammadabadi, Lei Yang, Feng Yan, Junshan Zhang

Abstract: Decentralized Multi-agent Learning (DML) enables collaborative model training while preserving data privacy. However, inherent heterogeneity in agents' resources (computation, communication, and task size) may lead to substantial variations in training time. This heterogeneity creates a bottleneck, lengthening the overall training time due to straggler effects and potentially wasting spare resources of faster agents. To minimize training time in heterogeneous environments, we present a Communication-Efficient Training Workload Balancing for Decentralized Multi-Agent Learning (ComDML), which balances the workload among agents through a decentralized approach. Leveraging local-loss split training, ComDML enables parallel updates, where slower agents offload part of their workload to faster agents. To minimize the overall training time, ComDML optimizes the workload balancing by jointly considering the communication and computation capacities of agents, which hinges upon integer programming. A dynamic decentralized pairing scheduler is developed to efficiently pair agents and determine optimal offloading amounts. We prove that in ComDML, both slower and faster agents' models converge, for convex and non-convex functions. Furthermore, extensive experimental results on popular datasets (CIFAR-10, CIFAR-100, and CINIC-10) and their non-I.I.D. variants, with large models such as ResNet-56 and ResNet-110, demonstrate that ComDML can significantly reduce the overall training time while maintaining model accuracy, compared to state-of-the-art methods. ComDML demonstrates robustness in heterogeneous environments, and privacy measures can be seamlessly integrated for enhanced data protection.

new Efficient Algorithms for Learning Monophonic Halfspaces in Graphs

Authors: Marco Bressan, Emmanuel Esposito, Maximilian Thiessen

Abstract: We study the problem of learning a binary classifier on the vertices of a graph. In particular, we consider classifiers given by monophonic halfspaces, partitions of the vertices that are convex in a certain abstract sense. Monophonic halfspaces, and related notions such as geodesic halfspaces,have recently attracted interest, and several connections have been drawn between their properties(e.g., their VC dimension) and the structure of the underlying graph $G$. We prove several novel results for learning monophonic halfspaces in the supervised, online, and active settings. Our main result is that a monophonic halfspace can be learned with near-optimal passive sample complexity in time polynomial in $n = |V(G)|$. This requires us to devise a polynomial-time algorithm for consistent hypothesis checking, based on several structural insights on monophonic halfspaces and on a reduction to $2$-satisfiability. We prove similar results for the online and active settings. We also show that the concept class can be enumerated with delay $\operatorname{poly}(n)$, and that empirical risk minimization can be performed in time $2^{\omega(G)}\operatorname{poly}(n)$ where $\omega(G)$ is the clique number of $G$. These results answer open questions from the literature (Gonz\'alez et al., 2020), and show a contrast with geodesic halfspaces, for which some of the said problems are NP-hard (Seiffarth et al., 2023).

new Markov flow policy -- deep MC

Authors: Nitsan Soffair, Gilad Katz

Abstract: Discounted algorithms often encounter evaluation errors due to their reliance on short-term estimations, which can impede their efficacy in addressing simple, short-term tasks and impose undesired temporal discounts (\(\gamma\)). Interestingly, these algorithms are often tested without applying a discount, a phenomenon we refer as the \textit{train-test bias}. In response to these challenges, we propose the Markov Flow Policy, which utilizes a non-negative neural network flow to enable comprehensive forward-view predictions. Through integration into the TD7 codebase and evaluation using the MuJoCo benchmark, we observe significant performance improvements, positioning MFP as a straightforward, practical, and easily implementable solution within the domain of average rewards algorithms.

new Machine Learning Techniques for Data Reduction of Climate Applications

Authors: Xiao Li, Qian Gong, Jaemoon Lee, Scott Klasky, Anand Rangarajan, Sanjay Ranka

Abstract: Scientists conduct large-scale simulations to compute derived quantities-of-interest (QoI) from primary data. Often, QoI are linked to specific features, regions, or time intervals, such that data can be adaptively reduced without compromising the integrity of QoI. For many spatiotemporal applications, these QoI are binary in nature and represent presence or absence of a physical phenomenon. We present a pipelined compression approach that first uses neural-network-based techniques to derive regions where QoI are highly likely to be present. Then, we employ a Guaranteed Autoencoder (GAE) to compress data with differential error bounds. GAE uses QoI information to apply low-error compression to only these regions. This results in overall high compression ratios while still achieving downstream goals of simulation or data collections. Experimental results are presented for climate data generated from the E3SM Simulation model for downstream quantities such as tropical cyclone and atmospheric river detection and tracking. These results show that our approach is superior to comparable methods in the literature.

new WHALE-FL: Wireless and Heterogeneity Aware Latency Efficient Federated Learning over Mobile Devices via Adaptive Subnetwork Scheduling

Authors: Huai-an Su, Jiaxiang Geng, Liang Li, Xiaoqi Qin, Yanzhao Hou, Xin Fu, Miao Pan

Abstract: As a popular distributed learning paradigm, federated learning (FL) over mobile devices fosters numerous applications, while their practical deployment is hindered by participating devices' computing and communication heterogeneity. Some pioneering research efforts proposed to extract subnetworks from the global model, and assign as large a subnetwork as possible to the device for local training based on its full computing and communications capacity. Although such fixed size subnetwork assignment enables FL training over heterogeneous mobile devices, it is unaware of (i) the dynamic changes of devices' communication and computing conditions and (ii) FL training progress and its dynamic requirements of local training contributions, both of which may cause very long FL training delay. Motivated by those dynamics, in this paper, we develop a wireless and heterogeneity aware latency efficient FL (WHALE-FL) approach to accelerate FL training through adaptive subnetwork scheduling. Instead of sticking to the fixed size subnetwork, WHALE-FL introduces a novel subnetwork selection utility function to capture device and FL training dynamics, and guides the mobile device to adaptively select the subnetwork size for local training based on (a) its computing and communication capacity, (b) its dynamic computing and/or communication conditions, and (c) FL training status and its corresponding requirements for local training contributions. Our evaluation shows that, compared with peer designs, WHALE-FL effectively accelerates FL training without sacrificing learning accuracy.

new MESA: Cooperative Meta-Exploration in Multi-Agent Learning through Exploiting State-Action Space Structure

Authors: Zhicheng Zhang, Yancheng Liang, Yi Wu, Fei Fang

Abstract: Multi-agent reinforcement learning (MARL) algorithms often struggle to find strategies close to Pareto optimal Nash Equilibrium, owing largely to the lack of efficient exploration. The problem is exacerbated in sparse-reward settings, caused by the larger variance exhibited in policy learning. This paper introduces MESA, a novel meta-exploration method for cooperative multi-agent learning. It learns to explore by first identifying the agents' high-rewarding joint state-action subspace from training tasks and then learning a set of diverse exploration policies to "cover" the subspace. These trained exploration policies can be integrated with any off-policy MARL algorithm for test-time tasks. We first showcase MESA's advantage in a multi-step matrix game. Furthermore, experiments show that with learned exploration policies, MESA achieves significantly better performance in sparse-reward tasks in several multi-agent particle environments and multi-agent MuJoCo environments, and exhibits the ability to generalize to more challenging tasks at test time.

new Quantum Federated Learning Experiments in the Cloud with Data Encoding

Authors: Shiva Raj Pokhrel, Naman Yash, Jonathan Kua, Gang Li, Lei Pan

Abstract: Quantum Federated Learning (QFL) is an emerging concept that aims to unfold federated learning (FL) over quantum networks, enabling collaborative quantum model training along with local data privacy. We explore the challenges of deploying QFL on cloud platforms, emphasizing quantum intricacies and platform limitations. The proposed data-encoding-driven QFL, with a proof of concept (GitHub Open Source) using genomic data sets on quantum simulators, shows promising results.

new De-Biasing Models of Biased Decisions: A Comparison of Methods Using Mortgage Application Data

Authors: Nicholas Tenev

Abstract: Prediction models can improve efficiency by automating decisions such as the approval of loan applications. However, they may inherit bias against protected groups from the data they are trained on. This paper adds counterfactual (simulated) ethnic bias to real data on mortgage application decisions, and shows that this bias is replicated by a machine learning model (XGBoost) even when ethnicity is not used as a predictive variable. Next, several other de-biasing methods are compared: averaging over prohibited variables, taking the most favorable prediction over prohibited variables (a novel method), and jointly minimizing errors as well as the association between predictions and prohibited variables. De-biasing can recover some of the original decisions, but the results are sensitive to whether the bias is effected through a proxy.

new MTDT: A Multi-Task Deep Learning Digital Twin

Authors: Nooshin Yousefzadeh, Rahul Sengupta, Yashaswi Karnati, Anand Rangarajan, Sanjay Ranka

Abstract: Traffic congestion has significant impacts on both the economy and the environment. Measures of Effectiveness (MOEs) have long been the standard for evaluating the level of service and operational efficiency of traffic intersections. However, the scarcity of traditional high-resolution loop detector data (ATSPM) presents challenges in accurately measuring MOEs or capturing the intricate temporospatial characteristics inherent in urban intersection traffic. In response to this challenge, we have introduced the Multi-Task Deep Learning Digital Twin (MTDT) as a solution for multifaceted and precise intersection traffic flow simulation. MTDT enables accurate, fine-grained estimation of loop detector waveform time series for each lane of movement, alongside successful estimation of several MOEs for each lane group associated with a traffic phase concurrently and for all approaches of an arbitrary urban intersection. Unlike existing deep learning methodologies, MTDT distinguishes itself through its adaptability to local temporal and spatial features, such as signal timing plans, intersection topology, driving behaviors, and turning movement counts. While maintaining a straightforward design, our model emphasizes the advantages of multi-task learning in traffic modeling. By consolidating the learning process across multiple tasks, MTDT demonstrates reduced overfitting, increased efficiency, and enhanced effectiveness by sharing representations learned by different tasks. Furthermore, our approach facilitates sequential computation and lends itself to complete parallelization through GPU implementation. This not only streamlines the computational process but also enhances scalability and performance.

new New bounds on the cohesion of complete-link and other linkage methods for agglomeration clustering

Authors: Sanjoy Dasgupta, Eduardo Laber

Abstract: Linkage methods are among the most popular algorithms for hierarchical clustering. Despite their relevance the current knowledge regarding the quality of the clustering produced by these methods is limited. Here, we improve the currently available bounds on the maximum diameter of the clustering obtained by complete-link for metric spaces. One of our new bounds, in contrast to the existing ones, allows us to separate complete-link from single-link in terms of approximation for the diameter, which corroborates the common perception that the former is more suitable than the latter when the goal is producing compact clusters. We also show that our techniques can be employed to derive upper bounds on the cohesion of a class of linkage methods that includes the quite popular average-link.

new SparseTSF: Modeling Long-term Time Series Forecasting with 1k Parameters

Authors: Shengsheng Lin, Weiwei Lin, Wentai Wu, Haojun Chen, Junjie Yang

Abstract: This paper introduces SparseTSF, a novel, extremely lightweight model for Long-term Time Series Forecasting (LTSF), designed to address the challenges of modeling complex temporal dependencies over extended horizons with minimal computational resources. At the heart of SparseTSF lies the Cross-Period Sparse Forecasting technique, which simplifies the forecasting task by decoupling the periodicity and trend in time series data. This technique involves downsampling the original sequences to focus on cross-period trend prediction, effectively extracting periodic features while minimizing the model's complexity and parameter count. Based on this technique, the SparseTSF model uses fewer than 1k parameters to achieve competitive or superior performance compared to state-of-the-art models. Furthermore, SparseTSF showcases remarkable generalization capabilities, making it well-suited for scenarios with limited computational resources, small samples, or low-quality data. The code is available at: https://github.com/lss-1138/SparseTSF.

URLs: https://github.com/lss-1138/SparseTSF.

new The Role of Model Architecture and Scale in Predicting Molecular Properties: Insights from Fine-Tuning RoBERTa, BART, and LLaMA

Authors: Lee Youngmin, Lang S. I. D. Andrew, Cai Duoduo, Wheat R. Stephen

Abstract: This study introduces a systematic framework to compare the efficacy of Large Language Models (LLMs) for fine-tuning across various cheminformatics tasks. Employing a uniform training methodology, we assessed three well-known models-RoBERTa, BART, and LLaMA-on their ability to predict molecular properties using the Simplified Molecular Input Line Entry System (SMILES) as a universal molecular representation format. Our comparative analysis involved pre-training 18 configurations of these models, with varying parameter sizes and dataset scales, followed by fine-tuning them on six benchmarking tasks from DeepChem. We maintained consistent training environments across models to ensure reliable comparisons. This approach allowed us to assess the influence of model type, size, and training dataset size on model performance. Specifically, we found that LLaMA-based models generally offered the lowest validation loss, suggesting their superior adaptability across tasks and scales. However, we observed that absolute validation loss is not a definitive indicator of model performance - contradicts previous research - at least for fine-tuning tasks: instead, model size plays a crucial role. Through rigorous replication and validation, involving multiple training and fine-tuning cycles, our study not only delineates the strengths and limitations of each model type but also provides a robust methodology for selecting the most suitable LLM for specific cheminformatics applications. This research underscores the importance of considering model architecture and dataset characteristics in deploying AI for molecular property prediction, paving the way for more informed and effective utilization of AI in drug discovery and related fields.

new Provably Efficient Reinforcement Learning for Adversarial Restless Multi-Armed Bandits with Unknown Transitions and Bandit Feedback

Authors: Guojun Xiong, Jian Li

Abstract: Restless multi-armed bandits (RMAB) play a central role in modeling sequential decision making problems under an instantaneous activation constraint that at most B arms can be activated at any decision epoch. Each restless arm is endowed with a state that evolves independently according to a Markov decision process regardless of being activated or not. In this paper, we consider the task of learning in episodic RMAB with unknown transition functions and adversarial rewards, which can change arbitrarily across episodes. Further, we consider a challenging but natural bandit feedback setting that only adversarial rewards of activated arms are revealed to the decision maker (DM). The goal of the DM is to maximize its total adversarial rewards during the learning process while the instantaneous activation constraint must be satisfied in each decision epoch. We develop a novel reinforcement learning algorithm with two key contributors: a novel biased adversarial reward estimator to deal with bandit feedback and unknown transitions, and a low-complexity index policy to satisfy the instantaneous activation constraint. We show $\tilde{\mathcal{O}}(H\sqrt{T})$ regret bound for our algorithm, where $T$ is the number of episodes and $H$ is the episode length. To our best knowledge, this is the first algorithm to ensure $\tilde{\mathcal{O}}(\sqrt{T})$ regret for adversarial RMAB in our considered challenging settings.

new Recovering Labels from Local Updates in Federated Learning

Authors: Huancheng Chen, Haris Vikalo

Abstract: Gradient inversion (GI) attacks present a threat to the privacy of clients in federated learning (FL) by aiming to enable reconstruction of the clients' data from communicated model updates. A number of such techniques attempts to accelerate data recovery by first reconstructing labels of the samples used in local training. However, existing label extraction methods make strong assumptions that typically do not hold in realistic FL settings. In this paper we present a novel label recovery scheme, Recovering Labels from Local Updates (RLU), which provides near-perfect accuracy when attacking untrained (most vulnerable) models. More significantly, RLU achieves high performance even in realistic real-world settings where the clients in an FL system run multiple local epochs, train on heterogeneous data, and deploy various optimizers to minimize different objective functions. Specifically, RLU estimates labels by solving a least-square problem that emerges from the analysis of the correlation between labels of the data points used in a training round and the resulting update of the output layer. The experimental results on several datasets, architectures, and data heterogeneity scenarios demonstrate that the proposed method consistently outperforms existing baselines, and helps improve quality of the reconstructed images in GI attacks in terms of both PSNR and LPIPS.

new IntraMix: Intra-Class Mixup Generation for Accurate Labels and Neighbors

Authors: Shenghe Zheng, Hongzhi Wang, Xianglong Liu

Abstract: Graph Neural Networks (GNNs) demonstrate excellent performance on graphs, with their core idea about aggregating neighborhood information and learning from labels. However, the prevailing challenges in most graph datasets are twofold of Insufficient High-Quality Labels and Lack of Neighborhoods, resulting in weak GNNs. Existing data augmentation methods designed to address these two issues often tackle only one. They may either require extensive training of generators, rely on overly simplistic strategies, or demand substantial prior knowledge, leading to suboptimal generalization abilities. To simultaneously address both of these two challenges, we propose an elegant method called IntraMix. IntraMix innovatively employs Mixup among low-quality labeled data of the same class, generating high-quality labeled data at minimal cost. Additionally, it establishes neighborhoods for the generated data by connecting them with data from the same class with high confidence, thereby enriching the neighborhoods of graphs. IntraMix efficiently tackles both challenges faced by graphs and challenges the prior notion of the limited effectiveness of Mixup in node classification. IntraMix serves as a universal framework that can be readily applied to all GNNs. Extensive experiments demonstrate the effectiveness of IntraMix across various GNNs and datasets.

new Generative manufacturing systems using diffusion models and ChatGPT

Authors: Xingyu Li, Fei Tao, Wei Ye, Aydin Nassehi, John W. Sutherland

Abstract: In this study, we introduce Generative Manufacturing Systems (GMS) as a novel approach to effectively manage and coordinate autonomous manufacturing assets, thereby enhancing their responsiveness and flexibility to address a wide array of production objectives and human preferences. Deviating from traditional explicit modeling, GMS employs generative AI, including diffusion models and ChatGPT, for implicit learning from envisioned futures, marking a shift from a model-optimum to a training-sampling decision-making. Through the integration of generative AI, GMS enables complex decision-making through interactive dialogue with humans, allowing manufacturing assets to generate multiple high-quality global decisions that can be iteratively refined based on human feedback. Empirical findings showcase GMS's substantial improvement in system resilience and responsiveness to uncertainties, with decision times reduced from seconds to milliseconds. The study underscores the inherent creativity and diversity in the generated solutions, facilitating human-centric decision-making through seamless and continuous human-machine interactions.

new Robust Decentralized Learning with Local Updates and Gradient Tracking

Authors: Sajjad Ghiasvand, Amirhossein Reisizadeh, Mahnoosh Alizadeh, Ramtin Pedarsani

Abstract: As distributed learning applications such as Federated Learning, the Internet of Things (IoT), and Edge Computing grow, it is critical to address the shortcomings of such technologies from a theoretical perspective. As an abstraction, we consider decentralized learning over a network of communicating clients or nodes and tackle two major challenges: data heterogeneity and adversarial robustness. We propose a decentralized minimax optimization method that employs two important modules: local updates and gradient tracking. Minimax optimization is the key tool to enable adversarial training for ensuring robustness. Having local updates is essential in Federated Learning (FL) applications to mitigate the communication bottleneck, and utilizing gradient tracking is essential to proving convergence in the case of data heterogeneity. We analyze the performance of the proposed algorithm, Dec-FedTrack, in the case of nonconvex-strongly concave minimax optimization, and prove that it converges a stationary point. We also conduct numerical experiments to support our theoretical findings.

new FREE: Faster and Better Data-Free Meta-Learning

Authors: Yongxian Wei, Zixuan Hu, Zhenyi Wang, Li Shen, Chun Yuan, Dacheng Tao

Abstract: Data-Free Meta-Learning (DFML) aims to extract knowledge from a collection of pre-trained models without requiring the original data, presenting practical benefits in contexts constrained by data privacy concerns. Current DFML methods primarily focus on the data recovery from these pre-trained models. However, they suffer from slow recovery speed and overlook gaps inherent in heterogeneous pre-trained models. In response to these challenges, we introduce the Faster and Better Data-Free Meta-Learning (FREE) framework, which contains: (i) a meta-generator for rapidly recovering training tasks from pre-trained models; and (ii) a meta-learner for generalizing to new unseen tasks. Specifically, within the module Faster Inversion via Meta-Generator, each pre-trained model is perceived as a distinct task. The meta-generator can rapidly adapt to a specific task in just five steps, significantly accelerating the data recovery. Furthermore, we propose Better Generalization via Meta-Learner and introduce an implicit gradient alignment algorithm to optimize the meta-learner. This is achieved as aligned gradient directions alleviate potential conflicts among tasks from heterogeneous pre-trained models. Empirical experiments on multiple benchmarks affirm the superiority of our approach, marking a notable speed-up (20$\times$) and performance enhancement (1.42\% $\sim$ 4.78\%) in comparison to the state-of-the-art.

new Progressive Feedforward Collapse of ResNet Training

Authors: Sicong Wang, Kuo Gai, Shihua Zhang

Abstract: Neural collapse (NC) is a simple and symmetric phenomenon for deep neural networks (DNNs) at the terminal phase of training, where the last-layer features collapse to their class means and form a simplex equiangular tight frame aligning with the classifier vectors. However, the relationship of the last-layer features to the data and intermediate layers during training remains unexplored. To this end, we characterize the geometry of intermediate layers of ResNet and propose a novel conjecture, progressive feedforward collapse (PFC), claiming the degree of collapse increases during the forward propagation of DNNs. We derive a transparent model for the well-trained ResNet according to that ResNet with weight decay approximates the geodesic curve in Wasserstein space at the terminal phase. The metrics of PFC indeed monotonically decrease across depth on various datasets. We propose a new surrogate model, multilayer unconstrained feature model (MUFM), connecting intermediate layers by an optimal transport regularizer. The optimal solution of MUFM is inconsistent with NC but is more concentrated relative to the input data. Overall, this study extends NC to PFC to model the collapse phenomenon of intermediate layers and its dependence on the input data, shedding light on the theoretical understanding of ResNet in classification problems.

new S$^2$AC: Energy-Based Reinforcement Learning with Stein Soft Actor Critic

Authors: Safa Messaoud, Billel Mokeddem, Zhenghai Xue, Linsey Pang, Bo An, Haipeng Chen, Sanjay Chawla

Abstract: Learning expressive stochastic policies instead of deterministic ones has been proposed to achieve better stability, sample complexity, and robustness. Notably, in Maximum Entropy Reinforcement Learning (MaxEnt RL), the policy is modeled as an expressive Energy-Based Model (EBM) over the Q-values. However, this formulation requires the estimation of the entropy of such EBMs, which is an open problem. To address this, previous MaxEnt RL methods either implicitly estimate the entropy, resulting in high computational complexity and variance (SQL), or follow a variational inference procedure that fits simplified actor distributions (e.g., Gaussian) for tractability (SAC). We propose Stein Soft Actor-Critic (S$^2$AC), a MaxEnt RL algorithm that learns expressive policies without compromising efficiency. Specifically, S$^2$AC uses parameterized Stein Variational Gradient Descent (SVGD) as the underlying policy. We derive a closed-form expression of the entropy of such policies. Our formula is computationally efficient and only depends on first-order derivatives and vector products. Empirical results show that S$^2$AC yields more optimal solutions to the MaxEnt objective than SQL and SAC in the multi-goal environment, and outperforms SAC and SQL on the MuJoCo benchmark. Our code is available at: https://github.com/SafaMessaoud/S2AC-Energy-Based-RL-with-Stein-Soft-Actor-Critic

URLs: https://github.com/SafaMessaoud/S2AC-Energy-Based-RL-with-Stein-Soft-Actor-Critic

new Tackling Graph Oversquashing by Global and Local Non-Dissipativity

Authors: Alessio Gravina, Moshe Eliasof, Claudio Gallicchio, Davide Bacciu, Carola-Bibiane Sch\"onlieb

Abstract: A common problem in Message-Passing Neural Networks is oversquashing -- the limited ability to facilitate effective information flow between distant nodes. Oversquashing is attributed to the exponential decay in information transmission as node distances increase. This paper introduces a novel perspective to address oversquashing, leveraging properties of global and local non-dissipativity, that enable the maintenance of a constant information flow rate. Namely, we present SWAN, a uniquely parameterized model GNN with antisymmetry both in space and weight domains, as a means to obtain non-dissipativity. Our theoretical analysis asserts that by achieving these properties, SWAN offers an enhanced ability to transmit information over extended distances. Empirical evaluations on synthetic and real-world benchmarks that emphasize long-range interactions validate the theoretical understanding of SWAN, and its ability to mitigate oversquashing.

new Efficient and Adaptive Posterior Sampling Algorithms for Bandits

Authors: Bingshan Hu, Zhiming Huang, Tianyue H. Zhang, Mathias L\'ecuyer, Nidhi Hegde

Abstract: We study Thompson Sampling-based algorithms for stochastic bandits with bounded rewards. As the existing problem-dependent regret bound for Thompson Sampling with Gaussian priors [Agrawal and Goyal, 2017] is vacuous when $T \le 288 e^{64}$, we derive a more practical bound that tightens the coefficient of the leading term %from $288 e^{64}$ to $1270$. Additionally, motivated by large-scale real-world applications that require scalability, adaptive computational resource allocation, and a balance in utility and computation, we propose two parameterized Thompson Sampling-based algorithms: Thompson Sampling with Model Aggregation (TS-MA-$\alpha$) and Thompson Sampling with Timestamp Duelling (TS-TD-$\alpha$), where $\alpha \in [0,1]$ controls the trade-off between utility and computation. Both algorithms achieve $O \left(K\ln^{\alpha+1}(T)/\Delta \right)$ regret bound, where $K$ is the number of arms, $T$ is the finite learning horizon, and $\Delta$ denotes the single round performance loss when pulling a sub-optimal arm.

new Non-clairvoyant Scheduling with Partial Predictions

Authors: Ziyad Benomar, Vianney Perchet

Abstract: The non-clairvoyant scheduling problem has gained new interest within learning-augmented algorithms, where the decision-maker is equipped with predictions without any quality guarantees. In practical settings, access to predictions may be reduced to specific instances, due to cost or data limitations. Our investigation focuses on scenarios where predictions for only $B$ job sizes out of $n$ are available to the algorithm. We first establish near-optimal lower bounds and algorithms in the case of perfect predictions. Subsequently, we present a learning-augmented algorithm satisfying the robustness, consistency, and smoothness criteria, and revealing a novel tradeoff between consistency and smoothness inherent in the scenario with a restricted number of predictions.

new The Privacy Power of Correlated Noise in Decentralized Learning

Authors: Youssef Allouah, Anastasia Koloskova, Aymane El Firdoussi, Martin Jaggi, Rachid Guerraoui

Abstract: Decentralized learning is appealing as it enables the scalable usage of large amounts of distributed data and resources (without resorting to any central entity), while promoting privacy since every user minimizes the direct exposure of their data. Yet, without additional precautions, curious users can still leverage models obtained from their peers to violate privacy. In this paper, we propose Decor, a variant of decentralized SGD with differential privacy (DP) guarantees. Essentially, in Decor, users securely exchange randomness seeds in one communication round to generate pairwise-canceling correlated Gaussian noises, which are injected to protect local models at every communication round. We theoretically and empirically show that, for arbitrary connected graphs, Decor matches the central DP optimal privacy-utility trade-off. We do so under SecLDP, our new relaxation of local DP, which protects all user communications against an external eavesdropper and curious users, assuming that every pair of connected users shares a secret, i.e., an information hidden to all others. The main theoretical challenge is to control the accumulation of non-canceling correlated noise due to network sparsity. We also propose a companion SecLDP privacy accountant for public use.

new CrossMPT: Cross-attention Message-Passing Transformer for Error Correcting Codes

Authors: Seong-Joon Park, Hee-Youl Kwak, Sang-Hyo Kim, Yongjune Kim, Jong-Seon No

Abstract: Error correcting codes~(ECCs) are indispensable for reliable transmission in communication systems. The recent advancements in deep learning have catalyzed the exploration of ECC decoders based on neural networks. Among these, transformer-based neural decoders have achieved state-of-the-art decoding performance. In this paper, we propose a novel Cross-attention Message-Passing Transformer~(CrossMPT). CrossMPT iteratively updates two types of input vectors (i.e., magnitude and syndrome vectors) using two masked cross-attention blocks. The mask matrices in these cross-attention blocks are determined by the code's parity-check matrix that delineates the relationship between magnitude and syndrome vectors. Our experimental results show that CrossMPT significantly outperforms existing neural network-based decoders, particularly in decoding low-density parity-check codes. Notably, CrossMPT also achieves a significant reduction in computational complexity, achieving over a 50\% decrease in its attention layers compared to the original transformer-based decoder, while retaining the computational complexity of the remaining layers.

new Efficient and Flexible Method for Reducing Moderate-size Deep Neural Networks with Condensation

Authors: Tianyi Chen, Zhi-Qin John Xu

Abstract: Neural networks have been extensively applied to a variety of tasks, achieving astounding results. Applying neural networks in the scientific field is an important research direction that is gaining increasing attention. In scientific applications, the scale of neural networks is generally moderate-size, mainly to ensure the speed of inference during application. Additionally, comparing neural networks to traditional algorithms in scientific applications is inevitable. These applications often require rapid computations, making the reduction of neural network sizes increasingly important. Existing work has found that the powerful capabilities of neural networks are primarily due to their non-linearity. Theoretical work has discovered that under strong non-linearity, neurons in the same layer tend to behave similarly, a phenomenon known as condensation. Condensation offers an opportunity to reduce the scale of neural networks to a smaller subnetwork with similar performance. In this article, we propose a condensation reduction algorithm to verify the feasibility of this idea in practical problems. Our reduction method can currently be applied to both fully connected networks and convolutional networks, achieving positive results. In complex combustion acceleration tasks, we reduced the size of the neural network to 41.7% of its original scale while maintaining prediction accuracy. In the CIFAR10 image classification task, we reduced the network size to 11.5% of the original scale, still maintaining a satisfactory validation accuracy. Our method can be applied to most trained neural networks, reducing computational pressure and improving inference speed.

new Polynomial Chaos Expanded Gaussian Process

Authors: Dominik Polke, Tim K\"osters, Elmar Ahle, Dirk S\"offker

Abstract: In complex and unknown processes, global models are initially generated over the entire experimental space, but they often fail to provide accurate predictions in local areas. Recognizing this limitation, this study addresses the need for models that effectively represent both global and local experimental spaces. It introduces a novel machine learning (ML) approach: Polynomial Chaos Expanded Gaussian Process (PCEGP), leveraging polynomial chaos expansion (PCE) to calculate input-dependent hyperparameters of the Gaussian process (GP). This approach provides a mathematically interpretable method that incorporates non-stationary covariance functions and heteroscedastic noise estimation to generate locally adapted models. The model performance is compared to different algorithms in benchmark tests for regression tasks. The results demonstrate low prediction errors of the PCEGP in these benchmark applications, highlighting model performance that is often competitive with or superior to previous methods. A key advantage of the presented model is the transparency and traceability in the calculation of hyperparameters and model predictions.

new Explicitly Modeling Generality into Self-Supervised Learning

Authors: Jingyao Wang, Wenwen Qiang, Changwen Zheng

Abstract: The goal of generality in machine learning is to achieve excellent performance on various unseen tasks and domains. Recently, self-supervised learning (SSL) has been regarded as an effective method to achieve this goal. It can learn high-quality representations from unlabeled data and achieve promising empirical performance on multiple downstream tasks. Existing SSL methods mainly constrain generality from two aspects: (i) large-scale training data, and (ii) learning task-level shared knowledge. However, these methods lack explicit modeling of the SSL generality in the learning objective, and the theoretical understanding of SSL's generality remains limited. This may cause SSL models to overfit in data-scarce situations and generalize poorly in the real world, making it difficult to achieve true generality. To address these issues, we provide a theoretical definition of generality in SSL and define a $\sigma$-measurement to help quantify it. Based on this insight, we explicitly model generality into self-supervised learning and further propose a novel SSL framework, called GeSSL. It introduces a self-motivated target based on $\sigma$-measurement, which enables the model to find the optimal update direction towards generality. Extensive theoretical and empirical evaluations demonstrate the superior performance of the proposed GeSSL.

new Leverage Multi-source Traffic Demand Data Fusion with Transformer Model for Urban Parking Prediction

Authors: Yin Huang, Yongqi Dong, Youhua Tang, Li Li

Abstract: The escalation in urban private car ownership has worsened the urban parking predicament, necessitating effective parking availability prediction for urban planning and management. However, the existing prediction methods suffer from low prediction accuracy with the lack of spatial-temporal correlation features related to parking volume, and neglect of flow patterns and correlations between similar parking lots within certain areas. To address these challenges, this study proposes a parking availability prediction framework integrating spatial-temporal deep learning with multi-source data fusion, encompassing traffic demand data from multiple sources (e.g., metro, bus, taxi services), and parking lot data. The framework is based on the Transformer as the spatial-temporal deep learning model and leverages K-means clustering to establish parking cluster zones, extracting and integrating traffic demand characteristics from various transportation modes (i.e., metro, bus, online ride-hailing, and taxi) connected to parking lots. Real-world empirical data was used to verify the effectiveness of the proposed method compared with different machine learning, deep learning, and traditional statistical models for predicting parking availability. Experimental results reveal that, with the proposed pipeline, the developed Transformer model outperforms other models in terms of various metrics, e.g., Mean Squared Error (MSE), Mean Absolute Error (MAE), and Mean Absolute Percentage Error (MAPE). By fusing multi-source demanding data with spatial-temporal deep learning techniques, this approach offers the potential to develop parking availability prediction systems that furnish more accurate and timely information to both drivers and urban planners, thereby fostering more efficient and sustainable urban mobility.

new A text-based, generative deep learning model for soil reflectance spectrum simulation in the VIS-NIR (400-2499 nm) bands

Authors: Tong Lei, Brian N. Bailey

Abstract: Simulating soil reflectance spectra is invaluable for soil-plant radiative modeling and training machine learning models, yet it is difficult as the intricate relationships between soil structure and its constituents. To address this, a fully data-driven soil optics generative model (SOGM) for simulation of soil reflectance spectra based on soil property inputs was developed. The model is trained on an extensive dataset comprising nearly 180,000 soil spectra-property pairs from 17 datasets. It generates soil reflectance spectra from text-based inputs describing soil properties and their values rather than only numerical values and labels in binary vector format. The generative model can simulate output spectra based on an incomplete set of input properties. SOGM is based on the denoising diffusion probabilistic model (DDPM). Two additional sub-models were also built to complement the SOGM: a spectral padding model that can fill in the gaps for spectra shorter than the full visible-near-infrared range (VIS-NIR; 400 to 2499 nm), and a wet soil spectra model that can estimate the effects of water content on soil reflectance spectra given the dry spectrum predicted by the SOGM. The SOGM was up-scaled by coupling with the Helios 3D plant modeling software, which allowed for generation of synthetic aerial images of simulated soil and plant scenes. It can also be easily integrated with soil-plant radiation model used for remote sensin research like PROSAIL. The testing results of the SOGM on new datasets that not included in model training proved that the model can generate reasonable soil reflectance spectra based on available property inputs. The presented models are openly accessible on: https://github.com/GEMINI-Breeding/SOGM_soil_spectra_simulation.

URLs: https://github.com/GEMINI-Breeding/SOGM_soil_spectra_simulation.

new AB-Training: A Communication-Efficient Approach for Distributed Low-Rank Learning

Authors: Daniel Coquelin, Katherina Fl\"ugel, Marie Weiel, Nicholas Kiefer, Muhammed \"Oz, Charlotte Debus, Achim Streit, Markus G\"otz

Abstract: Communication bottlenecks hinder the scalability of distributed neural network training, particularly on distributed-memory computing clusters. To significantly reduce this communication overhead, we introduce AB-training, a novel data-parallel training method that decomposes weight matrices into low-rank representations and utilizes independent group-based training. This approach consistently reduces network traffic by 50% across multiple scaling scenarios, increasing the training potential on communication-constrained systems. Our method exhibits regularization effects at smaller scales, leading to improved generalization for models like VGG16, while achieving a remarkable 44.14 : 1 compression ratio during training on CIFAR-10 and maintaining competitive accuracy. Albeit promising, our experiments reveal that large batch effects remain a challenge even in low-rank training regimes.

new Poisoning Attacks on Federated Learning for Autonomous Driving

Authors: Sonakshi Garg, Hugo J\"onsson, Gustav Kalander, Axel Nilsson, Bhhaanu Pirange, Viktor Valadi, Johan \"Ostman

Abstract: Federated Learning (FL) is a decentralized learning paradigm, enabling parties to collaboratively train models while keeping their data confidential. Within autonomous driving, it brings the potential of reducing data storage costs, reducing bandwidth requirements, and to accelerate the learning. FL is, however, susceptible to poisoning attacks. In this paper, we introduce two novel poisoning attacks on FL tailored to regression tasks within autonomous driving: FLStealth and Off-Track Attack (OTA). FLStealth, an untargeted attack, aims at providing model updates that deteriorate the global model performance while appearing benign. OTA, on the other hand, is a targeted attack with the objective to change the global model's behavior when exposed to a certain trigger. We demonstrate the effectiveness of our attacks by conducting comprehensive experiments pertaining to the task of vehicle trajectory prediction. In particular, we show that, among five different untargeted attacks, FLStealth is the most successful at bypassing the considered defenses employed by the server. For OTA, we demonstrate the inability of common defense strategies to mitigate the attack, highlighting the critical need for new defensive mechanisms against targeted attacks within FL for autonomous driving.

new Less is More: on the Over-Globalizing Problem in Graph Transformers

Authors: Yujie Xing, Xiao Wang, Yibo Li, Hai Huang, Chuan Shi

Abstract: Graph Transformer, due to its global attention mechanism, has emerged as a new tool in dealing with graph-structured data. It is well recognized that the global attention mechanism considers a wider receptive field in a fully connected graph, leading many to believe that useful information can be extracted from all the nodes. In this paper, we challenge this belief: does the globalizing property always benefit Graph Transformers? We reveal the over-globalizing problem in Graph Transformer by presenting both empirical evidence and theoretical analysis, i.e., the current attention mechanism overly focuses on those distant nodes, while the near nodes, which actually contain most of the useful information, are relatively weakened. Then we propose a novel Bi-Level Global Graph Transformer with Collaborative Training (CoBFormer), including the inter-cluster and intra-cluster Transformers, to prevent the over-globalizing problem while keeping the ability to extract valuable information from distant nodes. Moreover, the collaborative training is proposed to improve the model's generalization ability with a theoretical guarantee. Extensive experiments on various graphs well validate the effectiveness of our proposed CoBFormer.

new Continual Imitation Learning for Prosthetic Limbs

Authors: Sharmita Dey, Benjamin Paassen, Sarath Ravindran Nair, Sabri Boughorbel, Arndt F. Schilling

Abstract: Lower limb amputations and neuromuscular impairments severely restrict mobility, necessitating advancements beyond conventional prosthetics. Motorized bionic limbs offer promise, but their utility depends on mimicking the evolving synergy of human movement in various settings. In this context, we present a novel model for bionic prostheses' application that leverages camera-based motion capture and wearable sensor data, to learn the synergistic coupling of the lower limbs during human locomotion, empowering it to infer the kinematic behavior of a missing lower limb across varied tasks, such as climbing inclines and stairs. We propose a model that can multitask, adapt continually, anticipate movements, and refine. The core of our method lies in an approach which we call -- multitask prospective rehearsal -- that anticipates and synthesizes future movements based on the previous prediction and employs a corrective mechanism for subsequent predictions. We design an evolving architecture that merges lightweight, task-specific modules on a shared backbone, ensuring both specificity and scalability. We empirically validate our model against various baselines using real-world human gait datasets, including experiments with transtibial amputees, which encompass a broad spectrum of locomotion tasks. The results show that our approach consistently outperforms baseline models, particularly under scenarios affected by distributional shifts, adversarial perturbations, and noise.

new Lipschitz constant estimation for general neural network architectures using control tools

Authors: Patricia Pauli, Dennis Gramlich, Frank Allg\"ower

Abstract: This paper is devoted to the estimation of the Lipschitz constant of neural networks using semidefinite programming. For this purpose, we interpret neural networks as time-varying dynamical systems, where the $k$-th layer corresponds to the dynamics at time $k$. A key novelty with respect to prior work is that we use this interpretation to exploit the series interconnection structure of neural networks with a dynamic programming recursion. Nonlinearities, such as activation functions and nonlinear pooling layers, are handled with integral quadratic constraints. If the neural network contains signal processing layers (convolutional or state space model layers), we realize them as 1-D/2-D/N-D systems and exploit this structure as well. We distinguish ourselves from related work on Lipschitz constant estimation by more extensive structure exploitation (scalability) and a generalization to a large class of common neural network architectures. To show the versatility and computational advantages of our method, we apply it to different neural network architectures trained on MNIST and CIFAR-10.

new Sharp Bounds for Sequential Federated Learning on Heterogeneous Data

Authors: Yipeng Li, Xinchen Lyu

Abstract: There are two paradigms in Federated Learning (FL): parallel FL (PFL), where models are trained in a parallel manner across clients; and sequential FL (SFL), where models are trained in a sequential manner across clients. In contrast to that of PFL, the convergence theory of SFL on heterogeneous data is still lacking. To resolve the theoretical dilemma of SFL, we establish sharp convergence guarantees for SFL on heterogeneous data with both upper and lower bounds. Specifically, we derive the upper bounds for strongly convex, general convex and non-convex objective functions, and construct the matching lower bounds for the strongly convex and general convex objective functions. Then, we compare the upper bounds of SFL with those of PFL, showing that SFL outperforms PFL (at least, when the level of heterogeneity is relatively high). Experimental results on quadratic functions and real data sets validate the counterintuitive comparison result.

new Why Tabular Foundation Models Should Be a Research Priority

Authors: Boris van Breugel, Mihaela van der Schaar

Abstract: Recent text and image foundation models are incredibly impressive, and these models are attracting an ever-increasing portion of research resources. In this position piece we aim to shift the ML research community's priorities ever so slightly to a different modality: tabular data. Tabular data is the dominant modality in many fields, yet it is given hardly any research attention and significantly lags behind in terms of scale and power. We believe the time is now to start developing tabular foundation models, or what we coin a Large Tabular Model (LTM). LTMs could revolutionise the way science and ML use tabular data: not as single datasets that are analyzed in a vacuum, but contextualized with respect to related datasets. The potential impact is far-reaching: from few-shot tabular models to automating data science; from out-of-distribution synthetic data to empowering multidisciplinary scientific discovery. We intend to excite reflections on the modalities we study, and convince some researchers to study large tabular models.

new SynFlowNet: Towards Molecule Design with Guaranteed Synthesis Pathways

Authors: Miruna Cretu, Charles Harris, Julien Roy, Emmanuel Bengio, Pietro Li\`o

Abstract: Recent breakthroughs in generative modelling have led to a number of works proposing molecular generation models for drug discovery. While these models perform well at capturing drug-like motifs, they are known to often produce synthetically inaccessible molecules. This is because they are trained to compose atoms or fragments in a way that approximates the training distribution, but they are not explicitly aware of the synthesis constraints that come with making molecules in the lab. To address this issue, we introduce SynFlowNet, a GFlowNet model whose action space uses chemically validated reactions and reactants to sequentially build new molecules. We evaluate our approach using synthetic accessibility scores and an independent retrosynthesis tool. SynFlowNet consistently samples synthetically feasible molecules, while still being able to find diverse and high-utility candidates. Furthermore, we compare molecules designed with SynFlowNet to experimentally validated actives, and find that they show comparable properties of interest, such as molecular weight, SA score and predicted protein binding affinity.

new Tabular and Deep Reinforcement Learning for Gittins Index

Authors: Harshit Dhankar, Kshitij Mishra, Tejas Bodas

Abstract: In the realm of multi-arm bandit problems, the Gittins index policy is known to be optimal in maximizing the expected total discounted reward obtained from pulling the Markovian arms. In most realistic scenarios however, the Markovian state transition probabilities are unknown and therefore the Gittins indices cannot be computed. One can then resort to reinforcement learning (RL) algorithms that explore the state space to learn these indices while exploiting to maximize the reward collected. In this work, we propose tabular (QGI) and Deep RL (DGN) algorithms for learning the Gittins index that are based on the retirement formulation for the multi-arm bandit problem. When compared with existing RL algorithms that learn the Gittins index, our algorithms have a lower run time, require less storage space (small Q-table size in QGI and smaller replay buffer in DGN), and illustrate better empirical convergence to the Gittins index. This makes our algorithm well suited for problems with large state spaces and is a viable alternative to existing methods. As a key application, we demonstrate the use of our algorithms in minimizing the mean flowtime in a job scheduling problem when jobs are available in batches and have an unknown service time distribution. \

new Interpretable Data-driven Anomaly Detection in Industrial Processes with ExIFFI

Authors: Davide Frizzo, Francesco Borsatti, Alessio Arcudi, Antonio De Moliner, Roberto Oboe, Gian Antonio Susto

Abstract: Anomaly detection (AD) is a crucial process often required in industrial settings. Anomalies can signal underlying issues within a system, prompting further investigation. Industrial processes aim to streamline operations as much as possible, encompassing the production of the final product, making AD an essential mean to reach this goal.Conventional anomaly detection methodologies typically classify observations as either normal or anomalous without providing insight into the reasons behind these classifications.Consequently, in light of the emergence of Industry 5.0, a more desirable approach involves providing interpretable outcomes, enabling users to understand the rationale behind the results.This paper presents the first industrial application of ExIFFI, a recently developed approach focused on the production of fast and efficient explanations for the Extended Isolation Forest (EIF) Anomaly detection method. ExIFFI is tested on two publicly available industrial datasets demonstrating superior effectiveness in explanations and computational efficiency with the respect to other state-of-the-art explainable AD models.

new Potential Energy based Mixture Model for Noisy Label Learning

Authors: Zijia Wang, Wenbin Yang, Zhisong Liu, Zhen Jia

Abstract: Training deep neural networks (DNNs) from noisy labels is an important and challenging task. However, most existing approaches focus on the corrupted labels and ignore the importance of inherent data structure. To bridge the gap between noisy labels and data, inspired by the concept of potential energy in physics, we propose a novel Potential Energy based Mixture Model (PEMM) for noise-labels learning. We innovate a distance-based classifier with the potential energy regularization on its class centers. Embedding our proposed classifier with existing deep learning backbones, we can have robust networks with better feature representations. They can preserve intrinsic structures from the data, resulting in a superior noisy tolerance. We conducted extensive experiments to analyze the efficiency of our proposed model on several real-world datasets. Quantitative results show that it can achieve state-of-the-art performance.

new Gradient-Congruity Guided Federated Sparse Training

Authors: Chris Xing Tian, Yibing Liu, Haoliang Li, Ray C. C. Cheung, Shiqi Wang

Abstract: Edge computing allows artificial intelligence and machine learning models to be deployed on edge devices, where they can learn from local data and collaborate to form a global model. Federated learning (FL) is a distributed machine learning technique that facilitates this process while preserving data privacy. However, FL also faces challenges such as high computational and communication costs regarding resource-constrained devices, and poor generalization performance due to the heterogeneity of data across edge clients and the presence of out-of-distribution data. In this paper, we propose the Gradient-Congruity Guided Federated Sparse Training (FedSGC), a novel method that integrates dynamic sparse training and gradient congruity inspection into federated learning framework to address these issues. Our method leverages the idea that the neurons, in which the associated gradients with conflicting directions with respect to the global model contain irrelevant or less generalized information for other clients, and could be pruned during the sparse training process. Conversely, the neurons where the associated gradients with consistent directions could be grown in a higher priority. In this way, FedSGC can greatly reduce the local computation and communication overheads while, at the same time, enhancing the generalization abilities of FL. We evaluate our method on challenging non-i.i.d settings and show that it achieves competitive accuracy with state-of-the-art FL methods across various scenarios while minimizing computation and communication costs.

new Decoupling Feature Extraction and Classification Layers for Calibrated Neural Networks

Authors: Mikkel Jordahn, Pablo Olmos

Abstract: Deep Neural Networks (DNN) have shown great promise in many classification applications, yet are widely known to have poorly calibrated predictions when they are over-parametrized. Improving DNN calibration without comprising on model accuracy is of extreme importance and interest in safety critical applications such as in the health-care sector. In this work, we show that decoupling the training of feature extraction layers and classification layers in over-parametrized DNN architectures such as Wide Residual Networks (WRN) and Visual Transformers (ViT) significantly improves model calibration whilst retaining accuracy, and at a low training cost. In addition, we show that placing a Gaussian prior on the last hidden layer outputs of a DNN, and training the model variationally in the classification training stage, even further improves calibration. We illustrate these methods improve calibration across ViT and WRN architectures for several image classification benchmark datasets.

new Towards Interpretable Reinforcement Learning with Constrained Normalizing Flow Policies

Authors: Finn Rietz, Erik Schaffernicht, Stefan Heinrich, Johannes A. Stork

Abstract: Reinforcement learning policies are typically represented by black-box neural networks, which are non-interpretable and not well-suited for safety-critical domains. To address both of these issues, we propose constrained normalizing flow policies as interpretable and safe-by-construction policy models. We achieve safety for reinforcement learning problems with instantaneous safety constraints, for which we can exploit domain knowledge by analytically constructing a normalizing flow that ensures constraint satisfaction. The normalizing flow corresponds to an interpretable sequence of transformations on action samples, each ensuring alignment with respect to a particular constraint. Our experiments reveal benefits beyond interpretability in an easier learning objective and maintained constraint satisfaction throughout the entire learning process. Our approach leverages constraints over reward engineering while offering enhanced interpretability, safety, and direct means of providing domain knowledge to the agent without relying on complex reward functions.

new Error-Driven Uncertainty Aware Training

Authors: Pedro Mendes, Paolo Romano, David Garlan

Abstract: Neural networks are often overconfident about their predictions, which undermines their reliability and trustworthiness. In this work, we present a novel technique, named Error-Driven Uncertainty Aware Training (EUAT), which aims to enhance the ability of neural models to estimate their uncertainty correctly, namely to be highly uncertain when they output inaccurate predictions and low uncertain when their output is accurate. The EUAT approach operates during the model's training phase by selectively employing two loss functions depending on whether the training examples are correctly or incorrectly predicted by the model. This allows for pursuing the twofold goal of i) minimizing model uncertainty for correctly predicted inputs and ii) maximizing uncertainty for mispredicted inputs, while preserving the model's misprediction rate. We evaluate EUAT using diverse neural models and datasets in the image recognition domains considering both non-adversarial and adversarial settings. The results show that EUAT outperforms existing approaches for uncertainty estimation (including other uncertainty-aware training techniques, calibration, ensembles, and DEUP) by providing uncertainty estimates that not only have higher quality when evaluated via statistical metrics (e.g., correlation with residuals) but also when employed to build binary classifiers that decide whether the model's output can be trusted or not and under distributional data shifts.

new Improving Membership Inference in ASR Model Auditing with Perturbed Loss Features

Authors: Francisco Teixeira, Karla Pizzi, Raphael Olivier, Alberto Abad, Bhiksha Raj, Isabel Trancoso

Abstract: Membership Inference (MI) poses a substantial privacy threat to the training data of Automatic Speech Recognition (ASR) systems, while also offering an opportunity to audit these models with regard to user data. This paper explores the effectiveness of loss-based features in combination with Gaussian and adversarial perturbations to perform MI in ASR models. To the best of our knowledge, this approach has not yet been investigated. We compare our proposed features with commonly used error-based features and find that the proposed features greatly enhance performance for sample-level MI. For speaker-level MI, these features improve results, though by a smaller margin, as error-based features already obtained a high performance for this task. Our findings emphasise the importance of considering different feature sets and levels of access to target models for effective MI in ASR systems, providing valuable insights for auditing such models.

new Boosting Jailbreak Attack with Momentum

Authors: Yihao Zhang, Zeming Wei

Abstract: Large Language Models (LLMs) have achieved remarkable success across diverse tasks, yet they remain vulnerable to adversarial attacks, notably the well-documented \textit{jailbreak} attack. Recently, the Greedy Coordinate Gradient (GCG) attack has demonstrated efficacy in exploiting this vulnerability by optimizing adversarial prompts through a combination of gradient heuristics and greedy search. However, the efficiency of this attack has become a bottleneck in the attacking process. To mitigate this limitation, in this paper we rethink the generation of adversarial prompts through an optimization lens, aiming to stabilize the optimization process and harness more heuristic insights from previous iterations. Specifically, we introduce the \textbf{M}omentum \textbf{A}ccelerated G\textbf{C}G (\textbf{MAC}) attack, which incorporates a momentum term into the gradient heuristic. Experimental results showcase the notable enhancement achieved by MAP in gradient-based attacks on aligned language models. Our code is available at https://github.com/weizeming/momentum-attack-llm.

URLs: https://github.com/weizeming/momentum-attack-llm.

new Lying Graph Convolution: Learning to Lie for Node Classification Tasks

Authors: Daniele Castellana

Abstract: In the context of machine learning for graphs, many researchers have empirically observed that Deep Graph Networks (DGNs) perform favourably on node classification tasks when the graph structure is homophilic (\ie adjacent nodes are similar). In this paper, we introduce Lying-GCN, a new DGN inspired by opinion dynamics that can adaptively work in both the heterophilic and the homophilic setting. At each layer, each agent (node) shares its own opinions (node embeddings) with its neighbours. Instead of sharing its opinion directly as in GCN, we introduce a mechanism which allows agents to lie. Such a mechanism is adaptive, thus the agents learn how and when to lie according to the task that should be solved. We provide a characterisation of our proposal in terms of dynamical systems, by studying the spectral property of the coefficient matrix of the system. While the steady state of the system collapses to zero, we believe the lying mechanism is still usable to solve node classification tasks. We empirically prove our belief on both synthetic and real-world datasets, by showing that the lying mechanism allows to increase the performances in the heterophilic setting without harming the results in the homophilic one.

new Revisiting semi-supervised training objectives for differentiable particle filters

Authors: Jiaxi Li, John-Joseph Brady, Xiongjie Chen, Yunpeng Li

Abstract: Differentiable particle filters combine the flexibility of neural networks with the probabilistic nature of sequential Monte Carlo methods. However, traditional approaches rely on the availability of labelled data, i.e., the ground truth latent state information, which is often difficult to obtain in real-world applications. This paper compares the effectiveness of two semi-supervised training objectives for differentiable particle filters. We present results in two simulated environments where labelled data are scarce.

new Causal Influence in Federated Edge Inference

Authors: Mert Kayaalp, Yunus Inan, Visa Koivunen, Ali H. Sayed

Abstract: In this paper, we consider a setting where heterogeneous agents with connectivity are performing inference using unlabeled streaming data. Observed data are only partially informative about the target variable of interest. In order to overcome the uncertainty, agents cooperate with each other by exchanging their local inferences with and through a fusion center. To evaluate how each agent influences the overall decision, we adopt a causal framework in order to distinguish the actual influence of agents from mere correlations within the decision-making process. Various scenarios reflecting different agent participation patterns and fusion center policies are investigated. We derive expressions to quantify the causal impact of each agent on the joint decision, which could be beneficial for anticipating and addressing atypical scenarios, such as adversarial attacks or system malfunctions. We validate our theoretical results with numerical simulations and a real-world application of multi-camera crowd counting.

new Continuously evolving rewards in an open-ended environment

Authors: Richard M. Bailey

Abstract: Unambiguous identification of the rewards driving behaviours of entities operating in complex open-ended real-world environments is difficult, partly because goals and associated behaviours emerge endogenously and are dynamically updated as environments change. Reproducing such dynamics in models would be useful in many domains, particularly where fixed reward functions limit the adaptive capabilities of agents. Simulation experiments described assess a candidate algorithm for the dynamic updating of rewards, RULE: Reward Updating through Learning and Expectation. The approach is tested in a simplified ecosystem-like setting where experiments challenge entities' survival, calling for significant behavioural change. The population of entities successfully demonstrate the abandonment of an initially rewarded but ultimately detrimental behaviour, amplification of beneficial behaviour, and appropriate responses to novel items added to their environment. These adjustment happen through endogenous modification of the entities' underlying reward function, during continuous learning, without external intervention.

new An Online Gradient-Based Caching Policy with Logarithmic Complexity and Regret Guarantees

Authors: Damiano Carra, Giovanni Neglia

Abstract: The commonly used caching policies, such as LRU or LFU, exhibit optimal performance only for specific traffic patterns. Even advanced Machine Learning-based methods, which detect patterns in historical request data, struggle when future requests deviate from past trends. Recently, a new class of policies has emerged that makes no assumptions about the request arrival process. These algorithms solve an online optimization problem, enabling continuous adaptation to the context. They offer theoretical guarantees on the regret metric, which is the gap between the gain of the online policy and the gain of the optimal static cache allocation in hindsight. Nevertheless, the high computational complexity of these solutions hinders their practical adoption. In this study, we introduce a groundbreaking gradient-based online caching policy, the first to achieve logarithmic computational complexity relative to catalog size along with regret guarantees. This means our algorithm can efficiently handle large-scale data while minimizing the performance gap between real-time decisions and optimal hindsight choices. As requests arrive, our policy dynamically adjusts the probabilities of including items in the cache, which drive cache update decisions. Our algorithm's streamlined complexity is a key advantage, enabling its application to real-world traces featuring millions of requests and items. This is a significant achievement, as traces of this scale have been out of reach for existing policies with regret guarantees. To the best of our knowledge, our experimental results show for the first time that the regret guarantees of gradient-based caching policies bring significant benefits in scenarios of practical interest.

new The Importance of Model Inspection for Better Understanding Performance Characteristics of Graph Neural Networks

Authors: Nairouz Shehata, Carolina Pi\c{c}arra, Anees Kazi, Ben Glocker

Abstract: This study highlights the importance of conducting comprehensive model inspection as part of comparative performance analyses. Here, we investigate the effect of modelling choices on the feature learning characteristics of graph neural networks applied to a brain shape classification task. Specifically, we analyse the effect of using parameter-efficient, shared graph convolutional submodels compared to structure-specific, non-shared submodels. Further, we assess the effect of mesh registration as part of the data harmonisation pipeline. We find substantial differences in the feature embeddings at different layers of the models. Our results highlight that test accuracy alone is insufficient to identify important model characteristics such as encoded biases related to data source or potentially non-discriminative features learned in submodels. Our model inspection framework offers a valuable tool for practitioners to better understand performance characteristics of deep learning models in medical imaging.

new Graph is all you need? Lightweight data-agnostic neural architecture search without training

Authors: Zhenhan Huang, Tejaswini Pedapati, Pin-Yu Chen, Chunhen Jiang, Jianxi Gao

Abstract: Neural architecture search (NAS) enables the automatic design of neural network models. However, training the candidates generated by the search algorithm for performance evaluation incurs considerable computational overhead. Our method, dubbed nasgraph, remarkably reduces the computational costs by converting neural architectures to graphs and using the average degree, a graph measure, as the proxy in lieu of the evaluation metric. Our training-free NAS method is data-agnostic and light-weight. It can find the best architecture among 200 randomly sampled architectures from NAS-Bench201 in 217 CPU seconds. Besides, our method is able to achieve competitive performance on various datasets including NASBench-101, NASBench-201, and NDS search spaces. We also demonstrate that nasgraph generalizes to more challenging tasks on Micro TransNAS-Bench-101.

new Data Scoping: Effectively Learning the Evolution of Generic Transport PDEs

Authors: Jiangce Chen, Wenzhuo Xu, Zeda Xu, Noelia Grande Guti\'errez, Sneha Prabha Narra, Christopher McComb

Abstract: Transport phenomena (e.g., fluid flows) are governed by time-dependent partial differential equations (PDEs) describing mass, momentum, and energy conservation, and are ubiquitous in many engineering applications. However, deep learning architectures are fundamentally incompatible with the simulation of these PDEs. This paper clearly articulates and then solves this incompatibility. The local-dependency of generic transport PDEs implies that it only involves local information to predict the physical properties at a location in the next time step. However, the deep learning architecture will inevitably increase the scope of information to make such predictions as the number of layers increases, which can cause sluggish convergence and compromise generalizability. This paper aims to solve this problem by proposing a distributed data scoping method with linear time complexity to strictly limit the scope of information to predict the local properties. The numerical experiments over multiple physics show that our data scoping method significantly accelerates training convergence and improves the generalizability of benchmark models on large-scale engineering simulations. Specifically, over the geometries not included in the training data for heat transferring simulation, it can increase the accuracy of Convolutional Neural Networks (CNNs) by 21.7 \% and that of Fourier Neural Operators (FNOs) by 38.5 \% on average.

new Constrained Reinforcement Learning Under Model Mismatch

Authors: Zhongchang Sun, Sihong He, Fei Miao, Shaofeng Zou

Abstract: Existing studies on constrained reinforcement learning (RL) may obtain a well-performing policy in the training environment. However, when deployed in a real environment, it may easily violate constraints that were originally satisfied during training because there might be model mismatch between the training and real environments. To address the above challenge, we formulate the problem as constrained RL under model uncertainty, where the goal is to learn a good policy that optimizes the reward and at the same time satisfy the constraint under model mismatch. We develop a Robust Constrained Policy Optimization (RCPO) algorithm, which is the first algorithm that applies to large/continuous state space and has theoretical guarantees on worst-case reward improvement and constraint violation at each iteration during the training. We demonstrate the effectiveness of our algorithm on a set of RL tasks with constraints.

new Position Paper: Beyond Robustness Against Single Attack Types

Authors: Sihui Dai, Chong Xiang, Tong Wu, Prateek Mittal

Abstract: Current research on defending against adversarial examples focuses primarily on achieving robustness against a single attack type such as $\ell_2$ or $\ell_{\infty}$-bounded attacks. However, the space of possible perturbations is much larger and currently cannot be modeled by a single attack type. The discrepancy between the focus of current defenses and the space of attacks of interest calls to question the practicality of existing defenses and the reliability of their evaluation. In this position paper, we argue that the research community should look beyond single attack robustness, and we draw attention to three potential directions involving robustness against multiple attacks: simultaneous multiattack robustness, unforeseen attack robustness, and a newly defined problem setting which we call continual adaptive robustness. We provide a unified framework which rigorously defines these problem settings, synthesize existing research in these fields, and outline open directions. We hope that our position paper inspires more research in simultaneous multiattack, unforeseen attack, and continual adaptive robustness.

new Community-Invariant Graph Contrastive Learning

Authors: Shiyin Tan, Dongyuan Li, Renhe Jiang, Ying Zhang, Manabu Okumura

Abstract: Graph augmentation has received great attention in recent years for graph contrastive learning (GCL) to learn well-generalized node/graph representations. However, mainstream GCL methods often favor randomly disrupting graphs for augmentation, which shows limited generalization and inevitably leads to the corruption of high-level graph information, i.e., the graph community. Moreover, current knowledge-based graph augmentation methods can only focus on either topology or node features, causing the model to lack robustness against various types of noise. To address these limitations, this research investigated the role of the graph community in graph augmentation and figured out its crucial advantage for learnable graph augmentation. Based on our observations, we propose a community-invariant GCL framework to maintain graph community structure during learnable graph augmentation. By maximizing the spectral changes, this framework unifies the constraints of both topology and feature augmentation, enhancing the model's robustness. Empirical evidence on 21 benchmark datasets demonstrates the exclusive merits of our framework. Code is released on Github (https://github.com/ShiyinTan/CI-GCL.git).

URLs: https://github.com/ShiyinTan/CI-GCL.git).

new Dynamic Online Ensembles of Basis Expansions

Authors: Daniel Waxman, Petar M. Djuri\'c

Abstract: Practical Bayesian learning often requires (1) online inference, (2) dynamic models, and (3) ensembling over multiple different models. Recent advances have shown how to use random feature approximations to achieve scalable, online ensembling of Gaussian processes with desirable theoretical properties and fruitful applications. One key to these methods' success is the inclusion of a random walk on the model parameters, which makes models dynamic. We show that these methods can be generalized easily to any basis expansion model and that using alternative basis expansions, such as Hilbert space Gaussian processes, often results in better performance. To simplify the process of choosing a specific basis expansion, our method's generality also allows the ensembling of several entirely different models, for example, a Gaussian process and polynomial regression. Finally, we propose a novel method to ensemble static and dynamic models together.

new Invariant Risk Minimization Is A Total Variation Model

Authors: Zhao-Rong Lai, Wei-Wen Wang

Abstract: Invariant risk minimization (IRM) is an arising approach to generalize invariant features to different environments in machine learning. While most related works focus on new IRM settings or new application scenarios, the mathematical essence of IRM remains to be properly explained. We verify that IRM is essentially a total variation based on $L^2$ norm (TV-$\ell_2$) of the learning risk with respect to the classifier variable. Moreover, we propose a novel IRM framework based on the TV-$\ell_1$ model. It not only expands the classes of functions that can be used as the learning risk, but also has robust performance in denoising and invariant feature preservation based on the coarea formula. We also illustrate some requirements for IRM-TV-$\ell_1$ to achieve out-of-distribution generalization. Experimental results show that the proposed framework achieves competitive performance in several benchmark machine learning scenarios.

new Test-time Assessment of a Model's Performance on Unseen Domains via Optimal Transport

Authors: Akshay Mehra, Yunbei Zhang, Jihun Hamm

Abstract: Gauging the performance of ML models on data from unseen domains at test-time is essential yet a challenging problem due to the lack of labels in this setting. Moreover, the performance of these models on in-distribution data is a poor indicator of their performance on data from unseen domains. Thus, it is essential to develop metrics that can provide insights into the model's performance at test time and can be computed only with the information available at test time (such as their model parameters, the training data or its statistics, and the unlabeled test data). To this end, we propose a metric based on Optimal Transport that is highly correlated with the model's performance on unseen domains and is efficiently computable only using information available at test time. Concretely, our metric characterizes the model's performance on unseen domains using only a small amount of unlabeled data from these domains and data or statistics from the training (source) domain(s). Through extensive empirical evaluation using standard benchmark datasets, and their corruptions, we demonstrate the utility of our metric in estimating the model's performance in various practical applications. These include the problems of selecting the source data and architecture that leads to the best performance on data from an unseen domain and the problem of predicting a deployed model's performance at test time on unseen domains. Our empirical results show that our metric, which uses information from both the source and the unseen domain, is highly correlated with the model's performance, achieving a significantly better correlation than that obtained via the popular prediction entropy-based metric, which is computed solely using the data from the unseen domain.

new Uncertainty for Active Learning on Graphs

Authors: Dominik Fuchsgruber, Tom Wollschl\"ager, Bertrand Charpentier, Antonio Oroz, Stephan G\"unnemann

Abstract: Uncertainty Sampling is an Active Learning strategy that aims to improve the data efficiency of machine learning models by iteratively acquiring labels of data points with the highest uncertainty. While it has proven effective for independent data its applicability to graphs remains under-explored. We propose the first extensive study of Uncertainty Sampling for node classification: (1) We benchmark Uncertainty Sampling beyond predictive uncertainty and highlight a significant performance gap to other Active Learning strategies. (2) We develop ground-truth Bayesian uncertainty estimates in terms of the data generating process and prove their effectiveness in guiding Uncertainty Sampling toward optimal queries. We confirm our results on synthetic data and design an approximate approach that consistently outperforms other uncertainty estimators on real datasets. (3) Based on this analysis, we relate pitfalls in modeling uncertainty to existing methods. Our analysis enables and informs the development of principled uncertainty estimation on graphs.

new Understanding Retrieval-Augmented Task Adaptation for Vision-Language Models

Authors: Yifei Ming, Yixuan Li

Abstract: Pre-trained contrastive vision-language models have demonstrated remarkable performance across a wide range of tasks. However, they often struggle on fine-trained datasets with categories not adequately represented during pre-training, which makes adaptation necessary. Recent works have shown promising results by utilizing samples from web-scale databases for retrieval-augmented adaptation, especially in low-data regimes. Despite the empirical success, understanding how retrieval impacts the adaptation of vision-language models remains an open research question. In this work, we adopt a reflective perspective by presenting a systematic study to understand the roles of key components in retrieval-augmented adaptation. We unveil new insights on uni-modal and cross-modal retrieval and highlight the critical role of logit ensemble for effective adaptation. We further present theoretical underpinnings that directly support our empirical observations.

new Common pitfalls to avoid while using multiobjective optimization in machine learning

Authors: Junaid Akhter, Paul David F\"ahrmann, Konstantin Sonntag, Sebastian Peitz

Abstract: Recently, there has been an increasing interest in exploring the application of multiobjective optimization (MOO) in machine learning (ML). The interest is driven by the numerous situations in real-life applications where multiple objectives need to be optimized simultaneously. A key aspect of MOO is the existence of a Pareto set, rather than a single optimal solution, which illustrates the inherent trade-offs between objectives. Despite its potential, there is a noticeable lack of satisfactory literature that could serve as an entry-level guide for ML practitioners who want to use MOO. Hence, our goal in this paper is to produce such a resource. We critically review previous studies, particularly those involving MOO in deep learning (using Physics-Informed Neural Networks (PINNs) as a guiding example), and identify misconceptions that highlight the need for a better grasp of MOO principles in ML. Using MOO of PINNs as a case study, we demonstrate the interplay between the data loss and the physics loss terms. We highlight the most common pitfalls one should avoid while using MOO techniques in ML. We begin by establishing the groundwork for MOO, focusing on well-known approaches such as the weighted sum (WS) method, alongside more complex techniques like the multiobjective gradient descent algorithm (MGDA). Additionally, we compare the results obtained from the WS and MGDA with one of the most common evolutionary algorithms, NSGA-II. We emphasize the importance of understanding the specific problem, the objective space, and the selected MOO method, while also noting that neglecting factors such as convergence can result in inaccurate outcomes and, consequently, a non-optimal solution. Our goal is to offer a clear and practical guide for ML practitioners to effectively apply MOO, particularly in the context of DL.

new Digital Twin Generators for Disease Modeling

Authors: Nameyeh Alam, Jake Basilico, Daniele Bertolini, Satish Casie Chetty, Heather D'Angelo, Ryan Douglas, Charles K. Fisher, Franklin Fuller, Melissa Gomes, Rishabh Gupta, Alex Lang, Anton Loukianov, Rachel Mak-McCully, Cary Murray, Hanalei Pham, Susanna Qiao, Elena Ryapolova-Webb, Aaron Smith, Dimitri Theoharatos, Anil Tolwani, Eric W. Tramel, Anna Vidovszky, Judy Viduya, Jonathan R. Walsh

Abstract: A patient's digital twin is a computational model that describes the evolution of their health over time. Digital twins have the potential to revolutionize medicine by enabling individual-level computer simulations of human health, which can be used to conduct more efficient clinical trials or to recommend personalized treatment options. Due to the overwhelming complexity of human biology, machine learning approaches that leverage large datasets of historical patients' longitudinal health records to generate patients' digital twins are more tractable than potential mechanistic models. In this manuscript, we describe a neural network architecture that can learn conditional generative models of clinical trajectories, which we call Digital Twin Generators (DTGs), that can create digital twins of individual patients. We show that the same neural network architecture can be trained to generate accurate digital twins for patients across 13 different indications simply by changing the training set and tuning hyperparameters. By introducing a general purpose architecture, we aim to unlock the ability to scale machine learning approaches to larger datasets and across more indications so that a digital twin could be created for any patient in the world.

new Accelerating Convergence in Bayesian Few-Shot Classification

Authors: Tianjun Ke, Haoqun Cao, Feng Zhou

Abstract: Bayesian few-shot classification has been a focal point in the field of few-shot learning. This paper seamlessly integrates mirror descent-based variational inference into Gaussian process-based few-shot classification, addressing the challenge of non-conjugate inference. By leveraging non-Euclidean geometry, mirror descent achieves accelerated convergence by providing the steepest descent direction along the corresponding manifold. It also exhibits the parameterization invariance property concerning the variational distribution. Experimental results demonstrate competitive classification accuracy, improved uncertainty quantification, and faster convergence compared to baseline models. Additionally, we investigate the impact of hyperparameters and components. Code is publicly available at https://github.com/keanson/MD-BSFC.

URLs: https://github.com/keanson/MD-BSFC.

new A separability-based approach to quantifying generalization: which layer is best?

Authors: Luciano Dyballa, Evan Gerritz, Steven W. Zucker

Abstract: Generalization to unseen data remains poorly understood for deep learning classification and foundation models. How can one assess the ability of networks to adapt to new or extended versions of their input space in the spirit of few-shot learning, out-of-distribution generalization, and domain adaptation? Which layers of a network are likely to generalize best? We provide a new method for evaluating the capacity of networks to represent a sampled domain, regardless of whether the network has been trained on all classes in the domain. Our approach is the following: after fine-tuning state-of-the-art pre-trained models for visual classification on a particular domain, we assess their performance on data from related but distinct variations in that domain. Generalization power is quantified as a function of the latent embeddings of unseen data from intermediate layers for both unsupervised and supervised settings. Working throughout all stages of the network, we find that (i) high classification accuracy does not imply high generalizability; and (ii) deeper layers in a model do not always generalize the best, which has implications for pruning. Since the trends observed across datasets are largely consistent, we conclude that our approach reveals (a function of) the intrinsic capacity of the different layers of a model to generalize.

new Improving Intervention Efficacy via Concept Realignment in Concept Bottleneck Models

Authors: Nishad Singhi, Jae Myung Kim, Karsten Roth, Zeynep Akata

Abstract: Concept Bottleneck Models (CBMs) ground image classification on human-understandable concepts to allow for interpretable model decisions. Crucially, the CBM design inherently allows for human interventions, in which expert users are given the ability to modify potentially misaligned concept choices to influence the decision behavior of the model in an interpretable fashion. However, existing approaches often require numerous human interventions per image to achieve strong performances, posing practical challenges in scenarios where obtaining human feedback is expensive. In this paper, we find that this is noticeably driven by an independent treatment of concepts during intervention, wherein a change of one concept does not influence the use of other ones in the model's final decision. To address this issue, we introduce a trainable concept intervention realignment module, which leverages concept relations to realign concept assignments post-intervention. Across standard, real-world benchmarks, we find that concept realignment can significantly improve intervention efficacy; significantly reducing the number of interventions needed to reach a target classification performance or concept prediction accuracy. In addition, it easily integrates into existing concept-based architectures without requiring changes to the models themselves. This reduced cost of human-model collaboration is crucial to enhancing the feasibility of CBMs in resource-constrained environments.

new Plan-Seq-Learn: Language Model Guided RL for Solving Long Horizon Robotics Tasks

Authors: Murtaza Dalal, Tarun Chiruvolu, Devendra Chaplot, Ruslan Salakhutdinov

Abstract: Large Language Models (LLMs) have been shown to be capable of performing high-level planning for long-horizon robotics tasks, yet existing methods require access to a pre-defined skill library (e.g. picking, placing, pulling, pushing, navigating). However, LLM planning does not address how to design or learn those behaviors, which remains challenging particularly in long-horizon settings. Furthermore, for many tasks of interest, the robot needs to be able to adjust its behavior in a fine-grained manner, requiring the agent to be capable of modifying low-level control actions. Can we instead use the internet-scale knowledge from LLMs for high-level policies, guiding reinforcement learning (RL) policies to efficiently solve robotic control tasks online without requiring a pre-determined set of skills? In this paper, we propose Plan-Seq-Learn (PSL): a modular approach that uses motion planning to bridge the gap between abstract language and learned low-level control for solving long-horizon robotics tasks from scratch. We demonstrate that PSL achieves state-of-the-art results on over 25 challenging robotics tasks with up to 10 stages. PSL solves long-horizon tasks from raw visual input spanning four benchmarks at success rates of over 85%, out-performing language-based, classical, and end-to-end approaches. Video results and code at https://mihdalal.github.io/planseqlearn/

URLs: https://mihdalal.github.io/planseqlearn/

cross Low-cost modular devices for on-road vehicle detection and characterisation

Authors: Jose-Luis Poza-Lujan, Pedro Uribe-Chavert, Juan-Luis Posadas-Yag\"ue

Abstract: Detecting and characterising vehicles is one of the purposes of embedded systems used in intelligent environments. An analysis of a vehicle characteristics can reveal inappropriate or dangerous behaviour. This detection makes it possible to sanction or notify emergency services to take early and practical actions. Vehicle detection and characterisation systems employ complex sensors such as video cameras, especially in urban environments. These sensors provide high precision and performance, although the price and computational requirements are proportional to their accuracy. These sensors offer high accuracy, but the price and computational requirements are directly proportional to their performance. This article introduces a system based on modular devices that is economical and has a low computational cost. These devices use ultrasonic sensors to detect the speed and length of vehicles. The measurement accuracy is improved through the collaboration of the device modules. The experiments were performed using multiple modules oriented to different angles. This module is coupled with another specifically designed to detect distance using previous modules speed and length data. The collaboration between different modules reduces the speed relative error ranges from 1 to 5, depending on the angle configuration used in the modules.

cross Understanding Social Perception, Interactions, and Safety Aspects of Sidewalk Delivery Robots Using Sentiment Analysis

Authors: Yuchen Du, Tho V. Le

Abstract: This article presents a comprehensive sentiment analysis (SA) of comments on YouTube videos related to Sidewalk Delivery Robots (SDRs). We manually annotated the collected YouTube comments with three sentiment labels: negative (0), positive (1), and neutral (2). We then constructed models for text sentiment classification and tested the models' performance on both binary and ternary classification tasks in terms of accuracy, precision, recall, and F1 score. Our results indicate that, in binary classification tasks, the Support Vector Machine (SVM) model using Term Frequency-Inverse Document Frequency (TF-IDF) and N-gram get the highest accuracy. In ternary classification tasks, the model using Bidirectional Encoder Representations from Transformers (BERT), Long Short-Term Memory Networks (LSTM) and Gated Recurrent Unit (GRU) significantly outperforms other machine learning models, achieving an accuracy, precision, recall, and F1 score of 0.78. Additionally, we employ the Latent Dirichlet Allocation model to generate 10 topics from the comments to explore the public's underlying views on SDRs. Drawing from these findings, we propose targeted recommendations for shaping future policies concerning SDRs. This work provides valuable insights for stakeholders in the SDR sector regarding social perception, interaction, and safety.

cross Joint torques prediction of a robotic arm using neural networks

Authors: Giulia d'Addato, Ruggero Carli, Eurico Pedrosa, Artur Pereira, Luigi Palopoli, Daniele Fontanelli

Abstract: Accurate dynamic models are crucial for many robotic applications. Traditional approaches to deriving these models are based on the application of Lagrangian or Newtonian mechanics. Although these methods provide a good insight into the physical behaviour of the system, they rely on the exact knowledge of parameters such as inertia, friction and joint flexibility. In addition, the system is often affected by uncertain and nonlinear effects, such as saturation and dead zones, which can be difficult to model. A popular alternative is the application of Machine Learning (ML) techniques - e.g., Neural Networks (NNs) - in the context of a "black-box" methodology. This paper reports on our experience with this approach for a real-life 6 degrees of freedom (DoF) manipulator. Specifically, we considered several NN architectures: single NN, multiple NNs, and cascade NN. We compared the performance of the system by using different policies for selecting the NN hyperparameters. Our experiments reveal that the best accuracy and performance are obtained by a cascade NN, in which we encode our prior physical knowledge about the dependencies between joints, complemented by an appropriate optimisation of the hyperparameters.

cross Pricing Catastrophe Bonds -- A Probabilistic Machine Learning Approach

Authors: Xiaowei Chen, Hong Li, Yufan Lu, Rui Zhou

Abstract: This paper proposes a probabilistic machine learning method to price catastrophe (CAT) bonds in the primary market. The proposed method combines machine-learning-based predictive models with Conformal Prediction, an innovative algorithm that generates distribution-free probabilistic forecasts for CAT bond prices. Using primary market CAT bond transaction records between January 1999 and March 2021, the proposed method is found to be more robust and yields more accurate predictions of the bond spreads than traditional regression-based methods. Furthermore, the proposed method generates more informative prediction intervals than linear regression and identifies important nonlinear relationships between various risk factors and bond spreads, suggesting that linear regressions could misestimate the bond spreads. Overall, this paper demonstrates the potential of machine learning methods in improving the pricing of CAT bonds.

cross Direct Training Needs Regularisation: Anytime Optimal Inference Spiking Neural Network

Authors: Dengyu Wu, Yi Qi, Kaiwen Cai, Gaojie Jin, Xinping Yi, Xiaowei Huang

Abstract: Spiking Neural Network (SNN) is acknowledged as the next generation of Artificial Neural Network (ANN) and hold great promise in effectively processing spatial-temporal information. However, the choice of timestep becomes crucial as it significantly impacts the accuracy of the neural network training. Specifically, a smaller timestep indicates better performance in efficient computing, resulting in reduced latency and operations. While, using a small timestep may lead to low accuracy due to insufficient information presentation with few spikes. This observation motivates us to develop an SNN that is more reliable for adaptive timestep by introducing a novel regularisation technique, namely Spatial-Temporal Regulariser (STR). Our approach regulates the ratio between the strength of spikes and membrane potential at each timestep. This effectively balances spatial and temporal performance during training, ultimately resulting in an Anytime Optimal Inference (AOI) SNN. Through extensive experiments on frame-based and event-based datasets, our method, in combination with cutoff based on softmax output, achieves state-of-the-art performance in terms of both latency and accuracy. Notably, with STR and cutoff, SNN achieves 2.14 to 2.89 faster in inference compared to the pre-configured timestep with near-zero accuracy drop of 0.50% to 0.64% over the event-based datasets. Code available: https://github.com/Dengyu-Wu/AOI-SNN-Regularisation

URLs: https://github.com/Dengyu-Wu/AOI-SNN-Regularisation

cross SHED: Shapley-Based Automated Dataset Refinement for Instruction Fine-Tuning

Authors: Yexiao He, Ziyao Wang, Zheyu Shen, Guoheng Sun, Yucong Dai, Yongkai Wu, Hongyi Wang, Ang Li

Abstract: The pre-trained Large Language Models (LLMs) can be adapted for many downstream tasks and tailored to align with human preferences through fine-tuning. Recent studies have discovered that LLMs can achieve desirable performance with only a small amount of high-quality data, suggesting that a large amount of the data in these extensive datasets is redundant or even harmful. Identifying high-quality data from vast datasets to curate small yet effective datasets has emerged as a critical challenge. In this paper, we introduce SHED, an automated dataset refinement framework based on Shapley value for instruction fine-tuning. SHED eliminates the need for human intervention or the use of commercial LLMs. Moreover, the datasets curated through SHED exhibit transferability, indicating they can be reused across different LLMs with consistently high performance. We conduct extensive experiments to evaluate the datasets curated by SHED. The results demonstrate SHED's superiority over state-of-the-art methods across various tasks and LLMs; notably, datasets comprising only 10% of the original data selected by SHED achieve performance comparable to or surpassing that of the full datasets.

cross Interactive Analysis of LLMs using Meaningful Counterfactuals

Authors: Furui Cheng, Vil\'em Zouhar, Robin Shing Moon Chan, Daniel F\"urst, Hendrik Strobelt, Mennatallah El-Assady

Abstract: Counterfactual examples are useful for exploring the decision boundaries of machine learning models and determining feature attributions. How can we apply counterfactual-based methods to analyze and explain LLMs? We identify the following key challenges. First, the generated textual counterfactuals should be meaningful and readable to users and thus can be mentally compared to draw conclusions. Second, to make the solution scalable to long-form text, users should be equipped with tools to create batches of counterfactuals from perturbations at various granularity levels and interactively analyze the results. In this paper, we tackle the above challenges and contribute 1) a novel algorithm for generating batches of complete and meaningful textual counterfactuals by removing and replacing text segments in different granularities, and 2) LLM Analyzer, an interactive visualization tool to help users understand an LLM's behaviors by interactively inspecting and aggregating meaningful counterfactuals. We evaluate the proposed algorithm by the grammatical correctness of its generated counterfactuals using 1,000 samples from medical, legal, finance, education, and news datasets. In our experiments, 97.2% of the counterfactuals are grammatically correct. Through a use case, user studies, and feedback from experts, we demonstrate the usefulness and usability of the proposed interactive visualization tool.

cross Evaluating Tool-Augmented Agents in Remote Sensing Platforms

Authors: Simranjit Singh, Michael Fore, Dimitrios Stamoulis

Abstract: Tool-augmented Large Language Models (LLMs) have shown impressive capabilities in remote sensing (RS) applications. However, existing benchmarks assume question-answering input templates over predefined image-text data pairs. These standalone instructions neglect the intricacies of realistic user-grounded tasks. Consider a geospatial analyst: they zoom in a map area, they draw a region over which to collect satellite imagery, and they succinctly ask "Detect all objects here". Where is `here`, if it is not explicitly hardcoded in the image-text template, but instead is implied by the system state, e.g., the live map positioning? To bridge this gap, we present GeoLLM-QA, a benchmark designed to capture long sequences of verbal, visual, and click-based actions on a real UI platform. Through in-depth evaluation of state-of-the-art LLMs over a diverse set of 1,000 tasks, we offer insights towards stronger agents for RS applications.

cross Homonym Sense Disambiguation in the Georgian Language

Authors: Davit Melikidze, Alexander Gamkrelidze

Abstract: This research proposes a novel approach to the Word Sense Disambiguation (WSD) task in the Georgian language, based on supervised fine-tuning of a pre-trained Large Language Model (LLM) on a dataset formed by filtering the Georgian Common Crawls corpus. The dataset is used to train a classifier for words with multiple senses. Additionally, we present experimental results of using LSTM for WSD. Accurately disambiguating homonyms is crucial in natural language processing. Georgian, an agglutinative language belonging to the Kartvelian language family, presents unique challenges in this context. The aim of this paper is to highlight the specific problems concerning homonym disambiguation in the Georgian language and to present our approach to solving them. The techniques discussed in the article achieve 95% accuracy for predicting lexical meanings of homonyms using a hand-classified dataset of over 7500 sentences.

cross SoK: Behind the Accuracy of Complex Human Activity Recognition Using Deep Learning

Authors: Duc-Anh Nguyen, Nhien-An Le-Khac

Abstract: Human Activity Recognition (HAR) is a well-studied field with research dating back to the 1980s. Over time, HAR technologies have evolved significantly from manual feature extraction, rule-based algorithms, and simple machine learning models to powerful deep learning models, from one sensor type to a diverse array of sensing modalities. The scope has also expanded from recognising a limited set of activities to encompassing a larger variety of both simple and complex activities. However, there still exist many challenges that hinder advancement in complex activity recognition using modern deep learning methods. In this paper, we comprehensively systematise factors leading to inaccuracy in complex HAR, such as data variety and model capacity. Among many sensor types, we give more attention to wearable and camera due to their prevalence. Through this Systematisation of Knowledge (SoK) paper, readers can gain a solid understanding of the development history and existing challenges of HAR, different categorisations of activities, obstacles in deep learning-based complex HAR that impact accuracy, and potential research directions.

cross Towards Adapting Open-Source Large Language Models for Expert-Level Clinical Note Generation

Authors: Hanyin Wang, Chufan Gao, Bolun Liu, Qiping Xu, Guleid Hussein, Mohamad El Labban, Kingsley Iheasirim, Hariprasad Korsapati, Jimeng Sun

Abstract: Large Language Models (LLMs) have shown promising capabilities in handling clinical text summarization tasks. In this study, we demonstrate that a small open-source LLM can be effectively trained to generate high-quality clinical notes from outpatient patient-doctor dialogues. We achieve this through a comprehensive domain- and task-specific adaptation process for the LLaMA-2 13 billion parameter model. This process incorporates continued pre-training, supervised fine-tuning, and reinforcement learning from both AI and human feedback. We introduced an enhanced approach, termed DistillDirect, for performing on-policy reinforcement learning with Gemini Pro serving as the teacher model. Our resulting model, LLaMA-Clinic, is capable of generating clinical notes that are comparable in quality to those authored by physicians. In a blinded physician reader study, the majority (90.4%) of individual evaluations rated the notes generated by LLaMA-Clinic as "acceptable" or higher across all three criteria: real-world readiness, completeness, and accuracy. Notably, in the more challenging "Assessment and Plan" section, LLaMA-Clinic scored higher (4.2/5) in real-world readiness compared to physician-authored notes (4.1/5). Additionally, we identified caveats in public clinical note datasets, such as ACI-BENCH. We highlight key considerations for future clinical note-generation tasks, emphasizing the importance of pre-defining a best-practice note format. Overall, our research demonstrates the potential and feasibility of training smaller, open-source LLMs to assist with clinical documentation, capitalizing on healthcare institutions' access to patient records and domain expertise. We have made our newly created synthetic clinic dialogue-note dataset and the physician feedback dataset publicly available to foster future research in this field.

cross EEG-Deformer: A Dense Convolutional Transformer for Brain-computer Interfaces

Authors: Yi Ding, Yong Li, Hao Sun, Rui Liu, Chengxuan Tong, Cuntai Guan

Abstract: Effectively learning the temporal dynamics in electroencephalogram (EEG) signals is challenging yet essential for decoding brain activities using brain-computer interfaces (BCIs). Although Transformers are popular for their long-term sequential learning ability in the BCI field, most methods combining Transformers with convolutional neural networks (CNNs) fail to capture the coarse-to-fine temporal dynamics of EEG signals. To overcome this limitation, we introduce EEG-Deformer, which incorporates two main novel components into a CNN-Transformer: (1) a Hierarchical Coarse-to-Fine Transformer (HCT) block that integrates a Fine-grained Temporal Learning (FTL) branch into Transformers, effectively discerning coarse-to-fine temporal patterns; and (2) a Dense Information Purification (DIP) module, which utilizes multi-level, purified temporal information to enhance decoding accuracy. Comprehensive experiments on three representative cognitive tasks consistently verify the generalizability of our proposed EEG-Deformer, demonstrating that it either outperforms existing state-of-the-art methods or is comparable to them. Visualization results show that EEG-Deformer learns from neurophysiologically meaningful brain regions for the corresponding cognitive tasks. The source code can be found at https://github.com/yi-ding-cs/EEG-Deformer.

URLs: https://github.com/yi-ding-cs/EEG-Deformer.

cross A Novel Machine Learning-based Equalizer for a Downstream 100G PAM-4 PON

Authors: Chen Shao, Elias Giacoumidis, Shi Li, Jialei Li, Michael Faerber, Tobias Kaefer, Andre Richter

Abstract: A frequency-calibrated SCINet (FC-SCINet) equalizer is proposed for down-stream 100G PON with 28.7 dB path loss. At 5 km, FC-SCINet improves the BER by 88.87% compared to FFE and a 3-layer DNN with 10.57% lower complexity.

cross Optimizing Brain-Computer Interface Performance: Advancing EEG Signals Channel Selection through Regularized CSP and SPEA II Multi-Objective Optimization

Authors: M. Moein Esfahani, Hossein Sadati, Vince D Calhoun

Abstract: Brain-computer interface systems and the recording of brain activity has garnered significant attention across a diverse spectrum of applications. EEG signals have emerged as a modality for recording neural electrical activity. Among the methodologies designed for feature extraction from EEG data, the method of RCSP has proven to be an approach, particularly in the context of MI tasks. RCSP exhibits efficacy in the discrimination and classification of EEG signals. In optimizing the performance of this method, our research extends to a comparative analysis with conventional CSP techniques, as well as optimized methodologies designed for similar applications. Notably, we employ the meta-heuristic multi-objective Strength Pareto Evolutionary Algorithm II (SPEA-II) as a pivotal component of our research paradigm. This is a state-of-the-art approach in the selection of an subset of channels from a multichannel EEG signal with MI tasks. Our main objective is to formulate an optimum channel selection strategy aimed at identifying the most pertinent subset of channels from the multi-dimensional electroencephalogram (EEG) signals. One of the primary objectives inherent to channel selection in the EEG signal analysis pertains to the reduction of the channel count, an approach that enhances user comfort when utilizing gel-based EEG electrodes. Additionally, within this research, we took benefit of ensemble learning models as a component of our decision-making. This technique serves to mitigate the challenges associated with overfitting, especially when confronted with an extensive array of potentially redundant EEG channels and data noise. Our findings not only affirm the performance of RCSP in MI-based BCI systems, but also underscore the significance of channel selection strategies and ensemble learning techniques in optimizing the performance of EEG signal classification.

cross EEG_RL-Net: Enhancing EEG MI Classification through Reinforcement Learning-Optimised Graph Neural Networks

Authors: Htoo Wai Aung, Jiao Jiao Li, Yang An, Steven W. Su

Abstract: Brain-Computer Interfaces (BCIs) rely on accurately decoding electroencephalography (EEG) motor imagery (MI) signals for effective device control. Graph Neural Networks (GNNs) outperform Convolutional Neural Networks (CNNs) in this regard, by leveraging the spatial relationships between EEG electrodes through adjacency matrices. The EEG_GLT-Net framework, featuring the state-of-the-art EEG_GLT adjacency matrix method, has notably enhanced EEG MI signal classification, evidenced by an average accuracy of 83.95% across 20 subjects on the PhysioNet dataset. This significantly exceeds the 76.10% accuracy rate achieved using the Pearson Correlation Coefficient (PCC) method within the same framework. In this research, we advance the field by applying a Reinforcement Learning (RL) approach to the classification of EEG MI signals. Our innovative method empowers the RL agent, enabling not only the classification of EEG MI data points with higher accuracy, but effective identification of EEG MI data points that are less distinct. We present the EEG_RL-Net, an enhancement of the EEG_GLT-Net framework, which incorporates the trained EEG GCN Block from EEG_GLT-Net at an adjacency matrix density of 13.39% alongside the RL-centric Dueling Deep Q Network (Dueling DQN) block. The EEG_RL-Net model showcases exceptional classification performance, achieving an unprecedented average accuracy of 96.40% across 20 subjects within 25 milliseconds. This model illustrates the transformative effect of the RL in EEG MI time point classification.

cross Baseline Drift Tolerant Signal Encoding for ECG Classification with Deep Learning

Authors: Robert O Shea, Prabodh Katti, Bipin Rajendran

Abstract: Common artefacts such as baseline drift, rescaling, and noise critically limit the performance of machine learningbased automated ECG analysis and interpretation. This study proposes Derived Peak (DP) encoding, a non-parametric method that generates signed spikes corresponding to zero crossings of the signals first and second-order time derivatives. Notably, DP encoding is invariant to shift and scaling artefacts, and its implementation is further simplified by the absence of userdefined parameters. DP encoding was used to encode the 12-lead ECG data from the PTB-XL dataset (n=18,869 participants) and was fed to 1D-ResNet-18 models trained to identify myocardial infarction, conductive deficits and ST-segment abnormalities. Robustness to artefacts was assessed by corrupting ECG data with sinusoidal baseline drift, shift, rescaling and noise, before encoding. The addition of these artefacts resulted in a significant drop in accuracy for seven other methods from prior art, while DP encoding maintained a baseline AUC of 0.88 under drift, shift and rescaling. DP achieved superior performance to unencoded inputs in the presence of shift (AUC under 1mV shift: 0.91 vs 0.62), and rescaling artefacts (AUC 0.91 vs 0.79). Thus, DP encoding is a simple method by which robustness to common ECG artefacts may be improved for automated ECG analysis and interpretation.

cross Federated Learning and Differential Privacy Techniques on Multi-hospital Population-scale Electrocardiogram Data

Authors: Vikhyat Agrawal, Sunil Vasu Kalmady, Venkataseetharam Manoj Malipeddi, Manisimha Varma Manthena, Weijie Sun, Saiful Islam, Abram Hindle, Padma Kaul, Russell Greiner

Abstract: This research paper explores ways to apply Federated Learning (FL) and Differential Privacy (DP) techniques to population-scale Electrocardiogram (ECG) data. The study learns a multi-label ECG classification model using FL and DP based on 1,565,849 ECG tracings from 7 hospitals in Alberta, Canada. The FL approach allowed collaborative model training without sharing raw data between hospitals while building robust ECG classification models for diagnosing various cardiac conditions. These accurate ECG classification models can facilitate the diagnoses while preserving patient confidentiality using FL and DP techniques. Our results show that the performance achieved using our implementation of the FL approach is comparable to that of the pooled approach, where the model is trained over the aggregating data from all hospitals. Furthermore, our findings suggest that hospitals with limited ECGs for training can benefit from adopting the FL model compared to single-site training. In addition, this study showcases the trade-off between model performance and data privacy by employing DP during model training. Our code is available at https://github.com/vikhyatt/Hospital-FL-DP.

URLs: https://github.com/vikhyatt/Hospital-FL-DP.

cross Generalised envelope spectrum-based signal-to-noise objectives: Formulation, optimisation and application for gear fault detection under time-varying speed conditions

Authors: Stephan Schmidt, Daniel N. Wilke, Konstantinos C. Gryllias

Abstract: In vibration-based condition monitoring, optimal filter design improves fault detection by enhancing weak fault signatures within vibration signals. This process involves optimising a derived objective function from a defined objective. The objectives are often based on proxy health indicators to determine the filter's parameters. However, these indicators can be compromised by irrelevant extraneous signal components and fluctuating operational conditions, affecting the filter's efficacy. Fault detection primarily uses the fault component's prominence in the squared envelope spectrum, quantified by a squared envelope spectrum-based signal-to-noise ratio. New optimal filter objective functions are derived from the proposed generalised envelope spectrum-based signal-to-noise objective for machines operating under variable speed conditions. Instead of optimising proxy health indicators, the optimal filter coefficients of the formulation directly maximise the squared envelope spectrum-based signal-to-noise ratio over targeted frequency bands using standard gradient-based optimisers. Four derived objective functions from the proposed objective effectively outperform five prominent methods in tests on three experimental datasets.

cross LoRA Land: 310 Fine-tuned LLMs that Rival GPT-4, A Technical Report

Authors: Justin Zhao, Timothy Wang, Wael Abid, Geoffrey Angus, Arnav Garg, Jeffery Kinnison, Alex Sherstinsky, Piero Molino, Travis Addair, Devvret Rishi

Abstract: Low Rank Adaptation (LoRA) has emerged as one of the most widely adopted methods for Parameter Efficient Fine-Tuning (PEFT) of Large Language Models (LLMs). LoRA reduces the number of trainable parameters and memory usage while achieving comparable performance to full fine-tuning. We aim to assess the viability of training and serving LLMs fine-tuned with LoRA in real-world applications. First, we measure the quality of LLMs fine-tuned with quantized low rank adapters across 10 base models and 31 tasks for a total of 310 models. We find that 4-bit LoRA fine-tuned models outperform base models by 34 points and GPT-4 by 10 points on average. Second, we investigate the most effective base models for fine-tuning and assess the correlative and predictive capacities of task complexity heuristics in forecasting the outcomes of fine-tuning. Finally, we evaluate the latency and concurrency capabilities of LoRAX, an open-source Multi-LoRA inference server that facilitates the deployment of multiple LoRA fine-tuned models on a single GPU using shared base model weights and dynamic adapter loading. LoRAX powers LoRA Land, a web application that hosts 25 LoRA fine-tuned Mistral-7B LLMs on a single NVIDIA A100 GPU with 80GB memory. LoRA Land highlights the quality and cost-effectiveness of employing multiple specialized LLMs over a single, general-purpose LLM.

cross EEG-MACS: Manifold Attention and Confidence Stratification for EEG-based Cross-Center Brain Disease Diagnosis under Unreliable Annotations

Authors: Zhenxi Song, Ruihan Qin, Huixia Ren, Zhen Liang, Yi Guo, Min Zhang, Zhiguo Zhang

Abstract: Cross-center data heterogeneity and annotation unreliability significantly challenge the intelligent diagnosis of diseases using brain signals. A notable example is the EEG-based diagnosis of neurodegenerative diseases, which features subtler abnormal neural dynamics typically observed in small-group settings. To advance this area, in this work, we introduce a transferable framework employing Manifold Attention and Confidence Stratification (MACS) to diagnose neurodegenerative disorders based on EEG signals sourced from four centers with unreliable annotations. The MACS framework's effectiveness stems from these features: 1) The Augmentor generates various EEG-represented brain variants to enrich the data space; 2) The Switcher enhances the feature space for trusted samples and reduces overfitting on incorrectly labeled samples; 3) The Encoder uses the Riemannian manifold and Euclidean metrics to capture spatiotemporal variations and dynamic synchronization in EEG; 4) The Projector, equipped with dual heads, monitors consistency across multiple brain variants and ensures diagnostic accuracy; 5) The Stratifier adaptively stratifies learned samples by confidence levels throughout the training process; 6) Forward and backpropagation in MACS are constrained by confidence stratification to stabilize the learning system amid unreliable annotations. Our subject-independent experiments, conducted on both neurocognitive and movement disorders using cross-center corpora, have demonstrated superior performance compared to existing related algorithms. This work not only improves EEG-based diagnostics for cross-center and small-setting brain diseases but also offers insights into extending MACS techniques to other data analyses, tackling data heterogeneity and annotation unreliability in multimedia and multimodal content understanding.

cross Joint Signal Detection and Automatic Modulation Classification via Deep Learning

Authors: Huijun Xing, Xuhui Zhang, Shuo Chang, Jinke Ren, Zixun Zhang, Jie Xu, Shuguang Cui

Abstract: Signal detection and modulation classification are two crucial tasks in various wireless communication systems. Different from prior works that investigate them independently, this paper studies the joint signal detection and automatic modulation classification (AMC) by considering a realistic and complex scenario, in which multiple signals with different modulation schemes coexist at different carrier frequencies. We first generate a coexisting RADIOML dataset (CRML23) to facilitate the joint design. Different from the publicly available AMC dataset ignoring the signal detection step and containing only one signal, our synthetic dataset covers the more realistic multiple-signal coexisting scenario. Then, we present a joint framework for detection and classification (JDM) for such a multiple-signal coexisting environment, which consists of two modules for signal detection and AMC, respectively. In particular, these two modules are interconnected using a designated data structure called "proposal". Finally, we conduct extensive simulations over the newly developed dataset, which demonstrate the effectiveness of our designs. Our code and dataset are now available as open-source (https://github.com/Singingkettle/ChangShuoRadioData).

URLs: https://github.com/Singingkettle/ChangShuoRadioData).

cross HLSTransform: Energy-Efficient Llama 2 Inference on FPGAs Via High Level Synthesis

Authors: Andy He, Darren Key, Mason Bulling, Andrew Chang, Skyler Shapiro, Everett Lee

Abstract: Graphics Processing Units (GPUs) have become the leading hardware accelerator for deep learning applications and are used widely in training and inference of transformers; transformers have achieved state-of-the-art performance in many areas of machine learning and are especially used in most modern Large Language Models (LLMs). However, GPUs require large amounts of energy, which poses environmental concerns, demands high operational costs, and causes GPUs to be unsuitable for edge computing. We develop an accelerator for transformers, namely, Llama 2, an open-source state-of-the-art LLM, using high level synthesis (HLS) on Field Programmable Gate Arrays (FPGAs). HLS allows us to rapidly prototype FPGA designs without writing code at the register-transfer level (RTL). We name our method HLSTransform, and the FPGA designs we synthesize with HLS achieve up to a 12.75x reduction and 8.25x reduction in energy used per token on the Xilinx Virtex UltraScale+ VU9P FPGA compared to an Intel Xeon Broadwell E5-2686 v4 CPU and NVIDIA RTX 3090 GPU respectively, while increasing inference speeds by up to 2.46x compared to CPU and maintaining 0.53x the speed of an RTX 3090 GPU despite the GPU's 4 times higher base clock rate. With the lack of existing open-source FPGA accelerators for transformers, we open-source our code and document our steps for synthesis. We hope this work will serve as a step in democratizing the use of FPGAs in transformer inference and inspire research into energy-efficient inference methods as a whole. The code can be found on https://github.com/HLSTransform/submission.

URLs: https://github.com/HLSTransform/submission.

cross Modeling Caption Diversity in Contrastive Vision-Language Pretraining

Authors: Samuel Lavoie, Polina Kirichenko, Mark Ibrahim, Mahmoud Assran, Andrew Gordon Wildon, Aaron Courville, Nicolas Ballas

Abstract: There are a thousand ways to caption an image. Contrastive Language Pretraining (CLIP) on the other hand, works by mapping an image and its caption to a single vector -- limiting how well CLIP-like models can represent the diverse ways to describe an image. In this work, we introduce Llip, Latent Language Image Pretraining, which models the diversity of captions that could match an image. Llip's vision encoder outputs a set of visual features that are mixed into a final representation by conditioning on information derived from the text. We show that Llip outperforms non-contextualized baselines like CLIP and SigLIP on a variety of tasks even with large-scale encoders. Llip improves zero-shot classification by an average of 2.9% zero-shot classification benchmarks with a ViT-G/14 encoder. Specifically, Llip attains a zero-shot top-1 accuracy of 83.5% on ImageNet outperforming a similarly sized CLIP by 1.4%. We also demonstrate improvement on zero-shot retrieval on MS-COCO by 6.0%. We provide a comprehensive analysis of the components introduced by the method and demonstrate that Llip leads to richer visual representations.

cross Federated Graph Learning for EV Charging Demand Forecasting with Personalization Against Cyberattacks

Authors: Yi Li, Renyou Xie, Chaojie Li, Yi Wang, Zhaoyang Dong

Abstract: Mitigating cybersecurity risk in electric vehicle (EV) charging demand forecasting plays a crucial role in the safe operation of collective EV chargings, the stability of the power grid, and the cost-effective infrastructure expansion. However, existing methods either suffer from the data privacy issue and the susceptibility to cyberattacks or fail to consider the spatial correlation among different stations. To address these challenges, a federated graph learning approach involving multiple charging stations is proposed to collaboratively train a more generalized deep learning model for demand forecasting while capturing spatial correlations among various stations and enhancing robustness against potential attacks. Firstly, for better model performance, a Graph Neural Network (GNN) model is leveraged to characterize the geographic correlation among different charging stations in a federated manner. Secondly, to ensure robustness and deal with the data heterogeneity in a federated setting, a message passing that utilizes a global attention mechanism to aggregate personalized models for each client is proposed. Thirdly, by concerning cyberattacks, a special credit-based function is designed to mitigate potential threats from malicious clients or unwanted attacks. Extensive experiments on a public EV charging dataset are conducted using various deep learning techniques and federated learning methods to demonstrate the prediction accuracy and robustness of the proposed approach.

cross More is Better: Deep Domain Adaptation with Multiple Sources

Authors: Sicheng Zhao, Hui Chen, Hu Huang, Pengfei Xu, Guiguang Ding

Abstract: In many practical applications, it is often difficult and expensive to obtain large-scale labeled data to train state-of-the-art deep neural networks. Therefore, transferring the learned knowledge from a separate, labeled source domain to an unlabeled or sparsely labeled target domain becomes an appealing alternative. However, direct transfer often results in significant performance decay due to domain shift. Domain adaptation (DA) aims to address this problem by aligning the distributions between the source and target domains. Multi-source domain adaptation (MDA) is a powerful and practical extension in which the labeled data may be collected from multiple sources with different distributions. In this survey, we first define various MDA strategies. Then we systematically summarize and compare modern MDA methods in the deep learning era from different perspectives, followed by commonly used datasets and a brief benchmark. Finally, we discuss future research directions for MDA that are worth investigating.

cross F$^3$low: Frame-to-Frame Coarse-grained Molecular Dynamics with SE(3) Guided Flow Matching

Authors: Shaoning Li, Yusong Wang, Mingyu Li, Jian Zhang, Bin Shao, Nanning Zheng, Jian Tang

Abstract: Molecular dynamics (MD) is a crucial technique for simulating biological systems, enabling the exploration of their dynamic nature and fostering an understanding of their functions and properties. To address exploration inefficiency, emerging enhanced sampling approaches like coarse-graining (CG) and generative models have been employed. In this work, we propose a \underline{Frame-to-Frame} generative model with guided \underline{Flow}-matching (F$3$low) for enhanced sampling, which (a) extends the domain of CG modeling to the SE(3) Riemannian manifold; (b) retreating CGMD simulations as autoregressively sampling guided by the former frame via flow-matching models; (c) targets the protein backbone, offering improved insights into secondary structure formation and intricate folding pathways. Compared to previous methods, F$3$low allows for broader exploration of conformational space. The ability to rapidly generate diverse conformations via force-free generative paradigm on SE(3) paves the way toward efficient enhanced sampling methods.

cross CLIPArTT: Light-weight Adaptation of CLIP to New Domains at Test Time

Authors: Gustavo Adolfo Vargas Hakim, David Osowiechi, Mehrdad Noori, Milad Cheraghalikhani, Ali Bahri, Moslem Yazdanpanah, Ismail Ben Ayed, Christian Desrosiers

Abstract: Pre-trained vision-language models (VLMs), exemplified by CLIP, demonstrate remarkable adaptability across zero-shot classification tasks without additional training. However, their performance diminishes in the presence of domain shifts. In this study, we introduce CLIP Adaptation duRing Test-Time (CLIPArTT), a fully test-time adaptation (TTA) approach for CLIP, which involves automatic text prompts construction during inference for their use as text supervision. Our method employs a unique, minimally invasive text prompt tuning process, wherein multiple predicted classes are aggregated into a single new text prompt, used as pseudo label to re-classify inputs in a transductive manner. Additionally, we pioneer the standardization of TTA benchmarks (e.g., TENT) in the realm of VLMs. Our findings demonstrate that, without requiring additional transformations nor new trainable modules, CLIPArTT enhances performance dynamically across non-corrupted datasets such as CIFAR-10, corrupted datasets like CIFAR-10-C and CIFAR-10.1, alongside synthetic datasets such as VisDA-C. This research underscores the potential for improving VLMs' adaptability through novel test-time strategies, offering insights for robust performance across varied datasets and environments. The code can be found at: https://github.com/dosowiechi/CLIPArTT.git

URLs: https://github.com/dosowiechi/CLIPArTT.git

cross Quantum AI for Alzheimer's disease early screening

Authors: Giacomo Cappiello, Filippo Caruso

Abstract: Quantum machine learning is a new research field combining quantum information science and machine learning. Quantum computing technologies seem to be particularly well suited to solving problems in the health sector in an efficient way, because they may deal with large datasets more efficiently than classical AI. Alzheimer's disease is a neurodegenerative brain disorder that mostly affects elderly people, causing important cognitive impairments. It is the most common cause of dementia and it has an effect on memory, thought, learning abilities and movement control. This type of disease has no cure, consequently an early diagnosis is fundamental for reducing its impact. The analysis of handwriting can be effective for diagnosing, as many researches have conjectured. The DARWIN (Diagnosis AlzheimeR WIth haNdwriting) dataset contains handwriting samples from people affected by Alzheimer's disease and a group of healthy people. Here we apply quantum AI to this use-case. In particular, we use this dataset to test kernel methods for classification task and compare their performances with the ones obtained via quantum machine learning methods. We find that quantum and classical algorithms achieve similar performances and in some cases quantum methods perform even better. Our results pave the way for future new quantum machine learning applications in early-screening diagnostics in the healthcare domain.

cross Quantum-Classical Separations in Shallow-Circuit-Based Learning with and without Noises

Authors: Zhihan Zhang, Weiyuan Gong, Weikang Li, Dong-Ling Deng

Abstract: We study quantum-classical separations between classical and quantum supervised learning models based on constant depth (i.e., shallow) circuits, in scenarios with and without noises. We construct a classification problem defined by a noiseless shallow quantum circuit and rigorously prove that any classical neural network with bounded connectivity requires logarithmic depth to output correctly with a larger-than-exponentially-small probability. This unconditional near-optimal quantum-classical separation originates from the quantum nonlocality property that distinguishes quantum circuits from their classical counterparts. We further derive the noise thresholds for demonstrating such a separation on near-term quantum devices under the depolarization noise model. We prove that this separation will persist if the noise strength is upper bounded by an inverse polynomial with respect to the system size, and vanish if the noise strength is greater than an inverse polylogarithmic function. In addition, for quantum devices with constant noise strength, we prove that no super-polynomial classical-quantum separation exists for any classification task defined by shallow Clifford circuits, independent of the structures of the circuits that specify the learning models.

cross A Review of Barren Plateaus in Variational Quantum Computing

Authors: Martin Larocca, Supanut Thanasilp, Samson Wang, Kunal Sharma, Jacob Biamonte, Patrick J. Coles, Lukasz Cincio, Jarrod R. McClean, Zo\"e Holmes, M. Cerezo

Abstract: Variational quantum computing offers a flexible computational paradigm with applications in diverse areas. However, a key obstacle to realizing their potential is the Barren Plateau (BP) phenomenon. When a model exhibits a BP, its parameter optimization landscape becomes exponentially flat and featureless as the problem size increases. Importantly, all the moving pieces of an algorithm -- choices of ansatz, initial state, observable, loss function and hardware noise -- can lead to BPs when ill-suited. Due to the significant impact of BPs on trainability, researchers have dedicated considerable effort to develop theoretical and heuristic methods to understand and mitigate their effects. As a result, the study of BPs has become a thriving area of research, influencing and cross-fertilizing other fields such as quantum optimal control, tensor networks, and learning theory. This article provides a comprehensive review of the current understanding of the BP phenomenon.

cross Rigged Dynamic Mode Decomposition: Data-Driven Generalized Eigenfunction Decompositions for Koopman Operators

Authors: Matthew J. Colbrook, Catherine Drysdale, Andrew Horning

Abstract: We introduce the Rigged Dynamic Mode Decomposition (Rigged DMD) algorithm, which computes generalized eigenfunction decompositions of Koopman operators. By considering the evolution of observables, Koopman operators transform complex nonlinear dynamics into a linear framework suitable for spectral analysis. While powerful, traditional Dynamic Mode Decomposition (DMD) techniques often struggle with continuous spectra. Rigged DMD addresses these challenges with a data-driven methodology that approximates the Koopman operator's resolvent and its generalized eigenfunctions using snapshot data from the system's evolution. At its core, Rigged DMD builds wave-packet approximations for generalized Koopman eigenfunctions and modes by integrating Measure-Preserving Extended Dynamic Mode Decomposition with high-order kernels for smoothing. This provides a robust decomposition encompassing both discrete and continuous spectral elements. We derive explicit high-order convergence theorems for generalized eigenfunctions and spectral measures. Additionally, we propose a novel framework for constructing rigged Hilbert spaces using time-delay embedding, significantly extending the algorithm's applicability. We provide examples, including systems with a Lebesgue spectrum, integrable Hamiltonian systems, the Lorenz system, and a high-Reynolds number lid-driven flow in a two-dimensional square cavity, demonstrating Rigged DMD's convergence, efficiency, and versatility. This work paves the way for future research and applications of decompositions with continuous spectra.

cross SCAR: Scheduling Multi-Model AI Workloads on Heterogeneous Multi-Chiplet Module Accelerators

Authors: Mohanad Odema, Luke Chen, Hyoukjun Kwon, Mohammad Abdullah Al Faruque

Abstract: Emerging multi-model workloads with heavy models like recent large language models significantly increased the compute and memory demands on hardware. To address such increasing demands, designing a scalable hardware architecture became a key problem. Among recent solutions, the 2.5D silicon interposer multi-chip module (MCM)-based AI accelerator has been actively explored as a promising scalable solution due to their significant benefits in the low engineering cost and composability. However, previous MCM accelerators are based on homogeneous architectures with fixed dataflow, which encounter major challenges from highly heterogeneous multi-model workloads due to their limited workload adaptivity. Therefore, in this work, we explore the opportunity in the heterogeneous dataflow MCM AI accelerators. We identify the scheduling of multi-model workload on heterogeneous dataflow MCM AI accelerator is an important and challenging problem due to its significance and scale, which reaches O(10^18) scale even for a single model case on 6x6 chiplets. We develop a set of heuristics to navigate the huge scheduling space and codify them into a scheduler with advanced techniques such as inter-chiplet pipelining. Our evaluation on ten multi-model workload scenarios for datacenter multitenancy and AR/VR use-cases has shown the efficacy of our approach, achieving on average 35.3% and 31.4% less energy-delay product (EDP) for the respective applications settings compared to homogeneous baselines.

cross Sifting out communities in large sparse networks

Authors: Sharlee Climer, Kenneth Smith Jr, Wei Yang, Lisa de las Fuentes, Victor G. D\'avila-Rom\'an, C. Charles Gu

Abstract: Research data sets are growing to unprecedented sizes and network modeling is commonly used to extract complex relationships in diverse domains, such as genetic interactions involved in disease, logistics, and social communities. As the number of nodes increases in a network, an increasing sparsity of edges is a practical limitation due to memory restrictions. Moreover, many of these sparse networks exhibit very large numbers of nodes with no adjacent edges, as well as disjoint components of nodes with no edges connecting them. A prevalent aim in network modeling is the identification of clusters, or communities, of nodes that are highly interrelated. Several definitions of strong community structure have been introduced to facilitate this task, each with inherent assumptions and biases. We introduce an intuitive objective function for quantifying the quality of clustering results in large sparse networks. We utilize a two-step method for identifying communities which is especially well-suited for this domain as the first step efficiently divides the network into the disjoint components, while the second step optimizes clustering of the produced components based on the new objective. Using simulated networks, optimization based on the new objective function consistently yields significantly higher accuracy than those based on the modularity function, with the widest gaps appearing for the noisiest networks. Additionally, applications to benchmark problems illustrate the intuitive correctness of our approach. Finally, the practicality of our approach is demonstrated in real-world data in which we identify complex genetic interactions in large-scale networks comprised of tens of thousands of nodes. Based on these three different types of trials, our results clearly demonstrate the usefulness of our two-step procedure and the accuracy of our simple objective.

cross HLSFactory: A Framework Empowering High-Level Synthesis Datasets for Machine Learning and Beyond

Authors: Stefan Abi-Karam, Rishov Sarkar, Allison Seigler, Sean Lowe, Zhigang Wei, Hanqiu Chen, Nanditha Rao, Lizy John, Aman Arora, Cong Hao

Abstract: Machine learning (ML) techniques have been applied to high-level synthesis (HLS) flows for quality-of-result (QoR) prediction and design space exploration (DSE). Nevertheless, the scarcity of accessible high-quality HLS datasets and the complexity of building such datasets present challenges. Existing datasets have limitations in terms of benchmark coverage, design space enumeration, vendor extensibility, or lack of reproducible and extensible software for dataset construction. Many works also lack user-friendly ways to add more designs, limiting wider adoption of such datasets. In response to these challenges, we introduce HLSFactory, a comprehensive framework designed to facilitate the curation and generation of high-quality HLS design datasets. HLSFactory has three main stages: 1) a design space expansion stage to elaborate single HLS designs into large design spaces using various optimization directives across multiple vendor tools, 2) a design synthesis stage to execute HLS and FPGA tool flows concurrently across designs, and 3) a data aggregation stage for extracting standardized data into packaged datasets for ML usage. This tripartite architecture ensures broad design space coverage via design space expansion and supports multiple vendor tools. Users can contribute to each stage with their own HLS designs and synthesis results and extend the framework itself with custom frontends and tool flows. We also include an initial set of built-in designs from common HLS benchmarks curated open-source HLS designs. We showcase the versatility and multi-functionality of our framework through six case studies: I) Design space sampling; II) Fine-grained parallelism backend speedup; III) Targeting Intel's HLS flow; IV) Adding new auxiliary designs; V) Integrating published HLS data; VI) HLS tool version regression benchmarking. Code at https://github.com/sharc-lab/HLSFactory.

URLs: https://github.com/sharc-lab/HLSFactory.

cross Quickest Change Detection with Confusing Change

Authors: Yu-Zhen Janice Chen, Jinhang Zuo, Venugopal V. Veeravalli, Don Towsley

Abstract: In the problem of quickest change detection (QCD), a change occurs at some unknown time in the distribution of a sequence of independent observations. This work studies a QCD problem where the change is either a bad change, which we aim to detect, or a confusing change, which is not of our interest. Our objective is to detect a bad change as quickly as possible while avoiding raising a false alarm for pre-change or a confusing change. We identify a specific set of pre-change, bad change, and confusing change distributions that pose challenges beyond the capabilities of standard Cumulative Sum (CuSum) procedures. Proposing novel CuSum-based detection procedures, S-CuSum and J-CuSum, leveraging two CuSum statistics, we offer solutions applicable across all kinds of pre-change, bad change, and confusing change distributions. For both S-CuSum and J-CuSum, we provide analytical performance guarantees and validate them by numerical results. Furthermore, both procedures are computationally efficient as they only require simple recursive updates.

cross Gameplay Filters: Safe Robot Walking through Adversarial Imagination

Authors: Duy P. Nguyen, Kai-Chieh Hsu, Wenhao Yu, Jie Tan, Jaime F. Fisac

Abstract: Ensuring the safe operation of legged robots in uncertain, novel environments is crucial to their widespread adoption. Despite recent advances in safety filters that can keep arbitrary task-driven policies from incurring safety failures, existing solutions for legged robot locomotion still rely on simplified dynamics and may fail when the robot is perturbed away from predefined stable gaits. This paper presents a general approach that leverages offline game-theoretic reinforcement learning to synthesize a highly robust safety filter for high-order nonlinear dynamics. This gameplay filter then maintains runtime safety by continually simulating adversarial futures and precluding task-driven actions that would cause it to lose future games (and thereby violate safety). Validated on a 36-dimensional quadruped robot locomotion task, the gameplay safety filter exhibits inherent robustness to the sim-to-real gap without manual tuning or heuristic designs. Physical experiments demonstrate the effectiveness of the gameplay safety filter under perturbations, such as tugging and unmodeled irregular terrains, while simulation studies shed light on how to trade off computation and conservativeness without compromising safety.

cross Learning to Boost the Performance of Stable Nonlinear Systems

Authors: Luca Furieri, Clara Luc\'ia Galimberti, Giancarlo Ferrari-Trecate

Abstract: The growing scale and complexity of safety-critical control systems underscore the need to evolve current control architectures aiming for the unparalleled performances achievable through state-of-the-art optimization and machine learning algorithms. However, maintaining closed-loop stability while boosting the performance of nonlinear control systems using data-driven and deep-learning approaches stands as an important unsolved challenge. In this paper, we tackle the performance-boosting problem with closed-loop stability guarantees. Specifically, we establish a synergy between the Internal Model Control (IMC) principle for nonlinear systems and state-of-the-art unconstrained optimization approaches for learning stable dynamics. Our methods enable learning over arbitrarily deep neural network classes of performance-boosting controllers for stable nonlinear systems; crucially, we guarantee Lp closed-loop stability even if optimization is halted prematurely, and even when the ground-truth dynamics are unknown, with vanishing conservatism in the class of stabilizing policies as the model uncertainty is reduced to zero. We discuss the implementation details of the proposed control schemes, including distributed ones, along with the corresponding optimization procedures, demonstrating the potential of freely shaping the cost functions through several numerical experiments.

cross Beyond Human Vision: The Role of Large Vision Language Models in Microscope Image Analysis

Authors: Prateek Verma, Minh-Hao Van, Xintao Wu

Abstract: Vision language models (VLMs) have recently emerged and gained the spotlight for their ability to comprehend the dual modality of image and textual data. VLMs such as LLaVA, ChatGPT-4, and Gemini have recently shown impressive performance on tasks such as natural image captioning, visual question answering (VQA), and spatial reasoning. Additionally, a universal segmentation model by Meta AI, Segment Anything Model (SAM) shows unprecedented performance at isolating objects from unforeseen images. Since medical experts, biologists, and materials scientists routinely examine microscopy or medical images in conjunction with textual information in the form of captions, literature, or reports, and draw conclusions of great importance and merit, it is indubitably essential to test the performance of VLMs and foundation models such as SAM, on these images. In this study, we charge ChatGPT, LLaVA, Gemini, and SAM with classification, segmentation, counting, and VQA tasks on a variety of microscopy images. We observe that ChatGPT and Gemini are impressively able to comprehend the visual features in microscopy images, while SAM is quite capable at isolating artefacts in a general sense. However, the performance is not close to that of a domain expert - the models are readily encumbered by the introduction of impurities, defects, artefact overlaps and diversity present in the images.

cross LOTUS: Improving Transformer Efficiency with Sparsity Pruning and Data Lottery Tickets

Authors: Ojasw Upadhyay

Abstract: Vision transformers have revolutionized computer vision, but their computational demands present challenges for training and deployment. This paper introduces LOTUS (LOttery Transformers with Ultra Sparsity), a novel method that leverages data lottery ticket selection and sparsity pruning to accelerate vision transformer training while maintaining accuracy. Our approach focuses on identifying and utilizing the most informative data subsets and eliminating redundant model parameters to optimize the training process. Through extensive experiments, we demonstrate the effectiveness of LOTUS in achieving rapid convergence and high accuracy with significantly reduced computational requirements. This work highlights the potential of combining data selection and sparsity techniques for efficient vision transformer training, opening doors for further research and development in this area.

cross Transformer-Based Self-Supervised Learning for Histopathological Classification of Ischemic Stroke Clot Origin

Authors: K. Yeh, M. S. Jabal, V. Gupta, D. F. Kallmes, W. Brinjikji, B. S. Erdal

Abstract: Background and Purpose: Identifying the thromboembolism source in ischemic stroke is crucial for treatment and secondary prevention yet is often undetermined. This study describes a self-supervised deep learning approach in digital pathology of emboli for classifying ischemic stroke clot origin from histopathological images. Methods: The dataset included whole slide images (WSI) from the STRIP AI Kaggle challenge, consisting of retrieved clots from ischemic stroke patients following mechanical thrombectomy. Transformer-based deep learning models were developed using transfer learning and self-supervised pretraining for classifying WSI. Customizations included an attention pooling layer, weighted loss function, and threshold optimization. Various model architectures were tested and compared, and model performances were primarily evaluated using weighted logarithmic loss. Results: The model achieved a logloss score of 0.662 in cross-validation and 0.659 on the test set. Different model backbones were compared, with the swin_large_patch4_window12_384 showed higher performance. Thresholding techniques for clot origin classification were employed to balance false positives and negatives. Conclusion: The study demonstrates the extent of efficacy of transformer-based deep learning models in identifying ischemic stroke clot origins from histopathological images and emphasizes the need for refined modeling techniques specifically adapted to thrombi WSI. Further research is needed to improve model performance, interpretability, validate its effectiveness. Future enhancement could include integrating larger patient cohorts, advanced preprocessing strategies, and exploring ensemble multimodal methods for enhanced diagnostic accuracy.

cross Accelerated Fully First-Order Methods for Bilevel and Minimax Optimization

Authors: Chris Junchi Li

Abstract: This paper presents a new algorithm member for accelerating first-order methods for bilevel optimization, namely the \emph{(Perturbed) Restarted Accelerated Fully First-order methods for Bilevel Approximation}, abbreviated as \texttt{(P)RAF${}^2$BA}. The algorithm leverages \emph{fully} first-order oracles and seeks approximate stationary points in nonconvex-strongly-convex bilevel optimization, enhancing oracle complexity for efficient optimization. Theoretical guarantees for finding approximate first-order stationary points and second-order stationary points at the state-of-the-art query complexities are established, showcasing their effectiveness in solving complex optimization tasks. Empirical studies for real-world problems are provided to further validate the outperformance of our proposed algorithms. The significance of \texttt{(P)RAF${}^2$BA} in optimizing nonconvex-strongly-convex bilevel optimization problems is underscored by its state-of-the-art convergence rates and computational efficiency.

cross EchoScene: Indoor Scene Generation via Information Echo over Scene Graph Diffusion

Authors: Guangyao Zhai, Evin P{\i}nar \"Ornek, Dave Zhenyu Chen, Ruotong Liao, Yan Di, Nassir Navab, Federico Tombari, Benjamin Busam

Abstract: We present EchoScene, an interactive and controllable generative model that generates 3D indoor scenes on scene graphs. EchoScene leverages a dual-branch diffusion model that dynamically adapts to scene graphs. Existing methods struggle to handle scene graphs due to varying numbers of nodes, multiple edge combinations, and manipulator-induced node-edge operations. EchoScene overcomes this by associating each node with a denoising process and enables collaborative information exchange, enhancing controllable and consistent generation aware of global constraints. This is achieved through an information echo scheme in both shape and layout branches. At every denoising step, all processes share their denoising data with an information exchange unit that combines these updates using graph convolution. The scheme ensures that the denoising processes are influenced by a holistic understanding of the scene graph, facilitating the generation of globally coherent scenes. The resulting scenes can be manipulated during inference by editing the input scene graph and sampling the noise in the diffusion model. Extensive experiments validate our approach, which maintains scene controllability and surpasses previous methods in generation fidelity. Moreover, the generated scenes are of high quality and thus directly compatible with off-the-shelf texture generation. Code and trained models are open-sourced.

cross Benchmarking Representations for Speech, Music, and Acoustic Events

Authors: Moreno La Quatra, Alkis Koudounas, Lorenzo Vaiani, Elena Baralis, Luca Cagliero, Paolo Garza, Sabato Marco Siniscalchi

Abstract: Limited diversity in standardized benchmarks for evaluating audio representation learning (ARL) methods may hinder systematic comparison of current methods' capabilities. We present ARCH, a comprehensive benchmark for evaluating ARL methods on diverse audio classification domains, covering acoustic events, music, and speech. ARCH comprises 12 datasets, that allow us to thoroughly assess pre-trained SSL models of different sizes. ARCH streamlines benchmarking of ARL techniques through its unified access to a wide range of domains and its ability to readily incorporate new datasets and models. To address the current lack of open-source, pre-trained models for non-speech audio, we also release new pre-trained models that demonstrate strong performance on non-speech datasets. We argue that the presented wide-ranging evaluation provides valuable insights into state-of-the-art ARL methods, and is useful to pinpoint promising research directions.

cross Deriving Lehmer and H\"older means as maximum weighted likelihood estimates for the multivariate exponential family

Authors: Djemel Ziou, Issam Fakir

Abstract: The links between the mean families of Lehmer and H\"older and the weighted maximum likelihood estimator have recently been established in the case of a regular univariate exponential family. In this article, we will extend the outcomes obtained to the multivariate case. This extension provides a probabilistic interpretation of these families of means and could therefore broaden their uses in various applications.

cross CACTUS: Chemistry Agent Connecting Tool-Usage to Science

Authors: Andrew D. McNaughton, Gautham Ramalaxmi, Agustin Kruel, Carter R. Knutson, Rohith A. Varikoti, Neeraj Kumar

Abstract: Large language models (LLMs) have shown remarkable potential in various domains, but they often lack the ability to access and reason over domain-specific knowledge and tools. In this paper, we introduced CACTUS (Chemistry Agent Connecting Tool-Usage to Science), an LLM-based agent that integrates cheminformatics tools to enable advanced reasoning and problem-solving in chemistry and molecular discovery. We evaluate the performance of CACTUS using a diverse set of open-source LLMs, including Gemma-7b, Falcon-7b, MPT-7b, Llama2-7b, and Mistral-7b, on a benchmark of thousands of chemistry questions. Our results demonstrate that CACTUS significantly outperforms baseline LLMs, with the Gemma-7b and Mistral-7b models achieving the highest accuracy regardless of the prompting strategy used. Moreover, we explore the impact of domain-specific prompting and hardware configurations on model performance, highlighting the importance of prompt engineering and the potential for deploying smaller models on consumer-grade hardware without significant loss in accuracy. By combining the cognitive capabilities of open-source LLMs with domain-specific tools, CACTUS can assist researchers in tasks such as molecular property prediction, similarity searching, and drug-likeness assessment. Furthermore, CACTUS represents a significant milestone in the field of cheminformatics, offering an adaptable tool for researchers engaged in chemistry and molecular discovery. By integrating the strengths of open-source LLMs with domain-specific tools, CACTUS has the potential to accelerate scientific advancement and unlock new frontiers in the exploration of novel, effective, and safe therapeutic candidates, catalysts, and materials. Moreover, CACTUS's ability to integrate with automated experimentation platforms and make data-driven decisions in real time opens up new possibilities for autonomous discovery.

cross Context-Aware Clustering using Large Language Models

Authors: Sindhu Tipirneni, Ravinarayana Adkathimar, Nurendra Choudhary, Gaurush Hiranandani, Rana Ali Amjad, Vassilis N. Ioannidis, Changhe Yuan, Chandan K. Reddy

Abstract: Despite the remarkable success of Large Language Models (LLMs) in text understanding and generation, their potential for text clustering tasks remains underexplored. We observed that powerful closed-source LLMs provide good quality clusterings of entity sets but are not scalable due to the massive compute power required and the associated costs. Thus, we propose CACTUS (Context-Aware ClusTering with aUgmented triplet losS), a systematic approach that leverages open-source LLMs for efficient and effective supervised clustering of entity subsets, particularly focusing on text-based entities. Existing text clustering methods fail to effectively capture the context provided by the entity subset. Moreover, though there are several language modeling based approaches for clustering, very few are designed for the task of supervised clustering. This paper introduces a novel approach towards clustering entity subsets using LLMs by capturing context via a scalable inter-entity attention mechanism. We propose a novel augmented triplet loss function tailored for supervised clustering, which addresses the inherent challenges of directly applying the triplet loss to this problem. Furthermore, we introduce a self-supervised clustering task based on text augmentation techniques to improve the generalization of our model. For evaluation, we collect ground truth clusterings from a closed-source LLM and transfer this knowledge to an open-source LLM under the supervised clustering framework, allowing a faster and cheaper open-source model to perform the same task. Experiments on various e-commerce query and product clustering datasets demonstrate that our proposed approach significantly outperforms existing unsupervised and supervised baselines under various external clustering evaluation metrics.

cross Estimate the building height at a 10-meter resolution based on Sentinel data

Authors: Xin Yan

Abstract: Building height is an important indicator for scientific research and practical application. However, building height products with a high spatial resolution (10m) are still very scarce. To meet the needs of high-resolution building height estimation models, this study established a set of spatial-spectral-temporal feature databases, combining SAR data provided by Sentinel-1, optical data provided by Sentinel-2, and shape data provided by building footprints. The statistical indicators on the time scale are extracted to form a rich database of 160 features. This study combined with permutation feature importance, Shapley Additive Explanations, and Random Forest variable importance, and the final stable features are obtained through an expert scoring system. This study took 12 large, medium, and small cities in the United States as the training data. It used moving windows to aggregate the pixels to solve the impact of SAR image displacement and building shadows. This study built a building height model based on a random forest model and compared three model ensemble methods of bagging, boosting, and stacking. To evaluate the accuracy of the prediction results, this study collected Lidar data in the test area, and the evaluation results showed that its R-Square reached 0.78, which can prove that the building height can be obtained effectively. The fast production of high-resolution building height data can support large-scale scientific research and application in many fields.

cross Spider: A Unified Framework for Context-dependent Concept Understanding

Authors: Xiaoqi Zhao, Youwei Pang, Wei Ji, Baicheng Sheng, Jiaming Zuo, Lihe Zhang, Huchuan Lu

Abstract: Different from the context-independent (CI) concepts such as human, car, and airplane, context-dependent (CD) concepts require higher visual understanding ability, such as camouflaged object and medical lesion. Despite the rapid advance of many CD understanding tasks in respective branches, the isolated evolution leads to their limited cross-domain generalisation and repetitive technique innovation. Since there is a strong coupling relationship between foreground and background context in CD tasks, existing methods require to train separate models in their focused domains. This restricts their real-world CD concept understanding towards artificial general intelligence (AGI). We propose a unified model with a single set of parameters, Spider, which only needs to be trained once. With the help of the proposed concept filter driven by the image-mask group prompt, Spider is able to understand and distinguish diverse strong context-dependent concepts to accurately capture the Prompter's intention. Without bells and whistles, Spider significantly outperforms the state-of-the-art specialized models in 8 different context-dependent segmentation tasks, including 4 natural scenes (salient, camouflaged, and transparent objects and shadow) and 4 medical lesions (COVID-19, polyp, breast, and skin lesion with color colonoscopy, CT, ultrasound, and dermoscopy modalities). Besides, Spider shows obvious advantages in continuous learning. It can easily complete the training of new tasks by fine-tuning parameters less than 1\% and bring a tolerable performance degradation of less than 5\% for all old tasks. The source code will be publicly available at \href{https://github.com/Xiaoqi-Zhao-DLUT/Spider-UniCDSeg}{Spider-UniCDSeg}.

URLs: https://github.com/Xiaoqi-Zhao-DLUT/Spider-UniCDSeg

cross Deep Learning Models in Speech Recognition: Measuring GPU Energy Consumption, Impact of Noise and Model Quantization for Edge Deployment

Authors: Aditya Chakravarty

Abstract: Recent transformer-based ASR models have achieved word-error rates (WER) below 4%, surpassing human annotator accuracy, yet they demand extensive server resources, contributing to significant carbon footprints. The traditional server-based architecture of ASR also presents privacy concerns, alongside reliability and latency issues due to network dependencies. In contrast, on-device (edge) ASR enhances privacy, boosts performance, and promotes sustainability by effectively balancing energy use and accuracy for specific applications. This study examines the effects of quantization, memory demands, and energy consumption on the performance of various ASR model inference on the NVIDIA Jetson Orin Nano. By analyzing WER and transcription speed across models using FP32, FP16, and INT8 quantization on clean and noisy datasets, we highlight the crucial trade-offs between accuracy, speeds, quantization, energy efficiency, and memory needs. We found that changing precision from fp32 to fp16 halves the energy consumption for audio transcription across different models, with minimal performance degradation. A larger model size and number of parameters neither guarantees better resilience to noise, nor predicts the energy consumption for a given transcription load. These, along with several other findings offer novel insights for optimizing ASR systems within energy- and memory-limited environments, crucial for the development of efficient on-device ASR solutions. The code and input data needed to reproduce the results in this article are open sourced are available on [https://github.com/zzadiues3338/ASR-energy-jetson].

URLs: https://github.com/zzadiues3338/ASR-energy-jetson].

cross Network reconstruction via the minimum description length principle

Authors: Tiago P. Peixoto

Abstract: A fundamental problem associated with the task of network reconstruction from dynamical or behavioral data consists in determining the most appropriate model complexity in a manner that prevents overfitting, and produces an inferred network with a statistically justifiable number of edges. The status quo in this context is based on $L_{1}$ regularization combined with cross-validation. As we demonstrate, besides its high computational cost, this commonplace approach unnecessarily ties the promotion of sparsity with weight "shrinkage". This combination forces a trade-off between the bias introduced by shrinkage and the network sparsity, which often results in substantial overfitting even after cross-validation. In this work, we propose an alternative nonparametric regularization scheme based on hierarchical Bayesian inference and weight quantization, which does not rely on weight shrinkage to promote sparsity. Our approach follows the minimum description length (MDL) principle, and uncovers the weight distribution that allows for the most compression of the data, thus avoiding overfitting without requiring cross-validation. The latter property renders our approach substantially faster to employ, as it requires a single fit to the complete data. As a result, we have a principled and efficient inference scheme that can be used with a large variety of generative models, without requiring the number of edges to be known in advance. We also demonstrate that our scheme yields systematically increased accuracy in the reconstruction of both artificial and empirical networks. We highlight the use of our method with the reconstruction of interaction networks between microbial communities from large-scale abundance samples involving in the order of $10^{4}$ to $10^{5}$ species, and demonstrate how the inferred model can be used to predict the outcome of interventions in the system.

cross MVMoE: Multi-Task Vehicle Routing Solver with Mixture-of-Experts

Authors: Jianan Zhou, Zhiguang Cao, Yaoxin Wu, Wen Song, Yining Ma, Jie Zhang, Chi Xu

Abstract: Learning to solve vehicle routing problems (VRPs) has garnered much attention. However, most neural solvers are only structured and trained independently on a specific problem, making them less generic and practical. In this paper, we aim to develop a unified neural solver that can cope with a range of VRP variants simultaneously. Specifically, we propose a multi-task vehicle routing solver with mixture-of-experts (MVMoE), which greatly enhances the model capacity without a proportional increase in computation. We further develop a hierarchical gating mechanism for the MVMoE, delivering a good trade-off between empirical performance and computational complexity. Experimentally, our method significantly promotes the zero-shot generalization performance on 10 unseen VRP variants, and showcases decent results on the few-shot setting and real-world benchmark instances. We further provide extensive studies on the effect of MoE configurations in solving VRPs. Surprisingly, the hierarchical gating can achieve much better out-of-distribution generalization performance. The source code is available at: https://github.com/RoyalSkye/Routing-MVMoE.

URLs: https://github.com/RoyalSkye/Routing-MVMoE.

cross LOQA: Learning with Opponent Q-Learning Awareness

Authors: Milad Aghajohari, Juan Agustin Duque, Tim Cooijmans, Aaron Courville

Abstract: In various real-world scenarios, interactions among agents often resemble the dynamics of general-sum games, where each agent strives to optimize its own utility. Despite the ubiquitous relevance of such settings, decentralized machine learning algorithms have struggled to find equilibria that maximize individual utility while preserving social welfare. In this paper we introduce Learning with Opponent Q-Learning Awareness (LOQA), a novel, decentralized reinforcement learning algorithm tailored to optimizing an agent's individual utility while fostering cooperation among adversaries in partially competitive environments. LOQA assumes the opponent samples actions proportionally to their action-value function Q. Experimental results demonstrate the effectiveness of LOQA at achieving state-of-the-art performance in benchmark scenarios such as the Iterated Prisoner's Dilemma and the Coin Game. LOQA achieves these outcomes with a significantly reduced computational footprint, making it a promising approach for practical multi-agent applications.

cross Continual Learning for Robust Gate Detection under Dynamic Lighting in Autonomous Drone Racing

Authors: Zhongzheng Qiao, Xuan Huy Pham, Savitha Ramasamy, Xudong Jiang, Erdal Kayacan, Andriy Sarabakha

Abstract: In autonomous and mobile robotics, a principal challenge is resilient real-time environmental perception, particularly in situations characterized by unknown and dynamic elements, as exemplified in the context of autonomous drone racing. This study introduces a perception technique for detecting drone racing gates under illumination variations, which is common during high-speed drone flights. The proposed technique relies upon a lightweight neural network backbone augmented with capabilities for continual learning. The envisaged approach amalgamates predictions of the gates' positional coordinates, distance, and orientation, encapsulating them into a cohesive pose tuple. A comprehensive number of tests serve to underscore the efficacy of this approach in confronting diverse and challenging scenarios, specifically those involving variable lighting conditions. The proposed methodology exhibits notable robustness in the face of illumination variations, thereby substantiating its effectiveness.

cross Fair Recommendations with Limited Sensitive Attributes: A Distributionally Robust Optimization Approach

Authors: Tianhao Shi, Yang Zhang, Jizhi Zhang, Fuli Feng, Xiangnan He

Abstract: As recommender systems are indispensable in various domains such as job searching and e-commerce, providing equitable recommendations to users with different sensitive attributes becomes an imperative requirement. Prior approaches for enhancing fairness in recommender systems presume the availability of all sensitive attributes, which can be difficult to obtain due to privacy concerns or inadequate means of capturing these attributes. In practice, the efficacy of these approaches is limited, pushing us to investigate ways of promoting fairness with limited sensitive attribute information. Toward this goal, it is important to reconstruct missing sensitive attributes. Nevertheless, reconstruction errors are inevitable due to the complexity of real-world sensitive attribute reconstruction problems and legal regulations. Thus, we pursue fair learning methods that are robust to reconstruction errors. To this end, we propose Distributionally Robust Fair Optimization (DRFO), which minimizes the worst-case unfairness over all potential probability distributions of missing sensitive attributes instead of the reconstructed one to account for the impact of the reconstruction errors. We provide theoretical and empirical evidence to demonstrate that our method can effectively ensure fairness in recommender systems when only limited sensitive attributes are accessible.

cross Multivariate trace estimation using quantum state space linear algebra

Authors: Liron Mor Yosef, Shashanka Ubaru, Lior Horesh, Haim Avron

Abstract: In this paper, we present a quantum algorithm for approximating multivariate traces, i.e. the traces of matrix products. Our research is motivated by the extensive utility of multivariate traces in elucidating spectral characteristics of matrices, as well as by recent advancements in leveraging quantum computing for faster numerical linear algebra. Central to our approach is a direct translation of a multivariate trace formula into a quantum circuit, achieved through a sequence of low-level circuit construction operations. To facilitate this translation, we introduce \emph{quantum Matrix States Linear Algebra} (qMSLA), a framework tailored for the efficient generation of state preparation circuits via primitive matrix algebra operations. Our algorithm relies on sets of state preparation circuits for input matrices as its primary inputs and yields two state preparation circuits encoding the multivariate trace as output. These circuits are constructed utilizing qMSLA operations, which enact the aforementioned multivariate trace formula. We emphasize that our algorithm's inputs consist solely of state preparation circuits, eschewing harder to synthesize constructs such as Block Encodings. Furthermore, our approach operates independently of the availability of specialized hardware like QRAM, underscoring its versatility and practicality.

cross Federated Learning with Heterogeneous Data Handling for Robust Vehicular Object Detection

Authors: Ahmad Khalil, Tizian Dege, Pegah Golchin, Rostyslav Olshevskyi, Antonio Fernandez Anta, Tobias Meuser

Abstract: In the pursuit of refining precise perception models for fully autonomous driving, continual online model training becomes essential. Federated Learning (FL) within vehicular networks offers an efficient mechanism for model training while preserving raw sensory data integrity. Yet, FL struggles with non-identically distributed data (e.g., quantity skew), leading to suboptimal convergence rates during model training. In previous work, we introduced FedLA, an innovative Label-Aware aggregation method addressing data heterogeneity in FL for generic scenarios. In this paper, we introduce FedProx+LA, a novel FL method building upon the state-of-the-art FedProx and FedLA to tackle data heterogeneity, which is specifically tailored for vehicular networks. We evaluate the efficacy of FedProx+LA in continuous online object detection model training. Through a comparative analysis against conventional and state-of-the-art methods, our findings reveal the superior convergence rate of FedProx+LA. Notably, if the label distribution is very heterogeneous, our FedProx+LA approach shows substantial improvements in detection performance compared to baseline methods, also outperforming our previous FedLA approach. Moreover, both FedLA and FedProx+LA increase convergence speed by 30% compared to baseline methods.

cross Hypergraph $p$-Laplacian regularization on point clouds for data interpolation

Authors: Kehan Shi, Martin Burger

Abstract: As a generalization of graphs, hypergraphs are widely used to model higher-order relations in data. This paper explores the benefit of the hypergraph structure for the interpolation of point cloud data that contain no explicit structural information. We define the $\varepsilon_n$-ball hypergraph and the $k_n$-nearest neighbor hypergraph on a point cloud and study the $p$-Laplacian regularization on the hypergraphs. We prove the variational consistency between the hypergraph $p$-Laplacian regularization and the continuum $p$-Laplacian regularization in a semisupervised setting when the number of points $n$ goes to infinity while the number of labeled points remains fixed. A key improvement compared to the graph case is that the results rely on weaker assumptions on the upper bound of $\varepsilon_n$ and $k_n$. To solve the convex but non-differentiable large-scale optimization problem, we utilize the stochastic primal-dual hybrid gradient algorithm. Numerical experiments on data interpolation verify that the hypergraph $p$-Laplacian regularization outperforms the graph $p$-Laplacian regularization in preventing the development of spikes at the labeled points.

cross Investigating Self-Supervised Image Denoising with Denaturation

Authors: Hiroki Waida, Kimihiro Yamazaki, Atsushi Tokuhisa, Mutsuyo Wada, Yuichiro Wada

Abstract: Self-supervised learning for image denoising problems in the presence of denaturation for noisy data is a crucial approach in machine learning. However, theoretical understanding of the performance of the approach that uses denatured data is lacking. To provide better understanding of the approach, in this paper, we analyze a self-supervised denoising algorithm that uses denatured data in depth through theoretical analysis and numerical experiments. Through the theoretical analysis, we discuss that the algorithm finds desired solutions to the optimization problem with the population risk, while the guarantee for the empirical risk depends on the hardness of the denoising task in terms of denaturation levels. We also conduct several experiments to investigate the performance of an extended algorithm in practice. The results indicate that the algorithm training with denatured images works, and the empirical performance aligns with the theoretical results. These results suggest several insights for further improvement of self-supervised image denoising that uses denatured data in future directions.

cross Leveraging Procedural Generation for Learning Autonomous Peg-in-Hole Assembly in Space

Authors: Andrej Orsula, Matthieu Geist, Miguel Olivares-Mendez, Carol Martinez

Abstract: The ability to autonomously assemble structures is crucial for the development of future space infrastructure. However, the unpredictable conditions of space pose significant challenges for robotic systems, necessitating the development of advanced learning techniques to enable autonomous assembly. In this study, we present a novel approach for learning autonomous peg-in-hole assembly in the context of space robotics. Our focus is on enhancing the generalization and adaptability of autonomous systems through deep reinforcement learning. By integrating procedural generation and domain randomization, we train agents in a highly parallelized simulation environment across a spectrum of diverse scenarios with the aim of acquiring a robust policy. The proposed approach is evaluated using three distinct reinforcement learning algorithms to investigate the trade-offs among various paradigms. We demonstrate the adaptability of our agents to novel scenarios and assembly sequences while emphasizing the potential of leveraging advanced simulation techniques for robot learning in space. Our findings set the stage for future advancements in intelligent robotic systems capable of supporting ambitious space missions and infrastructure development beyond Earth.

cross Learning-to-solve unit commitment based on few-shot physics-guided spatial-temporal graph convolution network

Authors: Mei Yang, Gao Qiu andJunyong Liu, Kai Liu

Abstract: This letter proposes a few-shot physics-guided spatial temporal graph convolutional network (FPG-STGCN) to fast solve unit commitment (UC). Firstly, STGCN is tailored to parameterize UC. Then, few-shot physics-guided learning scheme is proposed. It exploits few typical UC solutions yielded via commercial optimizer to escape from local minimum, and leverages the augmented Lagrangian method for constraint satisfaction. To further enable both feasibility and continuous relaxation for integers in learning process, straight-through estimator for Tanh-Sign composition is proposed to fully differentiate the mixed integer solution space. Case study on the IEEE benchmark justifies that, our method bests mainstream learning ways on UC feasibility, and surpasses traditional solver on efficiency.

cross Mathematics of Differential Machine Learning in Derivative Pricing and Hedging

Authors: Pedro Duarte Gomes

Abstract: This article introduces the groundbreaking concept of the financial differential machine learning algorithm through a rigorous mathematical framework. Diverging from existing literature on financial machine learning, the work highlights the profound implications of theoretical assumptions within financial models on the construction of machine learning algorithms. This endeavour is particularly timely as the finance landscape witnesses a surge in interest towards data-driven models for the valuation and hedging of derivative products. Notably, the predictive capabilities of neural networks have garnered substantial attention in both academic research and practical financial applications. The approach offers a unified theoretical foundation that facilitates comprehensive comparisons, both at a theoretical level and in experimental outcomes. Importantly, this theoretical grounding lends substantial weight to the experimental results, affirming the differential machine learning method's optimality within the prevailing context. By anchoring the insights in rigorous mathematics, the article bridges the gap between abstract financial concepts and practical algorithmic implementations.

cross TRAMBA: A Hybrid Transformer and Mamba Architecture for Practical Audio and Bone Conduction Speech Super Resolution and Enhancement on Mobile and Wearable Platforms

Authors: Yueyuan Sui, Minghui Zhao, Junxi Xia, Xiaofan Jiang, Stephen Xia

Abstract: We propose TRAMBA, a hybrid transformer and Mamba architecture for acoustic and bone conduction speech enhancement, suitable for mobile and wearable platforms. Bone conduction speech enhancement has been impractical to adopt in mobile and wearable platforms for several reasons: (i) data collection is labor-intensive, resulting in scarcity; (ii) there exists a performance gap between state of-art models with memory footprints of hundreds of MBs and methods better suited for resource-constrained systems. To adapt TRAMBA to vibration-based sensing modalities, we pre-train TRAMBA with audio speech datasets that are widely available. Then, users fine-tune with a small amount of bone conduction data. TRAMBA outperforms state-of-art GANs by up to 7.3% in PESQ and 1.8% in STOI, with an order of magnitude smaller memory footprint and an inference speed up of up to 465 times. We integrate TRAMBA into real systems and show that TRAMBA (i) improves battery life of wearables by up to 160% by requiring less data sampling and transmission; (ii) generates higher quality voice in noisy environments than over-the-air speech; (iii) requires a memory footprint of less than 20.0 MB.

cross Prompt engineering paradigms for medical applications: scoping review and recommendations for better practices

Authors: Jamil Zaghir, Marco Naguib, Mina Bjelogrlic, Aur\'elie N\'ev\'eol, Xavier Tannier, Christian Lovis

Abstract: Prompt engineering is crucial for harnessing the potential of large language models (LLMs), especially in the medical domain where specialized terminology and phrasing is used. However, the efficacy of prompt engineering in the medical domain remains to be explored. In this work, 114 recent studies (2022-2024) applying prompt engineering in medicine, covering prompt learning (PL), prompt tuning (PT), and prompt design (PD) are reviewed. PD is the most prevalent (78 articles). In 12 papers, PD, PL, and PT terms were used interchangeably. ChatGPT is the most commonly used LLM, with seven papers using it for processing sensitive clinical data. Chain-of-Thought emerges as the most common prompt engineering technique. While PL and PT articles typically provide a baseline for evaluating prompt-based approaches, 64% of PD studies lack non-prompt-related baselines. We provide tables and figures summarizing existing work, and reporting recommendations to guide future research contributions.

cross Quantifying Spatial Domain Explanations in BCI using Earth Mover's Distance

Authors: Param Rajpura, Hubert Cecotti, Yogesh Kumar Meena

Abstract: Brain-computer interface (BCI) systems facilitate unique communication between humans and computers, benefiting severely disabled individuals. Despite decades of research, BCIs are not fully integrated into clinical and commercial settings. It's crucial to assess and explain BCI performance, offering clear explanations for potential users to avoid frustration when it doesn't work as expected. This work investigates the efficacy of different deep learning and Riemannian geometry-based classification models in the context of motor imagery (MI) based BCI using electroencephalography (EEG). We then propose an optimal transport theory-based approach using earth mover's distance (EMD) to quantify the comparison of the feature relevance map with the domain knowledge of neuroscience. For this, we utilized explainable AI (XAI) techniques for generating feature relevance in the spatial domain to identify important channels for model outcomes. Three state-of-the-art models are implemented - 1) Riemannian geometry-based classifier, 2) EEGNet, and 3) EEG Conformer, and the observed trend in the model's accuracy across different architectures on the dataset correlates with the proposed feature relevance metrics. The models with diverse architectures perform significantly better when trained on channels relevant to motor imagery than data-driven channel selection. This work focuses attention on the necessity for interpretability and incorporating metrics beyond accuracy, underscores the value of combining domain knowledge and quantifying model interpretations with data-driven approaches in creating reliable and robust Brain-Computer Interfaces (BCIs).

cross Behavior Imitation for Manipulator Control and Grasping with Deep Reinforcement Learning

Authors: Liu Qiyuan

Abstract: The existing Motion Imitation models typically require expert data obtained through MoCap devices, but the vast amount of training data needed is difficult to acquire, necessitating substantial investments of financial resources, manpower, and time. This project combines 3D human pose estimation with reinforcement learning, proposing a novel model that simplifies Motion Imitation into a prediction problem of joint angle values in reinforcement learning. This significantly reduces the reliance on vast amounts of training data, enabling the agent to learn an imitation policy from just a few seconds of video and exhibit strong generalization capabilities. It can quickly apply the learned policy to imitate human arm motions in unfamiliar videos. The model first extracts skeletal motions of human arms from a given video using 3D human pose estimation. These extracted arm motions are then morphologically retargeted onto a robotic manipulator. Subsequently, the retargeted motions are used to generate reference motions. Finally, these reference motions are used to formulate a reinforcement learning problem, enabling the agent to learn a policy for imitating human arm motions. This project excels at imitation tasks and demonstrates robust transferability, accurately imitating human arm motions from other unfamiliar videos. This project provides a lightweight, convenient, efficient, and accurate Motion Imitation model. While simplifying the complex process of Motion Imitation, it achieves notably outstanding performance.

cross Koopman Data-Driven Predictive Control with Robust Stability and Recursive Feasibility Guarantees

Authors: Thomas de Jong, Valentina Breschi, Maarten Schoukens, Mircea Lazar

Abstract: In this paper, we consider the design of data-driven predictive controllers for nonlinear systems from input-output data via linear-in-control input Koopman lifted models. Instead of identifying and simulating a Koopman model to predict future outputs, we design a subspace predictive controller in the Koopman space. This allows us to learn the observables minimizing the multi-step output prediction error of the Koopman subspace predictor, preventing the propagation of prediction errors. To avoid losing feasibility of our predictive control scheme due to prediction errors, we compute a terminal cost and terminal set in the Koopman space and we obtain recursive feasibility guarantees through an interpolated initial state. As a third contribution, we introduce a novel regularization cost yielding input-to-state stability guarantees with respect to the prediction error for the resulting closed-loop system. The performance of the developed Koopman data-driven predictive control methodology is illustrated on a nonlinear benchmark example from the literature.

cross The Effectiveness of LLMs as Annotators: A Comparative Overview and Empirical Analysis of Direct Representation

Authors: Maja Pavlovic, Massimo Poesio

Abstract: Large Language Models (LLMs) have emerged as powerful support tools across various natural language tasks and a range of application domains. Recent studies focus on exploring their capabilities for data annotation. This paper provides a comparative overview of twelve studies investigating the potential of LLMs in labelling data. While the models demonstrate promising cost and time-saving benefits, there exist considerable limitations, such as representativeness, bias, sensitivity to prompt variations and English language preference. Leveraging insights from these studies, our empirical analysis further examines the alignment between human and GPT-generated opinion distributions across four subjective datasets. In contrast to the studies examining representation, our methodology directly obtains the opinion distribution from GPT. Our analysis thereby supports the minority of studies that are considering diverse perspectives when evaluating data annotation tasks and highlights the need for further research in this direction.

cross Non-iterative Optimization of Trajectory and Radio Resource for Aerial Network

Authors: Hyeonsu Lyu, Jonggyu Jang, Harim Lee, Hyun Jong Yang

Abstract: We address a joint trajectory planning, user association, resource allocation, and power control problem to maximize proportional fairness in the aerial IoT network, considering practical end-to-end quality-of-service (QoS) and communication schedules. Though the problem is rather ancient, apart from the fact that the previous approaches have never considered user- and time-specific QoS, we point out a prevalent mistake in coordinate optimization approaches adopted by the majority of the literature. Coordinate optimization approaches, which repetitively optimize radio resources for a fixed trajectory and vice versa, generally converge to local optima when all variables are differentiable. However, these methods often stagnate at a non-stationary point, significantly degrading the network utility in mixed-integer problems such as joint trajectory and radio resource optimization. We detour this problem by converting the formulated problem into the Markov decision process (MDP). Exploiting the beneficial characteristics of the MDP, we design a non-iterative framework that cooperatively optimizes trajectory and radio resources without initial trajectory choice. The proposed framework can incorporate various trajectory planning algorithms such as the genetic algorithm, tree search, and reinforcement learning. Extensive comparisons with diverse baselines verify that the proposed framework significantly outperforms the state-of-the-art method, nearly achieving the global optimum. Our implementation code is available at https://github.com/hslyu/dbspf.

URLs: https://github.com/hslyu/dbspf.

cross LLMSat: A Large Language Model-Based Goal-Oriented Agent for Autonomous Space Exploration

Authors: David Maranto

Abstract: As spacecraft journey further from Earth with more complex missions, systems of greater autonomy and onboard intelligence are called for. Reducing reliance on human-based mission control becomes increasingly critical if we are to increase our rate of solar-system-wide exploration. Recent work has explored AI-based goal-oriented systems to increase the level of autonomy in mission execution. These systems make use of symbolic reasoning managers to make inferences from the state of a spacecraft and a handcrafted knowledge base, enabling autonomous generation of tasks and re-planning. Such systems have proven to be successful in controlled cases, but they are difficult to implement as they require human-crafted ontological models to allow the spacecraft to understand the world. Reinforcement learning has been applied to train robotic agents to pursue a goal. A new architecture for autonomy is called for. This work explores the application of Large Language Models (LLMs) as the high-level control system of a spacecraft. Using a systems engineering approach, this work presents the design and development of an agentic spacecraft controller by leveraging an LLM as a reasoning engine, to evaluate the utility of such an architecture in achieving higher levels of spacecraft autonomy. A series of deep space mission scenarios simulated within the popular game engine Kerbal Space Program (KSP) are used as case studies to evaluate the implementation against the requirements. It is shown the reasoning and planning abilities of present-day LLMs do not scale well as the complexity of a mission increases, but this can be alleviated with adequate prompting frameworks and strategic selection of the agent's level of authority over the host spacecraft. This research evaluates the potential of LLMs in augmenting autonomous decision-making systems for future robotic space applications.

cross Learning Force Control for Legged Manipulation

Authors: Tifanny Portela, Gabriel B. Margolis, Yandong Ji, Pulkit Agrawal

Abstract: Controlling contact forces during interactions is critical for locomotion and manipulation tasks. While sim-to-real reinforcement learning (RL) has succeeded in many contact-rich problems, current RL methods achieve forceful interactions implicitly without explicitly regulating forces. We propose a method for training RL policies for direct force control without requiring access to force sensing. We showcase our method on a whole-body control platform of a quadruped robot with an arm. Such force control enables us to perform gravity compensation and impedance control, unlocking compliant whole-body manipulation. The learned whole-body controller with variable compliance makes it intuitive for humans to teleoperate the robot by only commanding the manipulator, and the robot's body adjusts automatically to achieve the desired position and force. Consequently, a human teleoperator can easily demonstrate a wide variety of loco-manipulation tasks. To the best of our knowledge, we provide the first deployment of learned whole-body force control in legged manipulators, paving the way for more versatile and adaptable legged robots.

cross Random Pareto front surfaces

Authors: Ben Tu, Nikolas Kantas, Robert M. Lee, Behrang Shafei

Abstract: The Pareto front of a set of vectors is the subset which is comprised solely of all of the best trade-off points. By interpolating this subset, we obtain the optimal trade-off surface. In this work, we prove a very useful result which states that all Pareto front surfaces can be explicitly parametrised using polar coordinates. In particular, our polar parametrisation result tells us that we can fully characterise any Pareto front surface using the length function, which is a scalar-valued function that returns the projected length along any positive radial direction. Consequently, by exploiting this representation, we show how it is possible to generalise many useful concepts from linear algebra, probability and statistics, and decision theory to function over the space of Pareto front surfaces. Notably, we focus our attention on the stochastic setting where the Pareto front surface itself is a stochastic process. Among other things, we showcase how it is possible to define and estimate many statistical quantities of interest such as the expectation, covariance and quantile of any Pareto front surface distribution. As a motivating example, we investigate how these statistics can be used within a design of experiments setting, where the goal is to both infer and use the Pareto front surface distribution in order to make effective decisions. Besides this, we also illustrate how these Pareto front ideas can be used within the context of extreme value theory. Finally, as a numerical example, we applied some of our new methodology on a real-world air pollution data set.

cross MiniGPT-3D: Efficiently Aligning 3D Point Clouds with Large Language Models using 2D Priors

Authors: Yuan Tang, Xu Han, Xianzhi Li, Qiao Yu, Yixue Hao, Long Hu, Min Chen

Abstract: Large 2D vision-language models (2D-LLMs) have gained significant attention by bridging Large Language Models (LLMs) with images using a simple projector. Inspired by their success, large 3D point cloud-language models (3D-LLMs) also integrate point clouds into LLMs. However, directly aligning point clouds with LLM requires expensive training costs, typically in hundreds of GPU-hours on A100, which hinders the development of 3D-LLMs. In this paper, we introduce MiniGPT-3D, an efficient and powerful 3D-LLM that achieves multiple SOTA results while training for only 27 hours on one RTX 3090. Specifically, we propose to align 3D point clouds with LLMs using 2D priors from 2D-LLMs, which can leverage the similarity between 2D and 3D visual information. We introduce a novel four-stage training strategy for modality alignment in a cascaded way, and a mixture of query experts module to adaptively aggregate features with high efficiency. Moreover, we utilize parameter-efficient fine-tuning methods LoRA and Norm fine-tuning, resulting in only 47.8M learnable parameters, which is up to 260x fewer than existing methods. Extensive experiments show that MiniGPT-3D achieves SOTA on 3D object classification and captioning tasks, with significantly cheaper training costs. Notably, MiniGPT-3D gains an 8.12 increase on GPT-4 evaluation score for the challenging object captioning task compared to ShapeLLM-13B, while the latter costs 160 total GPU-hours on 8 A800. We are the first to explore the efficient 3D-LLM, offering new insights to the community. Code and weights are available at https://github.com/TangYuan96/MiniGPT-3D.

URLs: https://github.com/TangYuan96/MiniGPT-3D.

cross In-and-Out: Algorithmic Diffusion for Sampling Convex Bodies

Authors: Yunbum Kook, Santosh S. Vempala, Matthew S. Zhang

Abstract: We present a new random walk for uniformly sampling high-dimensional convex bodies. It achieves state-of-the-art runtime complexity with stronger guarantees on the output than previously known, namely in R\'enyi divergence (which implies TV, $\mathcal{W}_2$, KL, $\chi^2$). The proof departs from known approaches for polytime algorithms for the problem -- we utilize a stochastic diffusion perspective to show contraction to the target distribution with the rate of convergence determined by functional isoperimetric constants of the stationary density.

cross Closed-form congestion control via deep symbolic regression

Authors: Jean Martins, Igor Almeida, Ricardo Souza, Silvia Lins

Abstract: As mobile networks embrace the 5G era, the interest in adopting Reinforcement Learning (RL) algorithms to handle challenges in ultra-low-latency and high throughput scenarios increases. Simultaneously, the advent of packetized fronthaul networks imposes demanding requirements that traditional congestion control mechanisms cannot accomplish, highlighting the potential of RL-based congestion control algorithms. Although learning RL policies optimized for satisfying the stringent fronthaul requirements is feasible, the adoption of neural network models in real deployments still poses some challenges regarding real-time inference and interpretability. This paper proposes a methodology to deal with such challenges while maintaining the performance and generalization capabilities provided by a baseline RL policy. The method consists of (1) training a congestion control policy specialized in fronthaul-like networks via reinforcement learning, (2) collecting state-action experiences from the baseline, and (3) performing deep symbolic regression on the collected dataset. The proposed process overcomes the challenges related to inference-time limitations through closed-form expressions that approximate the baseline performance (link utilization, delay, and fairness) and which can be directly implemented in any programming language. Finally, we analyze the inner workings of the closed-form expressions.

cross A Review of Reward Functions for Reinforcement Learning in the context of Autonomous Driving

Authors: Ahmed Abouelazm, Jonas Michel, J. Marius Zoellner

Abstract: Reinforcement learning has emerged as an important approach for autonomous driving. A reward function is used in reinforcement learning to establish the learned skill objectives and guide the agent toward the optimal policy. Since autonomous driving is a complex domain with partly conflicting objectives with varying degrees of priority, developing a suitable reward function represents a fundamental challenge. This paper aims to highlight the gap in such function design by assessing different proposed formulations in the literature and dividing individual objectives into Safety, Comfort, Progress, and Traffic Rules compliance categories. Additionally, the limitations of the reviewed reward functions are discussed, such as objectives aggregation and indifference to driving context. Furthermore, the reward categories are frequently inadequately formulated and lack standardization. This paper concludes by proposing future research that potentially addresses the observed shortcomings in rewards, including a reward validation framework and structured rewards that are context-aware and able to resolve conflicts.

cross Creative Problem Solving in Large Language and Vision Models -- What Would it Take?

Authors: Lakshmi Nair, Evana Gizzi, Jivko Sinapov

Abstract: In this paper, we discuss approaches for integrating Computational Creativity (CC) with research in large language and vision models (LLVMs) to address a key limitation of these models, i.e., creative problem solving. We present preliminary experiments showing how CC principles can be applied to address this limitation through augmented prompting. With this work, we hope to foster discussions of Computational Creativity in the context of ML algorithms for creative problem solving in LLVMs. Our code is at: https://github.com/lnairGT/creative-problem-solving-LLMs

URLs: https://github.com/lnairGT/creative-problem-solving-LLMs

cross UQA: Corpus for Urdu Question Answering

Authors: Samee Arif, Sualeha Farid, Awais Athar, Agha Ali Raza

Abstract: This paper introduces UQA, a novel dataset for question answering and text comprehension in Urdu, a low-resource language with over 70 million native speakers. UQA is generated by translating the Stanford Question Answering Dataset (SQuAD2.0), a large-scale English QA dataset, using a technique called EATS (Enclose to Anchor, Translate, Seek), which preserves the answer spans in the translated context paragraphs. The paper describes the process of selecting and evaluating the best translation model among two candidates: Google Translator and Seamless M4T. The paper also benchmarks several state-of-the-art multilingual QA models on UQA, including mBERT, XLM-RoBERTa, and mT5, and reports promising results. For XLM-RoBERTa-XL, we have an F1 score of 85.99 and 74.56 EM. UQA is a valuable resource for developing and testing multilingual NLP systems for Urdu and for enhancing the cross-lingual transferability of existing models. Further, the paper demonstrates the effectiveness of EATS for creating high-quality datasets for other languages and domains. The UQA dataset and the code are publicly available at www.github.com/sameearif/UQA.

cross Purify Unlearnable Examples via Rate-Constrained Variational Autoencoders

Authors: Yi Yu, Yufei Wang, Song Xia, Wenhan Yang, Shijian Lu, Yap-Peng Tan, Alex C. Kot

Abstract: Unlearnable examples (UEs) seek to maximize testing error by making subtle modifications to training examples that are correctly labeled. Defenses against these poisoning attacks can be categorized based on whether specific interventions are adopted during training. The first approach is training-time defense, such as adversarial training, which can mitigate poisoning effects but is computationally intensive. The other approach is pre-training purification, e.g., image short squeezing, which consists of several simple compressions but often encounters challenges in dealing with various UEs. Our work provides a novel disentanglement mechanism to build an efficient pre-training purification method. Firstly, we uncover rate-constrained variational autoencoders (VAEs), demonstrating a clear tendency to suppress the perturbations in UEs. We subsequently conduct a theoretical analysis for this phenomenon. Building upon these insights, we introduce a disentangle variational autoencoder (D-VAE), capable of disentangling the perturbations with learnable class-wise embeddings. Based on this network, a two-stage purification approach is naturally developed. The first stage focuses on roughly eliminating perturbations, while the second stage produces refined, poison-free results, ensuring effectiveness and robustness across various scenarios. Extensive experiments demonstrate the remarkable performance of our method across CIFAR-10, CIFAR-100, and a 100-class ImageNet-subset. Code is available at https://github.com/yuyi-sd/D-VAE.

URLs: https://github.com/yuyi-sd/D-VAE.

cross Dynamic Local Average Treatment Effects

Authors: Ravi B. Sojitra, Vasilis Syrgkanis

Abstract: We consider Dynamic Treatment Regimes (DTRs) with one sided non-compliance that arise in applications such as digital recommendations and adaptive medical trials. These are settings where decision makers encourage individuals to take treatments over time, but adapt encouragements based on previous encouragements, treatments, states, and outcomes. Importantly, individuals may choose to (not) comply with a treatment recommendation, whenever it is made available to them, based on unobserved confounding factors. We provide non-parametric identification, estimation, and inference for Dynamic Local Average Treatment Effects, which are expected values of multi-period treatment contrasts among appropriately defined complier subpopulations. Under standard assumptions in the Instrumental Variable and DTR literature, we show that one can identify local average effects of contrasts that correspond to offering treatment at any single time step. Under an additional cross-period effect-compliance independence assumption, which is satisfied in Staggered Adoption settings and a generalization of them, which we define as Staggered Compliance settings, we identify local average treatment effects of treating in multiple time periods.

cross NeMo-Aligner: Scalable Toolkit for Efficient Model Alignment

Authors: Gerald Shen, Zhilin Wang, Olivier Delalleau, Jiaqi Zeng, Yi Dong, Daniel Egert, Shengyang Sun, Jimmy Zhang, Sahil Jain, Ali Taghibakhshi, Markel Sanz Ausin, Ashwath Aithal, Oleksii Kuchaiev

Abstract: Aligning Large Language Models (LLMs) with human values and preferences is essential for making them helpful and safe. However, building efficient tools to perform alignment can be challenging, especially for the largest and most competent LLMs which often contain tens or hundreds of billions of parameters. We create NeMo-Aligner, a toolkit for model alignment that can efficiently scale to using hundreds of GPUs for training. NeMo-Aligner comes with highly optimized and scalable implementations for major paradigms of model alignment such as: Reinforcement Learning from Human Feedback (RLHF), Direct Preference Optimization (DPO), SteerLM, and Self-Play Fine-Tuning (SPIN). Additionally, our toolkit supports running most of the alignment techniques in a Parameter Efficient Fine-Tuning (PEFT) setting. NeMo-Aligner is designed for extensibility, allowing support for other alignment techniques with minimal effort. It is open-sourced with Apache 2.0 License and we invite community contributions at https://github.com/NVIDIA/NeMo-Aligner

URLs: https://github.com/NVIDIA/NeMo-Aligner

cross Designing Algorithmic Recommendations to Achieve Human-AI Complementarity

Authors: Bryce McLaughlin, Jann Spiess

Abstract: Algorithms frequently assist, rather than replace, human decision-makers. However, the design and analysis of algorithms often focus on predicting outcomes and do not explicitly model their effect on human decisions. This discrepancy between the design and role of algorithmic assistants becomes of particular concern in light of empirical evidence that suggests that algorithmic assistants again and again fail to improve human decisions. In this article, we formalize the design of recommendation algorithms that assist human decision-makers without making restrictive ex-ante assumptions about how recommendations affect decisions. We formulate an algorithmic-design problem that leverages the potential-outcomes framework from causal inference to model the effect of recommendations on a human decision-maker's binary treatment choice. Within this model, we introduce a monotonicity assumption that leads to an intuitive classification of human responses to the algorithm. Under this monotonicity assumption, we can express the human's response to algorithmic recommendations in terms of their compliance with the algorithm and the decision they would take if the algorithm sends no recommendation. We showcase the utility of our framework using an online experiment that simulates a hiring task. We argue that our approach explains the relative performance of different recommendation algorithms in the experiment, and can help design solutions that realize human-AI complementarity.

cross FeNNol: an Efficient and Flexible Library for Building Force-field-enhanced Neural Network Potentials

Authors: Thomas Pl\'e, Olivier Adjoua, Louis Lagard\`ere, Jean-Philip Piquemal

Abstract: Neural network interatomic potentials (NNPs) have recently proven to be powerful tools to accurately model complex molecular systems while bypassing the high numerical cost of ab-initio molecular dynamics simulations. In recent years, numerous advances in model architectures as well as the development of hybrid models combining machine-learning (ML) with more traditional, physically-motivated, force-field interactions have considerably increased the design space of ML potentials. In this paper, we present FeNNol, a new library for building, training and running force-field-enhanced neural network potentials. It provides a flexible and modular system for building hybrid models, allowing to easily combine state-of-the-art embeddings with ML-parameterized physical interaction terms without the need for explicit programming. Furthermore, FeNNol leverages the automatic differentiation and just-in-time compilation features of the Jax Python library to enable fast evaluation of NNPs, shrinking the performance gap between ML potentials and standard force-fields. This is demonstrated with the popular ANI-2x model reaching simulation speeds nearly on par with the AMOEBA polarizable force-field on commodity GPUs (GPU=Graphics processing unit). We hope that FeNNol will facilitate the development and application of new hybrid NNP architectures for a wide range of molecular simulation problems.

cross Navigating Heterogeneity and Privacy in One-Shot Federated Learning with Diffusion Models

Authors: Matias Mendieta, Guangyu Sun, Chen Chen

Abstract: Federated learning (FL) enables multiple clients to train models collectively while preserving data privacy. However, FL faces challenges in terms of communication cost and data heterogeneity. One-shot federated learning has emerged as a solution by reducing communication rounds, improving efficiency, and providing better security against eavesdropping attacks. Nevertheless, data heterogeneity remains a significant challenge, impacting performance. This work explores the effectiveness of diffusion models in one-shot FL, demonstrating their applicability in addressing data heterogeneity and improving FL performance. Additionally, we investigate the utility of our diffusion model approach, FedDiff, compared to other one-shot FL methods under differential privacy (DP). Furthermore, to improve generated sample quality under DP settings, we propose a pragmatic Fourier Magnitude Filtering (FMF) method, enhancing the effectiveness of generated data for global model training.

cross Analyzing the Role of Semantic Representations in the Era of Large Language Models

Authors: Zhijing Jin, Yuen Chen, Fernando Gonzalez, Jiarui Liu, Jiayi Zhang, Julian Michael, Bernhard Sch\"olkopf, Mona Diab

Abstract: Traditionally, natural language processing (NLP) models often use a rich set of features created by linguistic expertise, such as semantic representations. However, in the era of large language models (LLMs), more and more tasks are turned into generic, end-to-end sequence generation problems. In this paper, we investigate the question: what is the role of semantic representations in the era of LLMs? Specifically, we investigate the effect of Abstract Meaning Representation (AMR) across five diverse NLP tasks. We propose an AMR-driven chain-of-thought prompting method, which we call AMRCoT, and find that it generally hurts performance more than it helps. To investigate what AMR may have to offer on these tasks, we conduct a series of analysis experiments. We find that it is difficult to predict which input examples AMR may help or hurt on, but errors tend to arise with multi-word expressions, named entities, and in the final inference step where the LLM must connect its reasoning over the AMR to its prediction. We recommend focusing on these areas for future work in semantic representations for LLMs. Our code: https://github.com/causalNLP/amr_llm.

URLs: https://github.com/causalNLP/amr_llm.

cross Transformer-Aided Semantic Communications

Authors: Matin Mortaheb, Erciyes Karakaya, Mohammad A. Amir Khojastepour, Sennur Ulukus

Abstract: The transformer structure employed in large language models (LLMs), as a specialized category of deep neural networks (DNNs) featuring attention mechanisms, stands out for their ability to identify and highlight the most relevant aspects of input data. Such a capability is particularly beneficial in addressing a variety of communication challenges, notably in the realm of semantic communication where proper encoding of the relevant data is critical especially in systems with limited bandwidth. In this work, we employ vision transformers specifically for the purpose of compression and compact representation of the input image, with the goal of preserving semantic information throughout the transmission process. Through the use of the attention mechanism inherent in transformers, we create an attention mask. This mask effectively prioritizes critical segments of images for transmission, ensuring that the reconstruction phase focuses on key objects highlighted by the mask. Our methodology significantly improves the quality of semantic communication and optimizes bandwidth usage by encoding different parts of the data in accordance with their semantic information content, thus enhancing overall efficiency. We evaluate the effectiveness of our proposed framework using the TinyImageNet dataset, focusing on both reconstruction quality and accuracy. Our evaluation results demonstrate that our framework successfully preserves semantic information, even when only a fraction of the encoded data is transmitted, according to the intended compression rates.

cross Customizing Text-to-Image Models with a Single Image Pair

Authors: Maxwell Jones, Sheng-Yu Wang, Nupur Kumari, David Bau, Jun-Yan Zhu

Abstract: Art reinterpretation is the practice of creating a variation of a reference work, making a paired artwork that exhibits a distinct artistic style. We ask if such an image pair can be used to customize a generative model to capture the demonstrated stylistic difference. We propose Pair Customization, a new customization method that learns stylistic difference from a single image pair and then applies the acquired style to the generation process. Unlike existing methods that learn to mimic a single concept from a collection of images, our method captures the stylistic difference between paired images. This allows us to apply a stylistic change without overfitting to the specific image content in the examples. To address this new task, we employ a joint optimization method that explicitly separates the style and content into distinct LoRA weight spaces. We optimize these style and content weights to reproduce the style and content images while encouraging their orthogonality. During inference, we modify the diffusion process via a new style guidance based on our learned weights. Both qualitative and quantitative experiments show that our method can effectively learn style while avoiding overfitting to image content, highlighting the potential of modeling such stylistic differences from a single image pair.

cross Multi-Space Alignments Towards Universal LiDAR Segmentation

Authors: Youquan Liu, Lingdong Kong, Xiaoyang Wu, Runnan Chen, Xin Li, Liang Pan, Ziwei Liu, Yuexin Ma

Abstract: A unified and versatile LiDAR segmentation model with strong robustness and generalizability is desirable for safe autonomous driving perception. This work presents M3Net, a one-of-a-kind framework for fulfilling multi-task, multi-dataset, multi-modality LiDAR segmentation in a universal manner using just a single set of parameters. To better exploit data volume and diversity, we first combine large-scale driving datasets acquired by different types of sensors from diverse scenes and then conduct alignments in three spaces, namely data, feature, and label spaces, during the training. As a result, M3Net is capable of taming heterogeneous data for training state-of-the-art LiDAR segmentation models. Extensive experiments on twelve LiDAR segmentation datasets verify our effectiveness. Notably, using a shared set of parameters, M3Net achieves 75.1%, 83.1%, and 72.4% mIoU scores, respectively, on the official benchmarks of SemanticKITTI, nuScenes, and Waymo Open.

replace An extension of McDiarmid's inequality

Authors: Richard Combes

Abstract: We generalize McDiarmid's inequality for functions with bounded differences on a high probability set, using an extension argument. Those functions concentrate around their conditional expectations. We further extend the results to concentration in general metric spaces.

replace MBDP: A Model-based Approach to Achieve both Robustness and Sample Efficiency via Double Dropout Planning

Authors: Wanpeng Zhang, Xi Xiao, Yao Yao, Mingzhe Chen, Dijun Luo

Abstract: Model-based reinforcement learning is a widely accepted solution for solving excessive sample demands. However, the predictions of the dynamics models are often not accurate enough, and the resulting bias may incur catastrophic decisions due to insufficient robustness. Therefore, it is highly desired to investigate how to improve the robustness of model-based RL algorithms while maintaining high sampling efficiency. In this paper, we propose Model-Based Double-dropout Planning (MBDP) to balance robustness and efficiency. MBDP consists of two kinds of dropout mechanisms, where the rollout-dropout aims to improve the robustness with a small cost of sample efficiency, while the model-dropout is designed to compensate for the lost efficiency at a slight expense of robustness. By combining them in a complementary way, MBDP provides a flexible control mechanism to meet different demands of robustness and efficiency by tuning two corresponding dropout ratios. The effectiveness of MBDP is demonstrated both theoretically and experimentally.

replace Quadratic models for understanding catapult dynamics of neural networks

Authors: Libin Zhu, Chaoyue Liu, Adityanarayanan Radhakrishnan, Mikhail Belkin

Abstract: While neural networks can be approximated by linear models as their width increases, certain properties of wide neural networks cannot be captured by linear models. In this work we show that recently proposed Neural Quadratic Models can exhibit the "catapult phase" [Lewkowycz et al. 2020] that arises when training such models with large learning rates. We then empirically show that the behaviour of neural quadratic models parallels that of neural networks in generalization, especially in the catapult phase regime. Our analysis further demonstrates that quadratic models can be an effective tool for analysis of neural networks.

replace Fourier Neural Operator with Learned Deformations for PDEs on General Geometries

Authors: Zongyi Li, Daniel Zhengyu Huang, Burigede Liu, Anima Anandkumar

Abstract: Deep learning surrogate models have shown promise in solving partial differential equations (PDEs). Among them, the Fourier neural operator (FNO) achieves good accuracy, and is significantly faster compared to numerical solvers, on a variety of PDEs, such as fluid flows. However, the FNO uses the Fast Fourier transform (FFT), which is limited to rectangular domains with uniform grids. In this work, we propose a new framework, viz., geo-FNO, to solve PDEs on arbitrary geometries. Geo-FNO learns to deform the input (physical) domain, which may be irregular, into a latent space with a uniform grid. The FNO model with the FFT is applied in the latent space. The resulting geo-FNO model has both the computation efficiency of FFT and the flexibility of handling arbitrary geometries. Our geo-FNO is also flexible in terms of its input formats, viz., point clouds, meshes, and design parameters are all valid inputs. We consider a variety of PDEs such as the Elasticity, Plasticity, Euler's, and Navier-Stokes equations, and both forward modeling and inverse design problems. Geo-FNO is $10^5$ times faster than the standard numerical solvers and twice more accurate compared to direct interpolation on existing ML-based PDE solvers such as the standard FNO.

replace Finite-Time Error Bounds for Greedy-GQ

Authors: Yue Wang, Yi Zhou, Shaofeng Zou

Abstract: Greedy-GQ with linear function approximation, originally proposed in \cite{maei2010toward}, is a value-based off-policy algorithm for optimal control in reinforcement learning, and it has a non-linear two timescale structure with the non-convex objective function. This paper develops its tightest finite-time error bounds. We show that the Greedy-GQ algorithm converges as fast as $\mathcal{O}({1}/{\sqrt{T}})$ under the i.i.d.\ setting and $\mathcal{O}({\log T}/{\sqrt{T}})$ under the Markovian setting. We further design a variant of the vanilla Greedy-GQ algorithm using the nested-loop approach, and show that its sample complexity is $\mathcal{O}({\log(1/\epsilon)\epsilon^{-2}})$, which matches with the one of the vanilla Greedy-GQ. Our finite-time error bounds match with one of the stochastic gradient descent algorithms for general smooth non-convex optimization problems, despite its additonal challenge in the two time-scale updates. Our finite-sample analysis provides theoretical guidance on choosing step-sizes for faster convergence in practice, and suggests the trade-off between the convergence rate and the quality of the obtained policy. Our techniques provide a general approach for finite-sample analysis of non-convex two timescale value-based reinforcement learning algorithms.

replace Time-Varying Propensity Score to Bridge the Gap between the Past and Present

Authors: Rasool Fakoor, Jonas Mueller, Zachary C. Lipton, Pratik Chaudhari, Alexander J. Smola

Abstract: Real-world deployment of machine learning models is challenging because data evolves over time. While no model can work when data evolves in an arbitrary fashion, if there is some pattern to these changes, we might be able to design methods to address it. This paper addresses situations when data evolves gradually. We introduce a time-varying propensity score that can detect gradual shifts in the distribution of data which allows us to selectively sample past data to update the model -- not just similar data from the past like that of a standard propensity score but also data that evolved in a similar fashion in the past. The time-varying propensity score is quite general: we demonstrate different ways of implementing it and evaluate it on a variety of problems ranging from supervised learning (e.g., image classification problems) where data undergoes a sequence of gradual shifts, to reinforcement learning tasks (e.g., robotic manipulation and continuous control) where data shifts as the policy or the task changes.

replace DynamicLight: Two-Stage Dynamic Traffic Signal Timing

Authors: Liang Zhang, Yutong Zhang, Shubin Xie, Jianming Deng, Chen Li

Abstract: Reinforcement learning (RL) is gaining popularity as an effective approach for traffic signal control (TSC) and is increasingly applied in this domain. However, most existing RL methodologies are confined to a single-stage TSC framework, primarily focusing on selecting an appropriate traffic signal phase at fixed action intervals, leading to inflexible and less adaptable phase durations. To address such limitations, we introduce a novel two-stage TSC framework named DynamicLight. This framework initiates with a phase control strategy responsible for determining the optimal traffic phase, followed by a duration control strategy tasked with determining the corresponding phase duration. Experimental results show that DynamicLight outperforms state-of-the-art TSC models and exhibits exceptional model generalization capabilities. Additionally, the robustness and potential for real-world implementation of DynamicLight are further demonstrated and validated through various DynamicLight variants. The code is released at https://github.com/LiangZhang1996/DynamicLight.

URLs: https://github.com/LiangZhang1996/DynamicLight.

replace Disentangled Representation Learning

Authors: Xin Wang, Hong Chen, Si'ao Tang, Zihao Wu, Wenwu Zhu

Abstract: Disentangled Representation Learning (DRL) aims to learn a model capable of identifying and disentangling the underlying factors hidden in the observable data in representation form. The process of separating underlying factors of variation into variables with semantic meaning benefits in learning explainable representations of data, which imitates the meaningful understanding process of humans when observing an object or relation. As a general learning strategy, DRL has demonstrated its power in improving the model explainability, controlability, robustness, as well as generalization capacity in a wide range of scenarios such as computer vision, natural language processing, and data mining. In this article, we comprehensively investigate DRL from various aspects including motivations, definitions, methodologies, evaluations, applications, and model designs. We first present two well-recognized definitions, i.e., Intuitive Definition and Group Theory Definition for disentangled representation learning. We further categorize the methodologies for DRL into four groups from the following perspectives, the model type, representation structure, supervision signal, and independence assumption. We also analyze principles to design different DRL models that may benefit different tasks in practical applications. Finally, we point out challenges in DRL as well as potential research directions deserving future investigations. We believe this work may provide insights for promoting the DRL research in the community.

replace A Watermark for Large Language Models

Authors: John Kirchenbauer, Jonas Geiping, Yuxin Wen, Jonathan Katz, Ian Miers, Tom Goldstein

Abstract: Potential harms of large language models can be mitigated by watermarking model output, i.e., embedding signals into generated text that are invisible to humans but algorithmically detectable from a short span of tokens. We propose a watermarking framework for proprietary language models. The watermark can be embedded with negligible impact on text quality, and can be detected using an efficient open-source algorithm without access to the language model API or parameters. The watermark works by selecting a randomized set of "green" tokens before a word is generated, and then softly promoting use of green tokens during sampling. We propose a statistical test for detecting the watermark with interpretable p-values, and derive an information-theoretic framework for analyzing the sensitivity of the watermark. We test the watermark using a multi-billion parameter model from the Open Pretrained Transformer (OPT) family, and discuss robustness and security.

replace MEGA-DAgger: Imitation Learning with Multiple Imperfect Experts

Authors: Xiatao Sun, Shuo Yang, Mingyan Zhou, Kunpeng Liu, Rahul Mangharam

Abstract: Imitation learning has been widely applied to various autonomous systems thanks to recent development in interactive algorithms that address covariate shift and compounding errors induced by traditional approaches like behavior cloning. However, existing interactive imitation learning methods assume access to one perfect expert. Whereas in reality, it is more likely to have multiple imperfect experts instead. In this paper, we propose MEGA-DAgger, a new DAgger variant that is suitable for interactive learning with multiple imperfect experts. First, unsafe demonstrations are filtered while aggregating the training data, so the imperfect demonstrations have little influence when training the novice policy. Next, experts are evaluated and compared on scenarios-specific metrics to resolve the conflicted labels among experts. Through experiments in autonomous racing scenarios, we demonstrate that policy learned using MEGA-DAgger can outperform both experts and policies learned using the state-of-the-art interactive imitation learning algorithms such as Human-Gated DAgger. The supplementary video can be found at \url{https://youtu.be/wPCht31MHrw}.

URLs: https://youtu.be/wPCht31MHrw

replace DataLight: Offline Data-Driven Traffic Signal Control

Authors: Liang Zhang, Yutong Zhang, Jianming Deng, Chen Li

Abstract: Reinforcement learning (RL) has emerged as a promising solution for addressing traffic signal control (TSC) challenges. While most RL-based TSC systems typically employ an online approach, facilitating frequent active interaction with the environment, learning such strategies in the real world is impractical due to safety and risk concerns. To tackle these challenges, this study introduces an innovative offline data-driven approach, called DataLight. DataLight employs effective state representations and reward function by capturing vehicular speed information within the environment. It then segments roads to capture spatial information and further enhances the spatially segmented state representations with sequential modeling. The experimental results demonstrate the effectiveness of DataLight, showcasing superior performance compared to both state-of-the-art online and offline TSC methods. Additionally, DataLight exhibits robust learning capabilities concerning real-world deployment issues. The code is available at https://github.com/LiangZhang1996/DataLight.

URLs: https://github.com/LiangZhang1996/DataLight.

replace Skeleton Regression: A Graph-Based Approach to Estimation with Manifold Structure

Authors: Zeyu Wei, Yen-Chi Chen

Abstract: We introduce a new regression framework designed to deal with large-scale, complex data that lies around a low-dimensional manifold with noises. Our approach first constructs a graph representation, referred to as the skeleton, to capture the underlying geometric structure. We then define metrics on the skeleton graph and apply nonparametric regression techniques, along with feature transformations based on the graph, to estimate the regression function. We also discuss the limitations of some nonparametric regressors with respect to the general metric space such as the skeleton graph. The proposed regression framework suggests a novel way to deal with data with underlying geometric structures and provides additional advantages in handling the union of multiple manifolds, additive noises, and noisy observations. We provide statistical guarantees for the proposed method and demonstrate its effectiveness through simulations and real data examples.

replace On the Reliability of Watermarks for Large Language Models

Authors: John Kirchenbauer, Jonas Geiping, Yuxin Wen, Manli Shu, Khalid Saifullah, Kezhi Kong, Kasun Fernando, Aniruddha Saha, Micah Goldblum, Tom Goldstein

Abstract: As LLMs become commonplace, machine-generated text has the potential to flood the internet with spam, social media bots, and valueless content. Watermarking is a simple and effective strategy for mitigating such harms by enabling the detection and documentation of LLM-generated text. Yet a crucial question remains: How reliable is watermarking in realistic settings in the wild? There, watermarked text may be modified to suit a user's needs, or entirely rewritten to avoid detection. We study the robustness of watermarked text after it is re-written by humans, paraphrased by a non-watermarked LLM, or mixed into a longer hand-written document. We find that watermarks remain detectable even after human and machine paraphrasing. While these attacks dilute the strength of the watermark, paraphrases are statistically likely to leak n-grams or even longer fragments of the original text, resulting in high-confidence detections when enough tokens are observed. For example, after strong human paraphrasing the watermark is detectable after observing 800 tokens on average, when setting a 1e-5 false positive rate. We also consider a range of new detection schemes that are sensitive to short spans of watermarked text embedded inside a large document, and we compare the robustness of watermarking to other kinds of detectors.

replace A Probabilistic Framework for Modular Continual Learning

Authors: Lazar Valkov, Akash Srivastava, Swarat Chaudhuri, Charles Sutton

Abstract: Modular approaches that use a different composition of modules for each problem are a promising direction in continual learning (CL). However, searching through the large, discrete space of module compositions is challenging, especially because evaluating a composition's performance requires a round of neural network training. We address this challenge through a modular CL framework, PICLE, that uses a probabilistic model to cheaply compute the fitness of each composition, allowing PICLE to achieve both perceptual, few-shot and latent transfer. The model combines prior knowledge about good module compositions with dataset-specific information. We evaluate PICLE using two benchmark suites designed to assess different desiderata of CL techniques. Comparing to a wide range of approaches, we show that PICLE is the first modular CL algorithm to achieve perceptual, few-shot and latent transfer while scaling well to large search spaces, outperforming previous state-of-the-art modular CL approaches on long problem sequences.

replace Adaptive Federated Learning with Auto-Tuned Clients

Authors: Junhyung Lyle Kim, Mohammad Taha Toghani, C\'esar A. Uribe, Anastasios Kyrillidis

Abstract: Federated learning (FL) is a distributed machine learning framework where the global model of a central server is trained via multiple collaborative steps by participating clients without sharing their data. While being a flexible framework, where the distribution of local data, participation rate, and computing power of each client can greatly vary, such flexibility gives rise to many new challenges, especially in the hyperparameter tuning on the client side. We propose $\Delta$-SGD, a simple step size rule for SGD that enables each client to use its own step size by adapting to the local smoothness of the function each client is optimizing. We provide theoretical and empirical results where the benefit of the client adaptivity is shown in various FL scenarios.

replace Memory capacity of two layer neural networks with smooth activations

Authors: Liam Madden, Christos Thrampoulidis

Abstract: Determining the memory capacity of two layer neural networks with $m$ hidden neurons and input dimension $d$ (i.e., $md+2m$ total trainable parameters), which refers to the largest size of general data the network can memorize, is a fundamental machine learning question. For activations that are real analytic at a point and, if restricting to a polynomial there, have sufficiently high degree, we establish a lower bound of $\lfloor md/2\rfloor$ and optimality up to a factor of approximately $2$. All practical activations, such as sigmoids, Heaviside, and the rectified linear unit (ReLU), are real analytic at a point. Furthermore, the degree condition is mild, requiring, for example, that $\binom{k+d-1}{d-1}\ge n$ if the activation is $x^k$. Analogous prior results were limited to Heaviside and ReLU activations -- our result covers almost everything else. In order to analyze general activations, we derive the precise generic rank of the network's Jacobian, which can be written in terms of Hadamard powers and the Khatri-Rao product. Our analysis extends classical linear algebraic facts about the rank of Hadamard powers. Overall, our approach differs from prior works on memory capacity and holds promise for extending to deeper models and other architectures.

replace Delegating Data Collection in Decentralized Machine Learning

Authors: Nivasini Ananthakrishnan, Stephen Bates, Michael I. Jordan, Nika Haghtalab

Abstract: Motivated by the emergence of decentralized machine learning (ML) ecosystems, we study the delegation of data collection. Taking the field of contract theory as our starting point, we design optimal and near-optimal contracts that deal with two fundamental information asymmetries that arise in decentralized ML: uncertainty in the assessment of model quality and uncertainty regarding the optimal performance of any model. We show that a principal can cope with such asymmetry via simple linear contracts that achieve 1-1/e fraction of the optimal utility. To address the lack of a priori knowledge regarding the optimal performance, we give a convex program that can adaptively and efficiently compute the optimal contract. We also study linear contracts and derive the optimal utility in the more complex setting of multiple interactions.

replace 2-Cats: 2D Copula Approximating Transforms

Authors: Flavio Figueiredo, Jos\'e Geraldo Fernandes, Jackson Silva, Renato M. Assun\c{c}\~ao

Abstract: Copulas are powerful statistical tools for capturing dependencies across multiple data dimensions. Applying Copulas involves estimating independent marginals, a straightforward task, followed by the much more challenging task of determining a single copulating function, $C$, that links these marginals. For bivariate data, a copula takes the form of a two-increasing function $C: (u,v)\in \mathbb{I}^2 \rightarrow \mathbb{I}$, where $\mathbb{I} = [0, 1]$. In this paper, we propose 2-Cats, a Neural Network (NN) model that learns two-dimensional Copulas while preserving their key properties, without relying on specific Copula families (e.g., Archimedean). Furthermore, we introduce a training strategy inspired by the literature on Physics-Informed Neural Networks and Sobolev Training. Our proposed method exhibits superior performance compared to the state-of-the-art across various datasets while maintaining the fundamental mathematical properties of a Copula. note: Dear readers, a reviewer correctly captured a mistake in our proof of P2. Given that arXiv does not allow removals, we are keeping this version with this note on Arxiv with this note while we correct this issue.

replace Learning to Relax: Setting Solver Parameters Across a Sequence of Linear System Instances

Authors: Mikhail Khodak, Edmond Chow, Maria-Florina Balcan, Ameet Talwalkar

Abstract: Solving a linear system $Ax=b$ is a fundamental scientific computing primitive for which numerous solvers and preconditioners have been developed. These come with parameters whose optimal values depend on the system being solved and are often impossible or too expensive to identify; thus in practice sub-optimal heuristics are used. We consider the common setting in which many related linear systems need to be solved, e.g. during a single numerical simulation. In this scenario, can we sequentially choose parameters that attain a near-optimal overall number of iterations, without extra matrix computations? We answer in the affirmative for Successive Over-Relaxation (SOR), a standard solver whose parameter $\omega$ has a strong impact on its runtime. For this method, we prove that a bandit online learning algorithm--using only the number of iterations as feedback--can select parameters for a sequence of instances such that the overall cost approaches that of the best fixed $\omega$ as the sequence length increases. Furthermore, when given additional structural information, we show that a contextual bandit method asymptotically achieves the performance of the instance-optimal policy, which selects the best $\omega$ for each instance. Our work provides the first learning-theoretic treatment of high-precision linear system solvers and the first end-to-end guarantees for data-driven scientific computing, demonstrating theoretically the potential to speed up numerical methods using well-understood learning algorithms.

replace Federated Fine-Tuning of LLMs on the Very Edge: The Good, the Bad, the Ugly

Authors: Herbert Woisetschl\"ager, Alexander Isenko, Shiqiang Wang, Ruben Mayer, Hans-Arno Jacobsen

Abstract: Large Language Models (LLM) and foundation models are popular as they offer new opportunities for individuals and businesses to improve natural language processing, interact with data, and retrieve information faster. However, training or fine-tuning LLMs requires a vast amount of data, which can be challenging to access due to legal or technical restrictions and may require private computing resources. Federated Learning (FL) is a solution designed to overcome these challenges and expand data access for deep learning applications. This paper takes a hardware-centric approach to explore how LLMs can be brought to modern edge computing systems. Our study fine-tunes the FLAN-T5 model family, ranging from 80M to 3B parameters, using FL for a text summarization task. We provide a micro-level hardware benchmark, compare the model FLOP utilization to a state-of-the-art data center GPU, and study the network utilization in realistic conditions. Our contribution is twofold: First, we evaluate the current capabilities of edge computing systems and their potential for LLM FL workloads. Second, by comparing these systems with a data-center GPU, we demonstrate the potential for improvement and the next steps toward achieving greater computational efficiency at the edge.

replace BrainSCUBA: Fine-Grained Natural Language Captions of Visual Cortex Selectivity

Authors: Andrew F. Luo, Margaret M. Henderson, Michael J. Tarr, Leila Wehbe

Abstract: Understanding the functional organization of higher visual cortex is a central focus in neuroscience. Past studies have primarily mapped the visual and semantic selectivity of neural populations using hand-selected stimuli, which may potentially bias results towards pre-existing hypotheses of visual cortex functionality. Moving beyond conventional approaches, we introduce a data-driven method that generates natural language descriptions for images predicted to maximally activate individual voxels of interest. Our method -- Semantic Captioning Using Brain Alignments ("BrainSCUBA") -- builds upon the rich embedding space learned by a contrastive vision-language model and utilizes a pre-trained large language model to generate interpretable captions. We validate our method through fine-grained voxel-level captioning across higher-order visual regions. We further perform text-conditioned image synthesis with the captions, and show that our images are semantically coherent and yield high predicted activations. Finally, to demonstrate how our method enables scientific discovery, we perform exploratory investigations on the distribution of "person" representations in the brain, and discover fine-grained semantic selectivity in body-selective areas. Unlike earlier studies that decode text, our method derives voxel-wise captions of semantic selectivity. Our results show that BrainSCUBA is a promising means for understanding functional preferences in the brain, and provides motivation for further hypothesis-driven investigation of visual cortex.

replace Probabilistic Feature Augmentation for AIS-Based Multi-Path Long-Term Vessel Trajectory Forecasting

Authors: Gabriel Spadon, Jay Kumar, Derek Eden, Josh van Berkel, Tom Foster, Amilcar Soares, Ronan Fablet, Stan Matwin, Ronald Pelot

Abstract: Maritime transportation is paramount in achieving global economic growth, entailing concurrent ecological obligations in sustainability and safeguarding endangered marine species, most notably preserving large whale populations. In this regard, the Automatic Identification System (AIS) data plays a significant role by offering real-time streaming data on vessel movement, allowing enhanced traffic monitoring. This study explores using AIS data to prevent vessel-to-whale collisions by forecasting long-term vessel trajectories from engineered AIS data sequences. For such a task, we have developed an encoder-decoder model architecture using Bidirectional Long Short-Term Memory Networks (Bi-LSTM) to predict the next 12 hours of vessel trajectories using 1 to 3 hours of AIS data as input. We feed the model with probabilistic features engineered from historical AIS data that refer to each trajectory's potential route and destination. The model then predicts the vessel's trajectory, considering these additional features by leveraging convolutional layers for spatial feature learning and a position-aware attention mechanism that increases the importance of recent timesteps of a sequence during temporal feature learning. The probabilistic features have an F1 Score of approximately 85% and 75% for each feature type, respectively, demonstrating their effectiveness in augmenting information to the neural network. We test our model on the Gulf of St. Lawrence, a region known to be the habitat of North Atlantic Right Whales (NARW). Our model achieved a high R2 score of over 98% using various techniques and features. It stands out among other approaches as it can make complex decisions during turnings and path selection. Our study highlights the potential of data engineering and trajectory forecasting models for marine life species preservation.

replace Bridging Dimensions: Confident Reachability for High-Dimensional Controllers

Authors: Yuang Geng, Jake Brandon Baldauf, Souradeep Dutta, Chao Huang, Ivan Ruchkin

Abstract: Autonomous systems are increasingly implemented using end-to-end learning-based controllers. Such controllers make decisions that are executed on the real system, with images as one of the primary sensing modalities. Deep neural networks form a fundamental building block of such controllers. Unfortunately, the existing neural-network verification tools do not scale to inputs with thousands of dimensions -- especially when the individual inputs (such as pixels) are devoid of clear physical meaning. This paper takes a step towards connecting exhaustive closed-loop verification with high-dimensional controllers. Our key insight is that the behavior of a high-dimensional controller can be approximated with several low-dimensional controllers. To balance the approximation accuracy and verifiability of our low-dimensional controllers, we leverage the latest verification-aware knowledge distillation. Then, we inflate low-dimensional reachability results with statistical approximation errors, yielding a high-confidence reachability guarantee for the high-dimensional controller. We investigate two inflation techniques -- based on trajectories and control actions -- both of which show convincing performance in three OpenAI gym benchmarks.

replace StableSSM: Alleviating the Curse of Memory in State-space Models through Stable Reparameterization

Authors: Shida Wang, Qianxiao Li

Abstract: In this paper, we investigate the long-term memory learning capabilities of state-space models (SSMs) from the perspective of parameterization. We prove that state-space models without any reparameterization exhibit a memory limitation similar to that of traditional RNNs: the target relationships that can be stably approximated by state-space models must have an exponential decaying memory. Our analysis identifies this ``curse of memory'' as a result of the recurrent weights converging to a stability boundary, suggesting that a reparameterization technique can be effective. To this end, we introduce a class of reparameterization techniques for SSMs that effectively lift its memory limitations. Besides improving approximation capabilities, we further illustrate that a principled choice of reparameterization scheme can also enhance optimization stability. We validate our findings using synthetic datasets, language models and image classifications.

replace On the Learnability of Watermarks for Language Models

Authors: Chenchen Gu, Xiang Lisa Li, Percy Liang, Tatsunori Hashimoto

Abstract: Watermarking of language model outputs enables statistical detection of model-generated text, which can mitigate harms and misuses of language models. Existing watermarking strategies operate by altering the decoder of an existing language model. In this paper, we ask whether language models can directly learn to generate watermarked text, which would have significant implications for the real-world deployment of watermarks. First, learned watermarks could be used to build open models that naturally generate watermarked text, enabling watermarking for open models, where users can control the decoding procedure. Second, if watermarking is used to determine the provenance of generated text, an adversary can hurt the reputation of a victim model by spoofing its watermark and generating damaging watermarked text. To investigate the learnability of watermarks, we propose watermark distillation, which trains a student model to behave like a teacher model that uses decoding-based watermarking. We test our approach on three decoding-based watermarking strategies and various hyperparameter settings, finding that models can learn to generate watermarked text with high detectability. We also find limitations to learnability, including the loss of watermarking capabilities under fine-tuning on normal text and high sample complexity when learning low-distortion watermarks.

replace Improving Subgraph-GNNs via Edge-Level Ego-Network Encodings

Authors: Nurudin Alvarez-Gonzalez, Andreas Kaltenbrunner, Vicen\c{c} G\'omez

Abstract: We present a novel edge-level ego-network encoding for learning on graphs that can boost Message Passing Graph Neural Networks (MP-GNNs) by providing additional node and edge features or extending message-passing formats. The proposed encoding is sufficient to distinguish Strongly Regular Graphs, a family of challenging 3-WL equivalent graphs. We show theoretically that such encoding is more expressive than node-based sub-graph MP-GNNs. In an empirical evaluation on four benchmarks with 10 graph datasets, our results match or improve previous baselines on expressivity, graph classification, graph regression, and proximity tasks -- while reducing memory usage by 18.1x in certain real-world settings.

replace Mixture-of-Linear-Experts for Long-term Time Series Forecasting

Authors: Ronghao Ni, Zinan Lin, Shuaiqi Wang, Giulia Fanti

Abstract: Long-term time series forecasting (LTSF) aims to predict future values of a time series given the past values. The current state-of-the-art (SOTA) on this problem is attained in some cases by linear-centric models, which primarily feature a linear mapping layer. However, due to their inherent simplicity, they are not able to adapt their prediction rules to periodic changes in time series patterns. To address this challenge, we propose a Mixture-of-Experts-style augmentation for linear-centric models and propose Mixture-of-Linear-Experts (MoLE). Instead of training a single model, MoLE trains multiple linear-centric models (i.e., experts) and a router model that weighs and mixes their outputs. While the entire framework is trained end-to-end, each expert learns to specialize in a specific temporal pattern, and the router model learns to compose the experts adaptively. Experiments show that MoLE reduces forecasting error of linear-centric models, including DLinear, RLinear, and RMLP, in over 78% of the datasets and settings we evaluated. By using MoLE existing linear-centric models can achieve SOTA LTSF results in 68% of the experiments that PatchTST reports and we compare to, whereas existing single-head linear-centric models achieve SOTA results in only 25% of cases.

replace Sparse is Enough in Fine-tuning Pre-trained Large Language Models

Authors: Weixi Song, Zuchao Li, Lefei Zhang, Hai Zhao, Bo Du

Abstract: With the prevalence of pre-training-fine-tuning paradigm, how to efficiently adapt the pre-trained model to the downstream tasks has been an intriguing issue. Parameter-Efficient Fine-Tuning (PEFT) methods have been proposed for low-cost adaptation. Although PEFT has demonstrated effectiveness and been widely applied, the underlying principles are still unclear. In this paper, we adopt the PAC-Bayesian generalization error bound, viewing pre-training as a shift of prior distribution which leads to a tighter bound for generalization error. We validate this shift from the perspectives of oscillations in the loss landscape and the quasi-sparsity in gradient distribution. Based on this, we propose a gradient-based sparse fine-tuning algorithm, named Sparse Increment Fine-Tuning (SIFT), and validate its effectiveness on a range of tasks including the GLUE Benchmark and Instruction-tuning. The code is accessible at https://github.com/song-wx/SIFT/.

URLs: https://github.com/song-wx/SIFT/.

replace Maximum entropy GFlowNets with soft Q-learning

Authors: Sobhan Mohammadpour, Emmanuel Bengio, Emma Frejinger, Pierre-Luc Bacon

Abstract: Generative Flow Networks (GFNs) have emerged as a powerful tool for sampling discrete objects from unnormalized distributions, offering a scalable alternative to Markov Chain Monte Carlo (MCMC) methods. While GFNs draw inspiration from maximum entropy reinforcement learning (RL), the connection between the two has largely been unclear and seemingly applicable only in specific cases. This paper addresses the connection by constructing an appropriate reward function, thereby establishing an exact relationship between GFNs and maximum entropy RL. This construction allows us to introduce maximum entropy GFNs, which, in contrast to GFNs with uniform backward policy, achieve the maximum entropy attainable by GFNs without constraints on the state space.

replace Learning to Embed Time Series Patches Independently

Authors: Seunghan Lee, Taeyoung Park, Kibok Lee

Abstract: Masked time series modeling has recently gained much attention as a self-supervised representation learning strategy for time series. Inspired by masked image modeling in computer vision, recent works first patchify and partially mask out time series, and then train Transformers to capture the dependencies between patches by predicting masked patches from unmasked patches. However, we argue that capturing such patch dependencies might not be an optimal strategy for time series representation learning; rather, learning to embed patches independently results in better time series representations. Specifically, we propose to use 1) the simple patch reconstruction task, which autoencode each patch without looking at other patches, and 2) the simple patch-wise MLP that embeds each patch independently. In addition, we introduce complementary contrastive learning to hierarchically capture adjacent time series information efficiently. Our proposed method improves time series forecasting and classification performance compared to state-of-the-art Transformer-based models, while it is more efficient in terms of the number of parameters and training/inference time. Code is available at this repository: https://github.com/seunghan96/pits.

URLs: https://github.com/seunghan96/pits.

replace Near-Optimal Policy Optimization for Correlated Equilibrium in General-Sum Markov Games

Authors: Yang Cai, Haipeng Luo, Chen-Yu Wei, Weiqiang Zheng

Abstract: We study policy optimization algorithms for computing correlated equilibria in multi-player general-sum Markov Games. Previous results achieve $O(T^{-1/2})$ convergence rate to a correlated equilibrium and an accelerated $O(T^{-3/4})$ convergence rate to the weaker notion of coarse correlated equilibrium. In this paper, we improve both results significantly by providing an uncoupled policy optimization algorithm that attains a near-optimal $\tilde{O}(T^{-1})$ convergence rate for computing a correlated equilibrium. Our algorithm is constructed by combining two main elements (i) smooth value updates and (ii) the optimistic-follow-the-regularized-leader algorithm with the log barrier regularizer.

replace Monotone, Bi-Lipschitz, and Polyak-Lojasiewicz Networks

Authors: Ruigang Wang, Krishnamurthy Dvijotham, Ian R. Manchester

Abstract: This paper presents a new \emph{bi-Lipschitz} invertible neural network, the BiLipNet, which has the ability to control both its \emph{Lipschitzness} (output sensitivity to input perturbations) and \emph{inverse Lipschitzness} (input distinguishability from different outputs). The main contribution is a novel invertible residual layer with certified strong monotonicity and Lipschitzness, which we compose with orthogonal layers to build bi-Lipschitz networks. The certification is based on incremental quadratic constraints, which achieves much tighter bounds compared to spectral normalization. Moreover, we formulate the model inverse calculation as a three-operator splitting problem, for which fast algorithms are known. Based on the proposed bi-Lipschitz network, we introduce a new scalar-output network, the PLNet, which satisfies the Polyak-\L{}ojasiewicz condition. It can be applied to learn non-convex surrogate losses with favourable properties, e.g., a unique and efficiently-computable global minimum.

replace The Effect of Data Poisoning on Counterfactual Explanations

Authors: Andr\'e Artelt, Shubham Sharma, Freddy Lecu\'e, Barbara Hammer

Abstract: Counterfactual explanations provide a popular method for analyzing the predictions of black-box systems, and they can offer the opportunity for computational recourse by suggesting actionable changes on how to change the input to obtain a different (i.e. more favorable) system output. However, recent work highlighted their vulnerability to different types of manipulations. This work studies the vulnerability of counterfactual explanations to data poisoning. We formalize data poisoning in the context of counterfactual explanations for increasing the cost of recourse on three different levels: locally for a single instance, or a sub-group of instances, or globally for all instances. We demonstrate that state-of-the-art counterfactual generation methods \& toolboxes are vulnerable to such data poisoning.

replace Scalable Decentralized Algorithms for Online Personalized Mean Estimation

Authors: Franco Galante, Giovanni Neglia, Emilio Leonardi

Abstract: In numerous settings, agents lack sufficient data to directly learn a model. Collaborating with other agents may help, but it introduces a bias-variance trade-off, when local data distributions differ. A key challenge is for each agent to identify clients with similar distributions while learning the model, a problem that remains largely unresolved. This study focuses on a simplified version of the overarching problem, where each agent collects samples from a real-valued distribution over time to estimate its mean. Existing algorithms face impractical space and time complexities (quadratic in the number of agents A). To address scalability challenges, we propose a framework where agents self-organize into a graph, allowing each agent to communicate with only a selected number of peers r. We introduce two collaborative mean estimation algorithms: one draws inspiration from belief propagation, while the other employs a consensus-based approach, with complexity of O( r |A| log |A|) and O(r |A|), respectively. We establish conditions under which both algorithms yield asymptotically optimal estimates and offer a theoretical characterization of their performance.

replace SimPro: A Simple Probabilistic Framework Towards Realistic Long-Tailed Semi-Supervised Learning

Authors: Chaoqun Du, Yizeng Han, Gao Huang

Abstract: Recent advancements in semi-supervised learning have focused on a more realistic yet challenging task: addressing imbalances in labeled data while the class distribution of unlabeled data remains both unknown and potentially mismatched. Current approaches in this sphere often presuppose rigid assumptions regarding the class distribution of unlabeled data, thereby limiting the adaptability of models to only certain distribution ranges. In this study, we propose a novel approach, introducing a highly adaptable framework, designated as SimPro, which does not rely on any predefined assumptions about the distribution of unlabeled data. Our framework, grounded in a probabilistic model, innovatively refines the expectation-maximization (EM) algorithm by explicitly decoupling the modeling of conditional and marginal class distributions. This separation facilitates a closed-form solution for class distribution estimation during the maximization phase, leading to the formulation of a Bayes classifier. The Bayes classifier, in turn, enhances the quality of pseudo-labels in the expectation phase. Remarkably, the SimPro framework not only comes with theoretical guarantees but also is straightforward to implement. Moreover, we introduce two novel class distributions broadening the scope of the evaluation. Our method showcases consistent state-of-the-art performance across diverse benchmarks and data distribution scenarios. Our code is available at https://github.com/LeapLabTHU/SimPro.

URLs: https://github.com/LeapLabTHU/SimPro.

replace Don't Start from Scratch: Behavioral Refinement via Interpolant-based Policy Diffusion

Authors: Kaiqi Chen, Eugene Lim, Kelvin Lin, Yiyang Chen, Harold Soh

Abstract: Imitation learning empowers artificial agents to mimic behavior by learning from demonstrations. Recently, diffusion models, which have the ability to model high-dimensional and multimodal distributions, have shown impressive performance on imitation learning tasks. These models learn to shape a policy by diffusing actions (or states) from standard Gaussian noise. However, the target policy to be learned is often significantly different from Gaussian and this mismatch can result in poor performance when using a small number of diffusion steps (to improve inference speed) and under limited data. The key idea in this work is that initiating from a more informative source than Gaussian enables diffusion methods to mitigate the above limitations. We contribute both theoretical results, a new method, and empirical findings that show the benefits of using an informative source policy. Our method, which we call BRIDGER, leverages the stochastic interpolants framework to bridge arbitrary policies, thus enabling a flexible approach towards imitation learning. It generalizes prior work in that standard Gaussians can still be applied, but other source policies can be used if available. In experiments on challenging simulation benchmarks and on real robots, BRIDGER outperforms state-of-the-art diffusion policies. We provide further analysis on design considerations when applying BRIDGER.

replace Disaggregated Multi-Tower: Topology-aware Modeling Technique for Efficient Large-Scale Recommendation

Authors: Liang Luo, Buyun Zhang, Michael Tsang, Yinbin Ma, Ching-Hsiang Chu, Yuxin Chen, Shen Li, Yuchen Hao, Yanli Zhao, Guna Lakshminarayanan, Ellie Dingqiao Wen, Jongsoo Park, Dheevatsa Mudigere, Maxim Naumov

Abstract: We study a mismatch between the deep learning recommendation models' flat architecture, common distributed training paradigm and hierarchical data center topology. To address the associated inefficiencies, we propose Disaggregated Multi-Tower (DMT), a modeling technique that consists of (1) Semantic-preserving Tower Transform (SPTT), a novel training paradigm that decomposes the monolithic global embedding lookup process into disjoint towers to exploit data center locality; (2) Tower Module (TM), a synergistic dense component attached to each tower to reduce model complexity and communication volume through hierarchical feature interaction; and (3) Tower Partitioner (TP), a feature partitioner to systematically create towers with meaningful feature interactions and load balanced assignments to preserve model quality and training throughput via learned embeddings. We show that DMT can achieve up to 1.9x speedup compared to the state-of-the-art baselines without losing accuracy across multiple generations of hardware at large data center scales.

replace TS-RSR: A provably efficient approach for batch bayesian optimization

Authors: Zhaolin Ren, Na Li

Abstract: This paper presents a new approach for batch Bayesian Optimization (BO) called Thompson Sampling-Regret to Sigma Ratio directed sampling (TS-RSR), where we sample a new batch of actions by minimizing a Thompson Sampling approximation of a regret to uncertainty ratio. Our sampling objective is able to coordinate the actions chosen in each batch in a way that minimizes redundancy between points whilst focusing on points with high predictive means or high uncertainty. Theoretically, we provide rigorous convergence guarantees on our algorithm's regret, and numerically, we demonstrate that our method attains state-of-the-art performance on a range of challenging synthetic and realistic test functions, where it outperforms several competitive benchmark batch BO algorithms.

replace Chronos: Learning the Language of Time Series

Authors: Abdul Fatir Ansari, Lorenzo Stella, Caner Turkmen, Xiyuan Zhang, Pedro Mercado, Huibin Shen, Oleksandr Shchur, Syama Sundar Rangapuram, Sebastian Pineda Arango, Shubham Kapoor, Jasper Zschiegner, Danielle C. Maddix, Hao Wang, Michael W. Mahoney, Kari Torkkola, Andrew Gordon Wilson, Michael Bohlke-Schneider, Yuyang Wang

Abstract: We introduce Chronos, a simple yet effective framework for pretrained probabilistic time series models. Chronos tokenizes time series values using scaling and quantization into a fixed vocabulary and trains existing transformer-based language model architectures on these tokenized time series via the cross-entropy loss. We pretrained Chronos models based on the T5 family (ranging from 20M to 710M parameters) on a large collection of publicly available datasets, complemented by a synthetic dataset that we generated via Gaussian processes to improve generalization. In a comprehensive benchmark consisting of 42 datasets, and comprising both classical local models and deep learning methods, we show that Chronos models: (a) significantly outperform other methods on datasets that were part of the training corpus; and (b) have comparable and occasionally superior zero-shot performance on new datasets, relative to methods that were trained specifically on them. Our results demonstrate that Chronos models can leverage time series data from diverse domains to improve zero-shot accuracy on unseen forecasting tasks, positioning pretrained models as a viable tool to greatly simplify forecasting pipelines.

replace Efficient Transformer-based Hyper-parameter Optimization for Resource-constrained IoT Environments

Authors: Ibrahim Shaer, Soodeh Nikan, Abdallah Shami

Abstract: The hyper-parameter optimization (HPO) process is imperative for finding the best-performing Convolutional Neural Networks (CNNs). The automation process of HPO is characterized by its sizable computational footprint and its lack of transparency; both important factors in a resource-constrained Internet of Things (IoT) environment. In this paper, we address these problems by proposing a novel approach that combines transformer architecture and actor-critic Reinforcement Learning (RL) model, TRL-HPO, equipped with multi-headed attention that enables parallelization and progressive generation of layers. These assumptions are founded empirically by evaluating TRL-HPO on the MNIST dataset and comparing it with state-of-the-art approaches that build CNN models from scratch. The results show that TRL-HPO outperforms the classification results of these approaches by 6.8% within the same time frame, demonstrating the efficiency of TRL-HPO for the HPO process. The analysis of the results identifies the main culprit for performance degradation attributed to stacking fully connected layers. This paper identifies new avenues for improving RL-based HPO processes in resource-constrained environments.

replace Knowledge-guided Machine Learning: Current Trends and Future Prospects

Authors: Anuj Karpatne, Xiaowei Jia, Vipin Kumar

Abstract: This paper presents an overview of scientific modeling and discusses the complementary strengths and weaknesses of ML methods for scientific modeling in comparison to process-based models. It also provides an introduction to the current state of research in the emerging field of scientific knowledge-guided machine learning (KGML) that aims to use both scientific knowledge and data in ML frameworks to achieve better generalizability, scientific consistency, and explainability of results. We discuss different facets of KGML research in terms of the type of scientific knowledge used, the form of knowledge-ML integration explored, and the method for incorporating scientific knowledge in ML. We also discuss some of the common categories of use cases in environmental sciences where KGML methods are being developed, using illustrative examples in each category.

replace Detecting Generative Parroting through Overfitting Masked Autoencoders

Authors: Saeid Asgari Taghanaki, Joseph Lambourne

Abstract: The advent of generative AI models has revolutionized digital content creation, yet it introduces challenges in maintaining copyright integrity due to generative parroting, where models mimic their training data too closely. Our research presents a novel approach to tackle this issue by employing an overfitted Masked Autoencoder (MAE) to detect such parroted samples effectively. We establish a detection threshold based on the mean loss across the training dataset, allowing for the precise identification of parroted content in modified datasets. Preliminary evaluations demonstrate promising results, suggesting our method's potential to ensure ethical use and enhance the legal compliance of generative models.

replace Self-Improved Learning for Scalable Neural Combinatorial Optimization

Authors: Fu Luo, Xi Lin, Zhenkun Wang, Xialiang Tong, Mingxuan Yuan, Qingfu Zhang

Abstract: The end-to-end neural combinatorial optimization (NCO) method shows promising performance in solving complex combinatorial optimization problems without the need for expert design. However, existing methods struggle with large-scale problems, hindering their practical applicability. To overcome this limitation, this work proposes a novel Self-Improved Learning (SIL) method for better scalability of neural combinatorial optimization. Specifically, we develop an efficient self-improved mechanism that enables direct model training on large-scale problem instances without any labeled data. Powered by an innovative local reconstruction approach, this method can iteratively generate better solutions by itself as pseudo-labels to guide efficient model training. In addition, we design a linear complexity attention mechanism for the model to efficiently handle large-scale combinatorial problem instances with low computation overhead. Comprehensive experiments on the Travelling Salesman Problem (TSP) and the Capacitated Vehicle Routing Problem (CVRP) with up to 100K nodes in both uniform and real-world distributions demonstrate the superior scalability of our method.

replace Graph Attention Network for Lane-Wise and Topology-Invariant Intersection Traffic Simulation

Authors: Nooshin Yousefzadeh, Rahul Sengupta, Yashaswi Karnati, Anand Rangarajan, Sanjay Ranka

Abstract: Traffic congestion has significant economic, environmental, and social ramifications. Intersection traffic flow dynamics are influenced by numerous factors. While microscopic traffic simulators are valuable tools, they are computationally intensive and challenging to calibrate. Moreover, existing machine-learning approaches struggle to provide lane-specific waveforms or adapt to intersection topology and traffic patterns. In this study, we propose two efficient and accurate "Digital Twin" models for intersections, leveraging Graph Attention Neural Networks (GAT). These attentional graph auto-encoder digital twins capture temporal, spatial, and contextual aspects of traffic within intersections, incorporating various influential factors such as high-resolution loop detector waveforms, signal state records, driving behaviors, and turning-movement counts. Trained on diverse counterfactual scenarios across multiple intersections, our models generalize well, enabling the estimation of detailed traffic waveforms for any intersection approach and exit lanes. Multi-scale error metrics demonstrate that our models perform comparably to microsimulations. The primary application of our study lies in traffic signal optimization, a pivotal area in transportation systems research. These lightweight digital twins can seamlessly integrate into corridor and network signal timing optimization frameworks. Furthermore, our study's applications extend to lane reconfiguration, driving behavior analysis, and facilitating informed decisions regarding intersection safety and efficiency enhancements. A promising avenue for future research involves extending this approach to urban freeway corridors and integrating it with measures of effectiveness metrics.

replace On the Scalability of GNNs for Molecular Graphs

Authors: Maciej Sypetkowski, Frederik Wenkel, Farimah Poursafaei, Nia Dickson, Karush Suri, Philip Fradkin, Dominique Beaini

Abstract: Scaling deep learning models has been at the heart of recent revolutions in language modelling and image generation. Practitioners have observed a strong relationship between model size, dataset size, and performance. However, structure-based architectures such as Graph Neural Networks (GNNs) are yet to show the benefits of scale mainly due to the lower efficiency of sparse operations, large data requirements, and lack of clarity about the effectiveness of various architectures. We address this drawback of GNNs by studying their scaling behavior. Specifically, we analyze message-passing networks, graph Transformers, and hybrid architectures on the largest public collection of 2D molecular graphs. For the first time, we observe that GNNs benefit tremendously from the increasing scale of depth, width, number of molecules, number of labels, and the diversity in the pretraining datasets. We further demonstrate strong finetuning scaling behavior on 38 highly competitive downstream tasks, outclassing previous large models. This gives rise to MolGPS, a new graph foundation model that allows to navigate the chemical space, outperforming the previous state-of-the-arts on 26 out the 38 downstream tasks. We hope that our work paves the way for an era where foundational GNNs drive pharmaceutical drug discovery.

replace Cumulative Hazard Function Based Efficient Multivariate Temporal Point Process Learning

Authors: Bingqing Liu

Abstract: Most existing temporal point process models are characterized by conditional intensity function. These models often require numerical approximation methods for likelihood evaluation, which potentially hurts their performance. By directly modelling the integral of the intensity function, i.e., the cumulative hazard function (CHF), the likelihood can be evaluated accurately, making it a promising approach. However, existing CHF-based methods are not well-defined, i.e., the mathematical constraints of CHF are not completely satisfied, leading to untrustworthy results. For multivariate temporal point process, most existing methods model intensity (or density, etc.) functions for each variate, limiting the scalability. In this paper, we explore using neural networks to model a flexible but well-defined CHF and learning the multivariate temporal point process with low parameter complexity. Experimental results on six datasets show that the proposed model achieves the state-of-the-art performance on data fitting and event prediction tasks while having significantly fewer parameters and memory usage than the strong competitors. The source code and data can be obtained from https://github.com/lbq8942/NPP.

URLs: https://github.com/lbq8942/NPP.

replace MISLEAD: Manipulating Importance of Selected features for Learning Epsilon in Evasion Attack Deception

Authors: Vidit Khazanchi, Pavan Kulkarni, Yuvaraj Govindarajulu, Manojkumar Parmar

Abstract: Emerging vulnerabilities in machine learning (ML) models due to adversarial attacks raise concerns about their reliability. Specifically, evasion attacks manipulate models by introducing precise perturbations to input data, causing erroneous predictions. To address this, we propose a methodology combining SHapley Additive exPlanations (SHAP) for feature importance analysis with an innovative Optimal Epsilon technique for conducting evasion attacks. Our approach begins with SHAP-based analysis to understand model vulnerabilities, crucial for devising targeted evasion strategies. The Optimal Epsilon technique, employing a Binary Search algorithm, efficiently determines the minimum epsilon needed for successful evasion. Evaluation across diverse machine learning architectures demonstrates the technique's precision in generating adversarial samples, underscoring its efficacy in manipulating model outcomes. This study emphasizes the critical importance of continuous assessment and monitoring to identify and mitigate potential security risks in machine learning systems.

replace Automating the Discovery of Partial Differential Equations in Dynamical Systems

Authors: Weizhen Li, Rui Carvalho

Abstract: Identifying partial differential equations (PDEs) from data is crucial for understanding the governing mechanisms of natural phenomena, yet it remains a challenging task. We present an extension to the ARGOS framework, ARGOS-RAL, which leverages sparse regression with the recurrent adaptive lasso to identify PDEs from limited prior knowledge automatically. Our method automates calculating partial derivatives, constructing a candidate library, and estimating a sparse model. We rigorously evaluate the performance of ARGOS-RAL in identifying canonical PDEs under various noise levels and sample sizes, demonstrating its robustness in handling noisy and non-uniformly distributed data. We also test the algorithm's performance on datasets consisting solely of random noise to simulate scenarios with severely compromised data quality. Our results show that ARGOS-RAL effectively and reliably identifies the underlying PDEs from data, outperforming the sequential threshold ridge regression method in most cases. We highlight the potential of combining statistical methods, machine learning, and dynamical systems theory to automatically discover governing equations from collected data, streamlining the scientific modeling process.

replace Towards Real-time Learning in Large Language Models: A Critical Review

Authors: Mladjan Jovanovic, Peter Voss

Abstract: Real-time learning concerns the ability of learning systems to acquire knowledge over time, enabling their adaptation and generalization to novel tasks. It is a critical ability for intelligent, real-world systems, especially when data may be insufficient or difficult to obtain. This review provides a comprehensive analysis of real-time learning in Large Language Models. It synthesizes the state-of-the-art real-time learning paradigms, including continual learning, meta-learning, parameter-efficient learning, and mixture-of-experts learning. We demonstrate their utility for real-time learning by describing specific achievements from these related topics and their critical factors. Finally, the paper highlights current problems and challenges for future research in the field. By consolidating the latest relevant research developments, this review offers a comprehensive understanding of real-time learning and its implications for designing and developing LLM-based learning systems addressing real-world problems.

replace More Compute Is What You Need

Authors: Zhen Guo

Abstract: Large language model pre-training has become increasingly expensive, with most practitioners relying on scaling laws to allocate compute budgets for model size and training tokens, commonly referred to as Compute-Optimal or Chinchilla Optimal. In this paper, we hypothesize a new scaling law that suggests model performance depends mostly on the amount of compute spent for transformer-based models, independent of the specific allocation to model size and dataset size. Using this unified scaling law, we predict that (a) for inference efficiency, training should prioritize smaller model sizes and larger training datasets, and (b) assuming the exhaustion of available web datasets, scaling the model size might be the only way to further improve model performance.

replace KAN: Kolmogorov-Arnold Networks

Authors: Ziming Liu, Yixuan Wang, Sachin Vaidya, Fabian Ruehle, James Halverson, Marin Solja\v{c}i\'c, Thomas Y. Hou, Max Tegmark

Abstract: Inspired by the Kolmogorov-Arnold representation theorem, we propose Kolmogorov-Arnold Networks (KANs) as promising alternatives to Multi-Layer Perceptrons (MLPs). While MLPs have fixed activation functions on nodes ("neurons"), KANs have learnable activation functions on edges ("weights"). KANs have no linear weights at all -- every weight parameter is replaced by a univariate function parametrized as a spline. We show that this seemingly simple change makes KANs outperform MLPs in terms of accuracy and interpretability. For accuracy, much smaller KANs can achieve comparable or better accuracy than much larger MLPs in data fitting and PDE solving. Theoretically and empirically, KANs possess faster neural scaling laws than MLPs. For interpretability, KANs can be intuitively visualized and can easily interact with human users. Through two examples in mathematics and physics, KANs are shown to be useful collaborators helping scientists (re)discover mathematical and physical laws. In summary, KANs are promising alternatives for MLPs, opening opportunities for further improving today's deep learning models which rely heavily on MLPs.

replace Utilizing Machine Learning and 3D Neuroimaging to Predict Hearing Loss: A Comparative Analysis of Dimensionality Reduction and Regression Techniques

Authors: Trinath Sai Subhash Reddy Pittala, Uma Maheswara R Meleti, Manasa Thatipamula

Abstract: In this project, we have explored machine learning approaches for predicting hearing loss thresholds on the brain's gray matter 3D images. We have solved the problem statement in two phases. In the first phase, we used a 3D CNN model to reduce high-dimensional input into latent space and decode it into an original image to represent the input in rich feature space. In the second phase, we utilized this model to reduce input into rich features and used these features to train standard machine learning models for predicting hearing thresholds. We have experimented with autoencoders and variational autoencoders in the first phase for dimensionality reduction and explored random forest, XGBoost and multi-layer perceptron for regressing the thresholds. We split the given data set into training and testing sets and achieved an 8.80 range and 22.57 range for PT500 and PT4000 on the test set, respectively. We got the lowest RMSE using multi-layer perceptron among the other models. Our approach leverages the unique capabilities of VAEs to capture complex, non-linear relationships within high-dimensional neuroimaging data. We rigorously evaluated the models using various metrics, focusing on the root mean squared error (RMSE). The results highlight the efficacy of the multi-layer neural network model, which outperformed other techniques in terms of accuracy. This project advances the application of data mining in medical diagnostics and enhances our understanding of age-related hearing loss through innovative machine-learning frameworks.

replace-cross USC: Uncompromising Spatial Constraints for Safety-Oriented 3D Object Detectors in Autonomous Driving

Authors: Brian Hsuan-Cheng Liao, Chih-Hong Cheng, Hasan Esen, Alois Knoll

Abstract: We consider the safety-oriented performance of 3D object detectors in autonomous driving contexts. Specifically, despite impressive results shown by the mass literature, developers often find it hard to ensure the safe deployment of these learning-based perception models. Attributing the challenge to the lack of safety-oriented metrics, we hereby present uncompromising spatial constraints (USC), which characterize a simple yet important localization requirement demanding the predictions to fully cover the objects when seen from the autonomous vehicle. The constraints, as we formulate using the perspective and bird's-eye views, can be naturally reflected by quantitative measures, such that having an object detector with a higher score implies a lower risk of collision. Finally, beyond model evaluation, we incorporate the quantitative measures into common loss functions to enable safety-oriented fine-tuning for existing models. With experiments using the nuScenes dataset and a closed-loop simulation, our work demonstrates such considerations of safety notions at the perception level not only improve model performances beyond accuracy but also allow for a more direct linkage to actual system safety.

replace-cross TAX-Pose: Task-Specific Cross-Pose Estimation for Robot Manipulation

Authors: Chuer Pan, Brian Okorn, Harry Zhang, Ben Eisner, David Held

Abstract: How do we imbue robots with the ability to efficiently manipulate unseen objects and transfer relevant skills based on demonstrations? End-to-end learning methods often fail to generalize to novel objects or unseen configurations. Instead, we focus on the task-specific pose relationship between relevant parts of interacting objects. We conjecture that this relationship is a generalizable notion of a manipulation task that can transfer to new objects in the same category; examples include the relationship between the pose of a pan relative to an oven or the pose of a mug relative to a mug rack. We call this task-specific pose relationship "cross-pose" and provide a mathematical definition of this concept. We propose a vision-based system that learns to estimate the cross-pose between two objects for a given manipulation task using learned cross-object correspondences. The estimated cross-pose is then used to guide a downstream motion planner to manipulate the objects into the desired pose relationship (placing a pan into the oven or the mug onto the mug rack). We demonstrate our method's capability to generalize to unseen objects, in some cases after training on only 10 demonstrations in the real world. Results show that our system achieves state-of-the-art performance in both simulated and real-world experiments across a number of tasks. Supplementary information and videos can be found at https://sites.google.com/view/tax-pose/home.

URLs: https://sites.google.com/view/tax-pose/home.

replace-cross Spectral Regularized Kernel Two-Sample Tests

Authors: Omar Hagrass, Bharath K. Sriperumbudur, Bing Li

Abstract: Over the last decade, an approach that has gained a lot of popularity to tackle nonparametric testing problems on general (i.e., non-Euclidean) domains is based on the notion of reproducing kernel Hilbert space (RKHS) embedding of probability distributions. The main goal of our work is to understand the optimality of two-sample tests constructed based on this approach. First, we show the popular MMD (maximum mean discrepancy) two-sample test to be not optimal in terms of the separation boundary measured in Hellinger distance. Second, we propose a modification to the MMD test based on spectral regularization by taking into account the covariance information (which is not captured by the MMD test) and prove the proposed test to be minimax optimal with a smaller separation boundary than that achieved by the MMD test. Third, we propose an adaptive version of the above test which involves a data-driven strategy to choose the regularization parameter and show the adaptive test to be almost minimax optimal up to a logarithmic factor. Moreover, our results hold for the permutation variant of the test where the test threshold is chosen elegantly through the permutation of the samples. Through numerical experiments on synthetic and real data, we demonstrate the superior performance of the proposed test in comparison to the MMD test and other popular tests in the literature.

replace-cross Continual Diffusion: Continual Customization of Text-to-Image Diffusion with C-LoRA

Authors: James Seale Smith, Yen-Chang Hsu, Lingyu Zhang, Ting Hua, Zsolt Kira, Yilin Shen, Hongxia Jin

Abstract: Recent works demonstrate a remarkable ability to customize text-to-image diffusion models while only providing a few example images. What happens if you try to customize such models using multiple, fine-grained concepts in a sequential (i.e., continual) manner? In our work, we show that recent state-of-the-art customization of text-to-image models suffer from catastrophic forgetting when new concepts arrive sequentially. Specifically, when adding a new concept, the ability to generate high quality images of past, similar concepts degrade. To circumvent this forgetting, we propose a new method, C-LoRA, composed of a continually self-regularized low-rank adaptation in cross attention layers of the popular Stable Diffusion model. Furthermore, we use customization prompts which do not include the word of the customized object (i.e., "person" for a human face dataset) and are initialized as completely random embeddings. Importantly, our method induces only marginal additional parameter costs and requires no storage of user data for replay. We show that C-LoRA not only outperforms several baselines for our proposed setting of text-to-image continual customization, which we refer to as Continual Diffusion, but that we achieve a new state-of-the-art in the well-established rehearsal-free continual learning setting for image classification. The high achieving performance of C-LoRA in two separate domains positions it as a compelling solution for a wide range of applications, and we believe it has significant potential for practical impact. Project page: https://jamessealesmith.github.io/continual-diffusion/

URLs: https://jamessealesmith.github.io/continual-diffusion/

replace-cross Causal Discovery via Conditional Independence Testing with Proxy Variables

Authors: Mingzhou Liu, Xinwei Sun, Yu Qiao, Yizhou Wang

Abstract: Distinguishing causal connections from correlations is important in many scenarios. However, the presence of unobserved variables, such as the latent confounder, can introduce bias in conditional independence testing commonly employed in constraint-based causal discovery for identifying causal relations. To address this issue, existing methods introduced proxy variables to adjust for the bias caused by unobserveness. However, these methods were either limited to categorical variables or relied on strong parametric assumptions for identification. In this paper, we propose a novel hypothesis-testing procedure that can effectively examine the existence of the causal relationship over continuous variables, without any parametric constraint. Our procedure is based on discretization, which under completeness conditions, is able to asymptotically establish a linear equation whose coefficient vector is identifiable under the causal null hypothesis. Based on this, we introduce our test statistic and demonstrate its asymptotic level and power. We validate the effectiveness of our procedure using both synthetic and real-world data.

replace-cross Accelerating Diffusion Models for Inverse Problems through Shortcut Sampling

Authors: Gongye Liu, Haoze Sun, Jiayi Li, Fei Yin, Yujiu Yang

Abstract: Diffusion models have recently demonstrated an impressive ability to address inverse problems in an unsupervised manner. While existing methods primarily focus on modifying the posterior sampling process, the potential of the forward process remains largely unexplored. In this work, we propose Shortcut Sampling for Diffusion(SSD), a novel approach for solving inverse problems in a zero-shot manner. Instead of initiating from random noise, the core concept of SSD is to find a specific transitional state that bridges the measurement image y and the restored image x. By utilizing the shortcut path of "input - transitional state - output", SSD can achieve precise restoration with fewer steps. To derive the transitional state during the forward process, we introduce Distortion Adaptive Inversion. Moreover, we apply back projection as additional consistency constraints during the generation process. Experimentally, we demonstrate SSD's effectiveness on multiple representative IR tasks. Our method achieves competitive results with only 30 NFEs compared to state-of-the-art zero-shot methods(100 NFEs) and outperforms them with 100 NFEs in certain tasks. Code is available at https://github.com/GongyeLiu/SSD

URLs: https://github.com/GongyeLiu/SSD

replace-cross High-probability sample complexities for policy evaluation with linear function approximation

Authors: Gen Li, Weichen Wu, Yuejie Chi, Cong Ma, Alessandro Rinaldo, Yuting Wei

Abstract: This paper is concerned with the problem of policy evaluation with linear function approximation in discounted infinite horizon Markov decision processes. We investigate the sample complexities required to guarantee a predefined estimation error of the best linear coefficients for two widely-used policy evaluation algorithms: the temporal difference (TD) learning algorithm and the two-timescale linear TD with gradient correction (TDC) algorithm. In both the on-policy setting, where observations are generated from the target policy, and the off-policy setting, where samples are drawn from a behavior policy potentially different from the target policy, we establish the first sample complexity bound with high-probability convergence guarantee that attains the optimal dependence on the tolerance level. We also exhihit an explicit dependence on problem-related quantities, and show in the on-policy setting that our upper bound matches the minimax lower bound on crucial problem parameters, including the choice of the feature maps and the problem dimension.

replace-cross Online Dynamic Submodular Optimization

Authors: Antoine Lesage-Landry, Julien Pallage

Abstract: We propose new algorithms with provable performance for online binary optimization subject to general constraints and in dynamic settings. We consider the subset of problems in which the objective function is submodular. We propose the online submodular greedy algorithm (OSGA) which solves to optimality an approximation of the previous round loss function to avoid the NP-hardness of the original problem. We extend OSGA to a generic approximation function. We show that OSGA has a dynamic regret bound similar to the tightest bounds in online convex optimization with respect to the time horizon and the cumulative round optimum variation. For instances where no approximation exists or a computationally simpler implementation is desired, we design the online submodular projected gradient descent (OSPGD) by leveraging the Lova\'sz extension. We obtain a regret bound that is akin to the conventional online gradient descent (OGD). Finally, we numerically test our algorithms in two power system applications: fast-timescale demand response and real-time distribution network reconfiguration.

replace-cross Foundational Policy Acquisition via Multitask Learning for Motor Skill Generation

Authors: Satoshi Yamamori, Jun Morimoto

Abstract: In this study, we propose a multitask reinforcement learning algorithm for foundational policy acquisition to generate novel motor skills. Inspired by human sensorimotor adaptation mechanisms, we aim to train encoder-decoder networks that can be commonly used to learn novel motor skills in a single movement category. To train the policy network, we develop the multitask reinforcement learning method, where the policy needs to cope with changes in goals or environments with different reward functions or physical parameters of the environment in dynamic movement generation tasks. Here, as a concrete task, we evaluated the proposed method with the ball heading task using a monopod robot model. The results showed that the proposed method could adapt to novel target positions or inexperienced ball restitution coefficients. Furthermore, we demonstrated that the acquired foundational policy network originally learned for heading motion, can be used to generate an entirely new overhead kicking skill.

replace-cross TExplain: Explaining Learned Visual Features via Pre-trained (Frozen) Language Models

Authors: Saeid Asgari Taghanaki, Aliasghar Khani, Ali Saheb Pasand, Amir Khasahmadi, Aditya Sanghi, Karl D. D. Willis, Ali Mahdavi-Amiri

Abstract: Interpreting the learned features of vision models has posed a longstanding challenge in the field of machine learning. To address this issue, we propose a novel method that leverages the capabilities of language models to interpret the learned features of pre-trained image classifiers. Our method, called TExplain, tackles this task by training a neural network to establish a connection between the feature space of image classifiers and language models. Then, during inference, our approach generates a vast number of sentences to explain the features learned by the classifier for a given image. These sentences are then used to extract the most frequent words, providing a comprehensive understanding of the learned features and patterns within the classifier. Our method, for the first time, utilizes these frequent words corresponding to a visual representation to provide insights into the decision-making process of the independently trained classifier, enabling the detection of spurious correlations, biases, and a deeper comprehension of its behavior. To validate the effectiveness of our approach, we conduct experiments on diverse datasets, including ImageNet-9L and Waterbirds. The results demonstrate the potential of our method to enhance the interpretability and robustness of image classifiers.

replace-cross Fin-Fact: A Benchmark Dataset for Multimodal Financial Fact Checking and Explanation Generation

Authors: Aman Rangapur, Haoran Wang, Ling Jian, Kai Shu

Abstract: Fact-checking in financial domain is under explored, and there is a shortage of quality dataset in this domain. In this paper, we propose Fin-Fact, a benchmark dataset for multimodal fact-checking within the financial domain. Notably, it includes professional fact-checker annotations and justifications, providing expertise and credibility. With its multimodal nature encompassing both textual and visual content, Fin-Fact provides complementary information sources to enhance factuality analysis. Its primary objective is combating misinformation in finance, fostering transparency, and building trust in financial reporting and news dissemination. By offering insightful explanations, Fin-Fact empowers users, including domain experts and end-users, to understand the reasoning behind fact-checking decisions, validating claim credibility, and fostering trust in the fact-checking process. The Fin-Fact dataset, along with our experimental codes is available at https://github.com/IIT-DM/Fin-Fact/.

URLs: https://github.com/IIT-DM/Fin-Fact/.

replace-cross Large language models can accurately predict searcher preferences

Authors: Paul Thomas, Seth Spielman, Nick Craswell, Bhaskar Mitra

Abstract: Relevance labels, which indicate whether a search result is valuable to a searcher, are key to evaluating and optimising search systems. The best way to capture the true preferences of users is to ask them for their careful feedback on which results would be useful, but this approach does not scale to produce a large number of labels. Getting relevance labels at scale is usually done with third-party labellers, who judge on behalf of the user, but there is a risk of low-quality data if the labeller doesn't understand user needs. To improve quality, one standard approach is to study real users through interviews, user studies and direct feedback, find areas where labels are systematically disagreeing with users, then educate labellers about user needs through judging guidelines, training and monitoring. This paper introduces an alternate approach for improving label quality. It takes careful feedback from real users, which by definition is the highest-quality first-party gold data that can be derived, and develops an large language model prompt that agrees with that data. We present ideas and observations from deploying language models for large-scale relevance labelling at Bing, and illustrate with data from TREC. We have found large language models can be effective, with accuracy as good as human labellers and similar capability to pick the hardest queries, best runs, and best groups. Systematic changes to the prompts make a difference in accuracy, but so too do simple paraphrases. To measure agreement with real searchers needs high-quality ``gold'' labels, but with these we find that models produce better labels than third-party workers, for a fraction of the cost, and these labels let us train notably better rankers.

replace-cross CPLLM: Clinical Prediction with Large Language Models

Authors: Ofir Ben Shoham, Nadav Rappoport

Abstract: We present Clinical Prediction with Large Language Models (CPLLM), a method that involves fine-tuning a pre-trained Large Language Model (LLM) for clinical disease and readmission prediction. We utilized quantization and fine-tuned the LLM using prompts. For diagnosis prediction, we predict whether patients will be diagnosed with a target disease during their next visit or in the subsequent diagnosis, leveraging their historical diagnosis records. We compared our results to various baselines, including RETAIN, and Med-BERT, the current state-of-the-art model for disease prediction using temporal structured EHR data. In addition, We also evaluated CPLLM for patient hospital readmission prediction and compared our method's performance with benchmark baselines. Our experiments have shown that our proposed method, CPLLM, surpasses all the tested models in terms of PR-AUC and ROC-AUC metrics, showing state-of-the-art results for diagnosis prediction and patient hospital readmission prediction. Such a method can be easily implemented and integrated into the clinical process to help care providers estimate the next steps of patients

replace-cross Conformal Decision Theory: Safe Autonomous Decisions from Imperfect Predictions

Authors: Jordan Lekeufack, Anastasios N. Angelopoulos, Andrea Bajcsy, Michael I. Jordan, Jitendra Malik

Abstract: We introduce Conformal Decision Theory, a framework for producing safe autonomous decisions despite imperfect machine learning predictions. Examples of such decisions are ubiquitous, from robot planning algorithms that rely on pedestrian predictions, to calibrating autonomous manufacturing to exhibit high throughput and low error, to the choice of trusting a nominal policy versus switching to a safe backup policy at run-time. The decisions produced by our algorithms are safe in the sense that they come with provable statistical guarantees of having low risk without any assumptions on the world model whatsoever; the observations need not be I.I.D. and can even be adversarial. The theory extends results from conformal prediction to calibrate decisions directly, without requiring the construction of prediction sets. Experiments demonstrate the utility of our approach in robot motion planning around humans, automated stock trading, and robot manufacturing.

replace-cross Language Models As Semantic Indexers

Authors: Bowen Jin, Hansi Zeng, Guoyin Wang, Xiusi Chen, Tianxin Wei, Ruirui Li, Zhengyang Wang, Zheng Li, Yang Li, Hanqing Lu, Suhang Wang, Jiawei Han, Xianfeng Tang

Abstract: Semantic identifier (ID) is an important concept in information retrieval that aims to preserve the semantics of objects such as documents and items inside their IDs. Previous studies typically adopt a two-stage pipeline to learn semantic IDs by first procuring embeddings using off-the-shelf text encoders and then deriving IDs based on the embeddings. However, each step introduces potential information loss, and there is usually an inherent mismatch between the distribution of embeddings within the latent space produced by text encoders and the anticipated distribution required for semantic indexing. It is non-trivial to design a method that can learn the document's semantic representations and its hierarchical structure simultaneously, given that semantic IDs are discrete and sequentially structured, and the semantic supervision is deficient. In this paper, we introduce LMIndexer, a self-supervised framework to learn semantic IDs with a generative language model. We tackle the challenge of sequential discrete ID by introducing a semantic indexer capable of generating neural sequential discrete representations with progressive training and contrastive learning. In response to the semantic supervision deficiency, we propose to train the model with a self-supervised document reconstruction objective. We show the high quality of the learned IDs and demonstrate their effectiveness on three tasks including recommendation, product search, and document retrieval on five datasets from various domains. Code is available at https://github.com/PeterGriffinJin/LMIndexer.

URLs: https://github.com/PeterGriffinJin/LMIndexer.

replace-cross The Perception-Robustness Tradeoff in Deterministic Image Restoration

Authors: Guy Ohayon, Tomer Michaeli, Michael Elad

Abstract: We study the behavior of deterministic methods for solving inverse problems in imaging. These methods are commonly designed to achieve two goals: (1) attaining high perceptual quality, and (2) generating reconstructions that are consistent with the measurements. We provide a rigorous proof that the better a predictor satisfies these two requirements, the larger its Lipschitz constant must be, regardless of the nature of the degradation involved. In particular, to approach perfect perceptual quality and perfect consistency, the Lipschitz constant of the model must grow to infinity. This implies that such methods are necessarily more susceptible to adversarial attacks. We demonstrate our theory on single image super-resolution algorithms, addressing both noisy and noiseless settings. We also show how this undesired behavior can be leveraged to explore the posterior distribution, thereby allowing the deterministic model to imitate stochastic methods.

replace-cross Transfer Learning in Robotics: An Upcoming Breakthrough? A Review of Promises and Challenges

Authors: No\'emie Jaquier, Michael C. Welle, Andrej Gams, Kunpeng Yao, Bernardo Fichera, Aude Billard, Ale\v{s} Ude, Tamim Asfour, Danica Kragic

Abstract: Transfer learning is a conceptually-enticing paradigm in pursuit of truly intelligent embodied agents. The core concept -- reusing prior knowledge to learn in and from novel situations -- is successfully leveraged by humans to handle novel situations. In recent years, transfer learning has received renewed interest from the community from different perspectives, including imitation learning, domain adaptation, and transfer of experience from simulation to the real world, among others. In this paper, we unify the concept of transfer learning in robotics and provide the first taxonomy of its kind considering the key concepts of robot, task, and environment. Through a review of the promises and challenges in the field, we identify the need of transferring at different abstraction levels, the need of quantifying the transfer gap and the quality of transfer, as well as the dangers of negative transfer. Via this position paper, we hope to channel the effort of the community towards the most significant roadblocks to realize the full potential of transfer learning in robotics.

replace-cross Learning finitely correlated states: stability of the spectral reconstruction

Authors: Marco Fanizza, Niklas Galke, Josep Lumbreras, Cambyse Rouz\'e, Andreas Winter

Abstract: We show that marginals of blocks of $t$ systems of any finitely correlated translation invariant state on a chain can be learned, in trace distance, with $O(t^2)$ copies -- with an explicit dependence on local dimension, memory dimension and spectral properties of a certain map constructed from the state -- and computational complexity polynomial in $t$. The algorithm requires only the estimation of a marginal of a controlled size, in the worst case bounded by the minimum bond dimension, from which it reconstructs a translation invariant matrix product operator. In the analysis, a central role is played by the theory of operator systems. A refined error bound can be proven for $C^*$-finitely correlated states, which have an operational interpretation in terms of sequential quantum channels applied to the memory system. We can also obtain an analogous error bound for a class of matrix product density operators reconstructible by local marginals. In this case, a linear number of marginals must be estimated, obtaining a sample complexity of $\tilde{O}(t^3)$. The learning algorithm also works for states that are only close to a finitely correlated state, with the potential of providing competitive algorithms for other interesting families of states.

replace-cross Structured Probabilistic Coding

Authors: Dou Hu, Lingwei Wei, Yaxin Liu, Wei Zhou, Songlin Hu

Abstract: This paper presents a new supervised representation learning framework, namely structured probabilistic coding (SPC), to learn compact and informative representations from input related to the target task. SPC is an encoder-only probabilistic coding technology with a structured regularization from the target space. It can enhance the generalization ability of pre-trained language models for better language understanding. Specifically, our probabilistic coding simultaneously performs information encoding and task prediction in one module to more fully utilize the effective information from input data. It uses variational inference in the output space to reduce randomness and uncertainty. Besides, to better control the learning process of probabilistic representations, a structured regularization is proposed to promote uniformity across classes in the latent space. With the regularization term, SPC can preserve the Gaussian structure of the latent code and achieve better coverage of the hidden space with class uniformly. Experimental results on 12 natural language understanding tasks demonstrate that our SPC effectively improves the performance of pre-trained language models for classification and regression. Extensive experiments show that SPC can enhance the generalization capability, robustness to label noise, and clustering quality of output representations.

replace-cross Attention-Enhanced Reservoir Computing

Authors: Felix K\"oster, Kazutaka Kanno, Jun Ohkubo, Atsushi Uchida

Abstract: Photonic reservoir computing has been successfully utilized in time-series prediction as the need for hardware implementations has increased. Prediction of chaotic time series remains a significant challenge, an area where the conventional reservoir computing framework encounters limitations of prediction accuracy. We introduce an attention mechanism to the reservoir computing model in the output stage. This attention layer is designed to prioritize distinct features and temporal sequences, thereby substantially enhancing the prediction accuracy. Our results show that a photonic reservoir computer enhanced with the attention mechanism exhibits improved prediction capabilities for smaller reservoirs. These advancements highlight the transformative possibilities of reservoir computing for practical applications where accurate prediction of chaotic time series is crucial.

replace-cross Scalable network reconstruction in subquadratic time

Authors: Tiago P. Peixoto

Abstract: Network reconstruction consists in determining the unobserved pairwise couplings between $N$ nodes given only observational data on the resulting behavior that is conditioned on those couplings -- typically a time-series or independent samples from a graphical model. A major obstacle to the scalability of algorithms proposed for this problem is a seemingly unavoidable quadratic complexity of $\Omega(N^2)$, corresponding to the requirement of each possible pairwise coupling being contemplated at least once, despite the fact that most networks of interest are sparse, with a number of non-zero couplings that is only $O(N)$. Here we present a general algorithm applicable to a broad range of reconstruction problems that significantly outperforms this quadratic baseline. Our algorithm relies on a stochastic second neighbor search (Dong et al., 2011) that produces the best edge candidates with high probability, thus bypassing an exhaustive quadratic search. If we rely on the conjecture that the second-neighbor search finishes in log-linear time (Baron & Darling, 2020; 2022), we demonstrate theoretically that our algorithm finishes in subquadratic time, with a data-dependent complexity loosely upper bounded by $O(N^{3/2}\log N)$, but with a more typical log-linear complexity of $O(N\log^2N)$. In practice, we show that our algorithm achieves a performance that is many orders of magnitude faster than the quadratic baseline -- in a manner consistent with our theoretical analysis -- allows for easy parallelization, and thus enables the reconstruction of networks with hundreds of thousands and even millions of nodes and edges.

replace-cross What Are We Optimizing For? A Human-centric Evaluation of Deep Learning-based Movie Recommenders

Authors: Ruixuan Sun, Xinyi Wu, Avinash Akella, Ruoyan Kong, Bart Knijnenburg, Joseph A. Konstan

Abstract: In the past decade, deep learning (DL) models have gained prominence for their exceptional accuracy on benchmark datasets in recommender systems (RecSys). However, their evaluation has primarily relied on offline metrics, overlooking direct user perception and experience. To address this gap, we conduct a human-centric evaluation case study of four leading DL-RecSys models in the movie domain. We test how different DL-RecSys models perform in personalized recommendation generation by conducting survey study with 445 real active users. We find some DL-RecSys models to be superior in recommending novel and unexpected items and weaker in diversity, trustworthiness, transparency, accuracy, and overall user satisfaction compared to classic collaborative filtering (CF) methods. To further explain the reasons behind the underperformance, we apply a comprehensive path analysis. We discover that the lack of diversity and too much serendipity from DL models can negatively impact the consequent perceived transparency and personalization of recommendations. Such a path ultimately leads to lower summative user satisfaction. Qualitatively, we confirm with real user quotes that accuracy plus at least one other attribute is necessary to ensure a good user experience, while their demands for transparency and trust can not be neglected. Based on our findings, we discuss future human-centric DL-RecSys design and optimization strategies.

replace-cross Differentially private Bayesian tests

Authors: Abhisek Chakraborty, Saptati Datta

Abstract: Differential privacy has emerged as an significant cornerstone in the realm of scientific hypothesis testing utilizing confidential data. In reporting scientific discoveries, Bayesian tests are widely adopted since they effectively circumnavigate the key criticisms of P-values, namely, lack of interpretability and inability to quantify evidence in support of the competing hypotheses. We present a novel differentially private Bayesian hypotheses testing framework that arise naturally under a principled data generative mechanism, inherently maintaining the interpretability of the resulting inferences. Furthermore, by focusing on differentially private Bayes factors based on widely used test statistics, we circumvent the need to model the complete data generative mechanism and ensure substantial computational benefits. We also provide a set of sufficient conditions to establish results on Bayes factor consistency under the proposed framework. The utility of the devised technology is showcased via several numerical experiments.

replace-cross Retrieval-Augmented Score Distillation for Text-to-3D Generation

Authors: Junyoung Seo, Susung Hong, Wooseok Jang, In\`es Hyeonsu Kim, Minseop Kwak, Doyup Lee, Seungryong Kim

Abstract: Text-to-3D generation has achieved significant success by incorporating powerful 2D diffusion models, but insufficient 3D prior knowledge also leads to the inconsistency of 3D geometry. Recently, since large-scale multi-view datasets have been released, fine-tuning the diffusion model on the multi-view datasets becomes a mainstream to solve the 3D inconsistency problem. However, it has confronted with fundamental difficulties regarding the limited quality and diversity of 3D data, compared with 2D data. To sidestep these trade-offs, we explore a retrieval-augmented approach tailored for score distillation, dubbed ReDream. We postulate that both expressiveness of 2D diffusion models and geometric consistency of 3D assets can be fully leveraged by employing the semantically relevant assets directly within the optimization process. To this end, we introduce novel framework for retrieval-based quality enhancement in text-to-3D generation. We leverage the retrieved asset to incorporate its geometric prior in the variational objective and adapt the diffusion model's 2D prior toward view consistency, achieving drastic improvements in both geometry and fidelity of generated scenes. We conduct extensive experiments to demonstrate that ReDream exhibits superior quality with increased geometric consistency. Project page is available at https://ku-cvlab.github.io/ReDream/.

URLs: https://ku-cvlab.github.io/ReDream/.

replace-cross Diffusive Gibbs Sampling

Authors: Wenlin Chen, Mingtian Zhang, Brooks Paige, Jos\'e Miguel Hern\'andez-Lobato, David Barber

Abstract: The inadequate mixing of conventional Markov Chain Monte Carlo (MCMC) methods for multi-modal distributions presents a significant challenge in practical applications such as Bayesian inference and molecular dynamics. Addressing this, we propose Diffusive Gibbs Sampling (DiGS), an innovative family of sampling methods designed for effective sampling from distributions characterized by distant and disconnected modes. DiGS integrates recent developments in diffusion models, leveraging Gaussian convolution to create an auxiliary noisy distribution that bridges isolated modes in the original space and applying Gibbs sampling to alternately draw samples from both spaces. A novel Metropolis-within-Gibbs scheme is proposed to enhance mixing in the denoising sampling step. DiGS exhibits a better mixing property for sampling multi-modal distributions than state-of-the-art methods such as parallel tempering, attaining substantially improved performance across various tasks, including mixtures of Gaussians, Bayesian neural networks and molecular dynamics.

replace-cross Blue noise for diffusion models

Authors: Xingchang Huang, Corentin Sala\"un, Cristina Vasconcelos, Christian Theobalt, Cengiz \"Oztireli, Gurprit Singh

Abstract: Most of the existing diffusion models use Gaussian noise for training and sampling across all time steps, which may not optimally account for the frequency contents reconstructed by the denoising network. Despite the diverse applications of correlated noise in computer graphics, its potential for improving the training process has been underexplored. In this paper, we introduce a novel and general class of diffusion models taking correlated noise within and across images into account. More specifically, we propose a time-varying noise model to incorporate correlated noise into the training process, as well as a method for fast generation of correlated noise mask. Our model is built upon deterministic diffusion models and utilizes blue noise to help improve the generation quality compared to using Gaussian white (random) noise only. Further, our framework allows introducing correlation across images within a single mini-batch to improve gradient flow. We perform both qualitative and quantitative evaluations on a variety of datasets using our method, achieving improvements on different tasks over existing deterministic diffusion models in terms of FID metric.

replace-cross Instance-Level Safety-Aware Fidelity of Synthetic Data and Its Calibration

Authors: Chih-Hong Cheng, Paul St\"ockel, Xingyu Zhao

Abstract: Modeling and calibrating the fidelity of synthetic data is paramount in shaping the future of safe and reliable self-driving technology by offering a cost-effective and scalable alternative to real-world data collection. We focus on its role in safety-critical applications, introducing four types of instance-level fidelity that go beyond mere visual input characteristics. The aim is to ensure that applying testing on synthetic data can reveal real-world safety issues, and the absence of safety-critical issues when testing under synthetic data can provide a strong safety guarantee in real-world behavior. We suggest an optimization method to refine the synthetic data generator, reducing fidelity gaps identified by deep learning components. Experiments show this tuning enhances the correlation between safety-critical errors in synthetic and real data.

replace-cross Converting Representational Counterfactuals to Natural Language

Authors: Matan Avitan, Ryan Cotterell, Yoav Goldberg, Shauli Ravfogel

Abstract: Interventions targeting the representation space of language models (LMs) have emerged as an effective means to influence model behavior. Such methods are employed, for example, to eliminate or alter the encoding of demographic information such as gender within the model's representations and, in so doing, create a counterfactual representation. However, because the intervention operates within the representation space, understanding precisely what aspects of the text it modifies poses a challenge. In this paper, we give a method to convert representation counterfactuals into string counterfactuals. We demonstrate that this approach enables us to analyze the linguistic alterations corresponding to a given representation space intervention and to interpret the features utilized to encode a specific concept. Moreover, the resulting counterfactuals can be used to mitigate bias in classification through data augmentation.

replace-cross Zero-shot generalization across architectures for visual classification

Authors: Evan Gerritz, Luciano Dyballa, Steven W. Zucker

Abstract: Generalization to unseen data is a key desideratum for deep networks, but its relation to classification accuracy is unclear. Using a minimalist vision dataset and a measure of generalizability, we show that popular networks, from deep convolutional networks (CNNs) to transformers, vary in their power to extrapolate to unseen classes both across layers and across architectures. Accuracy is not a good predictor of generalizability, and generalization varies non-monotonically with layer depth.

replace-cross Neural Exec: Learning (and Learning from) Execution Triggers for Prompt Injection Attacks

Authors: Dario Pasquini, Martin Strohmeier, Carmela Troncoso

Abstract: We introduce a new family of prompt injection attacks, termed Neural Exec. Unlike known attacks that rely on handcrafted strings (e.g., "Ignore previous instructions and..."), we show that it is possible to conceptualize the creation of execution triggers as a differentiable search problem and use learning-based methods to autonomously generate them. Our results demonstrate that a motivated adversary can forge triggers that are not only drastically more effective than current handcrafted ones but also exhibit inherent flexibility in shape, properties, and functionality. In this direction, we show that an attacker can design and generate Neural Execs capable of persisting through multi-stage preprocessing pipelines, such as in the case of Retrieval-Augmented Generation (RAG)-based applications. More critically, our findings show that attackers can produce triggers that deviate markedly in form and shape from any known attack, sidestepping existing blacklist-based detection and sanitation approaches.

replace-cross Conformal online model aggregation

Authors: Matteo Gasparin, Aaditya Ramdas

Abstract: Conformal prediction equips machine learning models with a reasonable notion of uncertainty quantification without making strong distributional assumptions. It wraps around any black-box prediction model and converts point predictions into set predictions that have a predefined marginal coverage guarantee. However, conformal prediction only works if we fix the underlying machine learning model in advance. A relatively unaddressed issue in conformal prediction is that of model selection and/or aggregation: for a given problem, which of the plethora of prediction methods (random forests, neural nets, regularized linear models, etc.) should we conformalize? This paper proposes a new approach towards conformal model aggregation in online settings that is based on combining the prediction sets from several algorithms by voting, where weights on the models are adapted over time based on past performance.

replace-cross Correcting Diffusion-Based Perceptual Image Compression with Privileged End-to-End Decoder

Authors: Yiyang Ma, Wenhan Yang, Jiaying Liu

Abstract: The images produced by diffusion models can attain excellent perceptual quality. However, it is challenging for diffusion models to guarantee distortion, hence the integration of diffusion models and image compression models still needs more comprehensive explorations. This paper presents a diffusion-based image compression method that employs a privileged end-to-end decoder model as correction, which achieves better perceptual quality while guaranteeing the distortion to an extent. We build a diffusion model and design a novel paradigm that combines the diffusion model and an end-to-end decoder, and the latter is responsible for transmitting the privileged information extracted at the encoder side. Specifically, we theoretically analyze the reconstruction process of the diffusion models at the encoder side with the original images being visible. Based on the analysis, we introduce an end-to-end convolutional decoder to provide a better approximation of the score function $\nabla_{\mathbf{x}_t}\log p(\mathbf{x}_t)$ at the encoder side and effectively transmit the combination. Experiments demonstrate the superiority of our method in both distortion and perception compared with previous perceptual compression methods.

replace-cross How Deep Networks Learn Sparse and Hierarchical Data: the Sparse Random Hierarchy Model

Authors: Umberto Tomasini, Matthieu Wyart

Abstract: Understanding what makes high-dimensional data learnable is a fundamental question in machine learning. On the one hand, it is believed that the success of deep learning lies in its ability to build a hierarchy of representations that become increasingly more abstract with depth, going from simple features like edges to more complex concepts. On the other hand, learning to be insensitive to invariances of the task, such as smooth transformations for image datasets, has been argued to be important for deep networks and it strongly correlates with their performance. In this work, we aim to explain this correlation and unify these two viewpoints. We show that by introducing sparsity to generative hierarchical models of data, the task acquires insensitivity to spatial transformations that are discrete versions of smooth transformations. In particular, we introduce the Sparse Random Hierarchy Model (SRHM), where we observe and rationalize that a hierarchical representation mirroring the hierarchical model is learnt precisely when such insensitivity is learnt, thereby explaining the strong correlation between the latter and performance. Moreover, we quantify how the sample complexity of CNNs learning the SRHM depends on both the sparsity and hierarchical structure of the task.

replace-cross Fewer Truncations Improve Language Modeling

Authors: Hantian Ding, Zijian Wang, Giovanni Paolini, Varun Kumar, Anoop Deoras, Dan Roth, Stefano Soatto

Abstract: In large language model training, input documents are typically concatenated together and then split into sequences of equal length to avoid padding tokens. Despite its efficiency, the concatenation approach compromises data integrity -- it inevitably breaks many documents into incomplete pieces, leading to excessive truncations that hinder the model from learning to compose logically coherent and factually consistent content that is grounded on the complete context. To address the issue, we propose Best-fit Packing, a scalable and efficient method that packs documents into training sequences through length-aware combinatorial optimization. Our method completely eliminates unnecessary truncations while retaining the same training efficiency as concatenation. Empirical results from both text and code pre-training show that our method achieves superior performance (e.g., relatively +4.7% on reading comprehension; +16.8% in context following; and +9.2% on program synthesis), and reduces closed-domain hallucination effectively by up to 58.3%.

replace-cross OpenELM: An Efficient Language Model Family with Open Training and Inference Framework

Authors: Sachin Mehta, Mohammad Hossein Sekhavat, Qingqing Cao, Maxwell Horton, Yanzi Jin, Chenfan Sun, Iman Mirzadeh, Mahyar Najibi, Dmitry Belenko, Peter Zatloukal, Mohammad Rastegari

Abstract: The reproducibility and transparency of large language models are crucial for advancing open research, ensuring the trustworthiness of results, and enabling investigations into data and model biases, as well as potential risks. To this end, we release OpenELM, a state-of-the-art open language model. OpenELM uses a layer-wise scaling strategy to efficiently allocate parameters within each layer of the transformer model, leading to enhanced accuracy. For example, with a parameter budget of approximately one billion parameters, OpenELM exhibits a 2.36% improvement in accuracy compared to OLMo while requiring $2\times$ fewer pre-training tokens. Diverging from prior practices that only provide model weights and inference code, and pre-train on private datasets, our release includes the complete framework for training and evaluation of the language model on publicly available datasets, including training logs, multiple checkpoints, and pre-training configurations. We also release code to convert models to MLX library for inference and fine-tuning on Apple devices. This comprehensive release aims to empower and strengthen the open research community, paving the way for future open research endeavors. Our source code along with pre-trained model weights and training recipes is available at \url{https://github.com/apple/corenet}. Additionally, \model models can be found on HuggingFace at: \url{https://huggingface.co/apple/OpenELM}.

URLs: https://github.com/apple/corenet, https://huggingface.co/apple/OpenELM

replace-cross Efficient Remote Sensing with Harmonized Transfer Learning and Modality Alignment

Authors: Tengjun Huang

Abstract: With the rise of Visual and Language Pretraining (VLP), an increasing number of downstream tasks are adopting the paradigm of pretraining followed by fine-tuning. Although this paradigm has demonstrated potential in various multimodal downstream tasks, its implementation in the remote sensing domain encounters some obstacles. Specifically, the tendency for same-modality embeddings to cluster together impedes efficient transfer learning. To tackle this issue, we review the aim of multimodal transfer learning for downstream tasks from a unified perspective, and rethink the optimization process based on three distinct objectives. We propose "Harmonized Transfer Learning and Modality Alignment (HarMA)", a method that simultaneously satisfies task constraints, modality alignment, and single-modality uniform alignment, while minimizing training overhead through parameter-efficient fine-tuning. Remarkably, without the need for external data for training, HarMA achieves state-of-the-art performance in two popular multimodal retrieval tasks in the field of remote sensing. Our experiments reveal that HarMA achieves competitive and even superior performance to fully fine-tuned models with only minimal adjustable parameters. Due to its simplicity, HarMA can be integrated into almost all existing multimodal pretraining models. We hope this method can facilitate the efficient application of large models to a wide range of downstream tasks while significantly reducing the resource consumption. Code is available at https://github.com/seekerhuang/HarMA.

URLs: https://github.com/seekerhuang/HarMA.

replace-cross Physics-informed Convolutional Neural Network for Microgrid Economic Dispatch

Authors: Xiaoyu Ge, Javad Khazaei

Abstract: The variability of renewable energy generation and the unpredictability of electricity demand create a need for real-time economic dispatch (ED) of assets in microgrids. However, solving numerical optimization problems in real-time can be incredibly challenging. This study proposes using a convolutional neural network (CNN) based on deep learning to address these challenges. Compared to traditional methods, CNN is more efficient, delivers more dependable results, and has a shorter response time when dealing with uncertainties. While CNN has shown promising results, it does not extract explainable knowledge from the data. To address this limitation, a physics-inspired CNN model is developed by incorporating constraints of the ED problem into the CNN training to ensure that the model follows physical laws while fitting the data. The proposed method can significantly accelerate real-time economic dispatch of microgrids without compromising the accuracy of numerical optimization techniques. The effectiveness of the proposed data-driven approach for optimal allocation of microgrid resources in real-time is verified through a comprehensive comparison with conventional numerical optimization approaches.

replace-cross A Careful Examination of Large Language Model Performance on Grade School Arithmetic

Authors: Hugh Zhang, Jeff Da, Dean Lee, Vaughn Robinson, Catherine Wu, Will Song, Tiffany Zhao, Pranav Raja, Dylan Slack, Qin Lyu, Sean Hendryx, Russell Kaplan, Michele Lunati, Summer Yue

Abstract: Large language models (LLMs) have achieved impressive success on many benchmarks for mathematical reasoning. However, there is growing concern that some of this performance actually reflects dataset contamination, where data closely resembling benchmark questions leaks into the training data, instead of true reasoning ability. To investigate this claim rigorously, we commission Grade School Math 1000 (GSM1k). GSM1k is designed to mirror the style and complexity of the established GSM8k benchmark, the gold standard for measuring elementary mathematical reasoning. We ensure that the two benchmarks are comparable across important metrics such as human solve rates, number of steps in solution, answer magnitude, and more. When evaluating leading open- and closed-source LLMs on GSM1k, we observe accuracy drops of up to 13%, with several families of models (e.g., Phi and Mistral) showing evidence of systematic overfitting across almost all model sizes. At the same time, many models, especially those on the frontier, (e.g., Gemini/GPT/Claude) show minimal signs of overfitting. Further analysis suggests a positive relationship (Spearman's r^2=0.32) between a model's probability of generating an example from GSM8k and its performance gap between GSM8k and GSM1k, suggesting that many models may have partially memorized GSM8k.