Authors: Sinan Ibrahim, Mostafa Mostafa, Ali Jnadi, Pavel Osinenko
Abstract: The aim of Reinforcement Learning (RL) in real-world applications is to create systems capable of making autonomous decisions by learning from their environment through trial and error. This paper emphasizes the importance of reward engineering and reward shaping in enhancing the efficiency and effectiveness of reinforcement learning algorithms. Reward engineering involves designing reward functions that accurately reflect the desired outcomes, while reward shaping provides additional feedback to guide the learning process, accelerating convergence to optimal policies. Despite significant advancements in reinforcement learning, several limitations persist. One key challenge is the sparse and delayed nature of rewards in many real-world scenarios, which can hinder learning progress. Additionally, the complexity of accurately modeling real-world environments and the computational demands of reinforcement learning algorithms remain substantial obstacles. On the other hand, recent advancements in deep learning and neural networks have significantly improved the capability of reinforcement learning systems to handle high-dimensional state and action spaces, enabling their application to complex tasks such as robotics, autonomous driving, and game playing. This paper provides a comprehensive review of the current state of reinforcement learning, focusing on the methodologies and techniques used in reward engineering and reward shaping. It critically analyzes the limitations and recent advancements in the field, offering insights into future research directions and potential applications in various domains.
Authors: Arya Hadizadeh Moghaddam, Mohsen Nayebi Kerdabadi, Mei Liu, Zijun Yao
Abstract: Recent advancements in sequential modeling applied to Electronic Health Records (EHR) have greatly influenced prescription recommender systems. While the recent literature on drug recommendation has shown promising performance, the study of discovering a diversity of coexisting temporal relationships at the level of medical codes over consecutive visits remains less explored. The goal of this study can be motivated from two perspectives. First, there is a need to develop a sophisticated sequential model capable of disentangling the complex relationships across sequential visits. Second, it is crucial to establish multiple and diverse health profiles for the same patient to ensure a comprehensive consideration of different medical intents in drug recommendation. To achieve this goal, we introduce Attentive Recommendation with Contrasted Intents (ARCI), a multi-level transformer-based method designed to capture the different but coexisting temporal paths across a shared sequence of visits. Specifically, we propose a novel intent-aware method with contrastive learning, that links specialized medical intents of the patients to the transformer heads for extracting distinct temporal paths associated with different health profiles. We conducted experiments on two real-world datasets for the prescription recommendation task using both ranking and classification metrics. Our results demonstrate that ARCI has outperformed the state-of-the-art prescription recommendation methods and is capable of providing interpretable insights for healthcare practitioners.
Authors: Matthew Morris, David J. Tena Cucala, Bernardo Cuenca Grau, Ian Horrocks
Abstract: Graph neural networks (GNNs) are frequently used to predict missing facts in knowledge graphs (KGs). Motivated by the lack of explainability for the outputs of these models, recent work has aimed to explain their predictions using Datalog, a widely used logic-based formalism. However, such work has been restricted to certain subclasses of GNNs. In this paper, we consider one of the most popular GNN architectures for KGs, R-GCN, and we provide two methods to extract rules that explain its predictions and are sound, in the sense that each fact derived by the rules is also predicted by the GNN, for any input dataset. Furthermore, we provide a method that can verify that certain classes of Datalog rules are not sound for the R-GCN. In our experiments, we train R-GCNs on KG completion benchmarks, and we are able to verify that no Datalog rule is sound for these models, even though the models often obtain high to near-perfect accuracy. This raises some concerns about the ability of R-GCN models to generalise and about the explainability of their predictions. We further provide two variations to the training paradigm of R-GCN that encourage it to learn sound rules and find a trade-off between model accuracy and the number of learned sound rules.
Authors: Yang Lu, Felix Zhan
Abstract: Kolmogorov Arnold Networks (KAN) are highly efficient in inference and can handle complex patterns once trained, making them desirable for production environments and ensuring a fast service experience in the finance and electronic shopping industries. However, we found that KAN, in general, is not suitable for fraud detection problems. We also discovered a quick method to determine whether a problem is solvable by KAN: if the data can be effectively separated using spline interpolation with varying intervals after applying Principal Component Analysis (PCA) to reduce the data dimensions to two, KAN can outperform most machine learning algorithms. Otherwise, it indicates KAN may not solve the problem effectively compared to other machine learning algorithms. We also propose a heuristic approach for selecting the appropriate hyperparameters for KAN to significantly accelerate training time compared to grid search hyperparameter tuning, which usually takes a month for a comprehensive grid search. Specifically, the width parameter should generally follow a pyramid structure, allowing efficient spline mixing, and k should be fixed at 15, with the grid number fixed at 5. This streamlined approach minimizes the number of evaluations required, significantly speeding up the hyperparameter tuning process while still achieving robust performance metrics.
Authors: Chengyu Gong, Gefei Shen, Luanzheng Guo, Nathan Tallent, Dongfang Zhao
Abstract: One of the most common operations in multimodal scientific data management is searching for the $k$ most similar items (or, $k$-nearest neighbors, KNN) from the database after being provided a new item. Although recent advances of multimodal machine learning models offer a \textit{semantic} index, the so-called \textit{embedding vectors} mapped from the original multimodal data, the dimension of the resulting embedding vectors are usually on the order of hundreds or a thousand, which are impractically high for time-sensitive scientific applications. This work proposes to reduce the dimensionality of the output embedding vectors such that the set of top-$k$ nearest neighbors do not change in the lower-dimensional space, namely Order-Preserving Dimension Reduction (OPDR). In order to develop such an OPDR method, our central hypothesis is that by analyzing the intrinsic relationship among key parameters during the dimension-reduction map, a quantitative function may be constructed to reveal the correlation between the target (lower) dimensionality and other variables. To demonstrate the hypothesis, this paper first defines a formal measure function to quantify the KNN similarity for a specific vector, then extends the measure into an aggregate accuracy of the global metric spaces, and finally derives a closed-form function between the target (lower) dimensionality and other variables. We incorporate the closed-function into popular dimension-reduction methods, various distance metrics, and embedding models.
Authors: Toshihide Ubukata, Jialong Li, Kenji Tei
Abstract: Diffusion models, which leverage stochastic processes to capture complex data distributions effectively, have shown their performance as generative models, achieving notable success in image-related tasks through iterative denoising processes. Recently, diffusion models have been further applied and show their strong abilities in planning tasks, leading to a significant growth in related publications since 2023. To help researchers better understand the field and promote the development of the field, we conduct a systematic literature review of recent advancements in the application of diffusion models for planning. Specifically, this paper categorizes and discusses the current literature from the following perspectives: (i) relevant datasets and benchmarks used for evaluating diffusion modelbased planning; (ii) fundamental studies that address aspects such as sampling efficiency; (iii) skill-centric and condition-guided planning for enhancing adaptability; (iv) safety and uncertainty managing mechanism for enhancing safety and robustness; and (v) domain-specific application such as autonomous driving. Finally, given the above literature review, we further discuss the challenges and future directions in this field.
Authors: Zhonghang Li, Long Xia, Lei Shi, Yong Xu, Dawei Yin, Chao Huang
Abstract: Accurate traffic forecasting is crucial for effective urban planning and transportation management, enabling efficient resource allocation and enhanced travel experiences. However, existing models often face limitations in generalization, struggling with zero-shot prediction on unseen regions and cities, as well as diminished long-term accuracy. This is primarily due to the inherent challenges in handling the spatial and temporal heterogeneity of traffic data, coupled with the significant distribution shift across time and space. In this work, we aim to unlock new possibilities for building versatile, resilient and adaptive spatio-temporal foundation models for traffic prediction. To achieve this goal, we introduce a novel foundation model, named OpenCity, that can effectively capture and normalize the underlying spatio-temporal patterns from diverse data characteristics, facilitating zero-shot generalization across diverse urban environments. OpenCity integrates the Transformer architecture with graph neural networks to model the complex spatio-temporal dependencies in traffic data. By pre-training OpenCity on large-scale, heterogeneous traffic datasets, we enable the model to learn rich, generalizable representations that can be seamlessly applied to a wide range of traffic forecasting scenarios. Experimental results demonstrate that OpenCity exhibits exceptional zero-shot predictive performance. Moreover, OpenCity showcases promising scaling laws, suggesting the potential for developing a truly one-for-all traffic prediction solution that can adapt to new urban contexts with minimal overhead. We made our proposed OpenCity model open-source and it is available at the following link: https://github.com/HKUDS/OpenCity.
Authors: Manon Revel, Matteo Cargnelutti, Tyna Eloundou, Greg Leppert
Abstract: Reinforcement Learning from Human Feedback (RLHF) aims to align language models (LMs) with human values by training reward models (RMs) on binary preferences and using these RMs to fine-tune the base LMs. Despite its importance, the internal mechanisms of RLHF remain poorly understood. This paper introduces new metrics to evaluate the effectiveness of modeling and aligning human values, namely feature imprint, alignment resistance and alignment robustness. We categorize alignment datasets into target features (desired values) and spoiler features (undesired concepts). By regressing RM scores against these features, we quantify the extent to which RMs reward them - a metric we term feature imprint. We define alignment resistance as the proportion of the preference dataset where RMs fail to match human preferences, and we assess alignment robustness by analyzing RM responses to perturbed inputs. Our experiments, utilizing open-source components like the Anthropic/hh-rlhf preference dataset and OpenAssistant RMs, reveal significant imprints of target features and a notable sensitivity to spoiler features. We observed a 26% incidence of alignment resistance in portions of the dataset where LM-labelers disagreed with human preferences. Furthermore, we find that misalignment often arises from ambiguous entries within the alignment dataset. These findings underscore the importance of scrutinizing both RMs and alignment datasets for a deeper understanding of value alignment.
Authors: Xin Tong, Bryan Quaife
Abstract: Data-driven techniques are being increasingly applied to complement physics-based models in fire science. However, the lack of sufficiently large datasets continues to hinder the application of certain machine learning techniques. In this paper, we use simulated data to investigate the ability of neural networks to parameterize dynamics in fire science. In particular, we investigate neural networks that map five key parameters in fire spread to the first arrival time, and the corresponding inverse problem. By using simulated data, we are able to characterize the error, the required dataset size, and the convergence properties of these neural networks. For the inverse problem, we quantify the network's sensitivity in estimating each of the key parameters. The findings demonstrate the potential of machine learning in fire science, highlight the challenges associated with limited dataset sizes, and quantify the sensitivity of neural networks to estimate key parameters governing fire spread dynamics.
Authors: Jingyun Chen, Martin King, Yading Yuan
Abstract: Dose prediction plays a key role in knowledge-based planning (KBP) by automatically generating patient-specific dose distribution. Recent advances in deep learning-based dose prediction methods necessitates collaboration among data contributors for improved performance. Federated learning (FL) has emerged as a solution, enabling medical centers to jointly train deep-learning models without compromising patient data privacy. We developed the FedKBP framework to evaluate the performances of centralized, federated, and individual (i.e. separated) training of dose prediction model on the 340 plans from OpenKBP dataset. To simulate FL and individual training, we divided the data into 8 training sites. To evaluate the effect of inter-site data variation on model training, we implemented two types of case distributions: 1) Independent and identically distributed (IID), where the training and validating cases were evenly divided among the 8 sites, and 2) non-IID, where some sites have more cases than others. The results show FL consistently outperforms individual training on both model optimization speed and out-of-sample testing scores, highlighting the advantage of FL over individual training. Under IID data division, FL shows comparable performance to centralized training, underscoring FL as a promising alternative to traditional pooled-data training. Under non-IID division, larger sites outperformed smaller sites by up to 19% on testing scores, confirming the need of collaboration among data owners to achieve better prediction accuracy. Meanwhile, non-IID FL showed reduced performance as compared to IID FL, posing the need for more sophisticated FL method beyond mere model averaging to handle data variation among participating sites.
Authors: Xiaochen Wang, Jiaqi Wang, Houping Xiao, Jinghui Chen, Fenglong Ma
Abstract: Foundation models have demonstrated remarkable capabilities in handling diverse modalities and tasks, outperforming conventional artificial intelligence (AI) approaches that are highly task-specific and modality-reliant. In the medical domain, however, the development of comprehensive foundation models is constrained by limited access to diverse modalities and stringent privacy regulations. To address these constraints, this study introduces a novel knowledge injection approach, FedKIM, designed to scale the medical foundation model within a federated learning framework. FedKIM leverages lightweight local models to extract healthcare knowledge from private data and integrates this knowledge into a centralized foundation model using a designed adaptive Multitask Multimodal Mixture Of Experts (M3OE) module. This method not only preserves privacy but also enhances the model's ability to handle complex medical tasks involving multiple modalities. Our extensive experiments across twelve tasks in seven modalities demonstrate the effectiveness of FedKIM in various settings, highlighting its potential to scale medical foundation models without direct access to sensitive data.
Authors: R. I. Cukier
Abstract: Transformers suffer from the computational overhead of their quadratic dependence on the length of sequences processed. We present three methods, all adding an intermediate step between training and inference/generation, which extend the autoregressive length of transformers. All rely on a Maximum Entropy Principle (MEP) whereby entropy is maximized in the presence of suitable constraints, accounted for by use of Lagrange Multipliers. These constraint methods extend the autoregressive character from T to 2T tokens in a linear-with-T fashion. There is overhead associated with this added step, but they should still be faster than the standard methods.
Authors: Cheng Lin, Lujun Li, Dezhi Li, Jie Zou, Wenhan Luo, Wei Xue, Yike Guo
Abstract: In this paper, we introduce Nested Low-Rank Adaptation (NoRA), a novel approach to parameter-efficient fine-tuning that extends the capabilities of Low-Rank Adaptation (LoRA) techniques. Vanilla LoRA overlooks pre-trained weight inheritance and still requires fine-tuning numerous parameters. To addresses these issues, our NoRA adopts a dual-layer nested structure with Singular Value Decomposition (SVD), effectively leveraging original matrix knowledge while reducing tunable parameters. Specifically, NoRA freezes the outer LoRA weights and utilizes an inner LoRA design, providing enhanced control over model optimization. This approach allows the model to more precisely adapt to specific tasks while maintaining a compact parameter space. By freezing outer LoRA weights and using an inner LoRA design, NoRA enables precise task adaptation with a compact parameter space. Evaluations on tasks including commonsense reasoning with large language models, fine-tuning vision-language models, and subject-driven generation demonstrate NoRA's superiority over LoRA and its variants. Notably, NoRA reduces fine-tuning parameters|training-time|memory-usage by 4\%|22.5\%|20.7\% compared to LoRA on LLaMA-3 8B, while achieving 2.2\% higher performance. Code will be released upon acceptance.
Authors: Shuzhang Zhong, Ling Liang, Yuan Wang, Runsheng Wang, Ru Huang, Meng Li
Abstract: Mixture-of-Experts (MoE) models are designed to enhance the efficiency of large language models (LLMs) without proportionally increasing the computational demands. However, their deployment on edge devices still faces significant challenges due to high on-demand loading overheads from managing sparsely activated experts. This paper introduces AdapMoE, an algorithm-system co-design framework for efficient MoE inference. AdapMoE features adaptive expert gating and management to reduce the on-demand loading overheads. We observe the heterogeneity of experts loading across layers and tokens, based on which we propose a sensitivity-based strategy to adjust the number of activated experts dynamically. Meanwhile, we also integrate advanced prefetching and cache management techniques to further reduce the loading latency. Through comprehensive evaluations on various platforms, we demonstrate AdapMoE consistently outperforms existing techniques, reducing the average number of activated experts by 25% and achieving a 1.35x speedup without accuracy degradation. Code is available at: https://github.com/PKU-SEC-Lab/AdapMoE.
Authors: Yifei Yang, Runhan Shi, Zuchao Li, Shu Jiang, Bao-Liang Lu, Yang Yang, Hai Zhao
Abstract: Retrosynthesis analysis is pivotal yet challenging in drug discovery and organic chemistry. Despite the proliferation of computational tools over the past decade, AI-based systems often fall short in generalizing across diverse reaction types and exploring alternative synthetic pathways. This paper presents BatGPT-Chem, a large language model with 15 billion parameters, tailored for enhanced retrosynthesis prediction. Integrating chemical tasks via a unified framework of natural language and SMILES notation, this approach synthesizes extensive instructional data from an expansive chemical database. Employing both autoregressive and bidirectional training techniques across over one hundred million instances, BatGPT-Chem captures a broad spectrum of chemical knowledge, enabling precise prediction of reaction conditions and exhibiting strong zero-shot capabilities. Superior to existing AI methods, our model demonstrates significant advancements in generating effective strategies for complex molecules, as validated by stringent benchmark tests. BatGPT-Chem not only boosts the efficiency and creativity of retrosynthetic analysis but also establishes a new standard for computational tools in synthetic design. This development empowers chemists to adeptly address the synthesis of novel compounds, potentially expediting the innovation cycle in drug manufacturing and materials science. We release our trial platform at \url{https://www.batgpt.net/dapp/chem}.
Authors: Xiao Han, Zijian Zhang, Xiangyu Zhao, Guojiang Shen, Xiangjie Kong, Xuetao Wei, Liqiang Nie, Jieping Ye
Abstract: As urban residents demand higher travel quality, vehicle dispatch has become a critical component of online ride-hailing services. However, current vehicle dispatch systems struggle to navigate the complexities of urban traffic dynamics, including unpredictable traffic conditions, diverse driver behaviors, and fluctuating supply and demand patterns. These challenges have resulted in travel difficulties for passengers in certain areas, while many drivers in other areas are unable to secure orders, leading to a decline in the overall quality of urban transportation services. To address these issues, this paper introduces GARLIC: a framework of GPT-Augmented Reinforcement Learning with Intelligent Control for vehicle dispatching. GARLIC utilizes multiview graphs to capture hierarchical traffic states, and learns a dynamic reward function that accounts for individual driving behaviors. The framework further integrates a GPT model trained with a custom loss function to enable high-precision predictions and optimize dispatching policies in real-world scenarios. Experiments conducted on two real-world datasets demonstrate that GARLIC effectively aligns with driver behaviors while reducing the empty load rate of vehicles.
Authors: Georges Tod, Jean Bruggeman, Evert Bevernage, Pieter Moelans, Walter Eeckhout, Jean-Luc Glineur
Abstract: Train operational incidents are so far diagnosed individually and manually by train maintenance technicians. In order to assist maintenance crews in their responsiveness and task prioritization, a learning machine is developed and deployed in production to suggest diagnostics to train technicians on their phones, tablets or laptops as soon as a train incident is declared. A feedback loop allows to take into account the actual diagnose by designated train maintenance experts to refine the learning machine. By formulating the problem as a discrete set classification task, feature engineering methods are proposed to extract physically plausible sets of events from traces generated on-board railway vehicles. The latter feed an original ensemble classifier to class incidents by their potential technical cause. Finally, the resulting model is trained and validated using real operational data and deployed on a cloud platform. Future work will explore how the extracted sets of events can be used to avoid incidents by assisting human experts in the creation predictive maintenance alerts.
Authors: Sauro Succi
Abstract: It is argued that the pursuit of an ever increasing number of weights in large-scale machine learning applications, besides being energetically unsustainable, is also conducive to manipulative strategies whereby Science is easily served as a strawman for economic and financial power. If machine learning is meant to serve science ahead of vested business interests, a paradigm shift is needed: from more weights and little insight to more insight and less weights.
Authors: Xuechu Yu
Abstract: Contrastive representation learning, which aims to learnthe shared information between different views of unlabeled data by maximizing the mutual information between them, has shown its powerful competence in self-supervised learning for downstream tasks. However, recent works have demonstrated that more estimated mutual information does not guarantee better performance in different downstream tasks. Such works inspire us to conjecture that the learned representations not only maintain task-relevant information from unlabeled data but also carry task-irrelevant information which is superfluous for downstream tasks, thus leading to performance degeneration. In this paper we show that superfluous information does exist during the conventional contrastive learning framework, and further design a new objective, namely SuperInfo, to learn robust representations by a linear combination of both predictive and superfluous information. Besides, we notice that it is feasible to tune the coefficients of introduced losses to discard task-irrelevant information, while keeping partial non-shared task-relevant information according to our SuperInfo loss.We demonstrate that learning with our loss can often outperform the traditional contrastive learning approaches on image classification, object detection and instance segmentation tasks with significant improvements.
Authors: Shyam K Sateesh, Sparsh BK, Uma D
Abstract: Emotion recognition from electroencephalogram (EEG) signals is a thriving field, particularly in neuroscience and Human-Computer Interaction (HCI). This study aims to understand and improve the predictive accuracy of emotional state classification through metrics such as valence, arousal, dominance, and likeness by applying a Long Short-Term Memory (LSTM) network to analyze EEG signals. Using a popular dataset of multi-channel EEG recordings known as DEAP, we look towards leveraging LSTM networks' properties to handle temporal dependencies within EEG signal data. This allows for a more comprehensive understanding and classification of emotional parameter states. We obtain accuracies of 89.89%, 90.33%, 90.70%, and 90.54% for arousal, valence, dominance, and likeness, respectively, demonstrating significant improvements in emotion recognition model capabilities. This paper elucidates the methodology and architectural specifics of our LSTM model and provides a benchmark analysis with existing papers.
Authors: Di Fang, Yinan Zhu, Runze Fang, Cen Chen, Ziqian Zeng, Huiping Zhuang
Abstract: Continual learning enables AI models to learn new data sequentially without retraining in real-world scenarios. Most existing methods assume the training data are balanced, aiming to reduce the catastrophic forgetting problem that models tend to forget previously generated data. However, data imbalance and the mixture of new and old data in real-world scenarios lead the model to ignore categories with fewer training samples. To solve this problem, we propose an analytic imbalance rectifier algorithm (AIR), a novel online exemplar-free continual learning method with an analytic (i.e., closed-form) solution for data-imbalanced class-incremental learning (CIL) and generalized CIL scenarios in real-world continual learning. AIR introduces an analytic re-weighting module (ARM) that calculates a re-weighting factor for each class for the loss function to balance the contribution of each category to the overall loss and solve the problem of imbalanced training data. AIR uses the least squares technique to give a non-discriminatory optimal classifier and its iterative update method in continual learning. Experimental results on multiple datasets show that AIR significantly outperforms existing methods in long-tailed and generalized CIL scenarios. The source code is available at https://github.com/fang-d/AIR.
Authors: Ignavier Ng, Yujia Zheng, Xinshuai Dong, Kun Zhang
Abstract: Independent component analysis (ICA) is a fundamental statistical tool used to reveal hidden generative processes from observed data. However, traditional ICA approaches struggle with the rotational invariance inherent in Gaussian distributions, often necessitating the assumption of non-Gaussianity in the underlying sources. This may limit their applicability in broader contexts. To accommodate Gaussian sources, we develop an identifiability theory that relies on second-order statistics without imposing further preconditions on the distribution of sources, by introducing novel assumptions on the connective structure from sources to observed variables. Different from recent work that focuses on potentially restrictive connective structures, our proposed assumption of structural variability is both considerably less restrictive and provably necessary. Furthermore, we propose two estimation methods based on second-order statistics and sparsity constraint. Experimental results are provided to validate our identifiability theory and estimation methods.
Authors: Yuntao Wu, Jiayuan Guo, Goutham Gopalakrishna, Zisis Poulos
Abstract: In this paper, we present Deep-MacroFin, a comprehensive framework designed to solve partial differential equations, with a particular focus on models in continuous time economics. This framework leverages deep learning methodologies, including conventional Multi-Layer Perceptrons and the newly developed Kolmogorov-Arnold Networks. It is optimized using economic information encapsulated by Hamilton-Jacobi-Bellman equations and coupled algebraic equations. The application of neural networks holds the promise of accurately resolving high-dimensional problems with fewer computational demands and limitations compared to standard numerical methods. This versatile framework can be readily adapted for elementary differential equations, and systems of differential equations, even in cases where the solutions may exhibit discontinuities. Importantly, it offers a more straightforward and user-friendly implementation than existing libraries.
Authors: Adam D. Cobb, At{\i}l{\i}m G\"une\c{s} Baydin, Barak A. Pearlmutter, Susmit Jha
Abstract: This paper introduces a second-order hyperplane search, a novel optimization step that generalizes a second-order line search from a line to a $k$-dimensional hyperplane. This, combined with the forward-mode stochastic gradient method, yields a second-order optimization algorithm that consists of forward passes only, completely avoiding the storage overhead of backpropagation. Unlike recent work that relies on directional derivatives (or Jacobian--Vector Products, JVPs), we use hyper-dual numbers to jointly evaluate both directional derivatives and their second-order quadratic terms. As a result, we introduce forward-mode weight perturbation with Hessian information (FoMoH). We then use FoMoH to develop a novel generalization of line search by extending it to a hyperplane search. We illustrate the utility of this extension and how it might be used to overcome some of the recent challenges of optimizing machine learning models without backpropagation. Our code is open-sourced at https://github.com/SRI-CSL/fomoh.
Authors: Moshe Eliasof, Md Shahriar Rahim Siddiqui, Carola-Bibiane Sch\"onlieb, Eldad Haber
Abstract: In recent years, Graph Neural Networks (GNNs) have been utilized for various applications ranging from drug discovery to network design and social networks. In many applications, it is impossible to observe some properties of the graph directly; instead, noisy and indirect measurements of these properties are available. These scenarios are coined as Graph Inverse Problems (GRIP). In this work, we introduce a framework leveraging GNNs to solve GRIPs. The framework is based on a combination of likelihood and prior terms, which are used to find a solution that fits the data while adhering to learned prior information. Specifically, we propose to combine recent deep learning techniques that were developed for inverse problems, together with GNN architectures, to formulate and solve GRIP. We study our approach on a number of representative problems that demonstrate the effectiveness of the framework.
Authors: Max Vargas, Reilly Cannon, Andrew Engel, Anand D. Sarwate, Tony Chiang
Abstract: The construction of high-quality numerical features is critical to any quantitative data analysis. Feature engineering has been historically addressed by carefully hand-crafting data representations based on domain expertise. This work views the internal representations of modern deep neural networks (DNNs), called embeddings, as an automated form of traditional feature engineering. For trained DNNs, we show that these embeddings can reveal interpretable, high-level concepts in unstructured sample data. We use these embeddings in natural language and computer vision tasks to uncover both inherent heterogeneity in the underlying data and human-understandable explanations for it. In particular, we find empirical evidence that there is inherent separability between real data and that generated from AI models.
Authors: Yonghui Xiao, Yuxin Ding, Changwan Ryu, Petr Zadrazil, Francoise Beaufays
Abstract: Federated learning (FL) has shown promising results on training machine learning models with privacy preservation. However, for large models with over 100 million parameters, the training resource requirement becomes an obstacle for FL because common devices do not have enough memory and computation power to finish the FL tasks. Although efficient training methods have been proposed, it is still a challenge to train the large models like Conformer based ASR. This paper presents a systematic solution to train the full-size ASR models of 130M parameters with FL. To our knowledge, this is the first real-world FL application of the Conformer model, which is also the largest model ever trained with FL so far. And this is the first paper showing FL can improve the ASR model quality with a set of proposed methods to refine the quality of data and labels of clients. We demonstrate both the training efficiency and the model quality improvement in real-world experiments.
Authors: Jeremiah Birrell, Reza Ebrahimi, Rouzbeh Behnia, Jason Pacheco
Abstract: Differentially private stochastic gradient descent (DP-SGD) has been instrumental in privately training deep learning models by providing a framework to control and track the privacy loss incurred during training. At the core of this computation lies a subsampling method that uses a privacy amplification lemma to enhance the privacy guarantees provided by the additive noise. Fixed size subsampling is appealing for its constant memory usage, unlike the variable sized minibatches in Poisson subsampling. It is also of interest in addressing class imbalance and federated learning. However, the current computable guarantees for fixed-size subsampling are not tight and do not consider both add/remove and replace-one adjacency relationships. We present a new and holistic R{\'e}nyi differential privacy (RDP) accountant for DP-SGD with fixed-size subsampling without replacement (FSwoR) and with replacement (FSwR). For FSwoR we consider both add/remove and replace-one adjacency. Our FSwoR results improves on the best current computable bound by a factor of $4$. We also show for the first time that the widely-used Poisson subsampling and FSwoR with replace-one adjacency have the same privacy to leading order in the sampling probability. Accordingly, our work suggests that FSwoR is often preferable to Poisson subsampling due to constant memory usage. Our FSwR accountant includes explicit non-asymptotic upper and lower bounds and, to the authors' knowledge, is the first such analysis of fixed-size RDP with replacement for DP-SGD. We analytically and empirically compare fixed size and Poisson subsampling, and show that DP-SGD gradients in a fixed-size subsampling regime exhibit lower variance in practice in addition to memory usage benefits.
Authors: Md Fahim Anjum
Abstract: In this work, we introduce LightCNN, a minimalist Convolutional Neural Network (CNN) architecture designed for Parkinson's disease (PD) classification using EEG data. LightCNN's strength lies in its simplicity, utilizing just a single convolutional layer. Embracing Leonardo da Vinci's principle that "simplicity is the ultimate sophistication," LightCNN demonstrates that complexity is not required to achieve outstanding results. We benchmarked LightCNN against several state-of-the-art deep learning models known for their effectiveness in EEG-based PD classification. Remarkably, LightCNN outperformed all these complex architectures, with a 2.3% improvement in recall, a 4.6% increase in precision, a 0.1% edge in AUC, a 4% boost in F1-score, and a 3.3% higher accuracy compared to the closest competitor. Furthermore, LightCNN identifies known pathological brain rhythms associated with PD and effectively captures clinically relevant neurophysiological changes in EEG. Its simplicity and interpretability make it ideal for deployment in resource-constrained environments, such as mobile or embedded systems for EEG analysis. In conclusion, LightCNN represents a significant step forward in efficient EEG-based PD classification, demonstrating that a well-designed, lightweight model can achieve superior performance over more complex architectures. This work underscores the potential for minimalist models to meet the needs of modern healthcare applications, particularly where resources are limited.
Authors: Haoyang Jiang, Yongzhi Qu
Abstract: The challenge of applying learned knowledge from one domain to solve problems in another related but distinct domain, known as transfer learning, is fundamental in operator learning models that solve Partial Differential Equations (PDEs). These current models often struggle with generalization across different tasks and datasets, limiting their applicability in diverse scientific and engineering disciplines. This work presents a novel framework that enhances the transfer learning capabilities of operator learning models for solving Partial Differential Equations (PDEs) through the integration of fusion frame theory with the Proper Orthogonal Decomposition (POD)-enhanced Deep Operator Network (DeepONet). We introduce an innovative architecture that combines fusion frames with POD-DeepONet, demonstrating superior performance across various PDEs in our experimental analysis. Our framework addresses the critical challenge of transfer learning in operator learning models, paving the way for adaptable and efficient solutions across a wide range of scientific and engineering applications.
Authors: Shiyu Yuan, Carlo Lipizzi, Tian Han
Abstract: Multimodal generative models are crucial for various applications. We propose an approach that combines an expressive energy-based model (EBM) prior with Markov Chain Monte Carlo (MCMC) inference in the latent space for multimodal generation. The EBM prior acts as an informative guide, while MCMC inference, specifically through short-run Langevin dynamics, brings the posterior distribution closer to its true form. This method not only provides an expressive prior to better capture the complexity of multimodality but also improves the learning of shared latent variables for more coherent generation across modalities. Our proposed method is supported by empirical experiments, underscoring the effectiveness of our EBM prior with MCMC inference in enhancing cross-modal and joint generative tasks in multimodal contexts.
Authors: Jinxin Liu, Zao Yang
Abstract: The responses generated by Large Language Models (LLMs) can include sensitive information from individuals and organizations, leading to potential privacy leakage. This work implements Influence Functions (IFs) to trace privacy leakage back to the training data, thereby mitigating privacy concerns of Language Models (LMs). However, we notice that current IFs struggle to accurately estimate the influence of tokens with large gradient norms, potentially overestimating their influence. When tracing the most influential samples, this leads to frequently tracing back to samples with large gradient norm tokens, overshadowing the actual most influential samples even if their influences are well estimated. To address this issue, we propose Heuristically Adjusted IF (HAIF), which reduces the weight of tokens with large gradient norms, thereby significantly improving the accuracy of tracing the most influential samples. To establish easily obtained groundtruth for tracing privacy leakage, we construct two datasets, PII-E and PII-CR, representing two distinct scenarios: one with identical text in the model outputs and pre-training data, and the other where models leverage their reasoning abilities to generate text divergent from pre-training data. HAIF significantly improves tracing accuracy, enhancing it by 20.96\% to 73.71\% on the PII-E dataset and 3.21\% to 45.93\% on the PII-CR dataset, compared to the best SOTA IFs against various GPT-2 and QWen-1.5 models. HAIF also outperforms SOTA IFs on real-world pretraining data CLUECorpus2020, demonstrating strong robustness regardless prompt and response lengths.
Authors: Xinlang Yue, Yiran Liu, Fangzhou Shi, Sihong Luo, Chen Zhong, Min Lu, Zhe Xu
Abstract: Assigning orders to drivers under localized spatiotemporal context (micro-view order-dispatching) is a major task in Didi, as it influences ride-hailing service experience. Existing industrial solutions mainly follow a two-stage pattern that incorporate heuristic or learning-based algorithms with naive combinatorial methods, tackling the uncertainty of both sides' behaviors, including emerging timings, spatial relationships, and travel duration, etc. In this paper, we propose a one-stage end-to-end reinforcement learning based order-dispatching approach that solves behavior prediction and combinatorial optimization uniformly in a sequential decision-making manner. Specifically, we employ a two-layer Markov Decision Process framework to model this problem, and present \underline{D}eep \underline{D}ouble \underline{S}calable \underline{N}etwork (D2SN), an encoder-decoder structure network to generate order-driver assignments directly and stop assignments accordingly. Besides, by leveraging contextual dynamics, our approach can adapt to the behavioral patterns for better performance. Extensive experiments on Didi's real-world benchmarks justify that the proposed approach significantly outperforms competitive baselines in optimizing matching efficiency and user experience tasks. In addition, we evaluate the deployment outline and discuss the gains and experiences obtained during the deployment tests from the view of large-scale engineering implementation.
Authors: Yongbo Yu, Weizhong Yu, Feiping Nie, Xuelong Li
Abstract: The self-attention mechanism in Transformer architecture, invariant to sequence order, necessitates positional embeddings to encode temporal order in time series prediction. We argue that this reliance on positional embeddings restricts the Transformer's ability to effectively represent temporal sequences, particularly when employing longer lookback windows. To address this, we introduce an innovative approach that combines Pyramid RNN embeddings(PRE) for univariate time series with the Transformer's capability to model multivariate dependencies. PRE, utilizing pyramidal one-dimensional convolutional layers, constructs multiscale convolutional features that preserve temporal order. Additionally, RNNs, layered atop these features, learn multiscale time series representations sensitive to sequence order. This integration into Transformer models with attention mechanisms results in significant performance enhancements. We present the PRformer, a model integrating PRE with a standard Transformer encoder, demonstrating state-of-the-art performance on various real-world datasets. This performance highlights the effectiveness of our approach in leveraging longer lookback windows and underscores the critical role of robust temporal representations in maximizing Transformer's potential for prediction tasks. Code is available at this repository: \url{https://github.com/usualheart/PRformer}.
Authors: Samuel Chevalier (Dj), Duncan Starkenburg (Dj), Krishnamurthy (Dj), Dvijotham
Abstract: In the field of formal verification, Neural Networks (NNs) are typically reformulated into equivalent mathematical programs which are optimized over. To overcome the inherent non-convexity of these reformulations, convex relaxations of nonlinear activation functions are typically utilized. Common relaxations (i.e., static linear cuts) of ``S-shaped" activation functions, however, can be overly loose, slowing down the overall verification process. In this paper, we derive tuneable hyperplanes which upper and lower bound the sigmoid activation function. When tuned in the dual space, these affine bounds smoothly rotate around the nonlinear manifold of the sigmoid activation function. This approach, termed $\alpha$-sig, allows us to tractably incorporate the tightest possible, element-wise convex relaxation of the sigmoid activation function into a formal verification framework. We embed these relaxations inside of large verification tasks and compare their performance to LiRPA and $\alpha$-CROWN, a state-of-the-art verification duo.
Authors: Zhichang Xu, Zhiguo Long, Hua Meng
Abstract: Spectral clustering requires the time-consuming decomposition of the Laplacian matrix of the similarity graph, thus limiting its applicability to large datasets. To improve the efficiency of spectral clustering, a top-down approach was recently proposed, which first divides the data into several micro-clusters (granular-balls), then splits these micro-clusters when they are not "compact'', and finally uses these micro-clusters as nodes to construct a similarity graph for more efficient spectral clustering. However, this top-down approach is challenging to adapt to unevenly distributed or structurally complex data. This is because constructing micro-clusters as a rough ball struggles to capture the shape and structure of data in a local range, and the simplistic splitting rule that solely targets ``compactness'' is susceptible to noise and variations in data density and leads to micro-clusters with varying shapes, making it challenging to accurately measure the similarity between them. To resolve these issues, this paper first proposes to start from local structures to obtain micro-clusters, such that the complex structural information inside local neighborhoods is well captured by them. Moreover, by noting that Euclidean distance is more suitable for convex sets, this paper further proposes a data splitting rule that couples local density and data manifold structures, so that the similarities of the obtained micro-clusters can be easily characterized. A novel similarity measure between micro-clusters is then proposed for the final spectral clustering. A series of experiments based on synthetic and real-world datasets demonstrate that the proposed method has better adaptability to structurally complex data than granular-ball based methods.
Authors: Huifa Li, Jie Fu, Xinpeng Ling, Zhiyu Sun, Kuncan Wang, Zhili Chen
Abstract: The swift advancement of single-cell RNA sequencing (scRNA-seq) technologies enables the investigation of cellular-level tissue heterogeneity. Cell annotation significantly contributes to the extensive downstream analysis of scRNA-seq data. However, The analysis of scRNA-seq for biological inference presents challenges owing to its intricate and indeterminate data distribution, characterized by a substantial volume and a high frequency of dropout events. Furthermore, the quality of training samples varies greatly, and the performance of the popular scRNA-seq data clustering solution GNN could be harmed by two types of low-quality training nodes: 1) nodes on the boundary; 2) nodes that contribute little additional information to the graph. To address these problems, we propose a single-cell curriculum learning-based deep graph embedding clustering (scCLG). We first propose a Chebyshev graph convolutional autoencoder with multi-decoder (ChebAE) that combines three optimization objectives corresponding to three decoders, including topology reconstruction loss of cell graphs, zero-inflated negative binomial (ZINB) loss, and clustering loss, to learn cell-cell topology representation. Meanwhile, we employ a selective training strategy to train GNN based on the features and entropy of nodes and prune the difficult nodes based on the difficulty scores to keep the high-quality graph. Empirical results on a variety of gene expression datasets show that our model outperforms state-of-the-art methods.
Authors: Wall Kim
Abstract: Return-Conditioned Transformer Decision Models (RCTDM) have demonstrated the potential to enhance transformer performance in offline reinforcement learning by replacing rewards in the input sequence with returns-to-go. However, to achieve the goal of learning an optimal policy from offline datasets composed of limited suboptimal trajectories, RCTDM required alternative methods. One prominent approach, trajectory stitching, was designed to enable the network to combine multiple trajectories to find the optimal path. To implement this using only transformers without auxiliary networks, it was necessary to shorten the input sequence length to better capture the Markov property in reinforcement learnings. This, however, introduced a trade-off, as it reduced the accuracy of action inference. Our study introduces a model named Decision MetaMamba to resolve these challenges. DMM employs an input token mixer to extract patterns from short sequences and uses a State Space Model (SSM) to selectively combine information from relatively distant sequences. Inspired by Metaformer, this structure was developed by transforming Mamba's input layer into various multi-modal layers. Fortunately, with the advent of Mamba, implemented using parallel selective scanning, we achieved a high-performance sequence model capable of replacing transformers. Based on these innovations, DMM demonstrated excellent performance across various datasets in offline RL, confirming that models using SSM can improve performance by domain-specific alterations of the input layer. Additionally, it maintained its performance even in lightweight models with fewer parameters. These results suggest that decision models based on SSM can pave the way for improved outcomes in future developments.
Authors: Shengxiang Hu, Guobing Zou, Song Yang, Shiyi Lin, Bofeng Zhang, Yixin Chen
Abstract: In service-oriented architecture, accurately predicting the Quality of Service (QoS) is vital for maintaining reliability and enhancing user satisfaction. However, current methods often neglect high-order latent collaborative relationships and fail to dynamically adjust feature learning for specific user-service invocations, which are critical for precise feature extraction. Moreover, relying on RNNs to capture QoS evolution limits the ability to detect long-term trends due to challenges in managing long-range dependencies. To address these issues, we propose the Target-Prompt Online Graph Collaborative Learning (TOGCL) framework for temporal QoS prediction. It leverages a dynamic user-service invocation graph to comprehensively model historical interactions. Building on this graph, it develops a target-prompt graph attention network to extract online deep latent features of users and services at each time slice, considering implicit target-neighboring collaborative relationships and historical QoS values. Additionally, a multi-layer Transformer encoder is employed to uncover temporal feature evolution patterns, enhancing temporal QoS prediction. Extensive experiments on the WS-DREAM dataset demonstrate that TOGCL significantly outperforms state-of-the-art methods across multiple metrics, achieving improvements of up to 38.80\%. These results underscore the effectiveness of TOGCL for temporal QoS prediction.
Authors: Yuqing Zhao, Divya Saxena, Jiannong Cao, Xiaoyun Liu, Changlin Song
Abstract: In continual learning (CL), model growth enhances adaptability over new data, improving knowledge retention for more tasks. However, improper model growth can lead to severe degradation of previously learned knowledge, an issue we name as growth-induced forgetting (GIFt), especially in task-agnostic CL using entire grown model for inference. Existing works, despite adopting model growth and random initialization for better adaptability, often fail to recognize the presence of GIFt caused by improper model growth. This oversight limits comprehensive control of forgetting and hinders full utilization of model growth. We are the first in CL to identify this issue and conduct an in-depth study on root cause of GIFt, where layer expansion stands out among model growth strategies, widening layers without affecting model functionality. Yet, direct adoption of layer expansion presents challenges. It lacks data-driven control and initialization of expanded parameters to balance adaptability and knowledge retention. This paper presents a novel SparseGrow approach to overcome the issue of GIFt while enhancing adaptability over new data. SparseGrow employs data-driven sparse layer expansion to control efficient parameter usage during growth, reducing GIFt from excessive growth and functionality changes. It also combines sparse growth with on-data initialization at training late-stage to create partially 0-valued expansions that fit learned distribution, enhancing retention and adaptability. To further minimize forgetting, freezing is applied by calculating the sparse mask, allowing data-driven preservation of important parameters. Through experiments across datasets with various settings, cases and task numbers, we demonstrate the necessity of layer expansion and showcase the effectiveness of SparseGrow in overcoming GIFt, highlighting its adaptability and knowledge retention for incremental tasks.
Authors: Yan Wu, Esther Wershof, Sebastian M Schmon, Marcel Nassar, B{\l}a\.zej Osi\'nski, Ridvan Eksi, Kun Zhang, Thore Graepel
Abstract: We present a comprehensive framework for predicting the effects of perturbations in single cells, designed to standardize benchmarking in this rapidly evolving field. Our framework, PerturBench, includes a user-friendly platform, diverse datasets, metrics for fair model comparison, and detailed performance analysis. Extensive evaluations of published and baseline models reveal limitations like mode or posterior collapse, and underscore the importance of rank metrics that assess the ordering of perturbations alongside traditional measures like RMSE. Our findings show that simple models can outperform more complex approaches. This benchmarking exercise sets new standards for model evaluation, supports robust model development, and advances the potential of these models to use high-throughput and high-content genetic and chemical screens for disease target discovery.
Authors: Anand Ganesh, Babhrubahan Bose, Anand Rajagopalan
Abstract: We identify certain gaps in the literature on the approximability of stationary random variables using the Autoregressive Moving Average (ARMA) model. To quantify approximability, we propose that an ARMA model be viewed as an approximation of a stationary random variable. We map these stationary random variables to Hardy space functions, and formulate a new function approximation problem that corresponds to random variable approximation, and thus to ARMA. Based on this Hardy space formulation we identify a class of stationary processes where approximation guarantees are feasible. We also identify an idealized stationary random process for which we conjecture that a good ARMA approximation is not possible. Next, we provide a constructive proof that Pad\'e approximations do not always correspond to the best ARMA approximation. Finally, we note that the spectral methods adopted in this paper can be seen as a generalization of unit root methods for stationary processes even when an ARMA model is not defined.
Authors: Udo Schlegel, Daniel A. Keim, Tobias Sutter
Abstract: Understanding how models process and interpret time series data remains a significant challenge in deep learning to enable applicability in safety-critical areas such as healthcare. In this paper, we introduce Sequence Dreaming, a technique that adapts Activation Maximization to analyze sequential information, aiming to enhance the interpretability of neural networks operating on univariate time series. By leveraging this method, we visualize the temporal dynamics and patterns most influential in model decision-making processes. To counteract the generation of unrealistic or excessively noisy sequences, we enhance Sequence Dreaming with a range of regularization techniques, including exponential smoothing. This approach ensures the production of sequences that more accurately reflect the critical features identified by the neural network. Our approach is tested on a time series classification dataset encompassing applications in predictive maintenance. The results show that our proposed Sequence Dreaming approach demonstrates targeted activation maximization for different use cases so that either centered class or border activation maximization can be generated. The results underscore the versatility of Sequence Dreaming in uncovering salient temporal features learned by neural networks, thereby advancing model transparency and trustworthiness in decision-critical domains.
Authors: Yupeng Su, Ziyi Guan, Xiaoqun Liu, Tianlai Jin, Dongkuan Wu, Graziano Chesi, Ngai Wong, Hao Yu
Abstract: Large language models (LLMs) have grown significantly in scale, leading to a critical need for efficient model pruning techniques. Existing post-training pruning techniques primarily focus on measuring weight importance on converged dense models to determine salient weights to retain. However, they often overlook the changes in weight importance during the pruning process, which can lead to performance degradation in the pruned models. To address this issue, we present LLM-Barber (Block-Aware Rebuilder for Sparsity Mask in One-Shot), a novel one-shot pruning framework that rebuilds the sparsity mask of pruned models without any retraining or weight reconstruction. LLM-Barber incorporates block-aware error optimization across Self-Attention and MLP blocks, ensuring global performance optimization. Inspired by the recent discovery of prominent outliers in LLMs, LLM-Barber introduces an innovative pruning metric that identifies weight importance using weights multiplied by gradients. Our experiments show that LLM-Barber can efficiently prune models like LLaMA and OPT families with 7B to 13B parameters on a single A100 GPU in just 30 minutes, achieving state-of-the-art results in both perplexity and zero-shot performance across various language benchmarks. Code is available at https://github.com/YupengSu/LLM-Barber.
Authors: Udo Schlegel, Julius Rauscher, Daniel A. Keim
Abstract: We propose an interactive methodology for generating counterfactual explanations for univariate time series data in classification tasks by leveraging 2D projections and decision boundary maps to tackle interpretability challenges. Our approach aims to enhance the transparency and understanding of deep learning models' decision processes. The application simplifies the time series data analysis by enabling users to interactively manipulate projected data points, providing intuitive insights through inverse projection techniques. By abstracting user interactions with the projected data points rather than the raw time series data, our method facilitates an intuitive generation of counterfactual explanations. This approach allows for a more straightforward exploration of univariate time series data, enabling users to manipulate data points to comprehend potential outcomes of hypothetical scenarios. We validate this method using the ECG5000 benchmark dataset, demonstrating significant improvements in interpretability and user understanding of time series classification. The results indicate a promising direction for enhancing explainable AI, with potential applications in various domains requiring transparent and interpretable deep learning models. Future work will explore the scalability of this method to multivariate time series data and its integration with other interpretability techniques.
Authors: Qiao Li, Cong Wu, Jing Chen, Zijun Zhang, Kun He, Ruiying Du, Xinxin Wang, Qingchuang Zhao, Yang Liu
Abstract: Deep neural networks (DNNs) are increasingly used in critical applications such as identity authentication and autonomous driving, where robustness against adversarial attacks is crucial. These attacks can exploit minor perturbations to cause significant prediction errors, making it essential to enhance the resilience of DNNs. Traditional defense methods often rely on access to detailed model information, which raises privacy concerns, as model owners may be reluctant to share such data. In contrast, existing black-box defense methods fail to offer a universal defense against various types of adversarial attacks. To address these challenges, we introduce DUCD, a universal black-box defense method that does not require access to the target model's parameters or architecture. Our approach involves distilling the target model by querying it with data, creating a white-box surrogate while preserving data privacy. We further enhance this surrogate model using a certified defense based on randomized smoothing and optimized noise selection, enabling robust defense against a broad range of adversarial attacks. Comparative evaluations between the certified defenses of the surrogate and target models demonstrate the effectiveness of our approach. Experiments on multiple image classification datasets show that DUCD not only outperforms existing black-box defenses but also matches the accuracy of white-box defenses, all while enhancing data privacy and reducing the success rate of membership inference attacks.
Authors: Co\c{s}ku Can Horuz, Matthias Karlbauer, Timothy Praditia, Sergey Oladyshkin, Wolfgang Nowak, Sebastian Otte
Abstract: Spatiotemporal partial differential equations (PDEs) find extensive application across various scientific and engineering fields. While numerous models have emerged from both physics and machine learning (ML) communities, there is a growing trend towards integrating these approaches to develop hybrid architectures known as physics-aware machine learning models. Among these, the finite volume neural network (FINN) has emerged as a recent addition. FINN has proven to be particularly efficient in uncovering latent structures in data. In this study, we explore the capabilities of FINN in tackling the shallow-water equations, which simulates wave dynamics in coastal regions. Specifically, we investigate FINN's efficacy to reconstruct underwater topography based on these particular wave equations. Our findings reveal that FINN exhibits a remarkable capacity to infer topography solely from wave dynamics, distinguishing itself from both conventional ML and physics-aware ML models. Our results underscore the potential of FINN in advancing our understanding of spatiotemporal phenomena and enhancing parametrization capabilities in related domains.
Authors: Mirko Nardi, Lorenzo Valerio, Andrea Passarella
Abstract: Federated Learning (FL) is a pivotal approach in decentralized machine learning, especially when data privacy is crucial and direct data sharing is impractical. While FL is typically associated with supervised learning, its potential in unsupervised scenarios is underexplored. This paper introduces a novel unsupervised federated learning methodology designed to identify the complete set of categories (global K) across multiple clients within label-free, non-uniform data distributions, a process known as Federated Clustering. Our approach, Federated Cluster-Wise Refinement (FedCRef), involves clients that collaboratively train models on clusters with similar data distributions. Initially, clients with diverse local data distributions (local K) train models on their clusters to generate compressed data representations. These local models are then shared across the network, enabling clients to compare them through reconstruction error analysis, leading to the formation of federated groups.In these groups, clients collaboratively train a shared model representing each data distribution, while continuously refining their local clusters to enhance data association accuracy. This iterative process allows our system to identify all potential data distributions across the network and develop robust representation models for each. To validate our approach, we compare it with traditional centralized methods, establishing a performance baseline and showcasing the advantages of our distributed solution. We also conduct experiments on the EMNIST and KMNIST datasets, demonstrating FedCRef's ability to refine and align cluster models with actual data distributions, significantly improving data representation precision in unsupervised federated settings.
Authors: Kenji Harada, Tsuyoshi Okubo, Naoki Kawashima
Abstract: Based on the tensor tree network with the Born machine framework, we propose a general method for constructing a generative model by expressing the target distribution function as the quantum wave function amplitude represented by a tensor tree. The key idea is dynamically optimizing the tree structure that minimizes the bond mutual information. The proposed method offers enhanced performance and uncovers hidden relational structures in the target data. We illustrate potential practical applications with four examples: (i) random patterns, (ii) QMNIST hand-written digits, (iii) Bayesian networks, and (iv) the stock price fluctuation pattern in S&P500. In (i) and (ii), strongly correlated variables were concentrated near the center of the network; in (iii), the causality pattern was identified; and, in (iv), a structure corresponding to the eleven sectors emerged.
Authors: Zeyuan Ma, Jiacheng Chen, Hongshu Guo, Yue-Jiao Gong
Abstract: Recent research in Meta-Black-Box Optimization (MetaBBO) have shown that meta-trained neural networks can effectively guide the design of black-box optimizers, significantly reducing the need for expert tuning and delivering robust performance across complex problem distributions. Despite their success, a paradox remains: MetaBBO still rely on human-crafted Exploratory Landscape Analysis features to inform the meta-level agent about the low-level optimization progress. To address the gap, this paper proposes Neural Exploratory Landscape Analysis (NeurELA), a novel framework that dynamically profiles landscape features through a two-stage, attention-based neural network, executed in an entirely end-to-end fashion. NeurELA is pre-trained over a variety of MetaBBO algorithms using a multi-task neuroevolution strategy. Extensive experiments show that NeurELA achieves consistently superior performance when integrated into different and even unseen MetaBBO tasks and can be efficiently fine-tuned for further performance boost. This advancement marks a pivotal step in making MetaBBO algorithms more autonomous and broadly applicable.
Authors: Dong Geun Shin, Hye Won Chung
Abstract: Detecting out-of-distribution (OOD) samples is a critical task for reliable machine learning. However, it becomes particularly challenging when the models are trained on long-tailed datasets, as the models often struggle to distinguish tail-class in-distribution samples from OOD samples. We examine the main challenges in this problem by identifying the trade-offs between OOD detection and in-distribution (ID) classification, faced by existing methods. We then introduce our method, called \textit{Representation Norm Amplification} (RNA), which solves this challenge by decoupling the two problems. The main idea is to use the norm of the representation as a new dimension for OOD detection, and to develop a training method that generates a noticeable discrepancy in the representation norm between ID and OOD data, while not perturbing the feature learning for ID classification. Our experiments show that RNA achieves superior performance in both OOD detection and classification compared to the state-of-the-art methods, by 1.70\% and 9.46\% in FPR95 and 2.43\% and 6.87\% in classification accuracy on CIFAR10-LT and ImageNet-LT, respectively. The code for this work is available at https://github.com/dgshin21/RNA.
Authors: Lianghao Xia, Chao Huang
Abstract: The growing ubiquity of relational data structured as graphs has underscored the need for graph learning models with exceptional generalization capabilities. However, current approaches often struggle to effectively extract generalizable insights, frequently requiring extensive fine-tuning and limiting their versatility. Graph foundation models offer a transformative solution, with the potential to learn robust, generalizable representations from graph data. This enables more effective and adaptable applications across a wide spectrum of tasks and domains. In this work, we investigate a unified graph model, AnyGraph, designed to handle key challenges: i) Structure Heterogenity. Addressing distribution shift in graph structural information; ii) Feature Heterogenity. Handling diverse feature representation spaces across graph datasets; iii) Fast Adaptation. Efficiently adapting the model to new graph domains; iv) Scaling Law Emergence. Enabling the model to exhibit scaling law behavior, where its performance scales favorably with the amount of data and parameter sizes. To tackle these critical challenges, we build the AnyGraph upon a Graph Mixture-of-Experts (MoE) architecture. This approach empowers the model to effectively manage both the in-domain and cross-domain distribution shift concerning structure-level and feature-level heterogeneity. Furthermore, a lightweight graph expert routing mechanism is proposed to facilitate AnyGraph's fast adaptability to new data and domains. Our extensive experiments on diverse 38 graph datasets have demonstrated the strong zero-shot learning performance of AnyGraph across diverse graph domains with significant distribution shift. Furthermore, we have validated the model's fast adaptation ability and scaling law emergence, showcasing its versatility.
Authors: Yin Jun Phua, Katsumi Inoue
Abstract: Learning from interpretation transition (LFIT) is a framework for learning rules from observed state transitions. LFIT has been implemented in purely symbolic algorithms, but they are unable to deal with noise or generalize to unobserved transitions. Rule extraction based neural network methods suffer from overfitting, while more general implementation that categorize rules suffer from combinatorial explosion. In this paper, we introduce a technique to leverage variable permutation invariance inherent in symbolic domains. Our technique ensures that the permutation and the naming of the variables would not affect the results. We demonstrate the effectiveness and the scalability of this method with various experiments. Our code is publicly available at https://github.com/phuayj/delta-lfit-2
Authors: Padmanaba Srinivasan, William Knottenbelt
Abstract: Model-based reinforcement learning (MBRL) algorithms learn a dynamics model from collected data and apply it to generate synthetic trajectories to enable faster learning. This is an especially promising paradigm in offline reinforcement learning (RL) where data may be limited in quantity, in addition to being deficient in coverage and quality. Practical approaches to offline MBRL usually rely on ensembles of dynamics models to prevent exploitation of any individual model and to extract uncertainty estimates that penalize values in states far from the dataset support. Uncertainty estimates from ensembles can vary greatly in scale, making it challenging to generalize hyperparameters well across even similar tasks. In this paper, we present Morse Model-based offline RL (MoMo), which extends the anti-exploration paradigm found in offline model-free RL to the model-based space. We develop model-free and model-based variants of MoMo and show how the model-free version can be extended to detect and deal with out-of-distribution (OOD) states using explicit uncertainty estimation without the need for large ensembles. MoMo performs offline MBRL using an anti-exploration bonus to counteract value overestimation in combination with a policy constraint, as well as a truncation function to terminate synthetic rollouts that are excessively OOD. Experimentally, we find that both model-free and model-based MoMo perform well, and the latter outperforms prior model-based and model-free baselines on the majority of D4RL datasets tested.
Authors: Yifu Han, Francois P. Hamon, Louis J. Durlofsky
Abstract: Deep learning surrogate modeling shows great promise for subsurface flow applications, but the training demands can be substantial. Here we introduce a new surrogate modeling framework to predict CO2 saturation, pressure and surface displacement for use in the history matching of carbon storage operations. Rather than train using a large number of expensive coupled flow-geomechanics simulation runs, training here involves a large number of inexpensive flow-only simulations combined with a much smaller number of coupled runs. The flow-only runs use an effective rock compressibility, which is shown to provide accurate predictions for saturation and pressure for our system. A recurrent residual U-Net architecture is applied for the saturation and pressure surrogate models, while a new residual U-Net model is introduced to predict surface displacement. The surface displacement surrogate accepts, as inputs, geomodel quantities along with saturation and pressure surrogate predictions. Median relative error for a diverse test set is less than 4% for all variables. The surrogate models are incorporated into a hierarchical Markov chain Monte Carlo history matching workflow. Surrogate error is included using a new treatment involving the full model error covariance matrix. A high degree of prior uncertainty, with geomodels characterized by uncertain geological scenario parameters (metaparameters) and associated realizations, is considered. History matching results for a synthetic true model are generated using in-situ monitoring-well data only, surface displacement data only, and both data types. The enhanced uncertainty reduction achieved with both data types is quantified. Posterior saturation and surface displacement fields are shown to correspond well with the true solution.
Authors: Imran Nasim, Joa\~o Lucas de Sousa Almeida
Abstract: Scientific Machine Learning is transforming traditional engineering industries by enhancing the efficiency of existing technologies and accelerating innovation, particularly in modeling chemical reactions. Despite recent advancements, the issue of solving stiff chemically reacting problems within computational fluid dynamics remains a significant issue. In this study we propose a novel approach utilizing a multi-layer-perceptron mixer architecture (MLP-Mixer) to model the time-series of stiff chemical kinetics. We evaluate this method using the ROBER system, a benchmark model in chemical kinetics, to compare its performance with traditional numerical techniques. This study provides insight into the industrial utility of the recently developed MLP-Mixer architecture to model chemical kinetics and provides motivation for such neural architecture to be used as a base for time-series foundation models.
Authors: D Alqattan, R Sun, H Liang, G Nicosia, V Snasel, R Ranjan, V Ojha
Abstract: Hierarchical federated learning (HFL) is a promising distributed deep learning model training paradigm, but it has crucial security concerns arising from adversarial attacks. This research investigates and assesses the security of HFL using a novel methodology by focusing on its resilience against adversarial attacks inference-time and training-time. Through a series of extensive experiments across diverse datasets and attack scenarios, we uncover that HFL demonstrates robustness against untargeted training-time attacks due to its hierarchical structure. However, targeted attacks, particularly backdoor attacks, exploit this architecture, especially when malicious clients are positioned in the overlapping coverage areas of edge servers. Consequently, HFL shows a dual nature in its resilience, showcasing its capability to recover from attacks thanks to its hierarchical aggregation that strengthens its suitability for adversarial training, thereby reinforcing its resistance against inference-time attacks. These insights underscore the necessity for balanced security strategies in HFL systems, leveraging their inherent strengths while effectively mitigating vulnerabilities.
Authors: Md Fahim Sikder, Resmi Ramachandranpillai, Daniel de Leng, Fredrik Heintz
Abstract: Data Fairness is a crucial topic due to the recent wide usage of AI powered applications. Most of the real-world data is filled with human or machine biases and when those data are being used to train AI models, there is a chance that the model will reflect the bias in the training data. Existing bias-mitigating generative methods based on GANs, Diffusion models need in-processing fairness objectives and fail to consider computational overhead while choosing computationally-heavy architectures, which may lead to high computational demands, instability and poor optimization performance. To mitigate this issue, in this work, we present a fair data generation technique based on knowledge distillation, where we use a small architecture to distill the fair representation in the latent space. The idea of fair latent space distillation enables more flexible and stable training of Fair Generative Models (FGMs). We first learn a syntax-agnostic (for any data type) fair representation of the data, followed by distillation in the latent space into a smaller model. After distillation, we use the distilled fair latent space to generate high-fidelity fair synthetic data. While distilling, we employ quality loss (for fair distillation) and utility loss (for data utility) to ensure that the fairness and data utility characteristics remain in the distilled latent space. Our approaches show a 5%, 5% and 10% rise in performance in fairness, synthetic sample quality and data utility, respectively, than the state-of-the-art fair generative model.
Authors: Hossein Mirzaei, Mojtaba Nafez, Mohammad Jafari, Mohammad Bagher Soltani, Mohammad Azizmalayeri, Jafar Habibi, Mohammad Sabokrou, Mohammad Hossein Rohban
Abstract: Novelty detection is a critical task for deploying machine learning models in the open world. A crucial property of novelty detection methods is universality, which can be interpreted as generalization across various distributions of training or test data. More precisely, for novelty detection, distribution shifts may occur in the training set or the test set. Shifts in the training set refer to cases where we train a novelty detector on a new dataset and expect strong transferability. Conversely, distribution shifts in the test set indicate the methods' performance when the trained model encounters a shifted test sample. We experimentally show that existing methods falter in maintaining universality, which stems from their rigid inductive biases. Motivated by this, we aim for more generalized techniques that have more adaptable inductive biases. In this context, we leverage the fact that contrastive learning provides an efficient framework to easily switch and adapt to new inductive biases through the proper choice of augmentations in forming the negative pairs. We propose a novel probabilistic auto-negative pair generation method AutoAugOOD, along with contrastive learning, to yield a universal novelty detector method. Our experiments demonstrate the superiority of our method under different distribution shifts in various image benchmark datasets. Notably, our method emerges universality in the lens of adaptability to different setups of novelty detection, including one-class, unlabeled multi-class, and labeled multi-class settings. Code: https://github.com/mojtaba-nafez/UNODE
Authors: Jan Lewen, Max Pargmann, Mehdi Cherti, Jenia Jitsev, Robert Pitz-Paal, Daniel Maldonado Quinto
Abstract: Concentrating Solar Power (CSP) plants play a crucial role in the global transition towards sustainable energy. A key factor in ensuring the safe and efficient operation of CSP plants is the distribution of concentrated flux density on the receiver. However, the non-ideal flux density generated by individual heliostats can undermine the safety and efficiency of the power plant. The flux density from each heliostat is influenced by its precise surface profile, which includes factors such as canting and mirror errors. Accurately measuring these surface profiles for a large number of heliostats in operation is a formidable challenge. Consequently, control systems often rely on the assumption of ideal surface conditions, which compromises both safety and operational efficiency. In this study, we introduce inverse Deep Learning Ray Tracing (iDLR), an innovative method designed to predict heliostat surfaces based solely on target images obtained during heliostat calibration. Our simulation-based investigation demonstrates that sufficient information regarding the heliostat surface is retained in the flux density distribution of a single heliostat, enabling deep learning models to accurately predict the underlying surface with deflectometry-like precision for the majority of heliostats. Additionally, we assess the limitations of this method, particularly in relation to surface accuracy and resultant flux density predictions. Furthermore, we are presenting a new comprehensive heliostat model using Non-Uniform Rational B-Spline (NURBS) that has the potential to become the new State of the Art for heliostat surface parameterization. Our findings reveal that iDLR has significant potential to enhance CSP plant operations, potentially increasing the overall efficiency and energy output of the power plants.
Authors: Johannes von Oswald, Seijin Kobayashi, Yassir Akram, Angelika Steger
Abstract: Randomization is a powerful tool that endows algorithms with remarkable properties. For instance, randomized algorithms excel in adversarial settings, often surpassing the worst-case performance of deterministic algorithms with large margins. Furthermore, their success probability can be amplified by simple strategies such as repetition and majority voting. In this paper, we enhance deep neural networks, in particular transformer models, with randomization. We demonstrate for the first time that randomized algorithms can be instilled in transformers through learning, in a purely data- and objective-driven manner. First, we analyze known adversarial objectives for which randomized algorithms offer a distinct advantage over deterministic ones. We then show that common optimization techniques, such as gradient descent or evolutionary strategies, can effectively learn transformer parameters that make use of the randomness provided to the model. To illustrate the broad applicability of randomization in empowering neural networks, we study three conceptual tasks: associative recall, graph coloring, and agents that explore grid worlds. In addition to demonstrating increased robustness against oblivious adversaries through learned randomization, our experiments reveal remarkable performance improvements due to the inherently random nature of the neural networks' computation and predictions.
Authors: Jianxiang Zhou, Erdong Liu, Wei Chen, Siru Zhong, Yuxuan Liang
Abstract: Traffic forecasting has emerged as a crucial research area in the development of smart cities. Although various neural networks with intricate architectures have been developed to address this problem, they still face two key challenges: i) Recent advancements in network designs for modeling spatio-temporal correlations are starting to see diminishing returns in performance enhancements. ii) Additionally, most models do not account for the spatio-temporal heterogeneity inherent in traffic data, i.e., traffic distribution varies significantly across different regions and traffic flow patterns fluctuate across various time slots. To tackle these challenges, we introduce the Spatio-Temporal Graph Transformer (STGormer), which effectively integrates attribute and structure information inherent in traffic data for learning spatio-temporal correlations, and a mixture-of-experts module for capturing heterogeneity along spaital and temporal axes. Specifically, we design two straightforward yet effective spatial encoding methods based on the graph structure and integrate time position encoding into the vanilla transformer to capture spatio-temporal traffic patterns. Additionally, a mixture-of-experts enhanced feedforward neural network (FNN) module adaptively assigns suitable expert layers to distinct patterns via a spatio-temporal gating network, further improving overall prediction accuracy. Experiments on five real-world datasets demonstrate that STGormer achieves state-of-the-art performance.
Authors: Janina Enrica Sch\"utte, Martin Eigel
Abstract: A neural network architecture is presented that exploits the multilevel properties of high-dimensional parameter-dependent partial differential equations, enabling an efficient approximation of parameter-to-solution maps, rivaling best-in-class methods such as low-rank tensor regression in terms of accuracy and complexity. The neural network is trained with data on adaptively refined finite element meshes, thus reducing data complexity significantly. Error control is achieved by using a reliable finite element a posteriori error estimator, which is also provided as input to the neural network. The proposed U-Net architecture with CNN layers mimics a classical finite element multigrid algorithm. It can be shown that the CNN efficiently approximates all operations required by the solver, including the evaluation of the residual-based error estimator. In the CNN, a culling mask set-up according to the local corrections due to refinement on each mesh level reduces the overall complexity, allowing the network optimization with localized fine-scale finite element data. A complete convergence and complexity analysis is carried out for the adaptive multilevel scheme, which differs in several aspects from previous non-adaptive multilevel CNN. Moreover, numerical experiments with common benchmark problems from Uncertainty Quantification illustrate the practical performance of the architecture.
Authors: Haozhe Ma, Zhengding Luo, Thanh Vinh Vo, Kuankuan Sima, Tze-Yun Leong
Abstract: Reward shaping is effective in addressing the sparse-reward challenge in reinforcement learning by providing immediate feedback through auxiliary informative rewards. Based on the reward shaping strategy, we propose a novel multi-task reinforcement learning framework, that integrates a centralized reward agent (CRA) and multiple distributed policy agents. The CRA functions as a knowledge pool, which aims to distill knowledge from various tasks and distribute it to individual policy agents to improve learning efficiency. Specifically, the shaped rewards serve as a straightforward metric to encode knowledge. This framework not only enhances knowledge sharing across established tasks but also adapts to new tasks by transferring valuable reward signals. We validate the proposed method on both discrete and continuous domains, demonstrating its robustness in multi-task sparse-reward settings and its effective transferability to unseen tasks.
Authors: Ryan Swope, Amol Khanna, Philip Doldo, Saptarshi Roy, Edward Raff
Abstract: Data scientists often seek to identify the most important features in high-dimensional datasets. This can be done through $L_1$-regularized regression, but this can become inefficient for very high-dimensional datasets. Additionally, high-dimensional regression can leak information about individual datapoints in a dataset. In this paper, we empirically evaluate the established baseline method for feature selection with differential privacy, the two-stage selection technique, and show that it is not stable under sparsity. This makes it perform poorly on real-world datasets, so we consider a different approach to private feature selection. We employ a correlations-based order statistic to choose important features from a dataset and privatize them to ensure that the results do not leak information about individual datapoints. We find that our method significantly outperforms the established baseline for private feature selection on many datasets.
Authors: R\'obert Csord\'as, Christopher Potts, Christopher D. Manning, Atticus Geiger
Abstract: The Linear Representation Hypothesis (LRH) states that neural networks learn to encode concepts as directions in activation space, and a strong version of the LRH states that models learn only such encodings. In this paper, we present a counterexample to this strong LRH: when trained to repeat an input token sequence, gated recurrent neural networks (RNNs) learn to represent the token at each position with a particular order of magnitude, rather than a direction. These representations have layered features that are impossible to locate in distinct linear subspaces. To show this, we train interventions to predict and manipulate tokens by learning the scaling factor corresponding to each sequence position. These interventions indicate that the smallest RNNs find only this magnitude-based solution, while larger RNNs have linear representations. These findings strongly indicate that interpretability research should not be confined by the LRH.
Authors: Rui Luo, Zhixin Zhou
Abstract: Uncertainty quantification is essential in decision-making, especially when joint distributions of random variables are involved. While conformal prediction provides distribution-free prediction sets with valid coverage guarantees, it traditionally focuses on single predictions. This paper introduces novel conformal prediction methods for estimating the sum or average of unknown labels over specific index sets. We develop conformal prediction intervals for single target to the prediction interval for sum of multiple targets. Under permutation invariant assumptions, we prove the validity of our proposed method. We also apply our algorithms on class average estimation and path cost prediction tasks, and we show that our method outperforms existing conformalized approaches as well as non-conformal approaches.
Authors: Yuval Ben-Hur, Yuval Cassuto
Abstract: As machine-learning models grow in size, their implementation requirements cannot be met by a single computer system. This observation motivates distributed settings, in which intermediate computations are performed across a network of processing units, while the central node only aggregates their outputs. However, distributing inference tasks across low-precision or faulty edge devices, operating over a network of noisy communication channels, gives rise to serious reliability challenges. We study the problem of an ensemble of devices, implementing regression algorithms, that communicate through additive noisy channels in order to collaboratively perform a joint regression task. We define the problem formally, and develop methods for optimizing the aggregation coefficients for the parameters of the noise in the channels, which can potentially be correlated. Our results apply to the leading state-of-the-art ensemble regression methods: bagging and gradient boosting. We demonstrate the effectiveness of our algorithms on both synthetic and real-world datasets.
Authors: Xiaodong Yang, Xiaoting Li, Huiyuan Chen, Yiwei Cai
Abstract: Recent studies show that well-devised perturbations on graph structures or node features can mislead trained Graph Neural Network (GNN) models. However, these methods often overlook practical assumptions, over-rely on heuristics, or separate vital attack components. In response, we present GAIM, an integrated adversarial attack method conducted on a node feature basis while considering the strict black-box setting. Specifically, we define an adversarial influence function to theoretically assess the adversarial impact of node perturbations, thereby reframing the GNN attack problem into the adversarial influence maximization problem. In our approach, we unify the selection of the target node and the construction of feature perturbations into a single optimization problem, ensuring a unique and consistent feature perturbation for each target node. We leverage a surrogate model to transform this problem into a solvable linear programming task, streamlining the optimization process. Moreover, we extend our method to accommodate label-oriented attacks, broadening its applicability. Thorough evaluations on five benchmark datasets across three popular models underscore the effectiveness of our method in both untargeted and label-oriented targeted attacks. Through comprehensive analysis and ablation studies, we demonstrate the practical value and efficacy inherent to our design choices.
Authors: Dona Arabi, Jafar Bakhshaliyev, Ayse Coskuner, Kiran Madhusudhanan, Kami Serdar Uckardes
Abstract: Data augmentation is important for improving machine learning model performance when faced with limited real-world data. In time series forecasting (TSF), where accurate predictions are crucial in fields like finance, healthcare, and manufacturing, traditional augmentation methods for classification tasks are insufficient to maintain temporal coherence. This research introduces two augmentation approaches using the discrete wavelet transform (DWT) to adjust frequency elements while preserving temporal dependencies in time series data. Our methods, Wavelet Masking (WaveMask) and Wavelet Mixing (WaveMix), are evaluated against established baselines across various forecasting horizons. To the best of our knowledge, this is the first study to conduct extensive experiments on multivariate time series using Discrete Wavelet Transform as an augmentation technique. Experimental results demonstrate that our techniques achieve competitive results with previous methods. We also explore cold-start forecasting using downsampled training datasets, comparing outcomes to baseline methods.
Authors: Vitus Benson, Ana Bastos, Christian Reimers, Alexander J. Winkler, Fanny Yang, Markus Reichstein
Abstract: Accurately describing the distribution of CO$_2$ in the atmosphere with atmospheric tracer transport models is essential for greenhouse gas monitoring and verification support systems to aid implementation of international climate agreements. Large deep neural networks are poised to revolutionize weather prediction, which requires 3D modeling of the atmosphere. While similar in this regard, atmospheric transport modeling is subject to new challenges. Both, stable predictions for longer time horizons and mass conservation throughout need to be achieved, while IO plays a larger role compared to computational costs. In this study we explore four different deep neural networks (UNet, GraphCast, Spherical Fourier Neural Operator and SwinTransformer) which have proven as state-of-the-art in weather prediction to assess their usefulness for atmospheric tracer transport modeling. For this, we assemble the CarbonBench dataset, a systematic benchmark tailored for machine learning emulators of Eulerian atmospheric transport. Through architectural adjustments, we decouple the performance of our emulators from the distribution shift caused by a steady rise in atmospheric CO$_2$. More specifically, we center CO$_2$ input fields to zero mean and then use an explicit flux scheme and a mass fixer to assure mass balance. This design enables stable and mass conserving transport for over 6 months with all four neural network architectures. In our study, the SwinTransformer displays particularly strong emulation skill (90-day $R^2 > 0.99$), with physically plausible emulation even for forward runs of multiple years. This work paves the way forward towards high resolution forward and inverse modeling of inert trace gases with neural networks.
Authors: Micha{\l} Bortkiewicz, W{\l}adek Pa{\l}ucki, Vivek Myers, Tadeusz Dziarmaga, Tomasz Arczewski, {\L}ukasz Kuci\'nski, Benjamin Eysenbach
Abstract: Self-supervision has the potential to transform reinforcement learning (RL), paralleling the breakthroughs it has enabled in other areas of machine learning. While self-supervised learning in other domains aims to find patterns in a fixed dataset, self-supervised goal-conditioned reinforcement learning (GCRL) agents discover new behaviors by learning from the goals achieved during unstructured interaction with the environment. However, these methods have failed to see similar success, both due to a lack of data from slow environments as well as a lack of stable algorithms. We take a step toward addressing both of these issues by releasing a high-performance codebase and benchmark JaxGCRL for self-supervised GCRL, enabling researchers to train agents for millions of environment steps in minutes on a single GPU. The key to this performance is a combination of GPU-accelerated environments and a stable, batched version of the contrastive reinforcement learning algorithm, based on an infoNCE objective, that effectively makes use of this increased data throughput. With this approach, we provide a foundation for future research in self-supervised GCRL, enabling researchers to quickly iterate on new ideas and evaluate them in a diverse set of challenging environments. Website + Code: https://github.com/MichalBortkiewicz/JaxGCRL
Authors: Hendrik Alsmeier, Anton Savchenko, Rolf Findeisen
Abstract: The expansion in automation of increasingly fast applications and low-power edge devices poses a particular challenge for optimization based control algorithms, like model predictive control. Our proposed machine-learning supported approach addresses this by utilizing a feed-forward neural network to reduce the computation load of the online-optimization. We propose approximating part of the problem horizon, while maintaining safety guarantees -- constraint satisfaction -- via the remaining optimization part of the controller. The approach is validated in simulation, demonstrating an improvement in computational efficiency, while maintaining guarantees and near-optimal performance. The proposed MPC scheme can be applied to a wide range of applications, including those requiring a rapid control response, such as robotics and embedded applications with limited computational resources.
Authors: Kamal Acharya, Alvaro Velasquez, Houbing Herbert Song
Abstract: This survey paper delves into the emerging and critical area of symbolic knowledge distillation in Large Language Models (LLMs). As LLMs like Generative Pre-trained Transformer-3 (GPT-3) and Bidirectional Encoder Representations from Transformers (BERT) continue to expand in scale and complexity, the challenge of effectively harnessing their extensive knowledge becomes paramount. This survey concentrates on the process of distilling the intricate, often implicit knowledge contained within these models into a more symbolic, explicit form. This transformation is crucial for enhancing the interpretability, efficiency, and applicability of LLMs. We categorize the existing research based on methodologies and applications, focusing on how symbolic knowledge distillation can be used to improve the transparency and functionality of smaller, more efficient Artificial Intelligence (AI) models. The survey discusses the core challenges, including maintaining the depth of knowledge in a comprehensible format, and explores the various approaches and techniques that have been developed in this field. We identify gaps in current research and potential opportunities for future advancements. This survey aims to provide a comprehensive overview of symbolic knowledge distillation in LLMs, spotlighting its significance in the progression towards more accessible and efficient AI systems.
Authors: Ziyu Wang, Anil Kanduri, Seyed Amir Hossein Aqajari, Salar Jafarlou, Sanaz R. Mousavi, Pasi Liljeberg, Shaista Malik, Amir M. Rahmani
Abstract: While ECG data is crucial for diagnosing and monitoring heart conditions, it also contains unique biometric information that poses significant privacy risks. Existing ECG re-identification studies rely on exhaustive analysis of numerous deep learning features, confining to ad-hoc explainability towards clinicians decision making. In this work, we delve into explainability of ECG re-identification risks using transparent machine learning models. We use SHapley Additive exPlanations (SHAP) analysis to identify and explain the key features contributing to re-identification risks. We conduct an empirical analysis of identity re-identification risks using ECG data from five diverse real-world datasets, encompassing 223 participants. By employing transparent machine learning models, we reveal the diversity among different ECG features in contributing towards re-identification of individuals with an accuracy of 0.76 for gender, 0.67 for age group, and 0.82 for participant ID re-identification. Our approach provides valuable insights for clinical experts and guides the development of effective privacy-preserving mechanisms. Further, our findings emphasize the necessity for robust privacy measures in real-world health applications and offer detailed, actionable insights for enhancing data anonymization techniques.
Authors: Yun Xiao, Yimeng Zhang, Xiaopeng Peng, Shuzheng Han, Xia Zheng, Dingyi Fang, Xiaojiang Chen
Abstract: Electroencephalography (EEG) provides reliable indications of human cognition and mental states. Accurate emotion recognition from EEG remains challenging due to signal variations among individuals and across measurement sessions. To address these challenges, we introduce a multi-source dynamic contrastive domain adaptation method (MS-DCDA), which models coarse-grained inter-domain and fine-grained intra-class adaptations through a multi-branch contrastive neural network and contrastive sub-domain discrepancy learning. Our model leverages domain knowledge from each individual source and a complementary source ensemble and uses dynamically weighted learning to achieve an optimal tradeoff between domain transferability and discriminability. The proposed MS-DCDA model was evaluated using the SEED and SEED-IV datasets, achieving respectively the highest mean accuracies of $90.84\%$ and $78.49\%$ in cross-subject experiments as well as $95.82\%$ and $82.25\%$ in cross-session experiments. Our model outperforms several alternative domain adaptation methods in recognition accuracy, inter-class margin, and intra-class compactness. Our study also suggests greater emotional sensitivity in the frontal and parietal brain lobes, providing insights for mental health interventions, personalized medicine, and development of preventive strategies.
Authors: Neha R. Gupta, Jessica Hullman, Hari Subramonyam
Abstract: Research in Responsible AI has developed a range of principles and practices to ensure that machine learning systems are used in a manner that is ethical and aligned with human values. However, a critical yet often neglected aspect of ethical ML is the ethical implications that appear when designing evaluations of ML systems. For instance, teams may have to balance a trade-off between highly informative tests to ensure downstream product safety, with potential fairness harms inherent to the implemented testing procedures. We conceptualize ethics-related concerns in standard ML evaluation techniques. Specifically, we present a utility framework, characterizing the key trade-off in ethical evaluation as balancing information gain against potential ethical harms. The framework is then a tool for characterizing challenges teams face, and systematically disentangling competing considerations that teams seek to balance. Differentiating between different types of issues encountered in evaluation allows us to highlight best practices from analogous domains, such as clinical trials and automotive crash testing, which navigate these issues in ways that can offer inspiration to improve evaluation processes in ML. Our analysis underscores the critical need for development teams to deliberately assess and manage ethical complexities that arise during the evaluation of ML systems, and for the industry to move towards designing institutional policies to support ethical evaluations.
Authors: Rishit Dagli, Atsuhiro Hibi, Rahul G. Krishnan, Pascal N. Tyrrell
Abstract: Current methods for performing 3D reconstruction and novel view synthesis (NVS) in ultrasound imaging data often face severe artifacts when training NeRF-based approaches. The artifacts produced by current approaches differ from NeRF floaters in general scenes because of the unique nature of ultrasound capture. Furthermore, existing models fail to produce reasonable 3D reconstructions when ultrasound data is captured or obtained casually in uncontrolled environments, which is common in clinical settings. Consequently, existing reconstruction and NVS methods struggle to handle ultrasound motion, fail to capture intricate details, and cannot model transparent and reflective surfaces. In this work, we introduced NeRF-US, which incorporates 3D-geometry guidance for border probability and scattering density into NeRF training, while also utilizing ultrasound-specific rendering over traditional volume rendering. These 3D priors are learned through a diffusion model. Through experiments conducted on our new "Ultrasound in the Wild" dataset, we observed accurate, clinically plausible, artifact-free reconstructions.
Authors: Arjhun Swaminathan, Mete Akg\"un
Abstract: Quantum computing promises to revolutionize machine learning, offering significant efficiency gains in tasks such as clustering and distance estimation. Additionally, it provides enhanced security through fundamental principles like the measurement postulate and the no-cloning theorem, enabling secure protocols such as quantum teleportation and quantum key distribution. While advancements in secure quantum machine learning are notable, the development of secure and distributed quantum analogues of kernel-based machine learning techniques remains underexplored. In this work, we present a novel approach for securely computing common kernels, including polynomial, radial basis function (RBF), and Laplacian kernels, when data is distributed, using quantum feature maps. Our methodology introduces a robust framework that leverages quantum teleportation to ensure secure and distributed kernel learning. The proposed architecture is validated using IBM's Qiskit Aer Simulator on various public datasets.
Authors: P Modi
Abstract: With the rise in the number of IoT devices and its users, security in IoT has become a big concern to ensure the protection from harmful security attacks. In the recent years, different variants of DDoS attacks have been on the rise in IoT devices. Failure to detect DDoS attacks at the right time can result in financial and reputational loss for victim organizations. These attacks conducted with IoT devices can cause a significant downtime of applications running on the Internet. Although researchers have developed and utilized specialized models using artificial intelligence techniques, these models do not provide the best accuracy as there is always a scope of improvement until 100% accuracy is attained. We propose a hybrid feature selection algorithm that selects only the most useful features and passes those features into an XGBoost model, the results of which are explained using feature importances. Our model attains an accuracy of 99.993% on the CIC IDS 2017 dataset and a recall of 97.64 % on the CIC IoT 2023 dataset. Overall, this research would help researchers and implementers in the field of detecting IoT DDoS attacks by providing a more accurate and comparable model.
Authors: Florentina Voboril, Vaidyanathan Peruvemba Ramaswamy, Stefan Szeider
Abstract: This paper presents the novel method StreamLLM for generating streamliners in constraint programming using Large Language Models (LLMs). Streamliners are constraints that narrow the search space, enhancing the speed and feasibility of solving complex problems. Traditionally, streamliners were crafted manually or generated through systematically combined atomic constraints with high-effort offline testing. Our approach uses LLMs to propose effective streamliners. Our system StreamLLM generates streamlines for problems specified in the MiniZinc constraint programming language and integrates feedback to the LLM with quick empirical tests. Our rigorous empirical evaluation involving ten problems with several hundreds of test instances shows robust results that are highly encouraging, showcasing the transforming power of LLMs in the domain of constraint programming.
Authors: Diego Alvarez-Estevez
Abstract: Quantum-enhanced machine learning is a rapidly evolving field that aims to leverage the unique properties of quantum mechanics to enhance classical machine learning. However, the practical applicability of these methods remains an open question, particularly in the context of real-world datasets and the limitations of current quantum hardware. This work performs a benchmark study of Quantum Kernel Estimation (QKE) and Quantum Kernel Training (QKT) with a focus on classification tasks. Through a series of experiments, the versatility and generalization capabilities of two quantum feature mappings, namely ZZFeatureMap and CovariantFeatureMap, are analyzed in this context. Remarkably, these feature maps have been proposed in the literature under the conjecture of possible near-term quantum advantage and have shown promising performance in ad-hoc datasets. This study explores both artificial and established reference datasets and incorporates classical machine learning methods, specifically Support Vector Machines (SVMs) and logistic regression, as baseline comparisons. Experimental results indicate that quantum methods exhibit varying performance across different datasets. While they outperform classical methods in ad-hoc datasets, they frequently encounter difficulties in generalizing to unseen test data when dealing with reference classical datasets, even if achieving high classification accuracy on the training data. It is suggested that the choice of the feature mapping and the optimization of kernel parameters through QKT are critical for maximizing the effectiveness of quantum methods.
Authors: Athul Raimon, Shubha Masti, Shyam K Sateesh, Siyani Vengatagiri, Bhaskarjyoti Das
Abstract: This survey overviews various meta-learning approaches used in audio and speech processing scenarios. Meta-learning is used where model performance needs to be maximized with minimum annotated samples, making it suitable for low-sample audio processing. Although the field has made some significant contributions, audio meta-learning still lacks the presence of comprehensive survey papers. We present a systematic review of meta-learning methodologies in audio processing. This includes audio-specific discussions on data augmentation, feature extraction, preprocessing techniques, meta-learners, task selection strategies and also presents important datasets in audio, together with crucial real-world use cases. Through this extensive review, we aim to provide valuable insights and identify future research directions in the intersection of meta-learning and audio processing.
Authors: Eric Price, Zhiyang Xun
Abstract: In streaming PCA, we see a stream of vectors $x_1, \dotsc, x_n \in \mathbb{R}^d$ and want to estimate the top eigenvector of their covariance matrix. This is easier if the spectral ratio $R = \lambda_1 / \lambda_2$ is large. We ask: how large does $R$ need to be to solve streaming PCA in $\widetilde{O}(d)$ space? Existing algorithms require $R = \widetilde{\Omega}(d)$. We show: (1) For all mergeable summaries, $R = \widetilde{\Omega}(\sqrt{d})$ is necessary. (2) In the insertion-only model, a variant of Oja's algorithm gets $o(1)$ error for $R = O(\log n \log d)$. (3) No algorithm with $o(d^2)$ space gets $o(1)$ error for $R = O(1)$. Our analysis is the first application of Oja's algorithm to adversarial streams. It is also the first algorithm for adversarial streaming PCA that is designed for a spectral, rather than Frobenius, bound on the tail; and the bound it needs is exponentially better than is possible by adapting a Frobenius guarantee.
Authors: Nathalia Castellanos, Dhruv Desai, Sebastian Frank, Stefano Pasquali, Dhagash Mehta
Abstract: Peer analysis is a critical component of investment management, often relying on expert-provided categorization systems. These systems' consistency is questioned when they do not align with cohorts from unsupervised clustering algorithms optimized for various metrics. We investigate whether unsupervised clustering can reproduce ground truth classes in a labeled dataset, showing that success depends on feature selection and the chosen distance metric. Using toy datasets and fund categorization as real-world examples we demonstrate that accurately reproducing ground truth classes is challenging. We also highlight the limitations of standard clustering evaluation metrics in identifying the optimal number of clusters relative to the ground truth classes. We then show that if appropriate features are available in the dataset, and a proper distance metric is known (e.g., using a supervised Random Forest-based distance metric learning method), then an unsupervised clustering can indeed reproduce the ground truth classes as distinct clusters.
Authors: Xiaofeng Lin, Xuezhou Zhang
Abstract: In this paper, we study reinforcement learning in Markov Decision Processes with Probabilistic Reward Machines (PRMs), a form of non-Markovian reward commonly found in robotics tasks. We design an algorithm for PRMs that achieves a regret bound of $\widetilde{O}(\sqrt{HOAT} + H^2O^2A^{3/2} + H\sqrt{T})$, where $H$ is the time horizon, $O$ is the number of observations, $A$ is the number of actions, and $T$ is the number of time-steps. This result improves over the best-known bound, $\widetilde{O}(H\sqrt{OAT})$ of \citet{pmlr-v206-bourel23a} for MDPs with Deterministic Reward Machines (DRMs), a special case of PRMs. When $T \geq H^3O^3A^2$ and $OA \geq H$, our regret bound leads to a regret of $\widetilde{O}(\sqrt{HOAT})$, which matches the established lower bound of $\Omega(\sqrt{HOAT})$ for MDPs with DRMs up to a logarithmic factor. To the best of our knowledge, this is the first efficient algorithm for PRMs. Additionally, we present a new simulation lemma for non-Markovian rewards, which enables reward-free exploration for any non-Markovian reward given access to an approximate planner. Complementing our theoretical findings, we show through extensive experiment evaluations that our algorithm indeed outperforms prior methods in various PRM environments.
Authors: Inkit Padhi, Karthikeyan Natesan Ramamurthy, Prasanna Sattigeri, Manish Nagireddy, Pierre Dognin, Kush R. Varshney
Abstract: Aligning large language models (LLMs) to value systems has emerged as a significant area of research within the fields of AI and NLP. Currently, this alignment process relies on the availability of high-quality supervised and preference data, which can be both time-consuming and expensive to curate or annotate. In this paper, we introduce a systematic end-to-end methodology for aligning LLMs to the implicit and explicit values represented in unstructured text data. Our proposed approach leverages the use of scalable synthetic data generation techniques to effectively align the model to the values present in the unstructured data. Through two distinct use-cases, we demonstrate the efficiency of our methodology on the Mistral-7B-Instruct model. Our approach credibly aligns LLMs to the values embedded within documents, and shows improved performance against other approaches, as quantified through the use of automatic metrics and win rates.
Authors: Moumita Bhattacharya, Vito Ostuni, Sudarshan Lamkhede
Abstract: Search and recommendation systems are essential in many services, and they are often developed separately, leading to complex maintenance and technical debt. In this paper, we present a unified deep learning model that efficiently handles key aspects of both tasks.
Authors: Hyun Jin Park, Dhruuv Agarwal, Neng Chen, Rentao Sun, Kurt Partridge, Justin Chen, Harry Zhang, Pai Zhu, Jacob Bartel, Kyle Kastner, Gary Wang, Andrew Rosenberg, Quan Wang
Abstract: The keyword spotting (KWS) problem requires large amounts of real speech training data to achieve high accuracy across diverse populations. Utilizing large amounts of text-to-speech (TTS) synthesized data can reduce the cost and time associated with KWS development. However, TTS data may contain artifacts not present in real speech, which the KWS model can exploit (overfit), leading to degraded accuracy on real speech. To address this issue, we propose applying an adversarial training method to prevent the KWS model from learning TTS-specific features when trained on large amounts of TTS data. Experimental results demonstrate that KWS model accuracy on real speech data can be improved by up to 12% when adversarial loss is used in addition to the original KWS loss. Surprisingly, we also observed that the adversarial setup improves accuracy by up to 8%, even when trained solely on TTS and real negative speech data, without any real positive examples.
Authors: Guanchen Li, Xiandong Zhao, Lian Liu, Zeping Li, Dong Li, Lu Tian, Jie He, Ashish Sirasao, Emad Barsoum
Abstract: Pre-trained language models (PLMs) are engineered to be robust in contextual understanding and exhibit outstanding performance in various natural language processing tasks. However, their considerable size incurs significant computational and storage costs. Modern pruning strategies employ one-shot techniques to compress PLMs without the need for retraining on task-specific or otherwise general data; however, these approaches often lead to an indispensable reduction in performance. In this paper, we propose SDS, a Sparse-Dense-Sparse pruning framework to enhance the performance of the pruned PLMs from a weight distribution optimization perspective. We outline the pruning process in three steps. Initially, we prune less critical connections in the model using conventional one-shot pruning methods. Next, we reconstruct a dense model featuring a pruning-friendly weight distribution by reactivating pruned connections with sparse regularization. Finally, we perform a second pruning round, yielding a superior pruned model compared to the initial pruning. Experimental results demonstrate that SDS outperforms the state-of-the-art pruning techniques SparseGPT and Wanda under an identical sparsity configuration. For instance, SDS reduces perplexity by 9.13 on Raw-Wikitext2 and improves accuracy by an average of 2.05% across multiple zero-shot benchmarks for OPT-125M with 2:4 sparsity.
Authors: Xuan Xie, Jiayang Song, Yuheng Huang, Da Song, Fuyuan Zhang, Felix Juefei-Xu, Lei Ma
Abstract: Large Language Models (LLMs) are widely used in many different domains, but because of their limited interpretability, there are questions about how trustworthy they are in various perspectives, e.g., truthfulness and toxicity. Recent research has started developing testing methods for LLMs, aiming to uncover untrustworthy issues, i.e., defects, before deployment. However, systematic and formalized testing criteria are lacking, which hinders a comprehensive assessment of the extent and adequacy of testing exploration. To mitigate this threat, we propose a set of multi-level testing criteria, LeCov, for LLMs. The criteria consider three crucial LLM internal components, i.e., the attention mechanism, feed-forward neurons, and uncertainty, and contain nine types of testing criteria in total. We apply the criteria in two scenarios: test prioritization and coverage-guided testing. The experiment evaluation, on three models and four datasets, demonstrates the usefulness and effectiveness of LeCov.
Authors: Xinhui Rong, Victor Solo
Abstract: Despite the widespread occurrence of classification problems and the increasing collection of point process data across many disciplines, study of error probability for point process classification only emerged very recently. Here, we consider classification of renewal processes. We obtain asymptotic expressions for the Bhattacharyya bound on misclassification error probabilities for heavy-tailed renewal processes.
Authors: Thanh Thi Nguyen, Campbell Wilson, Janis Dalins
Abstract: Assessing the forensic value of hand images involves the use of unique features and patterns present in an individual's hand. The human hand has distinct characteristics, such as the pattern of veins, fingerprints, and the geometry of the hand itself. This paper investigates the use of vision transformers (ViTs) for classification of hand images. We use explainability tools to explore the internal representations of ViTs and assess their impact on the model outputs. Utilizing the internal understanding of ViTs, we introduce distillation methods that allow a student model to adaptively extract knowledge from a teacher model while learning on data of a different domain to prevent catastrophic forgetting. Two publicly available hand image datasets are used to conduct a series of experiments to evaluate performance of the ViTs and our proposed adaptive distillation methods. The experimental results demonstrate that ViT models significantly outperform traditional machine learning methods and the internal states of ViTs are useful for explaining the model outputs in the classification task. By averting catastrophic forgetting, our distillation methods achieve excellent performance on data from both source and target domains, particularly when these two domains exhibit significant dissimilarity. The proposed approaches therefore can be developed and implemented effectively for real-world applications such as access control, identity verification, and authentication systems.
Authors: Yun Qu, Boyuan Wang, Jianzhun Shao, Yuhang Jiang, Chen Chen, Zhenbin Ye, Lin Liu, Junfeng Yang, Lin Lai, Hongyang Qin, Minwen Deng, Juchao Zhuo, Deheng Ye, Qiang Fu, Wei Yang, Guang Yang, Lanxiao Huang, Xiangyang Ji
Abstract: The advancement of Offline Reinforcement Learning (RL) and Offline Multi-Agent Reinforcement Learning (MARL) critically depends on the availability of high-quality, pre-collected offline datasets that represent real-world complexities and practical applications. However, existing datasets often fall short in their simplicity and lack of realism. To address this gap, we propose Hokoff, a comprehensive set of pre-collected datasets that covers both offline RL and offline MARL, accompanied by a robust framework, to facilitate further research. This data is derived from Honor of Kings, a recognized Multiplayer Online Battle Arena (MOBA) game known for its intricate nature, closely resembling real-life situations. Utilizing this framework, we benchmark a variety of offline RL and offline MARL algorithms. We also introduce a novel baseline algorithm tailored for the inherent hierarchical action space of the game. We reveal the incompetency of current offline RL approaches in handling task complexity, generalization and multi-task learning.
Authors: Zijian Dong, Yilei Wu, Zijiao Chen, Yichi Zhang, Yueming Jin, Juan Helen Zhou
Abstract: We introduce Scaffold Prompt Tuning (ScaPT), a novel prompt-based framework for adapting large-scale functional magnetic resonance imaging (fMRI) pre-trained models to downstream tasks, with high parameter efficiency and improved performance compared to fine-tuning and baselines for prompt tuning. The full fine-tuning updates all pre-trained parameters, which may distort the learned feature space and lead to overfitting with limited training data which is common in fMRI fields. In contrast, we design a hierarchical prompt structure that transfers the knowledge learned from high-resource tasks to low-resource ones. This structure, equipped with a Deeply-conditioned Input-Prompt (DIP) mapping module, allows for efficient adaptation by updating only 2% of the trainable parameters. The framework enhances semantic interpretability through attention mechanisms between inputs and prompts, and it clusters prompts in the latent space in alignment with prior knowledge. Experiments on public resting state fMRI datasets reveal ScaPT outperforms fine-tuning and multitask-based prompt tuning in neurodegenerative diseases diagnosis/prognosis and personality trait prediction, even with fewer than 20 participants. It highlights ScaPT's efficiency in adapting pre-trained fMRI models to low-resource tasks.
Authors: Ritwik Mishra, Sreeram Vennam, Rajiv Ratn Shah, Ponnurangam Kumaraguru
Abstract: Most existing Question Answering Datasets (QuADs) primarily focus on factoid-based short-context Question Answering (QA) in high-resource languages. However, the scope of such datasets for low-resource languages remains limited, with only a few works centered on factoid-based QuADs and none on non-factoid QuADs. Therefore, this work presents MuNfQuAD, a multilingual QuAD with non-factoid questions. It utilizes interrogative sub-headings from BBC news articles as questions and the corresponding paragraphs as silver answers. The dataset comprises over 370K QA pairs across 38 languages, encompassing several low-resource languages, and stands as the largest multilingual QA dataset to date. Based on the manual annotations of 790 QA-pairs from MuNfQuAD (golden set), we observe that 98\% of questions can be answered using their corresponding silver answer. Our fine-tuned Answer Paragraph Selection (APS) model outperforms the baselines. The APS model attained an accuracy of 80\% and 72\%, as well as a macro F1 of 72\% and 66\%, on the MuNfQuAD testset and the golden set, respectively. Furthermore, the APS model effectively generalizes certain a language within the golden set, even after being fine-tuned on silver labels.
Authors: Yuting Liu, Jinghao Zhang, Yizhou Dang, Yuliang Liang, Qiang Liu, Guibing Guo, Jianzhe Zhao, Xingwei Wang
Abstract: Involving collaborative information in Large Language Models (LLMs) is a promising technique for adapting LLMs for recommendation. Existing methods achieve this by concatenating collaborative features with text tokens into a unified sequence input and then fine-tuning to align these features with LLM's input space. Although effective, in this work, we identify two limitations when adapting LLMs to recommendation tasks, which hinder the integration of general knowledge and collaborative information, resulting in sub-optimal recommendation performance. (1) Fine-tuning LLM with recommendation data can undermine its inherent world knowledge and fundamental competencies, which are crucial for interpreting and inferring recommendation text. (2) Incorporating collaborative features into textual prompts disrupts the semantics of the original prompts, preventing LLM from generating appropriate outputs. In this paper, we propose a new paradigm, CoRA (an acronym for Collaborative LoRA), with a collaborative weights generator. Rather than input space alignment, this method aligns collaborative information with LLM's parameter space, representing them as incremental weights to update LLM's output. This way, LLM perceives collaborative information without altering its general knowledge and text inference capabilities. Specifically, we employ a collaborative filtering model to extract user and item embeddings, converting them into collaborative weights with low-rank properties through the collaborative weights generator. We then merge the collaborative weights into LLM's weights, enabling LLM to perceive the collaborative signals and generate personalized recommendations without fine-tuning or extra collaborative tokens in prompts. Extensive experiments confirm that CoRA effectively integrates collaborative information into LLM, enhancing recommendation performance.
Authors: Dat Thanh Nguyen, Daniel Zieger, Marc Stamminger, Andre Kaup
Abstract: Recent advancements in point cloud compression have primarily emphasized geometry compression while comparatively fewer efforts have been dedicated to attribute compression. This study introduces an end-to-end learned dynamic lossy attribute coding approach, utilizing an efficient high-dimensional convolution to capture extensive inter-point dependencies. This enables the efficient projection of attribute features into latent variables. Subsequently, we employ a context model that leverage previous latent space in conjunction with an auto-regressive context model for encoding the latent tensor into a bitstream. Evaluation of our method on widely utilized point cloud datasets from the MPEG and Microsoft demonstrates its superior performance compared to the core attribute compression module Region-Adaptive Hierarchical Transform method from MPEG Geometry Point Cloud Compression with 38.1% Bjontegaard Delta-rate saving in average while ensuring a low-complexity encoding/decoding.
Authors: An Wang, Xingwu Sun, Ruobing Xie, Shuaipeng Li, Jiaqi Zhu, Zhen Yang, Pinxue Zhao, J. N. Han, Zhanhui Kang, Di Wang, Naoaki Okazaki, Cheng-zhong Xu
Abstract: Mixture of Experts (MoE) offers remarkable performance and computational efficiency by selectively activating subsets of model parameters. Traditionally, MoE models use homogeneous experts, each with identical capacity. However, varying complexity in input data necessitates experts with diverse capabilities, while homogeneous MoE hinders effective expert specialization and efficient parameter utilization. In this study, we propose a novel Heterogeneous Mixture of Experts (HMoE), where experts differ in size and thus possess diverse capacities. This heterogeneity allows for more specialized experts to handle varying token complexities more effectively. To address the imbalance in expert activation, we propose a novel training objective that encourages the frequent activation of smaller experts, enhancing computational efficiency and parameter utilization. Extensive experiments demonstrate that HMoE achieves lower loss with fewer activated parameters and outperforms conventional homogeneous MoE models on various pre-training evaluation benchmarks. Codes will be released upon acceptance.
Authors: Hongbang Yuan, Zhuoran Jin, Pengfei Cao, Yubo Chen, Kang Liu, Jun Zhao
Abstract: LLM have achieved success in many fields but still troubled by problematic content in the training corpora. LLM unlearning aims at reducing their influence and avoid undesirable behaviours. However, existing unlearning methods remain vulnerable to adversarial queries and the unlearned knowledge resurfaces after the manually designed attack queries. As part of a red-team effort to proactively assess the vulnerabilities of unlearned models, we design Dynamic Unlearning Attack (DUA), a dynamic and automated framework to attack these models and evaluate their robustness. It optimizes adversarial suffixes to reintroduce the unlearned knowledge in various scenarios. We find that unlearned knowledge can be recovered in $55.2\%$ of the questions, even without revealing the unlearned model's parameters. In response to this vulnerability, we propose Latent Adversarial Unlearning (LAU), a universal framework that effectively enhances the robustness of the unlearned process. It formulates the unlearning process as a min-max optimization problem and resolves it through two stages: an attack stage, where perturbation vectors are trained and added to the latent space of LLMs to recover the unlearned knowledge, and a defense stage, where previously trained perturbation vectors are used to enhance unlearned model's robustness. With our LAU framework, we obtain two robust unlearning methods, AdvGA and AdvNPO. We conduct extensive experiments across multiple unlearning benchmarks and various models, and demonstrate that they improve the unlearning effectiveness by over $53.5\%$, cause only less than a $11.6\%$ reduction in neighboring knowledge, and have almost no impact on the model's general capabilities.
Authors: Bei Ouyang, Shengyuan Ye, Liekang Zeng, Tianyi Qian, Jingyi Li, Xu Chen
Abstract: Large language models (LLMs) have unlocked a plethora of powerful applications at the network edge, such as intelligent personal assistants. Data privacy and security concerns have prompted a shift towards edge-based fine-tuning of personal LLMs, away from cloud reliance. However, this raises issues of computational intensity and resource scarcity, hindering training efficiency and feasibility. While current studies investigate parameter-efficient fine-tuning (PEFT) techniques to mitigate resource constraints, our analysis indicates that these techniques are not sufficiently resource-efficient for edge devices. To tackle these challenges, we propose Pluto and Charon (PAC), a time and memory efficient collaborative edge AI framework for personal LLMs fine-tuning. PAC breaks the resource wall of personal LLMs fine-tuning with a sophisticated algorithm-system co-design. (1) Algorithmically, PAC implements a personal LLMs fine-tuning technique that is efficient in terms of parameters, time, and memory. It utilizes Parallel Adapters to circumvent the need for a full backward pass through the LLM backbone. Additionally, an activation cache mechanism further streamlining the process by negating the necessity for repeated forward passes across multiple epochs. (2) Systematically, PAC leverages edge devices in close proximity, pooling them as a collective resource for in-situ personal LLMs fine-tuning, utilizing a hybrid data and pipeline parallelism to orchestrate distributed training. The use of the activation cache eliminates the need for forward pass through the LLM backbone,enabling exclusive fine-tuning of the Parallel Adapters using data parallelism. Extensive evaluation based on prototype implementation demonstrates that PAC remarkably outperforms state-of-the-art approaches, achieving up to 8.64x end-to-end speedup and up to 88.16% reduction in memory footprint.
Authors: Karl El Hajal, Ajinkya Kulkarni, Enno Hermann, Mathew Magimai. -Doss
Abstract: While recent zero-shot multispeaker text-to-speech (TTS) models achieve impressive results, they typically rely on extensive transcribed speech datasets from numerous speakers and intricate training pipelines. Meanwhile, self-supervised learning (SSL) speech features have emerged as effective intermediate representations for TTS. It was also observed that SSL features from different speakers that are linearly close share phonetic information while maintaining individual speaker identity, which enables straight-forward and robust voice cloning. In this study, we introduce SSL-TTS, a lightweight and efficient zero-shot TTS framework trained on transcribed speech from a single speaker. SSL-TTS leverages SSL features and retrieval methods for simple and robust zero-shot multi-speaker synthesis. Objective and subjective evaluations show that our approach achieves performance comparable to state-of-the-art models that require significantly larger training datasets. The low training data requirements mean that SSL-TTS is well suited for the development of multi-speaker TTS systems for low-resource domains and languages. We also introduce an interpolation parameter which enables fine control over the output speech by blending voices. Demo samples are available at https://idiap.github.io/ssl-tts
Authors: Hans Aoyang Zhou, Dominik Wolfschl\"ager, Constantinos Florides, Jonas Werheid, Hannes Behnen, Jan-Henrick Woltersmann, Tiago C. Pinto, Marco Kemmerling, Anas Abdelrazeq, Robert H. Schmitt
Abstract: Machine vision enhances automation, quality control, and operational efficiency in industrial applications by enabling machines to interpret and act on visual data. While traditional computer vision algorithms and approaches remain widely utilized, machine learning has become pivotal in current research activities. In particular, generative \gls*{AI} demonstrates promising potential by improving pattern recognition capabilities, through data augmentation, increasing image resolution, and identifying anomalies for quality control. However, the application of generative \gls*{AI} in machine vision is still in its early stages due to challenges in data diversity, computational requirements, and the necessity for robust validation methods. A comprehensive literature review is essential to understand the current state of generative \gls*{AI} in industrial machine vision, focusing on recent advancements, applications, and research trends. Thus, a literature review based on the PRISMA guidelines was conducted, analyzing over 1,200 papers on generative \gls*{AI} in industrial machine vision. Our findings reveal various patterns in current research, with the primary use of generative \gls*{AI} being data augmentation, for machine vision tasks such as classification and object detection. Furthermore, we gather a collection of application challenges together with data requirements to enable a successful application of generative \gls*{AI} in industrial machine vision. This overview aims to provide researchers with insights into the different areas and applications within current research, highlighting significant advancements and identifying opportunities for future work.
Authors: Binta Sow, Bilal Faye, Hanane Azzag, Mustapha Lebbah
Abstract: Object detection in computer vision traditionally involves identifying objects in images. By integrating textual descriptions, we enhance this process, providing better context and accuracy. The MDETR model significantly advances this by combining image and text data for more versatile object detection and classification. However, MDETR's complexity and high computational demands hinder its practical use. In this paper, we introduce Lightweight MDETR (LightMDETR), an optimized MDETR variant designed for improved computational efficiency while maintaining robust multimodal capabilities. Our approach involves freezing the MDETR backbone and training a sole component, the Deep Fusion Encoder (DFE), to represent image and text modalities. A learnable context vector enables the DFE to switch between these modalities. Evaluation on datasets like RefCOCO, RefCOCO+, and RefCOCOg demonstrates that LightMDETR achieves superior precision and accuracy.
Authors: Yin-Jyun Luo, Kin Wai Cheuk, Woosung Choi, Toshimitsu Uesaka, Keisuke Toyama, Koichi Saito, Chieh-Hsin Lai, Yuhta Takida, Wei-Hsiang Liao, Simon Dixon, Yuki Mitsufuji
Abstract: Existing work on pitch and timbre disentanglement has been mostly focused on single-instrument music audio, excluding the cases where multiple instruments are presented. To fill the gap, we propose DisMix, a generative framework in which the pitch and timbre representations act as modular building blocks for constructing the melody and instrument of a source, and the collection of which forms a set of per-instrument latent representations underlying the observed mixture. By manipulating the representations, our model samples mixtures with novel combinations of pitch and timbre of the constituent instruments. We can jointly learn the disentangled pitch-timbre representations and a latent diffusion transformer that reconstructs the mixture conditioned on the set of source-level representations. We evaluate the model using both a simple dataset of isolated chords and a realistic four-part chorales in the style of J.S. Bach, identify the key components for the success of disentanglement, and demonstrate the application of mixture transformation based on source-level attribute manipulation.
Authors: Shivani Ranjan, Ayush Tripathi, Harshal Shende, Robin Badal, Amit Kumar, Pramod Yadav, Deepak Joshi, Lalan Kumar
Abstract: Dementia is a neurological syndrome marked by cognitive decline. Alzheimer's disease (AD) and Frontotemporal dementia (FTD) are the common forms of dementia, each with distinct progression patterns. EEG, a non-invasive tool for recording brain activity, has shown potential in distinguishing AD from FTD and mild cognitive impairment (MCI). Previous studies have utilized various EEG features, such as subband power and connectivity patterns to differentiate these conditions. However, artifacts in EEG signals can obscure crucial information, necessitating advanced signal processing techniques. This study aims to develop a deep learning-based classification system for dementia by analyzing scout time-series signals from deep brain regions, specifically the hippocampus, amygdala, and thalamus. The study utilizes scout time series extracted via the standardized low-resolution brain electromagnetic tomography (sLORETA) technique. The time series is converted to image representations using continuous wavelet transform (CWT) and fed as input to deep learning models. Two high-density EEG datasets are utilized to check for the efficacy of the proposed method: the online BrainLat dataset (comprising AD, FTD, and healthy controls (HC)) and the in-house IITD-AIIA dataset (including subjects with AD, MCI, and HC). Different classification strategies and classifier combinations have been utilized for the accurate mapping of classes on both datasets. The best results were achieved by using a product of probabilities from classifiers for left and right subcortical regions in conjunction with the DenseNet model architecture. It yields accuracies of 94.17$\%$ and 77.72$\%$ on the BrainLat and IITD-AIIA datasets, respectively. This highlights the potential of this approach for early and accurate differentiation of neurodegenerative disorders.
Authors: Kathrin Se{\ss}ler, Yao Rong, Emek G\"ozl\"ukl\"u, Enkelejda Kasneci
Abstract: The use of Large Language Models (LLMs) in mathematical reasoning has become a cornerstone of related research, demonstrating the intelligence of these models and enabling potential practical applications through their advanced performance, such as in educational settings. Despite the variety of datasets and in-context learning algorithms designed to improve the ability of LLMs to automate mathematical problem solving, the lack of comprehensive benchmarking across different datasets makes it complicated to select an appropriate model for specific tasks. In this project, we present a benchmark that fairly compares seven state-of-the-art in-context learning algorithms for mathematical problem solving across five widely used mathematical datasets on four powerful foundation models. Furthermore, we explore the trade-off between efficiency and performance, highlighting the practical applications of LLMs for mathematical reasoning. Our results indicate that larger foundation models like GPT-4o and LLaMA 3-70B can solve mathematical reasoning independently from the concrete prompting strategy, while for smaller models the in-context learning approach significantly influences the performance. Moreover, the optimal prompt depends on the chosen foundation model. We open-source our benchmark code to support the integration of additional models in future research.
Authors: Chiara Stuardi, Claudio Gheller, Franco Vazza, Andrea Botteon
Abstract: The forthcoming generation of radio telescope arrays promises significant advancements in sensitivity and resolution, enabling the identification and characterization of many new faint and diffuse radio sources. Conventional manual cataloging methodologies are anticipated to be insufficient to exploit the capabilities of new radio surveys. Radio interferometric images of diffuse sources present a challenge for image segmentation tasks due to noise, artifacts, and embedded radio sources. In response to these challenges, we introduce Radio U-Net, a fully convolutional neural network based on the U-Net architecture. Radio U-Net is designed to detect faint and extended sources in radio surveys, such as radio halos, relics, and cosmic web filaments. Radio U-Net was trained on synthetic radio observations built upon cosmological simulations and then tested on a sample of galaxy clusters, where the detection of cluster diffuse radio sources relied on customized data reduction and visual inspection of LOFAR Two Metre Sky Survey (LoTSS) data. The 83% of clusters exhibiting diffuse radio emission were accurately identified, and the segmentation successfully recovered the morphology of the sources even in low-quality images. In a test sample comprising 246 galaxy clusters, we achieved a 73% accuracy rate in distinguishing between clusters with and without diffuse radio emission. Our results establish the applicability of Radio U-Net to extensive radio survey datasets, probing its efficiency on cutting-edge high-performance computing systems. This approach represents an advancement in optimizing the exploitation of forthcoming large radio surveys for scientific exploration.
Authors: Ashley Klein, Edward Raff, Elisabeth Seamon, Lily Foley, Timothy Bussert
Abstract: An obstetric goal for a laboring mother is to achieve a vaginal delivery as it reduces the risks inherent in major abdominal surgery (i.e., a Cesarean section). Various medical interventions may be used by a physician to increase the likelihood of this occurring while minimizing maternal and fetal morbidity. However, patients with prelabor rupture of membranes (PROM) have only two commonly used options for cervical ripening, Pitocin and misoprostol. Little research exists on the benefits/risks for these two key drugs for PROM patients. A major limitation with most induction-of-labor related research is the inability to account for differences in \textit{Bishop scores} that are commonly used in obstetrical practice to determine the next induction agent offered to the patient. This creates a confounding factor, which biases the results, but has not been realized in the literature. In this work, we use a Bayesian model of the relationships between the relevant factors, informed by expert physicians, to separate the confounding variable from its actual impact. In doing so, we provide strong evidence that pitocin and buccal misoprostol are equally effective and safe; thus, physicians have more choice in clinical care than previously realized. This is particularly important for developing countries where neither medication may be readily available, and prior guidelines may create an artificial barrier to needed medication.
Authors: Hanjun Choi, Hyunsung Kim, Minho Lee, Chang-Jo Kim, Jinsung Yoon, Sang-Ki Ko
Abstract: Many spatiotemporal domains handle multi-agent trajectory data, but in real-world scenarios, collected trajectory data are often partially missing due to various reasons. While existing approaches demonstrate good performance in trajectory imputation, they face challenges in capturing the complex dynamics and interactions between agents due to a lack of physical constraints that govern realistic trajectories, leading to suboptimal results. To address this issue, the paper proposes a Derivative-Based Hybrid Prediction (DBHP) framework that can effectively impute multiple agents' missing trajectories. First, a neural network equipped with Set Transformers produces a naive prediction of missing trajectories while satisfying the permutation-equivariance in terms of the order of input agents. Then, the framework makes alternative predictions leveraging velocity and acceleration information and combines all the predictions with properly determined weights to provide final imputed trajectories. In this way, our proposed framework not only accurately predicts position, velocity, and acceleration values but also enforces the physical relationship between them, eventually improving both the accuracy and naturalness of the predicted trajectories. Accordingly, the experiment results about imputing player trajectories in team sports show that our framework significantly outperforms existing imputation baselines.
Authors: Zhongliang Guo, Lei Fang, Jingyu Lin, Yifei Qian, Shuai Zhao, Zeyu Wang, Junhao Dong, Cunjian Chen, Ognjen Arandjelovi\'c, Chun Pong Lau
Abstract: Recent advancements in generative AI, particularly Latent Diffusion Models (LDMs), have revolutionized image synthesis and manipulation. However, these generative techniques raises concerns about data misappropriation and intellectual property infringement. Adversarial attacks on machine learning models have been extensively studied, and a well-established body of research has extended these techniques as a benign metric to prevent the underlying misuse of generative AI. Current approaches to safeguarding images from manipulation by LDMs are limited by their reliance on model-specific knowledge and their inability to significantly degrade semantic quality of generated images. In response to these shortcomings, we propose the Posterior Collapse Attack (PCA) based on the observation that VAEs suffer from posterior collapse during training. Our method minimizes dependence on the white-box information of target models to get rid of the implicit reliance on model-specific knowledge. By accessing merely a small amount of LDM parameters, in specific merely the VAE encoder of LDMs, our method causes a substantial semantic collapse in generation quality, particularly in perceptual consistency, and demonstrates strong transferability across various model architectures. Experimental results show that PCA achieves superior perturbation effects on image generation of LDMs with lower runtime and VRAM. Our method outperforms existing techniques, offering a more robust and generalizable solution that is helpful in alleviating the socio-technical challenges posed by the rapidly evolving landscape of generative AI.
Authors: Zijian Zhao, Tingwei Chen, Zhijie Cai, Hang Li, Xiaoyang Li, Qimei Chen, Guangxu Zhu
Abstract: In recent years, Wi-Fi sensing has garnered significant attention due to its numerous benefits, such as privacy protection, low cost, and penetration ability. Extensive research has been conducted in this field, focusing on areas such as gesture recognition, people identification, and fall detection. However, many data-driven methods encounter challenges related to domain shift, where the model fails to perform well in environments different from the training data. One major factor contributing to this issue is the limited availability of Wi-Fi sensing datasets, which makes models learn excessive irrelevant information and over-fit to the training set. Unfortunately, collecting large-scale Wi-Fi sensing datasets across diverse scenarios is a challenging task. To address this problem, we propose CrossFi, a siamese network-based approach that excels in both in-domain scenario and cross-domain scenario, including few-shot, zero-shot scenarios, and even works in few-shot new-class scenario where testing set contains new categories. The core component of CrossFi is a sample-similarity calculation network called CSi-Net, which improves the structure of the siamese network by using an attention mechanism to capture similarity information, instead of simply calculating the distance or cosine similarity. Based on it, we develop an extra Weight-Net that can generate a template for each class, so that our CrossFi can work in different scenarios. Experimental results demonstrate that our CrossFi achieves state-of-the-art performance across various scenarios. In gesture recognition task, our CrossFi achieves an accuracy of 98.17% in in-domain scenario, 91.72% in one-shot cross-domain scenario, 64.81% in zero-shot cross-domain scenario, and 84.75% in one-shot new-class scenario. To facilitate future research, we will release the code for our model upon publication.
Authors: Nikolaos Pippas, Cagatay Turkay, Elliot A. Ludvig
Abstract: Reinforcement Learning (RL) has experienced significant advancement over the past decade, prompting a growing interest in applications within finance. This survey critically evaluates 167 publications, exploring diverse RL applications and frameworks in finance. Financial markets, marked by their complexity, multi-agent nature, information asymmetry, and inherent randomness, serve as an intriguing test-bed for RL. Traditional finance offers certain solutions, and RL advances these with a more dynamic approach, incorporating machine learning methods, including transfer learning, meta-learning, and multi-agent solutions. This survey dissects key RL components through the lens of Quantitative Finance. We uncover emerging themes, propose areas for future research, and critique the strengths and weaknesses of existing methods.
Authors: Vladislav Li, Georgios Tsoumplekas, Ilias Siniosoglou, Vasileios Argyriou, Anastasios Lytos, Eleftherios Fountoukidis, Panagiotis Sarigiannidis
Abstract: Current methods for low- and few-shot object detection have primarily focused on enhancing model performance for detecting objects. One common approach to achieve this is by combining model finetuning with data augmentation strategies. However, little attention has been given to the energy efficiency of these approaches in data-scarce regimes. This paper seeks to conduct a comprehensive empirical study that examines both model performance and energy efficiency of custom data augmentations and automated data augmentation selection strategies when combined with a lightweight object detector. The methods are evaluated in three different benchmark datasets in terms of their performance and energy consumption, and the Efficiency Factor is employed to gain insights into their effectiveness considering both performance and efficiency. Consequently, it is shown that in many cases, the performance gains of data augmentation strategies are overshadowed by their increased energy usage, necessitating the development of more energy efficient data augmentation strategies to address data scarcity.
Authors: Jaideep Pathak, Yair Cohen, Piyush Garg, Peter Harrington, Noah Brenowitz, Dale Durran, Morteza Mardani, Arash Vahdat, Shaoming Xu, Karthik Kashinath, Michael Pritchard
Abstract: Storm-scale convection-allowing models (CAMs) are an important tool for predicting the evolution of thunderstorms and mesoscale convective systems that result in damaging extreme weather. By explicitly resolving convective dynamics within the atmosphere they afford meteorologists the nuance needed to provide outlook on hazard. Deep learning models have thus far not proven skilful at km-scale atmospheric simulation, despite being competitive at coarser resolution with state-of-the-art global, medium-range weather forecasting. We present a generative diffusion model called StormCast, which emulates the high-resolution rapid refresh (HRRR) model-NOAA's state-of-the-art 3km operational CAM. StormCast autoregressively predicts 99 state variables at km scale using a 1-hour time step, with dense vertical resolution in the atmospheric boundary layer, conditioned on 26 synoptic variables. We present evidence of successfully learnt km-scale dynamics including competitive 1-6 hour forecast skill for composite radar reflectivity alongside physically realistic convective cluster evolution, moist updrafts, and cold pool morphology. StormCast predictions maintain realistic power spectra for multiple predicted variables across multi-hour forecasts. Together, these results establish the potential for autoregressive ML to emulate CAMs -- opening up new km-scale frontiers for regional ML weather prediction and future climate hazard dynamical downscaling.
Authors: Yurou Liang, Oleksandr Zadorozhnyi, Mathias Drton
Abstract: Causal discovery amounts to learning a directed acyclic graph (DAG) that encodes a causal model. This model selection problem can be challenging due to its large combinatorial search space, particularly when dealing with non-parametric causal models. Recent research has sought to bypass the combinatorial search by reformulating causal discovery as a continuous optimization problem, employing constraints that ensure the acyclicity of the graph. In non-parametric settings, existing approaches typically rely on finite-dimensional approximations of the relationships between nodes, resulting in a score-based continuous optimization problem with a smooth acyclicity constraint. In this work, we develop an alternative approximation method by utilizing reproducing kernel Hilbert spaces (RKHS) and applying general sparsity-inducing regularization terms based on partial derivatives. Within this framework, we introduce an extended RKHS representer theorem. To enforce acyclicity, we advocate the log-determinant formulation of the acyclicity constraint and show its stability. Finally, we assess the performance of our proposed RKHS-DAGMA procedure through simulations and illustrative data analyses.
Authors: Tong Mao, Jonathan W. Siegel, Jinchao Xu
Abstract: Let $\Omega\subset \mathbb{R}^d$ be a bounded domain. We consider the problem of how efficiently shallow neural networks with the ReLU$^k$ activation function can approximate functions from Sobolev spaces $W^s(L_p(\Omega))$ with error measured in the $L_q(\Omega)$-norm. Utilizing the Radon transform and recent results from discrepancy theory, we provide a simple proof of nearly optimal approximation rates in a variety of cases, including when $q\leq p$, $p\geq 2$, and $s \leq k + (d+1)/2$. The rates we derive are optimal up to logarithmic factors, and significantly generalize existing results. An interesting consequence is that the adaptivity of shallow ReLU$^k$ neural networks enables them to obtain optimal approximation rates for smoothness up to order $s = k + (d+1)/2$, even though they represent piecewise polynomials of fixed degree $k$.
Authors: Dennis Fedorishin, Lie Lu, Srirangaraj Setlur, Venu Govindaraju
Abstract: A "match cut" is a common video editing technique where a pair of shots that have a similar composition transition fluidly from one to another. Although match cuts are often visual, certain match cuts involve the fluid transition of audio, where sounds from different sources merge into one indistinguishable transition between two shots. In this paper, we explore the ability to automatically find and create "audio match cuts" within videos and movies. We create a self-supervised audio representation for audio match cutting and develop a coarse-to-fine audio match pipeline that recommends matching shots and creates the blended audio. We further annotate a dataset for the proposed audio match cut task and compare the ability of multiple audio representations to find audio match cut candidates. Finally, we evaluate multiple methods to blend two matching audio candidates with the goal of creating a smooth transition. Project page and examples are available at: https://denfed.github.io/audiomatchcut/
Authors: Mohsin Raza, Ahsan Adeel
Abstract: To date, most dendritic studies have predominantly focused on the apical zone of pyramidal two-point neurons (TPNs) receiving only feedback (FB) connections from higher perceptual layers and using them for learning. Recent cellular neurophysiology and computational neuroscience studies suggests that the apical input (context), coming from feedback and lateral connections, is multifaceted and far more diverse, with greater implications for ongoing learning and processing in the brain than previously realized. In addition to the FB, the apical tuft receives signals from neighboring cells of the same network as proximal (P) context, other parts of the brain as distal (D) context, and overall coherent information across the network as universal (U) context. The integrated context (C) amplifies and suppresses the transmission of coherent and conflicting feedforward (FF) signals, respectively. Specifically, we show that complex context-sensitive (CS)-TPNs flexibly integrate C moment-by-moment with the FF somatic current at the soma such that the somatic current is amplified when both feedforward (FF) and C are coherent; otherwise, it is attenuated. This generates the event only when the FF and C currents are coherent, which is then translated into a singlet or a burst based on the FB information. Spiking simulation results show that this flexible integration of somatic and contextual currents enables the propagation of more coherent signals (bursts), making learning faster with fewer neurons. Similar behavior is observed when this functioning is used in conventional artificial networks, where orders of magnitude fewer neurons are required to process vast amounts of heterogeneous real-world audio-visual (AV) data trained using backpropagation (BP). The computational findings presented here demonstrate the universality of CS-TPNs, suggesting a dendritic narrative that was previously overlooked.
Authors: Yi Zhao, Le Chen, Jan Schneider, Quankai Gao, Juho Kannala, Bernhard Sch\"olkopf, Joni Pajarinen, Dieter B\"uchler
Abstract: It has been a long-standing research goal to endow robot hands with human-level dexterity. Bi-manual robot piano playing constitutes a task that combines challenges from dynamic tasks, such as generating fast while precise motions, with slower but contact-rich manipulation problems. Although reinforcement learning based approaches have shown promising results in single-task performance, these methods struggle in a multi-song setting. Our work aims to close this gap and, thereby, enable imitation learning approaches for robot piano playing at scale. To this end, we introduce the Robot Piano 1 Million (RP1M) dataset, containing bi-manual robot piano playing motion data of more than one million trajectories. We formulate finger placements as an optimal transport problem, thus, enabling automatic annotation of vast amounts of unlabeled songs. Benchmarking existing imitation learning approaches shows that such approaches reach state-of-the-art robot piano playing performance by leveraging RP1M.
Authors: Vladimir Vovk
Abstract: This paper discusses a counterpart of conformal prediction for e-values, "conformal e-prediction". Conformal e-prediction is conceptually simpler and had been developed in the 1990s as precursor of conformal prediction. When conformal prediction emerged as result of replacing e-values by p-values, it seemed to have important advantages over conformal e-prediction without obvious disadvantages. This paper re-examines relations between conformal prediction and conformal e-prediction systematically from a modern perspective. Conformal e-prediction has advantages of its own, such as the ease of designing conditional conformal e-predictors and the guaranteed validity of cross-conformal e-predictors (whereas for cross-conformal predictors validity is only an empirical fact and can be broken with excessive randomization). Even where conformal prediction has clear advantages, conformal e-prediction can often emulate those advantages, more or less successfully. Conformal e-prediction can also serve as basis for testing; the resulting "conformal e-testing" looks very different from but inherits some strengths of conformal testing.
Authors: Mostafa M. Amin, Bj\"orn W. Schuller
Abstract: Algorithms and Machine Learning (ML) are increasingly affecting everyday life and several decision-making processes, where ML has an advantage due to scalability or superior performance. Fairness in such applications is crucial, where models should not discriminate their results based on race, gender, or other protected groups. This is especially crucial for models affecting very sensitive topics, like interview invitation or recidivism prediction. Fairness is not commonly studied for regression problems compared to binary classification problems; hence, we present a simple, yet effective method based on normalisation (FaiReg), which minimises the impact of unfairness in regression problems, especially due to labelling bias. We present a theoretical analysis of the method, in addition to an empirical comparison against two standard methods for fairness, namely data balancing and adversarial training. We also include a hybrid formulation (FaiRegH), merging the presented method with data balancing, in an attempt to face labelling and sampling biases simultaneously. The experiments are conducted on the multimodal dataset First Impressions (FI) with various labels, namely Big-Five personality prediction and interview screening score. The results show the superior performance of diminishing the effects of unfairness better than data balancing, also without deteriorating the performance of the original problem as much as adversarial training. Fairness is evaluated based on the Equal Accuracy (EA) and Statistical Parity (SP) constraints. The experiments present a setup that enhances the fairness for several protected variables simultaneously.
Authors: Md Saef Ullah Miah, Junaida Sulaiman, Md. Imamul Islam, Md. Masuduzzaman, Molla Shahadat Hossain Lipu, Ramdhan Nugraha
Abstract: Integrating renewable energy sources into the power grid is becoming increasingly important as the world moves towards a more sustainable energy future in line with SDG 7. However, the intermittent nature of renewable energy sources can make it challenging to manage the power grid and ensure a stable supply of electricity, which is crucial for achieving SDG 9. In this paper, we propose a deep learning model for predicting energy demand in a smart power grid, which can improve the integration of renewable energy sources by providing accurate predictions of energy demand. Our approach aligns with SDG 13 on climate action, enabling more efficient management of renewable energy resources. We use long short-term memory networks, well-suited for time series data, to capture complex patterns and dependencies in energy demand data. The proposed approach is evaluated using four historical short-term energy demand data datasets from different energy distribution companies, including American Electric Power, Commonwealth Edison, Dayton Power and Light, and Pennsylvania-New Jersey-Maryland Interconnection. The proposed model is compared with three other state-of-the-art forecasting algorithms: Facebook Prophet, Support Vector Regression, and Random Forest Regression. The experimental results show that the proposed REDf model can accurately predict energy demand with a mean absolute error of 1.4%, indicating its potential to enhance the stability and efficiency of the power grid and contribute to achieving SDGs 7, 9, and 13. The proposed model also has the potential to manage the integration of renewable energy sources effectively.
Authors: Xiaoyu Wang, Rui Pan, Renjie Pi, Tong Zhang
Abstract: Bilevel optimization has found successful applications in various machine learning problems, including hyper-parameter optimization, data cleaning, and meta-learning. However, its huge computational cost presents a significant challenge for its utilization in large-scale problems. This challenge arises due to the nested structure of the bilevel formulation, where each hyper-gradient computation necessitates a costly inner optimization procedure. To address this issue, we propose a reformulation of bilevel optimization as a minimax problem, effectively decoupling the outer-inner dependency. Under mild conditions, we show these two problems are equivalent. Furthermore, we introduce a multi-stage gradient descent and ascent (GDA) algorithm to solve the resulting minimax problem with convergence guarantees. Extensive experimental results demonstrate that our method outperforms state-of-the-art bilevel methods while significantly reducing the computational cost.
Authors: Ruiqi Sun, Siwei Ye, Jie Zhao, Xin He, Jianzhe Lin, Yiran Li, An Zou
Abstract: The inherent diversity of computation types within the deep neural network (DNN) models often requires a variety of specialized units in hardware processors, which limits computational efficiency, increasing both inference latency and power consumption, especially when the hardware processor needs to support and execute different neural networks. In this study, we introduce NeuralMatrix, which elastically transforms the computations of entire DNNs into linear matrix operations. This transformation allows seamless execution of various DNN models all with matrix operations and paves the way for running versatile DNN models with a single General Matrix Multiplication (GEMM) accelerator.Extensive experiments with both CNN and transformer-based models demonstrate the potential of NeuralMatrix to accurately and efficiently execute a wide range of DNN models, achieving 2.17-38.72 times computation efficiency (i.e., throughput per power) compared to CPUs, GPUs, and SoC platforms. This level of efficiency is usually only attainable with the accelerator designed for a specific neural network.
Authors: Xijie Huang, Zechun Liu, Shih-Yang Liu, Kwang-Ting Cheng
Abstract: Quantization-aware training (QAT) is a representative model compression method to reduce redundancy in weights and activations. However, most existing QAT methods require end-to-end training on the entire dataset, which suffers from long training time and high energy costs. In addition, the potential label noise in the training data undermines the robustness of QAT. We propose two metrics based on analysis of loss and gradient of quantized weights: error vector score and disagreement score, to quantify the importance of each sample during training. Guided by these two metrics, we proposed a quantization-aware Adaptive Coreset Selection (ACS) method to select the data for the current training epoch. We evaluate our method on various networks (ResNet-18, MobileNetV2, RetinaNet), datasets(CIFAR-10, CIFAR-100, ImageNet-1K, COCO), and under different quantization settings. Specifically, our method can achieve an accuracy of 68.39\% of 4-bit quantized ResNet-18 on the ImageNet-1K dataset with only a 10\% subset, which has an absolute gain of 4.24\% compared to the baseline. Our method can also improve the robustness of QAT by removing noisy samples in the training set.
Authors: Liangqi Yuan, Dong-Jun Han, Vishnu Pandi Chellapandi, Stanislaw H. \.Zak, Christopher G. Brinton
Abstract: Multimodal federated learning (FL) aims to enrich model training in FL settings where devices are collecting measurements across multiple modalities (e.g., sensors measuring pressure, motion, and other types of data). However, key challenges to multimodal FL remain unaddressed, particularly in heterogeneous network settings: (i) the set of modalities collected by each device will be diverse, and (ii) communication limitations prevent devices from uploading all their locally trained modality models to the server. In this paper, we propose Federated Multimodal Fusion learning with Selective modality communication (FedMFS), a new multimodal fusion FL methodology that can tackle the above mentioned challenges. The key idea is the introduction of a modality selection criterion for each device, which weighs (i) the impact of the modality, gauged by Shapley value analysis, against (ii) the modality model size as a gauge for communication overhead. This enables FedMFS to flexibly balance performance against communication costs, depending on resource constraints and application requirements. Experiments on the real-world ActionSense dataset demonstrate the ability of FedMFS to achieve comparable accuracy to several baselines while reducing the communication overhead by over 4x.
Authors: Muhang Tian, Bernie Chen, Allan Guo, Shiyi Jiang, Anru R. Zhang
Abstract: Electronic Health Records (EHRs) are rich sources of patient-level data, offering valuable resources for medical data analysis. However, privacy concerns often restrict access to EHRs, hindering downstream analysis. Current EHR de-identification methods are flawed and can lead to potential privacy leakage. Additionally, existing publicly available EHR databases are limited, preventing the advancement of medical research using EHR. This study aims to overcome these challenges by generating realistic and privacy-preserving synthetic electronic health records (EHRs) time series efficiently. We introduce a new method for generating diverse and realistic synthetic EHR time series data using Denoising Diffusion Probabilistic Models (DDPM). We conducted experiments on six databases: Medical Information Mart for Intensive Care III and IV (MIMIC-III/IV), the eICU Collaborative Research Database (eICU), and non-EHR datasets on Stocks and Energy. We compared our proposed method with eight existing methods. Our results demonstrate that our approach significantly outperforms all existing methods in terms of data fidelity while requiring less training effort. Additionally, data generated by our method yields a lower discriminative accuracy compared to other baseline methods, indicating the proposed method can generate data with less privacy risk. The proposed diffusion-model-based method can reliably and efficiently generate synthetic EHR time series, which facilitates the downstream medical data analysis. Our numerical results show the superiority of the proposed method over all other existing methods.
Authors: A. Gilad Kusne, Austin McDannald, Brian DeCost
Abstract: Autonomous materials research labs require the ability to combine and learn from diverse data streams. This is especially true for learning material synthesis-process-structure-property relationships, key to accelerating materials optimization and discovery as well as accelerating mechanistic understanding. We present the Synthesis-process-structure-property relAtionship coreGionalized lEarner (SAGE) algorithm. A fully Bayesian algorithm that uses multimodal coregionalization to merge knowledge across data sources to learn synthesis-process-structure-property relationships. SAGE outputs a probabilistic posterior for the relationships including the most likely relationships given the data.
Authors: Cian Eastwood, Julius von K\"ugelgen, Linus Ericsson, Diane Bouchacourt, Pascal Vincent, Bernhard Sch\"olkopf, Mark Ibrahim
Abstract: Self-supervised representation learning often uses data augmentations to induce some invariance to "style" attributes of the data. However, with downstream tasks generally unknown at training time, it is difficult to deduce a priori which attributes of the data are indeed "style" and can be safely discarded. To deal with this, current approaches try to retain some style information by tuning the degree of invariance to some particular task, such as ImageNet object classification. However, prior work has shown that such task-specific tuning can lead to significant performance degradation on other tasks that rely on the discarded style. To address this, we introduce a more principled approach that seeks to disentangle style features rather than discard them. The key idea is to add multiple style embedding spaces where: (i) each is invariant to all-but-one augmentation; and (ii) joint entropy is maximized. We formalize our structured data-augmentation procedure from a causal latent-variable-model perspective, and prove identifiability of both content and individual style variables. We empirically demonstrate the benefits of our approach on both synthetic and real-world data.
Authors: Jinghan Yang, Shuming Ma, Furu Wei
Abstract: With in-context learning ability, the performance of large language models can be significantly boosted when provided with appropriate context. However, existing in-context learning methods mainly rely on human-provided contexts, such as labeled examples and explicit instructions. Writing context by humans is labor-intensive on various tasks and limits the model to tasks manageable by humans. To overcome these limitations, we propose Automatic In-Context Learning framework that enables the model to autonomously generate examples and instructions for problem-solving. With experiments across various models and datasets, results show that model-generated contexts outperform human-annotated contexts, including Few-Shot and Few-Shot-CoT methods, and surpass existing self-generated context methods like Zero-CoT and Auto-CoT.
Authors: Zhihao Zhu, Chenwang Wu, Rui Fan, Yi Yang, Zhen Wang, Defu Lian, Enhong Chen
Abstract: Recent research demonstrates that GNNs are vulnerable to the model stealing attack, a nefarious endeavor geared towards duplicating the target model via query permissions. However, they mainly focus on node classification tasks, neglecting the potential threats entailed within the domain of graph classification tasks. Furthermore, their practicality is questionable due to unreasonable assumptions, specifically concerning the large data requirements and extensive model knowledge. To this end, we advocate following strict settings with limited real data and hard-label awareness to generate synthetic data, thereby facilitating the stealing of the target model. Specifically, following important data generation principles, we introduce three model stealing attacks to adapt to different actual scenarios: MSA-AU is inspired by active learning and emphasizes the uncertainty to enhance query value of generated samples; MSA-AD introduces diversity based on Mixup augmentation strategy to alleviate the query inefficiency issue caused by over-similar samples generated by MSA-AU; MSA-AUD combines the above two strategies to seamlessly integrate the authenticity, uncertainty, and diversity of the generated samples. Finally, extensive experiments consistently demonstrate the superiority of the proposed methods in terms of concealment, query efficiency, and stealing performance.
Authors: Hao Wang, Bo Tang, Chi Harold Liu, Shangqin Mao, Jiahong Zhou, Zipeng Dai, Yaqi Sun, Qianlong Xie, Xingxing Wang, Dong Wang
Abstract: Online display advertising platforms service numerous advertisers by providing real-time bidding (RTB) for the scale of billions of ad requests every day. The bidding strategy handles ad requests cross multiple channels to maximize the number of clicks under the set financial constraints, i.e., total budget and cost-per-click (CPC), etc. Different from existing works mainly focusing on single channel bidding, we explicitly consider cross-channel constrained bidding with budget allocation. Specifically, we propose a hierarchical offline deep reinforcement learning (DRL) framework called ``HiBid'', consisted of a high-level planner equipped with auxiliary loss for non-competitive budget allocation, and a data augmentation enhanced low-level executor for adaptive bidding strategy in response to allocated budgets. Additionally, a CPC-guided action selection mechanism is introduced to satisfy the cross-channel CPC constraint. Through extensive experiments on both the large-scale log data and online A/B testing, we confirm that HiBid outperforms six baselines in terms of the number of clicks, CPC satisfactory ratio, and return-on-investment (ROI). We also deploy HiBid on Meituan advertising platform to already service tens of thousands of advertisers every day.
Authors: Ling Liang, Haizhao Yang
Abstract: We consider a regularized expected reward optimization problem in the non-oblivious setting that covers many existing problems in reinforcement learning (RL). In order to solve such an optimization problem, we apply and analyze the classical stochastic proximal gradient method. In particular, the method has shown to admit an $O(\epsilon^{-4})$ sample complexity to an $\epsilon$-stationary point, under standard conditions. Since the variance of the classical stochastic gradient estimator is typically large, which slows down the convergence, we also apply an efficient stochastic variance-reduce proximal gradient method with an importance sampling based ProbAbilistic Gradient Estimator (PAGE). Our analysis shows that the sample complexity can be improved from $O(\epsilon^{-4})$ to $O(\epsilon^{-3})$ under additional conditions. Our results on the stochastic (variance-reduced) proximal gradient method match the sample complexity of their most competitive counterparts for discounted Markov decision processes under similar settings. To the best of our knowledge, the proposed methods represent a novel approach in addressing the general regularized reward optimization problem.
Authors: Duy M. H. Nguyen, Nina Lukashina, Tai Nguyen, An T. Le, TrungTin Nguyen, Nhat Ho, Jan Peters, Daniel Sonntag, Viktor Zaverkin, Mathias Niepert
Abstract: A molecule's 2D representation consists of its atoms, their attributes, and the molecule's covalent bonds. A 3D (geometric) representation of a molecule is called a conformer and consists of its atom types and Cartesian coordinates. Every conformer has a potential energy, and the lower this energy, the more likely it occurs in nature. Most existing machine learning methods for molecular property prediction consider either 2D molecular graphs or 3D conformer structure representations in isolation. Inspired by recent work on using ensembles of conformers in conjunction with 2D graph representations, we propose $\mathrm{E}$(3)-invariant molecular conformer aggregation networks. The method integrates a molecule's 2D representation with that of multiple of its conformers. Contrary to prior work, we propose a novel 2D-3D aggregation mechanism based on a differentiable solver for the Fused Gromov-Wasserstein Barycenter problem and the use of an efficient conformer generation method based on distance geometry. We show that the proposed aggregation mechanism is $\mathrm{E}$(3) invariant and propose an efficient GPU implementation. Moreover, we demonstrate that the aggregation mechanism helps to significantly outperform state-of-the-art molecule property prediction methods on established datasets.
Authors: Saswat Das, Marco Romanelli, Ferdinando Fioretto
Abstract: Ensuring privacy-preserving inference on cryptographically secure data is a well-known computational challenge. To alleviate the bottleneck of costly cryptographic computations in non-linear activations, recent methods have suggested linearizing a targeted portion of these activations in neural networks. This technique results in significantly reduced runtimes with often negligible impacts on accuracy. In this paper, we demonstrate that such computational benefits may lead to increased fairness costs. Specifically, we find that reducing the number of ReLU activations disproportionately decreases the accuracy for minority groups compared to majority groups. To explain these observations, we provide a mathematical interpretation under restricted assumptions about the nature of the decision boundary, while also showing the prevalence of this problem across widely used datasets and architectures. Finally, we show how a simple procedure altering the fine-tuning step for linearized models can serve as an effective mitigation strategy.
Authors: Andrew Lowy, Jonathan Ullman, Stephen J. Wright
Abstract: We provide a simple and flexible framework for designing differentially private algorithms to find approximate stationary points of non-convex loss functions. Our framework is based on using a private approximate risk minimizer to "warm start" another private algorithm for finding stationary points. We use this framework to obtain improved, and sometimes optimal, rates for several classes of non-convex loss functions. First, we obtain improved rates for finding stationary points of smooth non-convex empirical loss functions. Second, we specialize to quasar-convex functions, which generalize star-convex functions and arise in learning dynamical systems and training some neural nets. We achieve the optimal rate for this class. Third, we give an optimal algorithm for finding stationary points of functions satisfying the Kurdyka-Lojasiewicz (KL) condition. For example, over-parameterized neural networks often satisfy this condition. Fourth, we provide new state-of-the-art rates for stationary points of non-convex population loss functions. Fifth, we obtain improved rates for non-convex generalized linear models. A modification of our algorithm achieves nearly the same rates for second-order stationary points of functions with Lipschitz Hessian, improving over the previous state-of-the-art for each of the above problems.
Authors: Jean V. Alves, Diogo Leit\~ao, S\'ergio Jesus, Marco O. P. Sampaio, Javier Li\'ebana, Pedro Saleiro, M\'ario A. T. Figueiredo, Pedro Bizarro
Abstract: Learning to defer (L2D) aims to improve human-AI collaboration systems by learning how to defer decisions to humans when they are more likely to be correct than an ML classifier. Existing research in L2D overlooks key real-world aspects that impede its practical adoption, namely: i) neglecting cost-sensitive scenarios, where type I and type II errors have different costs; ii) requiring concurrent human predictions for every instance of the training dataset; and iii) not dealing with human work-capacity constraints. To address these issues, we propose the \textit{deferral under cost and capacity constraints framework} (DeCCaF). DeCCaF is a novel L2D approach, employing supervised learning to model the probability of human error under less restrictive data requirements (only one expert prediction per instance) and using constraint programming to globally minimize the error cost, subject to workload limitations. We test DeCCaF in a series of cost-sensitive fraud detection scenarios with different teams of 9 synthetic fraud analysts, with individual work-capacity constraints. The results demonstrate that our approach performs significantly better than the baselines in a wide array of scenarios, achieving an average $8.4\%$ reduction in the misclassification cost. The code used for the experiments is available at https://github.com/feedzai/deccaf
Authors: Luxi He, Mengzhou Xia, Peter Henderson
Abstract: Current Large Language Models (LLMs), even those tuned for safety and alignment, are susceptible to jailbreaking. Some have found that just further fine-tuning an aligned model with benign data (i.e., data without harmful content) surprisingly leads to substantial degradation in safety. We delve into the data-centric aspects of why benign fine-tuning inadvertently contributes to jailbreaking. First, we represent fine-tuning data through two lenses: representation and gradient spaces. Additionally, we propose a bi-directional anchoring method that, during the selection process, prioritizes data points that are close to harmful examples and far from benign ones. Our approach effectively identifies subsets of benign data that are more likely to degrade the model's safety after fine-tuning. Training on just 100 of these seemingly benign datapoints surprisingly leads to the fine-tuned model affirmatively responding to >70% of tested harmful requests, compared to <20% after fine-tuning on randomly selected data. We also observe that the selected data frequently appear as lists, bullet points, or math questions, indicating a systematic pattern in fine-tuning data that contributes to jailbreaking.
Authors: Sebastian Bordt, Harsha Nori, Vanessa Rodrigues, Besmira Nushi, Rich Caruana
Abstract: While many have shown how Large Language Models (LLMs) can be applied to a diverse set of tasks, the critical issues of data contamination and memorization are often glossed over. In this work, we address this concern for tabular data. Specifically, we introduce a variety of different techniques to assess whether a language model has seen a tabular dataset during training. This investigation reveals that LLMs have memorized many popular tabular datasets verbatim. We then compare the few-shot learning performance of LLMs on datasets that were seen during training to the performance on datasets released after training. We find that LLMs perform better on datasets seen during training, indicating that memorization leads to overfitting. At the same time, LLMs show non-trivial performance on novel datasets and are surprisingly robust to data transformations. We then investigate the in-context statistical learning abilities of LLMs. While LLMs are significantly better than random at solving statistical classification problems, the sample efficiency of few-shot learning lags behind traditional statistical learning algorithms, especially as the dimension of the problem increases. This suggests that much of the observed few-shot performance on novel real-world datasets is due to the LLM's world knowledge. Overall, our results highlight the importance of testing whether an LLM has seen an evaluation dataset during pre-training. We release the https://github.com/interpretml/LLM-Tabular-Memorization-Checker Python package to test LLMs for memorization of tabular datasets.
URLs: https://github.com/interpretml/LLM-Tabular-Memorization-Checker
Authors: Victor-Alexandru Darvariu, Stephen Hailes, Mirco Musolesi
Abstract: Graphs are a natural representation for systems based on relations between connected entities. Combinatorial optimization problems, which arise when considering an objective function related to a process of interest on discrete structures, are often challenging due to the rapid growth of the solution space. The trial-and-error paradigm of Reinforcement Learning has recently emerged as a promising alternative to traditional methods, such as exact algorithms and (meta)heuristics, for discovering better decision-making strategies in a variety of disciplines including chemistry, computer science, and statistics. Despite the fact that they arose in markedly different fields, these techniques share significant commonalities. Therefore, we set out to synthesize this work in a unifying perspective that we term Graph Reinforcement Learning, interpreting it as a constructive decision-making method for graph problems. After covering the relevant technical background, we review works along the dividing line of whether the goal is to optimize graph structure given a process of interest, or to optimize the outcome of the process itself under fixed graph structure. Finally, we discuss the common challenges facing the field and open research questions. In contrast with other surveys, the present work focuses on non-canonical graph problems for which performant algorithms are typically not known and Reinforcement Learning is able to provide efficient and effective solutions.
Authors: Huai-an Su, Jiaxiang Geng, Liang Li, Xiaoqi Qin, Yanzhao Hou, Hao Wang, Xin Fu, Miao Pan
Abstract: As a popular distributed learning paradigm, federated learning (FL) over mobile devices fosters numerous applications, while their practical deployment is hindered by participating devices' computing and communication heterogeneity. Some pioneering research efforts proposed to extract subnetworks from the global model, and assign as large a subnetwork as possible to the device for local training based on its full computing and communications capacity. Although such fixed size subnetwork assignment enables FL training over heterogeneous mobile devices, it is unaware of (i) the dynamic changes of devices' communication and computing conditions and (ii) FL training progress and its dynamic requirements of local training contributions, both of which may cause very long FL training delay. Motivated by those dynamics, in this paper, we develop a wireless and heterogeneity aware latency efficient FL (WHALE-FL) approach to accelerate FL training through adaptive subnetwork scheduling. Instead of sticking to the fixed size subnetwork, WHALE-FL introduces a novel subnetwork selection utility function to capture device and FL training dynamics, and guides the mobile device to adaptively select the subnetwork size for local training based on (a) its computing and communication capacity, (b) its dynamic computing and/or communication conditions, and (c) FL training status and its corresponding requirements for local training contributions. Our evaluation shows that, compared with peer designs, WHALE-FL effectively accelerates FL training without sacrificing learning accuracy.
Authors: Christopher Amato
Abstract: Multi-agent reinforcement learning (MARL) has exploded in popularity in recent years. Many approaches have been developed but they can be divided into three main types: centralized training and execution (CTE), centralized training for decentralized execution (CTDE), and Decentralized training and execution (DTE). Decentralized training and execution methods make the fewest assumptions and are often simple to implement. In fact, as I'll discuss, any single-agent RL method can be used for DTE by just letting each agent learn separately. Of course, there are pros and cons to such approaches. It is worth noting that DTE is required if no offline coordination is available. That is, if all agents must learn during online interactions without prior coordination, learning and execution must both be decentralized. DTE methods can be applied in cooperative, competitive, or mixed cases but this text will focus on the cooperative MARL case. This text is an introduction to the field of decentralized, cooperative MARL. As such, I will first give a brief description of the cooperative MARL problem in the form of the Dec-POMDP. Then, I will discuss value-based DTE methods starting with independent Q-learning and its extensions and then discuss the extension to the deep case with DQN, the additional complications this causes, and methods that have been developed to (attempt to) address these issues. Next, I will discuss policy gradient DTE methods starting with independent REINFORCE (i.e., vanilla policy gradient), and then extending to the actor-critic case and deep variants (such as independent PPO). Finally, I will discuss some general topics related to DTE and future directions.
Authors: Sebastian Tschiatschek, Eugenia Stamboliev, Timoth\'ee Schmude, Mark Coeckelbergh, Laura Koesten
Abstract: We discuss the role of humans in algorithmic decision-making (ADM) for socially relevant problems from a technical and philosophical perspective. In particular, we illustrate tensions arising from diverse expectations, values, and constraints by and on the humans involved. To this end, we assume that a strategic decision-maker (SDM) introduces ADM to optimize strategic and societal goals while the algorithms' recommended actions are overseen by a practical decision-maker (PDM) - a specific human-in-the-loop - who makes the final decisions. While the PDM is typically assumed to be a corrective, it can counteract the realization of the SDM's desired goals and societal values not least because of a misalignment of these values and unmet information needs of the PDM. This has significant implications for the distribution of power between the stakeholders in ADM, their constraints, and information needs. In particular, we emphasize the overseeing PDM's role as a potential political and ethical decision maker, who acts expected to balance strategic, value-driven objectives and on-the-ground individual decisions and constraints. We demonstrate empirically, on a machine learning benchmark dataset, the significant impact an overseeing PDM's decisions can have even if the PDM is constrained to performing only a limited amount of actions differing from the algorithms' recommendations. To ensure that the SDM's intended values are realized, the PDM needs to be provided with appropriate information conveyed through tailored explanations and its role must be characterized clearly. Our findings emphasize the need for an in-depth discussion of the role and power of the PDM and challenge the often-taken view that just including a human-in-the-loop in ADM ensures the 'correct' and 'ethical' functioning of the system.
Authors: Haoze He, Jing Wang, Anna Choromanska
Abstract: This work focuses on the decentralized deep learning optimization framework. We propose Adjacent Leader Decentralized Gradient Descent (AL-DSGD), for improving final model performance, accelerating convergence, and reducing the communication overhead of decentralized deep learning optimizers. AL-DSGD relies on two main ideas. Firstly, to increase the influence of the strongest learners on the learning system it assigns weights to different neighbor workers according to both their performance and the degree when averaging among them, and it applies a corrective force on the workers dictated by both the currently best-performing neighbor and the neighbor with the maximal degree. Secondly, to alleviate the problem of the deterioration of the convergence speed and performance of the nodes with lower degrees, AL-DSGD relies on dynamic communication graphs, which effectively allows the workers to communicate with more nodes while keeping the degrees of the nodes low. Experiments demonstrate that AL-DSGD accelerates the convergence of the decentralized state-of-the-art techniques and improves their test performance especially in the communication constrained environments. We also theoretically prove the convergence of the proposed scheme. Finally, we release to the community a highly general and concise PyTorch-based library for distributed training of deep learning models that supports easy implementation of any distributed deep learning approach ((a)synchronous, (de)centralized).
Authors: Junqi Chen, Xu Tan, Sylwan Rahardja, Jiawei Yang, Susanto Rahardja
Abstract: Deep learning-based sequence models are extensively employed in Time Series Anomaly Detection (TSAD) tasks due to their effective sequential modeling capabilities. However, the ability of TSAD is limited by two key challenges: (i) the ability to model long-range dependency and (ii) the generalization issue in the presence of non-stationary data. To tackle these challenges, an anomaly detector that leverages the selective state space model known for its proficiency in capturing long-term dependencies across various domains is proposed. Additionally, a multi-stage detrending mechanism is introduced to mitigate the prominent trend component in non-stationary data to address the generalization issue. Extensive experiments conducted on realworld public datasets demonstrate that the proposed methods surpass all 12 compared baseline methods.
Authors: Sofie Goethals, Eoin Delaney, Brent Mittelstadt, Chris Russell
Abstract: Access to resources strongly constrains the decisions we make. While we might wish to offer every student a scholarship, or schedule every patient for follow-up meetings with a specialist, limited resources mean that this is not possible. When deploying machine learning systems, these resource constraints are simply enforced by varying the threshold of a classifier. However, these finite resource limitations are disregarded by most existing tools for fair machine learning, which do not allow the specification of resource limitations and do not remain fair when varying thresholds. This makes them ill-suited for real-world deployment. Our research introduces the concept of "resource-constrained fairness" and quantifies the cost of fairness within this framework. We demonstrate that the level of available resources significantly influences this cost, a factor overlooked in previous evaluations.
Authors: Ninghui Feng, Songning Lai, Fobao Zhou, Zhenxiao Yin, Hang Zhao
Abstract: Time series forecasting has become an increasingly popular research area due to its critical applications in various real-world domains such as traffic management, weather prediction, and financial analysis. Despite significant advancements, existing models face notable challenges, including the necessity of manual hyperparameter tuning for different datasets, and difficulty in effectively distinguishing signal from redundant features in data characterized by strong seasonality. These issues hinder the generalization and practical application of time series forecasting models. To solve this issues, we propose an innovative time series forecasting model TimeSieve designed to address these challenges. Our approach employs wavelet transforms to preprocess time series data, effectively capturing multi-scale features without the need for additional parameters or manual hyperparameter tuning. Additionally, we introduce the information bottleneck theory that filters out redundant features from both detail and approximation coefficients, retaining only the most predictive information. This combination reduces significantly improves the model's accuracy. Extensive experiments demonstrate that our model outperforms existing state-of-the-art methods on 70% of the datasets, achieving higher predictive accuracy and better generalization across diverse datasets. Our results validate the effectiveness of our approach in addressing the key challenges in time series forecasting, paving the way for more reliable and efficient predictive models in practical applications. The code for our model is available at https://github.com/xll0328/TimeSieve.
Authors: Md Fahim Sikder, Resmi Ramachandranpillai, Daniel de Leng, Fredrik Heintz
Abstract: We present FairX, an open-source Python-based benchmarking tool designed for the comprehensive analysis of models under the umbrella of fairness, utility, and eXplainability (XAI). FairX enables users to train benchmarking bias-removal models and evaluate their fairness using a wide array of fairness metrics, data utility metrics, and generate explanations for model predictions, all within a unified framework. Existing benchmarking tools do not have the way to evaluate synthetic data generated from fair generative models, also they do not have the support for training fair generative models either. In FairX, we add fair generative models in the collection of our fair-model library (pre-processing, in-processing, post-processing) and evaluation metrics for evaluating the quality of synthetic fair data. This version of FairX supports both tabular and image datasets. It also allows users to provide their own custom datasets. The open-source FairX benchmarking package is publicly available at https://github.com/fahim-sikder/FairX.
Authors: Amit Sinha, Mathieu Geist, Aditya Mahajan
Abstract: The standard approach for Partially Observable Markov Decision Processes (POMDPs) is to convert them to a fully observed belief-state MDP. However, the belief state depends on the system model and is therefore not viable in reinforcement learning (RL) settings. A widely used alternative is to use an agent state, which is a model-free, recursively updateable function of the observation history. Examples include frame stacking and recurrent neural networks. Since the agent state is model-free, it is used to adapt standard RL algorithms to POMDPs. However, standard RL algorithms like Q-learning learn a stationary policy. Our main thesis that we illustrate via examples is that because the agent state does not satisfy the Markov property, non-stationary agent-state based policies can outperform stationary ones. To leverage this feature, we propose PASQL (periodic agent-state based Q-learning), which is a variant of agent-state-based Q-learning that learns periodic policies. By combining ideas from periodic Markov chains and stochastic approximation, we rigorously establish that PASQL converges to a cyclic limit and characterize the approximation error of the converged periodic policy. Finally, we present a numerical experiment to highlight the salient features of PASQL and demonstrate the benefit of learning periodic policies over stationary policies.
Authors: Mohammed Ghaith Altarabichi
Abstract: We propose DropKAN (Dropout Kolmogorov-Arnold Networks) a regularization method that prevents co-adaptation of activation function weights in Kolmogorov-Arnold Networks (KANs). DropKAN functions by embedding the drop mask directly within the KAN layer, randomly masking the outputs of some activations within the KANs' computation graph. We show that this simple procedure that require minimal coding effort has a regularizing effect and consistently lead to better generalization of KANs. We analyze the adaptation of the standard Dropout with KANs and demonstrate that Dropout applied to KANs' neurons can lead to unpredictable behavior in the feedforward pass. We carry an empirical study with real world Machine Learning datasets to validate our findings. Our results suggest that DropKAN is consistently a better alternative to using standard Dropout with KANs, and improves the generalization performance of KANs. Our implementation of DropKAN is available at: \url{https://github.com/Ghaith81/dropkan}.
Authors: Huixiu Jiang, Ling Yang, Yu Bao, Rutong Si, Sikun Yang
Abstract: Stochastic optimization plays a crucial role in the advancement of deep learning technologies. Over the decades, significant effort has been dedicated to improving the training efficiency and robustness of deep neural networks, via various strategies including gradient normalization (GN) and gradient centralization (GC). Nevertheless, to the best of our knowledge, no one has considered to capture the optimal gradient descent trajectory, by adaptively controlling gradient descent direction. To address this concern, this paper is the first attempt to study a new optimization technique for deep neural networks, using the sum normalization of a gradient vector as coefficients, to dynamically regularize gradients and thus to effectively control optimization direction. The proposed technique is hence named as the adaptive gradient regularization (AGR). It can be viewed as an adaptive gradient clipping method. The theoretical analysis reveals that the AGR can effectively smooth the loss landscape, and hence can significantly improve the training efficiency and model generalization performance. We note that AGR can greatly improve the training efficiency of vanilla optimizers' including Adan and AdamW, by adding only three lines of code. The final experiments conducted on image generation, image classification, and language representation, demonstrate that the AGR method can not only improve the training efficiency but also enhance the model generalization performance.
Authors: Raja Vavekanand, Kira Sam
Abstract: Federated Learning (FL) is a recent model training paradigm in which client devices collaboratively train a model without ever aggregating their data. Crucially, this scheme offers users potential privacy and security benefits by only ever communicating updates to the model weights to a central server as opposed to traditional machine learning (ML) training which directly communicates and aggregates data. However, FL training suffers from statistical heterogeneity as clients may have differing local data distributions. Large language models (LLMs) offer a potential solution to this issue of heterogeneity given that they have consistently been shown to be able to learn on vast amounts of noisy data. While LLMs are a promising development for resolving the consistent issue of non-I.I.D. Clients in federated settings exacerbate two other bottlenecks in FL: limited local computing and expensive communication. This thesis aims to develop efficient training methods for LLMs in FL. To this end, we employ two critical techniques in enabling efficient training. First, we use low-rank adaptation (LoRA) to reduce the computational load of local model training. Second, we communicate sparse updates throughout training to significantly cut down on communication costs. Taken together, our method reduces communication costs by up to 10x over vanilla LoRA and up to 5x over more complex sparse LoRA baselines while achieving greater utility. We emphasize the importance of carefully applying sparsity and picking effective rank and sparsity configurations for federated LLM training.
Authors: Pablo Galv\'an, Filip Lemic, Gerard Calvo Bartra, Sergi Abadal, Xavier Costa P\'erez
Abstract: Flow-guided localization using in-body nanodevices in the bloodstream is expected to be beneficial for early disease detection, continuous monitoring of biological conditions, and targeted treatment. The nanodevices face size and power constraints that produce erroneous raw data for localization purposes. On-body anchors receive this data, and use it to derive the locations of diagnostic events of interest. Different Machine Learning (ML) approaches have been recently proposed for this task, yet they are currently restricted to a reference bloodstream of a resting patient. As such, they are unable to deal with the physical diversity of patients' bloodstreams and cannot provide continuous monitoring due to changes in individual patient's activities. Toward addressing these issues for the current State-of-the-Art (SotA) flow-guided localization approach based on Graph Neural Networks (GNNs), we propose a pipeline for GNN adaptation based on individual physiological indicators including height, weight, and heart rate. Our results indicate that the proposed adaptions are beneficial in reconciling the individual differences between bloodstreams and activities.
Authors: Mingshuo Liu, Shiyi Luo, Kevin Han, Bo Yuan, Ronald F. DeMara, Yu Bai
Abstract: The fast development of object detection techniques has attracted attention to developing efficient Deep Neural Networks (DNNs). However, the current state-of-the-art DNN models can not provide a balanced solution among accuracy, speed, and model size. This paper proposes an efficient real-time object detection framework on resource-constrained hardware devices through hardware and software co-design. The Tensor Train (TT) decomposition is proposed for compressing the YOLOv5 model. By unitizing the unique characteristics given by the TT decomposition, we develop an efficient hardware accelerator based on FPGA devices. Experimental results show that the proposed method can significantly reduce the model size and improve the execution time.
Authors: Tianyu Liu, Qitan Lv, Jie Wang, Shuling Yang, Hanzhu Chen
Abstract: Inductive relation prediction (IRP) -- where entities can be different during training and inference -- has shown great power for completing evolving knowledge graphs. Existing works mainly focus on using graph neural networks (GNNs) to learn the representation of the subgraph induced from the target link, which can be seen as an implicit rule-mining process to measure the plausibility of the target link. However, these methods cannot differentiate the target link and other links during message passing, hence the final subgraph representation will contain irrelevant rule information to the target link, which reduces the reasoning performance and severely hinders the applications for real-world scenarios. To tackle this problem, we propose a novel \textit{single-source edge-wise} GNN model to learn the \textbf{R}ule-induc\textbf{E}d \textbf{S}ubgraph represen\textbf{T}ations (\textbf{REST}), which encodes relevant rules and eliminates irrelevant rules within the subgraph. Specifically, we propose a \textit{single-source} initialization approach to initialize edge features only for the target link, which guarantees the relevance of mined rules and target link. Then we propose several RNN-based functions for \textit{edge-wise} message passing to model the sequential property of mined rules. REST is a simple and effective approach with theoretical support to learn the \textit{rule-induced subgraph representation}. Moreover, REST does not need node labeling, which significantly accelerates the subgraph preprocessing time by up to \textbf{11.66$\times$}. Experiments on inductive relation prediction benchmarks demonstrate the effectiveness of our REST. Our code is available at https://github.com/smart-lty/REST.
Authors: Nida Zamir, I-Hong Hou
Abstract: Scheduling in multi-channel wireless communication system presents formidable challenges in effectively allocating resources. To address these challenges, we investigate a multi-resource restless matching bandit (MR-RMB) model for heterogeneous resource systems with an objective of maximizing long-term discounted total rewards while respecting resource constraints. We have also generalized to applications beyond multi-channel wireless. We discuss the Max-Weight Index Matching algorithm, which optimizes resource allocation based on learned partial indexes. We have derived the policy gradient theorem for index learning. Our main contribution is the introduction of a new Deep Index Policy (DIP), an online learning algorithm tailored for MR-RMB. DIP learns the partial index by leveraging the policy gradient theorem for restless arms with convoluted and unknown transition kernels of heterogeneous resources. We demonstrate the utility of DIP by evaluating its performance for three different MR-RMB problems. Our simulation results show that DIP indeed learns the partial indexes efficiently.
Authors: Xiao Liu, Mingyuan Li, Xu Wang, Guangsheng Yu, Wei Ni, Lixiang Li, Haipeng Peng, Renping Liu
Abstract: Unlearning in various learning frameworks remains challenging, with the continuous growth and updates of models exhibiting complex inheritance relationships. This paper presents a novel unlearning framework, which enables fully parallel unlearning among models exhibiting inheritance. A key enabler is the new Unified Model Inheritance Graph (UMIG), which captures the inheritance using a Directed Acyclic Graph (DAG).Central to our framework is the new Fisher Inheritance Unlearning (FIUn) algorithm, which utilizes the Fisher Information Matrix (FIM) from initial unlearning models to pinpoint impacted parameters in inherited models. By employing FIM, the FIUn method breaks the sequential dependencies among the models, facilitating simultaneous unlearning and reducing computational overhead. We further design to merge disparate FIMs into a single matrix, synchronizing updates across inherited models. Experiments confirm the effectiveness of our unlearning framework. For single-class tasks, it achieves complete unlearning with 0\% accuracy for unlearned labels while maintaining 94.53\% accuracy for retained labels on average. For multi-class tasks, the accuracy is 1.07\% for unlearned labels and 84.77\% for retained labels on average. Our framework accelerates unlearning by 99\% compared to alternative methods.
Authors: Ravi Raju, Swayambhoo Jain, Bo Li, Jonathan Li, Urmish Thakker
Abstract: Large Language Models (LLMs) have revolutionized the landscape of machine learning, yet current benchmarks often fall short in capturing the diverse behavior of these models in real-world applications. A benchmark's usefulness is determined by its ability to clearly differentiate between models of varying capabilities (separability) and closely align with human preferences. Existing frameworks like Alpaca-Eval 2.0 LC \cite{dubois2024lengthcontrolledalpacaevalsimpleway} and Arena-Hard v0.1 \cite{li2024crowdsourced} are limited by their focus on general-purpose queries and lack of diversity across domains such as law, medicine, and multilingual contexts. In this paper, we address these limitations by introducing a novel data pipeline that curates diverse, domain-specific evaluation sets tailored for LLM-as-a-Judge frameworks. Our approach leverages a combination of manual curation, semi-supervised learning to generate clusters, and stratified sampling to ensure balanced representation across a wide range of domains and languages. The resulting evaluation set, which includes 1573 samples across 14 categories, demonstrates high separability (84\%) across ten top-ranked models, and agreement (84\%) with Chatbot Arena and (0.915) Spearman correlation. The agreement values are 9\% better than Arena Hard and 20\% better than AlpacaEval 2.0 LC, while the Spearman coefficient is 0.7 more than the next best benchmark, showcasing a significant improvement in the usefulness of the benchmark. We further provide an open-source evaluation tool that enables fine-grained analysis of model performance across user-defined categories, offering valuable insights for practitioners. This work contributes to the ongoing effort to enhance the transparency, diversity, and effectiveness of LLM evaluation methodologies.
Authors: Tatjana Legler, Vinit Hegiste, Martin Ruskowski
Abstract: Federated Learning (FL) represents a paradigm shift in the field of machine learning, offering an approach for a decentralized training of models across a multitude of devices while maintaining the privacy of local data. However, the dynamic nature of FL systems, characterized by the ongoing incorporation of new clients with potentially diverse data distributions and computational capabilities, poses a significant challenge to the stability and efficiency of these distributed learning networks. The seamless integration of new clients is imperative to sustain and enhance the performance and robustness of FL systems. This paper looks into the complexities of integrating new clients into existing FL systems and explores how data heterogeneity and varying data distribution (not independent and identically distributed) among them can affect model training, system efficiency, scalability and stability. Despite these challenges, the integration of new clients into FL systems presents opportunities to enhance data diversity, improve learning performance, and leverage distributed computational power. In contrast to other fields of application such as the distributed optimization of word predictions on Gboard (where federated learning once originated), there are usually only a few clients in the production environment, which is why information from each new client becomes all the more valuable. This paper outlines strategies for effective client selection strategies and solutions for ensuring system scalability and stability. Using the example of images from optical quality inspection, it offers insights into practical approaches. In conclusion, this paper proposes that addressing the challenges presented by new client integration is crucial to the advancement and efficiency of distributed learning networks, thus paving the way for the adoption of Federated Learning in production environments.
Authors: Chi-Heng Lin, Shangqian Gao, James Seale Smith, Abhishek Patel, Shikhar Tuli, Yilin Shen, Hongxia Jin, Yen-Chang Hsu
Abstract: Large Language Models (LLMs) have reshaped the landscape of artificial intelligence by demonstrating exceptional performance across various tasks. However, substantial computational requirements make their deployment challenging on devices with limited resources. Recently, compression methods using low-rank matrix techniques have shown promise, yet these often lead to degraded accuracy or introduce significant overhead in parameters and inference latency. This paper introduces \textbf{Mo}dular \textbf{De}composition (MoDeGPT), a novel structured compression framework that does not need recovery fine-tuning while resolving the above drawbacks. MoDeGPT partitions the Transformer block into modules comprised of matrix pairs and reduces the hidden dimensions via reconstructing the module-level outputs. MoDeGPT is developed based on a theoretical framework that utilizes three well-established matrix decomposition algorithms -- Nystr\"om approximation, CR decomposition, and SVD -- and applies them to our redefined transformer modules. Our comprehensive experiments show MoDeGPT, without backward propagation, matches or surpasses previous structured compression methods that rely on gradient information, and saves 98% of compute costs on compressing a 13B model. On \textsc{Llama}-2/3 and OPT models, MoDeGPT maintains 90-95% zero-shot performance with 25-30% compression rates. Moreover, the compression can be done on a single GPU within a few hours and increases the inference throughput by up to 46%.
Authors: Haoxiang Sun
Abstract: Support Vector Machines (SVMs) based on hinge loss have been extensively discussed and applied to various binary classification tasks. These SVMs achieve a balance between margin maximization and the minimization of slack due to outliers. Although many efforts have been dedicated to enhancing the performance of SVMs with hinge loss, studies on $p$SVMs, soft-margin SVMs with $p$-norm hinge loss, remain relatively scarce. In this paper, we explore the properties, performance, and training algorithms of $p$SVMs. We first derive the generalization bound of $p$SVMs, then formulate the dual optimization problem, comparing it with the traditional approach. Furthermore, we discuss a generalized version of the Sequential Minimal Optimization (SMO) algorithm, $p$SMO, to train our $p$SVM model. Comparative experiments on various datasets, including binary and multi-class classification tasks, demonstrate the effectiveness and advantages of our $p$SVM model and the $p$SMO method. Code is available at https://github.com/CoderBak/pSVM.
Authors: Xinjian Luo, Xianglong Zhang
Abstract: Federated learning (FL) is a decentralized model training framework that aims to merge isolated data islands while maintaining data privacy. However, recent studies have revealed that Generative Adversarial Network (GAN) based attacks can be employed in FL to learn the distribution of private datasets and reconstruct recognizable images. In this paper, we exploit defenses against GAN-based attacks in FL and propose a framework, Anti-GAN, to prevent attackers from learning the real distribution of the victim's data. The core idea of Anti-GAN is to manipulate the visual features of private training images to make them indistinguishable to human eyes even restored by attackers. Specifically, Anti-GAN projects the private dataset onto a GAN's generator and combines the generated fake images with the actual images to create the training dataset, which is then used for federated model training. The experimental results demonstrate that Anti-GAN is effective in preventing attackers from learning the distribution of private images while causing minimal harm to the accuracy of the federated model.
Authors: Hao Li, Cor-Paul Bezemer
Abstract: Open source machine learning (ML) libraries enable developers to integrate advanced ML functionality into their own applications. However, popular ML libraries, such as TensorFlow, are not available natively in all programming languages and software package ecosystems. Hence, developers who wish to use an ML library which is not available in their programming language or ecosystem of choice, may need to resort to using a so-called binding library (or binding). Bindings provide support across programming languages and package ecosystems for reusing a host library. For example, the Keras .NET binding provides support for the Keras library in the NuGet (.NET) ecosystem even though the Keras library was written in Python. In this paper, we collect 2,436 cross-ecosystem bindings for 546 ML libraries across 13 software package ecosystems by using an approach called BindFind, which can automatically identify bindings and link them to their host libraries. Furthermore, we conduct an in-depth study of 133 cross-ecosystem bindings and their development for 40 popular open source ML libraries. Our findings reveal that the majority of ML library bindings are maintained by the community, with npm being the most popular ecosystem for these bindings. Our study also indicates that most bindings cover only a limited range of the host library's releases, often experience considerable delays in supporting new releases, and have widespread technical lag. Our findings highlight key factors to consider for developers integrating bindings for ML libraries and open avenues for researchers to further investigate bindings in software package ecosystems.
Authors: Felix Weitk\"amper
Abstract: Dependencies on the relative frequency of a state in the domain are common when modelling probabilistic dependencies on relational data. For instance, the likelihood of a school closure during an epidemic might depend on the proportion of infected pupils exceeding a threshold. Often, rather than depending on discrete thresholds, dependencies are continuous: for instance, the likelihood of any one mosquito bite transmitting an illness depends on the proportion of carrier mosquitoes. Current approaches usually only consider probabilities over possible worlds rather than over domain elements themselves. An exception are the recently introduced lifted Bayesian networks for conditional probability logic, which express discrete dependencies on probabilistic data. We introduce functional lifted Bayesian networks, a formalism that explicitly incorporates continuous dependencies on relative frequencies into statistical relational artificial intelligence, and compare and contrast them with lifted Bayesian networks for conditional probability logic. Incorporating relative frequencies is not only beneficial to modelling; it also provides a more rigorous approach to learning problems where training and test or application domains have different sizes. To this end, we provide a representation of the asymptotic probability distributions induced by functional lifted Bayesian networks on domains of increasing sizes. Since that representation has well-understood scaling behaviour across domain sizes, it can be used to estimate parameters for a large domain consistently from randomly sampled subpopulations. Furthermore, we show that in parametric families of FLBN, convergence is uniform in the parameters, which ensures a meaningful dependence of the asymptotic probabilities on the parameters of the model.
Authors: Sanket Jantre, Shrijita Bhattacharya, Nathan M. Urban, Byung-Jun Yoon, Tapabrata Maiti, Prasanna Balaprakash, Sandeep Madireddy
Abstract: Deep ensembles have emerged as a powerful technique for improving predictive performance and enhancing model robustness across various applications by leveraging model diversity. However, traditional deep ensemble methods are often computationally expensive and rely on deterministic models, which may limit their flexibility. Additionally, while sparse subnetworks of dense models have shown promise in matching the performance of their dense counterparts and even enhancing robustness, existing methods for inducing sparsity typically incur training costs comparable to those of training a single dense model, as they either gradually prune the network during training or apply thresholding post-training. In light of these challenges, we propose an approach for sequential ensembling of dynamic Bayesian neural subnetworks that consistently maintains reduced model complexity throughout the training process while generating diverse ensembles in a single forward pass. Our approach involves an initial exploration phase to identify high-performing regions within the parameter space, followed by multiple exploitation phases that take advantage of the compactness of the sparse model. These exploitation phases quickly converge to different minima in the energy landscape, corresponding to high-performing subnetworks that together form a diverse and robust ensemble. We empirically demonstrate that our proposed approach outperforms traditional dense and sparse deterministic and Bayesian ensemble models in terms of prediction accuracy, uncertainty estimation, out-of-distribution detection, and adversarial robustness.
Authors: Paula Gradu, Tijana Zrnic, Yixin Wang, Michael I. Jordan
Abstract: Causal discovery and causal effect estimation are two fundamental tasks in causal inference. While many methods have been developed for each task individually, statistical challenges arise when applying these methods jointly: estimating causal effects after running causal discovery algorithms on the same data leads to "double dipping," invalidating the coverage guarantees of classical confidence intervals. To this end, we develop tools for valid post-causal-discovery inference. Across empirical studies, we show that a naive combination of causal discovery and subsequent inference algorithms leads to highly inflated miscoverage rates; on the other hand, applying our method provides reliable coverage while achieving more accurate causal discovery than data splitting.
Authors: Zalan Fabian, Berk Tinaz, Mahdi Soltanolkotabi
Abstract: Diffusion models have established new state of the art in a multitude of computer vision tasks, including image restoration. Diffusion-based inverse problem solvers generate reconstructions of exceptional visual quality from heavily corrupted measurements. However, in what is widely known as the perception-distortion trade-off, the price of perceptually appealing reconstructions is often paid in declined distortion metrics, such as PSNR. Distortion metrics measure faithfulness to the observation, a crucial requirement in inverse problems. In this work, we propose a novel framework for inverse problem solving, namely we assume that the observation comes from a stochastic degradation process that gradually degrades and noises the original clean image. We learn to reverse the degradation process in order to recover the clean image. Our technique maintains consistency with the original measurement throughout the reverse process, and allows for great flexibility in trading off perceptual quality for improved distortion metrics and sampling speedup via early-stopping. We demonstrate the efficiency of our method on different high-resolution datasets and inverse problems, achieving great improvements over other state-of-the-art diffusion-based methods with respect to both perceptual and distortion metrics.
Authors: Emre K{\i}c{\i}man, Robert Ness, Amit Sharma, Chenhao Tan
Abstract: The causal capabilities of large language models (LLMs) are a matter of significant debate, with critical implications for the use of LLMs in societally impactful domains such as medicine, science, law, and policy. We conduct a "behavorial" study of LLMs to benchmark their capability in generating causal arguments. Across a wide range of tasks, we find that LLMs can generate text corresponding to correct causal arguments with high probability, surpassing the best-performing existing methods. Algorithms based on GPT-3.5 and 4 outperform existing algorithms on a pairwise causal discovery task (97%, 13 points gain), counterfactual reasoning task (92%, 20 points gain) and event causality (86% accuracy in determining necessary and sufficient causes in vignettes). We perform robustness checks across tasks and show that the capabilities cannot be explained by dataset memorization alone, especially since LLMs generalize to novel datasets that were created after the training cutoff date. That said, LLMs exhibit unpredictable failure modes, and we discuss the kinds of errors that may be improved and what are the fundamental limits of LLM-based answers. Overall, by operating on the text metadata, LLMs bring capabilities so far understood to be restricted to humans, such as using collected knowledge to generate causal graphs or identifying background causal context from natural language. As a result, LLMs may be used by human domain experts to save effort in setting up a causal analysis, one of the biggest impediments to the widespread adoption of causal methods. Given that LLMs ignore the actual data, our results also point to a fruitful research direction of developing algorithms that combine LLMs with existing causal techniques. Code and datasets are available at https://github.com/py-why/pywhy-llm.
Authors: Jaehun Jung, Peter West, Liwei Jiang, Faeze Brahman, Ximing Lu, Jillian Fisher, Taylor Sorensen, Yejin Choi
Abstract: We present Impossible Distillation, a novel framework for paraphrasing and sentence summarization, that distills a high-quality dataset and model from a low-quality teacher that itself cannot perform these tasks. Unlike prior works that rely on an extreme-scale teacher model (e.g., GPT3) or task-specific architecture, we hypothesize and verify the paraphrastic proximity intrinsic to pre-trained LMs (e.g., GPT2), where paraphrases occupy a proximal subspace in the LM distribution. By identifying and distilling generations from these subspaces, Impossible Distillation produces a high-quality dataset and model even from GPT2-scale LMs. We evaluate our method on multiple benchmarks spanning unconstrained / syntax-controlled paraphrase generation and sentence summarization. Our model with 770M parameters consistently outperforms strong baselines, including models distilled from ChatGPT, and sometimes, even ChatGPT itself. Also, we find that our distilled dataset from 1.5B LMs exhibits higher diversity and fidelity than up to 13 times larger datasets.
Authors: Johannes Liebenow, Yara Sch\"utt, Tanya Braun, Marcel Gehrke, Florian Thaeter, Esfandiar Mohammadi
Abstract: Clustering is an important tool for data exploration where the goal is to subdivide a data set into disjoint clusters that fit well into the underlying data structure. When dealing with sensitive data, privacy-preserving algorithms aim to approximate the non-private baseline while minimising the leakage of sensitive information. State-of-the-art privacy-preserving clustering algorithms tend to output clusters that are good in terms of the standard metrics, inertia, silhouette score, and clustering accuracy, however, the clustering result strongly deviates from the non-private KMeans baseline. In this work, we present a privacy-preserving clustering algorithm called DPM that recursively separates a data set into clusters based on a geometrical clustering approach. In addition, DPM estimates most of the data-dependent hyper-parameters in a privacy-preserving way. We prove that DPM preserves Differential Privacy and analyse the utility guarantees of DPM. Finally, we conduct an extensive empirical evaluation for synthetic and real-life data sets. We show that DPM achieves state-of-the-art utility on the standard clustering metrics and yields a clustering result much closer to that of the popular non-private KMeans algorithm without requiring the number of classes.
Authors: Sophie Starck, Yadunandan Vivekanand Kini, Jessica Johanna Maria Ritter, Rickmer Braren, Daniel Rueckert, Tamara Mueller
Abstract: Age prediction is an important part of medical assessments and research. It can aid in detecting diseases as well as abnormal ageing by highlighting potential discrepancies between chronological and biological age. To improve understanding of age-related changes in various body parts, we investigate the ageing of the human body on a large scale by using whole-body 3D images. We utilise the Grad-CAM method to determine the body areas most predictive of a person's age. In order to expand our analysis beyond individual subjects, we employ registration techniques to generate population-wide importance maps that show the most predictive areas in the body for a whole cohort of subjects. We show that the investigation of the full 3D volume of the whole body and the population-wide analysis can give important insights into which body parts play the most important roles in predicting a person's age. Our findings reveal three primary areas of interest: the spine, the autochthonous back muscles, and the cardiac region, which exhibits the highest importance. Finally, we investigate differences between subjects that show accelerated and decelerated ageing.
Authors: Ying Lu, Pei Shi, Xiao-Han Wang, Jie Hu, Shi-Ju Ran
Abstract: Entanglement propagation provides a key routine to understand quantum many-body dynamics in and out of equilibrium. The entanglement entropy (EE) usually approaches to a sub-saturation known as the Page value $\tilde{S}_{P} =\tilde{S} - dS$ (with $\tilde{S}$ the maximum of EE and $dS$ the Page correction) in, e.g., the random unitary evolutions. The ballistic spreading of EE usually appears in the early time and will be deviated far before the Page value is reached. In this work, we uncover that the magnetic field that maximizes the EE robustly induces persistent ballistic spreading of entanglement in quantum spin chains. The linear growth of EE is demonstrated to persist till the maximal $\tilde{S}$ (along with a flat entanglement spectrum) is reached. The robustness of ballistic spreading and the enhancement of EE under such an optimal control are demonstrated, considering particularly perturbing the initial state by random pure states (RPS's). These are argued as the results from the endomorphism of the time evolution under such an entanglement-enhancing optimal control for the RPS's.
Authors: Iknoor Singh, Carolina Scarton, Xingyi Song, Kalina Bontcheva
Abstract: Finding previously debunked narratives involves identifying claims that have already undergone fact-checking. The issue intensifies when similar false claims persist in multiple languages, despite the availability of debunks for several months in another language. Hence, automatically finding debunks (or fact-checks) in multiple languages is crucial to make the best use of scarce fact-checkers' resources. Mainly due to the lack of readily available data, this is an understudied problem, particularly when considering the cross-lingual scenario, i.e. the retrieval of debunks in a language different from the language of the online post being checked. This study introduces cross-lingual debunked narrative retrieval and addresses this research gap by: (i) creating Multilingual Misinformation Tweets (MMTweets): a dataset that stands out, featuring cross-lingual pairs, images, human annotations, and fine-grained labels, making it a comprehensive resource compared to its counterparts; (ii) conducting an extensive experiment to benchmark state-of-the-art cross-lingual retrieval models and introducing multistage retrieval methods tailored for the task; and (iii) comprehensively evaluating retrieval models for their cross-lingual and cross-dataset transfer capabilities within MMTweets, and conducting a retrieval latency analysis. We find that MMTweets presents challenges for cross-lingual debunked narrative retrieval, highlighting areas for improvement in retrieval models. Nonetheless, the study provides valuable insights for creating MMTweets datasets and optimising debunked narrative retrieval models to empower fact-checking endeavours. The dataset and annotation codebook are publicly available at https://doi.org/10.5281/zenodo.10637161.
Authors: Zalan Fabian, Berk Tinaz, Mahdi Soltanolkotabi
Abstract: Inverse problems arise in a multitude of applications, where the goal is to recover a clean signal from noisy and possibly (non)linear observations. The difficulty of a reconstruction problem depends on multiple factors, such as the ground truth signal structure, the severity of the degradation and the complex interactions between the above. This results in natural sample-by-sample variation in the difficulty of a reconstruction problem. Our key observation is that most existing inverse problem solvers lack the ability to adapt their compute power to the difficulty of the reconstruction task, resulting in subpar performance and wasteful resource allocation. We propose a novel method, $\textit{severity encoding}$, to estimate the degradation severity of corrupted signals in the latent space of an autoencoder. We show that the estimated severity has strong correlation with the true corruption level and can provide useful hints on the difficulty of reconstruction problems on a sample-by-sample basis. Furthermore, we propose a reconstruction method based on latent diffusion models that leverages the predicted degradation severities to fine-tune the reverse diffusion sampling trajectory and thus achieve sample-adaptive inference times. Our framework, Flash-Diffusion, acts as a wrapper that can be combined with any latent diffusion-based baseline solver, imbuing it with sample-adaptivity and acceleration. We perform experiments on both linear and nonlinear inverse problems and demonstrate that our technique greatly improves the performance of the baseline solver and achieves up to $10\times$ acceleration in mean sampling speed.
Authors: Rohit Agarwal, Aman Sinha, Ayan Vishwakarma, Xavier Coubez, Marianne Clausel, Mathieu Constant, Alexander Horsch, Dilip K. Prasad
Abstract: Modeling irregularly-sampled time series (ISTS) is challenging because of missing values. Most existing methods focus on handling ISTS by converting irregularly sampled data into regularly sampled data via imputation. These models assume an underlying missing mechanism, which may lead to unwanted bias and sub-optimal performance. We present SLAN (Switch LSTM Aggregate Network), which utilizes a group of LSTMs to model ISTS without imputation, eliminating the assumption of any underlying process. It dynamically adapts its architecture on the fly based on the measured sensors using switches. SLAN exploits the irregularity information to explicitly capture each sensor's local summary and maintains a global summary state throughout the observational period. We demonstrate the efficacy of SLAN on two public datasets, namely, MIMIC-III, and Physionet 2012.
Authors: Yoav Noah, Nir Shlezinger
Abstract: Distributed optimization is a fundamental framework for collaborative inference and decision making in decentralized multi-agent systems. The operation is modeled as the joint minimization of a shared objective which typically depends on observations gathered locally by each agent. Distributed optimization algorithms, such as the common D-ADMM, tackle this task by iteratively combining local computations and message exchanges. One of the main challenges associated with distributed optimization, and particularly with D-ADMM, is that it requires a large number of communications, i.e., messages exchanged between the agents, to reach consensus. This can make D-ADMM costly in power, latency, and channel resources. In this work we propose unfolded D-ADMM, which follows the emerging deep unfolding methodology to enable D-ADMM to operate reliably with a predefined and small number of messages exchanged by each agent. Unfolded D-ADMM fully preserves the operation of D-ADMM, while leveraging data to tune the hyperparameters of each iteration of the algorithm. These hyperparameters can either be agent-specific, aiming at achieving the best performance within a fixed number of iterations over a given network, or shared among the agents, allowing to learn to distributedly optimize over different networks. For both settings, our unfolded D-ADMM operates with limited communications, while preserving the interpretability and flexibility of the original D-ADMM algorithm. We specialize unfolded D-ADMM for two representative settings: a distributed estimation task, considering a sparse recovery setup, and a distributed learning scenario, where multiple agents collaborate in learning a machine learning model. Our numerical results demonstrate that the proposed approach dramatically reduces the number of communications utilized by D-ADMM, without compromising on its performance.
Authors: Rasoul Amirzadeh, Dhananjay Thiruvady, Asef Nazari, Mong Shan Ee
Abstract: Despite advances in artificial intelligence-enhanced trading methods, developing a profitable automated trading system remains challenging in the rapidly evolving cryptocurrency market. This research focuses on developing a reinforcement learning (RL) framework to tackle the complexities of trading five prominent altcoins: Binance Coin, Ethereum, Litecoin, Ripple, and Tether. To this end, we present the CausalReinforceNet~(CRN) framework, which integrates both Bayesian and dynamic Bayesian network techniques to empower the RL agent in trade decision-making. We develop two agents using the framework based on distinct RL algorithms to analyse performance compared to the Buy-and-Hold benchmark strategy and a baseline RL model. The results indicate that our framework surpasses both models in profitability, highlighting CRN's consistent superiority, although the level of effectiveness varies across different cryptocurrencies.
Authors: Francisco Eiras, Kemal Oksuz, Adel Bibi, Philip H. S. Torr, Puneet K. Dokania
Abstract: Referring Image Segmentation (RIS) - the problem of identifying objects in images through natural language sentences - is a challenging task currently mostly solved through supervised learning. However, while collecting referred annotation masks is a time-consuming process, the few existing weakly-supervised and zero-shot approaches fall significantly short in performance compared to fully-supervised learning ones. To bridge the performance gap without mask annotations, we propose a novel weakly-supervised framework that tackles RIS by decomposing it into three steps: obtaining instance masks for the object mentioned in the referencing instruction (segment), using zero-shot learning to select a potentially correct mask for the given instruction (select), and bootstrapping a model which allows for fixing the mistakes of zero-shot selection (correct). In our experiments, using only the first two steps (zero-shot segment and select) outperforms other zero-shot baselines by as much as 16.5%, while our full method improves upon this much stronger baseline and sets the new state-of-the-art for weakly-supervised RIS, reducing the gap between the weakly-supervised and fully-supervised methods in some cases from around 33% to as little as 7%. Code is available at https://github.com/fgirbal/segment-select-correct.
Authors: Su Wang, Roberto Morabito, Seyyedali Hosseinalipour, Mung Chiang, Christopher G. Brinton
Abstract: The conventional federated learning (FedL) architecture distributes machine learning (ML) across worker devices by having them train local models that are periodically aggregated by a server. FedL ignores two important characteristics of contemporary wireless networks, however: (i) the network may contain heterogeneous communication/computation resources, and (ii) there may be significant overlaps in devices' local data distributions. In this work, we develop a novel optimization methodology that jointly accounts for these factors via intelligent device sampling complemented by device-to-device (D2D) offloading. Our optimization methodology aims to select the best combination of sampled nodes and data offloading configuration to maximize FedL training accuracy while minimizing data processing and D2D communication resource consumption subject to realistic constraints on the network topology and device capabilities. Theoretical analysis of the D2D offloading subproblem leads to new FedL convergence bounds and an efficient sequential convex optimizer. Using these results, we develop a sampling methodology based on graph convolutional networks (GCNs) which learns the relationship between network attributes, sampled nodes, and D2D data offloading to maximize FedL accuracy. Through evaluation on popular datasets and real-world network measurements from our edge testbed, we find that our methodology outperforms popular device sampling methodologies from literature in terms of ML model performance, data processing overhead, and energy consumption.
Authors: Shafagh Keyvanian, Michelle J. Johnson, Nadia Figueroa
Abstract: A realistic human kinematic model that satisfies anatomical constraints is essential for human-robot interaction, biomechanics and robot-assisted rehabilitation. Modeling realistic joint constraints, however, is challenging as human arm motion is constrained by joint limits, inter- and intra-joint dependencies, self-collisions, individual capabilities and muscular or neurological constraints which are difficult to represent. Hence, physicians and researchers have relied on simple box-constraints, ignoring important anatomical factors. In this paper, we propose a data-driven method to learn realistic anatomically constrained upper-limb range of motion (RoM) boundaries from motion capture data. This is achieved by fitting a one-class support vector machine to a dataset of upper-limb joint space exploration motions with an efficient hyper-parameter tuning scheme. Our approach outperforms similar works focused on valid RoM learning. Further, we propose an impairment index (II) metric that offers a quantitative assessment of capability/impairment when comparing healthy and impaired arms. We validate the metric on healthy subjects physically constrained to emulate hemiplegia and different disability levels as stroke patients.
Authors: Shijun Liang, Van Hoang Minh Nguyen, Jinghan Jia, Ismail Alkhouri, Sijia Liu, Saiprasad Ravishankar
Abstract: As the popularity of deep learning (DL) in the field of magnetic resonance imaging (MRI) continues to rise, recent research has indicated that DL-based MRI reconstruction models might be excessively sensitive to minor input disturbances, including worst-case additive perturbations. This sensitivity often leads to unstable, aliased images. This raises the question of how to devise DL techniques for MRI reconstruction that can be robust to train-test variations. To address this problem, we propose a novel image reconstruction framework, termed Smoothed Unrolling (SMUG), which advances a deep unrolling-based MRI reconstruction model using a randomized smoothing (RS)-based robust learning approach. RS, which improves the tolerance of a model against input noises, has been widely used in the design of adversarial defense approaches for image classification tasks. Yet, we find that the conventional design that applies RS to the entire DL-based MRI model is ineffective. In this paper, we show that SMUG and its variants address the above issue by customizing the RS process based on the unrolling architecture of a DL-based MRI reconstruction model. Compared to the vanilla RS approach, we show that SMUG improves the robustness of MRI reconstruction with respect to a diverse set of instability sources, including worst-case and random noise perturbations to input measurements, varying measurement sampling rates, and different numbers of unrolling steps. Furthermore, we theoretically analyze the robustness of our method in the presence of perturbations.
Authors: Kaijie Zhu, Qinlin Zhao, Hao Chen, Jindong Wang, Xing Xie
Abstract: The evaluation of large language models (LLMs) is crucial to assess their performance and mitigate potential security risks. In this paper, we introduce PromptBench, a unified library to evaluate LLMs. It consists of several key components that are easily used and extended by researchers: prompt construction, prompt engineering, dataset and model loading, adversarial prompt attack, dynamic evaluation protocols, and analysis tools. PromptBench is designed to be an open, general, and flexible codebase for research purposes that can facilitate original study in creating new benchmarks, deploying downstream applications, and designing new evaluation protocols. The code is available at: https://github.com/microsoft/promptbench and will be continuously supported.
Authors: Serina Chang, Frederic Koehler, Zhaonan Qu, Jure Leskovec, Johan Ugander
Abstract: A common network inference problem, arising from real-world data constraints, is how to infer a dynamic network from its time-aggregated adjacency matrix and time-varying marginals (i.e., row and column sums). Prior approaches to this problem have repurposed the classic iterative proportional fitting (IPF) procedure, also known as Sinkhorn's algorithm, with promising empirical results. However, the statistical foundation for using IPF has not been well understood: under what settings does IPF provide principled estimation of a dynamic network from its marginals, and how well does it estimate the network? In this work, we establish such a setting, by identifying a generative network model whose maximum likelihood estimates are recovered by IPF. Our model both reveals implicit assumptions on the use of IPF in such settings and enables new analyses, such as structure-dependent error bounds on IPF's parameter estimates. When IPF fails to converge on sparse network data, we introduce a principled algorithm that guarantees IPF converges under minimal changes to the network structure. Finally, we conduct experiments with synthetic and real-world data, which demonstrate the practical value of our theoretical and algorithmic contributions.
Authors: Guo Liang, Guangwu Liu, Kun Zhang
Abstract: Finite difference (FD) approximation is a classic approach to stochastic gradient estimation when only noisy function realizations are available. In this paper, we first provide a sample-driven method via the bootstrap technique to estimate the optimal perturbation, and then propose an efficient FD estimator based on correlated samples at the estimated optimal perturbation. Furthermore, theoretical analyses of both the perturbation estimator and the FD estimator reveal that, {\it surprisingly}, the correlation enables the proposed FD estimator to achieve a reduction in variance and, in some cases, a decrease in bias compared to the traditional optimal FD estimator. Numerical results confirm the efficiency of our estimators and align well with the theory presented, especially in scenarios with small sample sizes. Finally, we apply the estimator to solve derivative-free optimization (DFO) problems, and numerical studies show that DFO problems with 100 dimensions can be effectively solved.
Authors: Yifan Liu, Chenchen Kuai, Haoxuan Ma, Xishun Liao, Brian Yueshuai He, Jiaqi Ma
Abstract: Human travel trajectory mining is crucial for transportation systems, enhancing route optimization, traffic management, and the study of human travel patterns. Previous rule-based approaches without the integration of semantic information show a limitation in both efficiency and accuracy. Semantic information, such as activity types inferred from Points of Interest (POI) data, can significantly enhance the quality of trajectory mining. However, integrating these insights is challenging, as many POIs have incomplete feature information, and current learning-based POI algorithms require the integrity of datasets to do the classification. In this paper, we introduce a novel pipeline for human travel trajectory mining. Our approach first leverages the strong inferential and comprehension capabilities of large language models (LLMs) to annotate POI with activity types and then uses a Bayesian-based algorithm to infer activity for each stay point in a trajectory. In our evaluation using the OpenStreetMap (OSM) POI dataset, our approach achieves a 93.4% accuracy and a 96.1% F-1 score in POI classification, and a 91.7% accuracy with a 92.3% F-1 score in activity inference.
Authors: Edward Hall\'e-Hannan, Charles Audet, Youssef Diouane, S\'ebastien Le Digabel, Paul Saves
Abstract: Heterogeneous datasets emerge in various machine learning and optimization applications that feature different input sources, types or formats. Most models or methods do not natively tackle heterogeneity. Hence, such datasets are often partitioned into smaller and simpler ones, which may limit the generalizability or performance, especially if data is limited. The first main contribution of this work is a modeling framework that generalizes hierarchical, tree-structured, variable-size or conditional search frameworks. The framework models mixed-variable domains in which variables may be continuous, integer, or categorical, with some identified as meta when they influence the structure of the problem. The second main contribution is a novel distance that compares any pair of mixed-variable points that do not share the same variables, allowing to use whole heterogeneous datasets that reside in mixed-variable domains with meta variables. The contributions are illustrated on several regression experiments, in which the performance of a multilayer perceptron with respect to its hyperparameters is modeled.
Authors: Nathan Corecco, Giorgio Piatti, Luca A. Lanzend\"orfer, Flint Xiaofeng Fan, Roger Wattenhofer
Abstract: Reinforcement learning (RL) has gained popularity in the realm of recommender systems due to its ability to optimize long-term rewards and guide users in discovering relevant content. However, the successful implementation of RL in recommender systems is challenging because of several factors, including the limited availability of online data for training on-policy methods. This scarcity requires expensive human interaction for online model training. Furthermore, the development of effective evaluation frameworks that accurately reflect the quality of models remains a fundamental challenge in recommender systems. To address these challenges, we propose a comprehensive framework for synthetic environments that simulate human behavior by harnessing the capabilities of large language models (LLMs). We complement our framework with in-depth ablation studies and demonstrate its effectiveness with experiments on movie and book recommendations. Using LLMs as synthetic users, this work introduces a modular and novel framework to train RL-based recommender systems. The software, including the RL environment, is publicly available on GitHub.
Authors: Hao Li, Yuping Wu, Viktor Schlegel, Riza Batista-Navarro, Tharindu Madusanka, Iqra Zahid, Jiayan Zeng, Xiaochi Wang, Xinran He, Yizhi Li, Goran Nenadic
Abstract: With the recent advances of large language models (LLMs), it is no longer infeasible to build an automated debate system that helps people to synthesise persuasive arguments. Previous work attempted this task by integrating multiple components. In our work, we introduce an argument mining dataset that captures the end-to-end process of preparing an argumentative essay for a debate, which covers the tasks of claim and evidence identification (Task 1 ED), evidence convincingness ranking (Task 2 ECR), argumentative essay summarisation and human preference ranking (Task 3 ASR) and metric learning for automated evaluation of resulting essays, based on human feedback along argument quality dimensions (Task 4 SQE). Our dataset contains 14k examples of claims that are fully annotated with the various properties supporting the aforementioned tasks. We evaluate multiple generative baselines for each of these tasks, including representative LLMs. We find, that while they show promising results on individual tasks in our benchmark, their end-to-end performance on all four tasks in succession deteriorates significantly, both in automated measures as well as in human-centred evaluation. This challenge presented by our proposed dataset motivates future research on end-to-end argument mining and summarisation. The repository of this project is available at https://github.com/HaoBytes/ArgSum-Datatset
Authors: Victor Yon, Bastien Galaup, Claude Rohrbacher, Joffrey Rivard, Cl\'ement Godfrin, Ruoyu Li, Stefan Kubicek, Kristiaan De Greve, Louis Gaudreau, Eva Dupont-Ferrier, Yann Beilliard, Roger G. Melko, Dominique Drouin
Abstract: This study presents a machine-learning-based procedure to automate the charge tuning of semiconductor spin qubits with minimal human intervention, addressing one of the significant challenges in scaling up quantum dot technologies. This method exploits artificial neural networks to identify noisy transition lines in stability diagrams, guiding a robust exploration strategy leveraging neural networks' uncertainty estimations. Tested across three distinct offline experimental datasets representing different single quantum dot technologies, the approach achieves over 99% tuning success rate in optimal cases, where more than 10% of the success is directly attributable to uncertainty exploitation. The challenging constraints of small training sets containing high diagram-to-diagram variability allowed us to evaluate the capabilities and limits of the proposed procedure.
Authors: Lechen Feng, Yuan-Hua Ni, Xuebo Zhang
Abstract: This study investigates a decentralized linear-quadratic optimal control problem, and several approximate separable constrained optimization problems are formulated for the first time based on the selection of sparsity promoting functions. First, for the optimization problem with weighted $\ell_1$ sparsity promoting function, a two-timescale algorithm is adopted that is based on the BSUM (Block Successive Upper-bound Minimization) framework and a differential equation solver. Second, a piecewise quadratic sparsity promoting function is introduced, and the induced optimization problem demonstrates an accelerated convergence rate by performing the same two-timescale algorithm. Finally, the optimization problem with $\ell_0$ sparsity promoting function is considered that is nonconvex and discontinuous, and can be approximated by successive coordinatewise convex optimization problems.
Authors: Zhepei Wei, Wei-Lin Chen, Yu Meng
Abstract: Retrieval-augmented generation (RAG) has shown promising potential to enhance the accuracy and factuality of language models (LMs). However, imperfect retrievers or noisy corpora can introduce misleading or even erroneous information to the retrieved contents, posing a significant challenge to the generation quality. Existing RAG methods typically address this challenge by directly predicting final answers despite potentially noisy inputs, resulting in an implicit denoising process that is difficult to interpret and verify. On the other hand, the acquisition of explicit denoising supervision is often costly, involving significant human efforts. In this work, we propose InstructRAG, where LMs explicitly learn the denoising process through self-synthesized rationales -- First, we instruct the LM to explain how the ground-truth answer is derived from retrieved documents. Then, these rationales can be used either as demonstrations for in-context learning of explicit denoising or as supervised fine-tuning data to train the model. Compared to standard RAG approaches, InstructRAG requires no additional supervision, allows for easier verification of the predicted answers, and effectively improves generation accuracy. Experiments show InstructRAG consistently outperforms existing RAG methods in both training-free and trainable scenarios, achieving a relative improvement of 8.3% over the best baseline method on average across five knowledge-intensive benchmarks. Extensive analysis indicates that InstructRAG scales well with increased numbers of retrieved documents and consistently exhibits robust denoising ability even in out-of-domain datasets, demonstrating strong generalizability.
Authors: Hong Jun Jeon, Benjamin Van Roy
Abstract: The staggering progress of machine learning in the past decade has been a sight to behold. In retrospect, it is both remarkable and unsettling that these milestones were achievable with little to no rigorous theory to guide experimentation. Despite this fact, practitioners have been able to guide their future experimentation via observations from previous large-scale empirical investigations. However, alluding to Plato's Allegory of the cave, it is likely that the observations which form the field's notion of reality are but shadows representing fragments of that reality. In this work, we propose a theoretical framework which attempts to answer what exists outside of the cave. To the theorist, we provide a framework which is mathematically rigorous and leaves open many interesting ideas for future exploration. To the practitioner, we provide a framework whose results are very intuitive, general, and which will help form principles to guide future investigations. Concretely, we provide a theoretical framework rooted in Bayesian statistics and Shannon's information theory which is general enough to unify the analysis of many phenomena in machine learning. Our framework characterizes the performance of an optimal Bayesian learner, which considers the fundamental limits of information. Throughout this work, we derive very general theoretical results and apply them to derive insights specific to settings ranging from data which is independently and identically distributed under an unknown distribution, to data which is sequential, to data which exhibits hierarchical structure amenable to meta-learning. We conclude with a section dedicated to characterizing the performance of misspecified algorithms. These results are exciting and particularly relevant as we strive to overcome increasingly difficult machine learning challenges in this endlessly complex world.
Authors: Ye Jiang, Yimin Wang
Abstract: Large visual-language models (LVLMs) exhibit exceptional performance in visual-language reasoning across diverse cross-modal benchmarks. Despite these advances, recent research indicates that Large Language Models (LLMs), like GPT-3.5-turbo, underachieve compared to well-trained smaller models, such as BERT, in Fake News Detection (FND), prompting inquiries into LVLMs' efficacy in FND tasks. Although performance could improve through fine-tuning LVLMs, the substantial parameters and requisite pre-trained weights render it a resource-heavy endeavor for FND applications. This paper initially assesses the FND capabilities of two notable LVLMs, CogVLM and GPT4V, in comparison to a smaller yet adeptly trained CLIP model in a zero-shot context. The findings demonstrate that LVLMs can attain performance competitive with that of the smaller model. Next, we integrate standard in-context learning (ICL) with LVLMs, noting improvements in FND performance, though limited in scope and consistency. To address this, we introduce the \textbf{I}n-context \textbf{M}ultimodal \textbf{F}ake \textbf{N}ews \textbf{D}etection (IMFND) framework, enriching in-context examples and test inputs with predictions and corresponding probabilities from a well-trained smaller model. This strategic integration directs the LVLMs' focus towards news segments associated with higher probabilities, thereby improving their analytical accuracy. The experimental results suggest that the IMFND framework significantly boosts the FND efficiency of LVLMs, achieving enhanced accuracy over the standard ICL approach across three publicly available FND datasets.
Authors: Aobo Xu, Bingyu Chang, Qingpeng Liu, Ling Jian
Abstract: Identifying significant references within the complex interrelations of a citation knowledge graph is challenging, which encompasses connections through citations, authorship, keywords, and other relational attributes. The Paper Source Tracing (PST) task seeks to automate the identification of pivotal references for given scholarly articles utilizing advanced data mining techniques. In the KDD CUP OAG-Challenge PST track, we design a recommendation-based framework tailored for the PST task. This framework employs the Neural Collaborative Filtering (NCF) model to generate final predictions. To process the textual attributes of the papers and extract input features for the model, we utilize SciBERT, a pre-trained language model. According to the experimental results, our method achieved a score of 0.37814 on the Mean Average Precision (MAP) metric, outperforming baseline models and ranking 11th among all participating teams. The source code is publicly available at https://github.com/MyLove-XAB/KDDCupFinal.
Authors: Mahadev Prasad Panda, Matteo Tiezzi, Martina Vilas, Gemma Roig, Bjoern M. Eskofier, Dario Zanca
Abstract: We introduce Foveation-based Explanations (FovEx), a novel human-inspired visual explainability (XAI) method for Deep Neural Networks. Our method achieves state-of-the-art performance on both transformer (on 4 out of 5 metrics) and convolutional models (on 3 out of 5 metrics), demonstrating its versatility. Furthermore, we show the alignment between the explanation map produced by FovEx and human gaze patterns (+14\% in NSS compared to RISE, +203\% in NSS compared to gradCAM), enhancing our confidence in FovEx's ability to close the interpretation gap between humans and machines.
Authors: Julen Urain, Ajay Mandlekar, Yilun Du, Mahi Shafiullah, Danfei Xu, Katerina Fragkiadaki, Georgia Chalvatzaki, Jan Peters
Abstract: Learning from Demonstrations, the field that proposes to learn robot behavior models from data, is gaining popularity with the emergence of deep generative models. Although the problem has been studied for years under names such as Imitation Learning, Behavioral Cloning, or Inverse Reinforcement Learning, classical methods have relied on models that don't capture complex data distributions well or don't scale well to large numbers of demonstrations. In recent years, the robot learning community has shown increasing interest in using deep generative models to capture the complexity of large datasets. In this survey, we aim to provide a unified and comprehensive review of the last year's progress in the use of deep generative models in robotics. We present the different types of models that the community has explored, such as energy-based models, diffusion models, action value maps, or generative adversarial networks. We also present the different types of applications in which deep generative models have been used, from grasp generation to trajectory generation or cost learning. One of the most important elements of generative models is the generalization out of distributions. In our survey, we review the different decisions the community has made to improve the generalization of the learned models. Finally, we highlight the research challenges and propose a number of future directions for learning deep generative models in robotics.
Authors: Alejandro Gomez-Leos, Oscar L\'opez
Abstract: We revisit the sample and computational complexity of completing a rank-1 tensor in $\otimes_{i=1}^{N} \mathbb{R}^{d}$, given a uniformly sampled subset of its entries. We present a characterization of the problem (i.e. nonzero entries) which admits an algorithm amounting to Gauss-Jordan on a pair of random linear systems. For example, when $N = \Theta(1)$, we prove it uses no more than $m = O(d^2 \log d)$ samples and runs in $O(md^2)$ time. Moreover, we show any algorithm requires $\Omega(d\log d)$ samples. By contrast, existing upper bounds on the sample complexity are at least as large as $d^{1.5} \mu^{\Omega(1)} \log^{\Omega(1)} d$, where $\mu$ can be $\Theta(d)$ in the worst case. Prior work obtained these looser guarantees in higher rank versions of our problem, and tend to involve more complicated algorithms.
Authors: Hossein Jafarinia, Alireza Alipanah, Danial Hamdi, Saeed Razavi, Nahal Mirzaie, Mohammad Hossein Rohban
Abstract: Whole Slide Image (WSI) classification with multiple instance learning (MIL) in digital pathology faces significant computational challenges. Current methods mostly rely on extensive self-supervised learning (SSL) for satisfactory performance, requiring long training periods and considerable computational resources. At the same time, no pre-training affects performance due to domain shifts from natural images to WSIs. We introduce Snuffy architecture, a novel MIL-pooling method based on sparse transformers that mitigates performance loss with limited pre-training and enables continual few-shot pre-training as a competitive option. Our sparsity pattern is tailored for pathology and is theoretically proven to be a universal approximator with the tightest probabilistic sharp bound on the number of layers for sparse transformers, to date. We demonstrate Snuffy's effectiveness on CAMELYON16 and TCGA Lung cancer datasets, achieving superior WSI and patch-level accuracies. The code is available on https://github.com/jafarinia/snuffy.
Authors: Cyril Shih-Huan Hsu, Danny De Vleeschauwer, Chrysa Papagianni
Abstract: When a network slice spans multiple domains, each domain must uphold the End-to-End (E2E) Service Level Agreement (SLA) associated with the slice. This requires decomposing the E2E SLA into partial SLAs for each domain. In a two-level network slicing management system with an E2E orchestrator and local controllers, we propose an online learning-decomposition framework that dynamically updates risk models using recent feedback. This approach utilizes online gradient descent and FIFO memory buffers to enhance stability and robustness. Our empirical study shows the proposed framework outperforms state-of-the-art static methods, offering more accurate and resilient SLA decomposition under varying conditions and sparse data.
Authors: Cho Yang
Abstract: CycleGAN was trained on SynthRAD Grand Challenge Dataset using the single-epoch modification (SEM) method proposed in this paper which is referred to as (CycleGAN-single) compared to the usual method of training CycleGAN on around 200 epochs (CycleGAN-multi). Model performance were evaluated qualitatively and quantitatively with quantitative performance metrics like PSNR, SSIM, MAE and MSE. The consideration of both quantitative and qualitative performance when evaluating a model is unique to certain image-translation tasks like medical imaging as detailed in this paper. Also, this paper shows that good quantitative performance does not always imply good qualitative performance and the converse is also not always True (i.e. good qualitative performance does not always imply good quantitative performance). This paper also proposes FQGA (Fast Paired Image-to-Image Translation Quarter-Generator Adversary) Model which has 1/4 the number of parameters compared to CycleGAN (when comparing their Generator Models). FQGA outperforms CycleGAN qualitatively and quantitatively even only after training on 20 epochs. Finally, using SEM method on FQGA allowed it to again outperform CycleGAN both quantitatively and qualitatively. These performance gains with fewer model parameters and time savings from running fewer epochs may also be applicable to other image-to-image translation tasks in Machine Learning apart from the Medical image-translation task discussed in this paper between Cone Beam Computed Tomography (CBCT) and Computed Tomography (CT) images.
Authors: Zitian Gao, Yihao Xiao
Abstract: In the Venture Capital(VC) industry, predicting the success of startups is challenging due to limited financial data and the need for subjective revenue forecasts. Previous methods based on time series analysis or deep learning often fall short as they fail to incorporate crucial inter-company relationships such as competition and collaboration. Regarding the issues, we propose a novel approach using GrahphRAG augmented time series model. With GraphRAG, time series predictive methods are enhanced by integrating these vital relationships into the analysis framework, allowing for a more dynamic understanding of the startup ecosystem in venture capital. Our experimental results demonstrate that our model significantly outperforms previous models in startup success predictions. To the best of our knowledge, our work is the first application work of GraphRAG.
Authors: Zhi Qiao, Linbin Han, Xiantong Zhen, Jia-Hong Gao, Zhen Qian
Abstract: In light of the inherent entailment relations between images and text, hyperbolic point vector embeddings, leveraging the hierarchical modeling advantages of hyperbolic space, have been utilized for visual semantic representation learning. However, point vector embedding approaches fail to address the issue of semantic uncertainty, where an image may have multiple interpretations, and text may refer to different images, a phenomenon particularly prevalent in the medical domain. Therefor, we propose \textbf{HYDEN}, a novel hyperbolic density embedding based image-text representation learning approach tailored for specific medical domain data. This method integrates text-aware local features alongside global features from images, mapping image-text features to density features in hyperbolic space via using hyperbolic pseudo-Gaussian distributions. An encapsulation loss function is employed to model the partial order relations between image-text density distributions. Experimental results demonstrate the interpretability of our approach and its superior performance compared to the baseline methods across various zero-shot tasks and different datasets.
Authors: Yuanjian Xu, Anxian Liu, Jianing Hao, Zhenzhuo Li, Shichang Meng, Guang Zhang
Abstract: Financial time series modeling is crucial for understanding and predicting market behaviors but faces challenges such as non-linearity, non-stationarity, and high noise levels. Traditional models struggle to capture complex patterns due to these issues, compounded by limitations in computational resources and model capacity. Inspired by the success of large language models in NLP, we introduce $\textbf{PLUTUS}$, a $\textbf{P}$re-trained $\textbf{L}$arge $\textbf{U}$nified $\textbf{T}$ransformer-based model that $\textbf{U}$nveils regularities in financial time $\textbf{S}$eries. PLUTUS uses an invertible embedding module with contrastive learning and autoencoder techniques to create an approximate one-to-one mapping between raw data and patch embeddings. TimeFormer, an attention based architecture, forms the core of PLUTUS, effectively modeling high-noise time series. We incorporate a novel attention mechanisms to capture features across both variable and temporal dimensions. PLUTUS is pre-trained on an unprecedented dataset of 100 billion observations, designed to thrive in noisy financial environments. To our knowledge, PLUTUS is the first open-source, large-scale, pre-trained financial time series model with over one billion parameters. It achieves state-of-the-art performance in various tasks, demonstrating strong transferability and establishing a robust foundational model for finance. Our research provides technical guidance for pre-training financial time series data, setting a new standard in the field.