new Most Influential Subset Selection: Challenges, Promises, and Beyond

Authors: Yuzheng Hu, Pingbang Hu, Han Zhao, Jiaqi W. Ma

Abstract: How can we attribute the behaviors of machine learning models to their training data? While the classic influence function sheds light on the impact of individual samples, it often fails to capture the more complex and pronounced collective influence of a set of samples. To tackle this challenge, we study the Most Influential Subset Selection (MISS) problem, which aims to identify a subset of training samples with the greatest collective influence. We conduct a comprehensive analysis of the prevailing approaches in MISS, elucidating their strengths and weaknesses. Our findings reveal that influence-based greedy heuristics, a dominant class of algorithms in MISS, can provably fail even in linear regression. We delineate the failure modes, including the errors of influence function and the non-additive structure of the collective influence. Conversely, we demonstrate that an adaptive version of these heuristics which applies them iteratively, can effectively capture the interactions among samples and thus partially address the issues. Experiments on real-world datasets corroborate these theoretical findings, and further demonstrate that the merit of adaptivity can extend to more complex scenarios such as classification tasks and non-linear neural networks. We conclude our analysis by emphasizing the inherent trade-off between performance and computational efficiency, questioning the use of additive metrics such as the linear datamodeling score, and offering a range of discussions.

new A Survey on Neural Architecture Search Based on Reinforcement Learning

Authors: Wenzhu Shao

Abstract: The automation of feature extraction of machine learning has been successfully realized by the explosive development of deep learning. However, the structures and hyperparameters of deep neural network architectures also make huge difference on the performance in different tasks. The process of exploring optimal structures and hyperparameters often involves a lot of tedious human intervene. As a result, a legitimate question is to ask for the automation of searching for optimal network structures and hyperparameters. The work of automation of exploring optimal hyperparameters is done by Hyperparameter Optimization. Neural Architecture Search is aimed to automatically find the best network structure given specific tasks. In this paper, we firstly introduced the overall development of Neural Architecture Search and then focus mainly on providing an overall and understandable survey about Neural Architecture Search works that are relevant with reinforcement learning, including improvements and variants based on the hope of satisfying more complex structures and resource-insufficient environment.

new Jump Diffusion-Informed Neural Networks with Transfer Learning for Accurate American Option Pricing under Data Scarcity

Authors: Qiguo Sun, Hanyue Huang, XiBei Yang, Yuwei Zhang

Abstract: Option pricing models, essential in financial mathematics and risk management, have been extensively studied and recently advanced by AI methodologies. However, American option pricing remains challenging due to the complexity of determining optimal exercise times and modeling non-linear payoffs resulting from stochastic paths. Moreover, the prevalent use of the Black-Scholes formula in hybrid models fails to accurately capture the discontinuity in the price process, limiting model performance, especially under scarce data conditions. To address these issues, this study presents a comprehensive framework for American option pricing consisting of six interrelated modules, which combine nonlinear optimization algorithms, analytical and numerical models, and neural networks to improve pricing performance. Additionally, to handle the scarce data challenge, this framework integrates the transfer learning through numerical data augmentation and a physically constrained, jump diffusion process-informed neural network to capture the leptokurtosis of the log return distribution. To increase training efficiency, a warm-up period using Bayesian optimization is designed to provide optimal data loss and physical loss coefficients. Experimental results of six case studies demonstrate the accuracy, convergence, physical effectiveness, and generalization of the framework. Moreover, the proposed model shows superior performance in pricing deep out-of-the-money options.

new Bridging OOD Detection and Generalization: A Graph-Theoretic View

Authors: Han Wang, Yixuan Li

Abstract: In the context of modern machine learning, models deployed in real-world scenarios often encounter diverse data shifts like covariate and semantic shifts, leading to challenges in both out-of-distribution (OOD) generalization and detection. Despite considerable attention to these issues separately, a unified framework for theoretical understanding and practical usage is lacking. To bridge the gap, we introduce a graph-theoretic framework to jointly tackle both OOD generalization and detection problems. By leveraging the graph formulation, data representations are obtained through the factorization of the graph's adjacency matrix, enabling us to derive provable error quantifying OOD generalization and detection performance. Empirical results showcase competitive performance in comparison to existing methods, thereby validating our theoretical underpinnings. Code is publicly available at https://github.com/deeplearning-wisc/graph-spectral-ood.

URLs: https://github.com/deeplearning-wisc/graph-spectral-ood.

new Trustworthy Text-to-Image Diffusion Models: A Timely and Focused Survey

Authors: Yi Zhang, Zhen Chen, Chih-Hong Cheng, Wenjie Ruan, Xiaowei Huang, Dezong Zhao, David Flynn, Siddartha Khastgir, Xingyu Zhao

Abstract: Text-to-Image (T2I) Diffusion Models (DMs) have garnered widespread attention for their impressive advancements in image generation. However, their growing popularity has raised ethical and social concerns related to key non-functional properties of trustworthiness, such as robustness, fairness, security, privacy, factuality, and explainability, similar to those in traditional deep learning (DL) tasks. Conventional approaches for studying trustworthiness in DL tasks often fall short due to the unique characteristics of T2I DMs, e.g., the multi-modal nature. Given the challenge, recent efforts have been made to develop new methods for investigating trustworthiness in T2I DMs via various means, including falsification, enhancement, verification \& validation and assessment. However, there is a notable lack of in-depth analysis concerning those non-functional properties and means. In this survey, we provide a timely and focused review of the literature on trustworthy T2I DMs, covering a concise-structured taxonomy from the perspectives of property, means, benchmarks and applications. Our review begins with an introduction to essential preliminaries of T2I DMs, and then we summarise key definitions/metrics specific to T2I tasks and analyses the means proposed in recent literature based on these definitions/metrics. Additionally, we review benchmarks and domain applications of T2I DMs. Finally, we highlight the gaps in current research, discuss the limitations of existing methods, and propose future research directions to advance the development of trustworthy T2I DMs. Furthermore, we keep up-to-date updates in this field to track the latest developments and maintain our GitHub repository at: https://github.com/wellzline/Trustworthy_T2I_DMs

URLs: https://github.com/wellzline/Trustworthy_T2I_DMs

new Task-recency bias strikes back: Adapting covariances in Exemplar-Free Class Incremental Learning

Authors: Grzegorz Rype\'s\'c, Sebastian Cygert, Tomasz Trzci\'nski, Bart{\l}omiej Twardowski

Abstract: Exemplar-Free Class Incremental Learning (EFCIL) tackles the problem of training a model on a sequence of tasks without access to past data. Existing state-of-the-art methods represent classes as Gaussian distributions in the feature extractor's latent space, enabling Bayes classification or training the classifier by replaying pseudo features. However, we identify two critical issues that compromise their efficacy when the feature extractor is updated on incremental tasks. First, they do not consider that classes' covariance matrices change and must be adapted after each task. Second, they are susceptible to a task-recency bias caused by dimensionality collapse occurring during training. In this work, we propose AdaGauss -- a novel method that adapts covariance matrices from task to task and mitigates the task-recency bias owing to the additional anti-collapse loss function. AdaGauss yields state-of-the-art results on popular EFCIL benchmarks and datasets when training from scratch or starting from a pre-trained backbone. The code is available at: https://github.com/grypesc/AdaGauss.

URLs: https://github.com/grypesc/AdaGauss.

new Using dynamic loss weighting to boost improvements in forecast stability

Authors: Daan Caljon, Jeff Vercauteren, Simon De Vos, Wouter Verbeke, Jente Van Belle

Abstract: Rolling origin forecast instability refers to variability in forecasts for a specific period induced by updating the forecast when new data points become available. Recently, an extension to the N-BEATS model for univariate time series point forecasting was proposed to include forecast stability as an additional optimization objective, next to accuracy. It was shown that more stable forecasts can be obtained without harming accuracy by minimizing a composite loss function that contains both a forecast error and a forecast instability component, with a static hyperparameter to control the impact of stability. In this paper, we empirically investigate whether further improvements in stability can be obtained without compromising accuracy by applying dynamic loss weighting algorithms, which change the loss weights during training. We show that some existing dynamic loss weighting methods achieve this objective. However, our proposed extension to the Random Weighting approach -- Task-Aware Random Weighting -- shows the best performance.

new SLIDE: A machine-learning based method for forced dynamic response estimation of multibody systems

Authors: Peter Manzl, Alexander Humer, Qasim Khadim, Johannes Gerstmayr

Abstract: In computational engineering, enhancing the simulation speed and efficiency is a perpetual goal. To fully take advantage of neural network techniques and hardware, we present the SLiding-window Initially-truncated Dynamic-response Estimator (SLIDE), a deep learning-based method designed to estimate output sequences of mechanical or multibody systems with primarily, but not exclusively, forced excitation. A key advantage of SLIDE is its ability to estimate the dynamic response of damped systems without requiring the full system state, making it particularly effective for flexible multibody systems. The method truncates the output window based on the decay of initial effects, such as damping, which is approximated by the complex eigenvalues of the systems linearized equations. In addition, a second neural network is trained to provide an error estimation, further enhancing the methods applicability. The method is applied to a diverse selection of systems, including the Duffing oscillator, a flexible slider-crank system, and an industrial 6R manipulator, mounted on a flexible socket. Our results demonstrate significant speedups from the simulation up to several millions, exceeding real-time performance substantially.

new Criticality and Safety Margins for Reinforcement Learning

Authors: Alexander Grushin, Walt Woods, Alvaro Velasquez, Simon Khan

Abstract: State of the art reinforcement learning methods sometimes encounter unsafe situations. Identifying when these situations occur is of interest both for post-hoc analysis and during deployment, where it might be advantageous to call out to a human overseer for help. Efforts to gauge the criticality of different points in time have been developed, but their accuracy is not well established due to a lack of ground truth, and they are not designed to be easily interpretable by end users. Therefore, we seek to define a criticality framework with both a quantifiable ground truth and a clear significance to users. We introduce true criticality as the expected drop in reward when an agent deviates from its policy for n consecutive random actions. We also introduce the concept of proxy criticality, a low-overhead metric that has a statistically monotonic relationship to true criticality. Safety margins make these interpretable, when defined as the number of random actions for which performance loss will not exceed some tolerance with high confidence. We demonstrate this approach in several environment-agent combinations; for an A3C agent in an Atari Beamrider environment, the lowest 5% of safety margins contain 47% of agent losses; i.e., supervising only 5% of decisions could potentially prevent roughly half of an agent's errors. This criticality framework measures the potential impacts of bad decisions, even before those decisions are made, allowing for more effective debugging and oversight of autonomous agents.

new Enhancing Lossy Compression Through Cross-Field Information for Scientific Applications

Authors: Youyuan Liu, Wenqi Jia, Taolue Yang, Miao Yin, Sian Jin

Abstract: Lossy compression is one of the most effective methods for reducing the size of scientific data containing multiple data fields. It reduces information density through prediction or transformation techniques to compress the data. Previous approaches use local information from a single target field when predicting target data points, limiting their potential to achieve higher compression ratios. In this paper, we identified significant cross-field correlations within scientific datasets. We propose a novel hybrid prediction model that utilizes CNN to extract cross-field information and combine it with existing local field information. Our solution enhances the prediction accuracy of lossy compressors, leading to improved compression ratios without compromising data quality. We evaluate our solution on three scientific datasets, demonstrating its ability to improve compression ratios by up to 25% under specific error bounds. Additionally, our solution preserves more data details and reduces artifacts compared to baseline approaches.

new Causality-based Subject and Task Fingerprints using fMRI Time-series Data

Authors: Dachuan Song, Li Shen, Duy Duong-Tran, Xuan Wang

Abstract: Recently, there has been a revived interest in system neuroscience causation models due to their unique capability to unravel complex relationships in multi-scale brain networks. In this paper, our goal is to verify the feasibility and effectiveness of using a causality-based approach for fMRI fingerprinting. Specifically, we propose an innovative method that utilizes the causal dynamics activities of the brain to identify the unique cognitive patterns of individuals (e.g., subject fingerprint) and fMRI tasks (e.g., task fingerprint). The key novelty of our approach stems from the development of a two-timescale linear state-space model to extract 'spatio-temporal' (aka causal) signatures from an individual's fMRI time series data. To the best of our knowledge, we pioneer and subsequently quantify, in this paper, the concept of 'causal fingerprint.' Our method is well-separated from other fingerprint studies as we quantify fingerprints from a cause-and-effect perspective, which are then incorporated with a modal decomposition and projection method to perform subject identification and a GNN-based (Graph Neural Network) model to perform task identification. Finally, we show that the experimental results and comparisons with non-causality-based methods demonstrate the effectiveness of the proposed methods. We visualize the obtained causal signatures and discuss their biological relevance in light of the existing understanding of brain functionalities. Collectively, our work paves the way for further studies on causal fingerprints with potential applications in both healthy controls and neurodegenerative diseases.

new Realistic Evaluation of Model Merging for Compositional Generalization

Authors: Derek Tam, Yash Kant, Brian Lester, Igor Gilitschenski, Colin Raffel

Abstract: Merging has become a widespread way to cheaply combine individual models into a single model that inherits their capabilities and attains better performance. This popularity has spurred rapid development of many new merging methods, which are typically validated in disparate experimental settings and frequently differ in the assumptions made about model architecture, data availability, and computational budget. In this work, we characterize the relative merits of different merging methods by evaluating them in a shared experimental setting and precisely identifying the practical requirements of each method. Specifically, our setting focuses on using merging for compositional generalization of capabilities in image classification, image generation, and natural language processing. Additionally, we measure the computational costs of different merging methods as well as how they perform when scaling the number of models being merged. Taken together, our results clarify the state of the field of model merging and provide a comprehensive and rigorous experimental setup to test new methods.

new Towards the Mitigation of Confirmation Bias in Semi-supervised Learning: a Debiased Training Perspective

Authors: Yu Wang, Yuxuan Yin, Peng Li

Abstract: Semi-supervised learning (SSL) commonly exhibits confirmation bias, where models disproportionately favor certain classes, leading to errors in predicted pseudo labels that accumulate under a self-training paradigm. Unlike supervised settings, which benefit from a rich, static data distribution, SSL inherently lacks mechanisms to correct this self-reinforced bias, necessitating debiased interventions at each training step. Although the generation of debiased pseudo labels has been extensively studied, their effective utilization remains underexplored. Our analysis indicates that data from biased classes should have a reduced influence on parameter updates, while more attention should be given to underrepresented classes. To address these challenges, we introduce TaMatch, a unified framework for debiased training in SSL. TaMatch employs a scaling ratio derived from both a prior target distribution and the model's learning status to estimate and correct bias at each training step. This ratio adjusts the raw predictions on unlabeled data to produce debiased pseudo labels. In the utilization phase, these labels are differently weighted according to their predicted class, enhancing training equity and minimizing class bias. Additionally, TaMatch dynamically adjust the target distribution in response to the model's learning progress, facilitating robust handling of practical scenarios where the prior distribution is unknown. Empirical evaluations show that TaMatch significantly outperforms existing state-of-the-art methods across a range of challenging image classification tasks, highlighting the critical importance of both the debiased generation and utilization of pseudo labels in SSL.

new DMC-VB: A Benchmark for Representation Learning for Control with Visual Distractors

Authors: Joseph Ortiz, Antoine Dedieu, Wolfgang Lehrach, Swaroop Guntupalli, Carter Wendelken, Ahmad Humayun, Guangyao Zhou, Sivaramakrishnan Swaminathan, Miguel L\'azaro-Gredilla, Kevin Murphy

Abstract: Learning from previously collected data via behavioral cloning or offline reinforcement learning (RL) is a powerful recipe for scaling generalist agents by avoiding the need for expensive online learning. Despite strong generalization in some respects, agents are often remarkably brittle to minor visual variations in control-irrelevant factors such as the background or camera viewpoint. In this paper, we present theDeepMind Control Visual Benchmark (DMC-VB), a dataset collected in the DeepMind Control Suite to evaluate the robustness of offline RL agents for solving continuous control tasks from visual input in the presence of visual distractors. In contrast to prior works, our dataset (a) combines locomotion and navigation tasks of varying difficulties, (b) includes static and dynamic visual variations, (c) considers data generated by policies with different skill levels, (d) systematically returns pairs of state and pixel observation, (e) is an order of magnitude larger, and (f) includes tasks with hidden goals. Accompanying our dataset, we propose three benchmarks to evaluate representation learning methods for pretraining, and carry out experiments on several recently proposed methods. First, we find that pretrained representations do not help policy learning on DMC-VB, and we highlight a large representation gap between policies learned on pixel observations and on states. Second, we demonstrate when expert data is limited, policy learning can benefit from representations pretrained on (a) suboptimal data, and (b) tasks with stochastic hidden goals. Our dataset and benchmark code to train and evaluate agents are available at: https://github.com/google-deepmind/dmc_vision_benchmark.

URLs: https://github.com/google-deepmind/dmc_vision_benchmark.

new Benchmarking Graph Conformal Prediction: Empirical Analysis, Scalability, and Theoretical Insights

Authors: Pranav Maneriker, Aditya T. Vadlamani, Anutam Srinivasan, Yuntian He, Ali Payani, Srinivasan Parthasarathy

Abstract: Conformal prediction has become increasingly popular for quantifying the uncertainty associated with machine learning models. Recent work in graph uncertainty quantification has built upon this approach for conformal graph prediction. The nascent nature of these explorations has led to conflicting choices for implementations, baselines, and method evaluation. In this work, we analyze the design choices made in the literature and discuss the tradeoffs associated with existing methods. Building on the existing implementations for existing methods, we introduce techniques to scale existing methods to large-scale graph datasets without sacrificing performance. Our theoretical and empirical results justify our recommendations for future scholarship in graph conformal prediction.

new FedDCL: a federated data collaboration learning as a hybrid-type privacy-preserving framework based on federated learning and data collaboration

Authors: Akira Imakura, Tetsuya Sakurai

Abstract: Recently, federated learning has attracted much attention as a privacy-preserving integrated analysis that enables integrated analysis of data held by multiple institutions without sharing raw data. On the other hand, federated learning requires iterative communication across institutions and has a big challenge for implementation in situations where continuous communication with the outside world is extremely difficult. In this study, we propose a federated data collaboration learning (FedDCL), which solves such communication issues by combining federated learning with recently proposed non-model share-type federated learning named as data collaboration analysis. In the proposed FedDCL framework, each user institution independently constructs dimensionality-reduced intermediate representations and shares them with neighboring institutions on intra-group DC servers. On each intra-group DC server, intermediate representations are transformed to incorporable forms called collaboration representations. Federated learning is then conducted between intra-group DC servers. The proposed FedDCL framework does not require iterative communication by user institutions and can be implemented in situations where continuous communication with the outside world is extremely difficult. The experimental results show that the performance of the proposed FedDCL is comparable to that of existing federated learning.

new Generative AI for fast and accurate Statistical Computation of Fluids

Authors: Roberto Molinaro, Samuel Lanthaler, Bogdan Raoni\'c, Tobias Rohner, Victor Armegioiu, Zhong Yi Wan, Fei Sha, Siddhartha Mishra, Leonardo Zepeda-N\'u\~nez

Abstract: We present a generative AI algorithm for addressing the challenging task of fast, accurate and robust statistical computation of three-dimensional turbulent fluid flows. Our algorithm, termed as GenCFD, is based on a conditional score-based diffusion model. Through extensive numerical experimentation with both incompressible and compressible fluid flows, we demonstrate that GenCFD provides very accurate approximation of statistical quantities of interest such as mean, variance, point pdfs, higher-order moments, while also generating high quality realistic samples of turbulent fluid flows and ensuring excellent spectral resolution. In contrast, ensembles of operator learning baselines which are trained to minimize mean (absolute) square errors regress to the mean flow. We present rigorous theoretical results uncovering the surprising mechanisms through which diffusion models accurately generate fluid flows. These mechanisms are illustrated with solvable toy models that exhibit the relevant features of turbulent fluid flows while being amenable to explicit analytical formulas.

new Discovery and inversion of the viscoelastic wave equation in inhomogeneous media

Authors: Su Chen, Yi Ding, Hiroe Miyake, Xiaojun Li

Abstract: In scientific machine learning, the task of identifying partial differential equations accurately from sparse and noisy data poses a significant challenge. Current sparse regression methods may identify inaccurate equations on sparse and noisy datasets and are not suitable for varying coefficients. To address this issue, we propose a hybrid framework that combines two alternating direction optimization phases: discovery and embedding. The discovery phase employs current well-developed sparse regression techniques to preliminarily identify governing equations from observations. The embedding phase implements a recurrent convolutional neural network (RCNN), enabling efficient processes for time-space iterations involved in discretized forms of wave equation. The RCNN model further optimizes the imperfect sparse regression results to obtain more accurate functional terms and coefficients. Through alternating update of discovery-embedding phases, essential physical equations can be robustly identified from noisy and low-resolution measurements. To assess the performance of proposed framework, numerical experiments are conducted on various scenarios involving wave equation in elastic/viscoelastic and homogeneous/inhomogeneous media. The results demonstrate that the proposed method exhibits excellent robustness and accuracy, even when faced with high levels of noise and limited data availability in both spatial and temporal domains.

new Embed and Emulate: Contrastive representations for simulation-based inference

Authors: Ruoxi Jiang, Peter Y. Lu, Rebecca Willett

Abstract: Scientific modeling and engineering applications rely heavily on parameter estimation methods to fit physical models and calibrate numerical simulations using real-world measurements. In the absence of analytic statistical models with tractable likelihoods, modern simulation-based inference (SBI) methods first use a numerical simulator to generate a dataset of parameters and simulated outputs. This dataset is then used to approximate the likelihood and estimate the system parameters given observation data. Several SBI methods employ machine learning emulators to accelerate data generation and parameter estimation. However, applying these approaches to high-dimensional physical systems remains challenging due to the cost and complexity of training high-dimensional emulators. This paper introduces Embed and Emulate (E&E): a new SBI method based on contrastive learning that efficiently handles high-dimensional data and complex, multimodal parameter posteriors. E&E learns a low-dimensional latent embedding of the data (i.e., a summary statistic) and a corresponding fast emulator in the latent space, eliminating the need to run expensive simulations or a high dimensional emulator during inference. We illustrate the theoretical properties of the learned latent space through a synthetic experiment and demonstrate superior performance over existing methods in a realistic, non-identifiable parameter estimation task using the high-dimensional, chaotic Lorenz 96 system.

new VickreyFeedback: Cost-efficient Data Construction for Reinforcement Learning from Human Feedback

Authors: Guoxi Zhang, Jiuding Duan

Abstract: This paper addresses the cost-efficiency aspect of Reinforcement Learning from Human Feedback (RLHF). RLHF leverages datasets of human preferences over outputs of large language models (LLM) to instill human expectations into LLMs. While preference annotation comes with a monetized cost, the economic utility of a preference dataset has not been considered by far. What exacerbates this situation is that given complex intransitive or cyclic relationships in preference datasets, existing algorithms for fine-tuning LLMs are still far from capturing comprehensive preferences. This raises severe cost-efficiency concerns in production environments, where preference data accumulate over time. In this paper, we see the fine-tuning of LLMs as a monetized economy and introduce an auction mechanism to improve the efficiency of the preference data collection in dollar terms. We show that introducing an auction mechanism can play an essential role in enhancing the cost-efficiency of RLHF while maintaining satisfactory model performance. Experimental results demonstrate that our proposed auction-based protocol is cost-efficient for fine-tuning LLMs by concentrating on high-quality feedback.

new A3: Active Adversarial Alignment for Source-Free Domain Adaptation

Authors: Chrisantus Eze, Christopher Crick

Abstract: Unsupervised domain adaptation (UDA) aims to transfer knowledge from a labeled source domain to an unlabeled target domain. Recent works have focused on source-free UDA, where only target data is available. This is challenging as models rely on noisy pseudo-labels and struggle with distribution shifts. We propose Active Adversarial Alignment (A3), a novel framework combining self-supervised learning, adversarial training, and active learning for robust source-free UDA. A3 actively samples informative and diverse data using an acquisition function for training. It adapts models via adversarial losses and consistency regularization, aligning distributions without source data access. A3 advances source-free UDA through its synergistic integration of active and adversarial learning for effective domain alignment and noise reduction.

new A physics-driven sensor placement optimization methodology for temperature field reconstruction

Authors: Xu Liu, Wen Yao, Wei Peng, Zhuojia Fu, Zixue Xiang, Xiaoqian Chen

Abstract: Perceiving the global field from sparse sensors has been a grand challenge in the monitoring, analysis, and design of physical systems. In this context, sensor placement optimization is a crucial issue. Most existing works require large and sufficient data to construct data-based criteria, which are intractable in data-free scenarios without numerical and experimental data. To this end, we propose a novel physics-driven sensor placement optimization (PSPO) method for temperature field reconstruction using a physics-based criterion to optimize sensor locations. In our methodological framework, we firstly derive the theoretical upper and lower bounds of the reconstruction error under noise scenarios by analyzing the optimal solution, proving that error bounds correlate with the condition number determined by sensor locations. Furthermore, the condition number, as the physics-based criterion, is used to optimize sensor locations by the genetic algorithm. Finally, the best sensors are validated by reconstruction models, including non-invasive end-to-end models, non-invasive reduced-order models, and physics-informed models. Experimental results, both on a numerical and an application case, demonstrate that the PSPO method significantly outperforms random and uniform selection methods, improving the reconstruction accuracy by nearly an order of magnitude. Moreover, the PSPO method can achieve comparable reconstruction accuracy to the existing data-driven placement optimization methods.

new Dual Cone Gradient Descent for Training Physics-Informed Neural Networks

Authors: Youngsik Hwang, Dong-Young Lim

Abstract: Physics-informed neural networks (PINNs) have emerged as a prominent approach for solving partial differential equations (PDEs) by minimizing a combined loss function that incorporates both boundary loss and PDE residual loss. Despite their remarkable empirical performance in various scientific computing tasks, PINNs often fail to generate reasonable solutions, and such pathological behaviors remain difficult to explain and resolve. In this paper, we identify that PINNs can be adversely trained when gradients of each loss function exhibit a significant imbalance in their magnitudes and present a negative inner product value. To address these issues, we propose a novel optimization framework, Dual Cone Gradient Descent (DCGD), which adjusts the direction of the updated gradient to ensure it falls within a dual cone region. This region is defined as a set of vectors where the inner products with both the gradients of the PDE residual loss and the boundary loss are non-negative. Theoretically, we analyze the convergence properties of DCGD algorithms in a non-convex setting. On a variety of benchmark equations, we demonstrate that DCGD outperforms other optimization algorithms in terms of various evaluation metrics. In particular, DCGD achieves superior predictive accuracy and enhances the stability of training for failure modes of PINNs and complex PDEs, compared to existing optimally tuned models. Moreover, DCGD can be further improved by combining it with popular strategies for PINNs, including learning rate annealing and the Neural Tangent Kernel (NTK).

new Neural Collaborative Filtering to Detect Anomalies in Human Semantic Trajectories

Authors: Yueyang Liu, Lance Kennedy, Hossein Amiri, Andreas Z\"ufle

Abstract: Human trajectory anomaly detection has become increasingly important across a wide range of applications, including security surveillance and public health. However, existing trajectory anomaly detection methods are primarily focused on vehicle-level traffic, while human-level trajectory anomaly detection remains under-explored. Since human trajectory data is often very sparse, machine learning methods have become the preferred approach for identifying complex patterns. However, concerns regarding potential biases and the robustness of these models have intensified the demand for more transparent and explainable alternatives. In response to these challenges, our research focuses on developing a lightweight anomaly detection model specifically designed to detect anomalies in human trajectories. We propose a Neural Collaborative Filtering approach to model and predict normal mobility. Our method is designed to model users' daily patterns of life without requiring prior knowledge, thereby enhancing performance in scenarios where data is sparse or incomplete, such as in cold start situations. Our algorithm consists of two main modules. The first is the collaborative filtering module, which applies collaborative filtering to model normal mobility of individual humans to places of interest. The second is the neural module, responsible for interpreting the complex spatio-temporal relationships inherent in human trajectory data. To validate our approach, we conducted extensive experiments using simulated and real-world datasets comparing to numerous state-of-the-art trajectory anomaly detection approaches.

new Easy2Hard-Bench: Standardized Difficulty Labels for Profiling LLM Performance and Generalization

Authors: Mucong Ding, Chenghao Deng, Jocelyn Choo, Zichu Wu, Aakriti Agrawal, Avi Schwarzschild, Tianyi Zhou, Tom Goldstein, John Langford, Anima Anandkumar, Furong Huang

Abstract: While generalization over tasks from easy to hard is crucial to profile language models (LLMs), the datasets with fine-grained difficulty annotations for each problem across a broad range of complexity are still blank. Aiming to address this limitation, we present Easy2Hard-Bench, a consistently formatted collection of 6 benchmark datasets spanning various domains, such as mathematics and programming problems, chess puzzles, and reasoning questions. Each problem within these datasets is annotated with numerical difficulty scores. To systematically estimate problem difficulties, we collect abundant performance data on attempts to each problem by humans in the real world or LLMs on the prominent leaderboard. Leveraging the rich performance data, we apply well-established difficulty ranking systems, such as Item Response Theory (IRT) and Glicko-2 models, to uniformly assign numerical difficulty scores to problems. Moreover, datasets in Easy2Hard-Bench distinguish themselves from previous collections by a higher proportion of challenging problems. Through extensive experiments with six state-of-the-art LLMs, we provide a comprehensive analysis of their performance and generalization capabilities across varying levels of difficulty, with the aim of inspiring future research in LLM generalization. The datasets are available at https://huggingface.co/datasets/furonghuang-lab/Easy2Hard-Bench.

URLs: https://huggingface.co/datasets/furonghuang-lab/Easy2Hard-Bench.

new Multi-agent Reinforcement Learning for Dynamic Dispatching in Material Handling Systems

Authors: Xian Yeow Lee, Haiyan Wang, Daisuke Katsumata, Takaharu Matsui, Chetan Gupta

Abstract: This paper proposes a multi-agent reinforcement learning (MARL) approach to learn dynamic dispatching strategies, which is crucial for optimizing throughput in material handling systems across diverse industries. To benchmark our method, we developed a material handling environment that reflects the complexities of an actual system, such as various activities at different locations, physical constraints, and inherent uncertainties. To enhance exploration during learning, we propose a method to integrate domain knowledge in the form of existing dynamic dispatching heuristics. Our experimental results show that our method can outperform heuristics by up to 7.4 percent in terms of median throughput. Additionally, we analyze the effect of different architectures on MARL performance when training multiple agents with different functions. We also demonstrate that the MARL agents performance can be further improved by using the first iteration of MARL agents as heuristics to train a second iteration of MARL agents. This work demonstrates the potential of applying MARL to learn effective dynamic dispatching strategies that may be deployed in real-world systems to improve business outcomes.

new State-free Reinforcement Learning

Authors: Mingyu Chen, Aldo Pacchiano, Xuezhou Zhang

Abstract: In this work, we study the \textit{state-free RL} problem, where the algorithm does not have the states information before interacting with the environment. Specifically, denote the reachable state set by ${S}^\Pi := \{ s|\max_{\pi\in \Pi}q^{P, \pi}(s)>0 \}$, we design an algorithm which requires no information on the state space $S$ while having a regret that is completely independent of ${S}$ and only depend on ${S}^\Pi$. We view this as a concrete first step towards \textit{parameter-free RL}, with the goal of designing RL algorithms that require no hyper-parameter tuning.

new Gradient-free Decoder Inversion in Latent Diffusion Models

Authors: Seongmin Hong, Suh Yoon Jeon, Kyeonghyun Lee, Ernest K. Ryu, Se Young Chun

Abstract: In latent diffusion models (LDMs), denoising diffusion process efficiently takes place on latent space whose dimension is lower than that of pixel space. Decoder is typically used to transform the representation in latent space to that in pixel space. While a decoder is assumed to have an encoder as an accurate inverse, exact encoder-decoder pair rarely exists in practice even though applications often require precise inversion of decoder. Prior works for decoder inversion in LDMs employed gradient descent inspired by inversions of generative adversarial networks. However, gradient-based methods require larger GPU memory and longer computation time for larger latent space. For example, recent video LDMs can generate more than 16 frames, but GPUs with 24 GB memory can only perform gradient-based decoder inversion for 4 frames. Here, we propose an efficient gradient-free decoder inversion for LDMs, which can be applied to diverse latent models. Theoretical convergence property of our proposed inversion has been investigated not only for the forward step method, but also for the inertial Krasnoselskii-Mann (KM) iterations under mild assumption on cocoercivity that is satisfied by recent LDMs. Our proposed gradient-free method with Adam optimizer and learning rate scheduling significantly reduced computation time and memory usage over prior gradient-based methods and enabled efficient computation in applications such as noise-space watermarking while achieving comparable error levels.

new Hierarchical Federated Learning with Multi-Timescale Gradient Correction

Authors: Wenzhi Fang, Dong-Jun Han, Evan Chen, Shiqiang Wang, Christopher G. Brinton

Abstract: While traditional federated learning (FL) typically focuses on a star topology where clients are directly connected to a central server, real-world distributed systems often exhibit hierarchical architectures. Hierarchical FL (HFL) has emerged as a promising solution to bridge this gap, leveraging aggregation points at multiple levels of the system. However, existing algorithms for HFL encounter challenges in dealing with multi-timescale model drift, i.e., model drift occurring across hierarchical levels of data heterogeneity. In this paper, we propose a multi-timescale gradient correction (MTGC) methodology to resolve this issue. Our key idea is to introduce distinct control variables to (i) correct the client gradient towards the group gradient, i.e., to reduce client model drift caused by local updates based on individual datasets, and (ii) correct the group gradient towards the global gradient, i.e., to reduce group model drift caused by FL over clients within the group. We analytically characterize the convergence behavior of MTGC under general non-convex settings, overcoming challenges associated with couplings between correction terms. We show that our convergence bound is immune to the extent of data heterogeneity, confirming the stability of the proposed algorithm against multi-level non-i.i.d. data. Through extensive experiments on various datasets and models, we validate the effectiveness of MTGC in diverse HFL settings. The code for this project is available at \href{https://github.com/wenzhifang/MTGC}{https://github.com/wenzhifang/MTGC}.

URLs: https://github.com/wenzhifang/MTGC, https://github.com/wenzhifang/MTGC

new Review of Digital Asset Development with Graph Neural Network Unlearning

Authors: Zara Lisbon

Abstract: In the rapidly evolving landscape of digital assets, the imperative for robust data privacy and compliance with regulatory frameworks has intensified. This paper investigates the critical role of Graph Neural Networks (GNNs) in the management of digital assets and introduces innovative unlearning techniques specifically tailored to GNN architectures. We categorize unlearning strategies into two primary classes: data-driven approximation, which manipulates the graph structure to isolate and remove the influence of specific nodes, and model-driven approximation, which modifies the internal parameters and architecture of the GNN itself. By examining recent advancements in these unlearning methodologies, we highlight their applicability in various use cases, including fraud detection, risk assessment, token relationship prediction, and decentralized governance. We discuss the challenges inherent in balancing model performance with the requirements for data unlearning, particularly in the context of real-time financial applications. Furthermore, we propose a hybrid approach that combines the strengths of both unlearning strategies to enhance the efficiency and effectiveness of GNNs in digital asset ecosystems. Ultimately, this paper aims to provide a comprehensive framework for understanding and implementing GNN unlearning techniques, paving the way for secure and compliant deployment of machine learning in the digital asset domain.

new Towards Diverse Device Heterogeneous Federated Learning via Task Arithmetic Knowledge Integration

Authors: Mahdi Morafah, Vyacheslav Kungurtsev, Hojin Chang, Chen Chen, Bill Lin

Abstract: Federated Learning has emerged as a promising paradigm for collaborative machine learning, while preserving user data privacy. Despite its potential, standard FL lacks support for diverse heterogeneous device prototypes, which vary significantly in model and dataset sizes -- from small IoT devices to large workstations. This limitation is only partially addressed by existing knowledge distillation techniques, which often fail to transfer knowledge effectively across a broad spectrum of device prototypes with varied capabilities. This failure primarily stems from two issues: the dilution of informative logits from more capable devices by those from less capable ones, and the use of a single integrated logits as the distillation target across all devices, which neglects their individual learning capacities and and the unique contributions of each. To address these challenges, we introduce TAKFL, a novel KD-based framework that treats the knowledge transfer from each device prototype's ensemble as a separate task, independently distilling each to preserve its unique contributions and avoid dilution. TAKFL also incorporates a KD-based self-regularization technique to mitigate the issues related to the noisy and unsupervised ensemble distillation process. To integrate the separately distilled knowledge, we introduce an adaptive task arithmetic knowledge integration process, allowing each student model to customize the knowledge integration for optimal performance. Additionally, we present theoretical results demonstrating the effectiveness of task arithmetic in transferring knowledge across heterogeneous devices with varying capacities. Comprehensive evaluations of our method across both CV and NLP tasks demonstrate that TAKFL achieves SOTA results in a variety of datasets and settings, significantly outperforming existing KD-based methods. Code is released at https://github.com/MMorafah/TAKFL

URLs: https://github.com/MMorafah/TAKFL

new Latent Representation Learning for Multimodal Brain Activity Translation

Authors: Arman Afrasiyabi, Dhananjay Bhaskar, Erica L. Busch, Laurent Caplette, Rahul Singh, Guillaume Lajoie, Nicholas B. Turk-Browne, Smita Krishnaswamy

Abstract: Neuroscience employs diverse neuroimaging techniques, each offering distinct insights into brain activity, from electrophysiological recordings such as EEG, which have high temporal resolution, to hemodynamic modalities such as fMRI, which have increased spatial precision. However, integrating these heterogeneous data sources remains a challenge, which limits a comprehensive understanding of brain function. We present the Spatiotemporal Alignment of Multimodal Brain Activity (SAMBA) framework, which bridges the spatial and temporal resolution gaps across modalities by learning a unified latent space free of modality-specific biases. SAMBA introduces a novel attention-based wavelet decomposition for spectral filtering of electrophysiological recordings, graph attention networks to model functional connectivity between functional brain units, and recurrent layers to capture temporal autocorrelations in brain signal. We show that the training of SAMBA, aside from achieving translation, also learns a rich representation of brain information processing. We showcase this classify external stimuli driving brain activity from the representation learned in hidden layers of SAMBA, paving the way for broad downstream applications in neuroscience research and clinical contexts.

new A TextGCN-Based Decoding Approach for Improving Remote Sensing Image Captioning

Authors: Swadhin Das, Raksha Sharma

Abstract: Remote sensing images are highly valued for their ability to address complex real-world issues such as risk management, security, and meteorology. However, manually captioning these images is challenging and requires specialized knowledge across various domains. This letter presents an approach for automatically describing (captioning) remote sensing images. We propose a novel encoder-decoder setup that deploys a Text Graph Convolutional Network (TextGCN) and multi-layer LSTMs. The embeddings generated by TextGCN enhance the decoder's understanding by capturing the semantic relationships among words at both the sentence and corpus levels. Furthermore, we advance our approach with a comparison-based beam search method to ensure fairness in the search strategy for generating the final caption. We present an extensive evaluation of our approach against various other state-of-the-art encoder-decoder frameworks. We evaluated our method across three datasets using seven metrics: BLEU-1 to BLEU-4, METEOR, ROUGE-L, and CIDEr. The results demonstrate that our approach significantly outperforms other state-of-the-art encoder-decoder methods.

new Fairness without Sensitive Attributes via Knowledge Sharing

Authors: Hongliang Ni, Lei Han, Tong Chen, Shazia Sadiq, Gianluca Demartini

Abstract: While model fairness improvement has been explored previously, existing methods invariably rely on adjusting explicit sensitive attribute values in order to improve model fairness in downstream tasks. However, we observe a trend in which sensitive demographic information becomes inaccessible as public concerns around data privacy grow. In this paper, we propose a confidence-based hierarchical classifier structure called "Reckoner" for reliable fair model learning under the assumption of missing sensitive attributes. We first present results showing that if the dataset contains biased labels or other hidden biases, classifiers significantly increase the bias gap across different demographic groups in the subset with higher prediction confidence. Inspired by these findings, we devised a dual-model system in which a version of the model initialised with a high-confidence data subset learns from a version of the model initialised with a low-confidence data subset, enabling it to avoid biased predictions. Our experimental results show that Reckoner consistently outperforms state-of-the-art baselines in COMPAS dataset and New Adult dataset, considering both accuracy and fairness metrics.

new CycleNet: Enhancing Time Series Forecasting through Modeling Periodic Patterns

Authors: Shengsheng Lin, Weiwei Lin, Xinyi Hu, Wentai Wu, Ruichao Mo, Haocheng Zhong

Abstract: The stable periodic patterns present in time series data serve as the foundation for conducting long-horizon forecasts. In this paper, we pioneer the exploration of explicitly modeling this periodicity to enhance the performance of models in long-term time series forecasting (LTSF) tasks. Specifically, we introduce the Residual Cycle Forecasting (RCF) technique, which utilizes learnable recurrent cycles to model the inherent periodic patterns within sequences, and then performs predictions on the residual components of the modeled cycles. Combining RCF with a Linear layer or a shallow MLP forms the simple yet powerful method proposed in this paper, called CycleNet. CycleNet achieves state-of-the-art prediction accuracy in multiple domains including electricity, weather, and energy, while offering significant efficiency advantages by reducing over 90% of the required parameter quantity. Furthermore, as a novel plug-and-play technique, the RCF can also significantly improve the prediction accuracy of existing models, including PatchTST and iTransformer. The source code is available at: https://github.com/ACAT-SCUT/CycleNet.

URLs: https://github.com/ACAT-SCUT/CycleNet.

new Deep Heterogeneous Contrastive Hyper-Graph Learning for In-the-Wild Context-Aware Human Activity Recognition

Authors: Wen Ge, Guanyi Mou, Emmanuel O. Agu, Kyumin Lee

Abstract: Human Activity Recognition (HAR) is a challenging, multi-label classification problem as activities may co-occur and sensor signals corresponding to the same activity may vary in different contexts (e.g., different device placements). This paper proposes a Deep Heterogeneous Contrastive Hyper-Graph Learning (DHC-HGL) framework that captures heterogenous Context-Aware HAR (CA-HAR) hypergraph properties in a message-passing and neighborhood-aggregation fashion. Prior work only explored homogeneous or shallow-node-heterogeneous graphs. DHC-HGL handles heterogeneous CA-HAR data by innovatively 1) Constructing three different types of sub-hypergraphs that are each passed through different custom HyperGraph Convolution (HGC) layers designed to handle edge-heterogeneity and 2) Adopting a contrastive loss function to ensure node-heterogeneity. In rigorous evaluation on two CA-HAR datasets, DHC-HGL significantly outperformed state-of-the-art baselines by 5.8% to 16.7% on Matthews Correlation Coefficient (MCC) and 3.0% to 8.4% on Macro F1 scores. UMAP visualizations of learned CA-HAR node embeddings are also presented to enhance model explainability.

new HSTFL: A Heterogeneous Federated Learning Framework for Misaligned Spatiotemporal Forecasting

Authors: Shuowei Cai, Hao Liu

Abstract: Spatiotemporal forecasting has emerged as an indispensable building block of diverse smart city applications, such as intelligent transportation and smart energy management. Recent advancements have uncovered that the performance of spatiotemporal forecasting can be significantly improved by integrating knowledge in geo-distributed time series data from different domains, \eg enhancing real-estate appraisal with human mobility data; joint taxi and bike demand predictions. While effective, existing approaches assume a centralized data collection and exploitation environment, overlooking the privacy and commercial interest concerns associated with data owned by different parties. In this paper, we investigate multi-party collaborative spatiotemporal forecasting without direct access to multi-source private data. However, this task is challenging due to 1) cross-domain feature heterogeneity and 2) cross-client geographical heterogeneity, where standard horizontal or vertical federated learning is inapplicable. To this end, we propose a Heterogeneous SpatioTemporal Federated Learning (HSTFL) framework to enable multiple clients to collaboratively harness geo-distributed time series data from different domains while preserving privacy. Specifically, we first devise vertical federated spatiotemporal representation learning to locally preserve spatiotemporal dependencies among individual participants and generate effective representations for heterogeneous data. Then we propose a cross-client virtual node alignment block to incorporate cross-client spatiotemporal dependencies via a multi-level knowledge fusion scheme. Extensive privacy analysis and experimental evaluations demonstrate that HSTFL not only effectively resists inference attacks but also provides a significant improvement against various baselines.

new Treating Brain-inspired Memories as Priors for Diffusion Model to Forecast Multivariate Time Series

Authors: Muyao Wang, Wenchao Chen, Zhibin Duan, Bo Chen

Abstract: Forecasting Multivariate Time Series (MTS) involves significant challenges in various application domains. One immediate challenge is modeling temporal patterns with the finite length of the input. These temporal patterns usually involve periodic and sudden events that recur across different channels. To better capture temporal patterns, we get inspiration from humans' memory mechanisms and propose a channel-shared, brain-inspired memory module for MTS. Specifically, brain-inspired memory comprises semantic and episodic memory, where the former is used to capture general patterns, such as periodic events, and the latter is employed to capture special patterns, such as sudden events, respectively. Meanwhile, we design corresponding recall and update mechanisms to better utilize these patterns. Furthermore, acknowledging the capacity of diffusion models to leverage memory as a prior, we present a brain-inspired memory-augmented diffusion model. This innovative model retrieves relevant memories for different channels, utilizing them as distinct priors for MTS predictions. This incorporation significantly enhances the accuracy and robustness of predictions. Experimental results on eight datasets consistently validate the superiority of our approach in capturing and leveraging diverse recurrent temporal patterns across different channels.

new Fairness-aware Multiobjective Evolutionary Learning

Authors: Qingquan Zhang, Jialin Liu, Xin Yao

Abstract: Multiobjective evolutionary learning (MOEL) has demonstrated its advantages of training fairer machine learning models considering a predefined set of conflicting objectives, including accuracy and different fairness measures. Recent works propose to construct a representative subset of fairness measures as optimisation objectives of MOEL throughout model training. However, the determination of a representative measure set relies on dataset, prior knowledge and requires substantial computational costs. What's more, those representative measures may differ across different model training processes. Instead of using a static predefined set determined before model training, this paper proposes to dynamically and adaptively determine a representative measure set online during model training. The dynamically determined representative set is then used as optimising objectives of the MOEL framework and can vary with time. Extensive experimental results on 12 well-known benchmark datasets demonstrate that our proposed framework achieves outstanding performance compared to state-of-the-art approaches for mitigating unfairness in terms of accuracy as well as 25 fairness measures although only a few of them were dynamically selected and used as optimisation objectives. The results indicate the importance of setting optimisation objectives dynamically during training.

new Token Caching for Diffusion Transformer Acceleration

Authors: Jinming Lou, Wenyang Luo, Yufan Liu, Bing Li, Xinmiao Ding, Weiming Hu, Jiajiong Cao, Yuming Li, Chenguang Ma

Abstract: Diffusion transformers have gained substantial interest in diffusion generative modeling due to their outstanding performance. However, their high computational cost, arising from the quadratic computational complexity of attention mechanisms and multi-step inference, presents a significant bottleneck. To address this challenge, we propose TokenCache, a novel post-training acceleration method that leverages the token-based multi-block architecture of transformers to reduce redundant computations among tokens across inference steps. TokenCache specifically addresses three critical questions in the context of diffusion transformers: (1) which tokens should be pruned to eliminate redundancy, (2) which blocks should be targeted for efficient pruning, and (3) at which time steps caching should be applied to balance speed and quality. In response to these challenges, TokenCache introduces a Cache Predictor that assigns importance scores to tokens, enabling selective pruning without compromising model performance. Furthermore, we propose an adaptive block selection strategy to focus on blocks with minimal impact on the network's output, along with a Two-Phase Round-Robin (TPRR) scheduling policy to optimize caching intervals throughout the denoising process. Experimental results across various models demonstrate that TokenCache achieves an effective trade-off between generation quality and inference speed for diffusion transformers. Our code will be publicly available.

new Wasserstein Distance-Weighted Adversarial Network for Cross-Domain Credit Risk Assessment

Authors: Mohan Jiang, Jiating Lin, Hongju Ouyang, Jingming Pan, Siyuan Han, Bingyao Liu

Abstract: This paper delves into the application of adversarial domain adaptation (ADA) for enhancing credit risk assessment in financial institutions. It addresses two critical challenges: the cold start problem, where historical lending data is scarce, and the data imbalance issue, where high-risk transactions are underrepresented. The paper introduces an improved ADA framework, the Wasserstein Distance Weighted Adversarial Domain Adaptation Network (WD-WADA), which leverages the Wasserstein distance to align source and target domains effectively. The proposed method includes an innovative weighted strategy to tackle data imbalance, adjusting for both the class distribution and the difficulty level of predictions. The paper demonstrates that WD-WADA not only mitigates the cold start problem but also provides a more accurate measure of domain differences, leading to improved cross-domain credit risk assessment. Extensive experiments on real-world credit datasets validate the model's effectiveness, showcasing superior performance in cross-domain learning, classification accuracy, and model stability compared to traditional methods.

new Efficient Noise Mitigation for Enhancing Inference Accuracy in DNNs on Mixed-Signal Accelerators

Authors: Seyedarmin Azizi, Mohammad Erfan Sadeghi, Mehdi Kamal, Massoud Pedram

Abstract: In this paper, we propose a framework to enhance the robustness of the neural models by mitigating the effects of process-induced and aging-related variations of analog computing components on the accuracy of the analog neural networks. We model these variations as the noise affecting the precision of the activations and introduce a denoising block inserted between selected layers of a pre-trained model. We demonstrate that training the denoising block significantly increases the model's robustness against various noise levels. To minimize the overhead associated with adding these blocks, we present an exploration algorithm to identify optimal insertion points for the denoising blocks. Additionally, we propose a specialized architecture to efficiently execute the denoising blocks, which can be integrated into mixed-signal accelerators. We evaluate the effectiveness of our approach using Deep Neural Network (DNN) models trained on the ImageNet and CIFAR-10 datasets. The results show that on average, by accepting 2.03% parameter count overhead, the accuracy drop due to the variations reduces from 31.7% to 1.15%.

new CodeSCAN: ScreenCast ANalysis for Video Programming Tutorials

Authors: Alexander Naumann, Felix Hertlein, Jacqueline H\"ollig, Lucas Cazzonelli, Steffen Thoma

Abstract: Programming tutorials in the form of coding screencasts play a crucial role in programming education, serving both novices and experienced developers. However, the video format of these tutorials presents a challenge due to the difficulty of searching for and within videos. Addressing the absence of large-scale and diverse datasets for screencast analysis, we introduce the CodeSCAN dataset. It comprises 12,000 screenshots captured from the Visual Studio Code environment during development, featuring 24 programming languages, 25 fonts, and over 90 distinct themes, in addition to diverse layout changes and realistic user interactions. Moreover, we conduct detailed quantitative and qualitative evaluations to benchmark the performance of Integrated Development Environment (IDE) element detection, color-to-black-and-white conversion, and Optical Character Recognition (OCR). We hope that our contributions facilitate more research in coding screencast analysis, and we make the source code for creating the dataset and the benchmark publicly available on this website.

new Optimizing DNN Inference on Multi-Accelerator SoCs at Training-time

Authors: Matteo Risso, Alessio Burrello, Daniele Jahier Pagliari

Abstract: The demand for executing Deep Neural Networks (DNNs) with low latency and minimal power consumption at the edge has led to the development of advanced heterogeneous Systems-on-Chips (SoCs) that incorporate multiple specialized computing units (CUs), such as accelerators. Offloading DNN computations to a specific CU from the available set often exposes accuracy vs efficiency trade-offs, due to differences in their supported operations (e.g., standard vs. depthwise convolution) or data representations (e.g., more/less aggressively quantized). A challenging yet unresolved issue is how to map a DNN onto these multi-CU systems to maximally exploit the parallelization possibilities while taking accuracy into account. To address this problem, we present ODiMO, a hardware-aware tool that efficiently explores fine-grain mapping of DNNs among various on-chip CUs, during the training phase. ODiMO strategically splits individual layers of the neural network and executes them in parallel on the multiple available CUs, aiming to balance the total inference energy consumption or latency with the resulting accuracy, impacted by the unique features of the different hardware units. We test our approach on CIFAR-10, CIFAR-100, and ImageNet, targeting two open-source heterogeneous SoCs, i.e., DIANA and Darkside. We obtain a rich collection of Pareto-optimal networks in the accuracy vs. energy or latency space. We show that ODiMO reduces the latency of a DNN executed on the Darkside SoC by up to 8x at iso-accuracy, compared to manual heuristic mappings. When targeting energy, on the same SoC, ODiMO produced up to 50.8x more efficient mappings, with minimal accuracy drop (< 0.3%).

new Towards an active-learning approach to resource allocation for population-based damage prognosis

Authors: George Tsialiamanis, Keith Worden, Nikolaos Dervilis, Aidan J Hughes

Abstract: Damage prognosis is, arguably, one of the most difficult tasks of structural health monitoring (SHM). To address common problems of damage prognosis, a population-based SHM (PBSHM) approach is adopted in the current work. In this approach the prognosis problem is considered as an information-sharing problem where data from past structures are exploited to make more accurate inferences regarding currently-degrading structures. For a given population, there may exist restrictions on the resources available to conduct monitoring; thus, the current work studies the problem of allocating such resources within a population of degrading structures with a view to maximising the damage-prognosis accuracy. The challenges of the current framework are mainly associated with the inference of outliers on the level of damage evolution, given partial data from the damage-evolution phenomenon. The current approach considers an initial population of structures for which damage evolution is extensively observed. Subsequently, a second population of structures with evolving damage is considered for which two monitoring systems are available, a low-availability and high-fidelity (low-uncertainty) one, and a widely-available and low-fidelity (high-uncertainty) one. The task of the current work is to follow an active-learning approach to identify the structures to which the high-fidelity system should be assigned in order to enhance the predictive capabilities of the machine-learning model throughout the population.

new Climate Adaptation with Reinforcement Learning: Experiments with Flooding and Transportation in Copenhagen

Authors: Miguel Costa, Morten W. Petersen, Arthur Vandervoort, Martin Drews, Karyn Morrissey, Francisco C. Pereira

Abstract: Due to climate change the frequency and intensity of extreme rainfall events, which contribute to urban flooding, are expected to increase in many places. These floods can damage transport infrastructure and disrupt mobility, highlighting the need for cities to adapt to escalating risks. Reinforcement learning (RL) serves as a powerful tool for uncovering optimal adaptation strategies, determining how and where to deploy adaptation measures effectively, even under significant uncertainty. In this study, we leverage RL to identify the most effective timing and locations for implementing measures, aiming to reduce both direct and indirect impacts of flooding. Our framework integrates climate change projections of future rainfall events and floods, models city-wide motorized trips, and quantifies direct and indirect impacts on infrastructure and mobility. Preliminary results suggest that our RL-based approach can significantly enhance decision-making by prioritizing interventions in specific urban areas and identifying the optimal periods for their implementation.

new An Enhanced Federated Prototype Learning Method under Domain Shift

Authors: Liang Kuang, Kuangpu Guo, Jian Liang, Jianguo Zhang

Abstract: Federated Learning (FL) allows collaborative machine learning training without sharing private data. Numerous studies have shown that one significant factor affecting the performance of federated learning models is the heterogeneity of data across different clients, especially when the data is sampled from various domains. A recent paper introduces variance-aware dual-level prototype clustering and uses a novel $\alpha$-sparsity prototype loss, which increases intra-class similarity and reduces inter-class similarity. To ensure that the features converge within specific clusters, we introduce an improved algorithm, Federated Prototype Learning with Convergent Clusters, abbreviated as FedPLCC. To increase inter-class distances, we weight each prototype with the size of the cluster it represents. To reduce intra-class distances, considering that prototypes with larger distances might come from different domains, we select only a certain proportion of prototypes for the loss function calculation. Evaluations on the Digit-5, Office-10, and DomainNet datasets show that our method performs better than existing approaches.

new Using Deep Autoregressive Models as Causal Inference Engines

Authors: Daniel Jiwoong Im, Kevin Zhang, Nakul Verma, Kyunghyun Cho

Abstract: Existing causal inference (CI) models are limited to primarily handling low-dimensional confounders and singleton actions. We propose an autoregressive (AR) CI framework capable of handling complex confounders and sequential actions common in modern applications. We accomplish this by {\em sequencification}, transforming data from an underlying causal diagram into a sequence of tokens. This approach not only enables training with data generated from any DAG but also extends existing CI capabilities to accommodate estimating several statistical quantities using a {\em single} model. We can directly predict interventional probabilities, simplifying inference and enhancing outcome prediction accuracy. We demonstrate that an AR model adapted for CI is efficient and effective in various complex applications such as navigating mazes, playing chess endgames, and evaluating the impact of certain keywords on paper acceptance rates.

new Optimistic Games for Combinatorial Bayesian Optimization with Application to Protein Design

Authors: Melis Ilayda Bal, Pier Giuseppe Sessa, Mojmir Mutny, Andreas Krause

Abstract: Bayesian optimization (BO) is a powerful framework to optimize black-box expensive-to-evaluate functions via sequential interactions. In several important problems (e.g. drug discovery, circuit design, neural architecture search, etc.), though, such functions are defined over large $\textit{combinatorial and unstructured}$ spaces. This makes existing BO algorithms not feasible due to the intractable maximization of the acquisition function over these domains. To address this issue, we propose $\textbf{GameOpt}$, a novel game-theoretical approach to combinatorial BO. $\textbf{GameOpt}$ establishes a cooperative game between the different optimization variables, and selects points that are game $\textit{equilibria}$ of an upper confidence bound acquisition function. These are stable configurations from which no variable has an incentive to deviate$-$ analog to local optima in continuous domains. Crucially, this allows us to efficiently break down the complexity of the combinatorial domain into individual decision sets, making $\textbf{GameOpt}$ scalable to large combinatorial spaces. We demonstrate the application of $\textbf{GameOpt}$ to the challenging $\textit{protein design}$ problem and validate its performance on four real-world protein datasets. Each protein can take up to $20^{X}$ possible configurations, where $X$ is the length of a protein, making standard BO methods infeasible. Instead, our approach iteratively selects informative protein configurations and very quickly discovers highly active protein variants compared to other baselines.

new TemporalPaD: a reinforcement-learning framework for temporal feature representation and dimension reduction

Authors: Xuechen Mu, Zhenyu Huang, Kewei Li, Haotian Zhang, Xiuli Wang, Yusi Fan, Kai Zhang, Fengfeng Zhou

Abstract: Recent advancements in feature representation and dimension reduction have highlighted their crucial role in enhancing the efficacy of predictive modeling. This work introduces TemporalPaD, a novel end-to-end deep learning framework designed for temporal pattern datasets. TemporalPaD integrates reinforcement learning (RL) with neural networks to achieve concurrent feature representation and feature reduction. The framework consists of three cooperative modules: a Policy Module, a Representation Module, and a Classification Module, structured based on the Actor-Critic (AC) framework. The Policy Module, responsible for dimensionality reduction through RL, functions as the actor, while the Representation Module for feature extraction and the Classification Module collectively serve as the critic. We comprehensively evaluate TemporalPaD using 29 UCI datasets, a well-known benchmark for validating feature reduction algorithms, through 10 independent tests and 10-fold cross-validation. Additionally, given that TemporalPaD is specifically designed for time series data, we apply it to a real-world DNA classification problem involving enhancer category and enhancer strength. The results demonstrate that TemporalPaD is an efficient and effective framework for achieving feature reduction, applicable to both structured data and sequence datasets. The source code of the proposed TemporalPaD is freely available as supplementary material to this article and at http://www.healthinformaticslab.org/supp/.

URLs: http://www.healthinformaticslab.org/supp/.

new Differentially Private Non Parametric Copulas: Generating synthetic data with non parametric copulas under privacy guarantees

Authors: Pablo A. Osorio-Marulanda, John Esteban Castro Ramirez, Mikel Hern\'andez Jim\'enez, Nicolas Moreno Reyes, Gorka Epelde Unanue

Abstract: Creation of synthetic data models has represented a significant advancement across diverse scientific fields, but this technology also brings important privacy considerations for users. This work focuses on enhancing a non-parametric copula-based synthetic data generation model, DPNPC, by incorporating Differential Privacy through an Enhanced Fourier Perturbation method. The model generates synthetic data for mixed tabular databases while preserving privacy. We compare DPNPC with three other models (PrivBayes, DP-Copula, and DP-Histogram) across three public datasets, evaluating privacy, utility, and execution time. DPNPC outperforms others in modeling multivariate dependencies, maintaining privacy for small $\epsilon$ values, and reducing training times. However, limitations include the need to assess the model's performance with different encoding methods and consider additional privacy attacks. Future research should address these areas to enhance privacy-preserving synthetic data generation.

new Entropy, concentration, and learning: a statistical mechanics primer

Authors: Akshay Balsubramani

Abstract: Artificial intelligence models trained through loss minimization have demonstrated significant success, grounded in principles from fields like information theory and statistical physics. This work explores these established connections through the lens of statistical mechanics, starting from first-principles sample concentration behaviors that underpin AI and machine learning. Our development of statistical mechanics for modeling highlights the key role of exponential families, and quantities of statistics, physics, and information theory.

new How green is continual learning, really? Analyzing the energy consumption in continual training of vision foundation models

Authors: Tomaso Trinci, Simone Magistri, Roberto Verdecchia, Andrew D. Bagdanov

Abstract: With the ever-growing adoption of AI, its impact on the environment is no longer negligible. Despite the potential that continual learning could have towards Green AI, its environmental sustainability remains relatively uncharted. In this work we aim to gain a systematic understanding of the energy efficiency of continual learning algorithms. To that end, we conducted an extensive set of empirical experiments comparing the energy consumption of recent representation-, prompt-, and exemplar-based continual learning algorithms and two standard baseline (fine tuning and joint training) when used to continually adapt a pre-trained ViT-B/16 foundation model. We performed our experiments on three standard datasets: CIFAR-100, ImageNet-R, and DomainNet. Additionally, we propose a novel metric, the Energy NetScore, which we use measure the algorithm efficiency in terms of energy-accuracy trade-off. Through numerous evaluations varying the number and size of the incremental learning steps, our experiments demonstrate that different types of continual learning algorithms have very different impacts on energy consumption during both training and inference. Although often overlooked in the continual learning literature, we found that the energy consumed during the inference phase is crucial for evaluating the environmental sustainability of continual learning models.

new Understanding the Benefits of SimCLR Pre-Training in Two-Layer Convolutional Neural Networks

Authors: Han Zhang, Yuan Cao

Abstract: SimCLR is one of the most popular contrastive learning methods for vision tasks. It pre-trains deep neural networks based on a large amount of unlabeled data by teaching the model to distinguish between positive and negative pairs of augmented images. It is believed that SimCLR can pre-train a deep neural network to learn efficient representations that can lead to a better performance of future supervised fine-tuning. Despite its effectiveness, our theoretical understanding of the underlying mechanisms of SimCLR is still limited. In this paper, we theoretically introduce a case study of the SimCLR method. Specifically, we consider training a two-layer convolutional neural network (CNN) to learn a toy image data model. We show that, under certain conditions on the number of labeled data, SimCLR pre-training combined with supervised fine-tuning achieves almost optimal test loss. Notably, the label complexity for SimCLR pre-training is far less demanding compared to direct training on supervised data. Our analysis sheds light on the benefits of SimCLR in learning with fewer labels.

new Rethinking the Power of Timestamps for Robust Time Series Forecasting: A Global-Local Fusion Perspective

Authors: Chengsen Wang, Qi Qi, Jingyu Wang, Haifeng Sun, Zirui Zhuang, Jinming Wu, Jianxin Liao

Abstract: Time series forecasting has played a pivotal role across various industries, including finance, transportation, energy, healthcare, and climate. Due to the abundant seasonal information they contain, timestamps possess the potential to offer robust global guidance for forecasting techniques. However, existing works primarily focus on local observations, with timestamps being treated merely as an optional supplement that remains underutilized. When data gathered from the real world is polluted, the absence of global information will damage the robust prediction capability of these algorithms. To address these problems, we propose a novel framework named GLAFF. Within this framework, the timestamps are modeled individually to capture the global dependencies. Working as a plugin, GLAFF adaptively adjusts the combined weights for global and local information, enabling seamless collaboration with any time series forecasting backbone. Extensive experiments conducted on nine real-world datasets demonstrate that GLAFF significantly enhances the average performance of widely used mainstream forecasting models by 12.5%, surpassing the previous state-of-the-art method by 5.5%.

new Cottention: Linear Transformers With Cosine Attention

Authors: Gabriel Mongaras, Trevor Dohm, Eric C. Larson

Abstract: Attention mechanisms, particularly softmax attention, have been instrumental in the success of transformer-based models such as GPT. However, the quadratic memory complexity of softmax attention with respect to sequence length poses significant challenges for processing longer sequences. We introduce Cottention, a novel attention mechanism that replaces the softmax operation with cosine similarity. By leveraging the properties of cosine similarity and rearranging the attention equation, Cottention achieves native linear memory complexity with respect to sequence length, making it inherently more memory-efficient than softmax attention. We demonstrate that Cottention can be reformulated as a recurrent neural network (RNN) with a finite hidden state, allowing for constant memory usage during inference. We evaluate Cottention on both the bidirectional BERT and causal GPT tasks, demonstrating comparable performance to softmax attention while significantly reducing memory requirements. To ensure efficient computation, we develop a custom CUDA kernel for Cottention. Our results show that Cottention is a promising alternative to softmax attention, enabling the processing of longer sequences without sacrificing performance, due to its native linear memory complexity and ability to maintain a constant memory footprint during inference.

new TensorSocket: Shared Data Loading for Deep Learning Training

Authors: Ties Robroek (IT University of Copenhagen), Neil Kim Nielsen (IT University of Copenhagen), P{\i}nar T\"oz\"un (IT University of Copenhagen)

Abstract: Training deep learning models is a repetitive and resource-intensive process. Data scientists often train several models before landing on set of parameters (e.g., hyper-parameter tuning), model architecture (e.g., neural architecture search), among other things that yields the highest accuracy. The computational efficiency of these training tasks depends highly on how well we can supply the training process with training data. The repetitive nature of these tasks results in the same data processing pipelines running over and over exacerbating the need for and costs of computational resources. In this paper, we present Tensorsocket to reduce the computational needs of deep learning training by enabling simultaneous training processes to share the same data loader. Tensorsocket mitigates CPU-side bottlenecks in cases where the collocated training workloads have high throughput on GPU, but are held back by lower data-loading throughput on CPU. Tensorsocket achieves this by reducing redundant computations across collocated training processes and leveraging modern GPU-GPU interconnects. We demonstrate the hardware- and pipeline-agnostic nature of Tensorsocket and evaluate it using a variety of training scenarios. Our evaluation shows that Tensorsocket enables scenarios that are infeasible without data sharing, increases training throughput by up to $100\%$, and when utilizing cloud instances, Tensorsocket achieves cost savings of $50\%$ by reducing the hardware resource needs on the CPU side. Furthermore, Tensorsocket outperforms the state-of-the-art solutions for shared data loading such as CoorDL and Joader. It is easier to use, maintain, and deploy, and either achieves higher or matches the throughput of other solutions while requiring less CPU resources.

new HardCore Generation: Generating Hard UNSAT Problems for Data Augmentation

Authors: Joseph Cotnareanu, Zhanguang Zhang, Hui-Ling Zhen, Yingxue Zhang, Mark Coates

Abstract: Efficiently determining the satisfiability of a boolean equation -- known as the SAT problem for brevity -- is crucial in various industrial problems. Recently, the advent of deep learning methods has introduced significant potential for enhancing SAT solving. However, a major barrier to the advancement of this field has been the scarcity of large, realistic datasets. The majority of current public datasets are either randomly generated or extremely limited, containing only a few examples from unrelated problem families. These datasets are inadequate for meaningful training of deep learning methods. In light of this, researchers have started exploring generative techniques to create data that more accurately reflect SAT problems encountered in practical situations. These methods have so far suffered from either the inability to produce challenging SAT problems or time-scalability obstacles. In this paper we address both by identifying and manipulating the key contributors to a problem's ``hardness'', known as cores. Although some previous work has addressed cores, the time costs are unacceptably high due to the expense of traditional heuristic core detection techniques. We introduce a fast core detection procedure that uses a graph neural network. Our empirical results demonstrate that we can efficiently generate problems that remain hard to solve and retain key attributes of the original example problems. We show via experiment that the generated synthetic SAT problems can be used in a data augmentation setting to provide improved prediction of solver runtimes.

new Hierarchical Federated ADMM

Authors: Seyed Mohammad Azimi-Abarghouyi, Nicola Bastianello, Karl H. Johansson, Viktoria Fodor

Abstract: In this paper, we depart from the widely-used gradient descent-based hierarchical federated learning (FL) algorithms to develop a novel hierarchical FL framework based on the alternating direction method of multipliers (ADMM). Within this framework, we propose two novel FL algorithms, which both use ADMM in the top layer: one that employs ADMM in the lower layer and another that uses the conventional gradient descent-based approach. The proposed framework enhances privacy, and experiments demonstrate the superiority of the proposed algorithms compared to the conventional algorithms in terms of learning convergence and accuracy. Additionally, gradient descent on the lower layer performs well even if the number of local steps is very limited, while ADMM on both layers lead to better performance otherwise.

new ARLBench: Flexible and Efficient Benchmarking for Hyperparameter Optimization in Reinforcement Learning

Authors: Jannis Becktepe, Julian Dierkes, Carolin Benjamins, Aditya Mohan, David Salinas, Raghu Rajan, Frank Hutter, Holger Hoos, Marius Lindauer, Theresa Eimer

Abstract: Hyperparameters are a critical factor in reliably training well-performing reinforcement learning (RL) agents. Unfortunately, developing and evaluating automated approaches for tuning such hyperparameters is both costly and time-consuming. As a result, such approaches are often only evaluated on a single domain or algorithm, making comparisons difficult and limiting insights into their generalizability. We propose ARLBench, a benchmark for hyperparameter optimization (HPO) in RL that allows comparisons of diverse HPO approaches while being highly efficient in evaluation. To enable research into HPO in RL, even in settings with low compute resources, we select a representative subset of HPO tasks spanning a variety of algorithm and environment combinations. This selection allows for generating a performance profile of an automated RL (AutoRL) method using only a fraction of the compute previously necessary, enabling a broader range of researchers to work on HPO in RL. With the extensive and large-scale dataset on hyperparameter landscapes that our selection is based on, ARLBench is an efficient, flexible, and future-oriented foundation for research on AutoRL. Both the benchmark and the dataset are available at https://github.com/automl/arlbench.

URLs: https://github.com/automl/arlbench.

new Two Sparse Matrices are Better than One: Sparsifying Neural Networks with Double Sparse Factorization

Authors: Vladim\'ir Bo\v{z}a, Vladim\'ir Macko

Abstract: Neural networks are often challenging to work with due to their large size and complexity. To address this, various methods aim to reduce model size by sparsifying or decomposing weight matrices, such as magnitude pruning and low-rank or block-diagonal factorization. In this work, we present Double Sparse Factorization (DSF), where we factorize each weight matrix into two sparse matrices. Although solving this problem exactly is computationally infeasible, we propose an efficient heuristic based on alternating minimization via ADMM that achieves state-of-the-art results, enabling unprecedented sparsification of neural networks. For instance, in a one-shot pruning setting, our method can reduce the size of the LLaMA2-13B model by 50% while maintaining better performance than the dense LLaMA2-7B model. We also compare favorably with Optimal Brain Compression, the state-of-the-art layer-wise pruning approach for convolutional neural networks. Furthermore, accuracy improvements of our method persist even after further model fine-tuning. Code available at: https://github.com/usamec/double_sparse.

URLs: https://github.com/usamec/double_sparse.

new Challenges of Generating Structurally Diverse Graphs

Authors: Fedor Velikonivtsev, Mikhail Mironov, Liudmila Prokhorenkova

Abstract: For many graph-related problems, it can be essential to have a set of structurally diverse graphs. For instance, such graphs can be used for testing graph algorithms or their neural approximations. However, to the best of our knowledge, the problem of generating structurally diverse graphs has not been explored in the literature. In this paper, we fill this gap. First, we discuss how to define diversity for a set of graphs, why this task is non-trivial, and how one can choose a proper diversity measure. Then, for a given diversity measure, we propose and compare several algorithms optimizing it: we consider approaches based on standard random graph models, local graph optimization, genetic algorithms, and neural generative models. We show that it is possible to significantly improve diversity over basic random graph generators. Additionally, our analysis of generated graphs allows us to better understand the properties of graph distances: depending on which diversity measure is used for optimization, the obtained graphs may possess very different structural properties which gives insights about the sensitivity of the graph distance underlying the diversity measure.

new CESNET-TimeSeries24: Time Series Dataset for Network Traffic Anomaly Detection and Forecasting

Authors: Josef Koumar, Karel Hynek, Tom\'a\v{s} \v{C}ejka, Pavel \v{S}i\v{s}ka

Abstract: Anomaly detection in network traffic is crucial for maintaining the security of computer networks and identifying malicious activities. One of the primary approaches to anomaly detection are methods based on forecasting. Nevertheless, extensive real-world network datasets for forecasting and anomaly detection techniques are missing, potentially causing performance overestimation of anomaly detection algorithms. This manuscript addresses this gap by introducing a dataset comprising time series data of network entities' behavior, collected from the CESNET3 network. The dataset was created from 40 weeks of network traffic of 275 thousand active IP addresses. The ISP origin of the presented data ensures a high level of variability among network entities, which forms a unique and authentic challenge for forecasting and anomaly detection models. It provides valuable insights into the practical deployment of forecast-based anomaly detection approaches.

new HR-Extreme: A High-Resolution Dataset for Extreme Weather Forecasting

Authors: Nian Ran, Peng Xiao, Yue Wang, Wesley Shi, Jianxin Lin, Qi Meng, Richard Allmendinger

Abstract: The application of large deep learning models in weather forecasting has led to significant advancements in the field, including higher-resolution forecasting and extended prediction periods exemplified by models such as Pangu and Fuxi. Despite these successes, previous research has largely been characterized by the neglect of extreme weather events, and the availability of datasets specifically curated for such events remains limited. Given the critical importance of accurately forecasting extreme weather, this study introduces a comprehensive dataset that incorporates high-resolution extreme weather cases derived from the High-Resolution Rapid Refresh (HRRR) data, a 3-km real-time dataset provided by NOAA. We also evaluate the current state-of-the-art deep learning models and Numerical Weather Prediction (NWP) systems on HR-Extreme, and provide a improved baseline deep learning model called HR-Heim which has superior performance on both general loss and HR-Extreme compared to others. Our results reveal that the errors of extreme weather cases are significantly larger than overall forecast error, highlighting them as an crucial source of loss in weather prediction. These findings underscore the necessity for future research to focus on improving the accuracy of extreme weather forecasts to enhance their practical utility.

new HM3: Hierarchical Multi-Objective Model Merging for Pretrained Models

Authors: Yu Zhou, Xingyu Wu, Jibin Wu, Liang Feng, Kay Chen Tan

Abstract: Model merging is a technique that combines multiple large pretrained models into a single model with enhanced performance and broader task adaptability. It has gained popularity in large pretrained model development due to its ability to bypass the need for original training data and further training processes. However, most existing model merging approaches focus solely on exploring the parameter space, merging models with identical architectures. Merging within the architecture space, despite its potential, remains in its early stages due to the vast search space and the challenges of layer compatibility. This paper marks a significant advance toward more flexible and comprehensive model merging techniques by modeling the architecture-space merging process as a reinforcement learning task. We train policy and value networks using offline sampling of weight vectors, which are then employed for the online optimization of merging strategies. Moreover, a multi-objective optimization paradigm is introduced to accommodate users' diverse task preferences, learning the Pareto front of optimal models to offer customized merging suggestions. Experimental results across multiple tasks, including text translation, mathematical reasoning, and code generation, validate the effectiveness and superiority of the proposed framework in model merging. The code will be made publicly available after the review process.

new Multi-Source Hard and Soft Information Fusion Approach for Accurate Cryptocurrency Price Movement Prediction

Authors: Saeed Mohammadi Dashtaki, Mehdi Hosseini Chagahi, Behzad Moshiri, Md. Jalil Piran

Abstract: One of the most important challenges in the financial and cryptocurrency field is accurately predicting cryptocurrency price trends. Leveraging artificial intelligence (AI) is beneficial in addressing this challenge. Cryptocurrency markets, marked by substantial growth and volatility, attract investors and scholars keen on deciphering and forecasting cryptocurrency price movements. The vast and diverse array of data available for such predictions increases the complexity of the task. In our study, we introduce a novel approach termed hard and soft information fusion (HSIF) to enhance the accuracy of cryptocurrency price movement forecasts. The hard information component of our approach encompasses historical price records alongside technical indicators. Complementing this, the soft data component extracts from X (formerly Twitter), encompassing news headlines and tweets about the cryptocurrency. To use this data, we use the Bidirectional Encoder Representations from Transformers (BERT)-based sentiment analysis method, financial BERT (FinBERT), which performs best. Finally, our model feeds on the information set including processed hard and soft data. We employ the bidirectional long short-term memory (BiLSTM) model because processing information in both forward and backward directions can capture long-term dependencies in sequential information. Our empirical findings emphasize the superiority of the HSIF approach over models dependent on single-source data by testing on Bitcoin-related data. By fusing hard and soft information on Bitcoin dataset, our model has about 96.8\% accuracy in predicting price movement. Incorporating information enables our model to grasp the influence of social sentiment on price fluctuations, thereby supplementing the technical analysis-based predictions derived from hard information.

new In-depth Analysis of Privacy Threats in Federated Learning for Medical Data

Authors: Badhan Chandra Das, M. Hadi Amini, Yanzhao Wu

Abstract: Federated learning is emerging as a promising machine learning technique in the medical field for analyzing medical images, as it is considered an effective method to safeguard sensitive patient data and comply with privacy regulations. However, recent studies have revealed that the default settings of federated learning may inadvertently expose private training data to privacy attacks. Thus, the intensity of such privacy risks and potential mitigation strategies in the medical domain remain unclear. In this paper, we make three original contributions to privacy risk analysis and mitigation in federated learning for medical data. First, we propose a holistic framework, MedPFL, for analyzing privacy risks in processing medical data in the federated learning environment and developing effective mitigation strategies for protecting privacy. Second, through our empirical analysis, we demonstrate the severe privacy risks in federated learning to process medical images, where adversaries can accurately reconstruct private medical images by performing privacy attacks. Third, we illustrate that the prevalent defense mechanism of adding random noises may not always be effective in protecting medical images against privacy attacks in federated learning, which poses unique and pressing challenges related to protecting the privacy of medical data. Furthermore, the paper discusses several unique research questions related to the privacy protection of medical data in the federated learning environment. We conduct extensive experiments on several benchmark medical image datasets to analyze and mitigate the privacy risks associated with federated learning for medical data.

new Best Arm Identification with Minimal Regret

Authors: Junwen Yang, Vincent Y. F. Tan, Tianyuan Jin

Abstract: Motivated by real-world applications that necessitate responsible experimentation, we introduce the problem of best arm identification (BAI) with minimal regret. This innovative variant of the multi-armed bandit problem elegantly amalgamates two of its most ubiquitous objectives: regret minimization and BAI. More precisely, the agent's goal is to identify the best arm with a prescribed confidence level $\delta$, while minimizing the cumulative regret up to the stopping time. Focusing on single-parameter exponential families of distributions, we leverage information-theoretic techniques to establish an instance-dependent lower bound on the expected cumulative regret. Moreover, we present an intriguing impossibility result that underscores the tension between cumulative regret and sample complexity in fixed-confidence BAI. Complementarily, we design and analyze the Double KL-UCB algorithm, which achieves asymptotic optimality as the confidence level tends to zero. Notably, this algorithm employs two distinct confidence bounds to guide arm selection in a randomized manner. Our findings elucidate a fresh perspective on the inherent connections between regret minimization and BAI.

new A-FedPD: Aligning Dual-Drift is All Federated Primal-Dual Learning Needs

Authors: Yan Sun, Li Shen, Dacheng Tao

Abstract: As a popular paradigm for juggling data privacy and collaborative training, federated learning (FL) is flourishing to distributively process the large scale of heterogeneous datasets on edged clients. Due to bandwidth limitations and security considerations, it ingeniously splits the original problem into multiple subproblems to be solved in parallel, which empowers primal dual solutions to great application values in FL. In this paper, we review the recent development of classical federated primal dual methods and point out a serious common defect of such methods in non-convex scenarios, which we say is a "dual drift" caused by dual hysteresis of those longstanding inactive clients under partial participation training. To further address this problem, we propose a novel Aligned Federated Primal Dual (A-FedPD) method, which constructs virtual dual updates to align global consensus and local dual variables for those protracted unparticipated local clients. Meanwhile, we provide a comprehensive analysis of the optimization and generalization efficiency for the A-FedPD method on smooth non-convex objectives, which confirms its high efficiency and practicality. Extensive experiments are conducted on several classical FL setups to validate the effectiveness of our proposed method.

new $O(d/T)$ Convergence Theory for Diffusion Probabilistic Models under Minimal Assumptions

Authors: Gen Li, Yuling Yan

Abstract: Score-based diffusion models, which generate new data by learning to reverse a diffusion process that perturbs data from the target distribution into noise, have achieved remarkable success across various generative tasks. Despite their superior empirical performance, existing theoretical guarantees are often constrained by stringent assumptions or suboptimal convergence rates. In this paper, we establish a fast convergence theory for a popular SDE-based sampler under minimal assumptions. Our analysis shows that, provided $\ell_{2}$-accurate estimates of the score functions, the total variation distance between the target and generated distributions is upper bounded by $O(d/T)$ (ignoring logarithmic factors), where $d$ is the data dimensionality and $T$ is the number of steps. This result holds for any target distribution with finite first-order moment. To our knowledge, this improves upon existing convergence theory for both the SDE-based sampler and another ODE-based sampler, while imposing minimal assumptions on the target data distribution and score estimates. This is achieved through a novel set of analytical tools that provides a fine-grained characterization of how the error propagates at each step of the reverse process.

cross A shortest-path based clustering algorithm for joint human-machine analysis of complex datasets

Authors: Diego Ulisse Pizzagalli, Santiago Fernandez Gonzalez, Rolf Krause

Abstract: Clustering is a technique for the analysis of datasets obtained by empirical studies in several disciplines with a major application for biomedical research. Essentially, clustering algorithms are executed by machines aiming at finding groups of related points in a dataset. However, the result of grouping depends on both metrics for point-to-point similarity and rules for point-to-group association. Indeed, non-appropriate metrics and rules can lead to undesirable clustering artifacts. This is especially relevant for datasets, where groups with heterogeneous structures co-exist. In this work, we propose an algorithm that achieves clustering by exploring the paths between points. This allows both, to evaluate the properties of the path (such as gaps, density variations, etc.), and expressing the preference for certain paths. Moreover, our algorithm supports the integration of existing knowledge about admissible and non-admissible clusters by training a path classifier. We demonstrate the accuracy of the proposed method on challenging datasets including points from synthetic shapes in publicly available benchmarks and microscopy data.

cross Reinforcement Learning for Finite Space Mean-Field Type Games

Authors: Kai Shao, Jiacheng Shen, Chijie An, Mathieu Lauri\`ere

Abstract: Mean field type games (MFTGs) describe Nash equilibria between large coalitions: each coalition consists of a continuum of cooperative agents who maximize the average reward of their coalition while interacting non-cooperatively with a finite number of other coalitions. Although the theory has been extensively developed, we are still lacking efficient and scalable computational methods. Here, we develop reinforcement learning methods for such games in a finite space setting with general dynamics and reward functions. We start by proving that MFTG solution yields approximate Nash equilibria in finite-size coalition games. We then propose two algorithms. The first is based on quantization of the mean-field spaces and Nash Q-learning. We provide convergence and stability analysis. We then propose an deep reinforcement learning algorithm, which can scale to larger spaces. Numerical examples on 5 environments show the scalability and the efficiency of the proposed method.

cross A novel application of Shapley values for large multidimensional time-series data: Applying explainable AI to a DNA profile classification neural network

Authors: Lauren Elborough, Duncan Taylor, Melissa Humphries

Abstract: The application of Shapley values to high-dimensional, time-series-like data is computationally challenging - and sometimes impossible. For $N$ inputs the problem is $2^N$ hard. In image processing, clusters of pixels, referred to as superpixels, are used to streamline computations. This research presents an efficient solution for time-seres-like data that adapts the idea of superpixels for Shapley value computation. Motivated by a forensic DNA classification example, the method is applied to multivariate time-series-like data whose features have been classified by a convolutional neural network (CNN). In DNA processing, it is important to identify alleles from the background noise created by DNA extraction and processing. A single DNA profile has $31,200$ scan points to classify, and the classification decisions must be defensible in a court of law. This means that classification is routinely performed by human readers - a monumental and time consuming process. The application of a CNN with fast computation of meaningful Shapley values provides a potential alternative to the classification. This research demonstrates the realistic, accurate and fast computation of Shapley values for this massive task

cross Decomposable Transformer Point Processes

Authors: Aristeidis Panos

Abstract: The standard paradigm of modeling marked point processes is by parameterizing the intensity function using an attention-based (Transformer-style) architecture. Despite the flexibility of these methods, their inference is based on the computationally intensive thinning algorithm. In this work, we propose a framework where the advantages of the attention-based architecture are maintained and the limitation of the thinning algorithm is circumvented. The framework depends on modeling the conditional distribution of inter-event times with a mixture of log-normals satisfying a Markov property and the conditional probability mass function for the marks with a Transformer-based architecture. The proposed method attains state-of-the-art performance in predicting the next event of a sequence given its history. The experiments also reveal the efficacy of the methods that do not rely on the thinning algorithm during inference over the ones they do. Finally, we test our method on the challenging long-horizon prediction task and find that it outperforms a baseline developed specifically for tackling this task; importantly, inference requires just a fraction of time compared to the thinning-based baseline.

cross Data-Prep-Kit: getting your data ready for LLM application development

Authors: David Wood, Boris Lublinsky, Alexy Roytman, Shivdeep Singh, Abdulhamid Adebayo, Revital Eres, Mohammad Nassar, Hima Patel, Yousaf Shah, Constantin Adam, Petros Zerfos, Nirmit Desai, Daiki Tsuzuku, Takuya Goto, Michele Dolfi, Saptha Surendran, Paramesvaran Selvam, Sungeun An, Yuan Chi Chang, Dhiraj Joshi, Hajar Emami-Gohari, Xuan-Hong Dang, Yan Koyfman, Shahrokh Daijavad

Abstract: Data preparation is the first and a very important step towards any Large Language Model (LLM) development. This paper introduces an easy-to-use, extensible, and scale-flexible open-source data preparation toolkit called Data Prep Kit (DPK). DPK is architected and designed to enable users to scale their data preparation to their needs. With DPK they can prepare data on a local machine or effortlessly scale to run on a cluster with thousands of CPU Cores. DPK comes with a highly scalable, yet extensible set of modules that transform natural language and code data. If the user needs additional transforms, they can be easily developed using extensive DPK support for transform creation. These modules can be used independently or pipelined to perform a series of operations. In this paper, we describe DPK architecture and show its performance from a small scale to a very large number of CPUs. The modules from DPK have been used for the preparation of Granite Models [1] [2]. We believe DPK is a valuable contribution to the AI community to easily prepare data to enhance the performance of their LLM models or to fine-tune models with Retrieval-Augmented Generation (RAG).

cross Harmful Fine-tuning Attacks and Defenses for Large Language Models: A Survey

Authors: Tiansheng Huang, Sihao Hu, Fatih Ilhan, Selim Furkan Tekin, Ling Liu

Abstract: Recent research demonstrates that the nascent fine-tuning-as-a-service business model exposes serious safety concerns -- fine-tuning over a few harmful data uploaded by the users can compromise the safety alignment of the model. The attack, known as harmful fine-tuning, has raised a broad research interest among the community. However, as the attack is still new, \textbf{we observe from our miserable submission experience that there are general misunderstandings within the research community.} We in this paper aim to clear some common concerns for the attack setting, and formally establish the research problem. Specifically, we first present the threat model of the problem, and introduce the harmful fine-tuning attack and its variants. Then we systematically survey the existing literature on attacks/defenses/mechanical analysis of the problem. Finally, we outline future research directions that might contribute to the development of the field. Additionally, we present a list of questions of interest, which might be useful to refer to when reviewers in the peer review process question the realism of the experiment/attack/defense setting. A curated list of relevant papers is maintained and made accessible at: \url{https://github.com/git-disl/awesome_LLM-harmful-fine-tuning-papers.}

URLs: https://github.com/git-disl/awesome_LLM-harmful-fine-tuning-papers.

cross Autonomous Network Defence using Reinforcement Learning

Authors: Myles Foley, Chris Hicks, Kate Highnam, Vasilios Mavroudis

Abstract: In the network security arms race, the defender is significantly disadvantaged as they need to successfully detect and counter every malicious attack. In contrast, the attacker needs to succeed only once. To level the playing field, we investigate the effectiveness of autonomous agents in a realistic network defence scenario. We first outline the problem, provide the background on reinforcement learning and detail our proposed agent design. Using a network environment simulation, with 13 hosts spanning 3 subnets, we train a novel reinforcement learning agent and show that it can reliably defend continual attacks by two advanced persistent threat (APT) red agents: one with complete knowledge of the network layout and another which must discover resources through exploration but is more general.

cross Loop-Diffusion: an equivariant diffusion model for designing and scoring protein loops

Authors: Kevin Borisiak, Gian Marco Visani, Armita Nourmohammad

Abstract: Predicting protein functional characteristics from structure remains a central problem in protein science, with broad implications from understanding the mechanisms of disease to designing novel therapeutics. Unfortunately, current machine learning methods are limited by scarce and biased experimental data, and physics-based methods are either too slow to be useful, or too simplified to be accurate. In this work, we present Loop-Diffusion, an energy based diffusion model which leverages a dataset of general protein loops from the entire protein universe to learn an energy function that generalizes to functional prediction tasks. We evaluate Loop-Diffusion's performance on scoring TCR-pMHC interfaces and demonstrate state-of-the-art results in recognizing binding-enhancing mutations.

cross AI Policy Projector: Grounding LLM Policy Design in Iterative Mapmaking

Authors: Michelle S. Lam, Fred Hohman, Dominik Moritz, Jeffrey P. Bigham, Kenneth Holstein, Mary Beth Kery

Abstract: Whether a large language model policy is an explicit constitution or an implicit reward model, it is challenging to assess coverage over the unbounded set of real-world situations that a policy must contend with. We introduce an AI policy design process inspired by mapmaking, which has developed tactics for visualizing and iterating on maps even when full coverage is not possible. With Policy Projector, policy designers can survey the landscape of model input-output pairs, define custom regions (e.g., "violence"), and navigate these regions with rules that can be applied to LLM outputs (e.g., if output contains "violence" and "graphic details," then rewrite without "graphic details"). Policy Projector supports interactive policy authoring using LLM classification and steering and a map visualization reflecting the policy designer's work. In an evaluation with 12 AI safety experts, our system helps policy designers to address problematic model behaviors extending beyond an existing, comprehensive harm taxonomy.

cross A Unified View on Learning Unnormalized Distributions via Noise-Contrastive Estimation

Authors: J. Jon Ryu, Abhin Shah, Gregory W. Wornell

Abstract: This paper studies a family of estimators based on noise-contrastive estimation (NCE) for learning unnormalized distributions. The main contribution of this work is to provide a unified perspective on various methods for learning unnormalized distributions, which have been independently proposed and studied in separate research communities, through the lens of NCE. This unified view offers new insights into existing estimators. Specifically, for exponential families, we establish the finite-sample convergence rates of the proposed estimators under a set of regularity assumptions, most of which are new.

cross MMMT-IF: A Challenging Multimodal Multi-Turn Instruction Following Benchmark

Authors: Elliot L. Epstein, Kaisheng Yao, Jing Li, Xinyi Bai, Hamid Palangi

Abstract: Evaluating instruction following capabilities for multimodal, multi-turn dialogue is challenging. With potentially multiple instructions in the input model context, the task is time-consuming for human raters and we show LLM based judges are biased towards answers from the same model. We propose MMMT-IF, an image based multi-turn Q$\&$A evaluation set with added global instructions between questions, constraining the answer format. This challenges models to retrieve instructions dispersed across long dialogues and reason under instruction constraints. All instructions are objectively verifiable through code execution. We introduce the Programmatic Instruction Following ($\operatorname{PIF}$) metric to measure the fraction of the instructions that are correctly followed while performing a reasoning task. The $\operatorname{PIF-N-K}$ set of metrics further evaluates robustness by measuring the fraction of samples in a corpus where, for each sample, at least K out of N generated model responses achieve a $\operatorname{PIF}$ score of one. The $\operatorname{PIF}$ metric aligns with human instruction following ratings, showing 60 percent correlation. Experiments show Gemini 1.5 Pro, GPT-4o, and Claude 3.5 Sonnet, have a $\operatorname{PIF}$ metric that drops from 0.81 on average at turn 1 across the models, to 0.64 at turn 20. Across all turns, when each response is repeated 4 times ($\operatorname{PIF-4-4}$), GPT-4o and Gemini successfully follow all instructions only $11\%$ of the time. When all the instructions are also appended to the end of the model input context, the $\operatorname{PIF}$ metric improves by 22.3 points on average, showing that the challenge with the task lies not only in following the instructions, but also in retrieving the instructions spread out in the model context. We plan to open source the MMMT-IF dataset and metric computation code.

cross Learning to Drive via Asymmetric Self-Play

Authors: Chris Zhang, Sourav Biswas, Kelvin Wong, Kion Fallah, Lunjun Zhang, Dian Chen, Sergio Casas, Raquel Urtasun

Abstract: Large-scale data is crucial for learning realistic and capable driving policies. However, it can be impractical to rely on scaling datasets with real data alone. The majority of driving data is uninteresting, and deliberately collecting new long-tail scenarios is expensive and unsafe. We propose asymmetric self-play to scale beyond real data with additional challenging, solvable, and realistic synthetic scenarios. Our approach pairs a teacher that learns to generate scenarios it can solve but the student cannot, with a student that learns to solve them. When applied to traffic simulation, we learn realistic policies with significantly fewer collisions in both nominal and long-tail scenarios. Our policies further zero-shot transfer to generate training data for end-to-end autonomy, significantly outperforming state-of-the-art adversarial approaches, or using real data alone. For more information, visit https://waabi.ai/selfplay .

URLs: https://waabi.ai/selfplay

cross Revolutionizing Payload Inspection: A Self-Supervised Journey to Precision with Few Shots

Authors: Kyle Stein, Arash Mahyari, Guillermo Francia III, Eman El-Sheikh

Abstract: As networks continue to expand and become more interconnected, the need for novel malware detection methods becomes more pronounced. Traditional security measures are increasingly inadequate against the sophistication of modern cyber attacks. Deep Packet Inspection (DPI) has been pivotal in enhancing network security, offering an in-depth analysis of network traffic that surpasses conventional monitoring techniques. DPI not only examines the metadata of network packets, but also dives into the actual content being carried within the packet payloads, providing a comprehensive view of the data flowing through networks. The integration of advanced deep learning techniques with DPI has introduced modern methodologies into malware detection. However, the challenge with the state-of-the-art supervised learning approaches is that they prevent the generalization to unseen attacks embedded in the payloads, prohibiting them from accurately detecting new attacks and transferring knowledge learned from previous attacks to the new attacks with small labeled sample sizes. This paper leverages the recent advancements in self-supervised learning and few-shot learning. Our proposed self-supervised approach trains a transformer to learn the embedding of the payloads from a vast amount of unlabeled datasets by masking portions of payloads, leading to a learnt representation that well generalizes to various downstream tasks. Once the representation is extracted from payloads, they are used to train a malware detection algorithm. The representation obtained from the transformer is then used to adapt the malware detector to novel types of attacks using few-shot learning approaches. Our experimental results across several datasets show the great success and generalization of the proposed approach to novel scenarios.

cross Visual Concept Networks: A Graph-Based Approach to Detecting Anomalous Data in Deep Neural Networks

Authors: Debargha Ganguly, Debayan Gupta, Vipin Chaudhary

Abstract: Deep neural networks (DNNs), while increasingly deployed in many applications, struggle with robustness against anomalous and out-of-distribution (OOD) data. Current OOD benchmarks often oversimplify, focusing on single-object tasks and not fully representing complex real-world anomalies. This paper introduces a new, straightforward method employing graph structures and topological features to effectively detect both far-OOD and near-OOD data. We convert images into networks of interconnected human understandable features or visual concepts. Through extensive testing on two novel tasks, including ablation studies with large vocabularies and diverse tasks, we demonstrate the method's effectiveness. This approach enhances DNN resilience to OOD data and promises improved performance in various applications.

cross Spatial Visibility and Temporal Dynamics: Revolutionizing Field of View Prediction in Adaptive Point Cloud Video Streaming

Authors: Chen Li, Tongyu Zong, Yueyu Hu, Yao Wang, Yong Liu

Abstract: Field-of-View (FoV) adaptive streaming significantly reduces bandwidth requirement of immersive point cloud video (PCV) by only transmitting visible points in a viewer's FoV. The traditional approaches often focus on trajectory-based 6 degree-of-freedom (6DoF) FoV predictions. The predicted FoV is then used to calculate point visibility. Such approaches do not explicitly consider video content's impact on viewer attention, and the conversion from FoV to point visibility is often error-prone and time-consuming. We reformulate the PCV FoV prediction problem from the cell visibility perspective, allowing for precise decision-making regarding the transmission of 3D data at the cell level based on the predicted visibility distribution. We develop a novel spatial visibility and object-aware graph model that leverages the historical 3D visibility data and incorporates spatial perception, neighboring cell correlation, and occlusion information to predict the cell visibility in the future. Our model significantly improves the long-term cell visibility prediction, reducing the prediction MSE loss by up to 50% compared to the state-of-the-art models while maintaining real-time performance (more than 30fps) for point cloud videos with over 1 million points.

cross Towards sub-millisecond latency real-time speech enhancement models on hearables

Authors: Artem Dementyev, Chandan K. A. Reddy, Scott Wisdom, Navin Chatlani, John R. Hershey, Richard F. Lyon

Abstract: Low latency models are critical for real-time speech enhancement applications, such as hearing aids and hearables. However, the sub-millisecond latency space for resource-constrained hearables remains underexplored. We demonstrate speech enhancement using a computationally efficient minimum-phase FIR filter, enabling sample-by-sample processing to achieve mean algorithmic latency of 0.32 ms to 1.25 ms. With a single microphone, we observe a mean SI-SDRi of 4.1 dB. The approach shows generalization with a DNSMOS increase of 0.2 on unseen audio recordings. We use a lightweight LSTM-based model of 644k parameters to generate FIR taps. We benchmark that our system can run on low-power DSP with 388 MIPS and mean end-to-end latency of 3.35 ms. We provide a comparison with baseline low-latency spectral masking techniques. We hope this work will enable a better understanding of latency and can be used to improve the comfort and usability of hearables.

cross Development of an Edge Resilient ML Ensemble to Tolerate ICS Adversarial Attacks

Authors: Likai Yao, Qinxuan Shi, Zhanglong Yang, Sicong Shao, Salim Hariri

Abstract: Deploying machine learning (ML) in dynamic data-driven applications systems (DDDAS) can improve the security of industrial control systems (ICS). However, ML-based DDDAS are vulnerable to adversarial attacks because adversaries can alter the input data slightly so that the ML models predict a different result. In this paper, our goal is to build a resilient edge machine learning (reML) architecture that is designed to withstand adversarial attacks by performing Data Air Gap Transformation (DAGT) to anonymize data feature spaces using deep neural networks and randomize the ML models used for predictions. The reML is based on the Resilient DDDAS paradigm, Moving Target Defense (MTD) theory, and TinyML and is applied to combat adversarial attacks on ICS. Furthermore, the proposed approach is power-efficient and privacy-preserving and, therefore, can be deployed on power-constrained devices to enhance ICS security. This approach enables resilient ML inference at the edge by shifting the computation from the computing-intensive platforms to the resource-constrained edge devices. The incorporation of TinyML with TensorFlow Lite ensures efficient resource utilization and, consequently, makes reML suitable for deployment in various industrial control environments. Furthermore, the dynamic nature of reML, facilitated by the resilient DDDAS development environment, allows for continuous adaptation and improvement in response to emerging threats. Lastly, we evaluate our approach on an ICS dataset and demonstrate that reML provides a viable and effective solution for resilient ML inference at the edge devices.

cross DisGeM: Distractor Generation for Multiple Choice Questions with Span Masking

Authors: Devrim Cavusoglu, Secil Sen, Ulas Sert

Abstract: Recent advancements in Natural Language Processing (NLP) have impacted numerous sub-fields such as natural language generation, natural language inference, question answering, and more. However, in the field of question generation, the creation of distractors for multiple-choice questions (MCQ) remains a challenging task. In this work, we present a simple, generic framework for distractor generation using readily available Pre-trained Language Models (PLMs). Unlike previous methods, our framework relies solely on pre-trained language models and does not require additional training on specific datasets. Building upon previous research, we introduce a two-stage framework consisting of candidate generation and candidate selection. Our proposed distractor generation framework outperforms previous methods without the need for training or fine-tuning. Human evaluations confirm that our approach produces more effective and engaging distractors. The related codebase is publicly available at https://github.com/obss/disgem.

URLs: https://github.com/obss/disgem.

cross Predicting Muscle Thickness Deformation from Muscle Activation Patterns: A Dual-Attention Framework

Authors: Bangyu Lan, Kenan Niu

Abstract: Understanding the relationship between muscle activation and thickness deformation is critical for diagnosing muscle-related diseases and monitoring muscle health. Although ultrasound technique can measure muscle thickness change during muscle movement, its application in portable devices is limited by wiring and data collection challenges. Surface electromyography (sEMG), on the other hand, records muscle bioelectrical signals as the muscle activation. This paper introduced a deep-learning approach to leverage sEMG signals for muscle thickness deformation prediction, eliminating the need for ultrasound measurement. Using a dual-attention framework combining self-attention and cross-attention mechanisms, this method predicted muscle deformation directly from sEMG data. Experimental results with six healthy subjects showed that the approach could accurately predict muscle excursion with an average precision of 0.923$\pm$0.900mm, which shows that this method can facilitate real-time portable muscle health monitoring, showing potential for applications in clinical diagnostics, sports science, and rehabilitation.

cross SOAR: Self-supervision Optimized UAV Action Recognition with Efficient Object-Aware Pretraining

Authors: Ruiqi Xian, Xiyang Wu, Tianrui Guan, Xijun Wang, Boqing Gong, Dinesh Manocha

Abstract: We introduce SOAR, a novel Self-supervised pretraining algorithm for aerial footage captured by Unmanned Aerial Vehicles (UAVs). We incorporate human object knowledge throughout the pretraining process to enhance UAV video pretraining efficiency and downstream action recognition performance. This is in contrast to prior works that primarily incorporate object information during the fine-tuning stage. Specifically, we first propose a novel object-aware masking strategy designed to retain the visibility of certain patches related to objects throughout the pretraining phase. Second, we introduce an object-aware loss function that utilizes object information to adjust the reconstruction loss, preventing bias towards less informative background patches. In practice, SOAR with a vanilla ViT backbone, outperforms best UAV action recognition models, recording a 9.7% and 21.4% boost in top-1 accuracy on the NEC-Drone and UAV-Human datasets, while delivering an inference speed of 18.7ms per video, making it 2x to 5x faster. Additionally, SOAR obtains comparable accuracy to prior self-supervised learning (SSL) methods while requiring 87.5% less pretraining time and 25% less memory usage

cross Harnessing Wavelet Transformations for Generalizable Deepfake Forgery Detection

Authors: Lalith Bharadwaj Baru, Shilhora Akshay Patel, Rohit Boddeda

Abstract: The evolution of digital image manipulation, particularly with the advancement of deep generative models, significantly challenges existing deepfake detection methods, especially when the origin of the deepfake is obscure. To tackle the increasing complexity of these forgeries, we propose \textbf{Wavelet-CLIP}, a deepfake detection framework that integrates wavelet transforms with features derived from the ViT-L/14 architecture, pre-trained in the CLIP fashion. Wavelet-CLIP utilizes Wavelet Transforms to deeply analyze both spatial and frequency features from images, thus enhancing the model's capability to detect sophisticated deepfakes. To verify the effectiveness of our approach, we conducted extensive evaluations against existing state-of-the-art methods for cross-dataset generalization and detection of unseen images generated by standard diffusion models. Our method showcases outstanding performance, achieving an average AUC of 0.749 for cross-data generalization and 0.893 for robustness against unseen deepfakes, outperforming all compared methods. The code can be reproduced from the repo: \url{https://github.com/lalithbharadwajbaru/Wavelet-CLIP}

URLs: https://github.com/lalithbharadwajbaru/Wavelet-CLIP

cross Deep-ER: Deep Learning ECCENTRIC Reconstruction for fast high-resolution neurometabolic imaging

Authors: Paul Weiser, Georg Langs, Wolfgang Bogner, Stanislav Motyka, Bernhard Strasser, Polina Golland, Nalini Singh, Jorg Dietrich, Erik Uhlmann, Tracy Batchelor, Daniel Cahill, Malte Hoffmann, Antoine Klauser, Ovidiu C. Andronesi

Abstract: Introduction: Altered neurometabolism is an important pathological mechanism in many neurological diseases and brain cancer, which can be mapped non-invasively by Magnetic Resonance Spectroscopic Imaging (MRSI). Advanced MRSI using non-cartesian compressed-sense acquisition enables fast high-resolution metabolic imaging but has lengthy reconstruction times that limits throughput and needs expert user interaction. Here, we present a robust and efficient Deep Learning reconstruction to obtain high-quality metabolic maps. Methods: Fast high-resolution whole-brain metabolic imaging was performed at 3.4 mm$^3$ isotropic resolution with acquisition times between 4:11-9:21 min:s using ECCENTRIC pulse sequence on a 7T MRI scanner. Data were acquired in a high-resolution phantom and 27 human participants, including 22 healthy volunteers and 5 glioma patients. A deep neural network using recurring interlaced convolutional layers with joint dual-space feature representation was developed for deep learning ECCENTRIC reconstruction (Deep-ER). 21 subjects were used for training and 6 subjects for testing. Deep-ER performance was compared to conventional iterative Total Generalized Variation reconstruction using image and spectral quality metrics. Results: Deep-ER demonstrated 600-fold faster reconstruction than conventional methods, providing improved spatial-spectral quality and metabolite quantification with 12%-45% (P<0.05) higher signal-to-noise and 8%-50% (P<0.05) smaller Cramer-Rao lower bounds. Metabolic images clearly visualize glioma tumor heterogeneity and boundary. Conclusion: Deep-ER provides efficient and robust reconstruction for sparse-sampled MRSI. The accelerated acquisition-reconstruction MRSI is compatible with high-throughput imaging workflow. It is expected that such improved performance will facilitate basic and clinical MRSI applications.

cross Embodied-RAG: General non-parametric Embodied Memory for Retrieval and Generation

Authors: Quanting Xie, So Yeon Min, Tianyi Zhang, Aarav Bajaj, Ruslan Salakhutdinov, Matthew Johnson-Roberson, Yonatan Bisk

Abstract: There is no limit to how much a robot might explore and learn, but all of that knowledge needs to be searchable and actionable. Within language research, retrieval augmented generation (RAG) has become the workhouse of large-scale non-parametric knowledge, however existing techniques do not directly transfer to the embodied domain, which is multimodal, data is highly correlated, and perception requires abstraction. To address these challenges, we introduce Embodied-RAG, a framework that enhances the foundational model of an embodied agent with a non-parametric memory system capable of autonomously constructing hierarchical knowledge for both navigation and language generation. Embodied-RAG handles a full range of spatial and semantic resolutions across diverse environments and query types, whether for a specific object or a holistic description of ambiance. At its core, Embodied-RAG's memory is structured as a semantic forest, storing language descriptions at varying levels of detail. This hierarchical organization allows the system to efficiently generate context-sensitive outputs across different robotic platforms. We demonstrate that Embodied-RAG effectively bridges RAG to the robotics domain, successfully handling over 200 explanation and navigation queries across 19 environments, highlighting its promise for general-purpose non-parametric system for embodied agents.

cross Local Prediction-Powered Inference

Authors: Yanwu Gu, Dong Xia

Abstract: To infer a function value on a specific point $x$, it is essential to assign higher weights to the points closer to $x$, which is called local polynomial / multivariable regression. In many practical cases, a limited sample size may ruin this method, but such conditions can be improved by the Prediction-Powered Inference (PPI) technique. This paper introduced a specific algorithm for local multivariable regression using PPI, which can significantly reduce the variance of estimations without enlarge the error. The confidence intervals, bias correction, and coverage probabilities are analyzed and proved the correctness and superiority of our algorithm. Numerical simulation and real-data experiments are applied and show these conclusions. Another contribution compared to PPI is the theoretical computation efficiency and explainability by taking into account the dependency of the dependent variable.

cross A Framework for Standardizing Similarity Measures in a Rapidly Evolving Field

Authors: Nathan Cloos, Guangyu Robert Yang, Christopher J. Cueva

Abstract: Similarity measures are fundamental tools for quantifying the alignment between artificial and biological systems. However, the diversity of similarity measures and their varied naming and implementation conventions makes it challenging to compare across studies. To facilitate comparisons and make explicit the implementation choices underlying a given code package, we have created and are continuing to develop a Python repository that benchmarks and standardizes similarity measures. The goal of creating a consistent naming convention that uniquely and efficiently specifies a similarity measure is not trivial as, for example, even commonly used methods like Centered Kernel Alignment (CKA) have at least 12 different variations, and this number will likely continue to grow as the field evolves. For this reason, we do not advocate for a fixed, definitive naming convention. The landscape of similarity measures and best practices will continue to change and so we see our current repository, which incorporates approximately 100 different similarity measures from 14 packages, as providing a useful tool at this snapshot in time. To accommodate the evolution of the field we present a framework for developing, validating, and refining naming conventions with the goal of uniquely and efficiently specifying similarity measures, ultimately making it easier for the community to make comparisons across studies.

cross AQMLator -- An Auto Quantum Machine Learning E-Platform

Authors: Tomasz Rybotycki, Piotr Gawron

Abstract: A successful Machine Learning (ML) model implementation requires three main components: training dataset, suitable model architecture and training procedure. Given dataset and task, finding an appropriate model might be challenging. AutoML, a branch of ML, focuses on automatic architecture search -- a meta method that aims at moving human from ML system design process. The success of ML and the development of quantum computing (QC) in recent years led to a birth of new fascinating field called Quantum Machine Learning (QML) that, amongst others, incorporates quantum computers into ML models. In this paper we present AQMLator, an Auto Quantum Machine Learning platform that aims to automatically propose and train the quantum layers of an ML model with minimal input from the user. This way, data scientists can bypass the entry barrier for QC and use QML. AQMLator uses standard ML libraries, making it easy to introduce into existing ML pipelines.

cross Multi-hypotheses Conditioned Point Cloud Diffusion for 3D Human Reconstruction from Occluded Images

Authors: Donghwan Kim, Tae-Kyun Kim

Abstract: 3D human shape reconstruction under severe occlusion due to human-object or human-human interaction is a challenging problem. Parametric models i.e., SMPL(-X), which are based on the statistics across human shapes, can represent whole human body shapes but are limited to minimally-clothed human shapes. Implicit-function-based methods extract features from the parametric models to employ prior knowledge of human bodies and can capture geometric details such as clothing and hair. However, they often struggle to handle misaligned parametric models and inpaint occluded regions given a single RGB image. In this work, we propose a novel pipeline, MHCDIFF, Multi-hypotheses Conditioned Point Cloud Diffusion, composed of point cloud diffusion conditioned on probabilistic distributions for pixel-aligned detailed 3D human reconstruction under occlusion. Compared to previous implicit-function-based methods, the point cloud diffusion model can capture the global consistent features to generate the occluded regions, and the denoising process corrects the misaligned SMPL meshes. The core of MHCDIFF is extracting local features from multiple hypothesized SMPL(-X) meshes and aggregating the set of features to condition the diffusion model. In the experiments on CAPE and MultiHuman datasets, the proposed method outperforms various SOTA methods based on SMPL, implicit functions, point cloud diffusion, and their combined, under synthetic and real occlusions.

cross Defect Prediction with Content-based Features

Authors: Hung Viet Pham, Tung Thanh Nguyen

Abstract: Traditional defect prediction approaches often use metrics that measure the complexity of the design or implementing code of a software system, such as the number of lines of code in a source file. In this paper, we explore a different approach based on content of source code. Our key assumption is that source code of a software system contains information about its technical aspects and those aspects might have different levels of defect-proneness. Thus, content-based features such as words, topics, data types, and package names extracted from a source code file could be used to predict its defects. We have performed an extensive empirical evaluation and found that: i) such content-based features have higher predictive power than code complexity metrics and ii) the use of feature selection, reduction, and combination further improves the prediction performance.

cross A model-constrained Discontinuous Galerkin Network (DGNet) for Compressible Euler Equations with Out-of-Distribution Generalization

Authors: Hai Van Nguyen (Department of Aerospace Engineering and Engineering Mechanics, the University of Texas at Austin, Texas), Jau-Uei Chen (Department of Aerospace Engineering and Engineering Mechanics, the University of Texas at Austin, Texas), William Cole Nockolds (The Oden Institute for Computational Engineering and Sciences, the University of Texas at Austin, Texas), Wesley Lao (The Oden Institute for Computational Engineering and Sciences, the University of Texas at Austin, Texas), Tan Bui-Thanh (Department of Aerospace Engineering and Engineering Mechanics, the University of Texas at Austin, Texas, The Oden Institute for Computational Engineering and Sciences, the University of Texas at Austin, Texas)

Abstract: Real-time accurate solutions of large-scale complex dynamical systems are critically needed for control, optimization, uncertainty quantification, and decision-making in practical engineering and science applications, particularly in digital twin contexts. In this work, we develop a model-constrained discontinuous Galerkin Network (DGNet) approach, an extension to our previous work [Model-constrained Tagent Slope Learning Approach for Dynamical Systems], for compressible Euler equations with out-of-distribution generalization. The core of DGNet is the synergy of several key strategies: (i) leveraging time integration schemes to capture temporal correlation and taking advantage of neural network speed for computation time reduction; (ii) employing a model-constrained approach to ensure the learned tangent slope satisfies governing equations; (iii) utilizing a GNN-inspired architecture where edges represent Riemann solver surrogate models and nodes represent volume integration correction surrogate models, enabling capturing discontinuity capacity, aliasing error reduction, and mesh discretization generalizability; (iv) implementing the input normalization technique that allows surrogate models to generalize across different initial conditions, boundary conditions, and solution orders; and (v) incorporating a data randomization technique that not only implicitly promotes agreement between surrogate models and true numerical models up to second-order derivatives, ensuring long-term stability and prediction capacity, but also serves as a data generation engine during training, leading to enhanced generalization on unseen data. To validate the effectiveness, stability, and generalizability of our novel DGNet approach, we present comprehensive numerical results for 1D and 2D compressible Euler equation problems.

cross Adaptive Learning of the Latent Space of Wasserstein Generative Adversarial Networks

Authors: Yixuan Qiu, Qingyi Gao, Xiao Wang

Abstract: Generative models based on latent variables, such as generative adversarial networks (GANs) and variational auto-encoders (VAEs), have gained lots of interests due to their impressive performance in many fields. However, many data such as natural images usually do not populate the ambient Euclidean space but instead reside in a lower-dimensional manifold. Thus an inappropriate choice of the latent dimension fails to uncover the structure of the data, possibly resulting in mismatch of latent representations and poor generative qualities. Towards addressing these problems, we propose a novel framework called the latent Wasserstein GAN (LWGAN) that fuses the Wasserstein auto-encoder and the Wasserstein GAN so that the intrinsic dimension of the data manifold can be adaptively learned by a modified informative latent distribution. We prove that there exist an encoder network and a generator network in such a way that the intrinsic dimension of the learned encoding distribution is equal to the dimension of the data manifold. We theoretically establish that our estimated intrinsic dimension is a consistent estimate of the true dimension of the data manifold. Meanwhile, we provide an upper bound on the generalization error of LWGAN, implying that we force the synthetic data distribution to be similar to the real data distribution from a population perspective. Comprehensive empirical experiments verify our framework and show that LWGAN is able to identify the correct intrinsic dimension under several scenarios, and simultaneously generate high-quality synthetic data by sampling from the learned latent distribution.

cross CurricuLLM: Automatic Task Curricula Design for Learning Complex Robot Skills using Large Language Models

Authors: Kanghyun Ryu, Qiayuan Liao, Zhongyu Li, Koushil Sreenath, Negar Mehr

Abstract: Curriculum learning is a training mechanism in reinforcement learning (RL) that facilitates the achievement of complex policies by progressively increasing the task difficulty during training. However, designing effective curricula for a specific task often requires extensive domain knowledge and human intervention, which limits its applicability across various domains. Our core idea is that large language models (LLMs), with their extensive training on diverse language data and ability to encapsulate world knowledge, present significant potential for efficiently breaking down tasks and decomposing skills across various robotics environments. Additionally, the demonstrated success of LLMs in translating natural language into executable code for RL agents strengthens their role in generating task curricula. In this work, we propose CurricuLLM, which leverages the high-level planning and programming capabilities of LLMs for curriculum design, thereby enhancing the efficient learning of complex target tasks. CurricuLLM consists of: (Step 1) Generating sequence of subtasks that aid target task learning in natural language form, (Step 2) Translating natural language description of subtasks in executable task code, including the reward code and goal distribution code, and (Step 3) Evaluating trained policies based on trajectory rollout and subtask description. We evaluate CurricuLLM in various robotics simulation environments, ranging from manipulation, navigation, and locomotion, to show that CurricuLLM can aid learning complex robot control tasks. In addition, we validate humanoid locomotion policy learned through CurricuLLM in real-world. The code is provided in https://github.com/labicon/CurricuLLM

URLs: https://github.com/labicon/CurricuLLM

cross Scientific Machine Learning Seismology

Authors: Tomohisa Okazaki

Abstract: Scientific machine learning (SciML) is an interdisciplinary research field that integrates machine learning, particularly deep learning, with physics theory to understand and predict complex natural phenomena. By incorporating physical knowledge, SciML reduces the dependency on observational data, which is often limited in the natural sciences. In this article, the fundamental concepts of SciML, its applications in seismology, and prospects are described. Specifically, two popular methods are mainly discussed: physics-informed neural networks (PINNs) and neural operators (NOs). PINNs can address both forward and inverse problems by incorporating governing laws into the loss functions. The use of PINNs is expanding into areas such as simultaneous solutions of differential equations, inference in underdetermined systems, and regularization based on physics. These research directions would broaden the scope of deep learning in natural sciences. NOs are models designed for operator learning, which deals with relationships between infinite-dimensional spaces. NOs show promise in modeling the time evolution of complex systems based on observational or simulation data. Since large amounts of data are often required, combining NOs with physics-informed learning holds significant potential. Finally, SciML is considered from a broader perspective beyond deep learning: statistical (or mathematical) frameworks that integrate observational data with physical principles to model natural phenomena. In seismology, mathematically rigorous Bayesian statistics has been developed over the past decades, whereas more flexible and scalable deep learning has only emerged recently. Both approaches can be considered as part of SciML in a broad sense. Theoretical and practical insights in both directions would advance SciML methodologies and thereby deepen our understanding of earthquake phenomena.

cross Robust Network Learning via Inverse Scale Variational Sparsification

Authors: Zhiling Zhou, Zirui Liu, Chengming Xu, Yanwei Fu, Xinwei Sun

Abstract: While neural networks have made significant strides in many AI tasks, they remain vulnerable to a range of noise types, including natural corruptions, adversarial noise, and low-resolution artifacts. Many existing approaches focus on enhancing robustness against specific noise types, limiting their adaptability to others. Previous studies have addressed general robustness by adopting a spectral perspective, which tends to blur crucial features like texture and object contours. Our proposed solution, however, introduces an inverse scale variational sparsification framework within a time-continuous inverse scale space formulation. This framework progressively learns finer-scale features by discerning variational differences between pixels, ultimately preserving only large-scale features in the smoothed image. Unlike frequency-based methods, our approach not only removes noise by smoothing small-scale features where corruptions often occur but also retains high-contrast details such as textures and object contours. Moreover, our framework offers simplicity and efficiency in implementation. By integrating this algorithm into neural network training, we guide the model to prioritize learning large-scale features. We show the efficacy of our approach through enhanced robustness against various noise types.

cross URIEL+: Enhancing Linguistic Inclusion and Usability in a Typological and Multilingual Knowledge Base

Authors: Aditya Khan, Mason Shipton, David Anugraha, Kaiyao Duan, Phuong H. Hoang, Eric Khiu, A. Seza Do\u{g}ru\"oz, En-Shiun Annie Lee

Abstract: URIEL is a knowledge base offering geographical, phylogenetic, and typological vector representations for 7970 languages. It includes distance measures between these vectors for 4005 languages, which are accessible via the lang2vec tool. Despite being frequently cited, URIEL is limited in terms of linguistic inclusion and overall usability. To tackle these challenges, we introduce URIEL+, an enhanced version of URIEL and lang2vec addressing these limitations. In addition to expanding typological feature coverage for 2898 languages, URIEL+ improves user experience with robust, customizable distance calculations to better suit the needs of the users. These upgrades also offer competitive performance on downstream tasks and provide distances that better align with linguistic distance studies.

cross WHOMP: Optimizing Randomized Controlled Trials via Wasserstein Homogeneity

Authors: Shizhou Xu, Thomas Strohmer

Abstract: We investigate methods for partitioning datasets into subgroups that maximize diversity within each subgroup while minimizing dissimilarity across subgroups. We introduce a novel partitioning method called the $\textit{Wasserstein Homogeneity Partition}$ (WHOMP), which optimally minimizes type I and type II errors that often result from imbalanced group splitting or partitioning, commonly referred to as accidental bias, in comparative and controlled trials. We conduct an analytical comparison of WHOMP against existing partitioning methods, such as random subsampling, covariate-adaptive randomization, rerandomization, and anti-clustering, demonstrating its advantages. Moreover, we characterize the optimal solutions to the WHOMP problem and reveal an inherent trade-off between the stability of subgroup means and variances among these solutions. Based on our theoretical insights, we design algorithms that not only obtain these optimal solutions but also equip practitioners with tools to select the desired trade-off. Finally, we validate the effectiveness of WHOMP through numerical experiments, highlighting its superiority over traditional methods.

cross Med-IC: Fusing a Single Layer Involution with Convolutions for Enhanced Medical Image Classification and Segmentation

Authors: Md. Farhadul Islam, Sarah Zabeen, Meem Arafat Manab, Mohammad Rakibul Hasan Mahin, Joyanta Jyoti Mondal, Md. Tanzim Reza, Md Zahidul Hasan, Munima Haque, Farig Sadeque, Jannatun Noor

Abstract: The majority of medical images, especially those that resemble cells, have similar characteristics. These images, which occur in a variety of shapes, often show abnormalities in the organ or cell region. The convolution operation possesses a restricted capability to extract visual patterns across several spatial regions of an image. The involution process, which is the inverse operation of convolution, complements this inherent lack of spatial information extraction present in convolutions. In this study, we investigate how applying a single layer of involution prior to a convolutional neural network (CNN) architecture can significantly improve classification and segmentation performance, with a comparatively negligible amount of weight parameters. The study additionally shows how excessive use of involution layers might result in inaccurate predictions in a particular type of medical image. According to our findings from experiments, the strategy of adding only a single involution layer before a CNN-based model outperforms most of the previous works.

cross Robustness of AI-based weather forecasts in a changing climate

Authors: Thomas Rackow, Nikolay Koldunov, Christian Lessig, Irina Sandu, Mihai Alexe, Matthew Chantry, Mariana Clare, Jesper Dramsch, Florian Pappenberger, Xabier Pedruzo-Bagazgoitia, Steffen Tietsche, Thomas Jung

Abstract: Data-driven machine learning models for weather forecasting have made transformational progress in the last 1-2 years, with state-of-the-art ones now outperforming the best physics-based models for a wide range of skill scores. Given the strong links between weather and climate modelling, this raises the question whether machine learning models could also revolutionize climate science, for example by informing mitigation and adaptation to climate change or to generate larger ensembles for more robust uncertainty estimates. Here, we show that current state-of-the-art machine learning models trained for weather forecasting in present-day climate produce skillful forecasts across different climate states corresponding to pre-industrial, present-day, and future 2.9K warmer climates. This indicates that the dynamics shaping the weather on short timescales may not differ fundamentally in a changing climate. It also demonstrates out-of-distribution generalization capabilities of the machine learning models that are a critical prerequisite for climate applications. Nonetheless, two of the models show a global-mean cold bias in the forecasts for the future warmer climate state, i.e. they drift towards the colder present-day climate they have been trained for. A similar result is obtained for the pre-industrial case where two out of three models show a warming. We discuss possible remedies for these biases and analyze their spatial distribution, revealing complex warming and cooling patterns that are partly related to missing ocean-sea ice and land surface information in the training data. Despite these current limitations, our results suggest that data-driven machine learning models will provide powerful tools for climate science and transform established approaches by complementing conventional physics-based models.

cross Experimental Evaluation of Machine Learning Models for Goal-oriented Customer Service Chatbot with Pipeline Architecture

Authors: Nurul Ain Nabilah Mohd Isa, Siti Nuraishah Agos Jawaddi, Azlan Ismail

Abstract: Integrating machine learning (ML) into customer service chatbots enhances their ability to understand and respond to user queries, ultimately improving service performance. However, they may appear artificial to some users and affecting customer experience. Hence, meticulous evaluation of ML models for each pipeline component is crucial for optimizing performance, though differences in functionalities can lead to unfair comparisons. In this paper, we present a tailored experimental evaluation approach for goal-oriented customer service chatbots with pipeline architecture, focusing on three key components: Natural Language Understanding (NLU), dialogue management (DM), and Natural Language Generation (NLG). Our methodology emphasizes individual assessment to determine optimal ML models. Specifically, we focus on optimizing hyperparameters and evaluating candidate models for NLU (utilizing BERT and LSTM), DM (employing DQN and DDQN), and NLG (leveraging GPT-2 and DialoGPT). The results show that for the NLU component, BERT excelled in intent detection whereas LSTM was superior for slot filling. For the DM component, the DDQN model outperformed DQN by achieving fewer turns, higher rewards, as well as greater success rates. For NLG, the large language model GPT-2 surpassed DialoGPT in BLEU, METEOR, and ROUGE metrics. These findings aim to provide a benchmark for future research in developing and optimizing customer service chatbots, offering valuable insights into model performance and optimal hyperparameters.

cross "Oh LLM, I'm Asking Thee, Please Give Me a Decision Tree": Zero-Shot Decision Tree Induction and Embedding with Large Language Models

Authors: Ricardo Knauer, Mario Koddenbrock, Raphael Wallsberger, Nicholas M. Brisson, Georg N. Duda, Deborah Falla, David W. Evans, Erik Rodner

Abstract: Large language models (LLMs) provide powerful means to leverage prior knowledge for predictive modeling when data is limited. In this work, we demonstrate how LLMs can use their compressed world knowledge to generate intrinsically interpretable machine learning models, i.e., decision trees, without any training data. We find that these zero-shot decision trees can surpass data-driven trees on some small-sized tabular datasets and that embeddings derived from these trees perform on par with data-driven tree-based embeddings on average. Our knowledge-driven decision tree induction and embedding approaches therefore serve as strong new baselines for data-driven machine learning methods in the low-data regime.

cross ASAG2024: A Combined Benchmark for Short Answer Grading

Authors: G\'er\^ome Meyer, Philip Breuer, Jonathan F\"urst

Abstract: Open-ended questions test a more thorough understanding than closed-ended questions and are often a preferred assessment method. However, open-ended questions are tedious to grade and subject to personal bias. Therefore, there have been efforts to speed up the grading process through automation. Short Answer Grading (SAG) systems aim to automatically score students' answers. Despite growth in SAG methods and capabilities, there exists no comprehensive short-answer grading benchmark across different subjects, grading scales, and distributions. Thus, it is hard to assess the capabilities of current automated grading methods in terms of their generalizability. In this preliminary work, we introduce the combined ASAG2024 benchmark to facilitate the comparison of automated grading systems. Combining seven commonly used short-answer grading datasets in a common structure and grading scale. For our benchmark, we evaluate a set of recent SAG methods, revealing that while LLM-based approaches reach new high scores, they still are far from reaching human performance. This opens up avenues for future research on human-machine SAG systems.

cross Unsupervised Cognition

Authors: Alfredo Ibias, Hector Antona, Guillem Ramirez-Miranda, Enric Guinovart, Eduard Alarcon

Abstract: Unsupervised learning methods have a soft inspiration in cognition models. To this day, the most successful unsupervised learning methods revolve around clustering samples in a mathematical space. In this paper we propose a state-of-the-art primitive-based unsupervised learning approach for decision-making inspired by novel cognition models. This representation-centric approach models the input space constructively as a distributed hierarchical structure in an input-agnostic way. We compared our approach with current state-of-the-art in unsupervised learning classification, and with current state-of-the-art in cancer type classification. We show how our proposal outperforms previous state-of-the-art. We also evaluate some cognition-like properties of our proposal where it not only outperforms the compared algorithms (even supervised learning ones), but it also shows a different, more cognition-like, behaviour.

cross Towards Integrating Epistemic Uncertainty Estimation into the Radiotherapy Workflow

Authors: Marvin Tom Teichmann, Manasi Datar, Lisa Kratzke, Fernando Vega, Florin C. Ghesu

Abstract: The precision of contouring target structures and organs-at-risk (OAR) in radiotherapy planning is crucial for ensuring treatment efficacy and patient safety. Recent advancements in deep learning (DL) have significantly improved OAR contouring performance, yet the reliability of these models, especially in the presence of out-of-distribution (OOD) scenarios, remains a concern in clinical settings. This application study explores the integration of epistemic uncertainty estimation within the OAR contouring workflow to enable OOD detection in clinically relevant scenarios, using specifically compiled data. Furthermore, we introduce an advanced statistical method for OOD detection to enhance the methodological framework of uncertainty estimation. Our empirical evaluation demonstrates that epistemic uncertainty estimation is effective in identifying instances where model predictions are unreliable and may require an expert review. Notably, our approach achieves an AUC-ROC of 0.95 for OOD detection, with a specificity of 0.95 and a sensitivity of 0.92 for implant cases, underscoring its efficacy. This study addresses significant gaps in the current research landscape, such as the lack of ground truth for uncertainty estimation and limited empirical evaluations. Additionally, it provides a clinically relevant application of epistemic uncertainty estimation in an FDA-approved and widely used clinical solution for OAR segmentation from Varian, a Siemens Healthineers company, highlighting its practical benefits.

cross MG-Net: Learn to Customize QAOA with Circuit Depth Awareness

Authors: Yang Qian, Xinbiao Wang, Yuxuan Du, Yong Luo, Dacheng Tao

Abstract: Quantum Approximate Optimization Algorithm (QAOA) and its variants exhibit immense potential in tackling combinatorial optimization challenges. However, their practical realization confronts a dilemma: the requisite circuit depth for satisfactory performance is problem-specific and often exceeds the maximum capability of current quantum devices. To address this dilemma, here we first analyze the convergence behavior of QAOA, uncovering the origins of this dilemma and elucidating the intricate relationship between the employed mixer Hamiltonian, the specific problem at hand, and the permissible maximum circuit depth. Harnessing this understanding, we introduce the Mixer Generator Network (MG-Net), a unified deep learning framework adept at dynamically formulating optimal mixer Hamiltonians tailored to distinct tasks and circuit depths. Systematic simulations, encompassing Ising models and weighted Max-Cut instances with up to 64 qubits, substantiate our theoretical findings, highlighting MG-Net's superior performance in terms of both approximation ratio and efficiency.

cross Enhancing Spectrum Efficiency in 6G Satellite Networks: A GAIL-Powered Policy Learning via Asynchronous Federated Inverse Reinforcement Learning

Authors: Sheikh Salman Hassan, Yu Min Park, Yan Kyaw Tun, Walid Saad, Zhu Han, Choong Seon Hong

Abstract: In this paper, a novel generative adversarial imitation learning (GAIL)-powered policy learning approach is proposed for optimizing beamforming, spectrum allocation, and remote user equipment (RUE) association in NTNs. Traditional reinforcement learning (RL) methods for wireless network optimization often rely on manually designed reward functions, which can require extensive parameter tuning. To overcome these limitations, we employ inverse RL (IRL), specifically leveraging the GAIL framework, to automatically learn reward functions without manual design. We augment this framework with an asynchronous federated learning approach, enabling decentralized multi-satellite systems to collaboratively derive optimal policies. The proposed method aims to maximize spectrum efficiency (SE) while meeting minimum information rate requirements for RUEs. To address the non-convex, NP-hard nature of this problem, we combine the many-to-one matching theory with a multi-agent asynchronous federated IRL (MA-AFIRL) framework. This allows agents to learn through asynchronous environmental interactions, improving training efficiency and scalability. The expert policy is generated using the Whale optimization algorithm (WOA), providing data to train the automatic reward function within GAIL. Simulation results show that the proposed MA-AFIRL method outperforms traditional RL approaches, achieving a $14.6\%$ improvement in convergence and reward value. The novel GAIL-driven policy learning establishes a novel benchmark for 6G NTN optimization.

cross Scalable Cross-Entropy Loss for Sequential Recommendations with Large Item Catalogs

Authors: Gleb Mezentsev, Danil Gusak, Ivan Oseledets, Evgeny Frolov

Abstract: Scalability issue plays a crucial role in productionizing modern recommender systems. Even lightweight architectures may suffer from high computational overload due to intermediate calculations, limiting their practicality in real-world applications. Specifically, applying full Cross-Entropy (CE) loss often yields state-of-the-art performance in terms of recommendations quality. Still, it suffers from excessive GPU memory utilization when dealing with large item catalogs. This paper introduces a novel Scalable Cross-Entropy (SCE) loss function in the sequential learning setup. It approximates the CE loss for datasets with large-size catalogs, enhancing both time efficiency and memory usage without compromising recommendations quality. Unlike traditional negative sampling methods, our approach utilizes a selective GPU-efficient computation strategy, focusing on the most informative elements of the catalog, particularly those most likely to be false positives. This is achieved by approximating the softmax distribution over a subset of the model outputs through the maximum inner product search. Experimental results on multiple datasets demonstrate the effectiveness of SCE in reducing peak memory usage by a factor of up to 100 compared to the alternatives, retaining or even exceeding their metrics values. The proposed approach also opens new perspectives for large-scale developments in different domains, such as large language models.

cross Autoregressive Policy Optimization for Constrained Allocation Tasks

Authors: David Winkel, Niklas Strau{\ss}, Maximilian Bernhard, Zongyue Li, Thomas Seidl, Matthias Schubert

Abstract: Allocation tasks represent a class of problems where a limited amount of resources must be allocated to a set of entities at each time step. Prominent examples of this task include portfolio optimization or distributing computational workloads across servers. Allocation tasks are typically bound by linear constraints describing practical requirements that have to be strictly fulfilled at all times. In portfolio optimization, for example, investors may be obligated to allocate less than 30\% of the funds into a certain industrial sector in any investment period. Such constraints restrict the action space of allowed allocations in intricate ways, which makes learning a policy that avoids constraint violations difficult. In this paper, we propose a new method for constrained allocation tasks based on an autoregressive process to sequentially sample allocations for each entity. In addition, we introduce a novel de-biasing mechanism to counter the initial bias caused by sequential sampling. We demonstrate the superior performance of our approach compared to a variety of Constrained Reinforcement Learning (CRL) methods on three distinct constrained allocation tasks: portfolio optimization, computational workload distribution, and a synthetic allocation benchmark. Our code is available at: https://github.com/niklasdbs/paspo

URLs: https://github.com/niklasdbs/paspo

cross MemFusionMap: Working Memory Fusion for Online Vectorized HD Map Construction

Authors: Jingyu Song, Xudong Chen, Liupei Lu, Jie Li, Katherine A. Skinner

Abstract: High-definition (HD) maps provide environmental information for autonomous driving systems and are essential for safe planning. While existing methods with single-frame input achieve impressive performance for online vectorized HD map construction, they still struggle with complex scenarios and occlusions. We propose MemFusionMap, a novel temporal fusion model with enhanced temporal reasoning capabilities for online HD map construction. Specifically, we contribute a working memory fusion module that improves the model's memory capacity to reason across history frames. We also design a novel temporal overlap heatmap to explicitly inform the model about the temporal overlap information and vehicle trajectory in the Bird's Eye View space. By integrating these two designs, MemFusionMap significantly outperforms existing methods while also maintaining a versatile design for scalability. We conduct extensive evaluation on open-source benchmarks and demonstrate a maximum improvement of 5.4% in mAP over state-of-the-art methods. The code for MemFusionMap will be made open-source upon publication of this paper.

cross Geometric deep learning for galaxy-halo connection: a case study for galaxy intrinsic alignments

Authors: Yesukhei Jagvaral, Francois Lanusse, Rachel Mandelbaum

Abstract: Forthcoming cosmological imaging surveys, such as the Rubin Observatory LSST, require large-scale simulations encompassing realistic galaxy populations for a variety of scientific applications. Of particular concern is the phenomenon of intrinsic alignments (IA), whereby galaxies orient themselves towards overdensities, potentially introducing significant systematic biases in weak gravitational lensing analyses if they are not properly modeled. Due to computational constraints, simulating the intricate details of galaxy formation and evolution relevant to IA across vast volumes is impractical. As an alternative, we propose a Deep Generative Model trained on the IllustrisTNG-100 simulation to sample 3D galaxy shapes and orientations to accurately reproduce intrinsic alignments along with correlated scalar features. We model the cosmic web as a set of graphs, each graph representing a halo with nodes representing the subhalos/galaxies. The architecture consists of a SO(3) $\times$ $\mathbb{R}^n$ diffusion generative model, for galaxy orientations and $n$ scalars, implemented with E(3) equivariant Graph Neural Networks that explicitly respect the Euclidean symmetries of our Universe. The model is able to learn and predict features such as galaxy orientations that are statistically consistent with the reference simulation. Notably, our model demonstrates the ability to jointly model Euclidean-valued scalars (galaxy sizes, shapes, and colors) along with non-Euclidean valued SO(3) quantities (galaxy orientations) that are governed by highly complex galactic physics at non-linear scales.

cross Learning from Demonstration with Implicit Nonlinear Dynamics Models

Authors: Peter David Fagan, Subramanian Ramamoorthy

Abstract: Learning from Demonstration (LfD) is a useful paradigm for training policies that solve tasks involving complex motions. In practice, the successful application of LfD requires overcoming error accumulation during policy execution, i.e. the problem of drift due to errors compounding over time and the consequent out-of-distribution behaviours. Existing works seek to address this problem through scaling data collection, correcting policy errors with a human-in-the-loop, temporally ensembling policy predictions or through learning the parameters of a dynamical system model. In this work, we propose and validate an alternative approach to overcoming this issue. Inspired by reservoir computing, we develop a novel neural network layer that includes a fixed nonlinear dynamical system with tunable dynamical properties. We validate the efficacy of our neural network layer on the task of reproducing human handwriting motions using the LASA Human Handwriting Dataset. Through empirical experiments we demonstrate that incorporating our layer into existing neural network architectures addresses the issue of compounding errors in LfD. Furthermore, we perform a comparative evaluation against existing approaches including a temporal ensemble of policy predictions and an Echo State Networks (ESNs) implementation. We find that our approach yields greater policy precision and robustness on the handwriting task while also generalising to multiple dynamics regimes and maintaining competitive latency scores.

cross A method of using RSVD in residual calculation of LowBit GEMM

Authors: Hongyaoxing Gu

Abstract: The advancements of hardware technology in recent years has brought many possibilities for low-precision applications. However, the use of low precision can introduce significant computational errors, posing a considerable challenge to maintaining the computational accuracy. We propose low-rank residuals quantized matrix multiplication(LRQMM) method which introduces low-rank approximation in residual compensation for dense low precision quantization matrix multiplication. It can bring several times accuracy improvement with only BLAS-2 level extra time overhead. Moreover, LRQMM is a completely data-free quantization method that does not require additional data for pre-training. And it only works with low precision GEMM operator, which is easy to couple with other methods. Through experimentation, LRQMM can reduce the error of direct quantized matrix multiplication by 1~2 orders of magnitude, when dealing with larger matrix sizes, the computational speed is only reduced by approximately 20\%. In deep learning networks, LRQMM-4bit achieves 61.8% ImageNet Top-1 accuracy in Resnet-50, while the Direct Quant accuracy is only 8.3%.

cross Supervised Learning Model for Key Frame Identification from Cow Teat Videos

Authors: Minghao Wang, Pinxue Lin

Abstract: This paper proposes a method for improving the accuracy of mastitis risk assessment in cows using neural networks and video analysis. Mastitis, an infection of the udder tissue, is a critical health problem for cows and can be detected by examining the cow's teat. Traditionally, veterinarians assess the health of a cow's teat during the milking process, but this process is limited in time and can weaken the accuracy of the assessment. In commercial farms, cows are recorded by cameras when they are milked in the milking parlor. This paper uses a neural network to identify key frames in the recorded video where the cow's udder appears intact. These key frames allow veterinarians to have more flexible time to perform health assessments on the teat, increasing their efficiency and accuracy. However, there are challenges in using cow teat video for mastitis risk assessment, such as complex environments, changing cow positions and postures, and difficulty in identifying the udder from the video. To address these challenges, a fusion distance and an ensemble model are proposed to improve the performance (F-score) of identifying key frames from cow teat videos. The results show that these two approaches improve performance compared to using a single distance measure or model.

cross Esports Debut as a Medal Event at 2023 Asian Games: Exploring Public Perceptions with BERTopic and GPT-4 Topic Fine-Tuning

Authors: Tyreal Yizhou Qian, Bo Yu, Weizhe Li, Chenglong Xu

Abstract: This study examined the public opinions of esports at the 2023 Asian Games and value co-creation during the event using an LLM-enhanced BERTopic modeling analysis. We identified five major themes representing public perceptions, as well as how major stakeholders co-created value within and beyond the esports ecosystem. Key findings highlighted the strategic use of social media marketing to influence public opinion and promote esports events and brands, emphasizing the importance of event logistics and infrastructure. Additionally, the study revealed the co-creation value contributed by stakeholders outside the traditional esports ecosystem, particularly in promoting national representation and performance. Our findings supported the ongoing efforts to legitimize esports as a sport, noting that mainstream recognition remains a challenge. The inclusion of esports as a medal event showcased broader acceptance and helped mitigate negative public perceptions. Moreover, contributions from non-traditional stakeholders underscored the value of cross-subcultural collaborations in esports.

cross Convergence of Diffusion Models Under the Manifold Hypothesis in High-Dimensions

Authors: Iskander Azangulov, George Deligiannidis, Judith Rousseau

Abstract: Denoising Diffusion Probabilistic Models (DDPM) are powerful state-of-the-art methods used to generate synthetic data from high-dimensional data distributions and are widely used for image, audio and video generation as well as many more applications in science and beyond. The manifold hypothesis states that high-dimensional data often lie on lower-dimensional manifolds within the ambient space, and is widely believed to hold in provided examples. While recent results has provided invaluable insight into how diffusion models adapt to the manifold hypothesis, they do not capture the great empirical success of these models, making this a very fruitful research direction. In this work, we study DDPMs under the manifold hypothesis and prove that they achieve rates independent of the ambient dimension in terms of learning the score. In terms of sampling, we obtain rates independent of the ambient dimension w.r.t. the Kullback-Leibler divergence, and $O(\sqrt{D})$ w.r.t. the Wasserstein distance. We do this by developing a new framework connecting diffusion models to the well-studied theory of extrema of Gaussian Processes.

cross Early diagnosis of Alzheimer's disease from MRI images with deep learning model

Authors: Sajjad Aghasi Javid, Mahmood Mohassel Feghhi

Abstract: It is acknowledged that the most common cause of dementia worldwide is Alzheimer's disease (AD). This condition progresses in severity from mild to severe and interferes with people's everyday routines. Early diagnosis plays a critical role in patient care and clinical trials. Convolutional neural networks (CNN) are used to create a framework for identifying specific disease features from MRI scans Classification of dementia involves approaches such as medical history review, neuropsychological tests, and magnetic resonance imaging (MRI). However, the image dataset obtained from Kaggle faces a significant issue of class imbalance, which requires equal distribution of samples from each class to address. In this article, to address this imbalance, the Synthetic Minority Oversampling Technique (SMOTE) is utilized. Furthermore, a pre-trained convolutional neural network has been applied to the DEMNET dementia network to extract key features from AD images. The proposed model achieved an impressive accuracy of 98.67%.

cross Classification and regression of trajectories rendered as images via 2D Convolutional Neural Networks

Authors: Mariaclaudia Nicolai, Raffaella Fiamma Cabini, Diego Ulisse Pizzagalli

Abstract: Trajectories can be regarded as time-series of coordinates, typically arising from motile objects. Methods for trajectory classification are particularly important to detect different movement patterns, while methods for regression to compute motility metrics and forecasting. Recent advances in computer vision have facilitated the processing of trajectories rendered as images via artificial neural networks with 2d convolutional layers (CNNs). This approach leverages the capability of CNNs to learn spatial hierarchies of features from images, necessary to recognize complex shapes. Moreover, it overcomes the limitation of other machine learning methods that require input trajectories with a fixed number of points. However, rendering trajectories as images can introduce poorly investigated artifacts such as information loss due to the plotting of coordinates on a discrete grid, and spectral changes due to line thickness and aliasing. In this study, we investigate the effectiveness of CNNs for solving classification and regression problems from synthetic trajectories that have been rendered as images using different modalities. The parameters considered in this study include line thickness, image resolution, usage of motion history (color-coding of the temporal component) and anti-aliasing. Results highlight the importance of choosing an appropriate image resolution according to model depth and motion history in applications where movement direction is critical.

cross Constructing Confidence Intervals for 'the' Generalization Error -- a Comprehensive Benchmark Study

Authors: Hannah Schulz-K\"umpel, Sebastian Fischer, Thomas Nagler, Anne-Laure Boulesteix, Bernd Bischl, Roman Hornung

Abstract: When assessing the quality of prediction models in machine learning, confidence intervals (CIs) for the generalization error, which measures predictive performance, are a crucial tool. Luckily, there exist many methods for computing such CIs and new promising approaches are continuously being proposed. Typically, these methods combine various resampling procedures, most popular among them cross-validation and bootstrapping, with different variance estimation techniques. Unfortunately, however, there is currently no consensus on when any of these combinations may be most reliably employed and how they generally compare. In this work, we conduct the first large-scale study comparing CIs for the generalization error - empirically evaluating 13 different methods on a total of 18 tabular regression and classification problems, using four different inducers and a total of eight loss functions. We give an overview of the methodological foundations and inherent challenges of constructing CIs for the generalization error and provide a concise review of all 13 methods in a unified framework. Finally, the CI methods are evaluated in terms of their relative coverage frequency, width, and runtime. Based on these findings, we are able to identify a subset of methods that we would recommend. We also publish the datasets as a benchmarking suite on OpenML and our code on GitHub to serve as a basis for further studies.

cross Classical Statistical (In-Sample) Intuitions Don't Generalize Well: A Note on Bias-Variance Tradeoffs, Overfitting and Moving from Fixed to Random Designs

Authors: Alicia Curth

Abstract: The sudden appearance of modern machine learning (ML) phenomena like double descent and benign overfitting may leave many classically trained statisticians feeling uneasy -- these phenomena appear to go against the very core of statistical intuitions conveyed in any introductory class on learning from data. The historical lack of earlier observation of such phenomena is usually attributed to today's reliance on more complex ML methods, overparameterization, interpolation and/or higher data dimensionality. In this note, we show that there is another reason why we observe behaviors today that appear at odds with intuitions taught in classical statistics textbooks, which is much simpler to understand yet rarely discussed explicitly. In particular, many intuitions originate in fixed design settings, in which in-sample prediction error (under resampling of noisy outcomes) is of interest, while modern ML evaluates its predictions in terms of generalization error, i.e. out-of-sample prediction error in random designs. Here, we highlight that this simple move from fixed to random designs has (perhaps surprisingly) far-reaching consequences on textbook intuitions relating to the bias-variance tradeoff, and comment on the resulting (im)possibility of observing double descent and benign overfitting in fixed versus random designs.

cross Positional Encoder Graph Quantile Neural Networks for Geographic Data

Authors: William E. R. de Amorim, Scott A. Sisson, T. Rodrigues, David J. Nott, Guilherme S. Rodrigues

Abstract: Positional Encoder Graph Neural Networks (PE-GNNs) are a leading approach for modeling continuous spatial data. However, they often fail to produce calibrated predictive distributions, limiting their effectiveness for uncertainty quantification. We introduce the Positional Encoder Graph Quantile Neural Network (PE-GQNN), a novel method that integrates PE-GNNs, Quantile Neural Networks, and recalibration techniques in a fully nonparametric framework, requiring minimal assumptions about the predictive distributions. We propose a new network architecture that, when combined with a quantile-based loss function, yields accurate and reliable probabilistic models without increasing computational complexity. Our approach provides a flexible, robust framework for conditional density estimation, applicable beyond spatial data contexts. We further introduce a structured method for incorporating a KNN predictor into the model while avoiding data leakage through the GNN layer operation. Experiments on benchmark datasets demonstrate that PE-GQNN significantly outperforms existing state-of-the-art methods in both predictive accuracy and uncertainty quantification.

cross Individuation in Neural Models with and without Visual Grounding

Authors: Alexey Tikhonov, Lisa Bylinina, Ivan P. Yamshchikov

Abstract: We show differences between a language-and-vision model CLIP and two text-only models - FastText and SBERT - when it comes to the encoding of individuation information. We study latent representations that CLIP provides for substrates, granular aggregates, and various numbers of objects. We demonstrate that CLIP embeddings capture quantitative differences in individuation better than models trained on text-only data. Moreover, the individuation hierarchy we deduce from the CLIP embeddings agrees with the hierarchies proposed in linguistics and cognitive science.

cross Simulating Dynamic Tumor Contrast Enhancement in Breast MRI using Conditional Generative Adversarial Networks

Authors: Richard Osuala, Smriti Joshi, Apostolia Tsirikoglou, Lidia Garrucho, Walter H. L. Pinaya, Daniel M. Lang, Julia A. Schnabel, Oliver Diaz, Karim Lekadir

Abstract: This paper presents a method for virtual contrast enhancement in breast MRI, offering a promising non-invasive alternative to traditional contrast agent-based DCE-MRI acquisition. Using a conditional generative adversarial network, we predict DCE-MRI images, including jointly-generated sequences of multiple corresponding DCE-MRI timepoints, from non-contrast-enhanced MRIs, enabling tumor localization and characterization without the associated health risks. Furthermore, we qualitatively and quantitatively evaluate the synthetic DCE-MRI images, proposing a multi-metric Scaled Aggregate Measure (SAMe), assessing their utility in a tumor segmentation downstream task, and conclude with an analysis of the temporal patterns in multi-sequence DCE-MRI generation. Our approach demonstrates promising results in generating realistic and useful DCE-MRI sequences, highlighting the potential of virtual contrast enhancement for improving breast cancer diagnosis and treatment, particularly for patients where contrast agent administration is contraindicated.

cross Probabilistic Analysis of Least Squares, Orthogonal Projection, and QR Factorization Algorithms Subject to Gaussian Noise

Authors: Ali Lotfi, Julien Langou, Mohammad Meysami

Abstract: In this paper, we extend the work of Liesen et al. (2002), which analyzes how the condition number of an orthonormal matrix Q changes when a column is added ([Q, c]), particularly focusing on the perpendicularity of c to the span of Q. Their result, presented in Theorem 2.3 of Liesen et al. (2002), assumes exact arithmetic and orthonormality of Q, which is a strong assumption when applying these results to numerical methods such as QR factorization algorithms. In our work, we address this gap by deriving bounds on the condition number increase for a matrix B without assuming perfect orthonormality, even when a column is not perfectly orthogonal to the span of B. This framework allows us to analyze QR factorization methods where orthogonalization is imperfect and subject to Gaussian noise. We also provide results on the performance of orthogonal projection and least squares under Gaussian noise, further supporting the development of this theory.

cross Unconditional stability of a recurrent neural circuit implementing divisive normalization

Authors: Shivang Rawat, David J. Heeger, Stefano Martiniani

Abstract: Stability in recurrent neural models poses a significant challenge, particularly in developing biologically plausible neurodynamical models that can be seamlessly trained. Traditional cortical circuit models are notoriously difficult to train due to expansive nonlinearities in the dynamical system, leading to an optimization problem with nonlinear stability constraints that are difficult to impose. Conversely, recurrent neural networks (RNNs) excel in tasks involving sequential data but lack biological plausibility and interpretability. In this work, we address these challenges by linking dynamic divisive normalization (DN) to the stability of ORGaNICs, a biologically plausible recurrent cortical circuit model that dynamically achieves DN and has been shown to simulate a wide range of neurophysiological phenomena. By using the indirect method of Lyapunov, we prove the remarkable property of unconditional local stability for an arbitrary-dimensional ORGaNICs circuit when the recurrent weight matrix is the identity. We thus connect ORGaNICs to a system of coupled damped harmonic oscillators, which enables us to derive the circuit's energy function, providing a normative principle of what the circuit, and individual neurons, aim to accomplish. Further, for a generic recurrent weight matrix, we prove the stability of the 2D model and demonstrate empirically that stability holds in higher dimensions. Finally, we show that ORGaNICs can be trained by backpropagation through time without gradient clipping/scaling, thanks to its intrinsic stability property and adaptive time constants, which address the problems of exploding, vanishing, and oscillating gradients. By evaluating the model's performance on RNN benchmarks, we find that ORGaNICs outperform alternative neurodynamical models on static image classification tasks and perform comparably to LSTMs on sequential tasks.

cross Spectral Wavelet Dropout: Regularization in the Wavelet Domain

Authors: Rinor Cakaj, Jens Mehnert, Bin Yang

Abstract: Regularization techniques help prevent overfitting and therefore improve the ability of convolutional neural networks (CNNs) to generalize. One reason for overfitting is the complex co-adaptations among different parts of the network, which make the CNN dependent on their joint response rather than encouraging each part to learn a useful feature representation independently. Frequency domain manipulation is a powerful strategy for modifying data that has temporal and spatial coherence by utilizing frequency decomposition. This work introduces Spectral Wavelet Dropout (SWD), a novel regularization method that includes two variants: 1D-SWD and 2D-SWD. These variants improve CNN generalization by randomly dropping detailed frequency bands in the discrete wavelet decomposition of feature maps. Our approach distinguishes itself from the pre-existing Spectral "Fourier" Dropout (2D-SFD), which eliminates coefficients in the Fourier domain. Notably, SWD requires only a single hyperparameter, unlike the two required by SFD. We also extend the literature by implementing a one-dimensional version of Spectral "Fourier" Dropout (1D-SFD), setting the stage for a comprehensive comparison. Our evaluation shows that both 1D and 2D SWD variants have competitive performance on CIFAR-10/100 benchmarks relative to both 1D-SFD and 2D-SFD. Specifically, 1D-SWD has a significantly lower computational complexity compared to 1D/2D-SFD. In the Pascal VOC Object Detection benchmark, SWD variants surpass 1D-SFD and 2D-SFD in performance and demonstrate lower computational complexity during training.

cross RepairBench: Leaderboard of Frontier Models for Program Repair

Authors: Andr\'e Silva, Martin Monperrus

Abstract: AI-driven program repair uses AI models to repair buggy software by producing patches. Rapid advancements in AI surely impact state-of-the-art performance of program repair. Yet, grasping this progress requires frequent and standardized evaluations. We propose RepairBench, a novel leaderboard for AI-driven program repair. The key characteristics of RepairBench are: 1) it is execution-based: all patches are compiled and executed against a test suite, 2) it assesses frontier models in a frequent and standardized way. RepairBench leverages two high-quality benchmarks, Defects4J and GitBug-Java, to evaluate frontier models against real-world program repair tasks. We publicly release the evaluation framework of RepairBench. We will update the leaderboard as new frontier models are released.

cross LML: Language Model Learning a Dataset for Data-Augmented Prediction

Authors: Praneeth Vadlapati

Abstract: This paper introduces a new approach to using Large Language Models (LLMs) for classification tasks, which are typically handled using Machine Learning (ML) models. Unlike ML models that rely heavily on data cleaning and feature engineering, this method streamlines the process using LLMs. This paper proposes a new concept called "Language Model Learning (LML)" powered by a new method called "Data-Augmented Prediction (DAP)". The classification is performed by LLMs using a method similar to humans manually exploring and understanding the data and deciding classifications using data as a reference. Training data is summarized and evaluated to determine the features that lead to the classification of each label the most. In the process of DAP, the system uses the data summary to automatically create a query, which is used to retrieve relevant rows from the dataset. A classification is generated by the LLM using data summary and relevant rows, ensuring satisfactory accuracy even with complex data. Usage of data summary and similar data in DAP ensures context-aware decision-making. The proposed method uses the words "Act as an Explainable Machine Learning Model" in the prompt to enhance the interpretability of the predictions by allowing users to review the logic behind each prediction. In some test cases, the system scored an accuracy above 90%, proving the effectiveness of the system and its potential to outperform conventional ML models in various scenarios. The code is available at https://github.com/Pro-GenAI/LML-DAP

URLs: https://github.com/Pro-GenAI/LML-DAP

cross Exploring Token Pruning in Vision State Space Models

Authors: Zheng Zhan, Zhenglun Kong, Yifan Gong, Yushu Wu, Zichong Meng, Hangyu Zheng, Xuan Shen, Stratis Ioannidis, Wei Niu, Pu Zhao, Yanzhi Wang

Abstract: State Space Models (SSMs) have the advantage of keeping linear computational complexity compared to attention modules in transformers, and have been applied to vision tasks as a new type of powerful vision foundation model. Inspired by the observations that the final prediction in vision transformers (ViTs) is only based on a subset of most informative tokens, we take the novel step of enhancing the efficiency of SSM-based vision models through token-based pruning. However, direct applications of existing token pruning techniques designed for ViTs fail to deliver good performance, even with extensive fine-tuning. To address this issue, we revisit the unique computational characteristics of SSMs and discover that naive application disrupts the sequential token positions. This insight motivates us to design a novel and general token pruning method specifically for SSM-based vision models. We first introduce a pruning-aware hidden state alignment method to stabilize the neighborhood of remaining tokens for performance enhancement. Besides, based on our detailed analysis, we propose a token importance evaluation method adapted for SSM models, to guide the token pruning. With efficient implementation and practical acceleration methods, our method brings actual speedup. Extensive experiments demonstrate that our approach can achieve significant computation reduction with minimal impact on performance across different tasks. Notably, we achieve 81.7\% accuracy on ImageNet with a 41.6\% reduction in the FLOPs for pruned PlainMamba-L3. Furthermore, our work provides deeper insights into understanding the behavior of SSM-based vision models for future research.

cross PhysGen: Rigid-Body Physics-Grounded Image-to-Video Generation

Authors: Shaowei Liu, Zhongzheng Ren, Saurabh Gupta, Shenlong Wang

Abstract: We present PhysGen, a novel image-to-video generation method that converts a single image and an input condition (e.g., force and torque applied to an object in the image) to produce a realistic, physically plausible, and temporally consistent video. Our key insight is to integrate model-based physical simulation with a data-driven video generation process, enabling plausible image-space dynamics. At the heart of our system are three core components: (i) an image understanding module that effectively captures the geometry, materials, and physical parameters of the image; (ii) an image-space dynamics simulation model that utilizes rigid-body physics and inferred parameters to simulate realistic behaviors; and (iii) an image-based rendering and refinement module that leverages generative video diffusion to produce realistic video footage featuring the simulated motion. The resulting videos are realistic in both physics and appearance and are even precisely controllable, showcasing superior results over existing data-driven image-to-video generation works through quantitative comparison and comprehensive user study. PhysGen's resulting videos can be used for various downstream applications, such as turning an image into a realistic animation or allowing users to interact with the image and create various dynamics. Project page: https://stevenlsw.github.io/physgen/

URLs: https://stevenlsw.github.io/physgen/

replace The Sandbox Environment for Generalizable Agent Research (SEGAR)

Authors: R Devon Hjelm, Bogdan Mazoure, Florian Golemo, Samira Ebrahimi Kahou, Pedro Braga, Felipe Frujeri, Mihai Jalobeanu, Andrey Kolobov

Abstract: A broad challenge of research on generalization for sequential decision-making tasks in interactive environments is designing benchmarks that clearly landmark progress. While there has been notable headway, current benchmarks either do not provide suitable exposure nor intuitive control of the underlying factors, are not easy-to-implement, customizable, or extensible, or are computationally expensive to run. We built the Sandbox Environment for Generalizable Agent Research (SEGAR) with all of these things in mind. SEGAR improves the ease and accountability of generalization research in RL, as generalization objectives can be easy designed by specifying task distributions, which in turns allows the researcher to measure the nature of the generalization objective. We present an overview of SEGAR and how it contributes to these goals, as well as experiments that demonstrate a few types of research questions SEGAR can help answer.

replace Policy learning "without" overlap: Pessimism and generalized empirical Bernstein's inequality

Authors: Ying Jin, Zhimei Ren, Zhuoran Yang, Zhaoran Wang

Abstract: This paper studies offline policy learning, which aims at utilizing observations collected a priori (from either fixed or adaptively evolving behavior policies) to learn an optimal individualized decision rule that achieves the best overall outcomes for a given population. Existing policy learning methods rely on a uniform overlap assumption, i.e., the propensities of exploring all actions for all individual characteristics must be lower bounded. As one has no control over the data collection process, this assumption can be unrealistic in many situations, especially when the behavior policies are allowed to evolve over time with diminishing propensities for certain actions. In this paper, we propose Pessimistic Policy Learning (PPL), a new algorithm that optimizes lower confidence bounds (LCBs) -- instead of point estimates -- of the policy values. The LCBs are constructed using knowledge of the behavior policies for collecting the offline data. Without assuming any uniform overlap condition, we establish a data-dependent upper bound for the suboptimality of our algorithm, which only depends on (i) the overlap for the optimal policy, and (ii) the complexity of the policy class we optimize over. As an implication, for adaptively collected data, we ensure efficient policy learning as long as the propensities for optimal actions are lower bounded over time, while those for suboptimal ones are allowed to diminish arbitrarily fast. In our theoretical analysis, we develop a new self-normalized type concentration inequality for inverse-propensity-weighting estimators, generalizing the well-known empirical Bernstein's inequality to unbounded and non-i.i.d. data. We complement our theory with an efficient optimization algorithm via Majorization-Minimization and policy tree search, as well as extensive simulation studies and real-world applications that demonstrate the efficacy of PPL.

replace The Role of Masking for Efficient Supervised Knowledge Distillation of Vision Transformers

Authors: Seungwoo Son, Jegwang Ryu, Namhoon Lee, Jaeho Lee

Abstract: Knowledge distillation is an effective method for training lightweight vision models. However, acquiring teacher supervision for training samples is often costly, especially from large-scale models like vision transformers (ViTs). In this paper, we develop a simple framework to reduce the supervision cost of ViT distillation: masking out a fraction of input tokens given to the teacher. By masking input tokens, one can skip the computations associated with the masked tokens without requiring any change to teacher parameters or architecture. We find that masking patches with the lowest student attention scores is highly effective, saving up to 50% of teacher FLOPs without any drop in student accuracy, while other masking criterion leads to suboptimal efficiency gains. Through in-depth analyses, we reveal that the student-guided masking provides a good curriculum to the student, making teacher supervision easier to follow during the early stage and challenging in the later stage.

replace Contaminated Multivariate Time-Series Anomaly Detection with Spatio-Temporal Graph Conditional Diffusion Models

Authors: Thi Kieu Khanh Ho, Narges Armanfard

Abstract: Mainstream unsupervised anomaly detection algorithms often excel in academic datasets, yet their real-world performance is restricted due to the controlled experimental conditions involving clean training data. Addressing the challenge of training with noise, a prevalent issue in practical anomaly detection, is frequently overlooked. In a pioneering endeavor, this study delves into the realm of label-level noise within sensory time-series anomaly detection (TSAD). This paper presents a novel and practical end-to-end unsupervised TSAD when the training data is contaminated with anomalies. The introduced approach, called TSAD-C, is devoid of access to abnormality labels during the training phase. TSAD-C encompasses three core modules: a Decontaminator to rectify anomalies (aka noise) present during training, a Long-range Variable Dependency Modeling module to capture long-term intra- and inter-variable dependencies within the decontaminated data that is considered as a surrogate of the pure normal data, and an Anomaly Scoring module to detect anomalies from all types. Our extensive experiments conducted on four reliable and diverse datasets conclusively demonstrate that TSAD-C surpasses existing methodologies, thus establishing a new state-of-the-art in the TSAD field.

replace Cluster Exploration using Informative Manifold Projections

Authors: Stavros Gerolymatos, Xenophon Evangelopoulos, Vladimir Gusev, John Y. Goulermas

Abstract: Dimensionality reduction (DR) is one of the key tools for the visual exploration of high-dimensional data and uncovering its cluster structure in two- or three-dimensional spaces. The vast majority of DR methods in the literature do not take into account any prior knowledge a practitioner may have regarding the dataset under consideration. We propose a novel method to generate informative embeddings which not only factor out the structure associated with different kinds of prior knowledge but also aim to reveal any remaining underlying structure. To achieve this, we employ a linear combination of two objectives: firstly, contrastive PCA that discounts the structure associated with the prior information, and secondly, kurtosis projection pursuit which ensures meaningful data separation in the obtained embeddings. We formulate this task as a manifold optimization problem and validate it empirically across a variety of datasets considering three distinct types of prior knowledge. Lastly, we provide an automated framework to perform iterative visual exploration of high-dimensional data.

replace Can-SAVE: Mass Cancer Risk Prediction via Survival Analysis Variables and EHR

Authors: Petr Philonenko, Vladimir Kokh, Pavel Blinov

Abstract: Specific medical cancer screening methods are often costly, time-consuming, and weakly applicable on a large scale. Advanced Artificial Intelligence (AI) methods greatly help cancer detection but require specific or deep medical data. These aspects prevent the mass implementation of cancer screening methods. For this reason, it is a disruptive change for healthcare to apply AI methods for mass personalized assessment of the cancer risk among patients based on the existing Electronic Health Records (EHR) volume. This paper presents a novel Can-SAVE cancer risk assessment method combining a survival analysis approach with a gradient-boosting algorithm. It is highly accessible and resource-efficient, utilizing only a sequence of high-level medical events. We tested the proposed method in a long-term retrospective experiment covering more than 1.1 million people and four regions of Russia. The Can-SAVE method significantly exceeds the baselines by the Average Precision metric of 22.8%$\pm$2.7% vs 15.1%$\pm$2.6%. The extensive ablation study also confirmed the proposed method's dominant performance. The experiment supervised by oncologists shows a reliable cancer patient detection rate of up to 84 out of 1000 selected. Such results surpass the medical screening strategies estimates; the typical age-specific Number Needed to Screen is only 9 out of 1000 (for colorectal cancer). Overall, our experiments show a 4.7-6.4 times improvement in cancer detection rate (TOP@1k) compared to the traditional healthcare risk estimation approach.

replace Learning Hierarchical Relational Representations through Relational Convolutions

Authors: Awni Altabaa, John Lafferty

Abstract: An evolving area of research in deep learning is the study of architectures and inductive biases that support the learning of relational feature representations. In this paper, we address the challenge of learning representations of hierarchical relations--that is, higher-order relational patterns among groups of objects. We introduce "relational convolutional networks", a neural architecture equipped with computational mechanisms that capture progressively more complex relational features through the composition of simple modules. A key component of this framework is a novel operation that captures relational patterns in groups of objects by convolving graphlet filters--learnable templates of relational patterns--against subsets of the input. Composing relational convolutions gives rise to a deep architecture that learns representations of higher-order, hierarchical relations. We present the motivation and details of the architecture, together with a set of experiments to demonstrate how relational convolutional networks can provide an effective framework for modeling relational tasks that have hierarchical structure.

replace Deep Reinforcement Learning for Autonomous Cyber Defence: A Survey

Authors: Gregory Palmer, Chris Parry, Daniel J. B. Harrold, Chris Willis

Abstract: The rapid increase in the number of cyber-attacks in recent years raises the need for principled methods for defending networks against malicious actors. Deep reinforcement learning (DRL) has emerged as a promising approach for mitigating these attacks. However, while DRL has shown much potential for cyber defence, numerous challenges must be overcome before DRL can be applied to the autonomous cyber defence (ACD) problem at scale. Principled methods are required for environments that confront learners with very high-dimensional state spaces, large multi-discrete action spaces, and adversarial learning. Recent works have reported success in solving these problems individually. There have also been impressive engineering efforts towards solving all three for real-time strategy games. However, applying DRL to the full ACD problem remains an open challenge. Here, we survey the relevant DRL literature and conceptualize an idealised ACD-DRL agent. We provide: i.) A summary of the domain properties that define the ACD problem; ii.) A comprehensive comparison of current ACD environments used for benchmarking DRL approaches; iii.) An overview of state-of-the-art approaches for scaling DRL to domains that confront learners with the curse of dimensionality, and; iv.) A survey and critique of current methods for limiting the exploitability of agents within adversarial settings from the perspective of ACD. We conclude with open research questions that we hope will motivate future directions for researchers and practitioners working on ACD.

replace Neural Dynamical Operator: Continuous Spatial-Temporal Model with Gradient-Based and Derivative-Free Optimization Methods

Authors: Chuanqi Chen, Jin-Long Wu

Abstract: Data-driven modeling techniques have been explored in the spatial-temporal modeling of complex dynamical systems for many engineering applications. However, a systematic approach is still lacking to leverage the information from different types of data, e.g., with different spatial and temporal resolutions, and the combined use of short-term trajectories and long-term statistics. In this work, we build on the recent progress of neural operator and present a data-driven modeling framework called neural dynamical operator that is continuous in both space and time. A key feature of the neural dynamical operator is the resolution-invariance with respect to both spatial and temporal discretizations, without demanding abundant training data in different temporal resolutions. To improve the long-term performance of the calibrated model, we further propose a hybrid optimization scheme that leverages both gradient-based and derivative-free optimization methods and efficiently trains on both short-term time series and long-term statistics. We investigate the performance of the neural dynamical operator with three numerical examples, including the viscous Burgers' equation, the Navier-Stokes equations, and the Kuramoto-Sivashinsky equation. The results confirm the resolution-invariance of the proposed modeling framework and also demonstrate stable long-term simulations with only short-term time series data. In addition, we show that the proposed model can better predict long-term statistics via the hybrid optimization scheme with a combined use of short-term and long-term data.

replace A Distributed Privacy Preserving Model for the Detection of Alzheimer's Disease

Authors: Paul K. Mandal

Abstract: In the era of rapidly advancing medical technologies, the segmentation of medical data has become inevitable, necessitating the development of privacy preserving machine learning algorithms that can train on distributed data. Consolidating sensitive medical data is not always an option particularly due to the stringent privacy regulations imposed by the Health Insurance Portability and Accountability Act (HIPAA). In this paper, I introduce a HIPAA compliant framework that can train from distributed data. I then propose a multimodal vertical federated model for Alzheimer's Disease (AD) detection, a serious neurodegenerative condition that can cause dementia, severely impairing brain function and hindering simple tasks, especially without preventative care. This vertical federated learning (VFL) model offers a distributed architecture that enables collaborative learning across diverse sources of medical data while respecting privacy constraints imposed by HIPAA. The VFL architecture proposed herein offers a novel distributed architecture, enabling collaborative learning across diverse sources of medical data while respecting statutory privacy constraints. By leveraging multiple modalities of data, the robustness and accuracy of AD detection can be enhanced. This model not only contributes to the advancement of federated learning techniques but also holds promise for overcoming the hurdles posed by data segmentation in medical research.

replace A Gaussian Process Framework for Solving Forward and Inverse Problems Involving Nonlinear Partial Differential Equations

Authors: Carlos Mora, Amin Yousefpour, Shirin Hosseinmardi, Ramin Bostanabad

Abstract: Physics-informed machine learning (PIML) has emerged as a promising alternative to conventional numerical methods for solving partial differential equations (PDEs). PIML models are increasingly built via deep neural networks (NNs) whose architecture and training process are designed such that the network satisfies the PDE system. While such PIML models have substantially advanced over the past few years, their performance is still very sensitive to the NN's architecture and loss function. Motivated by this limitation, we introduce kernel-weighted Corrective Residuals (CoRes) to integrate the strengths of kernel methods and deep NNs for solving nonlinear PDE systems. To achieve this integration, we design a modular and robust framework which consistently outperforms competing methods in solving a broad range of benchmark problems. This performance improvement has a theoretical justification and is particularly attractive since we simplify the training process while negligibly increasing the inference costs. Additionally, our studies on solving multiple PDEs indicate that kernel-weighted CoRes considerably decrease the sensitivity of NNs to factors such as random initialization, architecture type, and choice of optimizer. We believe our findings have the potential to spark a renewed interest in leveraging kernel methods for solving PDEs.

replace Lens: A Foundation Model for Network Traffic

Authors: Qineng Wang, Chen Qian, Xiaochang Li, Ziyu Yao, Gang Zhou, Huajie Shao

Abstract: Network traffic refers to the amount of data being sent and received over the internet or any system that connects computers. Analyzing and understanding network traffic is vital for improving network security and management. However, the analysis of network traffic is challenging due to the diverse nature of data packets, which often feature heterogeneous headers and encrypted payloads lacking semantics. To capture the latent semantics of traffic, a few studies have adopted pre-training techniques based on the Transformer encoder or decoder to learn the representations from massive traffic data. However, these methods typically excel in traffic understanding (classification) or traffic generation tasks. To address this issue, we develop Lens, a foundation model for network traffic that leverages the T5 architecture to learn the pre-trained representations from large-scale unlabeled data. Harnessing the strength of the encoder-decoder framework, which captures the global information while preserving the generative ability, our model can better learn the representations from raw data. To further enhance pre-training effectiveness, we design a novel loss that combines three distinct tasks: Masked Span Prediction (MSP), Packet Order Prediction (POP), and Homologous Traffic Prediction (HTP). Evaluation results across various benchmark datasets demonstrate that the proposed Lens outperforms the baselines in most downstream tasks related to both traffic understanding and generation. Notably, it also requires much less labeled data for fine-tuning compared to current methods.

replace Two Trades is not Baffled: Condensing Graph via Crafting Rational Gradient Matching

Authors: Tianle Zhang, Yuchen Zhang, Kun Wang, Kai Wang, Beining Yang, Kaipeng Zhang, Wenqi Shao, Ping Liu, Joey Tianyi Zhou, Yang You

Abstract: Training on large-scale graphs has achieved remarkable results in graph representation learning, but its cost and storage have raised growing concerns. As one of the most promising directions, graph condensation methods address these issues by employing gradient matching, aiming to condense the full graph into a more concise yet information-rich synthetic set. Though encouraging, these strategies primarily emphasize matching directions of the gradients, which leads to deviations in the training trajectories. Such deviations are further magnified by the differences between the condensation and evaluation phases, culminating in accumulated errors, which detrimentally affect the performance of the condensed graphs. In light of this, we propose a novel graph condensation method named \textbf{C}raf\textbf{T}ing \textbf{R}ationa\textbf{L} trajectory (\textbf{CTRL}), which offers an optimized starting point closer to the original dataset's feature distribution and a more refined strategy for gradient matching. Theoretically, CTRL can effectively neutralize the impact of accumulated errors on the performance of condensed graphs. We provide extensive experiments on various graph datasets and downstream tasks to support the effectiveness of CTRL. Code is released at https://github.com/NUS-HPC-AI-Lab/CTRL.

URLs: https://github.com/NUS-HPC-AI-Lab/CTRL.

replace HyperBERT: Mixing Hypergraph-Aware Layers with Language Models for Node Classification on Text-Attributed Hypergraphs

Authors: Adri\'an Bazaga, Pietro Li\`o, Gos Micklem

Abstract: Hypergraphs are characterized by complex topological structure, representing higher-order interactions among multiple entities through hyperedges. Lately, hypergraph-based deep learning methods to learn informative data representations for the problem of node classification on text-attributed hypergraphs have garnered increasing research attention. However, existing methods struggle to simultaneously capture the full extent of hypergraph structural information and the rich linguistic attributes inherent in the nodes attributes, which largely hampers their effectiveness and generalizability. To overcome these challenges, we explore ways to further augment a pretrained BERT model with specialized hypergraph-aware layers for the task of node classification. Such layers introduce higher-order structural inductive bias into the language model, thus improving the model's capacity to harness both higher-order context information from the hypergraph structure and semantic information present in text. In this paper, we propose a new architecture, HyperBERT, a mixed text-hypergraph model which simultaneously models hypergraph relational structure while maintaining the high-quality text encoding capabilities of a pre-trained BERT. Notably, HyperBERT presents results that achieve a new state-of-the-art on five challenging text-attributed hypergraph node classification benchmarks.

replace Learning Transferable Time Series Classifier with Cross-Domain Pre-training from Language Model

Authors: Mingyue Cheng, Xiaoyu Tao, Qi Liu, Hao Zhang, Yiheng Chen, Chenyi Lei

Abstract: Advancements in self-supervised pre-training (SSL) have significantly advanced the field of learning transferable time series representations, which can be very useful in enhancing the downstream task. Despite being effective, most existing works struggle to achieve cross-domain SSL pre-training, missing valuable opportunities to integrate patterns and features from different domains. The main challenge lies in the significant differences in the characteristics of time-series data across different domains, such as variations in the number of channels and temporal resolution scales. To address this challenge, we propose CrossTimeNet, a novel cross-domain SSL learning framework to learn transferable knowledge from various domains to largely benefit the target downstream task. One of the key characteristics of CrossTimeNet is the newly designed time series tokenization module, which could effectively convert the raw time series into a sequence of discrete tokens based on a reconstruction optimization process. Besides, we highlight that predicting a high proportion of corrupted tokens can be very helpful for extracting informative patterns across different domains during SSL pre-training, which has been largely overlooked in past years. Furthermore, unlike previous works, our work treats the pre-training language model (PLM) as the initialization of the encoder network, investigating the feasibility of transferring the knowledge learned by the PLM to the time series area. Through these efforts, the path to cross-domain pre-training of a generic time series model can be effectively paved. We conduct extensive experiments in a real-world scenario across various time series classification domains. The experimental results clearly confirm CrossTimeNet's superior performance.

replace DeNetDM: Debiasing by Network Depth Modulation

Authors: Silpa Vadakkeeveetil Sreelatha, Adarsh Kappiyath, Abhra Chaudhuri, Anjan Dutta

Abstract: When neural networks are trained on biased datasets, they tend to inadvertently learn spurious correlations, leading to challenges in achieving strong generalization and robustness. Current approaches to address such biases typically involve utilizing bias annotations, reweighting based on pseudo-bias labels, or enhancing diversity within bias-conflicting data points through augmentation techniques. We introduce DeNetDM, a novel debiasing method based on the observation that shallow neural networks prioritize learning core attributes, while deeper ones emphasize biases when tasked with acquiring distinct information. Using a training paradigm derived from Product of Experts, we create both biased and debiased branches with deep and shallow architectures and then distill knowledge to produce the target debiased model. Extensive experiments and analyses demonstrate that our approach outperforms current debiasing techniques, achieving a notable improvement of around 5% in three datasets, encompassing both synthetic and real-world data. Remarkably, DeNetDM accomplishes this without requiring annotations pertaining to bias labels or bias types, while still delivering performance on par with supervised counterparts. Furthermore, our approach effectively harnesses the diversity of bias-conflicting points within the data, surpassing previous methods and obviating the need for explicit augmentation-based methods to enhance the diversity of such bias-conflicting points. The source code will be available upon acceptance.

replace CausalBench: A Comprehensive Benchmark for Causal Learning Capability of LLMs

Authors: Yu Zhou, Xingyu Wu, Beicheng Huang, Jibin Wu, Liang Feng, Kay Chen Tan

Abstract: The ability to understand causality significantly impacts the competence of large language models (LLMs) in output explanation and counterfactual reasoning, as causality reveals the underlying data distribution. However, the lack of a comprehensive benchmark currently limits the evaluation of LLMs' causal learning capabilities. To fill this gap, this paper develops CausalBench based on data from the causal research community, enabling comparative evaluations of LLMs against traditional causal learning algorithms. To provide a comprehensive investigation, we offer three tasks of varying difficulties, including correlation, causal skeleton, and causality identification. Evaluations of 19 leading LLMs reveal that, while closed-source LLMs show potential for simple causal relationships, they significantly lag behind traditional algorithms on larger-scale networks ($>50$ nodes). Specifically, LLMs struggle with collider structures but excel at chain structures, especially at long-chain causality analogous to Chains-of-Thought techniques. This supports the current prompt approaches while suggesting directions to enhance LLMs' causal reasoning capability. Furthermore, CausalBench incorporates background knowledge and training data into prompts to thoroughly unlock LLMs' text-comprehension ability during evaluation, whose findings indicate that, LLM understand causality through semantic associations with distinct entities, rather than directly from contextual information or numerical distributions.

replace Sensor Response-Time Reduction using Long-Short Term Memory Network Forecasting

Authors: Simon J. Ward, Muhamed Baljevic, Sharon M. Weiss

Abstract: The response time of a biosensor is a crucial metric in safety-critical applications such as medical diagnostics where an earlier diagnosis can markedly improve patient outcomes. However, the speed at which a biosensor reaches a final equilibrium state can be limited by poor mass transport and long molecular diffusion times that increase the time it takes target molecules to reach the active sensing region of a biosensor. While optimization of system and sensor design can promote molecules reaching the sensing element faster, a simpler and complementary approach for response time reduction that is widely applicable across all sensor platforms is to use time-series forecasting to predict the ultimate steady-state sensor response. In this work, we show that ensembles of long short-term memory (LSTM) networks can accurately predict equilibrium biosensor response from a small quantity of initial time-dependent biosensor measurements, allowing for significant reduction in response time by a mean and median factor of improvement of 18.6 and 5.1 respectively. The ensemble of models simultaneously estimates uncertainty, which is vital for ensuring confidence in the predictions and subsequent safety-related decisions that are made. This approach is demonstrated on real-time experimental data collected by exposing porous silicon biosensors to buffered protein solutions using a multi-channel fluidic cell that enables the automated measurement of 100 porous silicon biosensors in parallel. The dramatic improvement in sensor response time achieved using LSTM network ensembles and associated uncertainty quantification opens the door to trustworthy and faster responding biosensors, enabling more rapid medical diagnostics for faster clinical decision making that can lead to improved patient outcomes and healthcare access, as well as quicker identification of toxins in food and the environment.

replace Gradient Flow Based Phase-Field Modeling Using Separable Neural Networks

Authors: Revanth Mattey, Susanta Ghosh

Abstract: The $L^2$ gradient flow of the Ginzburg-Landau free energy functional leads to the Allen Cahn equation that is widely used for modeling phase separation. Machine learning methods for solving the Allen-Cahn equation in its strong form suffer from inaccuracies in collocation techniques, errors in computing higher-order spatial derivatives through automatic differentiation, and the large system size required by the space-time approach. To overcome these limitations, we propose a separable neural network-based approximation of the phase field in a minimizing movement scheme to solve the aforementioned gradient flow problem. At each time step, the separable neural network is used to approximate the phase field in space through a low-rank tensor decomposition thereby accelerating the derivative calculations. The minimizing movement scheme naturally allows for the use of Gauss quadrature technique to compute the functional. A `$tanh$' transformation is applied on the neural network-predicted phase field to strictly bounds the solutions within the values of the two phases. For this transformation, a theoretical guarantee for energy stability of the minimizing movement scheme is established. Our results suggest that bounding the solution through this transformation is the key to effectively model sharp interfaces through separable neural network. The proposed method outperforms the state-of-the-art machine learning methods for phase separation problems and is an order of magnitude faster than the finite element method.

replace Automating Data Annotation under Strategic Human Agents: Risks and Potential Solutions

Authors: Tian Xie, Xueru Zhang

Abstract: As machine learning (ML) models are increasingly used in social domains to make consequential decisions about humans, they often have the power to reshape data distributions. Humans, as strategic agents, continuously adapt their behaviors in response to the learning system. As populations change dynamically, ML systems may need frequent updates to ensure high performance. However, acquiring high-quality human-annotated samples can be highly challenging and even infeasible in social domains. A common practice to address this issue is using the model itself to annotate unlabeled data samples. This paper investigates the long-term impacts when ML models are retrained with model-annotated samples when they incorporate human strategic responses. We first formalize the interactions between strategic agents and the model and then analyze how they evolve under such dynamic interactions. We find that agents are increasingly likely to receive positive decisions as the model gets retrained, whereas the proportion of agents with positive labels may decrease over time. We thus propose a refined retraining process to stabilize the dynamics. Last, we examine how algorithmic fairness can be affected by these retraining processes and find that enforcing common fairness constraints at every round may not benefit the disadvantaged group in the long run. Experiments on (semi-)synthetic and real data validate the theoretical findings.

replace Scaling Law for Time Series Forecasting

Authors: Jingzhe Shi, Qinwei Ma, Huan Ma, Lei Li

Abstract: Scaling law that rewards large datasets, complex models and enhanced data granularity has been observed in various fields of deep learning. Yet, studies on time series forecasting have cast doubt on scaling behaviors of deep learning methods for time series forecasting: while more training data improves performance, more capable models do not always outperform less capable models, and longer input horizons may hurt performance for some models. We propose a theory for scaling law for time series forecasting that can explain these seemingly abnormal behaviors. We take into account the impact of dataset size and model complexity, as well as time series data granularity, particularly focusing on the look-back horizon, an aspect that has been unexplored in previous theories. Furthermore, we empirically evaluate various models using a diverse set of time series forecasting datasets, which (1) verifies the validity of scaling law on dataset size and model complexity within the realm of time series forecasting, and (2) validates our theoretical framework, particularly regarding the influence of look back horizon. We hope our findings may inspire new models targeting time series forecasting datasets of limited size, as well as large foundational datasets and models for time series forecasting in future works. Codes for our experiments will be made public at: https://github.com/JingzheShi/ScalingLawForTimeSeriesForecasting.

URLs: https://github.com/JingzheShi/ScalingLawForTimeSeriesForecasting.

replace Achieving Dimension-Free Communication in Federated Learning via Zeroth-Order Optimization

Authors: Zhe Li, Bicheng Ying, Zidong Liu, Chaosheng Dong, Haibo Yang

Abstract: Federated Learning (FL) offers a promising framework for collaborative and privacy-preserving machine learning across distributed data sources. However, the substantial communication costs associated with FL significantly challenge its efficiency. Specifically, in each communication round, the communication costs scale linearly with the model's dimension, which presents a formidable obstacle, especially in large model scenarios. Despite various communication-efficient strategies, the intrinsic dimension-dependent communication cost remains a major bottleneck for current FL implementations. This paper proposes a novel dimension-free communication algorithm -- DeComFL, which leverages the zeroth-order optimization techniques and reduces the communication cost from $\mathscr{O}(d)$ to $\mathscr{O}(1)$ by transmitting only a constant number of scalar values between clients and the server in each round, regardless of the dimension $d$ of the model parameters. Theoretically, in non-convex functions, we prove that our algorithm achieves state-of-the-art rates, which show a linear speedup of the number of clients and local steps under standard assumptions. With additional low effective rank assumption, we can further show the convergence rate is independent of the model dimension $d$ as well. Empirical evaluations, encompassing both classic deep learning training and large language model fine-tuning, demonstrate significant reductions in communication overhead. Notably, DeComFL achieves this by transmitting only around 1MB of data in total between the server and a client to fine-tune a model with billions of parameters.

replace LUCIE: A Lightweight Uncoupled ClImate Emulator with long-term stability and physical consistency for O(1000)-member ensembles

Authors: Haiwen Guan, Troy Arcomano, Ashesh Chattopadhyay, Romit Maulik

Abstract: We present a lightweight, easy-to-train, low-resolution, fully data-driven climate emulator, LUCIE, that can be trained on as low as $2$ years of $6$-hourly ERA5 data. Unlike most state-of-the-art AI weather models, LUCIE remains stable and physically consistent for $100$ years of autoregressive simulation with $100$ ensemble members. Long-term mean climatology from LUCIE's simulation of temperature, wind, precipitation, and humidity matches that of ERA5 data, along with the variability. We further demonstrate how well extreme weather events and their return periods can be estimated from a large ensemble of long-term simulations. We further discuss an improved training strategy with a hard-constrained first-order integrator to suppress autoregressive error growth, a novel spectral regularization strategy to better capture fine-scale dynamics, and finally an optimization algorithm that enables data-limited (as low as $2$ years of $6$-hourly data) training of the emulator without losing stability and physical consistency. Finally, we provide a scaling experiment to compare the long-term bias of LUCIE with respect to the number of training samples. Importantly, LUCIE is an easy to use model that can be trained in just $2.4$h on a single A-100 GPU, allowing for multiple experiments that can explore important scientific questions that could be answered with large ensembles of long-term simulations, e.g., the impact of different variables on the simulation, dynamic response to external forcing, and estimation of extreme weather events, amongst others.

replace EMR-Merging: Tuning-Free High-Performance Model Merging

Authors: Chenyu Huang, Peng Ye, Tao Chen, Tong He, Xiangyu Yue, Wanli Ouyang

Abstract: The success of pretrain-finetune paradigm brings about the release of numerous model weights. In this case, merging models finetuned on different tasks to enable a single model with multi-task capabilities is gaining increasing attention for its practicability. Existing model merging methods usually suffer from (1) significant performance degradation or (2) requiring tuning by additional data or training. In this paper, we rethink and analyze the existing model merging paradigm. We discover that using a single model's weights can hardly simulate all the models' performance. To tackle this issue, we propose Elect, Mask & Rescale-Merging (EMR-Merging). We first (a) elect a unified model from all the model weights and then (b) generate extremely lightweight task-specific modulators, including masks and rescalers, to align the direction and magnitude between the unified model and each specific model, respectively. EMR-Merging is tuning-free, thus requiring no data availability or any additional training while showing impressive performance. We find that EMR-Merging shows outstanding performance compared to existing merging methods under different classical and newly-established settings, including merging different numbers of vision models (up to 30), NLP models, PEFT models, and multi-modal models.

replace Towards Physically Consistent Deep Learning For Climate Model Parameterizations

Authors: Birgit K\"uhbacher, Fernando Iglesias-Suarez, Niki Kilbertus, Veronika Eyring

Abstract: Climate models play a critical role in understanding and projecting climate change. Due to their complexity, their horizontal resolution of about 40-100 km remains too coarse to resolve processes such as clouds and convection, which need to be approximated via parameterizations. These parameterizations are a major source of systematic errors and large uncertainties in climate projections. Deep learning (DL)-based parameterizations, trained on data from computationally expensive short, high-resolution simulations, have shown great promise for improving climate models in that regard. However, their lack of interpretability and tendency to learn spurious non-physical correlations result in reduced trust in the climate simulation. We propose an efficient supervised learning framework for DL-based parameterizations that leads to physically consistent models with improved interpretability and negligible computational overhead compared to standard supervised training. First, key features determining the target physical processes are uncovered. Subsequently, the neural network is fine-tuned using only those relevant features. We show empirically that our method robustly identifies a small subset of the inputs as actual physical drivers, therefore removing spurious non-physical relationships. This results in by design physically consistent and interpretable neural networks while maintaining the predictive performance of unconstrained black-box DL-based parameterizations.

replace Robust Conformal Prediction Using Privileged Information

Authors: Shai Feldman, Yaniv Romano

Abstract: We develop a method to generate prediction sets with a guaranteed coverage rate that is robust to corruptions in the training data, such as missing or noisy variables. Our approach builds on conformal prediction, a powerful framework to construct prediction sets that are valid under the i.i.d assumption. Importantly, naively applying conformal prediction does not provide reliable predictions in this setting, due to the distribution shift induced by the corruptions. To account for the distribution shift, we assume access to privileged information (PI). The PI is formulated as additional features that explain the distribution shift, however, they are only available during training and absent at test time. We approach this problem by introducing a novel generalization of weighted conformal prediction and support our method with theoretical coverage guarantees. Empirical experiments on both real and synthetic datasets indicate that our approach achieves a valid coverage rate and constructs more informative predictions compared to existing methods, which are not supported by theoretical guarantees.

replace Physics-informed neural networks for parameter learning of wildfire spreading

Authors: Konstantinos Vogiatzoglou, Costas Papadimitriou, Vasilis Bontozoglou, Konstantinos Ampountolas

Abstract: Wildland fires pose a terrifying natural hazard, underscoring the urgent need to develop data-driven and physics-informed digital twins for wildfire prevention, monitoring, intervention, and response. In this direction of research, this work introduces a physics-informed neural network (PiNN) designed to learn the unknown parameters of an interpretable wildfire spreading model. The considered modeling approach integrates fundamental physical laws articulated by key model parameters essential for capturing the complex behavior of wildfires. The proposed machine learning framework leverages the theory of artificial neural networks with the physical constraints governing wildfire dynamics, including the first principles of mass and energy conservation. Training of the PiNN for physics-informed parameter identification is realized using synthetic data on the spatiotemporal evolution of one- and two-dimensional firefronts, derived from a high-fidelity simulator, as well as empirical data (ground surface thermal images) from the Troy Fire that occurred on June 19, 2002, in California. The parameter learning results demonstrate the predictive ability of the proposed PiNN in uncovering the unknown coefficients of the wildfire model in one- and two-dimensional fire spreading scenarios as well as the Troy Fire. Additionally, this methodology exhibits robustness by identifying the same parameters even in the presence of noisy data. By integrating this PiNN approach into a comprehensive framework, the envisioned physics-informed digital twin will enhance intelligent wildfire management and risk assessment, providing a powerful tool for proactive and reactive strategies.

replace An Autotuning-based Optimization Framework for Mixed-kernel SVM Classifications in Smart Pixel Datasets and Heterojunction Transistors

Authors: Xingfu Wu, Tupendra Oli, Justin H. Qian, Valerie Taylor, Mark C. Hersam, Vinod K. Sangwan

Abstract: Support Vector Machine (SVM) is a state-of-the-art classification method widely used in science and engineering due to its high accuracy, its ability to deal with high dimensional data, and its flexibility in modeling diverse sources of data. In this paper, we propose an autotuning-based optimization framework to quantify the ranges of hyperparameters in SVMs to identify their optimal choices, and apply the framework to two SVMs with the mixed-kernel between Sigmoid and Gaussian kernels for smart pixel datasets in high energy physics (HEP) and mixed-kernel heterojunction transistors (MKH). Our experimental results show that the optimal selection of hyperparameters in the SVMs and the kernels greatly varies for different applications and datasets, and choosing their optimal choices is critical for a high classification accuracy of the mixed kernel SVMs. Uninformed choices of hyperparameters C and coef0 in the mixed-kernel SVMs result in severely low accuracy, and the proposed framework effectively quantifies the proper ranges for the hyperparameters in the SVMs to identify their optimal choices to achieve the highest accuracy 94.6\% for the HEP application and the highest average accuracy 97.2\% with far less tuning time for the MKH application.

replace QBI: Quantile-Based Bias Initialization for Efficient Private Data Reconstruction in Federated Learning

Authors: Micha V. Nowak, Tim P. Bott, David Khachaturov, Frank Puppe, Adrian Krenzer, Amar Hekalo

Abstract: Federated learning enables the training of machine learning models on distributed data without compromising user privacy, as data remains on personal devices and only model updates, such as gradients, are shared with a central coordinator. However, recent research has shown that the central entity can perfectly reconstruct private data from shared model updates by maliciously initializing the model's parameters. In this paper, we propose QBI, a novel bias initialization method that significantly enhances reconstruction capabilities. This is accomplished by directly solving for bias values yielding sparse activation patterns. Further, we propose PAIRS, an algorithm that builds on QBI. PAIRS can be deployed when a separate dataset from the target domain is available to further increase the percentage of data that can be fully recovered. Measured by the percentage of samples that can be perfectly reconstructed from batches of various sizes, our approach achieves significant improvements over previous methods with gains of up to 50% on ImageNet and up to 60% on the IMDB sentiment analysis text dataset. Furthermore, we establish theoretical limits for attacks leveraging stochastic gradient sparsity, providing a foundation for understanding the fundamental constraints of these attacks. We empirically assess these limits using synthetic datasets. Finally, we propose and evaluate AGGP, a defensive framework designed to prevent gradient sparsity attacks, contributing to the development of more secure and private federated learning systems.

replace Integer-only Quantized Transformers for Embedded FPGA-based Time-series Forecasting in AIoT

Authors: Tianheng Ling, Chao Qian, Gregor Schiele

Abstract: This paper presents the design of a hardware accelerator for Transformers, optimized for on-device time-series forecasting in AIoT systems. It integrates integer-only quantization and Quantization-Aware Training with optimized hardware designs to realize 6-bit and 4-bit quantized Transformer models, which achieved precision comparable to 8-bit quantized models from related research. Utilizing a complete implementation on an embedded FPGA (Xilinx Spartan-7 XC7S15), we examine the feasibility of deploying Transformer models on embedded IoT devices. This includes a thorough analysis of achievable precision, resource utilization, timing, power, and energy consumption for on-device inference. Our results indicate that while sufficient performance can be attained, the optimization process is not trivial. For instance, reducing the quantization bitwidth does not consistently result in decreased latency or energy consumption, underscoring the necessity of systematically exploring various optimization combinations. Compared to an 8-bit quantized Transformer model in related studies, our 4-bit quantized Transformer model increases test loss by only 0.63%, operates up to 132.33x faster, and consumes 48.19x less energy.

replace A Tiny Supervised ODL Core with Auto Data Pruning for Human Activity Recognition

Authors: Hiroki Matsutani, Radu Marculescu

Abstract: In this paper, we introduce a low-cost and low-power tiny supervised on-device learning (ODL) core that can address the distributional shift of input data for human activity recognition. Although ODL for resource-limited edge devices has been studied recently, how exactly to provide the training labels to these devices at runtime remains an open-issue. To address this problem, we propose to combine an automatic data pruning with supervised ODL to reduce the number queries needed to acquire predicted labels from a nearby teacher device and thus save power consumption during model retraining. The data pruning threshold is automatically tuned, eliminating a manual threshold tuning. As a tinyML solution at a few mW for the human activity recognition, we design a supervised ODL core that supports our automatic data pruning using a 45nm CMOS process technology. We show that the required memory size for the core is smaller than the same-shaped multilayer perceptron (MLP) and the power consumption is only 3.39mW. Experiments using a human activity recognition dataset show that the proposed automatic data pruning reduces the communication volume by 55.7% and power consumption accordingly with only 0.9% accuracy loss.

replace Meta Clustering of Neural Bandits

Authors: Yikun Ban, Yunzhe Qi, Tianxin Wei, Lihui Liu, Jingrui He

Abstract: The contextual bandit has been identified as a powerful framework to formulate the recommendation process as a sequential decision-making process, where each item is regarded as an arm and the objective is to minimize the regret of $T$ rounds. In this paper, we study a new problem, Clustering of Neural Bandits, by extending previous work to the arbitrary reward function, to strike a balance between user heterogeneity and user correlations in the recommender system. To solve this problem, we propose a novel algorithm called M-CNB, which utilizes a meta-learner to represent and rapidly adapt to dynamic clusters, along with an informative Upper Confidence Bound (UCB)-based exploration strategy. We provide an instance-dependent performance guarantee for the proposed algorithm that withstands the adversarial context, and we further prove the guarantee is at least as good as state-of-the-art (SOTA) approaches under the same assumptions. In extensive experiments conducted in both recommendation and online classification scenarios, M-CNB outperforms SOTA baselines. This shows the effectiveness of the proposed approach in improving online recommendation and online classification performance.

replace SustainDC -- Benchmarking for Sustainable Data Center Control

Authors: Avisek Naug, Antonio Guillen, Ricardo Luna, Vineet Gundecha, Desik Rengarajan, Sahand Ghorbanpour, Sajad Mousavi, Ashwin Ramesh Babu, Dejan Markovikj, Lekhapriya D Kashyap, Soumyendu Sarkar

Abstract: Machine learning has driven an exponential increase in computational demand, leading to massive data centers that consume significant amounts of energy and contribute to climate change. This makes sustainable data center control a priority. In this paper, we introduce SustainDC, a set of Python environments for benchmarking multi-agent reinforcement learning (MARL) algorithms for data centers (DC). SustainDC supports custom DC configurations and tasks such as workload scheduling, cooling optimization, and auxiliary battery management, with multiple agents managing these operations while accounting for the effects of each other. We evaluate various MARL algorithms on SustainDC, showing their performance across diverse DC designs, locations, weather conditions, grid carbon intensity, and workload requirements. Our results highlight significant opportunities for improvement of data center operations using MARL algorithms. Given the increasing use of DC due to AI, SustainDC provides a crucial platform for the development and benchmarking of advanced algorithms essential for achieving sustainable computing and addressing other heterogeneous real-world challenges.

replace Overcoming Growth-Induced Forgetting in Task-Agnostic Continual Learning

Authors: Yuqing Zhao, Divya Saxena, Jiannong Cao, Xiaoyun Liu, Changlin Song

Abstract: In continual learning (CL), model growth enhances adaptability over new data, improving knowledge retention for more tasks. However, improper model growth can lead to severe degradation of previously learned knowledge, an issue we name as growth-induced forgetting (GIFt), especially in task-agnostic CL using entire grown model for inference. Existing works, despite adopting model growth and random initialization for better adaptability, often fail to recognize the presence of GIFt caused by improper model growth. This oversight limits comprehensive control of forgetting and hinders full utilization of model growth. We are the first in CL to identify this issue and conduct an in-depth study on root cause of GIFt, where layer expansion stands out among model growth strategies, widening layers without affecting model functionality. Yet, direct adoption of layer expansion presents challenges. It lacks data-driven control and initialization of expanded parameters to balance adaptability and knowledge retention. This paper presents a novel SparseGrow approach to overcome the issue of GIFt while enhancing adaptability over new data. SparseGrow employs data-driven sparse layer expansion to control efficient parameter usage during growth, reducing GIFt from excessive growth and functionality changes. It also combines sparse growth with on-data initialization at training late-stage to create partially 0-valued expansions that fit learned distribution, enhancing retention and adaptability. To further minimize forgetting, freezing is applied by calculating the sparse mask, allowing data-driven preservation of important parameters. Through experiments across datasets with various settings, cases, and task numbers, we demonstrate the necessity of layer expansion and showcase the effectiveness of SparseGrow in overcoming GIFt, highlighting its adaptability and knowledge retention for incremental tasks.

replace Mask-Encoded Sparsification: Mitigating Biased Gradients in Communication-Efficient Split Learning

Authors: Wenxuan Zhou, Zhihao Qu, Shen-Huan Lyu, Miao Cai, Baoliu Ye

Abstract: This paper introduces a novel framework designed to achieve a high compression ratio in Split Learning (SL) scenarios where resource-constrained devices are involved in large-scale model training. Our investigations demonstrate that compressing feature maps within SL leads to biased gradients that can negatively impact the convergence rates and diminish the generalization capabilities of the resulting models. Our theoretical analysis provides insights into how compression errors critically hinder SL performance, which previous methodologies underestimate. To address these challenges, we employ a narrow bit-width encoded mask to compensate for the sparsification error without increasing the order of time complexity. Supported by rigorous theoretical analysis, our framework significantly reduces compression errors and accelerates the convergence. Extensive experiments also verify that our method outperforms existing solutions regarding training efficiency and communication complexity.

replace Lemon and Orange Disease Classification using CNN-Extracted Features and Machine Learning Classifier

Authors: Khandoker Nosiba Arifin, Sayma Akter Rupa, Md Musfique Anwar, Israt Jahan

Abstract: Lemons and oranges, both are the most economically significant citrus fruits globally. The production of lemons and oranges is severely affected due to diseases in its growth stages. Fruit quality has degraded due to the presence of flaws. Thus, it is necessary to diagnose the disease accurately so that we can avoid major loss of lemons and oranges. To improve citrus farming, we proposed a disease classification approach for lemons and oranges. This approach would enable early disease detection and intervention, reduce yield losses, and optimize resource allocation. For the initial modeling of disease classification, the research uses innovative deep learning architectures such as VGG16, VGG19 and ResNet50. In addition, for achieving better accuracy, the basic machine learning algorithms used for classification problems include Random Forest, Naive Bayes, K-Nearest Neighbors (KNN) and Logistic Regression. The lemon and orange fruits diseases are classified more accurately (95.0% for lemon and 99.69% for orange) by the model. The model's base features were extracted from the ResNet50 pre-trained model and the diseases are classified by the Logistic Regression which beats the performance given by VGG16 and VGG19 for other classifiers. Experimental outcomes show that the proposed model also outperforms existing models in which most of them classified the diseases using the Softmax classifier without using any individual classifiers.

replace Large-Scale Multi-omic Biosequence Transformers for Modeling Peptide-Nucleotide Interactions

Authors: Sully F. Chen, Robert J. Steele, Beakal Lemeneh, Shivanand P. Lad, Eric Oermann

Abstract: The transformer architecture has revolutionized bioinformatics and driven progress in the understanding and prediction of the properties of biomolecules. Almost all research on large-scale biosequence transformers has focused on one domain at a time (single-omic), usually nucleotides or peptides. These models have seen incredible success in downstream tasks in each domain and have achieved particularly noteworthy breakthroughs in sequences of peptides and structural modeling. However, these single-omic models are naturally incapable of modeling multi-omic tasks, one of the most biologically critical being nucleotide-peptide interactions. We present our work training the first multi-omic nucleotide-peptide foundation models. We show that these multi-omic models (MOMs) can learn joint representations between various single-omic distributions that are emergently consistent with the Central Dogma of molecular biology, despite only being trained on unlabeled biosequences. We further demonstrate that MOMs can be fine-tuned to achieve state-of-the-art results on peptide-nucleotide interaction tasks, namely predicting the change in Gibbs free energy ({\Delta}G) of the binding interaction between a given oligonucleotide and peptide, as well as the effect on this binding interaction due to mutations in the oligonucleotide sequence ({\Delta}{\Delta}G). Remarkably, we show that multi-omic biosequence transformers emergently learn useful structural information without any prior structural training, allowing us to predict which peptide residues are most involved in the peptide-nucleotide binding interaction. Lastly, we provide evidence that multi-omic biosequence models are non-inferior to foundation models trained on single-omics distributions, suggesting a more generalized or foundational approach to building these models.

replace The Star Geometry of Critic-Based Regularizer Learning

Authors: Oscar Leong, Eliza O'Reilly, Yong Sheng Soh

Abstract: Variational regularization is a classical technique to solve statistical inference tasks and inverse problems, with modern data-driven approaches parameterizing regularizers via deep neural networks showcasing impressive empirical performance. Recent works along these lines learn task-dependent regularizers. This is done by integrating information about the measurements and ground-truth data in an unsupervised, critic-based loss function, where the regularizer attributes low values to likely data and high values to unlikely data. However, there is little theory about the structure of regularizers learned via this process and how it relates to the two data distributions. To make progress on this challenge, we initiate a study of optimizing critic-based loss functions to learn regularizers over a particular family of regularizers: gauges (or Minkowski functionals) of star-shaped bodies. This family contains regularizers that are commonly employed in practice and shares properties with regularizers parameterized by deep neural networks. We specifically investigate critic-based losses derived from variational representations of statistical distances between probability measures. By leveraging tools from star geometry and dual Brunn-Minkowski theory, we illustrate how these losses can be interpreted as dual mixed volumes that depend on the data distribution. This allows us to derive exact expressions for the optimal regularizer in certain cases. Finally, we identify which neural network architectures give rise to such star body gauges and when do such regularizers have favorable properties for optimization. More broadly, this work highlights how the tools of star geometry can aid in understanding the geometry of unsupervised regularizer learning.

replace CoDiCast: Conditional Diffusion Model for Weather Prediction with Uncertainty Quantification

Authors: Jimeng Shi, Bowen Jin, Jiawei Han, Giri Narasimhan

Abstract: Accurate weather forecasting is critical for science and society. Yet, existing methods have not managed to simultaneously have the properties of high accuracy, low uncertainty, and high computational efficiency. On one hand, to quantify the uncertainty in weather predictions, the strategy of ensemble forecast (i.e., generating a set of diverse predictions) is often employed. However, traditional ensemble numerical weather prediction (NWP) is computationally intensive. On the other hand, most existing machine learning-based weather prediction (MLWP) approaches are efficient and accurate. Nevertheless, they are deterministic and cannot capture the uncertainty of weather forecasting. In this work, we propose CoDiCast, a conditional diffusion model to generate accurate global weather prediction, while achieving uncertainty quantification with ensemble forecasts and modest computational cost. The key idea is to simulate a conditional version of the reverse denoising process in diffusion models, which starts from pure Gaussian noise to generate realistic weather scenarios for a future time point. Each denoising step is conditioned on observations from the recent past. Ensemble forecasts are achieved by repeatedly sampling from stochastic Gaussian noise to represent uncertainty quantification. CoDiCast is trained on a decade of ERA5 reanalysis data from the European Centre for Medium-Range Weather Forecasts (ECMWF). Experimental results demonstrate that our approach outperforms several existing data-driven methods in accuracy. Our conditional diffusion model, CoDiCast, can generate 3-day global weather forecasts, at 6-hour steps and $5.625^\circ$ latitude-longitude resolution, for over 5 variables, in about 12 minutes on a commodity A100 GPU machine with 80GB memory. The open-souced code is provided at \url{https://github.com/JimengShi/CoDiCast}.

URLs: https://github.com/JimengShi/CoDiCast

replace Symmetry constrained neural networks for detection and localization of damage in metal plates

Authors: James Amarel, Christopher Rudolf, Athanasios Iliopoulos, John Michopoulos, Leslie N. Smith

Abstract: The present paper is concerned with deep learning techniques applied to detection and localization of damage in a thin aluminum plate. We used data collected on a tabletop apparatus by mounting to the plate four piezoelectric transducers, each of which took turn to generate a Lamb wave that then traversed the region of interest before being received by the remaining three sensors. On training a neural network to analyze time-series data of the material response, which displayed damage-reflective features whenever the plate guided waves interacted with a contact load, we achieved a model that detected with greater than $99\%$ accuracy in addition to a model that localized with $2.58 \pm 0.12$ mm mean distance error. For each task, the best-performing model was designed according to the inductive bias that our transducers were both similar and arranged in a square pattern on a nearly uniform plate.

replace A Survey of Out-of-distribution Generalization for Graph Machine Learning from a Causal View

Authors: Jing Ma

Abstract: Graph machine learning (GML) has been successfully applied across a wide range of tasks. Nonetheless, GML faces significant challenges in generalizing over out-of-distribution (OOD) data, which raises concerns about its wider applicability. Recent advancements have underscored the crucial role of causality-driven approaches in overcoming these generalization challenges. Distinct from traditional GML methods that primarily rely on statistical dependencies, causality-focused strategies delve into the underlying causal mechanisms of data generation and model prediction, thus significantly improving the generalization of GML across different environments. This paper offers a thorough review of recent progress in causality-involved GML generalization. We elucidate the fundamental concepts of employing causality to enhance graph model generalization and categorize the various approaches, providing detailed descriptions of their methodologies and the connections among them. Furthermore, we explore the incorporation of causality in other related important areas of trustworthy GML, such as explanation, fairness, and robustness. Concluding with a discussion on potential future research directions, this review seeks to articulate the continuing development and future potential of causality in enhancing the trustworthiness of graph machine learning.

replace Sparse Low-Ranked Self-Attention Transformer for Remaining Useful Lifetime Prediction of Optical Fiber Amplifiers

Authors: Dominic Schneider, Lutz Rapp

Abstract: Optical fiber amplifiers are key elements in present optical networks. Failures of these components result in high financial loss of income of the network operator as the communication traffic over an affected link is interrupted. Applying Remaining useful lifetime (RUL) prediction in the context of Predictive Maintenance (PdM) to optical fiber amplifiers to predict upcoming system failures at an early stage, so that network outages can be minimized through planning of targeted maintenance actions, ensures reliability and safety. Optical fiber amplifier are complex systems, that work under various operating conditions, which makes correct forecasting a difficult task. Increased monitoring capabilities of systems results in datasets that facilitate the application of data-driven RUL prediction methods. Deep learning models in particular have shown good performance, but generalization based on comparatively small datasets for RUL prediction is difficult. In this paper, we propose Sparse Low-ranked self-Attention Transformer (SLAT) as a novel RUL prediction method. SLAT is based on an encoder-decoder architecture, wherein two parallel working encoders extract features for sensors and time steps. By utilizing the self-attention mechanism, long-term dependencies can be learned from long sequences. The implementation of sparsity in the attention matrix and a low-rank parametrization reduce overfitting and increase generalization. Experimental application to optical fiber amplifiers exemplified on EDFA, as well as a reference dataset from turbofan engines, shows that SLAT outperforms the state-of-the-art methods.

replace Reservoir Static Property Estimation Using Nearest-Neighbor Neural Network

Authors: Yuhe Wang

Abstract: This note presents an approach for estimating the spatial distribution of static properties in reservoir modeling using a nearest-neighbor neural network. The method leverages the strengths of neural networks in approximating complex, non-linear functions, particularly for tasks involving spatial interpolation. It incorporates a nearest-neighbor algorithm to capture local spatial relationships between data points and introduces randomization to quantify the uncertainty inherent in the interpolation process. This approach addresses the limitations of traditional geostatistical methods, such as Inverse Distance Weighting (IDW) and Kriging, which often fail to model the complex non-linear dependencies in reservoir data. By integrating spatial proximity and uncertainty quantification, the proposed method can improve the accuracy of static property predictions like porosity and permeability.

replace Reward-Robust RLHF in LLMs

Authors: Yuzi Yan, Xingzhou Lou, Jialian Li, Yiping Zhang, Jian Xie, Chao Yu, Yu Wang, Dong Yan, Yuan Shen

Abstract: As Large Language Models (LLMs) continue to progress toward more advanced forms of intelligence, Reinforcement Learning from Human Feedback (RLHF) is increasingly seen as a key pathway toward achieving Artificial General Intelligence (AGI). However, the reliance on reward-model-based (RM-based) alignment methods introduces significant challenges due to the inherent instability and imperfections of Reward Models (RMs), which can lead to critical issues such as reward hacking and misalignment with human intentions. In this paper, we introduce a reward-robust RLHF framework aimed at addressing these fundamental challenges, paving the way for more reliable and resilient learning in LLMs. Our approach introduces a novel optimization objective that carefully balances performance and robustness by incorporating Bayesian Reward Model Ensembles (BRME) to model the uncertainty set of reward functions. This allows the framework to integrate both nominal performance and minimum reward signals, ensuring more stable learning even with imperfect RMs. Empirical results demonstrate that our framework consistently outperforms baselines across diverse benchmarks, showing improved accuracy and long-term stability. We also provide a theoretical analysis, demonstrating that reward-robust RLHF approaches the stability of constant reward settings, which proves to be acceptable even in a stochastic-case analysis. Together, these contributions highlight the framework potential to enhance both the performance and stability of LLM alignment.

replace CauSkelNet: Causal Representation Learning for Human Behaviour Analysis

Authors: Xingrui Gu, Chuyi Jiang, Erte Wang, Zekun Wu, Qiang Cui, Leimin Tian, Lianlong Wu, Siyang Song, Chuang Yu

Abstract: Constrained by the lack of model interpretability and a deep understanding of human movement in traditional movement recognition machine learning methods, this study introduces a novel representation learning method based on causal inference to better understand human joint dynamics and complex behaviors. We propose a two-stage framework that combines the Peter-Clark (PC) algorithm and Kullback-Leibler (KL) divergence to identify and quantify causal relationships between joints. Our method effectively captures interactions and produces interpretable, robust representations. Experiments on the EmoPain dataset show that our causal GCN outperforms traditional GCNs in accuracy, F1 score, and recall, especially in detecting protective behaviors. The model is also highly invariant to data scale changes, enhancing its reliability in practical applications. Our approach advances human motion analysis and paves the way for more adaptive intelligent healthcare solutions.

replace FedRepOpt: Gradient Re-parametrized Optimizers in Federated Learning

Authors: Kin Wai Lau, Yasar Abbas Ur Rehman, Pedro Porto Buarque de Gusm\~ao, Lai-Man Po, Lan Ma, Yuyang Xie

Abstract: Federated Learning (FL) has emerged as a privacy-preserving method for training machine learning models in a distributed manner on edge devices. However, on-device models face inherent computational power and memory limitations, potentially resulting in constrained gradient updates. As the model's size increases, the frequency of gradient updates on edge devices decreases, ultimately leading to suboptimal training outcomes during any particular FL round. This limits the feasibility of deploying advanced and large-scale models on edge devices, hindering the potential for performance enhancements. To address this issue, we propose FedRepOpt, a gradient re-parameterized optimizer for FL. The gradient re-parameterized method allows training a simple local model with a similar performance as a complex model by modifying the optimizer's gradients according to a set of model-specific hyperparameters obtained from the complex models. In this work, we focus on VGG-style and Ghost-style models in the FL environment. Extensive experiments demonstrate that models using FedRepOpt obtain a significant boost in performance of 16.7% and 11.4% compared to the RepGhost-style and RepVGG-style networks, while also demonstrating a faster convergence time of 11.7% and 57.4% compared to their complex structure.

replace-cross On Rademacher Complexity-based Generalization Bounds for Deep Learning

Authors: Lan V. Truong

Abstract: We show that the Rademacher complexity-based approach can generate non-vacuous generalisation bounds on Convolutional Neural Networks (CNNs) for classifying a small number of classes of images. The development of new Talagrand's contraction lemmas for high-dimensional mappings between function spaces and CNNs for general Lipschitz activation functions is a key technical contribution. Our results show that the Rademacher complexity does not depend on the network length for CNNs with some special types of activation functions such as ReLU, Leaky ReLU, Parametric Rectifier Linear Unit, Sigmoid, and Tanh.

replace-cross DICTDIS: Dictionary Constrained Disambiguation for Improved NMT

Authors: Ayush Maheshwari, Preethi Jyothi, Ganesh Ramakrishnan

Abstract: Domain-specific neural machine translation (NMT) systems (e.g., in educational applications) are socially significant with the potential to help make information accessible to a diverse set of users in multilingual societies. It is desirable that such NMT systems be lexically constrained and draw from domain-specific dictionaries. Dictionaries could present multiple candidate translations for a source word/phrase due to the polysemous nature of words. The onus is then on the NMT model to choose the contextually most appropriate candidate. Prior work has largely ignored this problem and focused on the single candidate constraint setting wherein the target word or phrase is replaced by a single constraint. In this work we present DictDis, a lexically constrained NMT system that disambiguates between multiple candidate translations derived from dictionaries. We achieve this by augmenting training data with multiple dictionary candidates to actively encourage disambiguation during training by implicitly aligning multiple candidate constraints. We demonstrate the utility of DictDis via extensive experiments on English-Hindi and English-German sentences in a variety of domains including regulatory, finance, engineering. We also present comparisons on standard benchmark test datasets. In comparison with existing approaches for lexically constrained and unconstrained NMT, we demonstrate superior performance with respect to constraint copy and disambiguation related measures on all domains while also obtaining improved fluency of up to 2-3 BLEU points on some domains.

replace-cross Probabilistic unifying relations for modelling epistemic and aleatoric uncertainty: semantics and automated reasoning with theorem proving

Authors: Kangfeng Ye, Jim Woodcock, Simon Foster

Abstract: Probabilistic programming combines general computer programming, statistical inference, and formal semantics to help systems make decisions when facing uncertainty. Probabilistic programs are ubiquitous, including having a significant impact on machine intelligence. While many probabilistic algorithms have been used in practice in different domains, their automated verification based on formal semantics is still a relatively new research area. In the last two decades, it has attracted much interest. Many challenges, however, remain. The work presented in this paper, probabilistic unifying relations (ProbURel), takes a step towards our vision to tackle these challenges. Our work is based on Hehner's predicative probabilistic programming, but there are several obstacles to the broader adoption of his work. Our contributions here include (1) the formalisation of its syntax and semantics by introducing an Iverson bracket notation to separate relations from arithmetic; (2) the formalisation of relations using Unifying Theories of Programming (UTP) and probabilities outside the brackets using summation over the topological space of the real numbers; (3) the constructive semantics for probabilistic loops using Kleene's fixed-point theorem; (4) the enrichment of its semantics from distributions to subdistributions and superdistributions to deal with the constructive semantics; (5) the unique fixed-point theorem to simplify the reasoning about probabilistic loops; and (6) the mechanisation of our theory in Isabelle/UTP, an implementation of UTP in Isabelle/HOL, for automated reasoning using theorem proving. We demonstrate our work with six examples, including problems in robot localisation, classification in machine learning, and the termination of probabilistic loops.

replace-cross A Differentially Private Weighted Empirical Risk Minimization Procedure and its Application to Outcome Weighted Learning

Authors: Spencer Giddens, Yiwang Zhou, Kevin R. Krull, Tara M. Brinkman, Peter X. K. Song, Fang Liu

Abstract: It is common practice to use data containing personal information to build predictive models in the framework of empirical risk minimization (ERM). While these models can be highly accurate in prediction, sharing the results from these models trained on sensitive data may be susceptible to privacy attacks. Differential privacy (DP) is an appealing framework for addressing such data privacy issues by providing mathematically provable bounds on the privacy loss incurred when releasing information from sensitive data. Previous work has primarily concentrated on applying DP to unweighted ERM. We consider weighted ERM (wERM), an important generalization, where each individual's contribution to the objective function can be assigned varying weights. We propose the first differentially private algorithm for general wERM, with theoretical DP guarantees. Extending the existing DP-ERM procedures to wERM creates a pathway for deriving privacy-preserving learning methods for individualized treatment rules, including the popular outcome weighted learning (OWL). We evaluate the performance of the DP-wERM framework applied to OWL in both simulation studies and in a real clinical trial. All empirical results demonstrate the feasibility of training OWL models via wERM with DP guarantees while maintaining sufficiently robust model performance, providing strong evidence for the practicality of implementing the proposed privacy-preserving OWL procedure in real-world scenarios involving sensitive data.

replace-cross Graph-enhanced Optimizers for Structure-aware Recommendation Embedding Evolution

Authors: Cong Xu, Jun Wang, Jianyong Wang, Wei Zhang

Abstract: Embedding plays a key role in modern recommender systems because they are virtual representations of real-world entities and the foundation for subsequent decision-making models. In this paper, we propose a novel embedding update mechanism, Structure-aware Embedding Evolution (SEvo for short), to encourage related nodes to evolve similarly at each step. Unlike GNN (Graph Neural Network) that typically serves as an intermediate module, SEvo is able to directly inject graph structural information into embedding with minimal computational overhead during training. The convergence properties of SEvo along with its potential variants are theoretically analyzed to justify the validity of the designs. Moreover, SEvo can be seamlessly integrated into existing optimizers for state-of-the-art performance. Particularly SEvo-enhanced AdamW with moment estimate correction demonstrates consistent improvements across a spectrum of models and datasets, suggesting a novel technical route to effectively utilize graph structural information beyond explicit GNN modules.

replace-cross Lego: Learning to Disentangle and Invert Personalized Concepts Beyond Object Appearance in Text-to-Image Diffusion Models

Authors: Saman Motamed, Danda Pani Paudel, Luc Van Gool

Abstract: Text-to-Image (T2I) models excel at synthesizing concepts such as nouns, appearances, and styles. To enable customized content creation based on a few example images of a concept, methods such as Textual Inversion and DreamBooth invert the desired concept and enable synthesizing it in new scenes. However, inverting personalized concepts that go beyond object appearance and style (adjectives and verbs) through natural language remains a challenge. Two key characteristics of these concepts contribute to the limitations of current inversion methods. 1) Adjectives and verbs are entangled with nouns (subject) and can hinder appearance-based inversion methods, where the subject appearance leaks into the concept embedding, and 2) describing such concepts often extends beyond single word embeddings. In this study, we introduce Lego, a textual inversion method designed to invert subject-entangled concepts from a few example images. Lego disentangles concepts from their associated subjects using a simple yet effective Subject Separation step and employs a Context Loss that guides the inversion of single/multi-embedding concepts. In a thorough user study, Lego-generated concepts were preferred over 70% of the time when compared to the baseline in terms of authentically generating concepts according to a reference. Additionally, visual question answering using an LLM suggested Lego-generated concepts are better aligned with the text description of the concept.

replace-cross On fundamental aspects of quantum extreme learning machines

Authors: Weijie Xiong, Giorgio Facelli, Mehrad Sahebi, Owen Agnel, Thiparat Chotibut, Supanut Thanasilp, Zo\"e Holmes

Abstract: Quantum Extreme Learning Machines (QELMs) have emerged as a promising framework for quantum machine learning. Their appeal lies in the rich feature map induced by the dynamics of a quantum substrate - the quantum reservoir - and the efficient post-measurement training via linear regression. Here we study the expressivity of QELMs by decomposing the prediction of QELMs into a Fourier series. We show that the achievable Fourier frequencies are determined by the data encoding scheme, while Fourier coefficients depend on both the reservoir and the measurement. Notably, the expressivity of QELMs is fundamentally limited by the number of Fourier frequencies and the number of observables, while the complexity of the prediction hinges on the reservoir. As a cautionary note on scalability, we identify four sources that can lead to the exponential concentration of the observables as the system size grows (randomness, hardware noise, entanglement, and global measurements) and show how this can turn QELMs into useless input-agnostic oracles. In particular, our result on the reservoir-induced concentration strongly indicates that quantum reservoirs drawn from a highly random ensemble make QELM models unscalable. Our analysis elucidates the potential and fundamental limitations of QELMs, and lays the groundwork for systematically exploring quantum reservoir systems for other machine learning tasks.

replace-cross Cross-Domain Few-Shot Object Detection via Enhanced Open-Set Object Detector

Authors: Yuqian Fu, Yu Wang, Yixuan Pan, Lian Huai, Xingyu Qiu, Zeyu Shangguan, Tong Liu, Yanwei Fu, Luc Van Gool, Xingqun Jiang

Abstract: This paper studies the challenging cross-domain few-shot object detection (CD-FSOD), aiming to develop an accurate object detector for novel domains with minimal labeled examples. While transformer-based open-set detectors, such as DE-ViT, show promise in traditional few-shot object detection, their generalization to CD-FSOD remains unclear: 1) can such open-set detection methods easily generalize to CD-FSOD? 2) If not, how can models be enhanced when facing huge domain gaps? To answer the first question, we employ measures including style, inter-class variance (ICV), and indefinable boundaries (IB) to understand the domain gap. Based on these measures, we establish a new benchmark named CD-FSOD to evaluate object detection methods, revealing that most of the current approaches fail to generalize across domains. Technically, we observe that the performance decline is associated with our proposed measures: style, ICV, and IB. Consequently, we propose several novel modules to address these issues. First, the learnable instance features align initial fixed instances with target categories, enhancing feature distinctiveness. Second, the instance reweighting module assigns higher importance to high-quality instances with slight IB. Third, the domain prompter encourages features resilient to different styles by synthesizing imaginary domains without altering semantic contents. These techniques collectively contribute to the development of the Cross-Domain Vision Transformer for CD-FSOD (CD-ViTO), significantly improving upon the base DE-ViT. Experimental results validate the efficacy of our model.

replace-cross Trained Without My Consent: Detecting Code Inclusion In Language Models Trained on Code

Authors: Vahid Majdinasab, Amin Nikanjam, Foutse Khomh

Abstract: Code auditing ensures that the developed code adheres to standards, regulations, and copyright protection by verifying that it does not contain code from protected sources. The recent advent of Large Language Models (LLMs) as coding assistants in the software development process poses new challenges for code auditing. The dataset for training these models is mainly collected from publicly available sources. This raises the issue of intellectual property infringement as developers' codes are already included in the dataset. Therefore, auditing code developed using LLMs is challenging, as it is difficult to reliably assert if an LLM used during development has been trained on specific copyrighted codes, given that we do not have access to the training datasets of these models. Given the non-disclosure of the training datasets, traditional approaches such as code clone detection are insufficient for asserting copyright infringement. To address this challenge, we propose a new approach, TraWiC; a model-agnostic and interpretable method based on membership inference for detecting code inclusion in an LLM's training dataset. We extract syntactic and semantic identifiers unique to each program to train a classifier for detecting code inclusion. In our experiments, we observe that TraWiC is capable of detecting 83.87% of codes that were used to train an LLM. In comparison, the prevalent clone detection tool NiCad is only capable of detecting 47.64%. In addition to its remarkable performance, TraWiC has low resource overhead in contrast to pair-wise clone detection that is conducted during the auditing process of tools like CodeWhisperer reference tracker, across thousands of code snippets.

replace-cross PromptKD: Distilling Student-Friendly Knowledge for Generative Language Models via Prompt Tuning

Authors: Gyeongman Kim, Doohyuk Jang, Eunho Yang

Abstract: Recent advancements in large language models (LLMs) have raised concerns about inference costs, increasing the need for research into model compression. While knowledge distillation (KD) is a prominent method for this, research on KD for generative language models like LLMs is relatively sparse, and the approach of distilling student-friendly knowledge, which has shown promising performance in KD for classification models, remains unexplored in generative language models. To explore this approach, we propose PromptKD, a simple yet effective method that utilizes prompt tuning - for the first time in KD - to enable generative language models to transfer student-friendly knowledge. Unlike previous works in classification that require fine-tuning the entire teacher model for extracting student-friendly knowledge, PromptKD achieves similar effects by adding a small number of prompt tokens and tuning only the prompt with student guidance. Extensive experiments on instruction-following datasets show that PromptKD achieves state-of-the-art performance while adding only 0.0007% of the teacher's parameters as prompts. Further analysis suggests that distilling student-friendly knowledge alleviates exposure bias effectively throughout the entire training process, leading to performance enhancements.

replace-cross Implicit Image-to-Image Schrodinger Bridge for Image Restoration

Authors: Yuang Wang, Siyeop Yoon, Pengfei Jin, Matthew Tivnan, Sifan Song, Zhennong Chen, Rui Hu, Li Zhang, Quanzheng Li, Zhiqiang Chen, Dufan Wu

Abstract: Diffusion-based models are widely recognized for their effectiveness in image restoration tasks; however, their iterative denoising process, which begins from Gaussian noise, often results in slow inference speeds. The Image-to-Image Schr\"odinger Bridge (I$^2$SB) presents a promising alternative by starting the generative process from corrupted images and leveraging training techniques from score-based diffusion models. In this paper, we introduce the Implicit Image-to-Image Schr\"odinger Bridge (I$^3$SB) to further accelerate the generative process of I$^2$SB. I$^3$SB reconfigures the generative process into a non-Markovian framework by incorporating the initial corrupted image into each step, while ensuring that the marginal distribution aligns with that of I$^2$SB. This allows for the direct use of the pretrained network from I$^2$SB. Extensive experiments on natural images, human face images, and medical images validate the acceleration benefits of I$^3$SB. Compared to I$^2$SB, I$^3$SB achieves the same perceptual quality with fewer generative steps, while maintaining equal or improved fidelity to the ground truth.

replace-cross Deep Bayesian Future Fusion for Self-Supervised, High-Resolution, Off-Road Mapping

Authors: Shubhra Aich, Wenshan Wang, Parv Maheshwari, Matthew Sivaprakasam, Samuel Triest, Cherie Ho, Jason M. Gregory, John G. Rogers III, Sebastian Scherer

Abstract: High-speed off-road navigation requires long-range, high-resolution maps to enable robots to safely navigate over different surfaces while avoiding dangerous obstacles. However, due to limited computational power and sensing noise, most approaches to off-road mapping focus on producing coarse (20-40cm) maps of the environment. In this paper, we propose Future Fusion, a framework capable of generating dense, high-resolution maps from sparse sensing data (30m forward at 2cm). This is accomplished by - (1) the efficient realization of the well-known Bayes filtering within the standard deep learning models that explicitly accounts for the sparsity pattern in stereo and LiDAR depth data, and (2) leveraging perceptual losses common in generative image completion. The proposed methodology outperforms the conventional baselines. Moreover, the learned features and the completed dense maps lead to improvements in the downstream navigation task.

replace-cross VITaL Pretraining: Visuo-Tactile Pretraining for Tactile and Non-Tactile Manipulation Policies

Authors: Abraham George, Selam Gano, Pranav Katragadda, Amir Barati Farimani

Abstract: Tactile information is a critical tool for dexterous manipulation. As humans, we rely heavily on tactile information to understand objects in our environments and how to interact with them. We use touch not only to perform manipulation tasks but also to learn how to perform these tasks. Therefore, to create robotic agents that can learn to complete manipulation tasks at a human or super-human level of performance, we need to properly incorporate tactile information into both skill execution and skill learning. In this paper, we investigate how we can incorporate tactile information into imitation learning platforms to improve performance on manipulation tasks. We show that incorporating visuo-tactile pretraining improves imitation learning performance, not only for tactile agents (policies that use tactile information at inference), but also for non-tactile agents (policies that do not use tactile information at inference). For these non-tactile agents, pretraining with tactile information significantly improved performance (for example, improving the accuracy on USB plugging from 20% to 85%), reaching a level on par with visuo-tactile agents, and even surpassing them in some cases. For demonstration videos and access to our codebase, see the project website: https://sites.google.com/andrew.cmu.edu/visuo-tactile-pretraining

URLs: https://sites.google.com/andrew.cmu.edu/visuo-tactile-pretraining

replace-cross Proprioception Is All You Need: Terrain Classification for Boreal Forests

Authors: Damien LaRocque, William Guimont-Martin, David-Alexandre Duclos, Philippe Gigu\`ere, Fran\c{c}ois Pomerleau

Abstract: Recent works in field robotics highlighted the importance of resiliency against different types of terrains. Boreal forests, in particular, are home to many mobility-impeding terrains that should be considered for off-road autonomous navigation. Also, being one of the largest land biomes on Earth, boreal forests are an area where autonomous vehicles are expected to become increasingly common. In this paper, we address this issue by introducing BorealTC, a publicly available dataset for proprioceptive-based terrain classification (TC). Recorded with a Husky A200, our dataset contains 116 min of Inertial Measurement Unit (IMU), motor current, and wheel odometry data, focusing on typical boreal forest terrains, notably snow, ice, and silty loam. Combining our dataset with another dataset from the state-of-the-art, we evaluate both a Convolutional Neural Network (CNN) and the novel state space model (SSM)-based Mamba architecture on a TC task. Interestingly, we show that while CNN outperforms Mamba on each separate dataset, Mamba achieves greater accuracy when trained on a combination of both. In addition, we demonstrate that Mamba's learning capacity is greater than a CNN for increasing amounts of data. We show that the combination of two TC datasets yields a latent space that can be interpreted with the properties of the terrains. We also discuss the implications of merging datasets on classification. Our source code and dataset are publicly available online: https://github.com/norlab-ulaval/BorealTC.

URLs: https://github.com/norlab-ulaval/BorealTC.

replace-cross Make Large Language Model a Better Ranker

Authors: Wen-Shuo Chao, Zhi Zheng, Hengshu Zhu, Hao Liu

Abstract: Large Language Models (LLMs) demonstrate robust capabilities across various fields, leading to a paradigm shift in LLM-enhanced Recommender System (RS). Research to date focuses on point-wise and pair-wise recommendation paradigms, which are inefficient for LLM-based recommenders due to high computational costs. However, existing list-wise approaches also fall short in ranking tasks due to misalignment between ranking objectives and next-token prediction. Moreover, these LLM-based methods struggle to effectively address the order relation among candidates, particularly given the scale of ratings. To address these challenges, this paper introduces the large language model framework with Aligned Listwise Ranking Objectives (ALRO). ALRO is designed to bridge the gap between the capabilities of LLMs and the nuanced requirements of ranking tasks. Specifically, ALRO employs explicit feedback in a listwise manner by introducing soft lambda loss, a customized adaptation of lambda loss designed for optimizing order relations. This mechanism provides more accurate optimization goals, enhancing the ranking process. Additionally, ALRO incorporates a permutation-sensitive learning mechanism that addresses position bias, a prevalent issue in generative models, without imposing additional computational burdens during inference. Our evaluative studies reveal that ALRO outperforms both existing embedding-based recommendation methods and LLM-based recommendation baselines.

replace-cross The Impact of Unstated Norms in Bias Analysis of Language Models

Authors: Farnaz Kohankhaki, D. B. Emerson, Jacob-Junqi Tian, Laleh Seyyed-Kalantari, Faiza Khan Khattak

Abstract: Bias in large language models (LLMs) has many forms, from overt discrimination to implicit stereotypes. Counterfactual bias evaluation is a widely used approach to quantifying bias and often relies on template-based probes that explicitly state group membership. It measures whether the outcome of a task, performed by an LLM, is invariant to a change of group membership. In this work, we find that template-based probes can lead to unrealistic bias measurements. For example, LLMs appear to mistakenly cast text associated with White race as negative at higher rates than other groups. We hypothesize that this arises artificially via a mismatch between commonly unstated norms, in the form of markedness, in the pretraining text of LLMs (e.g., Black president vs. president) and templates used for bias measurement (e.g., Black president vs. White president). The findings highlight the potential misleading impact of varying group membership through explicit mention in counterfactual bias quantification.

replace-cross PIM-Opt: Demystifying Distributed Optimization Algorithms on a Real-World Processing-In-Memory System

Authors: Steve Rhyner, Haocong Luo, Juan G\'omez-Luna, Mohammad Sadrosadati, Jiawei Jiang, Ataberk Olgun, Harshita Gupta, Ce Zhang, Onur Mutlu

Abstract: Modern Machine Learning (ML) training on large-scale datasets is a very time-consuming workload. It relies on the optimization algorithm Stochastic Gradient Descent (SGD) due to its effectiveness, simplicity, and generalization performance. Processor-centric architectures (e.g., CPUs, GPUs) commonly used for modern ML training workloads based on SGD are bottlenecked by data movement between the processor and memory units due to the poor data locality in accessing large datasets. As a result, processor-centric architectures suffer from low performance and high energy consumption while executing ML training workloads. Processing-In-Memory (PIM) is a promising solution to alleviate the data movement bottleneck by placing the computation mechanisms inside or near memory. Our goal is to understand the capabilities of popular distributed SGD algorithms on real-world PIM systems to accelerate data-intensive ML training workloads. To this end, we 1) implement several representative centralized parallel SGD algorithms on the real-world UPMEM PIM system, 2) rigorously evaluate these algorithms for ML training on large-scale datasets in terms of performance, accuracy, and scalability, 3) compare to conventional CPU and GPU baselines, and 4) discuss implications for future PIM hardware and highlight the need for a shift to an algorithm-hardware codesign. Our results demonstrate three major findings: 1) The UPMEM PIM system can be a viable alternative to state-of-the-art CPUs and GPUs for many memory-bound ML training workloads, especially when operations and datatypes are natively supported by PIM hardware, 2) it is important to carefully choose the optimization algorithms that best fit PIM, and 3) the UPMEM PIM system does not scale approximately linearly with the number of nodes for many data-intensive ML training workloads. We open source all our code to facilitate future research.

replace-cross An Off-Policy Reinforcement Learning Algorithm Customized for Multi-Task Fusion in Large-Scale Recommender Systems

Authors: Peng Liu, Cong Xu, Ming Zhao, Jiawei Zhu, Bin Wang, Yi Ren

Abstract: As the last critical stage of RSs, Multi-Task Fusion (MTF) is responsible for combining multiple scores outputted by Multi-Task Learning (MTL) into a final score to maximize user satisfaction, which determines the ultimate recommendation results. Recently, to optimize long-term user satisfaction within a recommendation session, Reinforcement Learning (RL) is used for MTF in the industry. However, the off-policy RL algorithms used for MTF so far have the following severe problems: 1) to avoid out-of-distribution (OOD) problem, their constraints are overly strict, which seriously damage their performance; 2) they are unaware of the exploration policy used for producing training data and never interact with real environment, so only suboptimal policy can be learned; 3) the traditional exploration policies are inefficient and hurt user experience. To solve the above problems, we propose a novel method named IntegratedRL-MTF customized for MTF in large-scale RSs. IntegratedRL-MTF integrates off-policy RL model with our online exploration policy to relax overstrict and complicated constraints, which significantly improves its performance. We also design an extremely efficient exploration policy, which eliminates low-value exploration space and focuses on exploring potential high-value state-action pairs. Moreover, we adopt progressive training mode to further enhance our model's performance with the help of our exploration policy. We conduct extensive offline and online experiments in the short video channel of Tencent News. The results demonstrate that our model outperforms other models remarkably. IntegratedRL-MTF has been fully deployed in our RS and other large-scale RSs in Tencent, which have achieved significant improvements.

replace-cross Learning to Boost the Performance of Stable Nonlinear Systems

Authors: Luca Furieri, Clara Luc\'ia Galimberti, Giancarlo Ferrari-Trecate

Abstract: The growing scale and complexity of safety-critical control systems underscore the need to evolve current control architectures aiming for the unparalleled performances achievable through state-of-the-art optimization and machine learning algorithms. However, maintaining closed-loop stability while boosting the performance of nonlinear control systems using data-driven and deep-learning approaches stands as an important unsolved challenge. In this paper, we tackle the performance-boosting problem with closed-loop stability guarantees. Specifically, we establish a synergy between the Internal Model Control (IMC) principle for nonlinear systems and state-of-the-art unconstrained optimization approaches for learning stable dynamics. Our methods enable learning over arbitrarily deep neural network classes of performance-boosting controllers for stable nonlinear systems; crucially, we guarantee L_p closed-loop stability even if optimization is halted prematurely, and even when the ground-truth dynamics are unknown, with vanishing conservatism in the class of stabilizing policies as the model uncertainty is reduced to zero. We discuss the implementation details of the proposed control schemes, including distributed ones, along with the corresponding optimization procedures, demonstrating the potential of freely shaping the cost functions through several numerical experiments.

replace-cross Simple Drop-in LoRA Conditioning on Attention Layers Will Improve Your Diffusion Model

Authors: Joo Young Choi, Jaesung R. Park, Inkyu Park, Jaewoong Cho, Albert No, Ernest K. Ryu

Abstract: Current state-of-the-art diffusion models employ U-Net architectures containing convolutional and (qkv) self-attention layers. The U-Net processes images while being conditioned on the time embedding input for each sampling step and the class or caption embedding input corresponding to the desired conditional generation. Such conditioning involves scale-and-shift operations to the convolutional layers but does not directly affect the attention layers. While these standard architectural choices are certainly effective, not conditioning the attention layers feels arbitrary and potentially suboptimal. In this work, we show that simply adding LoRA conditioning to the attention layers without changing or tuning the other parts of the U-Net architecture improves the image generation quality. For example, a drop-in addition of LoRA conditioning to EDM diffusion model yields FID scores of 1.91/1.75 for unconditional and class-conditional CIFAR-10 generation, improving upon the baseline of 1.97/1.79.

replace-cross Full error analysis of the random deep splitting method for nonlinear parabolic PDEs and PIDEs

Authors: Ariel Neufeld, Philipp Schmocker, Sizhou Wu

Abstract: In this paper, we present a randomized extension of the deep splitting algorithm introduced in [Beck, Becker, Cheridito, Jentzen, and Neufeld (2021)] using random neural networks suitable to approximately solve both high-dimensional nonlinear parabolic PDEs and PIDEs with jumps having (possibly) infinite activity. We provide a full error analysis of our so-called random deep splitting method. In particular, we prove that our random deep splitting method converges to the (unique viscosity) solution of the nonlinear PDE or PIDE under consideration. Moreover, we empirically analyze our random deep splitting method by considering several numerical examples including both nonlinear PDEs and nonlinear PIDEs relevant in the context of pricing of financial derivatives under default risk. In particular, we empirically demonstrate in all examples that our random deep splitting method can approximately solve nonlinear PDEs and PIDEs in 10'000 dimensions within seconds.

replace-cross In-context Contrastive Learning for Event Causality Identification

Authors: Chao Liang, Wei Xiang, Bang Wang

Abstract: Event Causality Identification (ECI) aims at determining the existence of a causal relation between two events. Although recent prompt learning-based approaches have shown promising improvements on the ECI task, their performance are often subject to the delicate design of multiple prompts and the positive correlations between the main task and derivate tasks. The in-context learning paradigm provides explicit guidance for label prediction in the prompt learning paradigm, alleviating its reliance on complex prompts and derivative tasks. However, it does not distinguish between positive and negative demonstrations for analogy learning. Motivated from such considerations, this paper proposes an In-Context Contrastive Learning (ICCL) model that utilizes contrastive learning to enhance the effectiveness of both positive and negative demonstrations. Additionally, we apply contrastive learning to event pairs to better facilitate event causality identification. Our ICCL is evaluated on the widely used corpora, including the EventStoryLine and Causal-TimeBank, and results show significant performance improvements over the state-of-the-art algorithms.

replace-cross Bi-Directional Transformers vs. word2vec: Discovering Vulnerabilities in Lifted Compiled Code

Authors: Gary A. McCully, John D. Hastings, Shengjie Xu, Adam Fortier

Abstract: Detecting vulnerabilities within compiled binaries is challenging due to lost high-level code structures and other factors such as architectural dependencies, compilers, and optimization options. To address these obstacles, this research explores vulnerability detection using natural language processing (NLP) embedding techniques with word2vec, BERT, and RoBERTa to learn semantics from intermediate representation (LLVM IR) code. Long short-term memory (LSTM) neural networks were trained on embeddings from encoders created using approximately 48k LLVM functions from the Juliet dataset. This study is pioneering in its comparison of word2vec models with multiple bidirectional transformers (BERT, RoBERTa) embeddings built using LLVM code to train neural networks to detect vulnerabilities in compiled binaries. Word2vec Skip-Gram models achieved 92% validation accuracy in detecting vulnerabilities, outperforming word2vec Continuous Bag of Words (CBOW), BERT, and RoBERTa. This suggests that complex contextual embeddings may not provide advantages over simpler word2vec models for this task when a limited number (e.g. 48K) of data samples are used to train the bidirectional transformer-based models. The comparative results provide novel insights into selecting optimal embeddings for learning compiler-independent semantic code representations to advance machine learning detection of vulnerabilities in compiled binaries.

replace-cross LEMMA-RCA: A Large Multi-modal Multi-domain Dataset for Root Cause Analysis

Authors: Lecheng Zheng, Zhengzhang Chen, Dongjie Wang, Chengyuan Deng, Reon Matsuoka, Haifeng Chen

Abstract: Root cause analysis (RCA) is crucial for enhancing the reliability and performance of complex systems. However, progress in this field has been hindered by the lack of large-scale, open-source datasets tailored for RCA. To bridge this gap, we introduce LEMMA-RCA, a large dataset designed for diverse RCA tasks across multiple domains and modalities. LEMMA-RCA features various real-world fault scenarios from IT and OT operation systems, encompassing microservices, water distribution, and water treatment systems, with hundreds of system entities involved. We evaluate the quality of LEMMA-RCA by testing the performance of eight baseline methods on this dataset under various settings, including offline and online modes as well as single and multiple modalities. Our experimental results demonstrate the high quality of LEMMA-RCA. The dataset is publicly available at https://lemma-rca.github.io/.

URLs: https://lemma-rca.github.io/.

replace-cross RoLoRA: Fine-tuning Rotated Outlier-free LLMs for Effective Weight-Activation Quantization

Authors: Xijie Huang, Zechun Liu, Shih-Yang Liu, Kwang-Ting Cheng

Abstract: Low-Rank Adaptation (LoRA), as a representative Parameter-Efficient Fine-Tuning (PEFT)method, significantly enhances the training efficiency by updating only a small portion of the weights in Large Language Models (LLMs). Recently, weight-only quantization techniques have also been applied to LoRA methods to reduce the memory footprint of fine-tuning. However, applying weight-activation quantization to the LoRA pipeline is under-explored, and we observe substantial performance degradation primarily due to the presence of activation outliers. In this work, we propose RoLoRA, the first LoRA-based scheme for effective weight-activation quantization. RoLoRA utilizes rotation for outlier elimination and proposes rotation-aware fine-tuning to preserve the outlier-free characteristics in rotated LLMs. Experimental results show RoLoRA consistently improves low-bit LoRA convergence and post-training quantization robustness in weight-activation settings. We evaluate RoLoRA across LLaMA2-7B/13B, LLaMA3-8B models, achieving up to 29.5% absolute accuracy gain of 4-bit weight-activation quantized LLaMA2- 13B on commonsense reasoning tasks compared to LoRA baseline. We further demonstrate its effectiveness on Large Multimodal Models (LLaVA-1.5-7B). Codes are available at https://github.com/HuangOwen/RoLoRA

URLs: https://github.com/HuangOwen/RoLoRA

replace-cross LongLaMP: A Benchmark for Personalized Long-form Text Generation

Authors: Ishita Kumar, Snigdha Viswanathan, Sushrita Yerra, Alireza Salemi, Ryan A. Rossi, Franck Dernoncourt, Hanieh Deilamsalehy, Xiang Chen, Ruiyi Zhang, Shubham Agarwal, Nedim Lipka, Chein Van Nguyen, Thien Huu Nguyen, Hamed Zamani

Abstract: Long-text generation is seemingly ubiquitous in real-world applications of large language models such as generating an email or writing a review. Despite the fundamental importance and prevalence of long-text generation in many practical applications, existing work on personalized generation has focused on the generation of very short text. To overcome these limitations, we study the problem of personalized long-text generation, that is, generating long-text that is personalized for a specific user while being practically useful for the vast majority of real-world applications that naturally require the generation of longer text. In this work, we demonstrate the importance of user-specific personalization for long-text generation tasks and develop the Long-text Language Model Personalization (LongLaMP) Benchmark. LongLaMP provides a comprehensive and diverse evaluation framework for personalized long-text generation. Extensive experiments on LongLaMP for zero-shot and fine-tuned language tasks demonstrate the effectiveness of the proposed benchmark and its utility for developing and evaluating techniques for personalized long-text generation across a wide variety of long-text generation tasks. The results highlight the importance of personalization across a wide variety of long-text generation tasks. Finally, we release the benchmark for others to use for this important problem.

replace-cross Generalisation to unseen topologies: Towards control of biological neural network activity

Authors: Laurens Engwegen, Daan Brinks, Wendelin B\"ohmer

Abstract: Novel imaging and neurostimulation techniques open doors for advancements in closed-loop control of activity in biological neural networks. This would allow for applications in the investigation of activity propagation, and for diagnosis and treatment of pathological behaviour. Due to the partially observable characteristics of activity propagation, through networks in which edges can not be observed, and the dynamic nature of neuronal systems, there is a need for adaptive, generalisable control. In this paper, we introduce an environment that procedurally generates neuronal networks with different topologies to investigate this generalisation problem. Additionally, an existing transformer-based architecture is adjusted to evaluate the generalisation performance of a deep RL agent in the presented partially observable environment. The agent demonstrates the capability to generalise control from a limited number of training networks to unseen test networks.

replace-cross SharkTrack: an accurate, generalisable software for streamlining shark and ray underwater video analysis

Authors: Filippo Varini, Joel H. Gayford, Jeremy Jenrette, Matthew J. Witt, Francesco Garzon, Francesco Ferretti, Sophie Wilday, Mark E. Bond, Michael R. Heithaus, Danielle Robinson, Devon Carter, Najee Gumbs, Vincent Webster, Ben Glocker

Abstract: Elasmobranchs (shark sand rays) represent a critical component of marine ecosystems. Yet, they are experiencing global population declines and effective monitoring of populations is essential to their protection. Underwater stationary videos, such as those from Baited Remote Underwater Video Stations (BRUVS), are critical for understanding elasmobranch spatial ecology and abundance. However, processing these videos requires time-consuming manual analysis that can delay conservation. To address this challenge, we developed SharkTrack, a semi-automatic underwater video analysis software. SharkTrack uses Convolutional Neural Networks (CNN) and Multi-Object Tracking to automatically detect and track elasmobranchs and provides an annotation pipeline to manually classify elasmobranch species and compute species-specific MaxN (ssMaxN), the standard metric of relative abundance. When tested on BRUVS footage from locations unseen by the CNN model during training, SharkTrack computed ssMaxN with 89% accuracy over 207 hours of footage. The semi-automatic SharkTrack pipeline required two minutes of manual classification per hour of video, an estimated 95% reduction of manual analysis time compared to traditional methods. Furthermore, we demonstrate SharkTrack accuracy across diverse marine ecosystems and elasmobranch species, an advancement compared to previous models, which were limited to specific species or locations. SharkTrack applications extend beyond BRUVS, facilitating the analysis of any underwater stationary video. By making video analysis faster and more accessible, SharkTrack enables research and conservation organisations to monitor elasmobranch populations more efficiently, thereby improving conservation efforts. To further support these goals, we provide public access to the SharkTrack software.

replace-cross Optical ISAC: Fundamental Performance Limits and Transceiver Design

Authors: Alireza Ghazavi Khorasgani (5/6GIC, Institute for Communication Systems), Mahtab Mirmohseni (5/6GIC, Institute for Communication Systems), Ahmed Elzanaty (5/6GIC, Institute for Communication Systems)

Abstract: This paper characterizes the optimal Capacity-Distortion (C-D) tradeoff in an optical point-to-point system with Single-Input Single-Output (SISO) for communication and Single-Input Multiple-Output (SIMO) for sensing within an Integrated Sensing and Communication (ISAC) framework. We consider the optimal Rate-Distortion (R-D) region and explore several Inner (IB) and Outer Bounds (OB). We introduce practical, asymptotically optimal Maximum A Posteriori (MAP) and Maximum Likelihood Estimators (MLE) for target distance, addressing nonlinear measurement-to-state relationships and non-conjugate priors. As the number of sensing antennas increases, these estimators converge to the Bayesian Cram\'er-Rao Bound (BCRB). We also establish that the achievable Rate-Cram\'er-Rao Bound (R-CRB) serves as an OB for the optimal C-D region, valid for both unbiased estimators and asymptotically large numbers of receive antennas. To clarify that the input distribution determines the tradeoff across the Pareto boundary of the C-D region, we propose two algorithms: i) an iterative Blahut-Arimoto Algorithm (BAA)-type method, and ii) a memory-efficient Closed-Form (CF) approach. The CF approach includes a CF optimal distribution for high Optical Signal-to-Noise Ratio (O-SNR) conditions. Additionally, we adapt and refine the Deterministic-Random Tradeoff (DRT) to this optical ISAC context.

replace-cross Feature-Based Interpretable Surrogates for Optimization

Authors: Marc Goerigk, Michael Hartisch, Sebastian Merten, Kartikey Sharma

Abstract: For optimization models to be used in practice, it is crucial that users trust the results. A key factor in this aspect is the interpretability of the solution process. A previous framework for inherently interpretable optimization models used decision trees to map instances to solutions of the underlying optimization model. Based on this work, we investigate how we can use more general optimization rules to further increase interpretability and, at the same time, give more freedom to the decision-maker. The proposed rules do not map to a concrete solution but to a set of solutions characterized by common features. To find such optimization rules, we present an exact methodology using mixed-integer programming formulations as well as heuristics. We also outline the challenges and opportunities that these methods present. In particular, we demonstrate the improvement in solution quality that our approach offers compared to existing interpretable surrogates for optimization, and we discuss the relationship between interpretability and performance. These findings are supported by experiments using both synthetic and real-world data.

replace-cross LLM Detectors Still Fall Short of Real World: Case of LLM-Generated Short News-Like Posts

Authors: Henrique Da Silva Gameiro, Andrei Kucharavy, Ljiljana Dolamic

Abstract: With the emergence of widely available powerful LLMs, disinformation generated by large Language Models (LLMs) has become a major concern. Historically, LLM detectors have been touted as a solution, but their effectiveness in the real world is still to be proven. In this paper, we focus on an important setting in information operations -- short news-like posts generated by moderately sophisticated attackers. We demonstrate that existing LLM detectors, whether zero-shot or purpose-trained, are not ready for real-world use in that setting. All tested zero-shot detectors perform inconsistently with prior benchmarks and are highly vulnerable to sampling temperature increase, a trivial attack absent from recent benchmarks. A purpose-trained detector generalizing across LLMs and unseen attacks can be developed, but it fails to generalize to new human-written texts. We argue that the former indicates domain-specific benchmarking is needed, while the latter suggests a trade-off between the adversarial evasion resilience and overfitting to the reference human text, with both needing evaluation in benchmarks and currently absent. We believe this suggests a re-consideration of current LLM detector benchmarking approaches and provides a dynamically extensible benchmark to allow it (https://github.com/Reliable-Information-Lab-HEVS/benchmark_llm_texts_detection).

URLs: https://github.com/Reliable-Information-Lab-HEVS/benchmark_llm_texts_detection).

replace-cross 2D or not 2D: How Does the Dimensionality of Gesture Representation Affect 3D Co-Speech Gesture Generation?

Authors: T\'eo Guichoux, Laure Soulier, Nicolas Obin, Catherine Pelachaud

Abstract: Co-speech gestures are fundamental for communication. The advent of recent deep learning techniques has facilitated the creation of lifelike, synchronous co-speech gestures for Embodied Conversational Agents. "In-the-wild" datasets, aggregating video content from platforms like YouTube via human pose detection technologies, provide a feasible solution by offering 2D skeletal sequences aligned with speech. Concurrent developments in lifting models enable the conversion of these 2D sequences into 3D gesture databases. However, it is important to note that the 3D poses estimated from the 2D extracted poses are, in essence, approximations of the ground-truth, which remains in the 2D domain. This distinction raises questions about the impact of gesture representation dimensionality on the quality of generated motions - a topic that, to our knowledge, remains largely unexplored. Our study examines the effect of using either 2D or 3D joint coordinates as training data on the performance of speech-to-gesture deep generative models. We employ a lifting model for converting generated 2D pose sequences into 3D and assess how gestures created directly in 3D stack up against those initially generated in 2D and then converted to 3D. We perform an objective evaluation using widely used metrics in the gesture generation field as well as a user study to qualitatively evaluate the different approaches.

replace-cross An Enhanced-State Reinforcement Learning Algorithm for Multi-Task Fusion in Large-Scale Recommender Systems

Authors: Peng Liu, Jiawei Zhu, Cong Xu, Ming Zhao, Bin Wang

Abstract: As the last key stage of Recommender Systems (RSs), Multi-Task Fusion (MTF) is in charge of combining multiple scores predicted by Multi-Task Learning (MTL) into a final score to maximize user satisfaction, which decides the ultimate recommendation results. In recent years, to maximize long-term user satisfaction within a recommendation session, Reinforcement Learning (RL) is widely used for MTF in large-scale RSs. However, limited by their modeling pattern, all the current RL-MTF methods can only utilize user features as the state to generate actions for each user, but unable to make use of item features and other valuable features, which leads to suboptimal results. Addressing this problem is a challenge that requires breaking through the current modeling pattern of RL-MTF. To solve this problem, we propose a novel method called Enhanced-State RL for MTF in RSs. Unlike the existing methods mentioned above, our method first defines user features, item features, and other valuable features collectively as the enhanced state; then proposes a novel actor and critic learning process to utilize the enhanced state to make much better action for each user-item pair. To the best of our knowledge, this novel modeling pattern is being proposed for the first time in the field of RL-MTF. We conduct extensive offline and online experiments in a large-scale RS. The results demonstrate that our model outperforms other models significantly. Enhanced-State RL has been fully deployed in our RS more than half a year, improving +3.84% user valid consumption and +0.58% user duration time compared to baseline.

replace-cross Dual-Layer Training and Decoding of Large Language Model with Simultaneously Thinking and Speaking

Authors: Ningyuan Xi, Xiaoyu Wang, Yetao Wu, Teng Chen, Qingqing Gu, Jinxian Qu, Zhonglin Jiang, Yong Chen, Luo Ji

Abstract: Large Language Model can reasonably understand and generate human expressions but may lack of thorough thinking and reasoning mechanisms. Recently there have been several studies which enhance the thinking ability of language models but most of them are not data-driven or training-based. In this paper, we are motivated by the cognitive mechanism in the natural world, and design a novel model architecture called TaS which allows it to first consider the thoughts and then express the response based upon the query. We design several pipelines to annotate or generate the thought contents from prompt-response samples, then add language heads in a middle layer which behaves as the thinking layer. We train the language model by the thoughts-augmented data and successfully let the thinking layer automatically generate reasonable thoughts and finally output more reasonable responses. Both qualitative examples and quantitative results validate the effectiveness and performance of TaS. Our code is available at https://anonymous.4open.science/r/TadE.

URLs: https://anonymous.4open.science/r/TadE.

replace-cross A preliminary study on continual learning in computer vision using Kolmogorov-Arnold Networks

Authors: Alessandro Cacciatore, Valerio Morelli, Federica Paganica, Emanuele Frontoni, Lucia Migliorelli, Daniele Berardini

Abstract: Deep learning has long been dominated by multi-layer perceptrons (MLPs), which have demonstrated superiority over other optimizable models in various domains. Recently, a new alternative to MLPs has emerged - Kolmogorov-Arnold Networks (KAN)- which are based on a fundamentally different mathematical framework. According to their authors, KANs address several major issues in MLPs, such as catastrophic forgetting in continual learning scenarios. However, this claim has only been supported by results from a regression task on a toy 1D dataset. In this paper, we extend the investigation by evaluating the performance of KANs in continual learning tasks within computer vision, specifically using the MNIST datasets. To this end, we conduct a structured analysis of the behavior of MLPs and two KAN-based models in a class-incremental learning scenario, ensuring that the architectures involved have the same number of trainable parameters. Our results demonstrate that an efficient version of KAN outperforms both traditional MLPs and the original KAN implementation. We further analyze the influence of hyperparameters in MLPs and KANs, as well as the impact of certain trainable parameters in KANs, such as bias and scale weights. Additionally, we provide a preliminary investigation of recent KAN-based convolutional networks and compare their performance with that of traditional convolutional neural networks. Our codes can be found at https://github.com/MrPio/KAN-Continual_Learning_tests.

URLs: https://github.com/MrPio/KAN-Continual_Learning_tests.

replace-cross RAMBO: Enhancing RAG-based Repository-Level Method Body Completion

Authors: Tuan-Dung Bui, Duc-Thieu Luu-Van, Thanh-Phat Nguyen, Thu-Trang Nguyen, Son Nguyen, Hieu Dinh Vo

Abstract: Code completion is essential in software development, helping developers by predicting code snippets based on context. Among completion tasks, Method Body Completion (MBC) is particularly challenging as it involves generating complete method bodies based on their signatures and context. This task becomes significantly harder in large repositories, where method bodies must integrate repositoryspecific elements such as custom APIs, inter-module dependencies, and project-specific conventions. In this paper, we introduce RAMBO, a novel RAG-based approach for repository-level MBC. Instead of retrieving similar method bodies, RAMBO identifies essential repository-specific elements, such as classes, methods, and variables/fields, and their relevant usages. By incorporating these elements and their relevant usages into the code generation process, RAMBO ensures more accurate and contextually relevant method bodies. Our experimental results with leading code LLMs across 40 Java projects show that RAMBO significantly outperformed the state-of-the-art repository-level MBC approaches, with the improvements of up to 46% in BLEU, 57% in CodeBLEU, 36% in Compilation Rate, and up to 3X in Exact Match. Notably, RAMBO surpassed RepoCoder Oracle method by up to 12% in Exact Match, setting a new benchmark for repository-level MBC.

replace-cross M$^2$PT: Multimodal Prompt Tuning for Zero-shot Instruction Learning

Authors: Taowen Wang, Yiyang Liu, James Chenhao Liang, junhan zhao, Yiming Cui, Yuning Mao, Shaoliang Nie, Jiahao Liu, Fuli Feng, Zenglin Xu, Cheng Han, Lifu Huang, Qifan Wang, Dongfang Liu

Abstract: Multimodal Large Language Models (MLLMs) demonstrate remarkable performance across a wide range of domains, with increasing emphasis on enhancing their zero-shot generalization capabilities for unseen tasks across various modalities. Instruction tuning has emerged as an effective strategy for achieving zero-shot generalization by finetuning pretrained models on diverse multimodal tasks. As the scale of MLLMs continues to grow, parameter-efficient finetuning becomes increasingly critical. However, most existing parameter-efficient approaches focus only on single modalities and often overlook the multimodal characteristics during finetuning. In this work, we introduce a novel Multimodal Prompt Tuning (M$^2$PT) approach for efficient instruction tuning of MLLMs. M$^2$PT effectively integrates visual and textual prompts into the vision encoder and language processor respectively during finetuning, facilitating the extraction and alignment of features across modalities. Empirical results on various multimodal evaluation datasets demonstrate the superior performance of our approach compared to several state-of-the-art baselines. A comprehensive set of ablation studies validates the effectiveness of our prompt design and the efficiency of our approach.

replace-cross Improvements to SDXL in NovelAI Diffusion V3

Authors: Juan Ossa, Eren Do\u{g}an, Alex Birch, F. Johnson

Abstract: In this technical report, we document the changes we made to SDXL in the process of training NovelAI Diffusion V3, our state of the art anime image generation model.

replace-cross CryptoTrain: Fast Secure Training on Encrypted Dataset

Authors: Jiaqi Xue, Yancheng Zhang, Yanshan Wang, Xueqiang Wang, Hao Zheng, Qian Lou

Abstract: Secure training, while protecting the confidentiality of both data and model weights, typically incurs significant training overhead. Traditional Fully Homomorphic Encryption (FHE)-based non-inter-active training models are heavily burdened by computationally demanding bootstrapping. To develop an efficient secure training system, we established a foundational framework, CryptoTrain-B, utilizing a hybrid cryptographic protocol that merges FHE with Oblivious Transfer (OT) for handling linear and non-linear operations, respectively. This integration eliminates the need for costly bootstrapping. Although CryptoTrain-B sets a new baseline in performance, reducing its training overhead remains essential. We found that ciphertext-ciphertext multiplication (CCMul) is a critical bottleneck in operations involving encrypted inputs and models. Our solution, the CCMul-Precompute technique, involves precomputing CCMul offline and resorting to the less resource-intensive ciphertext-plaintext multiplication (CPMul) during private training. Furthermore, conventional polynomial convolution in FHE systems tends to encode irrelevant and redundant values into polynomial slots, necessitating additional polynomials and ciphertexts for input representation and leading to extra multiplications. Addressing this, we introduce correlated polynomial convolution, which encodes only related input values into polynomials, thus drastically reducing the number of computations and overheads. By integrating CCMul-Precompute and correlated polynomial convolution into CryptoTrain-B, we facilitate a rapid and efficient secure training framework, CryptoTrain. Extensive experiments demonstrate that CryptoTrain achieves a ~5.3X training time reduction compared to prior methods.

replace-cross Modulated Intervention Preference Optimization (MIPO): Keep the Easy, Refine the Difficult

Authors: Cheolhun Jang

Abstract: Preference optimization methods typically begin training with a well-trained SFT model as a reference model. In RLHF and DPO, a regularization term is used during the preference optimization process to prevent the policy model from deviating too far from the reference model's distribution, thereby avoiding the generation of anomalous responses. When the reference model is already well-aligned with the given data or only requires slight adjustments, this approach can produce a well-aligned model. However, if the reference model is not aligned with the given data and requires significant deviation from its current state, a regularization term may actually hinder the model alignment. In this study, we propose \textbf{Modulated Intervention Preference Optimization (MIPO)} to address this issue. MIPO modulates the degree of intervention from the reference model based on how well the given data is aligned with it. If the data is well-aligned, the intervention is increased to prevent the policy model from diverging significantly from reference model. Conversely, if the alignment is poor, the interference is reduced to facilitate more extensive training. We compare the performance of MIPO and DPO using Mistral-7B and Llama3-8B in Alpaca Eval 2.0 and MT-Bench. The experimental results demonstrate that MIPO consistently outperforms DPO across various evaluation scenarios.

replace-cross MoJE: Mixture of Jailbreak Experts, Naive Tabular Classifiers as Guard for Prompt Attacks

Authors: Giandomenico Cornacchia, Giulio Zizzo, Kieran Fraser, Muhammad Zaid Hamed, Ambrish Rawat, Mark Purcell

Abstract: The proliferation of Large Language Models (LLMs) in diverse applications underscores the pressing need for robust security measures to thwart potential jailbreak attacks. These attacks exploit vulnerabilities within LLMs, endanger data integrity and user privacy. Guardrails serve as crucial protective mechanisms against such threats, but existing models often fall short in terms of both detection accuracy, and computational efficiency. This paper advocates for the significance of jailbreak attack prevention on LLMs, and emphasises the role of input guardrails in safeguarding these models. We introduce MoJE (Mixture of Jailbreak Expert), a novel guardrail architecture designed to surpass current limitations in existing state-of-the-art guardrails. By employing simple linguistic statistical techniques, MoJE excels in detecting jailbreak attacks while maintaining minimal computational overhead during model inference. Through rigorous experimentation, MoJE demonstrates superior performance capable of detecting 90% of the attacks without compromising benign prompts, enhancing LLMs security against jailbreak attacks.

replace-cross Few-shot Pairwise Rank Prompting: An Effective Non-Parametric Retrieval Model

Authors: Nilanjan Sinhababu, Andrew Parry, Debasis Ganguly, Debasis Samanta, Pabitra Mitra

Abstract: A supervised ranking model, despite its advantage of being effective, usually involves complex processing - typically multiple stages of task-specific pre-training and fine-tuning. This has motivated researchers to explore simpler pipelines leveraging large language models (LLMs) that are capable of working in a zero-shot manner. However, since zero-shot inference does not make use of a training set of pairs of queries and their relevant documents, its performance is mostly worse than that of supervised models, which are trained on such example pairs. Motivated by the existing findings that training examples generally improve zero-shot performance, in our work, we explore if this also applies to ranking models. More specifically, given a query and a pair of documents, the preference prediction task is improved by augmenting examples of preferences for similar queries from a training set. Our proposed pairwise few-shot ranker demonstrates consistent improvements over the zero-shot baseline on both in-domain (TREC DL) and out-domain (BEIR subset) retrieval benchmarks. Our method also achieves a close performance to that of a supervised model without requiring any complex training pipeline.

replace-cross Confidence intervals uncovered: Are we ready for real-world medical imaging AI?

Authors: Evangelia Christodoulou, Annika Reinke, Rola Houhou, Piotr Kalinowski, Selen Erkan, Carole H. Sudre, Ninon Burgos, Sofi\`ene Boutaj, Sophie Loizillon, Ma\"elys Solal, Nicola Rieke, Veronika Cheplygina, Michela Antonelli, Leon D. Mayer, Minu D. Tizabi, M. Jorge Cardoso, Amber Simpson, Paul F. J\"ager, Annette Kopp-Schneider, Ga\"el Varoquaux, Olivier Colliot, Lena Maier-Hein

Abstract: Medical imaging is spearheading the AI transformation of healthcare. Performance reporting is key to determine which methods should be translated into clinical practice. Frequently, broad conclusions are simply derived from mean performance values. In this paper, we argue that this common practice is often a misleading simplification as it ignores performance variability. Our contribution is threefold. (1) Analyzing all MICCAI segmentation papers (n = 221) published in 2023, we first observe that more than 50% of papers do not assess performance variability at all. Moreover, only one (0.5%) paper reported confidence intervals (CIs) for model performance. (2) To address the reporting bottleneck, we show that the unreported standard deviation (SD) in segmentation papers can be approximated by a second-order polynomial function of the mean Dice similarity coefficient (DSC). Based on external validation data from 56 previous MICCAI challenges, we demonstrate that this approximation can accurately reconstruct the CI of a method using information provided in publications. (3) Finally, we reconstructed 95% CIs around the mean DSC of MICCAI 2023 segmentation papers. The median CI width was 0.03 which is three times larger than the median performance gap between the first and second ranked method. For more than 60% of papers, the mean performance of the second-ranked method was within the CI of the first-ranked method. We conclude that current publications typically do not provide sufficient evidence to support which models could potentially be translated into clinical practice.

replace-cross Hypergame Theory for Decentralized Resource Allocation in Multi-user Semantic Communications

Authors: Christo Kurisummoottil Thomas, Walid Saad

Abstract: Semantic communications (SC) is an emerging communication paradigm in which wireless devices can send only relevant information from a source of data while relying on computing resources to regenerate missing data points. However, the design of a multi-user SC system becomes more challenging because of the computing and communication overhead required for coordination. Existing solutions for learning the semantic language and performing resource allocation often fail to capture the computing and communication tradeoffs involved in multiuser SC. To address this gap, a novel framework for decentralized computing and communication resource allocation in multiuser SC systems is proposed. The challenge of efficiently allocating communication and computing resources (for reasoning) in a decentralized manner to maximize the quality of task experience for the end users is addressed through the application of Stackelberg hyper game theory. Leveraging the concept of second-level hyper games, novel analytical formulations are developed to model misperceptions of the users about each other's communication and control strategies. Further, equilibrium analysis of the learned resource allocation protocols examines the convergence of the computing and communication strategies to a local Stackelberg equilibria, considering misperceptions. Simulation results show that the proposed Stackelberg hyper game results in efficient usage of communication and computing resources while maintaining a high quality of experience for the users compared to state-of-the-art that does not account for the misperceptions.