new A Dual-Path neural network model to construct the flame nonlinear thermoacoustic response in the time domain

Authors: Jiawei Wu, Teng Wang, Jiaqi Nan, Lijun Yang, Jingxuan Li

Abstract: Traditional numerical simulation methods require substantial computational resources to accurately determine the complete nonlinear thermoacoustic response of flames to various perturbation frequencies and amplitudes. In this paper, we have developed deep learning algorithms that can construct a comprehensive flame nonlinear response from limited numerical simulation data. To achieve this, we propose using a frequency-sweeping data type as the training dataset, which incorporates a rich array of learnable information within a constrained dataset. To enhance the precision in learning flame nonlinear response patterns from the training data, we introduce a Dual-Path neural network. This network consists of a Chronological Feature Path and a Temporal Detail Feature Path. The Dual-Path network is specifically designed to focus intensively on the temporal characteristics of velocity perturbation sequences, yielding more accurate flame response patterns and enhanced generalization capabilities. Validations confirm that our approach can accurately model flame nonlinear responses, even under conditions of significant nonlinearity, and exhibits robust generalization capabilities across various test scenarios.

new Simplex-enabled Safe Continual Learning Machine

Authors: Yihao Cai, Hongpeng Cao, Yanbing Mao, Lui Sha, Marco Caccamo

Abstract: This paper proposes the SeC-Learning Machine: Simplex-enabled safe continual learning for safety-critical autonomous systems. The SeC-learning machine is built on Simplex logic (that is, ``using simplicity to control complexity'') and physics-regulated deep reinforcement learning (Phy-DRL). The SeC-learning machine thus constitutes HP (high performance)-Student, HA (high assurance)-Teacher, and Coordinator. Specifically, the HP-Student is a pre-trained high-performance but not fully verified Phy-DRL, continuing to learn in a real plant to tune the action policy to be safe. In contrast, the HA-Teacher is a mission-reduced, physics-model-based, and verified design. As a complementary, HA-Teacher has two missions: backing up safety and correcting unsafe learning. The Coordinator triggers the interaction and the switch between HP-Student and HA-Teacher. Powered by the three interactive components, the SeC-learning machine can i) assure lifetime safety (i.e., safety guarantee in any continual-learning stage, regardless of HP-Student's success or convergence), ii) address the Sim2Real gap, and iii) learn to tolerate unknown unknowns in real plants. The experiments on a cart-pole system and a real quadruped robot demonstrate the distinguished features of the SeC-learning machine, compared with continual learning built on state-of-the-art safe DRL frameworks with approaches to addressing the Sim2Real gap.

new Memory-Optimized Once-For-All Network

Authors: Maxime Girard, Victor Qu\'etu, Samuel Tardieu, Van-Tam Nguyen, Enzo Tartaglione

Abstract: Deploying Deep Neural Networks (DNNs) on different hardware platforms is challenging due to varying resource constraints. Besides handcrafted approaches aiming at making deep models hardware-friendly, Neural Architectures Search is rising as a toolbox to craft more efficient DNNs without sacrificing performance. Among these, the Once-For-All (OFA) approach offers a solution by allowing the sampling of well-performing sub-networks from a single supernet -- this leads to evident advantages in terms of computation. However, OFA does not fully utilize the potential memory capacity of the target device, focusing instead on limiting maximum memory usage per layer. This leaves room for an unexploited potential in terms of model generalizability. In this paper, we introduce a Memory-Optimized OFA (MOOFA) supernet, designed to enhance DNN deployment on resource-limited devices by maximizing memory usage (and for instance, features diversity) across different configurations. Tested on ImageNet, our MOOFA supernet demonstrates improvements in memory exploitation and model accuracy compared to the original OFA supernet. Our code is available at https://github.com/MaximeGirard/memory-optimized-once-for-all.

URLs: https://github.com/MaximeGirard/memory-optimized-once-for-all.

new OPAL: Outlier-Preserved Microscaling Quantization A ccelerator for Generative Large Language Models

Authors: Jahyun Koo, Dahoon Park, Sangwoo Jung, Jaeha Kung

Abstract: To overcome the burden on the memory size and bandwidth due to ever-increasing size of large language models (LLMs), aggressive weight quantization has been recently studied, while lacking research on quantizing activations. In this paper, we present a hardware-software co-design method that results in an energy-efficient LLM accelerator, named OPAL, for generation tasks. First of all, a novel activation quantization method that leverages the microscaling data format while preserving several outliers per sub-tensor block (e.g., four out of 128 elements) is proposed. Second, on top of preserving outliers, mixed precision is utilized that sets 5-bit for inputs to sensitive layers in the decoder block of an LLM, while keeping inputs to less sensitive layers to 3-bit. Finally, we present the OPAL hardware architecture that consists of FP units for handling outliers and vectorized INT multipliers for dominant non-outlier related operations. In addition, OPAL uses log2-based approximation on softmax operations that only requires shift and subtraction to maximize power efficiency. As a result, we are able to improve the energy efficiency by 1.6~2.2x, and reduce the area by 2.4~3.1x with negligible accuracy loss, i.e., <1 perplexity increase.

new Towards Narrowing the Generalization Gap in Deep Boolean Networks

Authors: Youngsung Kim

Abstract: The rapid growth of the size and complexity in deep neural networks has sharply increased computational demands, challenging their efficient deployment in real-world scenarios. Boolean networks, constructed with logic gates, offer a hardware-friendly alternative that could enable more efficient implementation. However, their ability to match the performance of traditional networks has remained uncertain. This paper explores strategies to enhance deep Boolean networks with the aim of surpassing their traditional counterparts. We propose novel methods, including logical skip connections and spatiality preserving sampling, and validate them on vision tasks using widely adopted datasets, demonstrating significant improvement over existing approaches. Our analysis shows how deep Boolean networks can maintain high performance while minimizing computational costs through 1-bit logic operations. These findings suggest that Boolean networks are a promising direction for efficient, high-performance deep learning models, with significant potential for advancing hardware-accelerated AI applications.

new Programming Refusal with Conditional Activation Steering

Authors: Bruce W. Lee, Inkit Padhi, Karthikeyan Natesan Ramamurthy, Erik Miehling, Pierre Dognin, Manish Nagireddy, Amit Dhurandhar

Abstract: LLMs have shown remarkable capabilities, but precisely controlling their response behavior remains challenging. Existing activation steering methods alter LLM behavior indiscriminately, limiting their practical applicability in settings where selective responses are essential, such as content moderation or domain-specific assistants. In this paper, we propose Conditional Activation Steering (CAST), which analyzes LLM activation patterns during inference to selectively apply or withhold activation steering based on the input context. Our method is based on the observation that different categories of prompts activate distinct patterns in the model's hidden states. Using CAST, one can systematically control LLM behavior with rules like "if input is about hate speech or adult content, then refuse" or "if input is not about legal advice, then refuse." This allows for selective modification of responses to specific content while maintaining normal responses to other content, all without requiring weight optimization. We release an open-source implementation of our framework.

new Faster Q-Learning Algorithms for Restless Bandits

Authors: Parvish Kakarapalli, Devendra Kayande, Rahul Meshram

Abstract: We study the Whittle index learning algorithm for restless multi-armed bandits (RMAB). We first present Q-learning algorithm and its variants -- speedy Q-learning (SQL), generalized speedy Q-learning (GSQL) and phase Q-learning (PhaseQL). We also discuss exploration policies -- $\epsilon$-greedy and Upper confidence bound (UCB). We extend the study of Q-learning and its variants with UCB policy. We illustrate using numerical example that Q-learning with UCB exploration policy has faster convergence and PhaseQL with UCB have fastest convergence rate. We next extend the study of Q-learning variants for index learning to RMAB. The algorithm of index learning is two-timescale variant of stochastic approximation, on slower timescale we update index learning scheme and on faster timescale we update Q-learning assuming fixed index value. We study constant stepsizes two timescale stochastic approximation algorithm. We describe the performance of our algorithms using numerical example. It illustrate that index learning with Q learning with UCB has faster convergence that $\epsilon$ greedy. Further, PhaseQL (with UCB and $\epsilon$ greedy) has the best convergence than other Q-learning algorithms.

new Developing an Explainable Artificial Intelligent (XAI) Model for Predicting Pile Driving Vibrations in Bangkok's Subsoil

Authors: Sompote Youwai, Anuwat Pamungmoon

Abstract: This study presents an explainable artificial intelligent (XAI) model for predicting pile driving vibrations in Bangkok's soft clay subsoil. A deep neural network was developed using a dataset of 1,018 real-world pile driving measurements, encompassing variations in pile dimensions, hammer characteristics, sensor locations, and vibration measurement axes. The model achieved a mean absolute error (MAE) of 0.276, outperforming traditional empirical methods and other machine learning approaches such as XGBoost and CatBoost. SHapley Additive exPlanations (SHAP) analysis was employed to interpret the model's predictions, revealing complex relationships between input features and peak particle velocity (PPV). Distance from the pile driving location emerged as the most influential factor, followed by hammer weight and pile size. Non-linear relationships and threshold effects were observed, providing new insights into vibration propagation in soft clay. A web-based application was developed to facilitate adoption by practicing engineers, bridging the gap between advanced machine learning techniques and practical engineering applications. This research contributes to the field of geotechnical engineering by offering a more accurate and nuanced approach to predicting pile driving vibrations, with implications for optimizing construction practices and mitigating environmental impacts in urban areas. The model and its source code are publicly available, promoting transparency and reproducibility in geotechnical research.

new STLLM-DF: A Spatial-Temporal Large Language Model with Diffusion for Enhanced Multi-Mode Traffic System Forecasting

Authors: Zhiqi Shao, Haoning Xi, Haohui Lu, Ze Wang, Michael G. H. Bell, Junbin Gao

Abstract: The rapid advancement of Intelligent Transportation Systems (ITS) presents challenges, particularly with missing data in multi-modal transportation and the complexity of handling diverse sequential tasks within a centralized framework. To address these issues, we propose the Spatial-Temporal Large Language Model Diffusion (STLLM-DF), an innovative model that leverages Denoising Diffusion Probabilistic Models (DDPMs) and Large Language Models (LLMs) to improve multi-task transportation prediction. The DDPM's robust denoising capabilities enable it to recover underlying data patterns from noisy inputs, making it particularly effective in complex transportation systems. Meanwhile, the non-pretrained LLM dynamically adapts to spatial-temporal relationships within multi-modal networks, allowing the system to efficiently manage diverse transportation tasks in both long-term and short-term predictions. Extensive experiments demonstrate that STLLM-DF consistently outperforms existing models, achieving an average reduction of 2.40\% in MAE, 4.50\% in RMSE, and 1.51\% in MAPE. This model significantly advances centralized ITS by enhancing predictive accuracy, robustness, and overall system performance across multiple tasks, thus paving the way for more effective spatio-temporal traffic forecasting through the integration of frozen transformer language models and diffusion techniques.

new SVFit: Parameter-Efficient Fine-Tuning of Large Pre-Trained Models Using Singular Values

Authors: Chengwei Sun, Jiwei Wei, Yujia Wu, Yiming Shi, Shiyuan He, Zeyu Ma, Ning Xie, Yang Yang

Abstract: Large pre-trained models (LPMs) have demonstrated exceptional performance in diverse natural language processing and computer vision tasks. However, fully fine-tuning these models poses substantial memory challenges, particularly in resource-constrained environments. Parameter-efficient fine-tuning (PEFT) methods, such as LoRA, mitigate this issue by adjusting only a small subset of parameters. Nevertheless, these methods typically employ random initialization for low-rank matrices, which can lead to inefficiencies in gradient descent and diminished generalizability due to suboptimal starting points. To address these limitations, we propose SVFit, a novel PEFT approach that leverages singular value decomposition (SVD) to initialize low-rank matrices using critical singular values as trainable parameters. Specifically, SVFit performs SVD on the pre-trained weight matrix to obtain the best rank-r approximation matrix, emphasizing the most critical singular values that capture over 99% of the matrix's information. These top-r singular values are then used as trainable parameters to scale the fundamental subspaces of the matrix, facilitating rapid domain adaptation. Extensive experiments across various pre-trained models in natural language understanding, text-to-image generation, and image classification tasks reveal that SVFit outperforms LoRA while requiring 16 times fewer trainable parameters.

new Machine Learning Based Optimal Design of Fibrillar Adhesives

Authors: Mohammad Shojaeifard, Matteo Ferraresso, Alessandro Lucantonio, Mattia Bacca

Abstract: Fibrillar adhesion, observed in animals like beetles, spiders, and geckos, relies on nanoscopic or microscopic fibrils to enhance surface adhesion via 'contact splitting.' This concept has inspired engineering applications across robotics, transportation, and medicine. Recent studies suggest that functional grading of fibril properties can improve adhesion, but this is a complex design challenge that has only been explored in simplified geometries. While machine learning (ML) has gained traction in adhesive design, no previous attempts have targeted fibril-array scale optimization. In this study, we propose an ML-based tool that optimizes the distribution of fibril compliance to maximize adhesive strength. Our tool, featuring two deep neural networks (DNNs), recovers previous design results for simple geometries and introduces novel solutions for complex configurations. The Predictor DNN estimates adhesive strength based on random compliance distributions, while the Designer DNN optimizes compliance for maximum strength using gradient-based optimization. Our method significantly reduces test error and accelerates the optimization process, offering a high-performance solution for designing fibrillar adhesives and micro-architected materials aimed at fracture resistance by achieving equal load sharing (ELS).

new Alt-MoE: Multimodal Alignment via Alternating Optimization of Multi-directional MoE with Unimodal Models

Authors: Hongyang Lei, Xiaolong Cheng, Dan Wang, Qi Qin, Huazhen Huang, Yetao Wu, Qingqing Gu, Zhonglin Jiang, Yong Chen, Luo Ji

Abstract: Recent Large Multi-Modal Models (LMMs) have made significant advancements in multi-modal alignment by employing lightweight connection modules to facilitate the representation and fusion of knowledge from existing pre-trained uni-modal models. However, these methods still rely on modality-specific and direction-specific connectors, leading to compartmentalized knowledge representations and reduced computational efficiency, which limits the model's ability to form unified multi-modal representations. To address these issues, we introduce a novel training framework, Alt-MoE, which employs the Mixture of Experts (MoE) as a unified multi-directional connector across modalities, and employs a multi-step sequential alternating unidirectional alignment strategy, which converges to bidirectional alignment over iterations. The extensive empirical studies revealed the following key points: 1) Alt-MoE achieves competitive results by integrating diverse knowledge representations from uni-modal models. This approach seamlessly fuses the specialized expertise of existing high-performance uni-modal models, effectively synthesizing their domain-specific knowledge into a cohesive multi-modal representation. 2) Alt-MoE efficiently scales to new tasks and modalities without altering its model architecture or training strategy. Furthermore, Alt-MoE operates in latent space, supporting vector pre-storage and real-time retrieval via lightweight multi-directional MoE, thereby facilitating massive data processing. Our methodology has been validated on several well-performing uni-modal models (LLAMA3, Qwen2, and DINOv2), achieving competitive results on a wide range of downstream tasks and datasets.

new Self-Supervised State Space Model for Real-Time Traffic Accident Prediction Using eKAN Networks

Authors: Xin Tan, Meng Zhao

Abstract: Accurate prediction of traffic accidents across different times and regions is vital for public safety. However, existing methods face two key challenges: 1) Generalization: Current models rely heavily on manually constructed multi-view structures, like POI distributions and road network densities, which are labor-intensive and difficult to scale across cities. 2) Real-Time Performance: While some methods improve accuracy with complex architectures, they often incur high computational costs, limiting their real-time applicability. To address these challenges, we propose SSL-eKamba, an efficient self-supervised framework for traffic accident prediction. To enhance generalization, we design two self-supervised auxiliary tasks that adaptively improve traffic pattern representation through spatiotemporal discrepancy awareness. For real-time performance, we introduce eKamba, an efficient model that redesigns the Kolmogorov-Arnold Network (KAN) architecture. This involves using learnable univariate functions for input activation and applying a selective mechanism (Selective SSM) to capture multi-variate correlations, thereby improving computational efficiency. Extensive experiments on two real-world datasets demonstrate that SSL-eKamba consistently outperforms state-of-the-art baselines. This framework may also offer new insights for other spatiotemporal tasks. Our source code is publicly available at http://github.com/KevinT618/SSL-eKamba.

URLs: http://github.com/KevinT618/SSL-eKamba.

new Predicting Electricity Consumption with Random Walks on Gaussian Processes

Authors: Chlo\'e Hashimoto-Cullen, Benjamin Guedj

Abstract: We consider time-series forecasting problems where data is scarce, difficult to gather, or induces a prohibitive computational cost. As a first attempt, we focus on short-term electricity consumption in France, which is of strategic importance for energy suppliers and public stakeholders. The complexity of this problem and the many levels of geospatial granularity motivate the use of an ensemble of Gaussian Processes (GPs). Whilst GPs are remarkable predictors, they are computationally expensive to train, which calls for a frugal few-shot learning approach. By taking into account performance on GPs trained on a dataset and designing a random walk on these, we mitigate the training cost of our entire Bayesian decision-making procedure. We introduce our algorithm called \textsc{Domino} (ranDOM walk on gaussIaN prOcesses) and present numerical experiments to support its merits.

new CoDiCast: Conditional Diffusion Model for Weather Prediction with Uncertainty Quantification

Authors: Jimeng Shi, Bowen Jin, Jiawei Han, Giri Narasimhan

Abstract: Accurate weather forecasting is critical for science and society. Yet, existing methods have not managed to simultaneously have the properties of high accuracy, low uncertainty, and high computational efficiency. On one hand, to quantify the uncertainty in weather predictions, the strategy of ensemble forecast (i.e., generating a set of diverse predictions) is often employed. However, traditional ensemble numerical weather prediction (NWP) is computationally intensive. On the other hand, most existing machine learning-based weather prediction (MLWP) approaches are efficient and accurate. Nevertheless, they are deterministic and cannot capture the uncertainty of weather forecasting. In this work, we propose CoDiCast, a conditional diffusion model to generate accurate global weather prediction, while achieving uncertainty quantification with ensemble forecasts and modest computational cost. The key idea is to simulate a conditional version of the reverse denoising process in diffusion models, which starts from pure Gaussian noise to generate realistic weather scenarios for a future time point. Each denoising step is conditioned on observations from the recent past. Ensemble forecasts are achieved by repeatedly sampling from stochastic Gaussian noise to represent uncertainty quantification. CoDiCast is trained on a decade of ERA5 reanalysis data from the European Centre for Medium-Range Weather Forecasts (ECMWF). Experimental results demonstrate that our approach outperforms several existing data-driven methods in accuracy. Our conditional diffusion model, CoDiCast, can generate 3-day global weather forecasts, at 6-hour steps and $5.625^\circ$ latitude-longitude resolution, for over 5 variables, in about 12 minutes on a commodity A100 GPU machine with 80GB memory. The open-souced code is provided at \url{https://github.com/JimengShi/CoDiCast}.

URLs: https://github.com/JimengShi/CoDiCast

new FLoRA: Federated Fine-Tuning Large Language Models with Heterogeneous Low-Rank Adaptations

Authors: Ziyao Wang, Zheyu Shen, Yexiao He, Guoheng Sun, Hongyi Wang, Lingjuan Lyu, Ang Li

Abstract: The rapid development of Large Language Models (LLMs) has been pivotal in advancing AI, with pre-trained LLMs being adaptable to diverse downstream tasks through fine-tuning. Federated learning (FL) further enhances fine-tuning in a privacy-aware manner by utilizing clients' local data through in-situ computation, eliminating the need for data movement. However, fine-tuning LLMs, given their massive scale of parameters, poses challenges for clients with constrained and heterogeneous resources in FL. Previous methods employed low-rank adaptation (LoRA) for efficient federated fine-tuning but utilized traditional FL aggregation strategies on LoRA adapters. These approaches led to mathematically inaccurate aggregation noise, reducing fine-tuning effectiveness and failing to address heterogeneous LoRAs. In this work, we first highlight the mathematical incorrectness of LoRA aggregation in existing federated fine-tuning methods. We introduce a new approach called FLORA that enables federated fine-tuning on heterogeneous LoRA adapters across clients through a novel stacking-based aggregation method. Our approach is noise-free and seamlessly supports heterogeneous LoRA adapters. Extensive experiments demonstrate FLORA' s superior performance in both homogeneous and heterogeneous settings, surpassing state-of-the-art methods. We envision this work as a milestone for efficient, privacy-preserving, and accurate federated fine-tuning of LLMs. Our code is available at https://github.com/ATP-1010/FederatedLLM.

URLs: https://github.com/ATP-1010/FederatedLLM.

new FairHome: A Fair Housing and Fair Lending Dataset

Authors: Anusha Bagalkotkar (Zillow Group), Aveek Karmakar (Zillow Group), Gabriel Arnson (Zillow Group), Ondrej Linda (Zillow Group)

Abstract: We present a Fair Housing and Fair Lending dataset (FairHome): A dataset with around 75,000 examples across 9 protected categories. To the best of our knowledge, FairHome is the first publicly available dataset labeled with binary labels for compliance risk in the housing domain. We demonstrate the usefulness and effectiveness of such a dataset by training a classifier and using it to detect potential violations when using a large language model (LLM) in the context of real-estate transactions. We benchmark the trained classifier against state-of-the-art LLMs including GPT-3.5, GPT-4, LLaMA-3, and Mistral Large in both zero-shot and few-shot contexts. Our classifier outperformed with an F1-score of 0.91, underscoring the effectiveness of our dataset.

new Adapting to Shifting Correlations with Unlabeled Data Calibration

Authors: Minh Nguyen, Alan Q. Wang, Heejong Kim, Mert R. Sabuncu

Abstract: Distribution shifts between sites can seriously degrade model performance since models are prone to exploiting unstable correlations. Thus, many methods try to find features that are stable across sites and discard unstable features. However, unstable features might have complementary information that, if used appropriately, could increase accuracy. More recent methods try to adapt to unstable features at the new sites to achieve higher accuracy. However, they make unrealistic assumptions or fail to scale to multiple confounding features. We propose Generalized Prevalence Adjustment (GPA for short), a flexible method that adjusts model predictions to the shifting correlations between prediction target and confounders to safely exploit unstable features. GPA can infer the interaction between target and confounders in new sites using unlabeled samples from those sites. We evaluate GPA on several real and synthetic datasets, and show that it outperforms competitive baselines.

new Statistical Mechanics of Min-Max Problems

Authors: Yuma Ichikawa, Koji Hukushima

Abstract: Min-max optimization problems, also known as saddle point problems, have attracted significant attention due to their applications in various fields, such as fair beamforming, generative adversarial networks (GANs), and adversarial learning. However, understanding the properties of these min-max problems has remained a substantial challenge. This study introduces a statistical mechanical formalism for analyzing the equilibrium values of min-max problems in the high-dimensional limit, while appropriately addressing the order of operations for min and max. As a first step, we apply this formalism to bilinear min-max games and simple GANs, deriving the relationship between the amount of training data and generalization error and indicating the optimal ratio of fake to real data for effective learning. This formalism provides a groundwork for a deeper theoretical analysis of the equilibrium properties in various machine learning methods based on min-max problems and encourages the development of new algorithms and architectures.

new Privacy-Preserving Data Linkage Across Private and Public Datasets for Collaborative Agriculture Research

Authors: Osama Zafar, Rosemarie Santa Gonzalez, Gabriel Wilkins, Alfonso Morales, Erman Ayday

Abstract: Digital agriculture leverages technology to enhance crop yield, disease resilience, and soil health, playing a critical role in agricultural research. However, it raises privacy concerns such as adverse pricing, price discrimination, higher insurance costs, and manipulation of resources, deterring farm operators from sharing data due to potential misuse. This study introduces a privacy-preserving framework that addresses these risks while allowing secure data sharing for digital agriculture. Our framework enables comprehensive data analysis while protecting privacy. It allows stakeholders to harness research-driven policies that link public and private datasets. The proposed algorithm achieves this by: (1) identifying similar farmers based on private datasets, (2) providing aggregate information like time and location, (3) determining trends in price and product availability, and (4) correlating trends with public policy data, such as food insecurity statistics. We validate the framework with real-world Farmer's Market datasets, demonstrating its efficacy through machine learning models trained on linked privacy-preserved data. The results support policymakers and researchers in addressing food insecurity and pricing issues. This work significantly contributes to digital agriculture by providing a secure method for integrating and analyzing data, driving advancements in agricultural technology and development.

new MTLSO: A Multi-Task Learning Approach for Logic Synthesis Optimization

Authors: Faezeh Faez, Raika Karimi, Yingxue Zhang, Xing Li, Lei Chen, Mingxuan Yuan, Mahdi Biparva

Abstract: Electronic Design Automation (EDA) is essential for IC design and has recently benefited from AI-based techniques to improve efficiency. Logic synthesis, a key EDA stage, transforms high-level hardware descriptions into optimized netlists. Recent research has employed machine learning to predict Quality of Results (QoR) for pairs of And-Inverter Graphs (AIGs) and synthesis recipes. However, the severe scarcity of data due to a very limited number of available AIGs results in overfitting, significantly hindering performance. Additionally, the complexity and large number of nodes in AIGs make plain GNNs less effective for learning expressive graph-level representations. To tackle these challenges, we propose MTLSO - a Multi-Task Learning approach for Logic Synthesis Optimization. On one hand, it maximizes the use of limited data by training the model across different tasks. This includes introducing an auxiliary task of binary multi-label graph classification alongside the primary regression task, allowing the model to benefit from diverse supervision sources. On the other hand, we employ a hierarchical graph representation learning strategy to improve the model's capacity for learning expressive graph-level representations of large AIGs, surpassing traditional plain GNNs. Extensive experiments across multiple datasets and against state-of-the-art baselines demonstrate the superiority of our method, achieving an average performance gain of 8.22\% for delay and 5.95\% for area.

new Symmetry constrained neural networks for detection and localization of damage in metal plates

Authors: James Amarel, Christopher Rudolf, Athanasios Iliopoulos, John Michopoulos, Leslie N. Smith

Abstract: The present paper is concerned with deep learning techniques applied to detection and localization of damage in a thin aluminum plate. We used data generated on a tabletop apparatus by mounting to the plate four piezoelectric transducers, each of which took turn to generate a Lamb wave that then traversed the region of interest before being received by the remaining three sensors. On training a neural network to analyze time-series data of the material response, which displayed damage-reflective features whenever the plate guided waves interacted with a contact load, we achieved a model that detected with greater than 99% accuracy in addition to a model that localized with $3.14 \pm 0.21$ mm mean distance error and captured more than 60% of test examples within the diffraction limit. For each task, the best-performing model was designed according to the inductive bias that our transducers were both similar and arranged in a square pattern on a nearly uniform plate.

new Differentiable programming across the PDE and Machine Learning barrier

Authors: Nacime Bouziani, David A. Ham, Ado Farsi

Abstract: The combination of machine learning and physical laws has shown immense potential for solving scientific problems driven by partial differential equations (PDEs) with the promise of fast inference, zero-shot generalisation, and the ability to discover new physics. Examples include the use of fundamental physical laws as inductive bias to machine learning algorithms, also referred to as physics-driven machine learning, and the application of machine learning to represent features not represented in the differential equations such as closures for unresolved spatiotemporal scales. However, the simulation of complex physical systems by coupling advanced numerics for PDEs with state-of-the-art machine learning demands the composition of specialist PDE solving frameworks with industry-standard machine learning tools. Hand-rolling either the PDE solver or the neural net will not cut it. In this work, we introduce a generic differentiable programming abstraction that provides scientists and engineers with a highly productive way of specifying end-to-end differentiable models coupling machine learning and PDE-based components, while relying on code generation for high performance. Our interface automates the coupling of arbitrary PDE-based systems and machine learning models and unlocks new applications that could not hitherto be tackled, while only requiring trivial changes to existing code. Our framework has been adopted in the Firedrake finite-element library and supports the PyTorch and JAX ecosystems, as well as downstream libraries.

new Scalable Multitask Learning Using Gradient-based Estimation of Task Affinity

Authors: Dongyue Li, Aneesh Sharma, Hongyang R. Zhang

Abstract: Multitask learning is a widely used paradigm for training models on diverse tasks, with applications ranging from graph neural networks to language model fine-tuning. Since tasks may interfere with each other, a key notion for modeling their relationships is task affinity. This includes pairwise task affinity, computed among pairs of tasks, and higher-order affinity, computed among subsets of tasks. Naively computing either of them requires repeatedly training on data from various task combinations, which is computationally intensive. We present a new algorithm Grad-TAG that can estimate task affinities without this repeated training. The key idea of Grad-TAG is to train a "base" model for all tasks and then use a linearization technique to estimate the loss of the model for a specific task combination. The linearization works by computing a gradient-based approximation of the loss, using low-dimensional projections of gradients as features in a logistic regression to predict labels for the task combination. We show that the linearized model can provably approximate the loss when the gradient-based approximation is accurate, and also empirically verify that on several large models. Then, given the estimated task affinity, we design a semi-definite program for clustering similar tasks by maximizing the average density of clusters. We evaluate Grad-TAG's performance across seven datasets, including multi-label classification on graphs, and instruction fine-tuning of language models. Our task affinity estimates are within 2.7% distance to the true affinities while needing only 3% of FLOPs in full training. On our largest graph with 21M edges and 500 labeling tasks, our algorithm delivers estimates within 5% distance to the true affinities, using only 112 GPU hours. Our results show that Grad-TAG achieves excellent performance and runtime tradeoffs compared to existing approaches.

new Contrastive Federated Learning with Tabular Data Silos

Authors: Achmad Ginanjar, Xue Li, Wen Hua

Abstract: Learning from data silos is a difficult task for organizations that need to obtain knowledge of objects that appeared in multiple independent data silos. Objects in multi-organizations, such as government agents, are referred by different identifiers, such as driver license, passport number, and tax file number. The data distributions in data silos are mostly non-IID (Independently and Identically Distributed), labelless, and vertically partitioned (i.e., having different attributes). Privacy concerns harden the above issues. Conditions inhibit enthusiasm for collaborative work. While Federated Learning (FL) has been proposed to address these issues, the difficulty of labeling, namely, label costliness, often hinders optimal model performance. A potential solution lies in contrastive learning, an unsupervised self-learning technique to represent semantic data by contrasting similar data pairs. However, contrastive learning is currently not designed to handle tabular data silos that existed within multiple organizations where data linkage by quasi identifiers are needed. To address these challenges, we propose using semi-supervised contrastive federated learning, which we refer to as Contrastive Federated Learning with Data Silos (CFL). Our approach tackles the aforementioned issues with an integrated solution. Our experimental results demonstrate that CFL outperforms current methods in addressing these challenges and providing improvements in accuracy. Additionally, we present positive results that showcase the advantages of our contrastive federated learning approach in complex client environments.

new Configuration Interaction Guided Sampling with Interpretable Restricted Boltzmann Machine

Authors: Jorge I. Hernandez-Martinez, Gerardo Rodriguez-Hernandez, Andres Mendez-Vazquez

Abstract: We propose a data-driven approach using a Restricted Boltzmann Machine (RBM) to solve the Schr\"odinger equation in configuration space. Traditional Configuration Interaction (CI) methods, while powerful, are computationally expensive due to the large number of determinants required. Our approach leverages RBMs to efficiently identify and sample the most significant determinants, accelerating convergence and reducing computational cost. This method achieves up to 99.99\% of the correlation energy even by four orders of magnitude less determinants compared to full CI calculations and up to two orders of magnitude less than previous state of the art works. Additionally, our study demonstrate that the RBM can learn the underlying quantum properties, providing more detail insights than other methods . This innovative data-driven approach offers a promising tool for quantum chemistry, enhancing both efficiency and understanding of complex systems.

new MCDGLN: Masked Connection-based Dynamic Graph Learning Network for Autism Spectrum Disorder

Authors: Peng Wang, Xin Wen, Ruochen Cao, Chengxin Gao, Yanrong Hao, Rui Cao

Abstract: Autism Spectrum Disorder (ASD) is a neurodevelopmental disorder characterized by complex physiological processes. Previous research has predominantly focused on static cerebral interactions, often neglecting the brain's dynamic nature and the challenges posed by network noise. To address these gaps, we introduce the Masked Connection-based Dynamic Graph Learning Network (MCDGLN). Our approach first segments BOLD signals using sliding temporal windows to capture dynamic brain characteristics. We then employ a specialized weighted edge aggregation (WEA) module, which uses the cross convolution with channel-wise element-wise convolutional kernel, to integrate dynamic functional connectivity and to isolating task-relevant connections. This is followed by topological feature extraction via a hierarchical graph convolutional network (HGCN), with key attributes highlighted by a self-attention module. Crucially, we refine static functional connections using a customized task-specific mask, reducing noise and pruning irrelevant links. The attention-based connection encoder (ACE) then enhances critical connections and compresses static features. The combined features are subsequently used for classification. Applied to the Autism Brain Imaging Data Exchange I (ABIDE I) dataset, our framework achieves a 73.3\% classification accuracy between ASD and Typical Control (TC) groups among 1,035 subjects. The pivotal roles of WEA and ACE in refining connectivity and enhancing classification accuracy underscore their importance in capturing ASD-specific features, offering new insights into the disorder.

new VE: Modeling Multivariate Time Series Correlation with Variate Embedding

Authors: Shangjiong Wang, Zhihong Man, Zhengwei Cao, Jinchuan Zheng, Zhikang Ge

Abstract: Multivariate time series forecasting relies on accurately capturing the correlations among variates. Current channel-independent (CI) models and models with a CI final projection layer are unable to capture these dependencies. In this paper, we present the variate embedding (VE) pipeline, which learns a unique and consistent embedding for each variate and combines it with Mixture of Experts (MoE) and Low-Rank Adaptation (LoRA) techniques to enhance forecasting performance while controlling parameter size. The VE pipeline can be integrated into any model with a CI final projection layer to improve multivariate forecasting. The learned VE effectively groups variates with similar temporal patterns and separates those with low correlations. The effectiveness of the VE pipeline is demonstrated through extensive experiments on four widely-used datasets. The code is available at: \url{https://github.com/swang-song/VE}.

URLs: https://github.com/swang-song/VE

new Adaptive Transformer Modelling of Density Function for Nonparametric Survival Analysis

Authors: Xin Zhang, Deval Mehta, Yanan Hu, Chao Zhu, David Darby, Zhen Yu, Daniel Merlo, Melissa Gresle, Anneke Van Der Walt, Helmut Butzkueven, Zongyuan Ge

Abstract: Survival analysis holds a crucial role across diverse disciplines, such as economics, engineering and healthcare. It empowers researchers to analyze both time-invariant and time-varying data, encompassing phenomena like customer churn, material degradation and various medical outcomes. Given the complexity and heterogeneity of such data, recent endeavors have demonstrated successful integration of deep learning methodologies to address limitations in conventional statistical approaches. However, current methods typically involve cluttered probability distribution function (PDF), have lower sensitivity in censoring prediction, only model static datasets, or only rely on recurrent neural networks for dynamic modelling. In this paper, we propose a novel survival regression method capable of producing high-quality unimodal PDFs without any prior distribution assumption, by optimizing novel Margin-Mean-Variance loss and leveraging the flexibility of Transformer to handle both temporal and non-temporal data, coined UniSurv. Extensive experiments on several datasets demonstrate that UniSurv places a significantly higher emphasis on censoring compared to other methods.

new STUN: Structured-Then-Unstructured Pruning for Scalable MoE Pruning

Authors: Jaeseong Lee, seung-won hwang, Aurick Qiao, Daniel F Campos, Zhewei Yao, Yuxiong He

Abstract: Mixture-of-experts (MoEs) have been adopted for reducing inference costs by sparsely activating experts in Large language models (LLMs). Despite this reduction, the massive number of experts in MoEs still makes them expensive to serve. In this paper, we study how to address this, by pruning MoEs. Among pruning methodologies, unstructured pruning has been known to achieve the highest performance for a given pruning ratio, compared to structured pruning, since the latter imposes constraints on the sparsification structure. This is intuitive, as the solution space of unstructured pruning subsumes that of structured pruning. However, our counterintuitive finding reveals that expert pruning, a form of structured pruning, can actually precede unstructured pruning to outperform unstructured-only pruning. As existing expert pruning, requiring $O(\frac{k^n}{\sqrt{n}})$ forward passes for $n$ experts, cannot scale for recent MoEs, we propose a scalable alternative with $O(1)$ complexity, yet outperforming the more expensive methods. The key idea is leveraging a latent structure between experts, based on behavior similarity, such that the greedy decision of whether to prune closely captures the joint pruning effect. Ours is highly effective -- for Snowflake Arctic, a 480B-sized MoE with 128 experts, our method needs only one H100 and two hours to achieve nearly no loss in performance with 40% sparsity, even in generative tasks such as GSM8K, where state-of-the-art unstructured pruning fails to. The code will be made publicly available.

new Denoising: A Powerful Building-Block for Imaging, Inverse Problems, and Machine Learning

Authors: Peyman Milanfar, Mauricio Delbracio

Abstract: Denoising, the process of reducing random fluctuations in a signal to emphasize essential patterns, has been a fundamental problem of interest since the dawn of modern scientific inquiry. Recent denoising techniques, particularly in imaging, have achieved remarkable success, nearing theoretical limits by some measures. Yet, despite tens of thousands of research papers, the wide-ranging applications of denoising beyond noise removal have not been fully recognized. This is partly due to the vast and diverse literature, making a clear overview challenging. This paper aims to address this gap. We present a comprehensive perspective on denoisers, their structure, and desired properties. We emphasize the increasing importance of denoising and showcase its evolution into an essential building block for complex tasks in imaging, inverse problems, and machine learning. Despite its long history, the community continues to uncover unexpected and groundbreaking uses for denoising, further solidifying its place as a cornerstone of scientific and engineering practice.

new DiPT: Enhancing LLM reasoning through diversified perspective-taking

Authors: Hoang Anh Just, Mahavir Dabas, Lifu Huang, Ming Jin, Ruoxi Jia

Abstract: Existing work on improving language model reasoning typically explores a single solution path, which can be prone to errors. Inspired by perspective-taking in social studies, this paper introduces DiPT, a novel approach that complements current reasoning methods by explicitly incorporating diversified viewpoints. This approach allows the model to gain a deeper understanding of the problem's context and identify the most effective solution path during the inference stage. Additionally, it provides a general data-centric AI recipe for augmenting existing data to improve their quality for fine-tuning. Our empirical results demonstrate that DiPT can be flexibly integrated into existing methods that focus on a single reasoning approach, enhancing their reasoning performance and stability when presented with paraphrased problems. Furthermore, we illustrate improved context understanding by maintaining the model's safe outputs against "jailbreaking" prompts intentionally designed to bypass safeguards built into deployed models. Lastly, we show that fine-tuning with data enriched with diverse perspectives can boost the reasoning capabilities of the model compared to fine-tuning with raw data alone.

new Towards Robust Uncertainty-Aware Incomplete Multi-View Classification

Authors: Mulin Chen, Haojian Huang, Qiang Li

Abstract: Handling incomplete data in multi-view classification is challenging, especially when traditional imputation methods introduce biases that compromise uncertainty estimation. Existing Evidential Deep Learning (EDL) based approaches attempt to address these issues, but they often struggle with conflicting evidence due to the limitations of the Dempster-Shafer combination rule, leading to unreliable decisions. To address these challenges, we propose the Alternating Progressive Learning Network (APLN), specifically designed to enhance EDL-based methods in incomplete MVC scenarios. Our approach mitigates bias from corrupted observed data by first applying coarse imputation, followed by mapping the data to a latent space. In this latent space, we progressively learn an evidence distribution aligned with the target domain, incorporating uncertainty considerations through EDL. Additionally, we introduce a conflict-aware Dempster-Shafer combination rule (DSCR) to better handle conflicting evidence. By sampling from the learned distribution, we optimize the latent representations of missing views, reducing bias and enhancing decision-making robustness. Extensive experiments demonstrate that APLN, combined with DSCR, significantly outperforms traditional methods, particularly in environments characterized by high uncertainty and conflicting evidence, establishing it as a promising solution for incomplete multi-view classification.

new Ferret: Federated Full-Parameter Tuning at Scale for Large Language Models

Authors: Yao Shu, Wenyang Hu, See-Kiong Ng, Bryan Kian Hsiang Low, Fei Richard Yu

Abstract: Large Language Models (LLMs) have become indispensable in numerous real-world applications. Unfortunately, fine-tuning these models at scale, especially in federated settings where data privacy and communication efficiency are critical, presents significant challenges. Existing methods often resort to parameter-efficient fine-tuning (PEFT) to mitigate communication overhead, but this typically comes at the cost of model accuracy. To address these limitations, we propose federated full-parameter tuning at scale for LLMs (Ferret), the first first-order method with shared randomness to enable scalable full-parameter tuning of LLMs across decentralized data sources while maintaining competitive model accuracy. Ferret accomplishes this through three aspects: (1) it employs widely applied first-order methods for efficient local updates; (2) it projects these updates into a low-dimensional space to considerably reduce communication overhead; and (3) it reconstructs local updates from this low-dimensional space with shared randomness to facilitate effective full-parameter global aggregation, ensuring fast convergence and competitive final performance. Our rigorous theoretical analyses and insights along with extensive experiments, show that Ferret significantly enhances the scalability of existing federated full-parameter tuning approaches by achieving high computational efficiency, reduced communication overhead, and fast convergence, all while maintaining competitive model accuracy. Our implementation is available at https://github.com/allen4747/Ferret.

URLs: https://github.com/allen4747/Ferret.

new Learning Augmentation Policies from A Model Zoo for Time Series Forecasting

Authors: Haochen Yuan, Xuelin Li, Yunbo Wang, Xiaokang Yang

Abstract: Time series forecasting models typically rely on a fixed-size training set and treat all data uniformly, which may not effectively capture the specific patterns present in more challenging training samples. To address this issue, we introduce AutoTSAug, a learnable data augmentation method based on reinforcement learning. Our approach begins with an empirical analysis to determine which parts of the training data should be augmented. Specifically, we identify the so-called marginal samples by considering the prediction diversity across a set of pretrained forecasting models. Next, we propose using variational masked autoencoders as the augmentation model and applying the REINFORCE algorithm to transform the marginal samples into new data. The goal of this generative model is not only to mimic the distribution of real data but also to reduce the variance of prediction errors across the model zoo. By augmenting the marginal samples with a learnable policy, AutoTSAug substantially improves forecasting performance, advancing the prior art in this field with minimal additional computational cost.

new PharmacoMatch: Efficient 3D Pharmacophore Screening through Neural Subgraph Matching

Authors: Daniel Rose, Oliver Wieder, Thomas Seidel, Thierry Langer

Abstract: The increasing size of screening libraries poses a significant challenge for the development of virtual screening methods for drug discovery, necessitating a re-evaluation of traditional approaches in the era of big data. Although 3D pharmacophore screening remains a prevalent technique, its application to very large datasets is limited by the computational cost associated with matching query pharmacophores to database ligands. In this study, we introduce PharmacoMatch, a novel contrastive learning approach based on neural subgraph matching. Our method reinterprets pharmacophore screening as an approximate subgraph matching problem and enables efficient querying of conformational databases by encoding query-target relationships in the embedding space. We conduct comprehensive evaluations of the learned representations and benchmark our method on virtual screening datasets in a zero-shot setting. Our findings demonstrate significantly shorter runtimes for pharmacophore matching, offering a promising speed-up for screening very large datasets.

new Rate-Constrained Quantization for Communication-Efficient Federated Learning

Authors: Shayan Mohajer Hamidi, Ali Bereyhi

Abstract: Quantization is a common approach to mitigate the communication cost of federated learning (FL). In practice, the quantized local parameters are further encoded via an entropy coding technique, such as Huffman coding, for efficient data compression. In this case, the exact communication overhead is determined by the bit rate of the encoded gradients. Recognizing this fact, this work deviates from the existing approaches in the literature and develops a novel quantized FL framework, called \textbf{r}ate-\textbf{c}onstrained \textbf{fed}erated learning (RC-FED), in which the gradients are quantized subject to both fidelity and data rate constraints. We formulate this scheme, as a joint optimization in which the quantization distortion is minimized while the rate of encoded gradients is kept below a target threshold. This enables for a tunable trade-off between quantization distortion and communication cost. We analyze the convergence behavior of RC-FED, and show its superior performance against baseline quantized FL schemes on several datasets.

new LAMP: Learnable Meta-Path Guided Adversarial Contrastive Learning for Heterogeneous Graphs

Authors: Siqing Li, Jin-Duk Park, Wei Huang, Xin Cao, Won-Yong Shin, Zhiqiang Xu

Abstract: Heterogeneous graph neural networks (HGNNs) have significantly propelled the information retrieval (IR) field. Still, the effectiveness of HGNNs heavily relies on high-quality labels, which are often expensive to acquire. This challenge has shifted attention towards Heterogeneous Graph Contrastive Learning (HGCL), which usually requires pre-defined meta-paths. However, our findings reveal that meta-path combinations significantly affect performance in unsupervised settings, an aspect often overlooked in current literature. Existing HGCL methods have considerable variability in outcomes across different meta-path combinations, thereby challenging the optimization process to achieve consistent and high performance. In response, we introduce \textsf{LAMP} (\underline{\textbf{L}}earn\underline{\textbf{A}}ble \underline{\textbf{M}}eta-\underline{\textbf{P}}ath), a novel adversarial contrastive learning approach that integrates various meta-path sub-graphs into a unified and stable structure, leveraging the overlap among these sub-graphs. To address the denseness of this integrated sub-graph, we propose an adversarial training strategy for edge pruning, maintaining sparsity to enhance model performance and robustness. \textsf{LAMP} aims to maximize the difference between meta-path and network schema views for guiding contrastive learning to capture the most meaningful information. Our extensive experimental study conducted on four diverse datasets from the Heterogeneous Graph Benchmark (HGB) demonstrates that \textsf{LAMP} significantly outperforms existing state-of-the-art unsupervised models in terms of accuracy and robustness.

new Improving Conditional Level Generation using Automated Validation in Match-3 Games

Authors: Monica Villanueva Aylagas, Joakim Bergdahl, Jonas Gillberg, Alessandro Sestini, Theodor Tolstoy, Linus Gissl\'en

Abstract: Generative models for level generation have shown great potential in game production. However, they often provide limited control over the generation, and the validity of the generated levels is unreliable. Despite this fact, only a few approaches that learn from existing data provide the users with ways of controlling the generation, simultaneously addressing the generation of unsolvable levels. %One of the main challenges it faces is that levels generated through automation may not be solvable thus requiring validation. are not always engaging, challenging, or even solvable. This paper proposes Avalon, a novel method to improve models that learn from existing level designs using difficulty statistics extracted from gameplay. In particular, we use a conditional variational autoencoder to generate layouts for match-3 levels, conditioning the model on pre-collected statistics such as game mechanics like difficulty and relevant visual features like size and symmetry. Our method is general enough that multiple approaches could potentially be used to generate these statistics. We quantitatively evaluate our approach by comparing it to an ablated model without difficulty conditioning. Additionally, we analyze both quantitatively and qualitatively whether the style of the dataset is preserved in the generated levels. Our approach generates more valid levels than the same method without difficulty conditioning.

new Double Successive Over-Relaxation Q-Learning with an Extension to Deep Reinforcement Learning

Authors: Shreyas S R

Abstract: Q-learning is a widely used algorithm in reinforcement learning (RL), but its convergence can be slow, especially when the discount factor is close to one. Successive Over-Relaxation (SOR) Q-learning, which introduces a relaxation factor to speed up convergence, addresses this issue but has two major limitations: In the tabular setting, the relaxation parameter depends on transition probability, making it not entirely model-free, and it suffers from overestimation bias. To overcome these limitations, we propose a sample-based, model-free double SOR Q-learning algorithm. Theoretically and empirically, this algorithm is shown to be less biased than SOR Q-learning. Further, in the tabular setting, the convergence analysis under boundedness assumptions on iterates is discussed. The proposed algorithm is extended to large-scale problems using deep RL. Finally, the tabular version of the proposed algorithm is compared using roulette and grid world environments, while the deep RL version is tested on a maximization bias example and OpenAI Gym environments.

new Connecting Concept Convexity and Human-Machine Alignment in Deep Neural Networks

Authors: Teresa Dorszewski, Lenka T\v{e}tkov\'a, Lorenz Linhardt, Lars Kai Hansen

Abstract: Understanding how neural networks align with human cognitive processes is a crucial step toward developing more interpretable and reliable AI systems. Motivated by theories of human cognition, this study examines the relationship between \emph{convexity} in neural network representations and \emph{human-machine alignment} based on behavioral data. We identify a correlation between these two dimensions in pretrained and fine-tuned vision transformer models. Our findings suggest that the convex regions formed in latent spaces of neural networks to some extent align with human-defined categories and reflect the similarity relations humans use in cognitive tasks. While optimizing for alignment generally enhances convexity, increasing convexity through fine-tuning yields inconsistent effects on alignment, which suggests a complex relationship between the two. This study presents a first step toward understanding the relationship between the convexity of latent representations and human-machine alignment.

new What happens to diffusion model likelihood when your model is conditional?

Authors: Mattias Cross, Anton Ragni

Abstract: Diffusion Models (DMs) iteratively denoise random samples to produce high-quality data. The iterative sampling process is derived from Stochastic Differential Equations (SDEs), allowing a speed-quality trade-off chosen at inference. Another advantage of sampling with differential equations is exact likelihood computation. These likelihoods have been used to rank unconditional DMs and for out-of-domain classification. Despite the many existing and possible uses of DM likelihoods, the distinct properties captured are unknown, especially in conditional contexts such as Text-To-Image (TTI) or Text-To-Speech synthesis (TTS). Surprisingly, we find that TTS DM likelihoods are agnostic to the text input. TTI likelihood is more expressive but cannot discern confounding prompts. Our results show that applying DMs to conditional tasks reveals inconsistencies and strengthens claims that the properties of DM likelihood are unknown. This impact sheds light on the previously unknown nature of DM likelihoods. Although conditional DMs maximise likelihood, the likelihood in question is not as sensitive to the conditioning input as one expects. This investigation provides a new point-of-view on diffusion likelihoods.

new Symmetry Breaking in Neural Network Optimization: Insights from Input Dimension Expansion

Authors: Jun-Jie Zhang, Nan Cheng, Fu-Peng Li, Xiu-Cheng Wang, Jian-Nan Chen, Long-Gang Pang, Deyu Meng

Abstract: Understanding the mechanisms behind neural network optimization is crucial for improving network design and performance. While various optimization techniques have been developed, a comprehensive understanding of the underlying principles that govern these techniques remains elusive. Specifically, the role of symmetry breaking, a fundamental concept in physics, has not been fully explored in neural network optimization. This gap in knowledge limits our ability to design networks that are both efficient and effective. Here, we propose the symmetry breaking hypothesis to elucidate the significance of symmetry breaking in enhancing neural network optimization. We demonstrate that a simple input expansion can significantly improve network performance across various tasks, and we show that this improvement can be attributed to the underlying symmetry breaking mechanism. We further develop a metric to quantify the degree of symmetry breaking in neural networks, providing a practical approach to evaluate and guide network design. Our findings confirm that symmetry breaking is a fundamental principle that underpins various optimization techniques, including dropout, batch normalization, and equivariance. By quantifying the degree of symmetry breaking, our work offers a practical technique for performance enhancement and a metric to guide network design without the need for complete datasets and extensive training processes.

new Length Desensitization in Directed Preference Optimization

Authors: Wei Liu, Yang Bai, Chengcheng Han, Rongxiang Weng, Jun Xu, Xuezhi Cao, Jingang Wang, Xunliang Cai

Abstract: Direct Preference Optimization (DPO) is widely utilized in the Reinforcement Learning from Human Feedback (RLHF) phase to align Large Language Models (LLMs) with human preferences, thereby enhancing both their harmlessness and efficacy. However, it has been observed that DPO tends to over-optimize for verbosity, which can detrimentally affect both performance and user experience. In this paper, we conduct an in-depth theoretical analysis of DPO's optimization objective and reveal a strong correlation between its implicit reward and data length. This correlation misguides the optimization direction, resulting in length sensitivity during the DPO training and leading to verbosity. To address this issue, we propose a length-desensitization improvement method for DPO, termed LD-DPO. The proposed method aims to desensitize DPO to data length by decoupling explicit length preference, which is relatively insignificant, from the other implicit preferences, thereby enabling more effective learning of the intrinsic preferences. We utilized two settings (Base and Instruct) of Llama2-13B, Llama3-8B, and Qwen2-7B for experimental validation on various benchmarks including MT-Bench and AlpacaEval 2. The experimental results indicate that LD-DPO consistently outperforms DPO and other baseline methods, achieving more concise responses with a 10-40\% reduction in length compared to DPO. We conducted in-depth experimental analyses to demonstrate that LD-DPO can indeed achieve length desensitization and align the model more closely with human-real preferences.

new A Short Information-Theoretic Analysis of Linear Auto-Regressive Learning

Authors: Ingvar Ziemann

Abstract: In this note, we give a short information-theoretic proof of the consistency of the Gaussian maximum likelihood estimator in linear auto-regressive models. Our proof yields nearly optimal non-asymptotic rates for parameter recovery and works without any invocation of stability in the case of finite hypothesis classes.

new Extending Explainable Ensemble Trees (E2Tree) to regression contexts

Authors: Massimo Aria, Agostino Gnasso, Carmela Iorio, Marjolein Fokkema

Abstract: Ensemble methods such as random forests have transformed the landscape of supervised learning, offering highly accurate prediction through the aggregation of multiple weak learners. However, despite their effectiveness, these methods often lack transparency, impeding users' comprehension of how RF models arrive at their predictions. Explainable ensemble trees (E2Tree) is a novel methodology for explaining random forests, that provides a graphical representation of the relationship between response variables and predictors. A striking characteristic of E2Tree is that it not only accounts for the effects of predictor variables on the response but also accounts for associations between the predictor variables through the computation and use of dissimilarity measures. The E2Tree methodology was initially proposed for use in classification tasks. In this paper, we extend the methodology to encompass regression contexts. To demonstrate the explanatory power of the proposed algorithm, we illustrate its use on real-world datasets.

new A Machine Learning Based Approach for Statistical Analysis of Detonation Cells from Soot Foils

Authors: Vansh Sharma, Michael Ullman, Venkat Raman

Abstract: This study presents a novel algorithm based on machine learning (ML) for the precise segmentation and measurement of detonation cells from soot foil images, addressing the limitations of manual and primitive edge detection methods prevalent in the field. Using advances in cellular biology segmentation models, the proposed algorithm is designed to accurately extract cellular patterns without a training procedure or dataset, which is a significant challenge in detonation research. The algorithm's performance was validated using a series of test cases that mimic experimental and numerical detonation studies. The results demonstrated consistent accuracy, with errors remaining within 10%, even in complex cases. The algorithm effectively captured key cell metrics such as cell area and span, revealing trends across different soot foil samples with uniform to highly irregular cellular structures. Although the model proved robust, challenges remain in segmenting and analyzing highly complex or irregular cellular patterns. This work highlights the broad applicability and potential of the algorithm to advance the understanding of detonation wave dynamics.

new Deep Learning for Koopman Operator Estimation in Idealized Atmospheric Dynamics

Authors: David Millard, Arielle Carr, St\'ephane Gaudreault

Abstract: Deep learning is revolutionizing weather forecasting, with new data-driven models achieving accuracy on par with operational physical models for medium-term predictions. However, these models often lack interpretability, making their underlying dynamics difficult to understand and explain. This paper proposes methodologies to estimate the Koopman operator, providing a linear representation of complex nonlinear dynamics to enhance the transparency of data-driven models. Despite its potential, applying the Koopman operator to large-scale problems, such as atmospheric modeling, remains challenging. This study aims to identify the limitations of existing methods, refine these models to overcome various bottlenecks, and introduce novel convolutional neural network architectures that capture simplified dynamics.

new MENSA: A Multi-Event Network for Survival Analysis under Informative Censoring

Authors: Christian Marius Lillelund, Ali Hossein Gharari Foomani, Weijie Sun, Shi-ang Qi, Russell Greiner

Abstract: Given an instance, a multi-event survival model predicts the time until that instance experiences each of several different events. These events are not mutually exclusive and there are often statistical dependencies between them. There are relatively few multi-event survival results, most focusing on producing a simple risk score, rather than the time-to-event itself. To overcome these issues, we introduce MENSA, a novel, deep learning approach for multi-event survival analysis that can jointly learn representations of the input covariates and the dependence structure between events. As a practical motivation for multi-event survival analysis, we consider the problem of predicting the time until a patient with amyotrophic lateral sclerosis (ALS) loses various physical functions, i.e., the ability to speak, swallow, write, or walk. When estimating when a patient is no longer able to swallow, our approach achieves an L1-Margin loss of 278.8 days, compared to 355.2 days when modeling each event separately. In addition, we also evaluate our approach in single-event and competing risk scenarios by modeling the censoring and event distributions as equal contributing factors in the optimization process, and show that our approach performs well across multiple benchmark datasets. The source code is available at: https://github.com/thecml/mensa

URLs: https://github.com/thecml/mensa

new Dynamic Decoupling of Placid Terminal Attractor-based Gradient Descent Algorithm

Authors: Jinwei Zhao (Faculty of Computer Science and Engineering, Xi'an University of Technology, Xi'an, China), Marco Gori (Department of Information Engineering and Mathematics, University of Siena, Siena, Italy), Alessandro Betti (IMT Scuola Alti Studi, Lucca, Italy), Stefano Melacci (Department of Information Engineering and Mathematics, University of Siena, Siena, Italy), Hongtao Zhang (Faculty of Computer Science and Engineering, Xi'an University of Technology, Xi'an, China), Jiedong Liu (Faculty of Computer Science and Engineering, Xi'an University of Technology, Xi'an, China), Xinhong Hei (Faculty of Computer Science and Engineering, Xi'an University of Technology, Xi'an, China)

Abstract: Gradient descent (GD) and stochastic gradient descent (SGD) have been widely used in a large number of application domains. Therefore, understanding the dynamics of GD and improving its convergence speed is still of great importance. This paper carefully analyzes the dynamics of GD based on the terminal attractor at different stages of its gradient flow. On the basis of the terminal sliding mode theory and the terminal attractor theory, four adaptive learning rates are designed. Their performances are investigated in light of a detailed theoretical investigation, and the running times of the learning procedures are evaluated and compared. The total times of their learning processes are also studied in detail. To evaluate their effectiveness, various simulation results are investigated on a function approximation problem and an image classification problem.

new Learn2Aggregate: Supervised Generation of Chv\'atal-Gomory Cuts Using Graph Neural Networks

Authors: Arnaud Deza, Elias B. Khalil, Zhenan Fan, Zirui Zhou, Yong Zhang

Abstract: We present $\textit{Learn2Aggregate}$, a machine learning (ML) framework for optimizing the generation of Chv\'atal-Gomory (CG) cuts in mixed integer linear programming (MILP). The framework trains a graph neural network to classify useful constraints for aggregation in CG cut generation. The ML-driven CG separator selectively focuses on a small set of impactful constraints, improving runtimes without compromising the strength of the generated cuts. Key to our approach is the formulation of a constraint classification task which favours sparse aggregation of constraints, consistent with empirical findings. This, in conjunction with a careful constraint labeling scheme and a hybrid of deep learning and feature engineering, results in enhanced CG cut generation across five diverse MILP benchmarks. On the largest test sets, our method closes roughly $\textit{twice}$ as much of the integrality gap as the standard CG method while running 40$% faster. This performance improvement is due to our method eliminating 75% of the constraints prior to aggregation.

new Developing the Temporal Graph Convolutional Neural Network Model to Predict Hip Replacement using Electronic Health Records

Authors: Zoe Hancox, Sarah R. Kingsbury, Andrew Clegg, Philip G. Conaghan, Samuel D. Relton

Abstract: Background: Hip replacement procedures improve patient lives by relieving pain and restoring mobility. Predicting hip replacement in advance could reduce pain by enabling timely interventions, prioritising individuals for surgery or rehabilitation, and utilising physiotherapy to potentially delay the need for joint replacement. This study predicts hip replacement a year in advance to enhance quality of life and health service efficiency. Methods: Adapting previous work using Temporal Graph Convolutional Neural Network (TG-CNN) models, we construct temporal graphs from primary care medical event codes, sourced from ResearchOne EHRs of 40-75-year-old patients, to predict hip replacement risk. We match hip replacement cases to controls by age, sex, and Index of Multiple Deprivation. The model, trained on 9,187 cases and 9,187 controls, predicts hip replacement one year in advance. We validate the model on two unseen datasets, recalibrating for class imbalance. Additionally, we conduct an ablation study and compare against four baseline models. Results: Our best model predicts hip replacement risk one year in advance with an AUROC of 0.724 (95% CI: 0.715-0.733) and an AUPRC of 0.185 (95% CI: 0.160-0.209), achieving a calibration slope of 1.107 (95% CI: 1.074-1.139) after recalibration. Conclusions: The TG-CNN model effectively predicts hip replacement risk by identifying patterns in patient trajectories, potentially improving understanding and management of hip-related conditions.

new Label-free Monitoring of Self-Supervised Learning Progress

Authors: Isaac Xu, Scott Lowe, Thomas Trappenberg

Abstract: Self-supervised learning (SSL) is an effective method for exploiting unlabelled data to learn a high-level embedding space that can be used for various downstream tasks. However, existing methods to monitor the quality of the encoder -- either during training for one model or to compare several trained models -- still rely on access to annotated data. When SSL methodologies are applied to new data domains, a sufficiently large labelled dataset may not always be available. In this study, we propose several evaluation metrics which can be applied on the embeddings of unlabelled data and investigate their viability by comparing them to linear probe accuracy (a common metric which utilizes an annotated dataset). In particular, we apply $k$-means clustering and measure the clustering quality with the silhouette score and clustering agreement. We also measure the entropy of the embedding distribution. We find that while the clusters did correspond better to the ground truth annotations as training of the network progressed, label-free clustering metrics correlated with the linear probe accuracy only when training with SSL methods SimCLR and MoCo-v2, but not with SimSiam. Additionally, although entropy did not always have strong correlations with LP accuracy, this appears to be due to instability arising from early training, with the metric stabilizing and becoming more reliable at later stages of learning. Furthermore, while entropy generally decreases as learning progresses, this trend reverses for SimSiam. More research is required to establish the cause for this unexpected behaviour. Lastly, we find that while clustering based approaches are likely only viable for same-architecture comparisons, entropy may be architecture-independent.

new DA-MoE: Towards Dynamic Expert Allocation for Mixture-of-Experts Models

Authors: Maryam Akhavan Aghdam, Hongpeng Jin, Yanzhao Wu

Abstract: Transformer-based Mixture-of-Experts (MoE) models have been driving several recent technological advancements in Natural Language Processing (NLP). These MoE models adopt a router mechanism to determine which experts to activate for routing input tokens. However, existing router mechanisms allocate a fixed number of experts to each token, which neglects the varying importance of different input tokens. In this study, we propose a novel dynamic router mechanism that Dynamically Allocates a variable number of experts for Mixture-of-Experts (DA-MoE) models based on an effective token importance measure. First, we show that the Transformer attention mechanism provides a natural and effective way of calculating token importance. Second, we propose a dynamic router mechanism that effectively decides the optimal number of experts (K) and allocates the top-K experts for each input token. Third, comprehensive experiments on several benchmark datasets demonstrate that our DA-MoE approach consistently outperforms the state-of-the-art Transformer based MoE model on the popular GLUE benchmark.

new Geometric-Averaged Preference Optimization for Soft Preference Labels

Authors: Hiroki Furuta, Kuang-Huei Lee, Shixiang Shane Gu, Yutaka Matsuo, Aleksandra Faust, Heiga Zen, Izzeddin Gur

Abstract: Many algorithms for aligning LLMs with human preferences assume that human preferences are binary and deterministic. However, it is reasonable to think that they can vary with different individuals, and thus should be distributional to reflect the fine-grained relationship between the responses. In this work, we introduce the distributional soft preference labels and improve Direct Preference Optimization (DPO) with a weighted geometric average of the LLM output likelihood in the loss function. In doing so, the scale of learning loss is adjusted based on the soft labels, and the loss with equally preferred responses would be close to zero. This simple modification can be easily applied to any DPO family and helps the models escape from the over-optimization and objective mismatch prior works suffer from. In our experiments, we simulate the soft preference labels with AI feedback from LLMs and demonstrate that geometric averaging consistently improves performance on standard benchmarks for alignment research. In particular, we observe more preferable responses than binary labels and significant improvements with data where modestly-confident labels are in the majority.

new HybridFC: A Hybrid Fact-Checking Approach for Knowledge Graphs

Authors: Umair Qudus, Michael Roeder, Muhammad Saleem, Axel-Cyrille Ngonga Ngomo

Abstract: We consider fact-checking approaches that aim to predict the veracity of assertions in knowledge graphs. Five main categories of fact-checking approaches for knowledge graphs have been proposed in the recent literature, of which each is subject to partially overlapping limitations. In particular, current text-based approaches are limited by manual feature engineering. Path-based and rule-based approaches are limited by their exclusive use of knowledge graphs as background knowledge, and embedding-based approaches suffer from low accuracy scores on current fact-checking tasks. We propose a hybrid approach -- dubbed HybridFC -- that exploits the diversity of existing categories of fact-checking approaches within an ensemble learning setting to achieve a significantly better prediction performance. In particular, our approach outperforms the state of the art by 0.14 to 0.27 in terms of Area Under the Receiver Operating Characteristic curve on the FactBench dataset. Our code is open-source and can be found at https://github.com/dice-group/HybridFC.

URLs: https://github.com/dice-group/HybridFC.

new DANCE: Deep Learning-Assisted Analysis of Protein Sequences Using Chaos Enhanced Kaleidoscopic Images

Authors: Taslim Murad, Prakash Chourasia, Sarwan Ali, Murray Patterson

Abstract: Cancer is a complex disease characterized by uncontrolled cell growth. T cell receptors (TCRs), crucial proteins in the immune system, play a key role in recognizing antigens, including those associated with cancer. Recent advancements in sequencing technologies have facilitated comprehensive profiling of TCR repertoires, uncovering TCRs with potent anti-cancer activity and enabling TCR-based immunotherapies. However, analyzing these intricate biomolecules necessitates efficient representations that capture their structural and functional information. T-cell protein sequences pose unique challenges due to their relatively smaller lengths compared to other biomolecules. An image-based representation approach becomes a preferred choice for efficient embeddings, allowing for the preservation of essential details and enabling comprehensive analysis of T-cell protein sequences. In this paper, we propose to generate images from the protein sequences using the idea of Chaos Game Representation (CGR) using the Kaleidoscopic images approach. This Deep Learning Assisted Analysis of Protein Sequences Using Chaos Enhanced Kaleidoscopic Images (called DANCE) provides a unique way to visualize protein sequences by recursively applying chaos game rules around a central seed point. we perform the classification of the T cell receptors (TCRs) protein sequences in terms of their respective target cancer cells, as TCRs are known for their immune response against cancer disease. The TCR sequences are converted into images using the DANCE method. We employ deep-learning vision models to perform the classification to obtain insights into the relationship between the visual patterns observed in the generated kaleidoscopic images and the underlying protein properties. By combining CGR-based image generation with deep learning classification, this study opens novel possibilities in the protein analysis domain.

cross Surface Flux Transport Modelling using Physics Informed Neural Networks

Authors: Jithu J Athalathil, Bhargav Vaidya, Sayan Kundu, Vishal Upendran, Mark C. M. Cheung

Abstract: Studying the magnetic field properties on the solar surface is crucial for understanding the solar and heliospheric activities, which in turn shape space weather in the solar system. Surface Flux Transport (SFT) modelling helps us to simulate and analyse the transport and evolution of magnetic flux on the solar surface, providing valuable insights into the mechanisms responsible for solar activity. In this work, we demonstrate the use of machine learning techniques in solving magnetic flux transport, making it accurate. We have developed a novel Physics-Informed Neural Networks (PINNs)-based model to study the evolution of Bipolar Magnetic Regions (BMRs) using SFT in one-dimensional azimuthally averaged and also in two-dimensions. We demonstrate the efficiency and computational feasibility of our PINNs-based model by comparing its performance and accuracy with that of a numerical model implemented using the Runge-Kutta Implicit-Explicit (RK-IMEX) scheme. The mesh-independent PINNs method can be used to reproduce the observed polar magnetic field with better flux conservation. This advancement is important for accurately reproducing observed polar magnetic fields, thereby providing insights into the strength of future solar cycles. This work paves the way for more efficient and accurate simulations of solar magnetic flux transport and showcases the applicability of PINNs in solving advection-diffusion equations with a particular focus on heliophysics.

cross Multi-feature Compensatory Motion Analysis for Reaching Motions Over a Discretely Sampled Workspace

Authors: Qihan Yang, Yuri Gloumakov, Adam J. Spiers

Abstract: The absence of functional arm joints, such as the wrist, in upper extremity prostheses leads to compensatory motions in the users' daily activities. Compensatory motions have been previously studied for varying task protocols and evaluation metrics. However, the movement targets' spatial locations in previous protocols were not standardised and incomparable between studies, and the evaluation metrics were rudimentary. This work analysed compensatory motions in the final pose of subjects reaching across a discretely sampled 7*7 2D grid of targets under unbraced (normative) and braced (compensatory) conditions. For the braced condition, a bracing system was applied to simulate a transradial prosthetic limb by restricting participants' wrist joints. A total of 1372 reaching poses were analysed, and a Compensation Index was proposed to indicate the severity level of compensation. This index combined joint spatial location analysis, joint angle analysis, separability analysis, and machine learning (clustering) analysis. The individual analysis results and the final Compensation Index were presented in heatmap format to correspond to the spatial layout of the workspace, revealing the spatial dependency of compensatory motions. The results indicate that compensatory motions occur mainly in a right trapezoid region in the upper left area and a vertical trapezoid region in the middle left area for right-handed subjects reaching horizontally and vertically. Such results might guide motion selection in clinical rehabilitation, occupational therapy, and prosthetic evaluation to help avoid residual limb pain and overuse syndromes.

cross CSRec: Rethinking Sequential Recommendation from A Causal Perspective

Authors: Xiaoyu Liu, Jiaxin Yuan, Yuhang Zhou, Jingling Li, Furong Huang, Wei Ai

Abstract: The essence of sequential recommender systems (RecSys) lies in understanding how users make decisions. Most existing approaches frame the task as sequential prediction based on users' historical purchase records. While effective in capturing users' natural preferences, this formulation falls short in accurately modeling actual recommendation scenarios, particularly in accounting for how unsuccessful recommendations influence future purchases. Furthermore, the impact of the RecSys itself on users' decisions has not been appropriately isolated and quantitatively analyzed. To address these challenges, we propose a novel formulation of sequential recommendation, termed Causal Sequential Recommendation (CSRec). Instead of predicting the next item in the sequence, CSRec aims to predict the probability of a recommended item's acceptance within a sequential context and backtrack how current decisions are made. Critically, CSRec facilitates the isolation of various factors that affect users' final decisions, especially the influence of the recommender system itself, thereby opening new avenues for the design of recommender systems. CSRec can be seamlessly integrated into existing methodologies. Experimental evaluations on both synthetic and real-world datasets demonstrate that the proposed implementation significantly improves upon state-of-the-art baselines.

cross Syntax-Guided Procedural Synthesis of Molecules

Authors: Michael Sun, Alston Lo, Wenhao Gao, Minghao Guo, Veronika Thost, Jie Chen, Connor Coley, Wojciech Matusik

Abstract: Designing synthetically accessible molecules and recommending analogs to unsynthesizable molecules are important problems for accelerating molecular discovery. We reconceptualize both problems using ideas from program synthesis. Drawing inspiration from syntax-guided synthesis approaches, we decouple the syntactic skeleton from the semantics of a synthetic tree to create a bilevel framework for reasoning about the combinatorial space of synthesis pathways. Given a molecule we aim to generate analogs for, we iteratively refine its skeletal characteristics via Markov Chain Monte Carlo simulations over the space of syntactic skeletons. Given a black-box oracle to optimize, we formulate a joint design space over syntactic templates and molecular descriptors and introduce evolutionary algorithms that optimize both syntactic and semantic dimensions synergistically. Our key insight is that once the syntactic skeleton is set, we can amortize over the search complexity of deriving the program's semantics by training policies to fully utilize the fixed horizon Markov Decision Process imposed by the syntactic template. We demonstrate performance advantages of our bilevel framework for synthesizable analog generation and synthesizable molecule design. Notably, our approach offers the user explicit control over the resources required to perform synthesis and biases the design space towards simpler solutions, making it particularly promising for autonomous synthesis platforms.

cross CF-KAN: Kolmogorov-Arnold Network-based Collaborative Filtering to Mitigate Catastrophic Forgetting in Recommender Systems

Authors: Jin-Duk Park, Kyung-Min Kim, Won-Yong Shin

Abstract: Collaborative filtering (CF) remains essential in recommender systems, leveraging user--item interactions to provide personalized recommendations. Meanwhile, a number of CF techniques have evolved into sophisticated model architectures based on multi-layer perceptrons (MLPs). However, MLPs often suffer from catastrophic forgetting, and thus lose previously acquired knowledge when new information is learned, particularly in dynamic environments requiring continual learning. To tackle this problem, we propose CF-KAN, a new CF method utilizing Kolmogorov-Arnold networks (KANs). By learning nonlinear functions on the edge level, KANs are more robust to the catastrophic forgetting problem than MLPs. Built upon a KAN-based autoencoder, CF-KAN is designed in the sense of effectively capturing the intricacies of sparse user--item interactions and retaining information from previous data instances. Despite its simplicity, our extensive experiments demonstrate 1) CF-KAN's superiority over state-of-the-art methods in recommendation accuracy, 2) CF-KAN's resilience to catastrophic forgetting, underscoring its effectiveness in both static and dynamic recommendation scenarios, and 3) CF-KAN's edge-level interpretation facilitating the explainability of recommendations.

cross Integrating the Expected Future: Schedule Based Energy Forecasting

Authors: Raffael Theiler, Olga Fink

Abstract: Power grid operators depend on accurate and reliable energy forecasts, aiming to minimize cases of extreme errors, as these outliers are particularly challenging to manage during operation. Incorporating planning information -- such as known data about users' future behavior or scheduled events -- has the potential to significantly enhance the accuracy and specificity of forecasts. Although there have been attempts to integrate such expected future behavior, these efforts consistently rely on conventional regression models to process this information. These models often lack the flexibility and capability to effectively incorporate both dynamic, forward-looking contextual inputs and historical data. To address this challenge, we conceptualize this combined forecasting and regression challenge as a sequence-to-sequence modeling problem and demonstrate, with three distinct models, that our contextually enhanced transformer models excel in this task. By leveraging schedule-based contextual information from the Swiss railway traction network, our proposed method significantly improved the average forecasting accuracy of nationwide railway energy consumption. Specifically, enhancing the transformer models with contextual information resulted in an average reduction of mean absolute error by 40.6\% , whereas other state-of-the-art methods did not demonstrate any significant improvement.

cross In-ear ECG Signal Enhancement with Denoising Convolutional Autoencoders

Authors: Edoardo Occhipinti, Marek Zylinski, Harry J. Davies, Amir Nassibi, Matteo Bermond, Patrik Bachtiger, Nicholas S. Peters, Danilo P. Mandic

Abstract: The cardiac dipole has been shown to propagate to the ears, now a common site for consumer wearable electronics, enabling the recording of electrocardiogram (ECG) signals. However, in-ear ECG recordings often suffer from significant noise due to their small amplitude and the presence of other physiological signals, such as electroencephalogram (EEG), which complicates the extraction of cardiovascular features. This study addresses this issue by developing a denoising convolutional autoencoder (DCAE) to enhance ECG information from in-ear recordings, producing cleaner ECG outputs. The model is evaluated using a dataset of in-ear ECGs and corresponding clean Lead I ECGs from 45 healthy participants. The results demonstrate a substantial improvement in signal-to-noise ratio (SNR), with a median increase of 5.9 dB. Additionally, the model significantly improved heart rate estimation accuracy, reducing the mean absolute error by almost 70% and increasing R-peak detection precision to a median value of 90%. We also trained and validated the model using a synthetic dataset, generated from real ECG signals, including abnormal cardiac morphologies, corrupted by pink noise. The results obtained show effective removal of noise sources with clinically plausible waveform reconstruction ability.

cross Fast ($\sim N$) Diffusion Map Algorithm

Authors: Julio Candanedo

Abstract: In this work we explore parsimonious manifold learning techniques, specifically for Diffusion-maps. We demonstrate an algorithm and it's implementation with computational complexity (in both time and memory) of $\sim N$, with $N$ representing the number-of-samples. These techniques are essential for large-scale unsupervised learning tasks without any prior assumptions, due to sampling theorem limitations.

cross Property Neurons in Self-Supervised Speech Transformers

Authors: Tzu-Quan Lin, Guan-Ting Lin, Hung-yi Lee, Hao Tang

Abstract: There have been many studies on analyzing self-supervised speech Transformers, in particular, with layer-wise analysis. It is, however, desirable to have an approach that can pinpoint exactly a subset of neurons that is responsible for a particular property of speech, being amenable to model pruning and model editing. In this work, we identify a set of property neurons in the feedforward layers of Transformers to study how speech-related properties, such as phones, gender, and pitch, are stored. When removing neurons of a particular property (a simple form of model editing), the respective downstream performance significantly degrades, showing the importance of the property neurons. We apply this approach to pruning the feedforward layers in Transformers, where most of the model parameters are. We show that protecting property neurons during pruning is significantly more effective than norm-based pruning.

cross Unlocking Potential Binders: Multimodal Pretraining DEL-Fusion for Denoising DNA-Encoded Libraries

Authors: Chunbin Gu, Mutian He, Hanqun Cao, Guangyong Chen, Chang-yu Hsieh, Pheng Ann Heng

Abstract: In the realm of drug discovery, DNA-encoded library (DEL) screening technology has emerged as an efficient method for identifying high-affinity compounds. However, DEL screening faces a significant challenge: noise arising from nonspecific interactions within complex biological systems. Neural networks trained on DEL libraries have been employed to extract compound features, aiming to denoise the data and uncover potential binders to the desired therapeutic target. Nevertheless, the inherent structure of DEL, constrained by the limited diversity of building blocks, impacts the performance of compound encoders. Moreover, existing methods only capture compound features at a single level, further limiting the effectiveness of the denoising strategy. To mitigate these issues, we propose a Multimodal Pretraining DEL-Fusion model (MPDF) that enhances encoder capabilities through pretraining and integrates compound features across various scales. We develop pretraining tasks applying contrastive objectives between different compound representations and their text descriptions, enhancing the compound encoders' ability to acquire generic features. Furthermore, we propose a novel DEL-fusion framework that amalgamates compound information at the atomic, submolecular, and molecular levels, as captured by various compound encoders. The synergy of these innovations equips MPDF with enriched, multi-scale features, enabling comprehensive downstream denoising. Evaluated on three DEL datasets, MPDF demonstrates superior performance in data processing and analysis for validation tasks. Notably, MPDF offers novel insights into identifying high-affinity molecules, paving the way for improved DEL utility in drug discovery.

cross Hierarchical novel class discovery for single-cell transcriptomic profiles

Authors: Malek Senoussi, Thierry Arti\`eres, Paul Villoutreix

Abstract: One of the major challenges arising from single-cell transcriptomics experiments is the question of how to annotate the associated single-cell transcriptomic profiles. Because of the large size and the high dimensionality of the data, automated methods for annotation are needed. We focus here on datasets obtained in the context of developmental biology, where the differentiation process leads to a hierarchical structure. We consider a frequent setting where both labeled and unlabeled data are available at training time, but the sets of the labels of labeled data on one side and of the unlabeled data on the other side, are disjoint. It is an instance of the Novel Class Discovery problem. The goal is to achieve two objectives, clustering the data and mapping the clusters with labels. We propose extensions of k-Means and GMM clustering methods for solving the problem and report comparative results on artificial and experimental transcriptomic datasets. Our approaches take advantage of the hierarchical nature of the data.

cross DeepFM-Crispr: Prediction of CRISPR On-Target Effects via Deep Learning

Authors: Condy Bao, Fuxiao Liu

Abstract: Since the advent of CRISPR-Cas9, a groundbreaking gene-editing technology that enables precise genomic modifications via a short RNA guide sequence, there has been a marked increase in the accessibility and application of this technology across various fields. The success of CRISPR-Cas9 has spurred further investment and led to the discovery of additional CRISPR systems, including CRISPR-Cas13. Distinct from Cas9, which targets DNA, Cas13 targets RNA, offering unique advantages for gene modulation. We focus on Cas13d, a variant known for its collateral activity where it non-specifically cleaves adjacent RNA molecules upon activation, a feature critical to its function. We introduce DeepFM-Crispr, a novel deep learning model developed to predict the on-target efficiency and evaluate the off-target effects of Cas13d. This model harnesses a large language model to generate comprehensive representations rich in evolutionary and structural data, thereby enhancing predictions of RNA secondary structures and overall sgRNA efficacy. A transformer-based architecture processes these inputs to produce a predictive efficacy score. Comparative experiments show that DeepFM-Crispr not only surpasses traditional models but also outperforms recent state-of-the-art deep learning methods in terms of prediction accuracy and reliability.

cross A Small Claims Court for the NLP: Judging Legal Text Classification Strategies With Small Datasets

Authors: Mariana Yukari Noguti, Edduardo Vellasques, Luiz Eduardo Soares Oliveira

Abstract: Recent advances in language modelling has significantly decreased the need of labelled data in text classification tasks. Transformer-based models, pre-trained on unlabeled data, can outmatch the performance of models trained from scratch for each task. However, the amount of labelled data need to fine-tune such type of model is still considerably high for domains requiring expert-level annotators, like the legal domain. This paper investigates the best strategies for optimizing the use of a small labeled dataset and large amounts of unlabeled data and perform a classification task in the legal area with 50 predefined topics. More specifically, we use the records of demands to a Brazilian Public Prosecutor's Office aiming to assign the descriptions in one of the subjects, which currently demands deep legal knowledge for manual filling. The task of optimizing the performance of classifiers in this scenario is especially challenging, given the low amount of resources available regarding the Portuguese language, especially in the legal domain. Our results demonstrate that classic supervised models such as logistic regression and SVM and the ensembles random forest and gradient boosting achieve better performance along with embeddings extracted with word2vec when compared to BERT language model. The latter demonstrates superior performance in association with the architecture of the model itself as a classifier, having surpassed all previous models in that regard. The best result was obtained with Unsupervised Data Augmentation (UDA), which jointly uses BERT, data augmentation, and strategies of semi-supervised learning, with an accuracy of 80.7% in the aforementioned task.

cross Bridging Rested and Restless Bandits with Graph-Triggering: Rising and Rotting

Authors: Gianmarco Genalti, Marco Mussi, Nicola Gatti, Marcello Restelli, Matteo Castiglioni, Alberto Maria Metelli

Abstract: Rested and Restless Bandits are two well-known bandit settings that are useful to model real-world sequential decision-making problems in which the expected reward of an arm evolves over time due to the actions we perform or due to the nature. In this work, we propose Graph-Triggered Bandits (GTBs), a unifying framework to generalize and extend rested and restless bandits. In this setting, the evolution of the arms' expected rewards is governed by a graph defined over the arms. An edge connecting a pair of arms $(i,j)$ represents the fact that a pull of arm $i$ triggers the evolution of arm $j$, and vice versa. Interestingly, rested and restless bandits are both special cases of our model for some suitable (degenerated) graph. As relevant case studies for this setting, we focus on two specific types of monotonic bandits: rising, where the expected reward of an arm grows as the number of triggers increases, and rotting, where the opposite behavior occurs. For these cases, we study the optimal policies. We provide suitable algorithms for all scenarios and discuss their theoretical guarantees, highlighting the complexity of the learning problem concerning instance-dependent terms that encode specific properties of the underlying graph structure.

cross A Comprehensive Comparison Between ANNs and KANs For Classifying EEG Alzheimer's Data

Authors: Akshay Sunkara, Sriram Sattiraju, Aakarshan Kumar, Zaryab Kanjiani, Himesh Anumala

Abstract: Alzheimer's Disease is an incurable cognitive condition that affects thousands of people globally. While some diagnostic methods exist for Alzheimer's Disease, many of these methods cannot detect Alzheimer's in its earlier stages. Recently, researchers have explored the use of Electroencephalogram (EEG) technology for diagnosing Alzheimer's. EEG is a noninvasive method of recording the brain's electrical signals, and EEG data has shown distinct differences between patients with and without Alzheimer's. In the past, Artificial Neural Networks (ANNs) have been used to predict Alzheimer's from EEG data, but these models sometimes produce false positive diagnoses. This study aims to compare losses between ANNs and Kolmogorov-Arnold Networks (KANs) across multiple types of epochs, learning rates, and nodes. The results show that across these different parameters, ANNs are more accurate in predicting Alzheimer's Disease from EEG signals.

cross DetoxBench: Benchmarking Large Language Models for Multitask Fraud & Abuse Detection

Authors: Joymallya Chakraborty, Wei Xia, Anirban Majumder, Dan Ma, Walid Chaabene, Naveed Janvekar

Abstract: Large language models (LLMs) have demonstrated remarkable capabilities in natural language processing tasks. However, their practical application in high-stake domains, such as fraud and abuse detection, remains an area that requires further exploration. The existing applications often narrowly focus on specific tasks like toxicity or hate speech detection. In this paper, we present a comprehensive benchmark suite designed to assess the performance of LLMs in identifying and mitigating fraudulent and abusive language across various real-world scenarios. Our benchmark encompasses a diverse set of tasks, including detecting spam emails, hate speech, misogynistic language, and more. We evaluated several state-of-the-art LLMs, including models from Anthropic, Mistral AI, and the AI21 family, to provide a comprehensive assessment of their capabilities in this critical domain. The results indicate that while LLMs exhibit proficient baseline performance in individual fraud and abuse detection tasks, their performance varies considerably across tasks, particularly struggling with tasks that demand nuanced pragmatic reasoning, such as identifying diverse forms of misogynistic language. These findings have important implications for the responsible development and deployment of LLMs in high-risk applications. Our benchmark suite can serve as a tool for researchers and practitioners to systematically evaluate LLMs for multi-task fraud detection and drive the creation of more robust, trustworthy, and ethically-aligned systems for fraud and abuse detection.

cross Regression with Large Language Models for Materials and Molecular Property Prediction

Authors: Ryan Jacobs, Maciej P. Polak, Lane E. Schultz, Hamed Mahdavi, Vasant Honavar, Dane Morgan

Abstract: We demonstrate the ability of large language models (LLMs) to perform material and molecular property regression tasks, a significant deviation from the conventional LLM use case. We benchmark the Large Language Model Meta AI (LLaMA) 3 on several molecular properties in the QM9 dataset and 24 materials properties. Only composition-based input strings are used as the model input and we fine tune on only the generative loss. We broadly find that LLaMA 3, when fine-tuned using the SMILES representation of molecules, provides useful regression results which can rival standard materials property prediction models like random forest or fully connected neural networks on the QM9 dataset. Not surprisingly, LLaMA 3 errors are 5-10x higher than those of the state-of-the-art models that were trained using far more granular representation of molecules (e.g., atom types and their coordinates) for the same task. Interestingly, LLaMA 3 provides improved predictions compared to GPT-3.5 and GPT-4o. This work highlights the versatility of LLMs, suggesting that LLM-like generative models can potentially transcend their traditional applications to tackle complex physical phenomena, thus paving the way for future research and applications in chemistry, materials science and other scientific domains.

cross DECOLLAGE: 3D Detailization by Controllable, Localized, and Learned Geometry Enhancement

Authors: Qimin Chen, Zhiqin Chen, Vladimir G. Kim, Noam Aigerman, Hao Zhang, Siddhartha Chaudhuri

Abstract: We present a 3D modeling method which enables end-users to refine or detailize 3D shapes using machine learning, expanding the capabilities of AI-assisted 3D content creation. Given a coarse voxel shape (e.g., one produced with a simple box extrusion tool or via generative modeling), a user can directly "paint" desired target styles representing compelling geometric details, from input exemplar shapes, over different regions of the coarse shape. These regions are then up-sampled into high-resolution geometries which adhere with the painted styles. To achieve such controllable and localized 3D detailization, we build on top of a Pyramid GAN by making it masking-aware. We devise novel structural losses and priors to ensure that our method preserves both desired coarse structures and fine-grained features even if the painted styles are borrowed from diverse sources, e.g., different semantic parts and even different shape categories. Through extensive experiments, we show that our ability to localize details enables novel interactive creative workflows and applications. Our experiments further demonstrate that in comparison to prior techniques built on global detailization, our method generates structure-preserving, high-resolution stylized geometries with more coherent shape details and style transitions.

cross Variational Search Distributions

Authors: Daniel M. Steinberg, Rafael Oliveira, Cheng Soon Ong, Edwin V. Bonilla

Abstract: We develop variational search distributions (VSD), a method for finding discrete, combinatorial designs of a rare desired class in a batch sequential manner with a fixed experimental budget. We formalize the requirements and desiderata for this problem and formulate a solution via variational inference that fulfill these. In particular, VSD uses off-the-shelf gradient based optimization routines, and can take advantage of scalable predictive models. We show that VSD can outperform existing baseline methods on a set of real sequence-design problems in various biological systems.

cross Causal Analysis of Shapley Values: Conditional vs. Marginal

Authors: Ilya Rozenfeld

Abstract: Shapley values, a game theoretic concept, has been one of the most popular tools for explaining Machine Learning (ML) models in recent years. Unfortunately, the two most common approaches, conditional and marginal, to calculating Shapley values can lead to different results along with some undesirable side effects when features are correlated. This in turn has led to the situation in the literature where contradictory recommendations regarding choice of an approach are provided by different authors. In this paper we aim to resolve this controversy through the use of causal arguments. We show that the differences arise from the implicit assumptions that are made within each method to deal with missing causal information. We also demonstrate that the conditional approach is fundamentally unsound from a causal perspective. This, together with previous work in [1], leads to the conclusion that the marginal approach should be preferred over the conditional one.

cross Loss Distillation via Gradient Matching for Point Cloud Completion with Weighted Chamfer Distance

Authors: Fangzhou Lin, Haotian Liu, Haoying Zhou, Songlin Hou, Kazunori D Yamada, Gregory S. Fischer, Yanhua Li, Haichong K. Zhang, Ziming Zhang

Abstract: 3D point clouds enhanced the robot's ability to perceive the geometrical information of the environments, making it possible for many downstream tasks such as grasp pose detection and scene understanding. The performance of these tasks, though, heavily relies on the quality of data input, as incomplete can lead to poor results and failure cases. Recent training loss functions designed for deep learning-based point cloud completion, such as Chamfer distance (CD) and its variants (\eg HyperCD ), imply a good gradient weighting scheme can significantly boost performance. However, these CD-based loss functions usually require data-related parameter tuning, which can be time-consuming for data-extensive tasks. To address this issue, we aim to find a family of weighted training losses ({\em weighted CD}) that requires no parameter tuning. To this end, we propose a search scheme, {\em Loss Distillation via Gradient Matching}, to find good candidate loss functions by mimicking the learning behavior in backpropagation between HyperCD and weighted CD. Once this is done, we propose a novel bilevel optimization formula to train the backbone network based on the weighted CD loss. We observe that: (1) with proper weighted functions, the weighted CD can always achieve similar performance to HyperCD, and (2) the Landau weighted CD, namely {\em Landau CD}, can outperform HyperCD for point cloud completion and lead to new state-of-the-art results on several benchmark datasets. {\it Our demo code is available at \url{https://github.com/Zhang-VISLab/IROS2024-LossDistillationWeightedCD}.}

URLs: https://github.com/Zhang-VISLab/IROS2024-LossDistillationWeightedCD

cross Can Large Language Models Unlock Novel Scientific Research Ideas?

Authors: Sandeep Kumar, Tirthankar Ghosal, Vinayak Goyal, Asif Ekbal

Abstract: "An idea is nothing more nor less than a new combination of old elements" (Young, J.W.). The widespread adoption of Large Language Models (LLMs) and publicly available ChatGPT have marked a significant turning point in the integration of Artificial Intelligence (AI) into people's everyday lives. This study explores the capability of LLMs in generating novel research ideas based on information from research papers. We conduct a thorough examination of 4 LLMs in five domains (e.g., Chemistry, Computer, Economics, Medical, and Physics). We found that the future research ideas generated by Claude-2 and GPT-4 are more aligned with the author's perspective than GPT-3.5 and Gemini. We also found that Claude-2 generates more diverse future research ideas than GPT-4, GPT-3.5, and Gemini 1.0. We further performed a human evaluation of the novelty, relevancy, and feasibility of the generated future research ideas. This investigation offers insights into the evolving role of LLMs in idea generation, highlighting both its capability and limitations. Our work contributes to the ongoing efforts in evaluating and utilizing language models for generating future research ideas. We make our datasets and codes publicly available.

cross Bottleneck-based Encoder-decoder ARchitecture (BEAR) for Learning Unbiased Consumer-to-Consumer Image Representations

Authors: Pablo Rivas, Gisela Bichler, Tomas Cerny, Laurie Giddens, Stacie Petter

Abstract: Unbiased representation learning is still an object of study under specific applications and contexts. Novel architectures are usually crafted to resolve particular problems using mixtures of fundamental pieces. This paper presents different image feature extraction mechanisms that work together with residual connections to encode perceptual image information in an autoencoder configuration. We use image data that aims to support a larger research agenda dealing with issues regarding criminal activity in consumer-to-consumer online platforms. Preliminary results suggest that the proposed architecture can learn rich spaces using ours and other image datasets resolving important challenges that are identified.

cross Multi-Source Music Generation with Latent Diffusion

Authors: Zhongweiyang Xu, Debottam Dutta, Yu-Lin Wei, Romit Roy Choudhury

Abstract: Most music generation models directly generate a single music mixture. To allow for more flexible and controllable generation, the Multi-Source Diffusion Model (MSDM) has been proposed to model music as a mixture of multiple instrumental sources (e.g., piano, drums, bass, and guitar). Its goal is to use one single diffusion model to generate consistent music sources, which are further mixed to form the music. Despite its capabilities, MSDM is unable to generate songs with rich melodies and often generates empty sounds. Also, its waveform diffusion introduces significant Gaussian noise artifacts, which compromises audio quality. In response, we introduce a multi-source latent diffusion model (MSLDM) that employs Variational Autoencoders (VAEs) to encode each instrumental source into a distinct latent representation. By training a VAE on all music sources, we efficiently capture each source's unique characteristics in a source latent that our diffusion model models jointly. This approach significantly enhances the total and partial generation of music by leveraging the VAE's latent compression and noise-robustness. The compressed source latent also facilitates more efficient generation. Subjective listening tests and Frechet Audio Distance (FAD) scores confirm that our model outperforms MSDM, showcasing its practical and enhanced applicability in music generation systems. We also emphasize that modeling sources is more effective than direct music mixture modeling. Codes and models are available at https://github.com/XZWY/MSLDM. Demos are available at https://xzwy.github.io/MSLDMDemo.

URLs: https://github.com/XZWY/MSLDM., https://xzwy.github.io/MSLDMDemo.

cross MTDA-HSED: Mutual-Assistance Tuning and Dual-Branch Aggregating for Heterogeneous Sound Event Detection

Authors: Zehao Wang, Haobo Yue, Zhicheng Zhang, Da Mu, Jin Tang, Jianqin Yin

Abstract: Sound Event Detection (SED) plays a vital role in comprehending and perceiving acoustic scenes. Previous methods have demonstrated impressive capabilities. However, they are deficient in learning features of complex scenes from heterogeneous dataset. In this paper, we introduce a novel dual-branch architecture named Mutual-Assistance Tuning and Dual-Branch Aggregating for Heterogeneous Sound Event Detection (MTDA-HSED). The MTDA-HSED architecture employs the Mutual-Assistance Audio Adapter (M3A) to effectively tackle the multi-scenario problem and uses the Dual-Branch Mid-Fusion (DBMF) module to tackle the multi-granularity problem. Specifically, M3A is integrated into the BEATs block as an adapter to improve the BEATs' performance by fine-tuning it on the multi-scenario dataset. The DBMF module connects BEATs and CNN branches, which facilitates the deep fusion of information from the BEATs and the CNN branches. Experimental results show that the proposed methods exceed the baseline of mpAUC by \textbf{$5\%$} on the DESED and MAESTRO Real datasets. Code is \href{https://github.com/Visitor-W/MTDA}{here}.

URLs: https://github.com/Visitor-W/MTDA

cross MIP-GAF: A MLLM-annotated Benchmark for Most Important Person Localization and Group Context Understanding

Authors: Surbhi Madan, Shreya Ghosh, Lownish Rai Sookha, M. A. Ganaie, Ramanathan Subramanian, Abhinav Dhall, Tom Gedeon

Abstract: Estimating the Most Important Person (MIP) in any social event setup is a challenging problem mainly due to contextual complexity and scarcity of labeled data. Moreover, the causality aspects of MIP estimation are quite subjective and diverse. To this end, we aim to address the problem by annotating a large-scale `in-the-wild' dataset for identifying human perceptions about the `Most Important Person (MIP)' in an image. The paper provides a thorough description of our proposed Multimodal Large Language Model (MLLM) based data annotation strategy, and a thorough data quality analysis. Further, we perform a comprehensive benchmarking of the proposed dataset utilizing state-of-the-art MIP localization methods, indicating a significant drop in performance compared to existing datasets. The performance drop shows that the existing MIP localization algorithms must be more robust with respect to `in-the-wild' situations. We believe the proposed dataset will play a vital role in building the next-generation social situation understanding methods. The code and data is available at https://github.com/surbhimadan92/MIP-GAF.

URLs: https://github.com/surbhimadan92/MIP-GAF.

cross NLP-Powered Repository and Search Engine for Academic Papers: A Case Study on Cyber Risk Literature with CyLit

Authors: Linfeng Zhang, Changyue Hu, Zhiyu Quan

Abstract: As the body of academic literature continues to grow, researchers face increasing difficulties in effectively searching for relevant resources. Existing databases and search engines often fall short of providing a comprehensive and contextually relevant collection of academic literature. To address this issue, we propose a novel framework that leverages Natural Language Processing (NLP) techniques. This framework automates the retrieval, summarization, and clustering of academic literature within a specific research domain. To demonstrate the effectiveness of our approach, we introduce CyLit, an NLP-powered repository specifically designed for the cyber risk literature. CyLit empowers researchers by providing access to context-specific resources and enabling the tracking of trends in the dynamic and rapidly evolving field of cyber risk. Through the automatic processing of large volumes of data, our NLP-powered solution significantly enhances the efficiency and specificity of academic literature searches. We compare the literature categorization results of CyLit to those presented in survey papers or generated by ChatGPT, highlighting the distinctive insights this tool provides into cyber risk research literature. Using NLP techniques, we aim to revolutionize the way researchers discover, analyze, and utilize academic resources, ultimately fostering advancements in various domains of knowledge.

cross Recurrent Neural Networks for Still Images

Authors: Dmitri (Dima), Lvov, Yair Smadar, Ran Bezen

Abstract: In this paper, we explore the application of Recurrent Neural Network (RNN) for still images. Typically, Convolutional Neural Networks (CNNs) are the prevalent method applied for this type of data, and more recently, transformers have gained popularity, although they often require large models. Unlike these methods, RNNs are generally associated with processing sequences over time rather than single images. We argue that RNNs can effectively handle still images by interpreting the pixels as a sequence. This approach could be particularly advantageous for compact models designed for embedded systems, where resources are limited. Additionally, we introduce a novel RNN design tailored for two-dimensional inputs, such as images, and a custom version of BiDirectional RNN (BiRNN) that is more memory-efficient than traditional implementations. In our research, we have tested these layers in Convolutional Recurrent Neural Networks (CRNNs), predominantly composed of Conv2D layers, with RNN layers at or close to the end. Experiments on the COCO and CIFAR100 datasets show better results, particularly for small networks.

cross Market Reaction to News Flows in Supply Chain Networks

Authors: Hiroyasu Inoue, Yasuyuki Todo

Abstract: This study examines whether positive news about firms increases their stock prices and, moreover, whether it increases stock prices of the firms' suppliers and customers, using a large sample of publicly listed firms across the world and another of Japanese listed firms. The level of positiveness of each news article is determined by FinBERT, a natural language processing model fine-tuned specifically for financial information. Supply chains of firms across the world are identified mostly by financial statements, while those of Japanese firms are taken from large-scale firm-level surveys. We find that positive news increases the change rate of stock prices of firms mentioned in the news before its disclosure, most likely because of diffusion of information through informal channels. Positive news also raises stock prices of the firms' suppliers and customers before its disclosure, confirming propagation of market values through supply chains. In addition, we generally find a larger post-news effect on stock prices of the mentioned firms and their suppliers and customers than the pre-news effect. The positive difference between the post- and pre-news effects can be considered as the net effect of the disclosure of positive news, controlling for informal information diffusion. However, the post-news effect on suppliers and customers in Japan is smaller than the pre-news effect, a result opposite to those from firms across the world. This notable result is possibly because supply chain links of Japanese firms are stronger than global supply chains while such knowledge is restricted to selected investors.

cross A new paradigm for global sensitivity analysis

Authors: Gildas Mazo (MaIAGE)

Abstract:

Current theory of global sensitivity analysis, based on a nonlinear functional ANOVA decomposition of the random output, is limited in scope-for instance, the analysis is limited to the output's variance and the inputs have to be mutually independent-and leads to sensitivity indices the interpretation of which is not fully clear, especially interaction effects. Alternatively, sensitivity indices built for arbitrary user-defined importance measures have been proposed but a theory to define interactions in a systematic fashion and/or establish a decomposition of the total importance measure is still missing. It is shown that these important problems are solved all at once by adopting a new paradigm. By partitioning the inputs into those causing the change in the output and those which do not, arbitrary user-defined variability measures are identified with the outcomes of a factorial experiment at two levels, leading to all factorial effects without assuming any functional decomposition. To link various well-known sensitivity indices of the literature (Sobol indices and Shapley effects), weighted factorial effects are studied and utilized.

cross Automate Strategy Finding with LLM in Quant investment

Authors: Zhizhuo Kou, Holam Yu, Jingshu Peng, Lei Chen

Abstract: Despite significant progress in deep learning for financial trading, existing models often face instability and high uncertainty, hindering their practical application. Leveraging advancements in Large Language Models (LLMs) and multi-agent architectures, we propose a novel framework for quantitative stock investment in portfolio management and alpha mining. Our framework addresses these issues by integrating LLMs to generate diversified alphas and employing a multi-agent approach to dynamically evaluate market conditions. This paper proposes a framework where large language models (LLMs) mine alpha factors from multimodal financial data, ensuring a comprehensive understanding of market dynamics. The first module extracts predictive signals by integrating numerical data, research papers, and visual charts. The second module uses ensemble learning to construct a diverse pool of trading agents with varying risk preferences, enhancing strategy performance through a broader market analysis. In the third module, a dynamic weight-gating mechanism selects and assigns weights to the most relevant agents based on real-time market conditions, enabling the creation of an adaptive and context-aware composite alpha formula. Extensive experiments on the Chinese stock markets demonstrate that this framework significantly outperforms state-of-the-art baselines across multiple financial metrics. The results underscore the efficacy of combining LLM-generated alphas with a multi-agent architecture to achieve superior trading performance and stability. This work highlights the potential of AI-driven approaches in enhancing quantitative investment strategies and sets a new benchmark for integrating advanced machine learning techniques in financial trading can also be applied on diverse markets.

cross User Preferences for Large Language Model versus Template-Based Explanations of Movie Recommendations: A Pilot Study

Authors: Julien Albert, Martin Balfroid, Miriam Doh, Jeremie Bogaert, Luca La Fisca, Liesbet De Vos, Bryan Renard, Vincent Stragier, Emmanuel Jean

Abstract: Recommender systems have become integral to our digital experiences, from online shopping to streaming platforms. Still, the rationale behind their suggestions often remains opaque to users. While some systems employ a graph-based approach, offering inherent explainability through paths associating recommended items and seed items, non-experts could not easily understand these explanations. A popular alternative is to convert graph-based explanations into textual ones using a template and an algorithm, which we denote here as ''template-based'' explanations. Yet, these can sometimes come across as impersonal or uninspiring. A novel method would be to employ large language models (LLMs) for this purpose, which we denote as ''LLM-based''. To assess the effectiveness of LLMs in generating more resonant explanations, we conducted a pilot study with 25 participants. They were presented with three explanations: (1) traditional template-based, (2) LLM-based rephrasing of the template output, and (3) purely LLM-based explanations derived from the graph-based explanations. Although subject to high variance, preliminary findings suggest that LLM-based explanations may provide a richer and more engaging user experience, further aligning with user expectations. This study sheds light on the potential limitations of current explanation methods and offers promising directions for leveraging large language models to improve user satisfaction and trust in recommender systems.

cross Modified Meta-Thompson Sampling for Linear Bandits and Its Bayes Regret Analysis

Authors: Hao Li, Dong Liang, Zheng Xie

Abstract: Meta-learning is characterized by its ability to learn how to learn, enabling the adaptation of learning strategies across different tasks. Recent research introduced the Meta-Thompson Sampling (Meta-TS), which meta-learns an unknown prior distribution sampled from a meta-prior by interacting with bandit instances drawn from it. However, its analysis was limited to Gaussian bandit. The contextual multi-armed bandit framework is an extension of the Gaussian Bandit, which challenges agent to utilize context vectors to predict the most valuable arms, optimally balancing exploration and exploitation to minimize regret over time. This paper introduces Meta-TSLB algorithm, a modified Meta-TS for linear contextual bandits. We theoretically analyze Meta-TSLB and derive an $ O\left( \left( m+\log \left( m \right) \right) \sqrt{n\log \left( n \right)} \right)$ bound on its Bayes regret, in which $m$ represents the number of bandit instances, and $n$ the number of rounds of Thompson Sampling. Additionally, our work complements the analysis of Meta-TS for linear contextual bandits. The performance of Meta-TSLB is evaluated experimentally under different settings, and we experimente and analyze the generalization capability of Meta-TSLB, showcasing its potential to adapt to unseen instances.

cross Compute-Update Federated Learning: A Lattice Coding Approach

Authors: Seyed Mohammad Azimi-Abarghouyi, Lav R. Varshney

Abstract: This paper introduces a federated learning framework that enables over-the-air computation via digital communications, using a new joint source-channel coding scheme. Without relying on channel state information at devices, this scheme employs lattice codes to both quantize model parameters and exploit interference from the devices. We propose a novel receiver structure at the server, designed to reliably decode an integer combination of the quantized model parameters as a lattice point for the purpose of aggregation. We present a mathematical approach to derive a convergence bound for the proposed scheme and offer design remarks. In this context, we suggest an aggregation metric and a corresponding algorithm to determine effective integer coefficients for the aggregation in each communication round. Our results illustrate that, regardless of channel dynamics and data heterogeneity, our scheme consistently delivers superior learning accuracy across various parameters and markedly surpasses other over-the-air methodologies.

cross One Policy to Run Them All: an End-to-end Learning Approach to Multi-Embodiment Locomotion

Authors: Nico Bohlinger, Grzegorz Czechmanowski, Maciej Krupka, Piotr Kicki, Krzysztof Walas, Jan Peters, Davide Tateo

Abstract: Deep Reinforcement Learning techniques are achieving state-of-the-art results in robust legged locomotion. While there exists a wide variety of legged platforms such as quadruped, humanoids, and hexapods, the field is still missing a single learning framework that can control all these different embodiments easily and effectively and possibly transfer, zero or few-shot, to unseen robot embodiments. We introduce URMA, the Unified Robot Morphology Architecture, to close this gap. Our framework brings the end-to-end Multi-Task Reinforcement Learning approach to the realm of legged robots, enabling the learned policy to control any type of robot morphology. The key idea of our method is to allow the network to learn an abstract locomotion controller that can be seamlessly shared between embodiments thanks to our morphology-agnostic encoders and decoders. This flexible architecture can be seen as a potential first step in building a foundation model for legged robot locomotion. Our experiments show that URMA can learn a locomotion policy on multiple embodiments that can be easily transferred to unseen robot platforms in simulation and the real world.

cross Sources of Uncertainty in 3D Scene Reconstruction

Authors: Marcus Klasson, Riccardo Mereu, Juho Kannala, Arno Solin

Abstract: The process of 3D scene reconstruction can be affected by numerous uncertainty sources in real-world scenes. While Neural Radiance Fields (NeRFs) and 3D Gaussian Splatting (GS) achieve high-fidelity rendering, they lack built-in mechanisms to directly address or quantify uncertainties arising from the presence of noise, occlusions, confounding outliers, and imprecise camera pose inputs. In this paper, we introduce a taxonomy that categorizes different sources of uncertainty inherent in these methods. Moreover, we extend NeRF- and GS-based methods with uncertainty estimation techniques, including learning uncertainty outputs and ensembles, and perform an empirical study to assess their ability to capture the sensitivity of the reconstruction. Our study highlights the need for addressing various uncertainty aspects when designing NeRF/GS-based methods for uncertainty-aware 3D reconstruction.

cross Exploring the Integration of Large Language Models in Industrial Test Maintenance Processes

Authors: Ludvig Lemner, Linnea Wahlgren, Gregory Gay, Nasser Mohammadiha, Jingxiong Liu, Joakim Wennerberg

Abstract: Much of the cost and effort required during the software testing process is invested in performing test maintenance - the addition, removal, or modification of test cases to keep the test suite in sync with the system-under-test or to otherwise improve its quality. Tool support could reduce the cost - and improve the quality - of test maintenance by automating aspects of the process or by providing guidance and support to developers. In this study, we explore the capabilities and applications of large language models (LLMs) - complex machine learning models adapted to textual analysis - to support test maintenance. We conducted a case study at Ericsson AB where we explored the triggers that indicate the need for test maintenance, the actions that LLMs can take, and the considerations that must be made when deploying LLMs in an industrial setting. We also proposed and demonstrated implementations of two multi-agent architectures that can predict which test cases require maintenance following a change to the source code. Collectively, these contributions advance our theoretical and practical understanding of how LLMs can be deployed to benefit industrial test maintenance processes.

cross GeMuCo: Generalized Multisensory Correlational Model for Body Schema Learning

Authors: Kento Kawaharazuka, Kei Okada, Masayuki Inaba

Abstract: Humans can autonomously learn the relationship between sensation and motion in their own bodies, estimate and control their own body states, and move while continuously adapting to the current environment. On the other hand, current robots control their bodies by learning the network structure described by humans from their experiences, making certain assumptions on the relationship between sensors and actuators. In addition, the network model does not adapt to changes in the robot's body, the tools that are grasped, or the environment, and there is no unified theory, not only for control but also for state estimation, anomaly detection, simulation, and so on. In this study, we propose a Generalized Multisensory Correlational Model (GeMuCo), in which the robot itself acquires a body schema describing the correlation between sensors and actuators from its own experience, including model structures such as network input/output. The robot adapts to the current environment by updating this body schema model online, estimates and controls its body state, and even performs anomaly detection and simulation. We demonstrate the effectiveness of this method by applying it to tool-use considering changes in grasping state for an axis-driven robot, to joint-muscle mapping learning for a musculoskeletal robot, and to full-body tool manipulation for a low-rigidity plastic-made humanoid.

cross Spectral Map for Slow Collective Variables, Markovian Dynamics, and Transition State Ensembles

Authors: Jakub Rydzewski

Abstract: Understanding the behavior of complex molecular systems is a fundamental problem in physical chemistry. To describe the long-time dynamics of such systems, which is responsible for their most informative characteristics, we can identify a few slow collective variables (CVs) while treating the remaining fast variables as thermal noise. This enables us to simplify the dynamics and treat it as diffusion in a free-energy landscape spanned by slow CVs, effectively rendering the dynamics Markovian. Our recent statistical learning technique, spectral map [Rydzewski, J. Phys. Chem. Lett. 2023, 14, 22, 5216-5220], explores this strategy to learn slow CVs by maximizing a spectral gap of a transition matrix. In this work, we introduce several advancements into our framework, using a high-dimensional reversible folding process of a protein as an example. We implement an algorithm for coarse-graining Markov transition matrices to partition the reduced space of slow CVs kinetically and use it to define a transition state ensemble. We show that slow CVs learned by spectral map closely approach the Markovian limit for an overdamped diffusion. We demonstrate that coordinate-dependent diffusion coefficients only slightly affect the constructed free-energy landscapes. Finally, we present how spectral map can be used to quantify the importance of features and compare slow CVs with structural descriptors commonly used in protein folding. Overall, we demonstrate that a single slow CV learned by spectral map can be used as a physical reaction coordinate to capture essential characteristics of protein folding.

cross Fine-tuning and Prompt Engineering with Cognitive Knowledge Graphs for Scholarly Knowledge Organization

Authors: Gollam Rabby, S\"oren Auer, Jennifer D'Souza, Allard Oelen

Abstract: The increasing amount of published scholarly articles, exceeding 2.5 million yearly, raises the challenge for researchers in following scientific progress. Integrating the contributions from scholarly articles into a novel type of cognitive knowledge graph (CKG) will be a crucial element for accessing and organizing scholarly knowledge, surpassing the insights provided by titles and abstracts. This research focuses on effectively conveying structured scholarly knowledge by utilizing large language models (LLMs) to categorize scholarly articles and describe their contributions in a structured and comparable manner. While previous studies explored language models within specific research domains, the extensive domain-independent knowledge captured by LLMs offers a substantial opportunity for generating structured contribution descriptions as CKGs. Additionally, LLMs offer customizable pathways through prompt engineering or fine-tuning, thus facilitating to leveraging of smaller LLMs known for their efficiency, cost-effectiveness, and environmental considerations. Our methodology involves harnessing LLM knowledge, and complementing it with domain expert-verified scholarly data sourced from a CKG. This strategic fusion significantly enhances LLM performance, especially in tasks like scholarly article categorization and predicate recommendation. Our method involves fine-tuning LLMs with CKG knowledge and additionally injecting knowledge from a CKG with a novel prompting technique significantly increasing the accuracy of scholarly knowledge extraction. We integrated our approach in the Open Research Knowledge Graph (ORKG), thus enabling precise access to organized scholarly knowledge, crucially benefiting domain-independent scholarly knowledge exchange and dissemination among policymakers, industrial practitioners, and the general public.

cross HexaCoder: Secure Code Generation via Oracle-Guided Synthetic Training Data

Authors: Hossein Hajipour, Lea Sch\"onherr, Thorsten Holz, Mario Fritz

Abstract: Large language models (LLMs) have shown great potential for automatic code generation and form the basis for various tools such as GitHub Copilot. However, recent studies highlight that many LLM-generated code contains serious security vulnerabilities. While previous work tries to address this by training models that generate secure code, these attempts remain constrained by limited access to training data and labor-intensive data preparation. In this paper, we introduce HexaCoder, a novel approach to enhance the ability of LLMs to generate secure codes by automatically synthesizing secure codes, which reduces the effort of finding suitable training data. HexaCoder comprises two key components: an oracle-guided data synthesis pipeline and a two-step process for secure code generation. The data synthesis pipeline generates pairs of vulnerable and fixed codes for specific Common Weakness Enumeration (CWE) types by utilizing a state-of-the-art LLM for repairing vulnerable code. A security oracle identifies vulnerabilities, and a state-of-the-art LLM repairs them by extending and/or editing the codes, creating data pairs for fine-tuning using the Low-Rank Adaptation (LoRA) method. Each example of our fine-tuning dataset includes the necessary security-related libraries and code that form the basis of our novel two-step generation approach. This allows the model to integrate security-relevant libraries before generating the main code, significantly reducing the number of generated vulnerable codes by up to 85% compared to the baseline methods. We perform extensive evaluations on three different benchmarks for four LLMs, demonstrating that HexaCoder not only improves the security of the generated code but also maintains a high level of functional correctness.

cross Ransomware Detection Using Machine Learning in the Linux Kernel

Authors: Adrian Brodzik, Tomasz Malec-Kruszy\'nski, Wojciech Niewolski, Miko{\l}aj Tkaczyk, Krzysztof Bocianiak, Sok-Yen Loui

Abstract: Linux-based cloud environments have become lucrative targets for ransomware attacks, employing various encryption schemes at unprecedented speeds. Addressing the urgency for real-time ransomware protection, we propose leveraging the extended Berkeley Packet Filter (eBPF) to collect system call information regarding active processes and infer about the data directly at the kernel level. In this study, we implement two Machine Learning (ML) models in eBPF - a decision tree and a multilayer perceptron. Benchmarking latency and accuracy against their user space counterparts, our findings underscore the efficacy of this approach.

cross Superior Computer Chess with Model Predictive Control, Reinforcement Learning, and Rollout

Authors: Atharva Gundawar, Yuchao Li, Dimitri Bertsekas

Abstract: In this paper we apply model predictive control (MPC), rollout, and reinforcement learning (RL) methodologies to computer chess. We introduce a new architecture for move selection, within which available chess engines are used as components. One engine is used to provide position evaluations in an approximation in value space MPC/RL scheme, while a second engine is used as nominal opponent, to emulate or approximate the moves of the true opponent player. We show that our architecture improves substantially the performance of the position evaluation engine. In other words our architecture provides an additional layer of intelligence, on top of the intelligence of the engines on which it is based. This is true for any engine, regardless of its strength: top engines such as Stockfish and Komodo Dragon (of varying strengths), as well as weaker engines. Structurally, our basic architecture selects moves by a one-move lookahead search, with an intermediate move generated by a nominal opponent engine, and followed by a position evaluation by another chess engine. Simpler schemes that forego the use of the nominal opponent, also perform better than the position evaluator, but not quite by as much. More complex schemes, involving multistep lookahead, may also be used and generally tend to perform better as the length of the lookahead increases. Theoretically, our methodology relies on generic cost improvement properties and the superlinear convergence framework of Newton's method, which fundamentally underlies approximation in value space, and related MPC/RL and rollout/policy iteration schemes. A critical requirement of this framework is that the first lookahead step should be executed exactly. This fact has guided our architectural choices, and is apparently an important factor in improving the performance of even the best available chess engines.

cross Learning local and semi-local density functionals from exact exchange-correlation potentials and energies

Authors: Bikash Kanungo, Jeffrey Hatch, Paul M. Zimmerman, Vikram Gavini

Abstract: Finding accurate exchange-correlation (XC) functionals remains the defining challenge in density functional theory (DFT). Despite 40 years of active development, the desired chemical accuracy is still elusive with existing functionals. We present a data-driven pathway to learn the XC functionals by utilizing the exact density, XC energy, and XC potential. While the exact densities are obtained from accurate configuration interaction (CI), the exact XC energies and XC potentials are obtained via inverse DFT calculations on the CI densities. We demonstrate how simple neural network (NN) based local density approximation (LDA) and generalized gradient approximation (GGA), trained on just five atoms and two molecules, provide remarkable improvement in total energies, densities, atomization energies, and barrier heights for hundreds of molecules outside the training set. Particularly, the NN-based GGA functional attains similar accuracy as the higher rung SCAN meta-GGA, highlighting the promise of using the XC potential in modeling XC functionals. We expect this approach to pave the way for systematic learning of increasingly accurate and sophisticated XC functionals.

cross Aligning Machine and Human Visual Representations across Abstraction Levels

Authors: Lukas Muttenthaler, Klaus Greff, Frieda Born, Bernhard Spitzer, Simon Kornblith, Michael C. Mozer, Klaus-Robert M\"uller, Thomas Unterthiner, Andrew K. Lampinen

Abstract: Deep neural networks have achieved success across a wide range of applications, including as models of human behavior in vision tasks. However, neural network training and human learning differ in fundamental ways, and neural networks often fail to generalize as robustly as humans do, raising questions regarding the similarity of their underlying representations. What is missing for modern learning systems to exhibit more human-like behavior? We highlight a key misalignment between vision models and humans: whereas human conceptual knowledge is hierarchically organized from fine- to coarse-scale distinctions, model representations do not accurately capture all these levels of abstraction. To address this misalignment, we first train a teacher model to imitate human judgments, then transfer human-like structure from its representations into pretrained state-of-the-art vision foundation models. These human-aligned models more accurately approximate human behavior and uncertainty across a wide range of similarity tasks, including a new dataset of human judgments spanning multiple levels of semantic abstractions. They also perform better on a diverse set of machine learning tasks, increasing generalization and out-of-distribution robustness. Thus, infusing neural networks with additional human knowledge yields a best-of-both-worlds representation that is both more consistent with human cognition and more practically useful, thus paving the way toward more robust, interpretable, and human-like artificial intelligence systems.

cross Limit Order Book Simulation and Trade Evaluation with $K$-Nearest-Neighbor Resampling

Authors: Michael Giegrich, Roel Oomen, Christoph Reisinger

Abstract: In this paper, we show how $K$-nearest neighbor ($K$-NN) resampling, an off-policy evaluation method proposed in \cite{giegrich2023k}, can be applied to simulate limit order book (LOB) markets and how it can be used to evaluate and calibrate trading strategies. Using historical LOB data, we demonstrate that our simulation method is capable of recreating realistic LOB dynamics and that synthetic trading within the simulation leads to a market impact in line with the corresponding literature. Compared to other statistical LOB simulation methods, our algorithm has theoretical convergence guarantees under general conditions, does not require optimization, is easy to implement and computationally efficient. Furthermore, we show that in a benchmark comparison our method outperforms a deep learning-based algorithm for several key statistics. In the context of a LOB with pro-rata type matching, we demonstrate how our algorithm can calibrate the size of limit orders for a liquidation strategy. Finally, we describe how $K$-NN resampling can be modified for choices of higher dimensional state spaces.

cross Functionally Constrained Algorithm Solves Convex Simple Bilevel Problems

Authors: Huaqing Zhang, Lesi Chen, Jing Xu, Jingzhao Zhang

Abstract: This paper studies simple bilevel problems, where a convex upper-level function is minimized over the optimal solutions of a convex lower-level problem. We first show the fundamental difficulty of simple bilevel problems, that the approximate optimal value of such problems is not obtainable by first-order zero-respecting algorithms. Then we follow recent works to pursue the weak approximate solutions. For this goal, we propose novel near-optimal methods for smooth and nonsmooth problems by reformulating them into functionally constrained problems.

cross Modelling Global Trade with Optimal Transport

Authors: Thomas Gaskin, Marie-Therese Wolfram, Andrew Duncan, Guven Demirel

Abstract: Global trade is shaped by a complex mix of factors beyond supply and demand, including tangible variables like transport costs and tariffs, as well as less quantifiable influences such as political and economic relations. Traditionally, economists model trade using gravity models, which rely on explicit covariates but often struggle to capture these subtler drivers of trade. In this work, we employ optimal transport and a deep neural network to learn a time-dependent cost function from data, without imposing a specific functional form. This approach consistently outperforms traditional gravity models in accuracy while providing natural uncertainty quantification. Applying our framework to global food and agricultural trade, we show that the global South suffered disproportionately from the war in Ukraine's impact on wheat markets. We also analyze the effects of free-trade agreements and trade disputes with China, as well as Brexit's impact on British trade with Europe, uncovering hidden patterns that trade volumes alone cannot reveal.

cross Deep Neural Networks: Multi-Classification and Universal Approximation

Authors: Mart\'in Hern\'andez, Enrique Zuazua

Abstract: We demonstrate that a ReLU deep neural network with a width of $2$ and a depth of $2N+4M-1$ layers can achieve finite sample memorization for any dataset comprising $N$ elements in $\mathbb{R}^d$, where $d\ge1,$ and $M$ classes, thereby ensuring accurate classification. By modeling the neural network as a time-discrete nonlinear dynamical system, we interpret the memorization property as a problem of simultaneous or ensemble controllability. This problem is addressed by constructing the network parameters inductively and explicitly, bypassing the need for training or solving any optimization problem. Additionally, we establish that such a network can achieve universal approximation in $L^p(\Omega;\mathbb{R}_+)$, where $\Omega$ is a bounded subset of $\mathbb{R}^d$ and $p\in[1,\infty)$, using a ReLU deep neural network with a width of $d+1$. We also provide depth estimates for approximating $W^{1,p}$ functions and width estimates for approximating $L^p(\Omega;\mathbb{R}^m)$ for $m\geq1$. Our proofs are constructive, offering explicit values for the biases and weights involved.

cross A Primer on Variational Inference for Physics-Informed Deep Generative Modelling

Authors: Alex Glyn-Davies, Arnaud Vadeboncoeur, O. Deniz Akyildiz, Ieva Kazlauskaite, Mark Girolami

Abstract: Variational inference (VI) is a computationally efficient and scalable methodology for approximate Bayesian inference. It strikes a balance between accuracy of uncertainty quantification and practical tractability. It excels at generative modelling and inversion tasks due to its built-in Bayesian regularisation and flexibility, essential qualities for physics related problems. Deriving the central learning objective for VI must often be tailored to new learning tasks where the nature of the problems dictates the conditional dependence between variables of interest, such as arising in physics problems. In this paper, we provide an accessible and thorough technical introduction to VI for forward and inverse problems, guiding the reader through standard derivations of the VI framework and how it can best be realized through deep learning. We then review and unify recent literature exemplifying the creative flexibility allowed by VI. This paper is designed for a general scientific audience looking to solve physics-based problems with an emphasis on uncertainty quantification.

cross Advancing Causal Inference: A Nonparametric Approach to ATE and CATE Estimation with Continuous Treatments

Authors: Hugo Gobato Souto, Francisco Louzada Neto

Abstract: This paper introduces a generalized ps-BART model for the estimation of Average Treatment Effect (ATE) and Conditional Average Treatment Effect (CATE) in continuous treatments, addressing limitations of the Bayesian Causal Forest (BCF) model. The ps-BART model's nonparametric nature allows for flexibility in capturing nonlinear relationships between treatment and outcome variables. Across three distinct sets of Data Generating Processes (DGPs), the ps-BART model consistently outperforms the BCF model, particularly in highly nonlinear settings. The ps-BART model's robustness in uncertainty estimation and accuracy in both point-wise and probabilistic estimation demonstrate its utility for real-world applications. This research fills a crucial gap in causal inference literature, providing a tool better suited for nonlinear treatment-outcome relationships and opening avenues for further exploration in the domain of continuous treatment effect estimation.

cross Alleviating Hallucinations in Large Language Models with Scepticism Modeling

Authors: Yetao Wu, Yihong Wang, Teng Chen, Chenxi Liu, Ningyuan Xi, Qingqing Gu, Hongyang Lei, Zhonglin Jiang, Yong Chen, Luo Ji

Abstract: Hallucinations is a major challenge for large language models (LLMs), prevents adoption in diverse fields. Uncertainty estimation could be used for alleviating the damages of hallucinations. The skeptical emotion of human could be useful for enhancing the ability of self estimation. Inspirited by this observation, we proposed a new approach called Skepticism Modeling (SM). This approach is formalized by combining the information of token and logits for self estimation. We construct the doubt emotion aware data, perform continual pre-training, and then fine-tune the LLMs, improve their ability of self estimation. Experimental results demonstrate this new approach effectively enhances a model's ability to estimate their uncertainty, and validate its generalization ability of other tasks by out-of-domain experiments.

cross Interactive 3D Segmentation for Primary Gross Tumor Volume in Oropharyngeal Cancer

Authors: Mikko Saukkoriipi, Jaakko Sahlsten, Joel Jaskari, Lotta Orasmaa, Jari Kangas, Nastaran Rasouli, Roope Raisamo, Jussi Hirvonen, Helena Mehtonen, Jorma J\"arnstedt, Antti M\"akitie, Mohamed Naser, Clifton Fuller, Benjamin Kann, Kimmo Kaski

Abstract: The main treatment modality for oropharyngeal cancer (OPC) is radiotherapy, where accurate segmentation of the primary gross tumor volume (GTVp) is essential. However, accurate GTVp segmentation is challenging due to significant interobserver variability and the time-consuming nature of manual annotation, while fully automated methods can occasionally fail. An interactive deep learning (DL) model offers the advantage of automatic high-performance segmentation with the flexibility for user correction when necessary. In this study, we examine interactive DL for GTVp segmentation in OPC. We implement state-of-the-art algorithms and propose a novel two-stage Interactive Click Refinement (2S-ICR) framework. Using the 2021 HEad and neCK TumOR (HECKTOR) dataset for development and an external dataset from The University of Texas MD Anderson Cancer Center for evaluation, the 2S-ICR framework achieves a Dice similarity coefficient of 0.713 $\pm$ 0.152 without user interaction and 0.824 $\pm$ 0.099 after five interactions, outperforming existing methods in both cases.

cross Improving the Precision of CNNs for Magnetic Resonance Spectral Modeling

Authors: John LaMaster, Dhritiman Das, Florian Kofler, Jason Crane, Yan Li, Tobias Lasser, Bjoern H Menze

Abstract: Magnetic resonance spectroscopic imaging is a widely available imaging modality that can non-invasively provide a metabolic profile of the tissue of interest, yet is challenging to integrate clinically. One major reason is the expensive, expert data processing and analysis that is required. Using machine learning to predict MRS-related quantities offers avenues around this problem, but deep learning models bring their own challenges, especially model trust. Current research trends focus primarily on mean error metrics, but comprehensive precision metrics are also needed, e.g. standard deviations, confidence intervals, etc.. This work highlights why more comprehensive error characterization is important and how to improve the precision of CNNs for spectral modeling, a quantitative task. The results highlight advantages and trade-offs of these techniques that should be considered when addressing such regression tasks with CNNs. Detailed insights into the underlying mechanisms of each technique, and how they interact with other techniques, are discussed in depth.

cross DemoStart: Demonstration-led auto-curriculum applied to sim-to-real with multi-fingered robots

Authors: Maria Bauza, Jose Enrique Chen, Valentin Dalibard, Nimrod Gileadi, Roland Hafner, Murilo F. Martins, Joss Moore, Rugile Pevceviciute, Antoine Laurens, Dushyant Rao, Martina Zambelli, Martin Riedmiller, Jon Scholz, Konstantinos Bousmalis, Francesco Nori, Nicolas Heess

Abstract: We present DemoStart, a novel auto-curriculum reinforcement learning method capable of learning complex manipulation behaviors on an arm equipped with a three-fingered robotic hand, from only a sparse reward and a handful of demonstrations in simulation. Learning from simulation drastically reduces the development cycle of behavior generation, and domain randomization techniques are leveraged to achieve successful zero-shot sim-to-real transfer. Transferred policies are learned directly from raw pixels from multiple cameras and robot proprioception. Our approach outperforms policies learned from demonstrations on the real robot and requires 100 times fewer demonstrations, collected in simulation. More details and videos in https://sites.google.com/view/demostart.

URLs: https://sites.google.com/view/demostart.

cross One-Shot Imitation under Mismatched Execution

Authors: Kushal Kedia, Prithwish Dan, Sanjiban Choudhury

Abstract: Human demonstrations as prompts are a powerful way to program robots to do long-horizon manipulation tasks. However, directly translating such demonstrations into robot-executable actions poses significant challenges due to execution mismatches, such as different movement styles and physical capabilities. Existing methods either rely on robot-demonstrator paired data, which is infeasible to scale, or overly rely on frame-level visual similarities, which fail to hold. To address these challenges, we propose RHyME, a novel framework that automatically establishes task execution correspondences between the robot and the demonstrator by using optimal transport costs. Given long-horizon robot demonstrations, RHyME synthesizes semantically equivalent human demonstrations by retrieving and composing similar short-horizon human clips, facilitating effective policy training without the need for paired data. We show that RHyME outperforms a range of baselines across various cross-embodiment datasets on all degrees of mismatches. Through detailed analysis, we uncover insights for learning and leveraging cross-embodiment visual representations.

cross Hierarchical Multi-Label Classification with Missing Information for Benthic Habitat Imagery

Authors: Isaac Xu, Benjamin Misiuk, Scott C. Lowe, Martin Gillis, Craig J. Brown, Thomas Trappenberg

Abstract: In this work, we apply state-of-the-art self-supervised learning techniques on a large dataset of seafloor imagery, \textit{BenthicNet}, and study their performance for a complex hierarchical multi-label (HML) classification downstream task. In particular, we demonstrate the capacity to conduct HML training in scenarios where there exist multiple levels of missing annotation information, an important scenario for handling heterogeneous real-world data collected by multiple research groups with differing data collection protocols. We find that, when using smaller one-hot image label datasets typical of local or regional scale benthic science projects, models pre-trained with self-supervision on a larger collection of in-domain benthic data outperform models pre-trained on ImageNet. In the HML setting, we find the model can attain a deeper and more precise classification if it is pre-trained with self-supervision on in-domain data. We hope this work can establish a benchmark for future models in the field of automated underwater image annotation tasks and can guide work in other domains with hierarchical annotations of mixed resolution.

cross A Practice of Post-Training on Llama-3 70B with Optimal Selection of Additional Language Mixture Ratio

Authors: Ningyuan Xi, Yetao Wu, Kun Fan, Teng Chen, Qingqing Gu, Peng Yu, Jinxian Qu, Chenxi Liu, Zhonglin Jiang, Yong Chen, Luo Ji

Abstract: Large Language Models (LLM) often needs to be Continual Pre-Trained (CPT) to obtain the unfamiliar language skill or adapt into new domains. The huge training cost of CPT often asks for cautious choice of key hyper-parameters such as the mixture ratio of extra language or domain corpus. However, there is no systematic study which bridge the gap between the optimal mixture ratio and the actual model performance, and the gap between experimental scaling law and the actual deployment in the full model size. In this paper, we perform CPT on Llama-3 8B and 70B to enhance its Chinese ability. We study the optimal correlation between the Additional Language Mixture Ratio (ALMR) and the Learning Rate (LR) on the 8B size which directly indicate the optimal experimental set up. By thorough choice of hyper-parameter, and subsequent fine-tuning, the model capability is improved not only on the Chinese-related benchmark, but also some specific domains including math, coding and emotional intelligence. We deploy the final 70B version of LLM on an real-life chat system which obtain satisfying performance.

cross KANtrol: A Physics-Informed Kolmogorov-Arnold Network Framework for Solving Multi-Dimensional and Fractional Optimal Control Problems

Authors: Alireza Afzal Aghaei

Abstract: In this paper, we introduce the KANtrol framework, which utilizes Kolmogorov-Arnold Networks (KANs) to solve optimal control problems involving continuous time variables. We explain how Gaussian quadrature can be employed to approximate the integral parts within the problem, particularly for integro-differential state equations. We also demonstrate how automatic differentiation is utilized to compute exact derivatives for integer-order dynamics, while for fractional derivatives of non-integer order, we employ matrix-vector product discretization within the KAN framework. We tackle multi-dimensional problems, including the optimal control of a 2D heat partial differential equation. The results of our simulations, which cover both forward and parameter identification problems, show that the KANtrol framework outperforms classical MLPs in terms of accuracy and efficiency.

cross Sortformer: Seamless Integration of Speaker Diarization and ASR by Bridging Timestamps and Tokens

Authors: Taejin Park, Ivan Medennikov, Kunal Dhawan, Weiqing Wang, He Huang, Nithin Rao Koluguri, Krishna C. Puvvada, Jagadeesh Balam, Boris Ginsburg

Abstract: We propose Sortformer, a novel neural model for speaker diarization, trained with unconventional objectives compared to existing end-to-end diarization models. The permutation problem in speaker diarization has long been regarded as a critical challenge. Most prior end-to-end diarization systems employ permutation invariant loss (PIL), which optimizes for the permutation that yields the lowest error. In contrast, we introduce Sort Loss, which enables a diarization model to autonomously resolve permutation, with or without PIL. We demonstrate that combining Sort Loss and PIL achieves performance competitive with state-of-the-art end-to-end diarization models trained exclusively with PIL. Crucially, we present a streamlined multispeaker ASR architecture that leverages Sortformer as a speaker supervision model, embedding speaker label estimation within the ASR encoder state using a sinusoidal kernel function. This approach resolves the speaker permutation problem through sorted objectives, effectively bridging speaker-label timestamps and speaker tokens. In our experiments, we show that the proposed multispeaker ASR architecture, enhanced with speaker supervision, improves performance via adapter techniques. Code and trained models will be made publicly available via the NVIDIA NeMo framework

cross Insuring Uninsurable Risks from AI: The State as Insurer of Last Resort

Authors: Cristian Trout

Abstract: Many experts believe that AI systems will sooner or later pose uninsurable risks, including existential risks. This creates an extreme judgment-proof problem: few if any parties can be held accountable ex post in the event of such a catastrophe. This paper proposes a novel solution: a government-provided, mandatory indemnification program for AI developers. The program uses risk-priced indemnity fees to induce socially optimal levels of care. Risk-estimates are determined by surveying experts, including indemnified developers. The Bayesian Truth Serum mechanism is employed to incent honest and effortful responses. Compared to alternatives, this approach arguably better leverages all private information, and provides a clearer signal to indemnified developers regarding what risks they must mitigate to lower their fees. It's recommended that collected fees be used to help fund the safety research developers need, employing a fund matching mechanism (Quadratic Financing) to induce an optimal supply of this public good. Under Quadratic Financing, safety research projects would compete for private contributions from developers, signaling how much each is to be supplemented with public funds.

cross Liability and Insurance for Catastrophic Losses: the Nuclear Power Precedent and Lessons for AI

Authors: Cristian Trout

Abstract: As AI systems become more autonomous and capable, experts warn of them potentially causing catastrophic losses. Drawing on the successful precedent set by the nuclear power industry, this paper argues that developers of frontier AI models should be assigned limited, strict, and exclusive third party liability for harms resulting from Critical AI Occurrences (CAIOs) - events that cause or easily could have caused catastrophic losses. Mandatory insurance for CAIO liability is recommended to overcome developers' judgment-proofness, mitigate winner's curse dynamics, and leverage insurers' quasi-regulatory abilities. Based on theoretical arguments and observations from the analogous nuclear power context, insurers are expected to engage in a mix of causal risk-modeling, monitoring, lobbying for stricter regulation, and providing loss prevention guidance in the context of insuring against heavy-tail risks from AI. While not a substitute for regulation, clear liability assignment and mandatory insurance can help efficiently allocate resources to risk-modeling and safe design, facilitating future regulatory efforts.

replace Cross Layer Optimization and Distributed Reinforcement Learning for Wireless 360{\deg} Video Streaming

Authors: Anis Elgabli, Mohammed S. Elbamby, Cristina Perfecto, Mounssif Krouka, Mehdi Bennis, Vaneet Aggarwal

Abstract: Wirelessly streaming high quality 360 degree videos is still a challenging problem. When there are many users watching different 360 degree videos and competing for the computing and communication resources, the streaming algorithm at hand should maximize the average quality of experience (QoE) while guaranteeing a minimum rate for each user. In this paper, we propose a cross layer optimization approach that maximizes the available rate to each user and efficiently uses it to maximize users' QoE. Particularly, we consider a tile based 360 degree video streaming, and we optimize a QoE metric that balances the tradeoff between maximizing each user's QoE and ensuring fairness among users. We show that the problem can be decoupled into two interrelated subproblems: (i) a physical layer subproblem whose objective is to find the download rate for each user, and (ii) an application layer subproblem whose objective is to use that rate to find a quality decision per tile such that the user's QoE is maximized. We prove that the physical layer subproblem can be solved optimally with low complexity and an actor-critic deep reinforcement learning (DRL) is proposed to leverage the parallel training of multiple independent agents and solve the application layer subproblem. Extensive experiments reveal the robustness of our scheme and demonstrate its significant performance improvement compared to several baseline algorithms.

replace Efficient Private SCO for Heavy-Tailed Data via Averaged Clipping

Authors: Chenhan Jin, Kaiwen Zhou, Bo Han, James Cheng, Tieyong Zeng

Abstract: We consider stochastic convex optimization for heavy-tailed data with the guarantee of being differentially private (DP). Most prior works on differentially private stochastic convex optimization for heavy-tailed data are either restricted to gradient descent (GD) or performed multi-times clipping on stochastic gradient descent (SGD), which is inefficient for large-scale problems. In this paper, we consider a one-time clipping strategy and provide principled analyses of its bias and private mean estimation. We establish new convergence results and improved complexity bounds for the proposed algorithm called AClipped-dpSGD for constrained and unconstrained convex problems. We also extend our convergent analysis to the strongly convex case and non-smooth case (which works for generalized smooth objectives with H$\ddot{\text{o}}$lder-continuous gradients). All the above results are guaranteed with a high probability for heavy-tailed data. Numerical experiments are conducted to justify the theoretical improvement.

replace Mean-Field Approximation of Cooperative Constrained Multi-Agent Reinforcement Learning (CMARL)

Authors: Washim Uddin Mondal, Vaneet Aggarwal, Satish V. Ukkusuri

Abstract: Mean-Field Control (MFC) has recently been proven to be a scalable tool to approximately solve large-scale multi-agent reinforcement learning (MARL) problems. However, these studies are typically limited to unconstrained cumulative reward maximization framework. In this paper, we show that one can use the MFC approach to approximate the MARL problem even in the presence of constraints. Specifically, we prove that, an $N$-agent constrained MARL problem, with state, and action spaces of each individual agents being of sizes $|\mathcal{X}|$, and $|\mathcal{U}|$ respectively, can be approximated by an associated constrained MFC problem with an error, $e\triangleq \mathcal{O}\left([\sqrt{|\mathcal{X}|}+\sqrt{|\mathcal{U}|}]/\sqrt{N}\right)$. In a special case where the reward, cost, and state transition functions are independent of the action distribution of the population, we prove that the error can be improved to $e=\mathcal{O}(\sqrt{|\mathcal{X}|}/\sqrt{N})$. Also, we provide a Natural Policy Gradient based algorithm and prove that it can solve the constrained MARL problem within an error of $\mathcal{O}(e)$ with a sample complexity of $\mathcal{O}(e^{-6})$.

replace Surrogate uncertainty estimation for your time series forecasting black-box: learn when to trust

Authors: Leonid Erlygin, Vladimir Zholobov, Valeriia Baklanova, Evgeny Sokolovskiy, Alexey Zaytsev

Abstract: Machine learning models play a vital role in time series forecasting. These models, however, often overlook an important element: point uncertainty estimates. Incorporating these estimates is crucial for effective risk management, informed model selection, and decision-making.To address this issue, our research introduces a method for uncertainty estimation. We employ a surrogate Gaussian process regression model. It enhances any base regression model with reasonable uncertainty estimates. This approach stands out for its computational efficiency. It only necessitates training one supplementary surrogate and avoids any data-specific assumptions. Furthermore, this method for work requires only the presence of the base model as a black box and its respective training data. The effectiveness of our approach is supported by experimental results. Using various time-series forecasting data, we found that our surrogate model-based technique delivers significantly more accurate confidence intervals. These techniques outperform both bootstrap-based and built-in methods in a medium-data regime. This superiority holds across a range of base model types, including a linear regression, ARIMA, gradient boosting and a neural network.

replace Memory of recurrent networks: Do we compute it right?

Authors: Giovanni Ballarin, Lyudmila Grigoryeva, Juan-Pablo Ortega

Abstract: Numerical evaluations of the memory capacity (MC) of recurrent neural networks reported in the literature often contradict well-established theoretical bounds. In this paper, we study the case of linear echo state networks, for which the total memory capacity has been proven to be equal to the rank of the corresponding Kalman controllability matrix. We shed light on various reasons for the inaccurate numerical estimations of the memory, and we show that these issues, often overlooked in the recent literature, are of an exclusively numerical nature. More explicitly, we prove that when the Krylov structure of the linear MC is ignored, a gap between the theoretical MC and its empirical counterpart is introduced. As a solution, we develop robust numerical approaches by exploiting a result of MC neutrality with respect to the input mask matrix. Simulations show that the memory curves that are recovered using the proposed methods fully agree with the theory.

replace Optimal Differentially Private Model Training with Public Data

Authors: Andrew Lowy, Zeman Li, Tianjian Huang, Meisam Razaviyayn

Abstract: Differential privacy (DP) ensures that training a machine learning model does not leak private data. In practice, we may have access to auxiliary public data that is free of privacy concerns. In this work, we assume access to a given amount of public data and settle the following fundamental open questions: 1. What is the optimal (worst-case) error of a DP model trained over a private data set while having access to side public data? 2. How can we harness public data to improve DP model training in practice? We consider these questions in both the local and central models of pure and approximate DP. To answer the first question, we prove tight (up to log factors) lower and upper bounds that characterize the optimal error rates of three fundamental problems: mean estimation, empirical risk minimization, and stochastic convex optimization. We show that the optimal error rates can be attained (up to log factors) by either discarding private data and training a public model, or treating public data like it is private and using an optimal DP algorithm. To address the second question, we develop novel algorithms that are "even more optimal" (i.e. better constants) than the asymptotically optimal approaches described above. For local DP mean estimation, our algorithm is optimal including constants. Empirically, our algorithms show benefits over the state-of-the-art.

replace NeFL: Nested Model Scaling for Federated Learning with System Heterogeneous Clients

Authors: Honggu Kang, Seohyeon Cha, Jinwoo Shin, Jongmyeong Lee, Joonhyuk Kang

Abstract: Federated learning (FL) enables distributed training while preserving data privacy, but stragglers-slow or incapable clients-can significantly slow down the total training time and degrade performance. To mitigate the impact of stragglers, system heterogeneity, including heterogeneous computing and network bandwidth, has been addressed. While previous studies have addressed system heterogeneity by splitting models into submodels, they offer limited flexibility in model architecture design, without considering potential inconsistencies arising from training multiple submodel architectures. We propose nested federated learning (NeFL), a generalized framework that efficiently divides deep neural networks into submodels using both depthwise and widthwise scaling. To address the inconsistency arising from training multiple submodel architectures, NeFL decouples a subset of parameters from those being trained for each submodel. An averaging method is proposed to handle these decoupled parameters during aggregation. NeFL enables resource-constrained devices to effectively participate in the FL pipeline, facilitating larger datasets for model training. Experiments demonstrate that NeFL achieves performance gain, especially for the worst-case submodel compared to baseline approaches (7.63% improvement on CIFAR-100). Furthermore, NeFL aligns with recent advances in FL, such as leveraging pre-trained models and accounting for statistical heterogeneity. Our code is available online.

replace Hybrid Focal and Full-Range Attention Based Graph Transformers

Authors: Minhong Zhu, Zhenhao Zhao, Weiran Cai

Abstract: The paradigm of Transformers using the self-attention mechanism has manifested its advantage in learning graph-structured data. Yet, Graph Transformers are capable of modeling full range dependencies but are often deficient in extracting information from locality. A common practice is to utilize Message Passing Neural Networks (MPNNs) as an auxiliary to capture local information, which however are still inadequate for comprehending substructures. In this paper, we present a purely attention-based architecture, namely Focal and Full-Range Graph Transformer (FFGT), which can mitigate the loss of local information in learning global correlations. The core component of FFGT is a new mechanism of compound attention, which combines the conventional full-range attention with K-hop focal attention on ego-nets to aggregate both global and local information. Beyond the scope of canonical Transformers, the FFGT has the merit of being more substructure-aware. Our approach enhances the performance of existing Graph Transformers on various open datasets, while achieves compatible SOTA performance on several Long-Range Graph Benchmark (LRGB) datasets even with a vanilla transformer. We further examine influential factors on the optimal focal length of attention via introducing a novel synthetic dataset based on SBM-PATTERN.

replace Multi-intention Inverse Q-learning for Interpretable Behavior Representation

Authors: Hao Zhu, Brice De La Crompe, Gabriel Kalweit, Artur Schneider, Maria Kalweit, Ilka Diester, Joschka Boedecker

Abstract: In advancing the understanding of natural decision-making processes, inverse reinforcement learning (IRL) methods have proven instrumental in reconstructing animal's intentions underlying complex behaviors. Given the recent development of a continuous-time multi-intention IRL framework, there has been persistent inquiry into inferring discrete time-varying rewards with IRL. To address this challenge, we introduce the class of hierarchical inverse Q-learning (HIQL) algorithms. Through an unsupervised learning process, HIQL divides expert trajectories into multiple intention segments, and solves the IRL problem independently for each. Applying HIQL to simulated experiments and several real animal behavior datasets, our approach outperforms current benchmarks in behavior prediction and produces interpretable reward functions. Our results suggest that the intention transition dynamics underlying complex decision-making behavior is better modeled by a step function instead of a smoothly varying function. This advancement holds promise for neuroscience and cognitive science, contributing to a deeper understanding of decision-making and uncovering underlying brain mechanisms.

replace vTrain: A Simulation Framework for Evaluating Cost-effective and Compute-optimal Large Language Model Training

Authors: Jehyeon Bang, Yujeong Choi, Myeongwoo Kim, Yongdeok Kim, Minsoo Rhu

Abstract: As large language models (LLMs) become widespread in various application domains, a critical challenge the AI community is facing is how to train these large AI models in a cost-effective manner. Existing LLM training plans typically employ a heuristic based parallel training strategy which is based on empirical observations rather than grounded upon a thorough examination of the search space of LLM parallelization. Such limitation renders existing systems to leave significant performance left on the table, wasting millions of dollars worth of training cost. This paper presents our profiling-driven simulator called vTrain, providing AI practitioners a fast yet accurate software framework to determine an efficient and cost-effective LLM training system configuration. We demonstrate vTrain's practicality through several case studies, e.g., effectively evaluating optimal training parallelization strategies that balances training time and its associated training cost, efficient multi-tenant GPU cluster schedulers targeting multiple LLM training jobs, and determining a compute-optimal LLM model architecture given a fixed compute budget.

replace Stabilizing Sharpness-aware Minimization Through A Simple Renormalization Strategy

Authors: Chengli Tan, Jiangshe Zhang, Junmin Liu, Yicheng Wang, Yunda Hao

Abstract: Recently, sharpness-aware minimization (SAM) has attracted much attention because of its surprising effectiveness in improving generalization performance. However, compared to stochastic gradient descent (SGD), it is more prone to getting stuck at the saddle points, which as a result may lead to performance degradation. To address this issue, we propose a simple renormalization strategy, dubbed Stable SAM (SSAM), so that the gradient norm of the descent step maintains the same as that of the ascent step. Our strategy is easy to implement and flexible enough to integrate with SAM and its variants, almost at no computational cost. With elementary tools from convex optimization and learning theory, we also conduct a theoretical analysis of sharpness-aware training, revealing that compared to SGD, the effectiveness of SAM is only assured in a limited regime of learning rate. In contrast, we show how SSAM extends this regime of learning rate and then it can consistently perform better than SAM with the minor modification. Finally, we demonstrate the improved performance of SSAM on several representative data sets and tasks.

replace A Survey on Self-Supervised Learning for Non-Sequential Tabular Data

Authors: Wei-Yao Wang, Wei-Wei Du, Derek Xu, Wei Wang, Wen-Chih Peng

Abstract: Self-supervised learning (SSL) has been incorporated into many state-of-the-art models in various domains, where SSL defines pretext tasks based on unlabeled datasets to learn contextualized and robust representations. Recently, SSL has become a new trend in exploring the representation learning capability in the realm of tabular data, which is more challenging due to not having explicit relations for learning descriptive representations. This survey aims to systematically review and summarize the recent progress and challenges of SSL for non-sequential tabular data (SSL4NS-TD). We first present a formal definition of NS-TD and clarify its correlation to related studies. Then, these approaches are categorized into three groups - predictive learning, contrastive learning, and hybrid learning, with their motivations and strengths of representative methods in each direction. Moreover, application issues of SSL4NS-TD are presented, including automatic data engineering, cross-table transferability, and domain knowledge integration. In addition, we elaborate on existing benchmarks and datasets for NS-TD applications to analyze the performance of existing tabular models. Finally, we discuss the challenges of SSL4NS-TD and provide potential directions for future research. We expect our work to be useful in terms of encouraging more research on lowering the barrier to entry SSL for the tabular domain, and of improving the foundations for implicit tabular data.

replace Multi-Margin Cosine Loss: Proposal and Application in Recommender Systems

Authors: Makbule Gulcin Ozsoy

Abstract: Recommender systems guide users through vast amounts of information by suggesting items based on their predicted preferences. Collaborative filtering-based deep learning techniques have regained popularity due to their straightforward nature, relying only on user-item interactions. Typically, these systems consist of three main components: an interaction module, a loss function, and a negative sampling strategy. Initially, researchers focused on enhancing performance by developing complex interaction modules. However, there has been a recent shift toward refining loss functions and negative sampling strategies. This shift has led to an increased interest in contrastive learning, which pulls similar pairs closer while pushing dissimilar ones apart. Contrastive learning may bring challenges like high memory demands and under-utilization of some negative samples. The proposed Multi-Margin Cosine Loss (MMCL) addresses these challenges by introducing multiple margins and varying weights for negative samples. It efficiently utilizes not only the hardest negatives but also other non-trivial negatives, offers a simpler yet effective loss function that outperforms more complex methods, especially when resources are limited. Experiments on two well-known datasets demonstrated that MMCL achieved up to a 20\% performance improvement compared to a baseline loss function when fewer number of negative samples are used.

replace DispaRisk: Auditing Fairness Through Usable Information

Authors: Jonathan Vasquez, Carlotta Domeniconi, Huzefa Rangwala

Abstract: Machine Learning algorithms (ML) impact virtually every aspect of human lives and have found use across diverse sectors including healthcare, finance, and education. Often, ML algorithms have been found to exacerbate societal biases present in datasets leading to adversarial impacts on subsets/groups of individuals and in many cases on minority groups. To effectively mitigate these untoward effects, it is crucial that disparities/biases are identified early in a ML pipeline. This proactive approach facilitates timely interventions to prevent bias amplification and reduce complexity at later stages of model development. In this paper, we leverage recent advancements in usable information theory to introduce DispaRisk, a novel framework designed to proactively assess the potential risks of disparities in datasets during the initial stages of the ML pipeline. We evaluate DispaRisk's effectiveness by benchmarking it against commonly used datasets in fairness research. Our findings demonstrate DispaRisk's capabilities to identify datasets with a high risk of discrimination, detect model families prone to biases within an ML pipeline, and enhance the explainability of these bias risks. This work contributes to the development of fairer ML systems by providing a robust tool for early bias detection and mitigation. The code for our experiments is available in the following repository: https://github.com/jovasque156/disparisk

URLs: https://github.com/jovasque156/disparisk

replace Higher Order Structures For Graph Explanations

Authors: Akshit Sinha, Sreeram Vennam, Charu Sharma, Ponnurangam Kumaraguru

Abstract: Graph Neural Networks (GNNs) have emerged as powerful tools for learning representations of graph-structured data, demonstrating remarkable performance across various tasks. Recognising their importance, there has been extensive research focused on explaining GNN predictions, aiming to enhance their interpretability and trustworthiness. However, GNNs and their explainers face a notable challenge: graphs are primarily designed to model pair-wise relationships between nodes, which can make it tough to capture higher-order, multi-node interactions. This characteristic can pose difficulties for existing explainers in fully representing multi-node relationships. To address this gap, we present Framework For Higher-Order Representations In Graph Explanations (FORGE), a framework that enables graph explainers to capture such interactions by incorporating higher-order structures, resulting in more accurate and faithful explanations. Extensive evaluation shows that on average real-world datasets from the GraphXAI benchmark and synthetic datasets across various graph explainers, FORGE improves average explanation accuracy by 1.9x and 2.25x, respectively. We perform ablation studies to confirm the importance of higher-order relations in improving explanations, while our scalability analysis demonstrates FORGE's efficacy on large graphs.

replace Deeper-PINNs: Element-wise Multiplication Based Physics-informed Neural Networks

Authors: Feilong Jiang, Xiaonan Hou, Min Xia

Abstract: As a promising framework for resolving partial differential equations (PDEs), physics-informed neural networks (PINNs) have received widespread attention from industrial and scientific fields. However, lack of expressive ability and initialization pathology issues are found to prevent the application of PINNs in complex PDEs. In this work, we propose Deeper Physics-Informed Neural Network (Deeper-PINN) to resolve these issues. The element-wise multiplication operation is adopted to transform features into high-dimensional, non-linear spaces. Benefiting from element-wise multiplication operation, Deeper-PINNs can alleviate the initialization pathologies of PINNs and enhance the expressive capability of PINNs. The proposed structure is verified on various benchmarks. The results show that Deeper-PINNs can effectively resolve the initialization pathology and exhibit strong expressive ability.

replace A Manifold Perspective on the Statistical Generalization of Graph Neural Networks

Authors: Zhiyang Wang, Juan Cervino, Alejandro Ribeiro

Abstract: Convolutional neural networks have been successfully extended to operate on graphs, giving rise to Graph Neural Networks (GNNs). GNNs combine information from adjacent nodes by successive applications of graph convolutions. GNNs have been implemented successfully in various learning tasks while the theoretical understanding of their generalization capability is still in progress. In this paper, we leverage manifold theory to analyze the statistical generalization gap of GNNs operating on graphs constructed on sampled points from manifolds. We study the generalization gaps of GNNs on both node-level and graph-level tasks. We show that the generalization gaps decrease with the number of nodes in the training graphs, which guarantees the generalization of GNNs to unseen points over manifolds. We validate our theoretical results in multiple real-world datasets.

replace Pre-Training and Personalized Fine-Tuning via Over-the-Air Federated Meta-Learning: Convergence-Generalization Trade-Offs

Authors: Haifeng Wen, Hong Xing, Osvaldo Simeone

Abstract: For modern artificial intelligence (AI) applications such as large language models (LLMs), the training paradigm has recently shifted to pre-training followed by fine-tuning. Furthermore, owing to dwindling open repositories of data and thanks to efforts to democratize access to AI models, pre-training is expected to increasingly migrate from the current centralized deployments to federated learning (FL) implementations. Meta-learning provides a general framework in which pre-training and fine-tuning can be formalized. Meta-learning-based personalized FL (meta-pFL) moves beyond basic personalization by targeting generalization to new agents and tasks. This paper studies the generalization performance of meta-pFL for a wireless setting in which the agents participating in the pre-training phase, i.e., meta-learning, are connected via a shared wireless channel to the server. Adopting over-the-air computing, we study the trade-off between generalization to new agents and tasks, on the one hand, and convergence, on the other hand. The trade-off arises from the fact that channel impairments may enhance generalization, while degrading convergence. Extensive numerical results validate the theory.

replace Physics-Inspired Deep Learning and Transferable Models for Bridge Scour Prediction

Authors: Negin Yousefpour, Bo Wang

Abstract: This paper introduces scour physics-inspired neural networks (SPINNs), a hybrid physics-data-driven framework for bridge scour prediction using deep learning. SPINNs integrate physics-based, empirical equations into deep neural networks and are trained using site-specific historical scour monitoring data. Long-short Term Memory Network (LSTM) and Convolutional Neural Network (CNN) are considered as the base deep learning (DL) models. We also explore transferable/general models, trained by aggregating datasets from a cluster of bridges, versus the site/bridge-specific models. Despite variation in performance, SPINNs outperformed pure data-driven models in the majority of cases. In some bridge cases, SPINN reduced forecasting errors by up to 50 percent. The pure data-driven models showed better transferability compared to hybrid models. The transferable DL models particularly proved effective for bridges with limited data. In addition, the calibrated time-dependent empirical equations derived from SPINNs showed great potential for maximum scour depth estimation, providing more accurate predictions compared to commonly used HEC-18 model. Comparing SPINNs with traditional empirical models indicates substantial improvements in scour prediction accuracy. This study can pave the way for further exploration of physics-inspired machine learning methods for scour prediction.

replace STD-PLM: Understanding Both Spatial and Temporal Properties of Spatial-Temporal Data with PLM

Authors: YiHeng Huang, Xiaowei Mao, Shengnan Guo, Yubin Chen, Junfeng Shen, Tiankuo Li, Youfang Lin, Huaiyu Wan

Abstract: Spatial-temporal forecasting and imputation are important for real-world intelligent systems. Most existing methods are tailored for individual forecasting or imputation tasks but are not designed for both. Additionally, they are less effective for zero-shot and few-shot learning. While pre-trained language model (PLM) have exhibited strong pattern recognition and reasoning abilities across various tasks, including few-shot and zero-shot learning, their applications in spatial-temporal data understanding has been constrained by insufficient modeling of complex correlations such as the temporal correlations, spatial connectivity, non-pairwise and high-order spatial-temporal correlations within data. In this paper, we propose STD-PLM for understanding both spatial and temporal properties of \underline{S}patial-\underline{T}emporal \underline{D}ata with \underline{PLM}, which is capable of implementing both spatial-temporal forecasting and imputation tasks. STD-PLM understands spatial-temporal correlations via explicitly designed spatial and temporal tokenizers. Topology-aware node embeddings are designed for PLM to comprehend and exploit the topology structure of data in inductive manner. Furthermore, to mitigate the efficiency issues introduced by the PLM, we design a sandglass attention module (SGA) combined with a specific constrained loss function, which significantly improves the model's efficiency while ensuring performance. Extensive experiments demonstrate that STD-PLM exhibits competitive performance and generalization capabilities across the forecasting and imputation tasks on various datasets. Moreover, STD-PLM achieves promising results on both few-shot and zero-shot tasks.The code is made available at \href{https://anonymous.4open.science/r/STD-PLM-F3BA}{https://anonymous.4open.science/r/STD-PLM-F3BA}

URLs: https://anonymous.4open.science/r/STD-PLM-F3BA, https://anonymous.4open.science/r/STD-PLM-F3BA

replace ZNorm: Z-Score Gradient Normalization for Deep Neural Networks

Authors: Juyoung Yun, Hoyoung Kim

Abstract: The rapid advancements in deep learning necessitate better training methods for deep neural networks (DNNs). As models grow in complexity, vanishing and exploding gradients impede performance. We propose Z-Score Normalization for Gradient Descent (ZNorm), an innovative technique that adjusts only the gradients to accelerate training and improve model performance. ZNorm normalizes the overall gradients, providing consistent gradient scaling across layers, thereby reducing the risks of vanishing and exploding gradients, having better performances. Our extensive experiments on CIFAR-10 and medical datasets demonstrate that ZNorm enhances performance metrics. ZNorm consistently outperforms existing methods, achieving superior results using the same experimental settings. In medical imaging applications, ZNorm improves tumor prediction and segmentation performances, underscoring its practical utility. These findings highlight ZNorm's potential as a robust and versatile tool for enhancing the training speed and effectiveness of deep neural networks across a wide range of architectures and applications.

replace Probabilistic energy forecasting through quantile regression in reproducing kernel Hilbert spaces

Authors: Luca Pernigo, Rohan Sen, Davide Baroli

Abstract: Accurate energy demand forecasting is crucial for sustainable and resilient energy development. To meet the Net Zero Representative Concentration Pathways (RCP) $4.5$ scenario in the DACH countries, increased renewable energy production, energy storage, and reduced commercial building consumption are needed. This scenario's success depends on hydroelectric capacity and climatic factors. Informed decisions require quantifying uncertainty in forecasts. This study explores a non-parametric method based on \emph{reproducing kernel Hilbert spaces (RKHS)}, known as kernel quantile regression, for energy prediction. Our experiments demonstrate its reliability and sharpness, and we benchmark it against state-of-the-art methods in load and price forecasting for the DACH region. We offer our implementation in conjunction with additional scripts to ensure the reproducibility of our research.

replace Generalization of Graph Neural Networks is Robust to Model Mismatch

Authors: Zhiyang Wang, Juan Cervino, Alejandro Ribeiro

Abstract: Graph neural networks (GNNs) have demonstrated their effectiveness in various tasks supported by their generalization capabilities. However, the current analysis of GNN generalization relies on the assumption that training and testing data are independent and identically distributed (i.i.d). This imposes limitations on the cases where a model mismatch exists when generating testing data. In this paper, we examine GNNs that operate on geometric graphs generated from manifold models, explicitly focusing on scenarios where there is a mismatch between manifold models generating training and testing data. Our analysis reveals the robustness of the GNN generalization in the presence of such model mismatch. This indicates that GNNs trained on graphs generated from a manifold can still generalize well to unseen nodes and graphs generated from a mismatched manifold. We attribute this mismatch to both node feature perturbations and edge perturbations within the generated graph. Our findings indicate that the generalization gap decreases as the number of nodes grows in the training graph while increasing with larger manifold dimension as well as larger mismatch. Importantly, we observe a trade-off between the generalization of GNNs and the capability to discriminate high-frequency components when facing a model mismatch. The most important practical consequence of this analysis is to shed light on the filter design of generalizable GNNs robust to model mismatch. We verify our theoretical findings with experiments on multiple real-world datasets.

replace SORSA: Singular Values and Orthonormal Regularized Singular Vectors Adaptation of Large Language Models

Authors: Yang Cao

Abstract: The rapid advancement in large language models (LLMs) comes with a significant increase in their parameter size, presenting challenges for adaptation and fine-tuning. Parameter-efficient fine-tuning (PEFT) methods are widely used to adapt LLMs for downstream tasks efficiently. In this paper, we propose Singular Values and Orthonormal Regularized Singular Vectors Adaptation, or SORSA, a novel PEFT method. We introduce a method to analyze the variation of the parameters by performing singular value decomposition (SVD) and discuss and analyze SORSA's superiority in minimizing the alteration in the SVD aspect. Each SORSA adapter consists of two main parts: trainable principal singular weights $W_p = U_p \Sigma_p V^\top_p$, and frozen residual weights $W_r = U_r \Sigma_r V^\top_r$. These parts are initialized by performing SVD on pre-trained weights. Moreover, we implement and analyze an orthonormal regularizer, which could effectively transfer the scaling information into $\Sigma_p$ and ultimately allows the training process to be more efficient. SORSA adapters could be merged during inference, thus eliminating any inference latency. After all, SORSA shows a faster convergence than PiSSA and LoRA in our experiments. On the MATH benchmark, Llama 2 7B adapted using SORSA achieved 10.36% accuracy, outperforming LoRA (5.50%), Full FT (7.22%), and PiSSA (7.44%). On the GSM-8K benchmark, SORSA achieved 56.03% accuracy, surpassing LoRA (42.30%), Full FT (49.05%), and PiSSA (53.07%). We conclude that SORSA offers a new perspective on parameter-efficient fine-tuning, demonstrating remarkable performance. The code is available at https://github.com/Gunale0926/SORSA.

URLs: https://github.com/Gunale0926/SORSA.

replace Dreaming is All You Need

Authors: Mingze Ni, Wei Liu

Abstract: In classification tasks, achieving a harmonious balance between exploration and precision is of paramount importance. To this end, this research introduces two novel deep learning models, SleepNet and DreamNet, to strike this balance. SleepNet seamlessly integrates supervised learning with unsupervised ``sleep" stages using pre-trained encoder models. Dedicated neurons within SleepNet are embedded in these unsupervised features, forming intermittent ``sleep" blocks that facilitate exploratory learning. Building upon the foundation of SleepNet, DreamNet employs full encoder-decoder frameworks to reconstruct the hidden states, mimicking the human "dreaming" process. This reconstruction process enables further exploration and refinement of the learned representations. Moreover, the principle ideas of our SleepNet and DreamNet are generic and can be applied to both computer vision and natural language processing downstream tasks. Through extensive empirical evaluations on diverse image and text datasets, SleepNet and DreanNet have demonstrated superior performance compared to state-of-the-art models, showcasing the strengths of unsupervised exploration and supervised precision afforded by our innovative approaches.

replace Optimal Neural Network Approximation for High-Dimensional Continuous Functions

Authors: Ayan Maiti, Michelle Michelle, Haizhao Yang

Abstract: Recently, the authors of Shen Yang Zhang (JMLR, 2022) developed a neural network with width $36d(2d + 1)$ and depth $11$, which utilizes a special activation function called the elementary universal activation function, to achieve the super approximation property for functions in $C([a,b]^d)$. That is, the constructed network only requires a fixed number of neurons to approximate a $d$-variate continuous function on a $d$-dimensional hypercube with arbitrary accuracy. Their network uses $\mathcal{O}(d^2)$ fixed neurons. One natural question to address is whether we can reduce the number of these neurons in such a network. By leveraging a variant of the Kolmogorov Superposition Theorem, our analysis shows that there is a neural network generated by the elementary universal activation function with only $366d +365$ fixed, intrinsic (non-repeated) neurons that attains this super approximation property. Furthermore, we present a family of continuous functions that requires at least width $d$, and therefore at least $d$ intrinsic neurons, to achieve arbitrary accuracy in its approximation. This shows that the requirement of $\mathcal{O}(d)$ intrinsic neurons is optimal in the sense that it grows linearly with the input dimension $d$, unlike some approximation methods where parameters may grow exponentially with $d$.

replace Epistemic Uncertainty and Observation Noise with the Neural Tangent Kernel

Authors: Sergio Calvo-Ordo\~nez, Konstantina Palla, Kamil Ciosek

Abstract: Recent work has shown that training wide neural networks with gradient descent is formally equivalent to computing the mean of the posterior distribution in a Gaussian Process (GP) with the Neural Tangent Kernel (NTK) as the prior covariance and zero aleatoric noise \parencite{jacot2018neural}. In this paper, we extend this framework in two ways. First, we show how to deal with non-zero aleatoric noise. Second, we derive an estimator for the posterior covariance, giving us a handle on epistemic uncertainty. Our proposed approach integrates seamlessly with standard training pipelines, as it involves training a small number of additional predictors using gradient descent on a mean squared error loss. We demonstrate the proof-of-concept of our method through empirical evaluation on synthetic regression.

replace Gaussian-Mixture-Model Q-Functions for Reinforcement Learning by Riemannian Optimization

Authors: Minh Vu, Konstantinos Slavakis

Abstract: This paper establishes a novel role for Gaussian-mixture models (GMMs) as functional approximators of Q-function losses in reinforcement learning (RL). Unlike the existing RL literature, where GMMs play their typical role as estimates of probability density functions, GMMs approximate here Q-function losses. The new Q-function approximators, coined GMM-QFs, are incorporated in Bellman residuals to promote a Riemannian-optimization task as a novel policy-evaluation step in standard policy-iteration schemes. The paper demonstrates how the hyperparameters (means and covariance matrices) of the Gaussian kernels are learned from the data, opening thus the door of RL to the powerful toolbox of Riemannian optimization. Numerical tests show that with no use of experienced data, the proposed design outperforms state-of-the-art methods, even deep Q-networks which use experienced data, on benchmark RL tasks.

replace Some Results on Neural Network Stability, Consistency, and Convergence: Insights into Non-IID Data, High-Dimensional Settings, and Physics-Informed Neural Networks

Authors: Ronald Katende, Henry Kasumba, Godwin Kakuba, John M. Mango

Abstract: This paper addresses critical challenges in machine learning, particularly the stability, consistency, and convergence of neural networks under non-IID data, distribution shifts, and high-dimensional settings. We provide new theoretical results on uniform stability for neural networks with dynamic learning rates in non-convex settings. Further, we establish consistency bounds for federated learning models in non-Euclidean spaces, accounting for distribution shifts and curvature effects. For Physics-Informed Neural Networks (PINNs), we derive stability, consistency, and convergence guarantees for solving Partial Differential Equations (PDEs) in noisy environments. These results fill significant gaps in understanding model behavior in complex, non-ideal conditions, paving the way for more robust and reliable machine learning applications.

replace MaxCutPool: differentiable feature-aware Maxcut for pooling in graph neural networks

Authors: Carlo Abate, Filippo Maria Bianchi

Abstract: We propose a novel approach to compute the MAXCUT in attributed graphs, i.e., graphs with features associated with nodes and edges. Our approach is robust to the underlying graph topology and is fully differentiable, making it possible to find solutions that jointly optimize the MAXCUT along with other objectives. Based on the obtained MAXCUT partition, we implement a hierarchical graph pooling layer for Graph Neural Networks, which is sparse, differentiable, and particularly suitable for downstream tasks on heterophilic graphs.

replace Influence-based Attributions can be Manipulated

Authors: Chhavi Yadav, Ruihan Wu, Kamalika Chaudhuri

Abstract: Influence Functions are a standard tool for attributing predictions to training data in a principled manner and are widely used in applications such as data valuation and fairness. In this work, we present realistic incentives to manipulate influencebased attributions and investigate whether these attributions can be systematically tampered by an adversary. We show that this is indeed possible and provide efficient attacks with backward-friendly implementations. Our work raises questions on the reliability of influence-based attributions under adversarial circumstances.

replace Retrofitting Temporal Graph Neural Networks with Transformer

Authors: Qiang Huang, Xiao Yan, Xin Wang, Susie Xi Rao, Zhichao Han, Fangcheng Fu, Wentao Zhang, Jiawei Jiang

Abstract: Temporal graph neural networks (TGNNs) outperform regular GNNs by incorporating time information into graph-based operations. However, TGNNs adopt specialized models (e.g., TGN, TGAT, and APAN ) and require tailored training frameworks (e.g., TGL and ETC). In this paper, we propose TF-TGN, which uses Transformer decoder as the backbone model for TGNN to enjoy Transformer's codebase for efficient training. In particular, Transformer achieves tremendous success for language modeling, and thus the community developed high-performance kernels (e.g., flash-attention and memory-efficient attention) and efficient distributed training schemes (e.g., PyTorch FSDP, DeepSpeed, and Megatron-LM). We observe that TGNN resembles language modeling, i.e., the message aggregation operation between chronologically occurring nodes and their temporal neighbors in TGNNs can be structured as sequence modeling. Beside this similarity, we also incorporate a series of algorithm designs including suffix infilling, temporal graph attention with self-loop, and causal masking self-attention to make TF-TGN work. During training, existing systems are slow in transforming the graph topology and conducting graph sampling. As such, we propose methods to parallelize the CSR format conversion and graph sampling. We also adapt Transformer codebase to train TF-TGN efficiently with multiple GPUs. We experiment with 9 graphs and compare with 2 state-of-the-art TGNN training frameworks. The results show that TF-TGN can accelerate training by over 2.20 while providing comparable or even superior accuracy to existing SOTA TGNNs. TF-TGN is available at https://github.com/qianghuangwhu/TF-TGN.

URLs: https://github.com/qianghuangwhu/TF-TGN.

replace CRADLE-VAE: Enhancing Single-Cell Gene Perturbation Modeling with Counterfactual Reasoning-based Artifact Disentanglement

Authors: Seungheun Baek, Soyon Park, Yan Ting Chok, Junhyun Lee, Jueon Park, Mogan Gim, Jaewoo Kang

Abstract: Predicting cellular responses to various perturbations is a critical focus in drug discovery and personalized therapeutics, with deep learning models playing a significant role in this endeavor. Single-cell datasets contain technical artifacts that may hinder the predictability of such models, which poses quality control issues highly regarded in this area. To address this, we propose CRADLE-VAE, a causal generative framework tailored for single-cell gene perturbation modeling, enhanced with counterfactual reasoning-based artifact disentanglement. Throughout training, CRADLE-VAE models the underlying latent distribution of technical artifacts and perturbation effects present in single-cell datasets. It employs counterfactual reasoning to effectively disentangle such artifacts by modulating the latent basal spaces and learns robust features for generating cellular response data with improved quality. Experimental results demonstrate that this approach improves not only treatment effect estimation performance but also generative quality as well. The CRADLE-VAE codebase is publicly available at https://github.com/dmis-lab/CRADLE-VAE.

URLs: https://github.com/dmis-lab/CRADLE-VAE.

replace-cross System Neural Diversity: Measuring Behavioral Heterogeneity in Multi-Agent Learning

Authors: Matteo Bettini, Ajay Shankar, Amanda Prorok

Abstract: Evolutionary science provides evidence that diversity confers resilience in natural systems. Yet, traditional multi-agent reinforcement learning techniques commonly enforce homogeneity to increase training sample efficiency. When a system of learning agents is not constrained to homogeneous policies, individuals may develop diverse behaviors, resulting in emergent complementarity that benefits the system. Despite this, there is a surprising lack of tools that quantify behavioral diversity. Such techniques would pave the way towards understanding the impact of diversity in collective artificial intelligence and enabling its control. In this paper, we introduce System Neural Diversity (SND): a measure of behavioral heterogeneity in multi-agent systems. We discuss and prove its theoretical properties, and compare it with alternate, state-of-the-art behavioral diversity metrics used in the robotics domain. Through simulations of a variety of cooperative multi-robot tasks, we show how our metric constitutes an important tool that enables measurement and control of behavioral heterogeneity. In dynamic tasks, where the problem is affected by repeated disturbances during training, we show that SND allows us to measure latent resilience skills acquired by the agents, while other proxies, such as task performance (reward), fail to. Finally, we show how the metric can be employed to control diversity, allowing us to enforce a desired heterogeneity set-point or range. We demonstrate how this paradigm can be used to bootstrap the exploration phase, finding optimal policies faster, thus enabling novel and more efficient MARL paradigms.

replace-cross Pseudo-rigid body networks: learning interpretable deformable object dynamics from partial observations

Authors: Shamil Mamedov, A. Ren\'e Geist, Jan Swevers, Sebastian Trimpe

Abstract: Accurately predicting deformable linear object (DLO) dynamics is challenging, especially when the task requires a model that is both human-interpretable and computationally efficient. In this work, we draw inspiration from the pseudo-rigid body method (PRB) and model a DLO as a serial chain of rigid bodies whose internal state is unrolled through time by a dynamics network. This dynamics network is trained jointly with a physics-informed encoder that maps observed motion variables to the DLO's hidden state. To encourage the state to acquire a physically meaningful representation, we leverage the forward kinematics of the PRB model as a decoder. We demonstrate in robot experiments that the proposed DLO dynamics model provides physically interpretable predictions from partial observations while being on par with black-box models regarding prediction accuracy. The project code is available at: http://tinyurl.com/prb-networks

URLs: http://tinyurl.com/prb-networks

replace-cross INFLECT-DGNN: Influencer Prediction with Dynamic Graph Neural Networks

Authors: Elena Tiukhova, Emiliano Penaloza, Mar\'ia \'Oskarsd\'ottir, Bart Baesens, Monique Snoeck, Cristi\'an Bravo

Abstract: Leveraging network information for predictive modeling has become widespread in many domains. Within the realm of referral and targeted marketing, influencer detection stands out as an area that could greatly benefit from the incorporation of dynamic network representation due to the continuous evolution of customer-brand relationships. In this paper, we present INFLECT-DGNN, a new method for profit-driven INFLuencer prEdiCTion with Dynamic Graph Neural Networks that innovatively combines Graph Neural Networks (GNNs) and Recurrent Neural Networks (RNNs) with weighted loss functions, synthetic minority oversampling adapted to graph data, and a carefully crafted rolling-window strategy. We introduce a novel profit-driven framework that supports decision-making based on model predictions. To test the framework, we use a unique corporate dataset with diverse networks, capturing the customer interactions across three cities with different socioeconomic and demographic characteristics. Our results show how using RNNs to encode temporal attributes alongside GNNs significantly improves predictive performance, while the profit-driven framework determines the optimal classification threshold for profit maximization. We compare the results of different models to demonstrate the importance of capturing network representation, temporal dependencies, and using a profit-driven evaluation. Our research has significant implications for the fields of referral and targeted marketing, expanding the technical use of deep graph learning within corporate environments.

replace-cross Learning Generative Models for Lumped Rainfall-Runoff Modeling

Authors: Yang Yang, Ting Fong May Chui

Abstract: This study presents a novel generative modeling approach to rainfall-runoff modeling, focusing on the synthesis of realistic daily catchment runoff time series in response to catchment-averaged climate forcing. Unlike traditional process-based lumped hydrologic models that depend on predefined sets of variables describing catchment physical properties, our approach uses a small number of latent variables to characterize runoff generation processes. These latent variables encapsulate the intrinsic properties of a catchment and can be inferred from catchment climate forcing and discharge data. By sampling from the latent variable space, the model generates runoff time series that closely resemble real-world observations. In this study, we trained the generative models using neural networks on data from over 3,000 global catchments and achieved prediction accuracies comparable to current deep learning models and various conventional lumped models, both within the catchments from the training set and from other regions worldwide. This suggests that the runoff generation process of catchments can be effectively captured by a low-dimensional latent representation. Yet, challenges such as equifinality and optimal determination of latent variables remain. Future research should focus on refining parameter estimation methods and exploring the physical meaning of these latent dimensions to improve model applicability and robustness. This generative approach offers a promising alternative for hydrological modeling that requires minimal assumptions about the physical processes of the catchment.

replace-cross A Heterogeneous Graph-Based Multi-Task Learning for Fault Event Diagnosis in Smart Grid

Authors: Dibaloke Chanda, Nasim Yahya Soltani

Abstract: Precise and timely fault diagnosis is a prerequisite for a distribution system to ensure minimum downtime and maintain reliable operation. This necessitates access to a comprehensive procedure that can provide the grid operators with insightful information in the case of a fault event. In this paper, we propose a heterogeneous multi-task learning graph neural network (MTL-GNN) capable of detecting, locating and classifying faults in addition to providing an estimate of the fault resistance and current. Using a graph neural network (GNN) allows for learning the topological representation of the distribution system as well as feature learning through a message-passing scheme. We investigate the robustness of our proposed model using the IEEE-123 test feeder system. This work also proposes a novel GNN-based explainability method to identify key nodes in the distribution system which then facilitates informed sparse measurements. Numerical tests validate the performance of the model across all tasks.

replace-cross Feedback-guided Data Synthesis for Imbalanced Classification

Authors: Reyhane Askari Hemmat, Mohammad Pezeshki, Florian Bordes, Michal Drozdzal, Adriana Romero-Soriano

Abstract: Current status quo in machine learning is to use static datasets of real images for training, which often come from long-tailed distributions. With the recent advances in generative models, researchers have started augmenting these static datasets with synthetic data, reporting moderate performance improvements on classification tasks. We hypothesize that these performance gains are limited by the lack of feedback from the classifier to the generative model, which would promote the usefulness of the generated samples to improve the classifier's performance. In this work, we introduce a framework for augmenting static datasets with useful synthetic samples, which leverages one-shot feedback from the classifier to drive the sampling of the generative model. In order for the framework to be effective, we find that the samples must be close to the support of the real data of the task at hand, and be sufficiently diverse. We validate three feedback criteria on a long-tailed dataset (ImageNet-LT) as well as a group-imbalanced dataset (NICO++). On ImageNet-LT, we achieve state-of-the-art results, with over 4 percent improvement on underrepresented classes while being twice efficient in terms of the number of generated synthetic samples. NICO++ also enjoys marked boosts of over 5 percent in worst group accuracy. With these results, our framework paves the path towards effectively leveraging state-of-the-art text-to-image models as data sources that can be queried to improve downstream applications.

replace-cross Learning force laws in many-body systems

Authors: Wentao Yu, Eslam Abdelaleem, Ilya Nemenman, Justin C. Burton

Abstract: Scientific laws describing natural systems may be more complex than our intuition can handle, thus how we discover laws must change. Machine learning (ML) models can analyze large quantities of data, but their structure should match the underlying physical constraints to provide useful insight. While progress has been made using simulated data where the underlying physics is known, training and validating ML models on experimental data requires fundamentally new approaches. Here we demonstrate and experimentally validate an ML approach that incorporates physical intuition to infer force laws in dusty plasma, a complex, many-body system. Trained on 3D particle trajectories, the model accounts for inherent symmetries, non-identical particles, and learns the effective non-reciprocal forces between particles with exquisite accuracy (R^2>0.99). We validate the model by inferring particle masses in two independent yet consistent ways. The model's accuracy enables precise measurements of particle charge and screening length, discovering violations of common theoretical assumptions. Our ability to identify new physics from experimental data demonstrates how ML-powered approaches can guide new routes of scientific discovery in many-body systems. Furthermore, we anticipate our ML approach to be a starting point for inferring laws from dynamics in a wide range of many-body systems, from colloids to living organisms.

replace-cross SpikeCLIP: A Contrastive Language-Image Pretrained Spiking Neural Network

Authors: Tianlong Li, Wenhao Liu, Changze Lv, Yufei Gu, Jianhan Xu, Cenyuan Zhang, Muling Wu, Xiaoqing Zheng, Xuanjing Huang

Abstract: Spiking Neural Networks (SNNs) have emerged as a promising alternative to conventional Artificial Neural Networks (ANNs), demonstrating comparable performance in both visual and linguistic tasks while offering the advantage of improved energy efficiency. Despite these advancements, the integration of linguistic and visual features into a unified representation through spike trains poses a significant challenge, and the application of SNNs to multimodal scenarios remains largely unexplored. This paper presents SpikeCLIP, a novel framework designed to bridge the modality gap in spike-based computation. Our approach employs a two-step recipe: an ``alignment pre-training'' to align features across modalities, followed by a ``dual-loss fine-tuning'' to refine the model's performance. Extensive experiments reveal that SNNs achieve results on par with ANNs while substantially reducing energy consumption across various datasets commonly used for multimodal model evaluation. Furthermore, SpikeCLIP maintains robust image classification capabilities, even when dealing with classes that fall outside predefined categories. This study marks a significant advancement in the development of energy-efficient and biologically plausible multimodal learning systems.

replace-cross Learning a Cross-modality Anomaly Detector for Remote Sensing Imagery

Authors: Jingtao Li, Xinyu Wang, Hengwei Zhao, Liangpei Zhang, Yanfei Zhong

Abstract: Remote sensing anomaly detector can find the objects deviating from the background as potential targets for Earth monitoring. Given the diversity in earth anomaly types, designing a transferring model with cross-modality detection ability should be cost-effective and flexible to new earth observation sources and anomaly types. However, the current anomaly detectors aim to learn the certain background distribution, the trained model cannot be transferred to unseen images. Inspired by the fact that the deviation metric for score ranking is consistent and independent from the image distribution, this study exploits the learning target conversion from the varying background distribution to the consistent deviation metric. We theoretically prove that the large-margin condition in labeled samples ensures the transferring ability of learned deviation metric. To satisfy this condition, two large margin losses for pixel-level and feature-level deviation ranking are proposed respectively. Since the real anomalies are difficult to acquire, anomaly simulation strategies are designed to compute the model loss. With the large-margin learning for deviation metric, the trained model achieves cross-modality detection ability in five modalities including hyperspectral, visible light, synthetic aperture radar (SAR), infrared and low-light in zero-shot manner.

replace-cross Passive Inference Attacks on Split Learning via Adversarial Regularization

Authors: Xiaochen Zhu, Xinjian Luo, Yuncheng Wu, Yangfan Jiang, Xiaokui Xiao, Beng Chin Ooi

Abstract: Split Learning (SL) has emerged as a practical and efficient alternative to traditional federated learning. While previous attempts to attack SL have often relied on overly strong assumptions or targeted easily exploitable models, we seek to develop more capable attacks. We introduce SDAR, a novel attack framework against SL with an honest-but-curious server. SDAR leverages auxiliary data and adversarial regularization to learn a decodable simulator of the client's private model, which can effectively infer the client's private features under the vanilla SL, and both features and labels under the U-shaped SL. We perform extensive experiments in both configurations to validate the effectiveness of our proposed attacks. Notably, in challenging scenarios where existing passive attacks struggle to reconstruct the client's private data effectively, SDAR consistently achieves significantly superior attack performance, even comparable to active attacks. On CIFAR-10, at the deep split level of 7, SDAR achieves private feature reconstruction with less than 0.025 mean squared error in both the vanilla and the U-shaped SL, and attains a label inference accuracy of over 98% in the U-shaped setting, while existing attacks fail to produce non-trivial results.

replace-cross Developing A Multi-Agent and Self-Adaptive Framework with Deep Reinforcement Learning for Dynamic Portfolio Risk Management

Authors: Zhenglong Li, Vincent Tam, Kwan L. Yeung

Abstract: Deep or reinforcement learning (RL) approaches have been adapted as reactive agents to quickly learn and respond with new investment strategies for portfolio management under the highly turbulent financial market environments in recent years. In many cases, due to the very complex correlations among various financial sectors, and the fluctuating trends in different financial markets, a deep or reinforcement learning based agent can be biased in maximising the total returns of the newly formulated investment portfolio while neglecting its potential risks under the turmoil of various market conditions in the global or regional sectors. Accordingly, a multi-agent and self-adaptive framework namely the MASA is proposed in which a sophisticated multi-agent reinforcement learning (RL) approach is adopted through two cooperating and reactive agents to carefully and dynamically balance the trade-off between the overall portfolio returns and their potential risks. Besides, a very flexible and proactive agent as the market observer is integrated into the MASA framework to provide some additional information on the estimated market trends as valuable feedbacks for multi-agent RL approach to quickly adapt to the ever-changing market conditions. The obtained empirical results clearly reveal the potential strengths of our proposed MASA framework based on the multi-agent RL approach against many well-known RL-based approaches on the challenging data sets of the CSI 300, Dow Jones Industrial Average and S&P 500 indexes over the past 10 years. More importantly, our proposed MASA framework shed lights on many possible directions for future investigation.

replace-cross Can Large Language Models Learn Independent Causal Mechanisms?

Authors: Ga\"el Gendron, Bao Trung Nguyen, Alex Yuxuan Peng, Michael Witbrock, Gillian Dobbie

Abstract: Despite impressive performance on language modelling and complex reasoning tasks, Large Language Models (LLMs) fall short on the same tasks in uncommon settings or with distribution shifts, exhibiting a lack of generalisation ability. By contrast, systems such as causal models, that learn abstract variables and causal relationships, can demonstrate increased robustness against changes in the distribution. One reason for this success is the existence and use of Independent Causal Mechanisms (ICMs) representing high-level concepts that only sparsely interact. In this work, we apply two concepts from causality to learn ICMs within LLMs. We develop a new LLM architecture composed of multiple sparsely interacting language modelling modules. We show that such causal constraints can improve out-of-distribution performance on abstract and causal reasoning tasks. We also investigate the level of independence and domain specialisation and show that LLMs rely on pre-trained partially domain-invariant mechanisms resilient to fine-tuning.

replace-cross Estimation of conditional average treatment effects on distributed confidential data

Authors: Yuji Kawamata, Ryoki Motai, Yukihiko Okada, Akira Imakura, Tetsuya Sakurai

Abstract: Estimation of conditional average treatment effects (CATEs) is an important topic in sciences. CATEs can be estimated with high accuracy if distributed data across multiple parties can be centralized. However, it is difficult to aggregate such data owing to confidential or privacy concerns. To address this issue, we proposed data collaboration double machine learning, a method that can estimate CATE models from privacy-preserving fusion data constructed from distributed data, and evaluated our method through simulations. Our contributions are summarized in the following three points. First, our method enables estimation and testing of semi-parametric CATE models without iterative communication on distributed data. Our semi-parametric CATE method enable estimation and testing that is more robust to model mis-specification than parametric methods. Second, our method enables collaborative estimation between multiple time points and different parties through the accumulation of a knowledge base. Third, our method performed equally or better than other methods in simulations using synthetic, semi-synthetic and real-world datasets.

replace-cross Is Epistemic Uncertainty Faithfully Represented by Evidential Deep Learning Methods?

Authors: Mira J\"urgens, Nis Meinert, Viktor Bengs, Eyke H\"ullermeier, Willem Waegeman

Abstract: Trustworthy ML systems should not only return accurate predictions, but also a reliable representation of their uncertainty. Bayesian methods are commonly used to quantify both aleatoric and epistemic uncertainty, but alternative approaches, such as evidential deep learning methods, have become popular in recent years. The latter group of methods in essence extends empirical risk minimization (ERM) for predicting second-order probability distributions over outcomes, from which measures of epistemic (and aleatoric) uncertainty can be extracted. This paper presents novel theoretical insights of evidential deep learning, highlighting the difficulties in optimizing second-order loss functions and interpreting the resulting epistemic uncertainty measures. With a systematic setup that covers a wide range of approaches for classification, regression and counts, it provides novel insights into issues of identifiability and convergence in second-order loss minimization, and the relative (rather than absolute) nature of epistemic uncertainty measures.

replace-cross User-LLM: Efficient LLM Contextualization with User Embeddings

Authors: Lin Ning, Luyang Liu, Jiaxing Wu, Neo Wu, Devora Berlowitz, Sushant Prakash, Bradley Green, Shawn O'Banion, Jun Xie

Abstract: Large language models (LLMs) have achieved remarkable success across various domains, but effectively incorporating complex and potentially noisy user timeline data into LLMs remains a challenge. Current approaches often involve translating user timelines into text descriptions before feeding them to LLMs, which can be inefficient and may not fully capture the nuances of user behavior. Inspired by how LLMs are effectively integrated with images through direct embeddings, we propose User-LLM, a novel framework that leverages user embeddings to directly contextualize LLMs with user history interactions. These embeddings, generated by a user encoder pretrained using self-supervised learning on diverse user interactions, capture latent user behaviors and interests as well as their evolution over time. We integrate these user embeddings with LLMs through cross-attention, enabling LLMs to dynamically adapt their responses based on the context of a user's past actions and preferences. Our approach achieves significant efficiency gains by representing user timelines directly as embeddings, leading to substantial inference speedups of up to 78.1X. Comprehensive experiments on MovieLens, Amazon Review, and Google Local Review datasets demonstrate that User-LLM outperforms text-prompt-based contextualization on tasks requiring deep user understanding, with improvements of up to 16.33%, particularly excelling on long sequences that capture subtle shifts in user behavior. Furthermore, the incorporation of Perceiver layers streamlines the integration between user encoders and LLMs, yielding additional computational savings.

replace-cross ViSaRL: Visual Reinforcement Learning Guided by Human Saliency

Authors: Anthony Liang, Jesse Thomason, Erdem B{\i}y{\i}k

Abstract: Training robots to perform complex control tasks from high-dimensional pixel input using reinforcement learning (RL) is sample-inefficient, because image observations are comprised primarily of task-irrelevant information. By contrast, humans are able to visually attend to task-relevant objects and areas. Based on this insight, we introduce Visual Saliency-Guided Reinforcement Learning (ViSaRL). Using ViSaRL to learn visual representations significantly improves the success rate, sample efficiency, and generalization of an RL agent on diverse tasks including DeepMind Control benchmark, robot manipulation in simulation and on a real robot. We present approaches for incorporating saliency into both CNN and Transformer-based encoders. We show that visual representations learned using ViSaRL are robust to various sources of visual perturbations including perceptual noise and scene variations. ViSaRL nearly doubles success rate on the real-robot tasks compared to the baseline which does not use saliency.

replace-cross RoBERTa and Attention-based BiLSTM for Interpretable Sentiment Analysis of Tweets

Authors: Md Abrar Jahin, Md Sakib Hossain Shovon, M. F. Mridha, Md Rashedul Islam, Yutaka Watanobe

Abstract: Sentiment analysis is crucial for understanding public opinion and consumer behavior. Existing models face challenges with linguistic diversity, generalizability, and explainability. We propose TRABSA, a hybrid framework integrating transformer-based architectures, attention mechanisms, and BiLSTM networks to address this. Leveraging RoBERTa-trained on 124M tweets, we bridge gaps in sentiment analysis benchmarks, ensuring state-of-the-art accuracy. Augmenting datasets with tweets from 32 countries and US states, we compare six word-embedding techniques and three lexicon-based labeling techniques, selecting the best for optimal sentiment analysis. TRABSA outperforms traditional ML and deep learning models with 94% accuracy and significant precision, recall, and F1-score gains. Evaluation across diverse datasets demonstrates consistent superiority and generalizability. SHAP and LIME analyses enhance interpretability, improving confidence in predictions. Our study facilitates pandemic resource management, aiding resource planning, policy formation, and vaccination tactics.

replace-cross Disentangling Hippocampal Shape Variations: A Study of Neurological Disorders Using Mesh Variational Autoencoder with Contrastive Learning

Authors: Jakaria Rabbi, Johannes Kiechle, Christian Beaulieu, Nilanjan Ray, Dana Cobzas

Abstract: This paper presents a comprehensive study focused on disentangling hippocampal shape variations from diffusion tensor imaging (DTI) datasets within the context of neurological disorders. Leveraging a Graph Variational Autoencoder (VAE) enhanced with Supervised Contrastive Learning, our approach aims to improve interpretability by disentangling two distinct latent variables corresponding to age and the presence of diseases. In our ablation study, we investigate a range of VAE architectures and contrastive loss functions, showcasing the enhanced disentanglement capabilities of our approach. This evaluation uses synthetic 3D torus mesh data and real 3D hippocampal mesh datasets derived from the DTI hippocampal dataset. Our supervised disentanglement model outperforms several state-of-the-art (SOTA) methods like attribute and guided VAEs in terms of disentanglement scores. Our model distinguishes between age groups and disease status in patients with Multiple Sclerosis (MS) using the hippocampus data. Our Graph VAE with Supervised Contrastive Learning shows the volume changes of the hippocampus of MS populations at different ages, and the result is consistent with the current neuroimaging literature. This research provides valuable insights into the relationship between neurological disorder and hippocampal shape changes in different age groups of MS populations using a Graph VAE with Supervised Contrastive loss.

replace-cross Relational Prompt-based Pre-trained Language Models for Social Event Detection

Authors: Pu Li, Xiaoyan Yu, Hao Peng, Yantuan Xian, Linqin Wang, Li Sun, Jingyun Zhang, Philip S. Yu

Abstract: Social Event Detection (SED) aims to identify significant events from social streams, and has a wide application ranging from public opinion analysis to risk management. In recent years, Graph Neural Network (GNN) based solutions have achieved state-of-the-art performance. However, GNN-based methods often struggle with missing and noisy edges between messages, affecting the quality of learned message embedding. Moreover, these methods statically initialize node embedding before training, which, in turn, limits the ability to learn from message texts and relations simultaneously. In this paper, we approach social event detection from a new perspective based on Pre-trained Language Models (PLMs), and present RPLM_SED (Relational prompt-based Pre-trained Language Models for Social Event Detection). We first propose a new pairwise message modeling strategy to construct social messages into message pairs with multi-relational sequences. Secondly, a new multi-relational prompt-based pairwise message learning mechanism is proposed to learn more comprehensive message representation from message pairs with multi-relational prompts using PLMs. Thirdly, we design a new clustering constraint to optimize the encoding process by enhancing intra-cluster compactness and inter-cluster dispersion, making the message representation more distinguishable. We evaluate the RPLM_SED on three real-world datasets, demonstrating that the RPLM_SED model achieves state-of-the-art performance in offline, online, low-resource, and long-tail distribution scenarios for social event detection tasks.

replace-cross Sparse Attention Regression Network Based Soil Fertility Prediction With Ummaso

Authors: R V Raghavendra Rao, U Srinivasulu Reddy

Abstract: The challenge of imbalanced soil nutrient datasets significantly hampers accurate predictions of soil fertility. To tackle this, a new method is suggested in this research, combining Uniform Manifold Approximation and Projection (UMAP) with Least Absolute Shrinkage and Selection Operator (LASSO). The main aim is to counter the impact of uneven data distribution and improve soil fertility models' predictive precision. The model introduced uses Sparse Attention Regression, effectively incorporating pertinent features from the imbalanced dataset. UMAP is utilized initially to reduce data complexity, unveiling hidden structures and important patterns. Following this, LASSO is applied to refine features and enhance the model's interpretability. The experimental outcomes highlight the effectiveness of the UMAP and LASSO hybrid approach. The proposed model achieves outstanding performance metrics, reaching a predictive accuracy of 98%, demonstrating its capability in accurate soil fertility predictions. Additionally, it showcases a Precision of 91.25%, indicating its adeptness in identifying fertile soil instances accurately. The Recall metric stands at 90.90%, emphasizing the model's ability to capture true positive cases effectively.

replace-cross Explanation as a Watermark: Towards Harmless and Multi-bit Model Ownership Verification via Watermarking Feature Attribution

Authors: Shuo Shao, Yiming Li, Hongwei Yao, Yiling He, Zhan Qin, Kui Ren

Abstract: Ownership verification is currently the most critical and widely adopted post-hoc method to safeguard model copyright. In general, model owners exploit it to identify whether a given suspicious third-party model is stolen from them by examining whether it has particular properties `inherited' from their released models. Currently, backdoor-based model watermarks are the primary and cutting-edge methods to implant such properties in the released models. However, backdoor-based methods have two fatal drawbacks, including harmfulness and ambiguity. The former indicates that they introduce maliciously controllable misclassification behaviors ($i.e.$, backdoor) to the watermarked released models. The latter denotes that malicious users can easily pass the verification by finding other misclassified samples, leading to ownership ambiguity. In this paper, we argue that both limitations stem from the `zero-bit' nature of existing watermarking schemes, where they exploit the status ($i.e.$, misclassified) of predictions for verification. Motivated by this understanding, we design a new watermarking paradigm, $i.e.$, Explanation as a Watermark (EaaW), that implants verification behaviors into the explanation of feature attribution instead of model predictions. Specifically, EaaW embeds a `multi-bit' watermark into the feature attribution explanation of specific trigger samples without changing the original prediction. We correspondingly design the watermark embedding and extraction algorithms inspired by explainable artificial intelligence. In particular, our approach can be used for different tasks ($e.g.$, image classification and text generation). Extensive experiments verify the effectiveness and harmlessness of our EaaW and its resistance to potential attacks.

replace-cross Concealing Backdoor Model Updates in Federated Learning by Trigger-Optimized Data Poisoning

Authors: Yujie Zhang, Neil Gong, Michael K. Reiter

Abstract: Federated Learning (FL) is a decentralized machine learning method that enables participants to collaboratively train a model without sharing their private data. Despite its privacy and scalability benefits, FL is susceptible to backdoor attacks, where adversaries poison the local training data of a subset of clients using a backdoor trigger, aiming to make the aggregated model produce malicious results when the same backdoor condition is met by an inference-time input. Existing backdoor attacks in FL suffer from common deficiencies: fixed trigger patterns and reliance on the assistance of model poisoning. State-of-the-art defenses based on analyzing clients' model updates exhibit a good defense performance on these attacks because of the significant divergence between malicious and benign client model updates. To effectively conceal malicious model updates among benign ones, we propose DPOT, a backdoor attack strategy in FL that dynamically constructs backdoor objectives by optimizing a backdoor trigger, making backdoor data have minimal effect on model updates. We provide theoretical justifications for DPOT's attacking principle and display experimental results showing that DPOT, via only a data-poisoning attack, effectively undermines state-of-the-art defenses and outperforms existing backdoor attack techniques on various datasets.

replace-cross Dielectric Tensor Prediction for Inorganic Materials Using Latent Information from Preferred Potential

Authors: Zetian Mao, Wenwen Li, Jethro Tan

Abstract: Dielectrics are crucial for technologies like flash memory, CPUs, photovoltaics, and capacitors, but public data on these materials are scarce, restricting research and development. Existing machine learning models have focused on predicting scalar polycrystalline dielectric constants, neglecting the directional nature of dielectric tensors essential for material design. This study leverages multi-rank equivariant structural embeddings from a universal neural network potential to enhance predictions of dielectric tensors. We develop an equivariant readout decoder to predict total, electronic, and ionic dielectric tensors while preserving O(3) equivariance, and benchmark its performance against state-of-the-art algorithms. Virtual screening of thermodynamically stable materials from Materials Project for two discovery tasks, high-dielectric and highly anisotropic materials, identifies promising candidates including Cs2Ti(WO4)3 (band gap $E_g=2.93 \mathrm{eV}$, dielectric constant $\varepsilon=180.90$) and CsZrCuSe3 (anisotropic ratio $\alpha_r = 121.89$). The results demonstrate our model's accuracy in predicting dielectric tensors and its potential for discovering novel dielectric materials.

replace-cross On the Recoverability of Causal Relations from Temporally Aggregated I.I.D. Data

Authors: Shunxing Fan, Mingming Gong, Kun Zhang

Abstract: We consider the effect of temporal aggregation on instantaneous (non-temporal) causal discovery in general setting. This is motivated by the observation that the true causal time lag is often considerably shorter than the observational interval. This discrepancy leads to high aggregation, causing time-delay causality to vanish and instantaneous dependence to manifest. Although we expect such instantaneous dependence has consistency with the true causal relation in certain sense to make the discovery results meaningful, it remains unclear what type of consistency we need and when will such consistency be satisfied. We proposed functional consistency and conditional independence consistency in formal way correspond functional causal model-based methods and conditional independence-based methods respectively and provide the conditions under which these consistencies will hold. We show theoretically and experimentally that causal discovery results may be seriously distorted by aggregation especially in complete nonlinear case and we also find causal relationship still recoverable from aggregated data if we have partial linearity or appropriate prior. Our findings suggest community should take a cautious and meticulous approach when interpreting causal discovery results from such data and show why and when aggregation will distort the performance of causal discovery methods.

replace-cross Discovering Dynamic Symbolic Policies with Genetic Programming

Authors: Sigur de Vries, Sander Keemink, Marcel van Gerven

Abstract: Artificial intelligence techniques are increasingly being applied to solve control problems, but often rely on black-box methods without transparent output generation. To improve the interpretability and transparency in control systems, models can be defined as white-box symbolic policies described by mathematical expressions. While current approaches to learn symbolic policies focus on static policies that directly map observations to control signals, these may fail in partially observable and volatile environments. We instead consider dynamic symbolic policies with memory, optimised with genetic programming. The resulting policies are robust, and consist of easy to interpret coupled differential equations. Our results show that dynamic symbolic policies compare with black-box policies on a variety of control tasks. Furthermore, the benefit of the memory in dynamic policies is demonstrated on experiments where static policies fall short. Overall, we present a method for evolving high-performing symbolic policies that offer interpretability and transparency, which lacks in black-box models.

replace-cross Lightning-Fast Convective Outlooks: Predicting Severe Convective Environments with Global AI-based Weather Models

Authors: Monika Feldmann, Tom Beucler, Milton Gomez, Olivia Martius

Abstract: Severe convective storms are among the most dangerous weather phenomena and accurate forecasts mitigate their impacts. The recently released suite of AI-based weather models produces medium-range forecasts within seconds, with a skill similar to state-of-the-art operational forecasts for variables on single levels. However, predicting severe thunderstorm environments requires accurate combinations of dynamic and thermodynamic variables and the vertical structure of the atmosphere. Advancing the assessment of AI-models towards process-based evaluations lays the foundation for hazard-driven applications. We assess the forecast skill of three top-performing AI-models for convective parameters at lead-times of up to 10 days against reanalysis and ECMWF's operational numerical weather prediction model IFS. In a case study and seasonal analyses, we see the best performance by GraphCast and Pangu-Weather: these models match or even exceed the performance of IFS for instability and shear. This opens opportunities for fast and inexpensive predictions of severe weather environments.

replace-cross Identifiable Exchangeable Mechanisms for Causal Structure and Representation Learning

Authors: Patrik Reizinger, Siyuan Guo, Ferenc Husz\'ar, Bernhard Sch\"olkopf, Wieland Brendel

Abstract: Identifying latent representations or causal structures is important for good generalization and downstream task performance. However, both fields have been developed rather independently. We observe that several methods in both representation and causal structure learning rely on the same data-generating process (DGP), namely, exchangeable but not i.i.d. (independent and identically distributed) data. We provide a unified framework, termed Identifiable Exchangeable Mechanisms (IEM), for representation and structure learning under the lens of exchangeability. IEM provides new insights that let us relax the necessary conditions for causal structure identification in exchangeable non--i.i.d. data. We also demonstrate the existence of a duality condition in identifiable representation learning, leading to new identifiability results. We hope this work will pave the way for further research in causal representation learning.

replace-cross Shedding More Light on Robust Classifiers under the lens of Energy-based Models

Authors: Mujtaba Hussain Mirza, Maria Rosaria Briglia, Senad Beadini, Iacopo Masi

Abstract: By reinterpreting a robust discriminative classifier as Energy-based Model (EBM), we offer a new take on the dynamics of adversarial training (AT). Our analysis of the energy landscape during AT reveals that untargeted attacks generate adversarial images much more in-distribution (lower energy) than the original data from the point of view of the model. Conversely, we observe the opposite for targeted attacks. On the ground of our thorough analysis, we present new theoretical and practical results that show how interpreting AT energy dynamics unlocks a better understanding: (1) AT dynamic is governed by three phases and robust overfitting occurs in the third phase with a drastic divergence between natural and adversarial energies (2) by rewriting the loss of TRadeoff-inspired Adversarial DEfense via Surrogate-loss minimization (TRADES) in terms of energies, we show that TRADES implicitly alleviates overfitting by means of aligning the natural energy with the adversarial one (3) we empirically show that all recent state-of-the-art robust classifiers are smoothing the energy landscape and we reconcile a variety of studies about understanding AT and weighting the loss function under the umbrella of EBMs. Motivated by rigorous evidence, we propose Weighted Energy Adversarial Training (WEAT), a novel sample weighting scheme that yields robust accuracy matching the state-of-the-art on multiple benchmarks such as CIFAR-10 and SVHN and going beyond in CIFAR-100 and Tiny-ImageNet. We further show that robust classifiers vary in the intensity and quality of their generative capabilities, and offer a simple method to push this capability, reaching a remarkable Inception Score (IS) and FID using a robust classifier without training for generative modeling. The code to reproduce our results is available at http://github.com/OmnAI-Lab/Robust-Classifiers-under-the-lens-of-EBM/ .

URLs: http://github.com/OmnAI-Lab/Robust-Classifiers-under-the-lens-of-EBM/

replace-cross OT-VP: Optimal Transport-guided Visual Prompting for Test-Time Adaptation

Authors: Yunbei Zhang, Akshay Mehra, Jihun Hamm

Abstract: Vision Transformers (ViTs) have demonstrated remarkable capabilities in learning representations, but their performance is compromised when applied to unseen domains. Previous methods either engage in prompt learning during the training phase or modify model parameters at test time through entropy minimization. The former often overlooks unlabeled target data, while the latter doesn't fully address domain shifts. In this work, our approach, Optimal Transport-guided Test-Time Visual Prompting (OT-VP), handles these problems by leveraging prompt learning at test time to align the target and source domains without accessing the training process or altering pre-trained model parameters. This method involves learning a universal visual prompt for the target domain by optimizing the Optimal Transport distance.OT-VP, with only four learned prompt tokens, exceeds state-of-the-art performance across three stylistic datasets-PACS, VLCS, OfficeHome, and one corrupted dataset ImageNet-C. Additionally, OT-VP operates efficiently, both in terms of memory and computation, and is adaptable for extension to online settings.

replace-cross PhishLang: A Lightweight, Client-Side Phishing Detection Framework using MobileBERT for Real-Time, Explainable Threat Mitigation

Authors: Sayak Saha Roy, Shirin Nilizadeh

Abstract: In this paper, we introduce PhishLang, an open-source, lightweight language model specifically designed for phishing website detection through contextual analysis of the website. Unlike traditional heuristic or machine learning models that rely on static features and struggle to adapt to new threats, and deep learning models that are computationally intensive, our model leverages MobileBERT, a fast and memory-efficient variant of the BERT architecture, to learn granular features characteristic of phishing attacks. PhishLang operates with minimal data preprocessing and offers performance comparable to leading deep learning anti-phishing tools, while being significantly faster and less resource-intensive. Over a 3.5-month testing period, PhishLang successfully identified 25,796 phishing URLs, many of which were undetected by popular antiphishing blocklists, thus demonstrating its potential to enhance current detection measures. Capitalizing on PhishLang's resource efficiency, we release the first open-source fully client-side Chromium browser extension that provides inference locally without requiring to consult an online blocklist and can be run on low-end systems with no impact on inference times. Our implementation not only outperforms prevalent (server-side) phishing tools, but is significantly more effective than the limited commercial client-side measures available. Furthermore, we study how PhishLang can be integrated with GPT-3.5 Turbo to create explainable blocklisting -- which, upon detection of a website, provides users with detailed contextual information about the features that led to a website being marked as phishing.

replace-cross Performance Law of Large Language Models

Authors: Chuhan Wu, Ruiming Tang

Abstract: Guided by the belief of the scaling law, large language models (LLMs) have achieved impressive performance in recent years. However, scaling law only gives a qualitative estimation of loss, which is influenced by various factors such as model architectures, data distributions, tokenizers, and computation precision. Thus, estimating the real performance of LLMs with different training settings rather than loss may be quite useful in practical development. In this article, we present an empirical equation named "Performance Law" to directly predict the MMLU score of an LLM, which is a widely used metric to indicate the general capability of LLMs in real-world conversations and applications. Based on only a few key hyperparameters of the LLM architecture and the size of training data, we obtain a quite accurate MMLU prediction of various LLMs with diverse sizes and architectures developed by different organizations in different years. Performance law can be used to guide the choice of LLM architecture and the effective allocation of computational resources without extensive experiments.

replace-cross Targeting the partition function of chemically disordered materials with a generative approach based on inverse variational autoencoders

Authors: Maciej J. Karcz, Luca Messina, Eiji Kawasaki, Emeric Bourasseau

Abstract: Computing atomic-scale properties of chemically disordered materials requires an efficient exploration of their vast configuration space. Traditional approaches such as Monte Carlo or Special Quasirandom Structures either entail sampling an excessive amount of configurations or do not ensure that the configuration space has been properly covered. In this work, we propose a novel approach where generative machine learning is used to yield a representative set of configurations for accurate property evaluation and provide accurate estimations of atomic-scale properties with minimal computational cost. Our method employs a specific type of variational autoencoder with inverse roles for the encoder and decoder, enabling the application of an unsupervised active learning scheme that does not require any initial training database. The model iteratively generates configuration batches, whose properties are computed with conventional atomic-scale methods. These results are then fed back into the model to estimate the partition function, repeating the process until convergence. We illustrate our approach by computing point-defect formation energies and concentrations in (U, Pu)O2 mixed-oxide fuels. In addition, the ML model provides valuable insights into the physical factors influencing the target property. Our method is generally applicable to explore other properties, such as atomic-scale diffusion coefficients, in ideally or non-ideally disordered materials like high-entropy alloys.

replace-cross LibMOON: A Gradient-based MultiObjective OptimizatioN Library in PyTorch

Authors: Xiaoyuan Zhang, Liang Zhao, Yingying Yu, Xi Lin, Zhenkun Wang, Han Zhao, Qingfu Zhang

Abstract: Multiobjective optimization problems (MOPs) are prevalent in machine learning, with applications in multi-task learning, learning under fairness or robustness constraints, etc. Instead of reducing multiple objective functions into a scalar objective, MOPs aim to optimize for the so-called Pareto optimality or Pareto set learning, which involves optimizing more than one objective function simultaneously, over models with millions of parameters. Existing benchmark libraries for MOPs mainly focus on evolutionary algorithms, most of which are zeroth-order methods that do not effectively utilize higher-order information from objectives and cannot scale to large-scale models with millions of parameters. In light of the above gap, this paper introduces LibMOON, the first multiobjective optimization library that supports state-of-the-art gradient-based methods, provides a fair benchmark, and is open-sourced for the community.

replace-cross EMCNet : Graph-Nets for Electron Micrographs Classification

Authors: Sakhinana Sagar Srinivas, Rajat Kumar Sarkar, Venkataramana Runkana

Abstract: Characterization of materials via electron micrographs is an important and challenging task in several materials processing industries. Classification of electron micrographs is complex due to the high intra-class dissimilarity, high inter-class similarity, and multi-spatial scales of patterns. However, existing methods are ineffective in learning complex image patterns. We propose an effective end-to-end electron micrograph representation learning-based framework for nanomaterial identification to overcome the challenges. We demonstrate that our framework outperforms the popular baselines on the open-source datasets in nanomaterials-based identification tasks. The ablation studies are reported in great detail to support the efficacy of our approach.

replace-cross Hermes: Memory-Efficient Pipeline Inference for Large Models on Edge Devices

Authors: Xueyuan Han, Zinuo Cai, Yichu Zhang, Chongxin Fan, Junhan Liu, Ruhui Ma, Rajkumar Buyya

Abstract: The application of Transformer-based large models has achieved numerous success in recent years. However, the exponential growth in the parameters of large models introduces formidable memory challenge for edge deployment. Prior works to address this challenge mainly focus on optimizing the model structure and adopting memory swapping methods. However, the former reduces the inference accuracy, and the latter raises the inference latency. This paper introduces PIPELOAD, a novel memory-efficient pipeline execution mechanism. It reduces memory usage by incorporating dynamic memory management and minimizes inference latency by employing parallel model loading. Based on PIPELOAD mechanism, we present Hermes, a framework optimized for large model inference on edge devices. We evaluate Hermes on Transformer-based models of different sizes. Our experiments illustrate that Hermes achieves up to 4.24 X increase in inference speed and 86.7% lower memory consumption than the state-of-the-art pipeline mechanism for BERT and ViT models, 2.58 X increase in inference speed and 90.3% lower memory consumption for GPT-style models.

replace-cross QueryBuilder: Human-in-the-Loop Query Development for Information Retrieval

Authors: Hemanth Kandula, Damianos Karakos, Haoling Qiu, Benjamin Rozonoyer, Ian Soboroff, Lee Tarlin, Bonan Min

Abstract: Frequently, users of an Information Retrieval (IR) system start with an overarching information need (a.k.a., an analytic task) and proceed to define finer-grained queries covering various important aspects (i.e., sub-topics) of that analytic task. We present a novel, interactive system called $\textit{QueryBuilder}$, which allows a novice, English-speaking user to create queries with a small amount of effort, through efficient exploration of an English development corpus in order to rapidly develop cross-lingual information retrieval queries corresponding to the user's information needs. QueryBuilder performs near real-time retrieval of documents based on user-entered search terms; the user looks through the retrieved documents and marks sentences as relevant to the information needed. The marked sentences are used by the system as additional information in query formation and refinement: query terms (and, optionally, event features, which capture event $'triggers'$ (indicator terms) and agent/patient roles) are appropriately weighted, and a neural-based system, which better captures textual meaning, retrieves other relevant content. The process of retrieval and marking is repeated as many times as desired, giving rise to increasingly refined queries in each iteration. The final product is a fine-grained query used in Cross-Lingual Information Retrieval (CLIR). Our experiments using analytic tasks and requests from the IARPA BETTER IR datasets show that with a small amount of effort (at most 10 minutes per sub-topic), novice users can form $\textit{useful}$ fine-grained queries including in languages they don't understand. QueryBuilder also provides beneficial capabilities to the traditional corpus exploration and query formation process. A demonstration video is released at https://vimeo.com/734795835

URLs: https://vimeo.com/734795835

replace-cross Revisiting Trace Norm Minimization for Tensor Tucker Completion: A Direct Multilinear Rank Learning Approach

Authors: Xueke Tong, Hancheng Zhu, Lei Cheng, Yik-Chung Wu

Abstract: To efficiently express tensor data using the Tucker format, a critical task is to minimize the multilinear rank such that the model would not be over-flexible and lead to overfitting. Due to the lack of rank minimization tools in tensor, existing works connect Tucker multilinear rank minimization to trace norm minimization of matrices unfolded from the tensor data. While these formulations try to exploit the common aim of identifying the low-dimensional structure of the tensor and matrix, this paper reveals that existing trace norm-based formulations in Tucker completion are inefficient in multilinear rank minimization. We further propose a new interpretation of Tucker format such that trace norm minimization is applied to the factor matrices of the equivalent representation, rather than some matrices unfolded from tensor data. Based on the newly established problem formulation, a fixed point iteration algorithm is proposed, and its convergence is proved. Numerical results are presented to show that the proposed algorithm exhibits significant improved performance in terms of multilinear rank learning and consequently tensor signal recovery accuracy, compared to existing trace norm based Tucker completion methods.