new Gravix: Active Learning for Gravitational Waves Classification Algorithms

Authors: Raja Vavekanand, Kira Sam, Vavek Bharwani

Abstract: This project explores the integration of Bayesian Optimization (BO) algorithms into a base machine learning model, specifically Convolutional Neural Networks (CNNs), for classifying gravitational waves among background noise. The primary objective is to evaluate whether optimizing hyperparameters using Bayesian Optimization enhances the base model's performance. For this purpose, a Kaggle [1] dataset that comprises real background noise (labeled 0) and simulated gravitational wave signals with noise (labeled 1) is used. Data with real noise is collected from three detectors: LIGO Livingston, LIGO Hanford, and Virgo. Through data preprocessing and training, the models effectively classify testing data, predicting the presence of gravitational wave signals with a remarkable score, of 83.61%. The BO model demonstrates comparable accuracy to the base model, but its performance improvement is not very significant (84.34%). However, it is worth noting that the BO model needs additional computational resources and time due to the iterations required for hyperparameter optimization, requiring additional training on the entire dataset. For this reason, the BO model is less efficient in terms of resources compared to the base model in gravitational wave classification

new Multi-Task Multi-Fidelity Learning of Properties for Energetic Materials

Authors: Robert J. Appleton, Daniel Klinger, Brian H. Lee, Michael Taylor, Sohee Kim, Samuel Blankenship, Brian C. Barnes, Steven F. Son, Alejandro Strachan

Abstract: Data science and artificial intelligence are playing an increasingly important role in the physical sciences. Unfortunately, in the field of energetic materials data scarcity limits the accuracy and even applicability of ML tools. To address data limitations, we compiled multi-modal data: both experimental and computational results for several properties. We find that multi-task neural networks can learn from multi-modal data and outperform single-task models trained for specific properties. As expected, the improvement is more significant for data-scarce properties. These models are trained using descriptors built from simple molecular information and can be readily applied for large-scale materials screening to explore multiple properties simultaneously. This approach is widely applicable to fields outside energetic materials.

new Multimodal Methods for Analyzing Learning and Training Environments: A Systematic Literature Review

Authors: Clayton Cohn, Eduardo Davalos, Caleb Vatral, Joyce Horn Fonteles, Hanchen David Wang, Meiyi Ma, Gautam Biswas

Abstract: Recent technological advancements have enhanced our ability to collect and analyze rich multimodal data (e.g., speech, video, and eye gaze) to better inform learning and training experiences. While previous reviews have focused on parts of the multimodal pipeline (e.g., conceptual models and data fusion), a comprehensive literature review on the methods informing multimodal learning and training environments has not been conducted. This literature review provides an in-depth analysis of research methods in these environments, proposing a taxonomy and framework that encapsulates recent methodological advances in this field and characterizes the multimodal domain in terms of five modality groups: Natural Language, Video, Sensors, Human-Centered, and Environment Logs. We introduce a novel data fusion category -- mid fusion -- and a graph-based technique for refining literature reviews, termed citation graph pruning. Our analysis reveals that leveraging multiple modalities offers a more holistic understanding of the behaviors and outcomes of learners and trainees. Even when multimodality does not enhance predictive accuracy, it often uncovers patterns that contextualize and elucidate unimodal data, revealing subtleties that a single modality may miss. However, there remains a need for further research to bridge the divide between multimodal learning and training studies and foundational AI research.

new Evolvable Psychology Informed Neural Network for Memory Behavior Modeling

Authors: Xiaoxuan Shen, Zhihai Hu, Qirong Chen, Shengyingjie Liu, Ruxia Liang, Jianwen Sun

Abstract: Memory behavior modeling is a core issue in cognitive psychology and education. Classical psychological theories typically use memory equations to describe memory behavior, which exhibits insufficient accuracy and controversy, while data-driven memory modeling methods often require large amounts of training data and lack interpretability. Knowledge-informed neural network models have shown excellent performance in fields like physics, but there have been few attempts in the domain of behavior modeling. This paper proposed a psychology theory informed neural networks for memory behavior modeling named PsyINN, where it constructs a framework that combines neural network with differentiating sparse regression, achieving joint optimization. Specifically, to address the controversies and ambiguity of descriptors in memory equations, a descriptor evolution method based on differentiating operators is proposed to achieve precise characterization of descriptors and the evolution of memory theoretical equations. Additionally, a buffering mechanism for the sparse regression and a multi-module alternating iterative optimization method are proposed, effectively mitigating gradient instability and local optima issues. On four large-scale real-world memory behavior datasets, the proposed method surpasses the state-of-the-art methods in prediction accuracy. Ablation study demonstrates the effectiveness of the proposed refinements, and application experiments showcase its potential in inspiring psychological research.

new Extraction of Typical Operating Scenarios of New Power System Based on Deep Time Series Aggregation

Authors: Zhaoyang Qu, Zhenming Zhang, Nan Qu, Yuguang Zhou, Yang Li, Tao Jiang, Min Li, Chao Long

Abstract: Extracting typical operational scenarios is essential for making flexible decisions in the dispatch of a new power system. This study proposed a novel deep time series aggregation scheme (DTSAs) to generate typical operational scenarios, considering the large amount of historical operational snapshot data. Specifically, DTSAs analyze the intrinsic mechanisms of different scheduling operational scenario switching to mathematically represent typical operational scenarios. A gramian angular summation field (GASF) based operational scenario image encoder was designed to convert operational scenario sequences into high-dimensional spaces. This enables DTSAs to fully capture the spatiotemporal characteristics of new power systems using deep feature iterative aggregation models. The encoder also facilitates the generation of typical operational scenarios that conform to historical data distributions while ensuring the integrity of grid operational snapshots. Case studies demonstrate that the proposed method extracted new fine-grained power system dispatch schemes and outperformed the latest high-dimensional featurescreening methods. In addition, experiments with different new energy access ratios were conducted to verify the robustness of the proposed method. DTSAs enables dispatchers to master the operation experience of the power system in advance, and actively respond to the dynamic changes of the operation scenarios under the high access rate of new energy.

new Knowledge Graph Modeling-Driven Large Language Model Operating System (LLM OS) for Task Automation in Process Engineering Problem-Solving

Authors: Sakhinana Sagar Srinivas, Vijay Sri Vaikunth, Venkataramana Runkana

Abstract: We present the Process Engineering Operations Assistant (PEOA), an AI-driven framework designed to solve complex problems in the chemical and process industries. The framework employs a modular architecture orchestrated by a meta-agent, which serves as the central coordinator, managing an action generator and instruction-tuned small-scale language models (expert models). The action generator decomposes complex problems into sub-tasks and identifies suitable expert models to execute each, delivering precise solutions for multi-step problem-solving. Key techniques include advanced knowledge modeling using property graphs for improved information retrieval, facilitating more accurate and contextually relevant solutions. Additionally, the framework utilizes a teacher-student transfer-learning approach with GPT-4 (Omni) to fine-tune the action generator and expert models for domain adaptation, alongside an iterative problem-solving mechanism with sophisticated error handling. Custom datasets were developed to evaluate the framework against leading proprietary language models on various engineering tasks. The results demonstrate the framework effectiveness in automating calculations, accelerating prototyping, and providing AI-augmented decision support for industrial processes, marking a significant advancement in process engineering capabilities.

new A New Era in Computational Pathology: A Survey on Foundation and Vision-Language Models

Authors: Dibaloke Chanda, Milan Aryal, Nasim Yahya Soltani, Masoud Ganji

Abstract: Recent advances in deep learning have completely transformed the domain of computational pathology (CPath), which in turn altered the diagnostic workflow of pathologists by integrating foundation models (FMs) and vision-language models (VLMs) in their assessment and decision-making process. FMs overcome the limitations of existing deep learning approaches in CPath by learning a representation space that can be adapted to a wide variety of downstream tasks without explicit supervision. VLMs allow pathology reports written in natural language to be used as a rich semantic information source to improve existing models as well as generate predictions in natural language form. In this survey, a holistic and systematic overview of recent innovations in FMs and VLMs in CPath is presented. Furthermore, the tools, datasets and training schemes for these models are summarized in addition to categorizing them into distinct groups. This extensive survey highlights the current trends in CPath and the way it is going to be transformed through FMs and VLMs in the future.

new SHEDAD: SNN-Enhanced District Heating Anomaly Detection for Urban Substations

Authors: Jonne van Dreven, Abbas Cheddad, Sadi Alawadi, Ahmad Nauman Ghazi, Jad Al Koussa, Dirk Vanhoudt

Abstract: District Heating (DH) systems are essential for energy-efficient urban heating. However, despite the advancements in automated fault detection and diagnosis (FDD), DH still faces challenges in operational faults that impact efficiency. This study introduces the Shared Nearest Neighbor Enhanced District Heating Anomaly Detection (SHEDAD) approach, designed to approximate the DH network topology and allow for local anomaly detection without disclosing sensitive information, such as substation locations. The approach leverages a multi-adaptive k-Nearest Neighbor (k-NN) graph to improve the initial neighborhood creation. Moreover, it introduces a merging technique that reduces noise and eliminates trivial edges. We use the Median Absolute Deviation (MAD) and modified z-scores to flag anomalous substations. The results reveal that SHEDAD outperforms traditional clustering methods, achieving significantly lower intra-cluster variance and distance. Additionally, SHEDAD effectively isolates and identifies two distinct categories of anomalies: supply temperatures and substation performance. We identified 30 anomalous substations and reached a sensitivity of approximately 65\% and specificity of approximately 97\%. By focusing on this subset of poor-performing substations in the network, SHEDAD enables more targeted and effective maintenance interventions, which can reduce energy usage while optimizing network performance.

new Applying graph neural network to SupplyGraph for supply chain network

Authors: Kihwan Han

Abstract: Supply chain networks describe interactions between products, manufacture facilities, storages in the context of supply and demand of the products. Supply chain data are inherently under graph structure; thus, it can be fertile ground for applications of graph neural network (GNN). Very recently, supply chain dataset, SupplyGraph, has been released to the public. Though the SupplyGraph dataset is valuable given scarcity of publicly available data, there was less clarity on description of the dataset, data quality assurance process, and hyperparameters of the selected models. Further, for generalizability of findings, it would be more convincing to present the findings by performing statistical analyses on the distribution of errors rather than showing the average value of the errors. Therefore, this study assessed the supply chain dataset, SupplyGraph, with better clarity on analyses processes, data quality assurance, machine learning (ML) model specifications. After data quality assurance procedures, this study compared performance of Multilayer Perceptions (MLP), Graph Convolution Network (GCN), and Graph Attention Network (GAT) on a demanding forecasting task while matching hyperparameters as feasible as possible. The analyses revealed that GAT performed best, followed by GCN and MLP. Those performance improvements were statistically significant at $\alpha = 0.05$ after correction for multiple comparisons. This study also discussed several considerations in applying GNN to supply chain networks. The current study reinforces the previous study in supply chain benchmark dataset with respect to description of the dataset and methodology, so that the future research in applications of GNN to supply chain becomes more reproducible.

new Physics-Informed Neural Network for Concrete Manufacturing Process Optimization

Authors: Sam Varghese, Mr. Rahul Anand, Gaurav Paliwal

Abstract: Concrete manufacturing projects are one of the most common ones for consulting agencies. Because of the highly non-linear dependency of input materials like ash, water, cement, superplastic, etc; with the resultant strength of concrete, it gets difficult for machine learning models to successfully capture this relation and perform cost optimizations. This paper highlights how PINNs (Physics Informed Neural Networks) can be useful in the given situation. This state-of-the-art model shall also get compared with traditional models like Linear Regression, Random Forest, Gradient Boosting, and Deep Neural Network. Results of the research highlights how well PINNs performed even with reduced dataset, thus resolving one of the biggest issues of limited data availability for ML models. On an average, PINN got the loss value reduced by 26.3% even with 40% lesser data compared to the Deep Neural Network. In addition to predicting strength of the concrete given the quantity of raw materials, the paper also highlights the use of heuristic optimization method like Particle Swarm Optimization (PSO) in predicting quantity of raw materials required to manufacture concrete of given strength with least cost.

new Empowering Pre-Trained Language Models for Spatio-Temporal Forecasting via Decoupling Enhanced Discrete Reprogramming

Authors: Hao Wang, Jindong Han, Wei Fan, Hao Liu

Abstract: Spatio-temporal time series forecasting plays a critical role in various real-world applications, such as transportation optimization, energy management, and climate analysis. The recent advancements in Pre-trained Language Models (PLMs) have inspired efforts to reprogram these models for time series forecasting tasks, by leveraging their superior reasoning and generalization capabilities. However, existing approaches fall short in handling complex spatial inter-series dependencies and intrinsic intra-series frequency components, limiting their spatio-temporal forecasting performance. Moreover, the linear mapping of continuous time series to a compressed subset vocabulary in reprogramming constrains the spatio-temporal semantic expressivity of PLMs and may lead to potential information bottleneck. To overcome the above limitations, we propose \textsc{RePST}, a tailored PLM reprogramming framework for spatio-temporal forecasting. The key insight of \textsc{RePST} is to decouple the spatio-temporal dynamics in the frequency domain, allowing better alignment with the PLM text space. Specifically, we first decouple spatio-temporal data in Fourier space and devise a structural diffusion operator to obtain temporal intrinsic and spatial diffusion signals, making the dynamics more comprehensible and predictable for PLMs. To avoid information bottleneck from a limited vocabulary, we further propose a discrete reprogramming strategy that selects relevant discrete textual information from an expanded vocabulary space in a differentiable manner. Extensive experiments on four real-world datasets show that our proposed approach significantly outperforms state-of-the-art spatio-temporal forecasting models, particularly in data-scarce scenarios.

new Distilling Long-tailed Datasets

Authors: Zhenghao Zhao, Haoxuan Wang, Yuzhang Shang, Kai Wang, Yan Yan

Abstract: Dataset distillation (DD) aims to distill a small, information-rich dataset from a larger one for efficient neural network training. However, existing DD methods struggle with long-tailed datasets, which are prevalent in real-world scenarios. By investigating the reasons behind this unexpected result, we identified two main causes: 1) Expert networks trained on imbalanced data develop biased gradients, leading to the synthesis of similarly imbalanced distilled datasets. Parameter matching, a common technique in DD, involves aligning the learning parameters of the distilled dataset with that of the original dataset. However, in the context of long-tailed datasets, matching biased experts leads to inheriting the imbalance present in the original data, causing the distilled dataset to inadequately represent tail classes. 2) The experts trained on such datasets perform suboptimally on tail classes, resulting in misguided distillation supervision and poor-quality soft-label initialization. To address these issues, we propose a novel long-tailed dataset distillation method, Long-tailed Aware Dataset distillation (LAD). Specifically, we propose Weight Mismatch Avoidance to avoid directly matching the biased expert trajectories. It reduces the distance between the student and the biased expert trajectories and prevents the tail class bias from being distilled to the synthetic dataset. Moreover, we propose Adaptive Decoupled Matching, which jointly matches the decoupled backbone and classifier to improve the tail class performance and initialize reliable soft labels. This work pioneers the field of long-tailed dataset distillation (LTDD), marking the first effective effort to distill long-tailed datasets.

new LLMs as Zero-shot Graph Learners: Alignment of GNN Representations with LLM Token Embeddings

Authors: Duo Wang, Yuan Zuo, Fengzhi Li, Junjie Wu

Abstract: Zero-shot graph machine learning, especially with graph neural networks (GNNs), has garnered significant interest due to the challenge of scarce labeled data. While methods like self-supervised learning and graph prompt learning have been extensively explored, they often rely on fine-tuning with task-specific labels, limiting their effectiveness in zero-shot scenarios. Inspired by the zero-shot capabilities of instruction-fine-tuned large language models (LLMs), we introduce a novel framework named Token Embedding-Aligned Graph Language Model (TEA-GLM) that leverages LLMs as cross-dataset and cross-task zero-shot learners for graph machine learning. Concretely, we pretrain a GNN, aligning its representations with token embeddings of an LLM. We then train a linear projector that transforms the GNN's representations into a fixed number of graph token embeddings without tuning the LLM. A unified instruction is designed for various graph tasks at different levels, such as node classification (node-level) and link prediction (edge-level). These design choices collectively enhance our method's effectiveness in zero-shot learning, setting it apart from existing methods. Experiments show that our graph token embeddings help the LLM predictor achieve state-of-the-art performance on unseen datasets and tasks compared to other methods using LLMs as predictors.

new Variational autoencoder-based neural network model compression

Authors: Liang Cheng, Peiyuan Guan, Amir Taherkordi, Lei Liu, Dapeng Lan

Abstract: Variational Autoencoders (VAEs), as a form of deep generative model, have been widely used in recent years, and shown great great peformance in a number of different domains, including image generation and anomaly detection, etc.. This paper aims to explore neural network model compression method based on VAE. The experiment uses different neural network models for MNIST recognition as compression targets, including Feedforward Neural Network (FNN), Convolutional Neural Network (CNN), Recurrent Neural Network (RNN) and Long Short-Term Memory (LSTM). These models are the most basic models in deep learning, and other more complex and advanced models are based on them or inherit their features and evolve. In the experiment, the first step is to train the models mentioned above, each trained model will have different accuracy and number of total parameters. And then the variants of parameters for each model are processed as training data in VAEs separately, and the trained VAEs are tested by the true model parameters. The experimental results show that using the latent space as a representation of the model compression can improve the compression rate compared to some traditional methods such as pruning and quantization, meanwhile the accuracy is not greatly affected using the model parameters reconstructed based on the latent space. In the future, a variety of different large-scale deep learning models will be used more widely, so exploring different ways to save time and space on saving or transferring models will become necessary, and the use of VAE in this paper can provide a basis for these further explorations.

new Improving Nonlinear Projection Heads using Pretrained Autoencoder Embeddings

Authors: Andreas Schliebitz, Heiko Tapken, Martin Atzmueller

Abstract: This empirical study aims at improving the effectiveness of the standard 2-layer MLP projection head $g(\cdot)$ featured in the SimCLR framework through the use of pretrained autoencoder embeddings. Given a contrastive learning task with a largely unlabeled image classification dataset, we first train a shallow autoencoder architecture and extract its compressed representations contained in the encoder's embedding layer. After freezing the weights within this pretrained layer, we use it as a drop-in replacement for the input layer of SimCLR's default projector. Additionally, we also apply further architectural changes to the projector by decreasing its width and changing its activation function. The different projection heads are then used to contrastively train and evaluate a feature extractor $f(\cdot)$ following the SimCLR protocol, while also examining the performance impact of Z-score normalized datasets. Our experiments indicate that using a pretrained autoencoder embedding in the projector can not only increase classification accuracy by up to 2.9% or 1.7% on average but can also significantly decrease the dimensionality of the projection space. Our results also suggest, that using the sigmoid and tanh activation functions within the projector can outperform ReLU in terms of peak and average classification accuracy. When applying our presented projectors, then not applying Z-score normalization to datasets often increases peak performance. In contrast, the default projection head can benefit more from normalization. All experiments involving our pretrained projectors are conducted with frozen embeddings, since our test results indicate an advantage compared to using their non-frozen counterparts.

new A Multilateral Attention-enhanced Deep Neural Network for Disease Outbreak Forecasting: A Case Study on COVID-19

Authors: Ashutosh Anshul, Jhalak Gupta, Mohammad Zia Ur Rehman, Nagendra Kumar

Abstract: The worldwide impact of the recent COVID-19 pandemic has been substantial, necessitating the development of accurate forecasting models to predict the spread and course of a pandemic. Previous methods for outbreak forecasting have faced limitations by not utilizing multiple sources of input and yielding suboptimal performance due to the limited availability of data. In this study, we propose a novel approach to address the challenges of infectious disease forecasting. We introduce a Multilateral Attention-enhanced GRU model that leverages information from multiple sources, thus enabling a comprehensive analysis of factors influencing the spread of a pandemic. By incorporating attention mechanisms within a GRU framework, our model can effectively capture complex relationships and temporal dependencies in the data, leading to improved forecasting performance. Further, we have curated a well-structured multi-source dataset for the recent COVID-19 pandemic that the research community can utilize as a great resource to conduct experiments and analysis on time-series forecasting. We evaluated the proposed model on our COVID-19 dataset and reported the output in terms of RMSE and MAE. The experimental results provide evidence that our proposed model surpasses existing techniques in terms of performance. We also performed performance gain and qualitative analysis on our dataset to evaluate the impact of the attention mechanism and show that the proposed model closely follows the trajectory of the pandemic.

new Towards Graph Prompt Learning: A Survey and Beyond

Authors: Qingqing Long, Yuchen Yan, Peiyan Zhang, Chen Fang, Wentao Cui, Zhiyuan Ning, Meng Xiao, Ning Cao, Xiao Luo, Lingjun Xu, Shiyue Jiang, Zheng Fang, Chong Chen, Xian-Sheng Hua, Yuanchun Zhou

Abstract: Large-scale "pre-train and prompt learning" paradigms have demonstrated remarkable adaptability, enabling broad applications across diverse domains such as question answering, image recognition, and multimodal retrieval. This approach fully leverages the potential of large-scale pre-trained models, reducing downstream data requirements and computational costs while enhancing model applicability across various tasks. Graphs, as versatile data structures that capture relationships between entities, play pivotal roles in fields such as social network analysis, recommender systems, and biological graphs. Despite the success of pre-train and prompt learning paradigms in Natural Language Processing (NLP) and Computer Vision (CV), their application in graph domains remains nascent. In graph-structured data, not only do the node and edge features often have disparate distributions, but the topological structures also differ significantly. This diversity in graph data can lead to incompatible patterns or gaps between pre-training and fine-tuning on downstream graphs. We aim to bridge this gap by summarizing methods for alleviating these disparities. This includes exploring prompt design methodologies, comparing related techniques, assessing application scenarios and datasets, and identifying unresolved problems and challenges. This survey categorizes over 100 relevant works in this field, summarizing general design principles and the latest applications, including text-attributed graphs, molecules, proteins, and recommendation systems. Through this extensive review, we provide a foundational understanding of graph prompt learning, aiming to impact not only the graph mining community but also the broader Artificial General Intelligence (AGI) community.

new Retrieval Augmented Generation for Dynamic Graph Modeling

Authors: Yuxia Wu, Yuan Fang, Lizi Liao

Abstract: Dynamic graph modeling is crucial for analyzing evolving patterns in various applications. Existing approaches often integrate graph neural networks with temporal modules or redefine dynamic graph modeling as a generative sequence task. However, these methods typically rely on isolated historical contexts of the target nodes from a narrow perspective, neglecting occurrences of similar patterns or relevant cases associated with other nodes. In this work, we introduce the Retrieval-Augmented Generation for Dynamic Graph Modeling (RAG4DyG) framework, which leverages guidance from contextually and temporally analogous examples to broaden the perspective of each node. This approach presents two critical challenges: (1) How to identify and retrieve high-quality demonstrations that are contextually and temporally analogous to dynamic graph samples? (2) How can these demonstrations be effectively integrated to improve dynamic graph modeling? To address these challenges, we propose RAG4DyG, which enriches the understanding of historical contexts by retrieving and learning from contextually and temporally pertinent demonstrations. Specifically, we employ a time- and context-aware contrastive learning module to identify and retrieve relevant cases for each query sequence. Moreover, we design a graph fusion strategy to integrate the retrieved cases, thereby augmenting the inherent historical contexts for improved prediction. Extensive experiments on real-world datasets across different domains demonstrate the effectiveness of RAG4DyG for dynamic graph modeling.

new Estimating Uncertainty with Implicit Quantile Network

Authors: Yi Hung Lim

Abstract: Uncertainty quantification is an important part of many performance critical applications. This paper provides a simple alternative to existing approaches such as ensemble learning and bayesian neural networks. By directly modeling the loss distribution with an Implicit Quantile Network, we get an estimate of how uncertain the model is of its predictions. For experiments with MNIST and CIFAR datasets, the mean of the estimated loss distribution is 2x higher for incorrect predictions. When data with high estimated uncertainty is removed from the test dataset, the accuracy of the model goes up as much as 10%. This method is simple to implement while offering important information to applications where the user has to know when the model could be wrong (e.g. deep learning for healthcare).

new Adaptive Resolution Inference (ARI): Energy-Efficient Machine Learning for Internet of Things

Authors: Ziheng Wang, Pedro Reviriego, Farzad Niknia, Javier Conde, Shanshan Liu, Fabrizio Lombardi

Abstract: The implementation of machine learning in Internet of Things devices poses significant operational challenges due to limited energy and computation resources. In recent years, significant efforts have been made to implement simplified ML models that can achieve reasonable performance while reducing computation and energy, for example by pruning weights in neural networks, or using reduced precision for the parameters and arithmetic operations. However, this type of approach is limited by the performance of the ML implementation, i.e., by the loss for example in accuracy due to the model simplification. In this article, we present adaptive resolution inference (ARI), a novel approach that enables to evaluate new tradeoffs between energy dissipation and model performance in ML implementations. The main principle of the proposed approach is to run inferences with reduced precision (quantization) and use the margin over the decision threshold to determine if either the result is reliable, or the inference must run with the full model. The rationale is that quantization only introduces small deviations in the inference scores, such that if the scores have a sufficient margin over the decision threshold, it is unlikely that the full model would have a different result. Therefore, we can run the quantized model first, and only when the scores do not have a sufficient margin, the full model is run. This enables most inferences to run with the reduced precision model and only a small fraction requires the full model, so significantly reducing computation and energy while not affecting model performance. The proposed ARI approach is presented, analyzed in detail, and evaluated using different data sets for floating-point and stochastic computing implementations. The results show that ARI can significantly reduce the energy for inference in different configurations with savings between 40% and 85%.

new Aiding Humans in Financial Fraud Decision Making: Toward an XAI-Visualization Framework

Authors: Angelos Chatzimparmpas, Evanthia Dimara

Abstract: AI prevails in financial fraud detection and decision making. Yet, due to concerns about biased automated decision making or profiling, regulations mandate that final decisions are made by humans. Financial fraud investigators face the challenge of manually synthesizing vast amounts of unstructured information, including AI alerts, transaction histories, social media insights, and governmental laws. Current Visual Analytics (VA) systems primarily support isolated aspects of this process, such as explaining binary AI alerts and visualizing transaction patterns, thus adding yet another layer of information to the overall complexity. In this work, we propose a framework where the VA system supports decision makers throughout all stages of financial fraud investigation, including data collection, information synthesis, and human criteria iteration. We illustrate how VA can claim a central role in AI-aided decision making, ensuring that human judgment remains in control while minimizing potential biases and labor-intensive tasks.

new CURLoRA: Stable LLM Continual Fine-Tuning and Catastrophic Forgetting Mitigation

Authors: Muhammad Fawi

Abstract: This paper introduces CURLoRA, a novel approach to fine-tuning large language models (LLMs) that leverages CUR matrix decomposition in the context of Low-Rank Adaptation (LoRA). Our method addresses two critical challenges in LLM fine-tuning: mitigating catastrophic forgetting during continual learning and reducing the number of trainable parameters. We propose a unique modification to the CUR decomposition process, utilizing inverted probabilities for column and row selection which acts as an implicit regularization, and initializing the $U$ matrix as a zero matrix, and only fine-tuning it. We demonstrate through experiments on multiple datasets that CURLoRA outperforms standard LoRA in mitigating catastrophic forgetting. It maintains model stability and performance across tasks while significantly reducing the number of trainable parameters. Our results show that CURLoRA achieves very good and stable task accuracy while maintaining base model's perplexity scores fixed compared to LoRA upon continual fine-tuning, particularly in scenarios with limited data.

new Efficient fine-tuning of 37-level GraphCast with the Canadian global deterministic analysis

Authors: Christopher Subich

Abstract: This work describes a process for efficiently fine-tuning the GraphCast data-driven forecast model to simulate another analysis system, here the Global Deterministic Prediction System (GDPS) of Environment and Climate Change Canada (ECCC). Using two years of training data (July 2019 -- December 2021) and 37 GPU-days of computation to tune the 37-level, quarter-degree version of GraphCast, the resulting model significantly outperforms both the unmodified GraphCast and operational forecast, showing significant forecast skill in the troposphere over lead times from 1 to 10 days. This fine-tuning is accomplished through abbreviating DeepMind's original training curriculum for GraphCast, relying on a shorter single-step forecast stage to accomplish the bulk of the adaptation work and consolidating the autoregressive stages into separate 12hr, 1d, 2d, and 3d stages with larger learning rates. Additionally, training over 3d forecasts is split into two sub-steps to conserve host memory while maintaining a strong correlation with training over the full period.

new Biased Dueling Bandits with Stochastic Delayed Feedback

Authors: Bongsoo Yi, Yue Kang, Yao Li

Abstract: The dueling bandit problem, an essential variation of the traditional multi-armed bandit problem, has become significantly prominent recently due to its broad applications in online advertising, recommendation systems, information retrieval, and more. However, in many real-world applications, the feedback for actions is often subject to unavoidable delays and is not immediately available to the agent. This partially observable issue poses a significant challenge to existing dueling bandit literature, as it significantly affects how quickly and accurately the agent can update their policy on the fly. In this paper, we introduce and examine the biased dueling bandit problem with stochastic delayed feedback, revealing that this new practical problem will delve into a more realistic and intriguing scenario involving a preference bias between the selections. We present two algorithms designed to handle situations involving delay. Our first algorithm, requiring complete delay distribution information, achieves the optimal regret bound for the dueling bandit problem when there is no delay. The second algorithm is tailored for situations where the distribution is unknown, but only the expected value of delay is available. We provide a comprehensive regret analysis for the two proposed algorithms and then evaluate their empirical performance on both synthetic and real datasets.

new Meta Flow Matching: Integrating Vector Fields on the Wasserstein Manifold

Authors: Lazar Atanackovic, Xi Zhang, Brandon Amos, Mathieu Blanchette, Leo J. Lee, Yoshua Bengio, Alexander Tong, Kirill Neklyudov

Abstract: Numerous biological and physical processes can be modeled as systems of interacting entities evolving continuously over time, e.g. the dynamics of communicating cells or physical particles. Learning the dynamics of such systems is essential for predicting the temporal evolution of populations across novel samples and unseen environments. Flow-based models allow for learning these dynamics at the population level - they model the evolution of the entire distribution of samples. However, current flow-based models are limited to a single initial population and a set of predefined conditions which describe different dynamics. We argue that multiple processes in natural sciences have to be represented as vector fields on the Wasserstein manifold of probability densities. That is, the change of the population at any moment in time depends on the population itself due to the interactions between samples. In particular, this is crucial for personalized medicine where the development of diseases and their respective treatment response depends on the microenvironment of cells specific to each patient. We propose Meta Flow Matching (MFM), a practical approach to integrating along these vector fields on the Wasserstein manifold by amortizing the flow model over the initial populations. Namely, we embed the population of samples using a Graph Neural Network (GNN) and use these embeddings to train a Flow Matching model. This gives MFM the ability to generalize over the initial distributions unlike previously proposed methods. We demonstrate the ability of MFM to improve prediction of individual treatment responses on a large scale multi-patient single-cell drug screen dataset.

new Hybrid Deep Convolutional Neural Networks Combined with Autoencoders And Augmented Data To Predict The Look-Up Table 2006

Authors: Messaoud Djeddou, Aouatef Hellal, Ibrahim A. Hameed, Xingang Zhao, Djehad Al Dallal

Abstract: This study explores the development of a hybrid deep convolutional neural network (DCNN) model enhanced by autoencoders and data augmentation techniques to predict critical heat flux (CHF) with high accuracy. By augmenting the original input features using three different autoencoder configurations, the model's predictive capabilities were significantly improved. The hybrid models were trained and tested on a dataset of 7225 samples, with performance metrics including the coefficient of determination (R2), Nash-Sutcliffe efficiency (NSE), mean absolute error (MAE), and normalized root-mean-squared error (NRMSE) used for evaluation. Among the tested models, the DCNN_3F-A2 configuration demonstrated the highest accuracy, achieving an R2 of 0.9908 during training and 0.9826 during testing, outperforming the base model and other augmented versions. These results suggest that the proposed hybrid approach, combining deep learning with feature augmentation, offers a robust solution for CHF prediction, with the potential to generalize across a wider range of conditions.

new Can Optimization Trajectories Explain Multi-Task Transfer?

Authors: David Mueller, Mark Dredze, Nicholas Andrews

Abstract: Despite the widespread adoption of multi-task training in deep learning, little is understood about how multi-task learning (MTL) affects generalization. Prior work has conjectured that the negative effects of MTL are due to optimization challenges that arise during training, and many optimization methods have been proposed to improve multi-task performance. However, recent work has shown that these methods fail to consistently improve multi-task generalization. In this work, we seek to improve our understanding of these failures by empirically studying how MTL impacts the optimization of tasks, and whether this impact can explain the effects of MTL on generalization. We show that MTL results in a generalization gap-a gap in generalization at comparable training loss-between single-task and multi-task trajectories early into training. However, we find that factors of the optimization trajectory previously proposed to explain generalization gaps in single-task settings cannot explain the generalization gaps between single-task and multi-task models. Moreover, we show that the amount of gradient conflict between tasks is correlated with negative effects to task optimization, but is not predictive of generalization. Our work sheds light on the underlying causes for failures in MTL and, importantly, raises questions about the role of general purpose multi-task optimization algorithms.

new Enhancing Neural Network Interpretability Through Conductance-Based Information Plane Analysis

Authors: Jaouad Dabounou, Amine Baazzouz

Abstract: The Information Plane is a conceptual framework used to analyze the flow of information in neural networks, but traditional methods based on activations may not fully capture the dynamics of information processing. This paper introduces a new approach that uses layer conductance, a measure of sensitivity to input features, to enhance the Information Plane analysis. By incorporating gradient-based contributions, we provide a more precise characterization of information dynamics within the network. The proposed conductance-based Information Plane and a new Information Transformation Efficiency (ITE) metric are evaluated on pretrained ResNet50 and VGG16 models using the ImageNet dataset. Our results demonstrate the ability to identify critical hidden layers that contribute significantly to model performance and interpretability, giving insights into information compression, preservation, and utilization across layers. The conductance-based approach offers a granular perspective on feature attribution, enhancing our understanding of the decision-making processes within neural networks. Furthermore, our empirical findings challenge certain theoretical predictions of the Information Bottleneck theory, highlighting the complexities of information dynamics in real-world data scenarios. The proposed method not only advances our understanding of information dynamics in neural networks but also has the potential to significantly impact the broader field of Artificial Intelligence by enabling the development of more interpretable, efficient, and robust models.

new Detecting Interpretable Subgroup Drifts

Authors: Flavio Giobergia, Eliana Pastor, Luca de Alfaro, Elena Baralis

Abstract: The ability to detect and adapt to changes in data distributions is crucial to maintain the accuracy and reliability of machine learning models. Detection is generally approached by observing the drift of model performance from a global point of view. However, drifts occurring in (fine-grained) data subgroups may go unnoticed when monitoring global drift. We take a different perspective, and introduce methods for observing drift at the finer granularity of subgroups. Relevant data subgroups are identified during training and monitored efficiently throughout the model's life. Performance drifts in any subgroup are detected, quantified and characterized so as to provide an interpretable summary of the model behavior over time. Experimental results confirm that our subgroup-level drift analysis identifies drifts that do not show at the (coarser) global dataset level. The proposed approach provides a valuable tool for monitoring model performance in dynamic real-world applications, offering insights into the evolving nature of data and ultimately contributing to more robust and adaptive models.

new A Synthetic Benchmark to Explore Limitations of Localized Drift Detections

Authors: Flavio Giobergia, Eliana Pastor, Luca de Alfaro, Elena Baralis

Abstract: Concept drift is a common phenomenon in data streams where the statistical properties of the target variable change over time. Traditionally, drift is assumed to occur globally, affecting the entire dataset uniformly. However, this assumption does not always hold true in real-world scenarios where only specific subpopulations within the data may experience drift. This paper explores the concept of localized drift and evaluates the performance of several drift detection techniques in identifying such localized changes. We introduce a synthetic dataset based on the Agrawal generator, where drift is induced in a randomly chosen subgroup. Our experiments demonstrate that commonly adopted drift detection methods may fail to detect drift when it is confined to a small subpopulation. We propose and test various drift detection approaches to quantify their effectiveness in this localized drift scenario. We make the source code for the generation of the synthetic benchmark available at https://github.com/fgiobergia/subgroup-agrawal-drift.

URLs: https://github.com/fgiobergia/subgroup-agrawal-drift.

new PAT: Pruning-Aware Tuning for Large Language Models

Authors: Yijiang Liu, Huanrui Yang, Youxin Chen, Rongyu Zhang, Miao Wang, Yuan Du, Li Du

Abstract: Large language models (LLMs) excel in language tasks, especially with supervised fine-tuning after pre-training. However, their substantial memory and computational requirements hinder practical applications. Structural pruning, which reduces less significant weight dimensions, is one solution. Yet, traditional post-hoc pruning often leads to significant performance loss, with limited recovery from further fine-tuning due to reduced capacity. Since the model fine-tuning refines the general and chaotic knowledge in pre-trained models, we aim to incorporate structural pruning with the fine-tuning, and propose the Pruning-Aware Tuning (PAT) paradigm to eliminate model redundancy while preserving the model performance to the maximum extend. Specifically, we insert the innovative Hybrid Sparsification Modules (HSMs) between the Attention and FFN components to accordingly sparsify the upstream and downstream linear modules. The HSM comprises a lightweight operator and a globally shared trainable mask. The lightweight operator maintains a training overhead comparable to that of LoRA, while the trainable mask unifies the channels to be sparsified, ensuring structural pruning. Additionally, we propose the Identity Loss which decouples the transformation and scaling properties of the HSMs to enhance training robustness. Extensive experiments demonstrate that PAT excels in both performance and efficiency. For example, our Llama2-7b model with a 25\% pruning ratio achieves 1.33$\times$ speedup while outperforming the LoRA-finetuned model by up to 1.26\% in accuracy with a similar training cost. Code: https://github.com/kriskrisliu/PAT_Pruning-Aware-Tuning

URLs: https://github.com/kriskrisliu/PAT_Pruning-Aware-Tuning

new TART: Boosting Clean Accuracy Through Tangent Direction Guided Adversarial Training

Authors: Bongsoo Yi, Rongjie Lai, Yao Li

Abstract: Adversarial training has been shown to be successful in enhancing the robustness of deep neural networks against adversarial attacks. However, this robustness is accompanied by a significant decline in accuracy on clean data. In this paper, we propose a novel method, called Tangent Direction Guided Adversarial Training (TART), that leverages the tangent space of the data manifold to ameliorate the existing adversarial defense algorithms. We argue that training with adversarial examples having large normal components significantly alters the decision boundary and hurts accuracy. TART mitigates this issue by estimating the tangent direction of adversarial examples and allocating an adaptive perturbation limit according to the norm of their tangential component. To the best of our knowledge, our paper is the first work to consider the concept of tangent space and direction in the context of adversarial defense. We validate the effectiveness of TART through extensive experiments on both simulated and benchmark datasets. The results demonstrate that TART consistently boosts clean accuracy while retaining a high level of robustness against adversarial attacks. Our findings suggest that incorporating the geometric properties of data can lead to more effective and efficient adversarial training methods.

new General-Kindred Physics-Informed Neural Network to the Solutions of Singularly Perturbed Differential Equations

Authors: Sen Wang, Peizhi Zhao, Qinglong Ma, Tao Song

Abstract: Physics-Informed Neural Networks (PINNs) have become a promising research direction in the field of solving Partial Differential Equations (PDEs). Dealing with singular perturbation problems continues to be a difficult challenge in the field of PINN. The solution of singular perturbation problems often exhibits sharp boundary layers and steep gradients, and traditional PINN cannot achieve approximation of boundary layers. In this manuscript, we propose the General-Kindred Physics-Informed Neural Network (GKPINN) for solving Singular Perturbation Differential Equations (SPDEs). This approach utilizes asymptotic analysis to acquire prior knowledge of the boundary layer from the equation and establishes a novel network to assist PINN in approximating the boundary layer. It is compared with traditional PINN by solving examples of one-dimensional, two-dimensional, and time-varying SPDE equations. The research findings underscore the exceptional performance of our novel approach, GKPINN, which delivers a remarkable enhancement in reducing the $L_2$ error by two to four orders of magnitude compared to the established PINN methodology. This significant improvement is accompanied by a substantial acceleration in convergence rates, without compromising the high precision that is critical for our applications. Furthermore, GKPINN still performs well in extreme cases with perturbation parameters of ${1\times10}^{-38}$, demonstrating its excellent generalization ability.

new Learning Differentially Private Diffusion Models via Stochastic Adversarial Distillation

Authors: Bochao Liu, Pengju Wang, Shiming Ge

Abstract: While the success of deep learning relies on large amounts of training datasets, data is often limited in privacy-sensitive domains. To address this challenge, generative model learning with differential privacy has emerged as a solution to train private generative models for desensitized data generation. However, the quality of the images generated by existing methods is limited due to the complexity of modeling data distribution. We build on the success of diffusion models and introduce DP-SAD, which trains a private diffusion model by a stochastic adversarial distillation method. Specifically, we first train a diffusion model as a teacher and then train a student by distillation, in which we achieve differential privacy by adding noise to the gradients from other models to the student. For better generation quality, we introduce a discriminator to distinguish whether an image is from the teacher or the student, which forms the adversarial training. Extensive experiments and analysis clearly demonstrate the effectiveness of our proposed method.

new Training-Free Time-Series Anomaly Detection: Leveraging Image Foundation Models

Authors: Nobuo Namura, Yuma Ichikawa

Abstract: Recent advancements in time-series anomaly detection have relied on deep learning models to handle the diverse behaviors of time-series data. However, these models often suffer from unstable training and require extensive hyperparameter tuning, leading to practical limitations. Although foundation models present a potential solution, their use in time series is limited. To overcome these issues, we propose an innovative image-based, training-free time-series anomaly detection (ITF-TAD) approach. ITF-TAD converts time-series data into images using wavelet transform and compresses them into a single representation, leveraging image foundation models for anomaly detection. This approach achieves high-performance anomaly detection without unstable neural network training or hyperparameter tuning. Furthermore, ITF-TAD identifies anomalies across different frequencies, providing users with a detailed visualization of anomalies and their corresponding frequencies. Comprehensive experiments on five benchmark datasets, including univariate and multivariate time series, demonstrate that ITF-TAD offers a practical and effective solution with performance exceeding or comparable to that of deep models.

new Explainable Hierarchical Urban Representation Learning for Commuting Flow Prediction

Authors: Mingfei Cai, Yanbo Pang, Yoshihide Sekimoto

Abstract: Commuting flow prediction is an essential task for municipal operations in the real world. Previous studies have revealed that it is feasible to estimate the commuting origin-destination (OD) demand within a city using multiple auxiliary data. However, most existing methods are not suitable to deal with a similar task at a large scale, namely within a prefecture or the whole nation, owing to the increased number of geographical units that need to be maintained. In addition, region representation learning is a universal approach for gaining urban knowledge for diverse metropolitan downstream tasks. Although many researchers have developed comprehensive frameworks to describe urban units from multi-source data, they have not clarified the relationship between the selected geographical elements. Furthermore, metropolitan areas naturally preserve ranked structures, like cities and their inclusive districts, which makes elucidating relations between cross-level urban units necessary. Therefore, we develop a heterogeneous graph-based model to generate meaningful region embeddings at multiple spatial resolutions for predicting different types of inter-level OD flows. To demonstrate the effectiveness of the proposed method, extensive experiments were conducted using real-world aggregated mobile phone datasets collected from Shizuoka Prefecture, Japan. The results indicate that our proposed model outperforms existing models in terms of a uniform urban structure. We extend the understanding of predicted results using reasonable explanations to enhance the credibility of the model.

new Channel-wise Influence: Estimating Data Influence for Multivariate Time Series

Authors: Muyao Wang, Zeke Xie, Bo Chen

Abstract: The influence function, a technique from robust statistics, measures the impact on model parameters or related functions when training data is removed or modified. This effective and valuable post-hoc method allows for studying the interpretability of machine learning models without requiring costly model retraining. It would provide extensions like increasing model performance, improving model generalization, and offering interpretability. Recently, Multivariate Time Series (MTS) analysis has become an important yet challenging task, attracting significant attention. However, there is no preceding research on the influence functions of MTS to shed light on the effects of modifying the channel of training MTS. Given that each channel in an MTS plays a crucial role in its analysis, it is essential to characterize the influence of different channels. To fill this gap, we propose a channel-wise influence function, which is the first method that can estimate the influence of different channels in MTS, utilizing a first-order gradient approximation that leverages the more informative average gradient of the data set. Additionally, we demonstrate how this influence function can be used to estimate the impact of a channel in MTS. Finally, we validated the accuracy and effectiveness of our influence estimation function in critical MTS analysis tasks, such as MTS anomaly detection and MTS forecasting. According to abundant experiments on real-world dataset, the original influence function performs worse than our method and even fail for the channel pruning problem, which demonstrate the superiority and necessity of channel-wise influence function in MTS analysis tasks.

new Instruct-SkillMix: A Powerful Pipeline for LLM Instruction Tuning

Authors: Simran Kaur, Simon Park, Anirudh Goyal, Sanjeev Arora

Abstract: We introduce Instruct-SkillMix, an automated approach for creating diverse, high quality SFT data. The Instruct-SkillMix pipeline involves two stages, each leveraging an existing powerful LLM: (1) Skill extraction: uses the LLM to extract core "skills" for instruction-following, either from existing datasets, or by directly prompting the model; (2) Data generation: uses the powerful LLM to generate (instruction, response) data that exhibit a randomly chosen pair of these skills. Here, the use of random skill combinations promotes diversity and difficulty. Vanilla SFT (i.e., no PPO, DPO, or RL methods) on data generated from Instruct-SkillMix leads to strong gains on instruction following benchmarks such as AlpacaEval 2.0, MT-Bench, and WildBench. With just $4$K examples, LLaMA-3-8B-Base achieves 42.76% length-controlled win rate on AlpacaEval 2.0. To our knowledge, this achieves state-of-the-art performance among all models that have only undergone SFT (no RL methods) and competes with proprietary models such as Claude 3 Opus and LLaMA-3.1-405B-Instruct. Ablation studies also suggest plausible reasons for why creating open instruction-tuning datasets via naive crowd-sourcing has proved difficult. Introducing low quality answers ("shirkers") in $20\%$ of Instruct-SkillMix examples causes performance to plummet, sometimes catastrophically. The Instruct-SkillMix pipeline is flexible and is adaptable to other settings.

new GINN-KAN: Interpretability pipelining with applications in Physics Informed Neural Networks

Authors: Nisal Ranasinghe, Yu Xia, Sachith Seneviratne, Saman Halgamuge

Abstract: Neural networks are powerful function approximators, yet their ``black-box" nature often renders them opaque and difficult to interpret. While many post-hoc explanation methods exist, they typically fail to capture the underlying reasoning processes of the networks. A truly interpretable neural network would be trained similarly to conventional models using techniques such as backpropagation, but additionally provide insights into the learned input-output relationships. In this work, we introduce the concept of interpretability pipelineing, to incorporate multiple interpretability techniques to outperform each individual technique. To this end, we first evaluate several architectures that promise such interpretability, with a particular focus on two recent models selected for their potential to incorporate interpretability into standard neural network architectures while still leveraging backpropagation: the Growing Interpretable Neural Network (GINN) and Kolmogorov Arnold Networks (KAN). We analyze the limitations and strengths of each and introduce a novel interpretable neural network GINN-KAN that synthesizes the advantages of both models. When tested on the Feynman symbolic regression benchmark datasets, GINN-KAN outperforms both GINN and KAN. To highlight the capabilities and the generalizability of this approach, we position GINN-KAN as an alternative to conventional black-box networks in Physics-Informed Neural Networks (PINNs). We expect this to have far-reaching implications in the application of deep learning pipelines in the natural sciences. Our experiments with this interpretable PINN on 15 different partial differential equations demonstrate that GINN-KAN augmented PINNs outperform PINNs with black-box networks in solving differential equations and surpass the capabilities of both GINN and KAN.

new Unsupervised-to-Online Reinforcement Learning

Authors: Junsu Kim, Seohong Park, Sergey Levine

Abstract: Offline-to-online reinforcement learning (RL), a framework that trains a policy with offline RL and then further fine-tunes it with online RL, has been considered a promising recipe for data-driven decision-making. While sensible, this framework has drawbacks: it requires domain-specific offline RL pre-training for each task, and is often brittle in practice. In this work, we propose unsupervised-to-online RL (U2O RL), which replaces domain-specific supervised offline RL with unsupervised offline RL, as a better alternative to offline-to-online RL. U2O RL not only enables reusing a single pre-trained model for multiple downstream tasks, but also learns better representations, which often result in even better performance and stability than supervised offline-to-online RL. To instantiate U2O RL in practice, we propose a general recipe for U2O RL to bridge task-agnostic unsupervised offline skill-based policy pre-training and supervised online fine-tuning. Throughout our experiments in nine state-based and pixel-based environments, we empirically demonstrate that U2O RL achieves strong performance that matches or even outperforms previous offline-to-online RL approaches, while being able to reuse a single pre-trained model for a number of different downstream tasks.

new Learning from Complementary Features

Authors: Kosuke Sugiyama, Masato Uchida

Abstract: While precise data observation is essential for the learning processes of predictive models, it can be challenging owing to factors such as insufficient observation accuracy, high collection costs, and privacy constraints. In this paper, we examines cases where some qualitative features are unavailable as precise information indicating "what it is," but rather as complementary information indicating "what it is not." We refer to features defined by precise information as ordinary features (OFs) and those defined by complementary information as complementary features (CFs). We then formulate a new learning scenario termed Complementary Feature Learning (CFL), where predictive models are constructed using instances consisting of OFs and CFs. The simplest formalization of CFL applies conventional supervised learning directly using the observed values of CFs. However, this approach does not resolve the ambiguity associated with CFs, making learning challenging and complicating the interpretation of the predictive model's specific predictions. Therefore, we derive an objective function from an information-theoretic perspective to estimate the OF values corresponding to CFs and to predict output labels based on these estimations. Based on this objective function, we propose a theoretically guaranteed graph-based estimation method along with its practical approximation, for estimating OF values corresponding to CFs. The results of numerical experiments conducted with real-world data demonstrate that our proposed method effectively estimates OF values corresponding to CFs and predicts output labels.

new Poly2Vec: Polymorphic Encoding of Geospatial Objects for Spatial Reasoning with Deep Neural Networks

Authors: Maria Despoina Siampou, Jialiang Li, John Krumm, Cyrus Shahabi, Hua Lu

Abstract: Encoding geospatial data is crucial for enabling machine learning (ML) models to perform tasks that require spatial reasoning, such as identifying the topological relationships between two different geospatial objects. However, existing encoding methods are limited as they are typically customized to handle only specific types of spatial data, which impedes their applicability across different downstream tasks where multiple data types coexist. To address this, we introduce Poly2Vec, an encoding framework that unifies the modeling of different geospatial objects, including 2D points, polylines, and polygons, irrespective of the downstream task. We leverage the power of the 2D Fourier transform to encode useful spatial properties, such as shape and location, from geospatial objects into fixed-length vectors. These vectors are then inputted into neural network models for spatial reasoning tasks.This unified approach eliminates the need to develop and train separate models for each distinct spatial type. We evaluate Poly2Vec on both synthetic and real datasets of mixed geometry types and verify its consistent performance across several downstream spatial reasoning tasks.

new A Comprehensive Benchmark of Machine and Deep Learning Across Diverse Tabular Datasets

Authors: Assaf Shmuel, Oren Glickman, Teddy Lazebnik

Abstract: The analysis of tabular datasets is highly prevalent both in scientific research and real-world applications of Machine Learning (ML). Unlike many other ML tasks, Deep Learning (DL) models often do not outperform traditional methods in this area. Previous comparative benchmarks have shown that DL performance is frequently equivalent or even inferior to models such as Gradient Boosting Machines (GBMs). In this study, we introduce a comprehensive benchmark aimed at better characterizing the types of datasets where DL models excel. Although several important benchmarks for tabular datasets already exist, our contribution lies in the variety and depth of our comparison: we evaluate 111 datasets with 20 different models, including both regression and classification tasks. These datasets vary in scale and include both those with and without categorical variables. Importantly, our benchmark contains a sufficient number of datasets where DL models perform best, allowing for a thorough analysis of the conditions under which DL models excel. Building on the results of this benchmark, we train a model that predicts scenarios where DL models outperform alternative methods with 86.1% accuracy (AUC 0.78). We present insights derived from this characterization and compare these findings to previous benchmarks.

new Data-driven Effective Modeling of Multiscale Stochastic Dynamical Systems

Authors: Yuan Chen, Dongbin Xiu

Abstract: We present a numerical method for learning the dynamics of slow components of unknown multiscale stochastic dynamical systems. While the governing equations of the systems are unknown, bursts of observation data of the slow variables are available. By utilizing the observation data, our proposed method is capable of constructing a generative stochastic model that can accurately capture the effective dynamics of the slow variables in distribution. We present a comprehensive set of numerical examples to demonstrate the performance of the proposed method.

new DRL-Based Federated Self-Supervised Learning for Task Offloading and Resource Allocation in ISAC-Enabled Vehicle Edge Computing

Authors: Xueying Gu, Qiong Wu, Pingyi Fan, Nan Cheng, Wen Chen, Khaled B. Letaief

Abstract: Intelligent Transportation Systems (ITS) leverage Integrated Sensing and Communications (ISAC) to enhance data exchange between vehicles and infrastructure in the Internet of Vehicles (IoV). This integration inevitably increases computing demands, risking real-time system stability. Vehicle Edge Computing (VEC) addresses this by offloading tasks to Road Side Unit (RSU), ensuring timely services. Our previous work FLSimCo algorithm, which uses local resources for Federated Self-Supervised Learning (SSL), though vehicles often can't complete all iterations task. Our improved algorithm offloads partial task to RSU and optimizes energy consumption by adjusting transmission power, CPU frequency, and task assignment ratios, balancing local and RSU-based training. Meanwhile, setting an offloading threshold further prevents inefficiencies. Simulation results show that the enhanced algorithm reduces energy consumption, improves offloading efficiency and the accuracy of Federated SSL.

new Diffusion Models Are Real-Time Game Engines

Authors: Dani Valevski, Yaniv Leviathan, Moab Arar, Shlomi Fruchter

Abstract: We present GameNGen, the first game engine powered entirely by a neural model that enables real-time interaction with a complex environment over long trajectories at high quality. GameNGen can interactively simulate the classic game DOOM at over 20 frames per second on a single TPU. Next frame prediction achieves a PSNR of 29.4, comparable to lossy JPEG compression. Human raters are only slightly better than random chance at distinguishing short clips of the game from clips of the simulation. GameNGen is trained in two phases: (1) an RL-agent learns to play the game and the training sessions are recorded, and (2) a diffusion model is trained to produce the next frame, conditioned on the sequence of past frames and actions. Conditioning augmentations enable stable auto-regressive generation over long trajectories.

new Correntropy-Based Improper Likelihood Model for Robust Electrophysiological Source Imaging

Authors: Yuanhao Li, Badong Chen, Zhongxu Hu, Keita Suzuki, Wenjun Bai, Yasuharu Koike, Okito Yamashita

Abstract: Bayesian learning provides a unified skeleton to solve the electrophysiological source imaging task. From this perspective, existing source imaging algorithms utilize the Gaussian assumption for the observation noise to build the likelihood function for Bayesian inference. However, the electromagnetic measurements of brain activity are usually affected by miscellaneous artifacts, leading to a potentially non-Gaussian distribution for the observation noise. Hence the conventional Gaussian likelihood model is a suboptimal choice for the real-world source imaging task. In this study, we aim to solve this problem by proposing a new likelihood model which is robust with respect to non-Gaussian noises. Motivated by the robust maximum correntropy criterion, we propose a new improper distribution model concerning the noise assumption. This new noise distribution is leveraged to structure a robust likelihood function and integrated with hierarchical prior distributions to estimate source activities by variational inference. In particular, the score matching is adopted to determine the hyperparameters for the improper likelihood model. A comprehensive performance evaluation is performed to compare the proposed noise assumption to the conventional Gaussian model. Simulation results show that, the proposed method can realize more precise source reconstruction by designing known ground-truth. The real-world dataset also demonstrates the superiority of our new method with the visual perception task. This study provides a new backbone for Bayesian source imaging, which would facilitate its application using real-world noisy brain signal.

new Dynamic operator management in meta-heuristics using reinforcement learning: an application to permutation flowshop scheduling problems

Authors: Maryam Karimi Mamaghan, Mehrdad Mohammadi, Wout Dullaert, Daniele Vigo, Amir Pirayesh

Abstract: This study develops a framework based on reinforcement learning to dynamically manage a large portfolio of search operators within meta-heuristics. Using the idea of tabu search, the framework allows for continuous adaptation by temporarily excluding less efficient operators and updating the portfolio composition during the search. A Q-learning-based adaptive operator selection mechanism is used to select the most suitable operator from the dynamically updated portfolio at each stage. Unlike traditional approaches, the proposed framework requires no input from the experts regarding the search operators, allowing domain-specific non-experts to effectively use the framework. The performance of the proposed framework is analyzed through an application to the permutation flowshop scheduling problem. The results demonstrate the superior performance of the proposed framework against state-of-the-art algorithms in terms of optimality gap and convergence speed.

new Adversarial Attacks and Defenses in Multivariate Time-Series Forecasting for Smart and Connected Infrastructures

Authors: Pooja Krishan, Rohan Mohapatra, Saptarshi Sengupta

Abstract: The emergence of deep learning models has revolutionized various industries over the last decade, leading to a surge in connected devices and infrastructures. However, these models can be tricked into making incorrect predictions with high confidence, leading to disastrous failures and security concerns. To this end, we explore the impact of adversarial attacks on multivariate time-series forecasting and investigate methods to counter them. Specifically, we employ untargeted white-box attacks, namely the Fast Gradient Sign Method (FGSM) and the Basic Iterative Method (BIM), to poison the inputs to the training process, effectively misleading the model. We also illustrate the subtle modifications to the inputs after the attack, which makes detecting the attack using the naked eye quite difficult. Having demonstrated the feasibility of these attacks, we develop robust models through adversarial training and model hardening. We are among the first to showcase the transferability of these attacks and defenses by extrapolating our work from the benchmark electricity data to a larger, 10-year real-world data used for predicting the time-to-failure of hard disks. Our experimental results confirm that the attacks and defenses achieve the desired security thresholds, leading to a 72.41% and 94.81% decrease in RMSE for the electricity and hard disk datasets respectively after implementing the adversarial defenses.

new Can Transformers Do Enumerative Geometry?

Authors: Baran Hashemi, Roderic G. Corominas, Alessandro Giacchetto

Abstract: How can Transformers model and learn enumerative geometry? What is a robust procedure for using Transformers in abductive knowledge discovery within a mathematician-machine collaboration? In this work, we introduce a new paradigm in computational enumerative geometry in analyzing the $\psi$-class intersection numbers on the moduli space of curves. By formulating the enumerative problem as a continuous optimization task, we develop a Transformer-based model for computing $\psi$-class intersection numbers based on the underlying quantum Airy structure. For a finite range of genera, our model is capable of regressing intersection numbers that span an extremely wide range of values, from $10^{-45}$ to $10^{45}$. To provide a proper inductive bias for capturing the recursive behavior of intersection numbers, we propose a new activation function, Dynamic Range Activator (DRA). Moreover, given the severe heteroscedasticity of $\psi$-class intersections and the required precision, we quantify the uncertainty of the predictions using Conformal Prediction with a dynamic sliding window that is aware of the number of marked points. Next, we go beyond merely computing intersection numbers and explore the enumerative "world-model" of the Transformers. Through a series of causal inference and correlational interpretability analyses, we demonstrate that Transformers are actually modeling Virasoro constraints in a purely data-driven manner. Additionally, we provide evidence for the comprehension of several values appearing in the large genus asymptotic of $\psi$-class intersection numbers through abductive hypothesis testing.

new Quotient Normalized Maximum Likelihood Criterion for Learning Bayesian Network Structures

Authors: Tomi Silander, Janne Lepp\"a-aho, Elias J\"a\"asaari, Teemu Roos

Abstract: We introduce an information theoretic criterion for Bayesian network structure learning which we call quotient normalized maximum likelihood (qNML). In contrast to the closely related factorized normalized maximum likelihood criterion, qNML satisfies the property of score equivalence. It is also decomposable and completely free of adjustable hyperparameters. For practical computations, we identify a remarkably accurate approximation proposed earlier by Szpankowski and Weinberger. Experiments on both simulated and real data demonstrate that the new criterion leads to parsimonious models with good predictive accuracy.

new Cross-Modal Learning for Chemistry Property Prediction: Large Language Models Meet Graph Machine Learning

Authors: Sakhinana Sagar Srinivas, Venkataramana Runkana

Abstract: In the field of chemistry, the objective is to create novel molecules with desired properties, facilitating accurate property predictions for applications such as material design and drug screening. However, existing graph deep learning methods face limitations that curb their expressive power. To address this, we explore the integration of vast molecular domain knowledge from Large Language Models (LLMs) with the complementary strengths of Graph Neural Networks (GNNs) to enhance performance in property prediction tasks. We introduce a Multi-Modal Fusion (MMF) framework that synergistically harnesses the analytical prowess of GNNs and the linguistic generative and predictive abilities of LLMs, thereby improving accuracy and robustness in predicting molecular properties. Our framework combines the effectiveness of GNNs in modeling graph-structured data with the zero-shot and few-shot learning capabilities of LLMs, enabling improved predictions while reducing the risk of overfitting. Furthermore, our approach effectively addresses distributional shifts, a common challenge in real-world applications, and showcases the efficacy of learning cross-modal representations, surpassing state-of-the-art baselines on benchmark datasets for property prediction tasks.

new Prior-free Balanced Replay: Uncertainty-guided Reservoir Sampling for Long-Tailed Continual Learning

Authors: Lei Liu, Li Liu, Yawen Cui

Abstract: Even in the era of large models, one of the well-known issues in continual learning (CL) is catastrophic forgetting, which is significantly challenging when the continual data stream exhibits a long-tailed distribution, termed as Long-Tailed Continual Learning (LTCL). Existing LTCL solutions generally require the label distribution of the data stream to achieve re-balance training. However, obtaining such prior information is often infeasible in real scenarios since the model should learn without pre-identifying the majority and minority classes. To this end, we propose a novel Prior-free Balanced Replay (PBR) framework to learn from long-tailed data stream with less forgetting. Concretely, motivated by our experimental finding that the minority classes are more likely to be forgotten due to the higher uncertainty, we newly design an uncertainty-guided reservoir sampling strategy to prioritize rehearsing minority data without using any prior information, which is based on the mutual dependence between the model and samples. Additionally, we incorporate two prior-free components to further reduce the forgetting issue: (1) Boundary constraint is to preserve uncertain boundary supporting samples for continually re-estimating task boundaries. (2) Prototype constraint is to maintain the consistency of learned class prototypes along with training. Our approach is evaluated on three standard long-tailed benchmarks, demonstrating superior performance to existing CL methods and previous SOTA LTCL approach in both task- and class-incremental learning settings, as well as ordered- and shuffled-LTCL settings.

new MONAS: Efficient Zero-Shot Neural Architecture Search for MCUs

Authors: Ye Qiao, Haocheng Xu, Yifan Zhang, Sitao Huang

Abstract: Neural Architecture Search (NAS) has proven effective in discovering new Convolutional Neural Network (CNN) architectures, particularly for scenarios with well-defined accuracy optimization goals. However, previous approaches often involve time-consuming training on super networks or intensive architecture sampling and evaluations. Although various zero-cost proxies correlated with CNN model accuracy have been proposed for efficient architecture search without training, their lack of hardware consideration makes it challenging to target highly resource-constrained edge devices such as microcontroller units (MCUs). To address these challenges, we introduce MONAS, a novel hardware-aware zero-shot NAS framework specifically designed for MCUs in edge computing. MONAS incorporates hardware optimality considerations into the search process through our proposed MCU hardware latency estimation model. By combining this with specialized performance indicators (proxies), MONAS identifies optimal neural architectures without incurring heavy training and evaluation costs, optimizing for both hardware latency and accuracy under resource constraints. MONAS achieves up to a 1104x improvement in search efficiency over previous work targeting MCUs and can discover CNN models with over 3.23x faster inference on MCUs while maintaining similar accuracy compared to more general NAS approaches.

new Causal Rule Forest: Toward Interpretable and Precise Treatment Effect Estimation

Authors: Chan Hsu, Jun-Ting Wu, Yihuang Kang

Abstract: Understanding and inferencing Heterogeneous Treatment Effects (HTE) and Conditional Average Treatment Effects (CATE) are vital for developing personalized treatment recommendations. Many state-of-the-art approaches achieve inspiring performance in estimating HTE on benchmark datasets or simulation studies. However, the indirect predicting manner and complex model architecture reduce the interpretability of these approaches. To mitigate the gap between predictive performance and heterogeneity interpretability, we introduce the Causal Rule Forest (CRF), a novel approach to learning hidden patterns from data and transforming the patterns into interpretable multi-level Boolean rules. By training the other interpretable causal inference models with data representation learned by CRF, we can reduce the predictive errors of these models in estimating HTE and CATE, while keeping their interpretability for identifying subgroups that a treatment is more effective. Our experiments underscore the potential of CRF to advance personalized interventions and policies, paving the way for future research to enhance its scalability and application across complex causal inference challenges.

new Subgroup Analysis via Model-based Rule Forest

Authors: I-Ling Cheng, Chan Hsu, Chantung Ku, Pei-Ju Lee, Yihuang Kang

Abstract: Machine learning models are often criticized for their black-box nature, raising concerns about their applicability in critical decision-making scenarios. Consequently, there is a growing demand for interpretable models in such contexts. In this study, we introduce Model-based Deep Rule Forests (mobDRF), an interpretable representation learning algorithm designed to extract transparent models from data. By leveraging IF-THEN rules with multi-level logic expressions, mobDRF enhances the interpretability of existing models without compromising accuracy. We apply mobDRF to identify key risk factors for cognitive decline in an elderly population, demonstrating its effectiveness in subgroup analysis and local model optimization. Our method offers a promising solution for developing trustworthy and interpretable machine learning models, particularly valuable in fields like healthcare, where understanding differential effects across patient subgroups can lead to more personalized and effective treatments.

new MiWaves Reinforcement Learning Algorithm

Authors: Susobhan Ghosh, Yongyi Guo, Pei-Yao Hung, Lara Coughlin, Erin Bonar, Inbal Nahum-Shani, Maureen Walton, Susan Murphy

Abstract: The escalating prevalence of cannabis use poses a significant public health challenge globally. In the U.S., cannabis use is more prevalent among emerging adults (EAs) (ages 18-25) than any other age group, with legalization in the multiple states contributing to a public perception that cannabis is less risky than in prior decades. To address this growing concern, we developed MiWaves, a reinforcement learning (RL) algorithm designed to optimize the delivery of personalized intervention prompts to reduce cannabis use among EAs. MiWaves leverages domain expertise and prior data to tailor the likelihood of delivery of intervention messages. This paper presents a comprehensive overview of the algorithm's design, including key decisions and experimental outcomes. The finalized MiWaves RL algorithm was deployed in a clinical trial from March to May 2024.

new Constrained Diffusion Models via Dual Training

Authors: Shervin Khalafi, Dongsheng Ding, Alejandro Ribeiro

Abstract: Diffusion models have attained prominence for their ability to synthesize a probability distribution for a given dataset via a diffusion process, enabling the generation of new data points with high fidelity. However, diffusion processes are prone to generating biased data based on the training dataset. To address this issue, we develop constrained diffusion models by imposing diffusion constraints based on desired distributions that are informed by requirements. Specifically, we cast the training of diffusion models under requirements as a constrained distribution optimization problem that aims to reduce the distribution difference between original and generated data while obeying constraints on the distribution of generated data. We show that our constrained diffusion models generate new data from a mixture data distribution that achieves the optimal trade-off among objective and constraints. To train constrained diffusion models, we develop a dual training algorithm and characterize the optimality of the trained constrained diffusion model. We empirically demonstrate the effectiveness of our constrained models in two constrained generation tasks: (i) we consider a dataset with one or more underrepresented classes where we train the model with constraints to ensure fairly sampling from all classes during inference; (ii) we fine-tune a pre-trained diffusion model to sample from a new dataset while avoiding overfitting.

new Post-processing fairness with minimal changes

Authors: Federico Di Gennaro, Thibault Laugel, Vincent Grari, Xavier Renard, Marcin Detyniecki

Abstract: In this paper, we introduce a novel post-processing algorithm that is both model-agnostic and does not require the sensitive attribute at test time. In addition, our algorithm is explicitly designed to enforce minimal changes between biased and debiased predictions; a property that, while highly desirable, is rarely prioritized as an explicit objective in fairness literature. Our approach leverages a multiplicative factor applied to the logit value of probability scores produced by a black-box classifier. We demonstrate the efficacy of our method through empirical evaluations, comparing its performance against other four debiasing algorithms on two widely used datasets in fairness research.

new No Regrets: Investigating and Improving Regret Approximations for Curriculum Discovery

Authors: Alexander Rutherford, Michael Beukman, Timon Willi, Bruno Lacerda, Nick Hawes, Jakob Foerster

Abstract: What data or environments to use for training to improve downstream performance is a longstanding and very topical question in reinforcement learning. In particular, Unsupervised Environment Design (UED) methods have gained recent attention as their adaptive curricula enable agents to be robust to in- and out-of-distribution tasks. We ask to what extent these methods are themselves robust when applied to a novel setting, closely inspired by a real-world robotics problem. Surprisingly, we find that the state-of-the-art UED methods either do not improve upon the na\"{i}ve baseline of Domain Randomisation (DR), or require substantial hyperparameter tuning to do so. Our analysis shows that this is due to their underlying scoring functions failing to predict intuitive measures of ``learnability'', i.e., in finding the settings that the agent sometimes solves, but not always. Based on this, we instead directly train on levels with high learnability and find that this simple and intuitive approach outperforms UED methods and DR in several binary-outcome environments, including on our domain and the standard UED domain of Minigrid. We further introduce a new adversarial evaluation procedure for directly measuring robustness, closely mirroring the conditional value at risk (CVaR). We open-source all our code and present visualisations of final policies here: https://github.com/amacrutherford/sampling-for-learnability.

URLs: https://github.com/amacrutherford/sampling-for-learnability.

new Evaluating the Energy Consumption of Machine Learning: Systematic Literature Review and Experiments

Authors: Charlotte Rodriguez, Laura Degioanni, Laetitia Kameni, Richard Vidal, Giovanni Neglia

Abstract: Monitoring, understanding, and optimizing the energy consumption of Machine Learning (ML) are various reasons why it is necessary to evaluate the energy usage of ML. However, there exists no universal tool that can answer this question for all use cases, and there may even be disagreement on how to evaluate energy consumption for a specific use case. Tools and methods are based on different approaches, each with their own advantages and drawbacks, and they need to be mapped out and explained in order to select the most suitable one for a given situation. We address this challenge through two approaches. First, we conduct a systematic literature review of all tools and methods that permit to evaluate the energy consumption of ML (both at training and at inference), irrespective of whether they were originally designed for machine learning or general software. Second, we develop and use an experimental protocol to compare a selection of these tools and methods. The comparison is both qualitative and quantitative on a range of ML tasks of different nature (vision, language) and computational complexity. The systematic literature review serves as a comprehensive guide for understanding the array of tools and methods used in evaluating energy consumption of ML, for various use cases going from basic energy monitoring to consumption optimization. Two open-source repositories are provided for further exploration. The first one contains tools that can be used to replicate this work or extend the current review. The second repository houses the experimental protocol, allowing users to augment the protocol with new ML computing tasks and additional energy evaluation tools.

new Using LLMs for Explaining Sets of Counterfactual Examples to Final Users

Authors: Arturo Fredes, Jordi Vitria

Abstract: Causality is vital for understanding true cause-and-effect relationships between variables within predictive models, rather than relying on mere correlations, making it highly relevant in the field of Explainable AI. In an automated decision-making scenario, causal inference methods can analyze the underlying data-generation process, enabling explanations of a model's decision by manipulating features and creating counterfactual examples. These counterfactuals explore hypothetical scenarios where a minimal number of factors are altered, providing end-users with valuable information on how to change their situation. However, interpreting a set of multiple counterfactuals can be challenging for end-users who are not used to analyzing raw data records. In our work, we propose a novel multi-step pipeline that uses counterfactuals to generate natural language explanations of actions that will lead to a change in outcome in classifiers of tabular data using LLMs. This pipeline is designed to guide the LLM through smaller tasks that mimic human reasoning when explaining a decision based on counterfactual cases. We conducted various experiments using a public dataset and proposed a method of closed-loop evaluation to assess the coherence of the final explanation with the counterfactuals, as well as the quality of the content. Results are promising, although further experiments with other datasets and human evaluations should be carried out.

new How transformers learn structured data: insights from hierarchical filtering

Authors: Jerome Garnier-Brun, Marc M\'ezard, Emanuele Moscato, Luca Saglietti

Abstract: We introduce a hierarchical filtering procedure for generative models of sequences on trees, enabling control over the range of positional correlations in the data. Leveraging this controlled setting, we provide evidence that vanilla encoder-only transformer architectures can implement the optimal Belief Propagation algorithm on both root classification and masked language modeling tasks. Correlations at larger distances corresponding to increasing layers of the hierarchy are sequentially included as the network is trained. We analyze how the transformer layers succeed by focusing on attention maps from models trained with varying degrees of filtering. These attention maps show clear evidence for iterative hierarchical reconstruction of correlations, and we can relate these observations to a plausible implementation of the exact inference algorithm for the network sizes considered.

new Delay as Payoff in MAB

Authors: Ofir Schlisselberg, Ido Cohen, Tal Lancewicki, Yishay Mansour

Abstract: In this paper, we investigate a variant of the classical stochastic Multi-armed Bandit (MAB) problem, where the payoff received by an agent (either cost or reward) is both delayed, and directly corresponds to the magnitude of the delay. This setting models faithfully many real world scenarios such as the time it takes for a data packet to traverse a network given a choice of route (where delay serves as the agent's cost); or a user's time spent on a web page given a choice of content (where delay serves as the agent's reward). Our main contributions are tight upper and lower bounds for both the cost and reward settings. For the case that delays serve as costs, which we are the first to consider, we prove optimal regret that scales as $\sum_{i:\Delta_i > 0}\frac{\log T}{\Delta_i} + d^*$, where $T$ is the maximal number of steps, $\Delta_i$ are the sub-optimality gaps and $d^*$ is the minimal expected delay amongst arms. For the case that delays serves as rewards, we show optimal regret of $\sum_{i:\Delta_i > 0}\frac{\log T}{\Delta_i} + \bar{d}$, where $\bar d$ is the second maximal expected delay. These improve over the regret in the general delay-dependent payoff setting, which scales as $\sum_{i:\Delta_i > 0}\frac{\log T}{\Delta_i} + D$, where $D$ is the maximum possible delay. Our regret bounds highlight the difference between the cost and reward scenarios, showing that the improvement in the cost scenario is more significant than for the reward. Finally, we accompany our theoretical results with an empirical evaluation.

new Latent Ewald summation for machine learning of long-range interactions

Authors: Bingqing Cheng

Abstract: Machine learning interatomic potentials (MLIPs) often neglect long-range interactions, such as electrostatic and dispersion forces. In this work, we introduce a straightforward and efficient method to account for long-range interactions by learning a latent variable from local atomic descriptors and applying an Ewald summation to this variable. We demonstrate that in systems including charged, polar, or apolar molecular dimers, bulk water, and water-vapor interface, standard short-ranged MLIPs can lead to unphysical predictions even when employing message passing. The long-range models effectively eliminate these artifacts, with only about twice the computational cost of short-range MLIPs.

new LLM Defenses Are Not Robust to Multi-Turn Human Jailbreaks Yet

Authors: Nathaniel Li, Ziwen Han, Ian Steneker, Willow Primack, Riley Goodside, Hugh Zhang, Zifan Wang, Cristina Menghini, Summer Yue

Abstract: Recent large language model (LLM) defenses have greatly improved models' ability to refuse harmful queries, even when adversarially attacked. However, LLM defenses are primarily evaluated against automated adversarial attacks in a single turn of conversation, an insufficient threat model for real-world malicious use. We demonstrate that multi-turn human jailbreaks uncover significant vulnerabilities, exceeding 70% attack success rate (ASR) on HarmBench against defenses that report single-digit ASRs with automated single-turn attacks. Human jailbreaks also reveal vulnerabilities in machine unlearning defenses, successfully recovering dual-use biosecurity knowledge from unlearned models. We compile these results into Multi-Turn Human Jailbreaks (MHJ), a dataset of 2,912 prompts across 537 multi-turn jailbreaks. We publicly release MHJ alongside a compendium of jailbreak tactics developed across dozens of commercial red teaming engagements, supporting research towards stronger LLM defenses.

new The Mamba in the Llama: Distilling and Accelerating Hybrid Models

Authors: Junxiong Wang, Daniele Paliotta, Avner May, Alexander M. Rush, Tri Dao

Abstract: Linear RNN architectures, like Mamba, can be competitive with Transformer models in language modeling while having advantageous deployment characteristics. Given the focus on training large-scale Transformer models, we consider the challenge of converting these pretrained models for deployment. We demonstrate that it is feasible to distill large Transformers into linear RNNs by reusing the linear projection weights from attention layers with academic GPU resources. The resulting hybrid model, which incorporates a quarter of the attention layers, achieves performance comparable to the original Transformer in chat benchmarks and outperforms open-source hybrid Mamba models trained from scratch with trillions of tokens in both chat benchmarks and general benchmarks. Moreover, we introduce a hardware-aware speculative decoding algorithm that accelerates the inference speed of Mamba and hybrid models. Overall we show how, with limited computation resources, we can remove many of the original attention layers and generate from the resulting model more efficiently. Our top-performing model, distilled from Llama3-8B-Instruct, achieves a 29.61 length-controlled win rate on AlpacaEval 2 against GPT-4 and 7.35 on MT-Bench, surpassing the best instruction-tuned linear RNN model.

new Generative Verifiers: Reward Modeling as Next-Token Prediction

Authors: Lunjun Zhang, Arian Hosseini, Hritik Bansal, Mehran Kazemi, Aviral Kumar, Rishabh Agarwal

Abstract: Verifiers or reward models are often used to enhance the reasoning performance of large language models (LLMs). A common approach is the Best-of-N method, where N candidate solutions generated by the LLM are ranked by a verifier, and the best one is selected. While LLM-based verifiers are typically trained as discriminative classifiers to score solutions, they do not utilize the text generation capabilities of pretrained LLMs. To overcome this limitation, we instead propose training verifiers using the ubiquitous next-token prediction objective, jointly on verification and solution generation. Compared to standard verifiers, such generative verifiers (GenRM) can benefit from several advantages of LLMs: they integrate seamlessly with instruction tuning, enable chain-of-thought reasoning, and can utilize additional inference-time compute via majority voting for better verification. We demonstrate that when using Gemma-based verifiers on algorithmic and grade-school math reasoning tasks, GenRM outperforms discriminative verifiers and LLM-as-a-Judge, showing a 16-64% improvement in the percentage of problems solved with Best-of-N. Furthermore, we show that GenRM scales favorably across dataset size, model capacity, and inference-time compute.

cross Satellite Sunroof: High-res Digital Surface Models and Roof Segmentation for Global Solar Mapping

Authors: Vishal Batchu, Alex Wilson, Betty Peng, Carl Elkin, Umangi Jain, Christopher Van Arsdale, Ross Goroshin, Varun Gulshan

Abstract: The transition to renewable energy, particularly solar, is key to mitigating climate change. Google's Solar API aids this transition by estimating solar potential from aerial imagery, but its impact is constrained by geographical coverage. This paper proposes expanding the API's reach using satellite imagery, enabling global solar potential assessment. We tackle challenges involved in building a Digital Surface Model (DSM) and roof instance segmentation from lower resolution and single oblique views using deep learning models. Our models, trained on aligned satellite and aerial datasets, produce 25cm DSMs and roof segments. With ~1m DSM MAE on buildings, ~5deg roof pitch error and ~56% IOU on roof segmentation, they significantly enhance the Solar API's potential to promote solar adoption.

cross RISE-iEEG: Robust to Inter-Subject Electrodes Implantation Variability iEEG Classifier

Authors: Maryam Ostadsharif Memar, Navid Ziaei, Behzad Nazari, Ali Yousefi

Abstract: Utilization of intracranial electroencephalography (iEEG) is rapidly increasing for clinical and brain-computer interface applications. iEEG facilitates the recording of neural activity with high spatial and temporal resolution, making it a desirable neuroimaging modality for studying neural dynamics. Despite its benefits, iEEG faces challenges such as inter-subject variability in electrode implantation, which makes the development of unified neural decoder models across different patients difficult. In this research, we introduce a novel decoder model that is robust to inter-subject electrode implantation variability. We call this model RISE-iEEG, which stands for Robust Inter-Subject Electrode Implantation Variability iEEG Classifier. RISE-iEEG employs a deep neural network structure preceded by a patient-specific projection network. The projection network maps the neural data of individual patients onto a common low-dimensional space, compensating for the implantation variability. In other words, we developed an iEEG decoder model that can be applied across multiple patients' data without requiring the coordinates of electrode for each patient. The performance of RISE-iEEG across multiple datasets, including the Audio-Visual dataset, Music Reconstruction dataset, and Upper-Limb Movement dataset, surpasses that of state-of-the-art iEEG decoder models such as HTNet and EEGNet. Our analysis shows that the performance of RISE-iEEG is 10\% higher than that of HTNet and EEGNet in terms of F1 score, with an average F1 score of 83\%, which is the highest result among the evaluation methods defined. Furthermore, the analysis of projection network weights in the Music Reconstruction dataset across patients suggests that the Superior Temporal lobe serves as the primary encoding neural node. This finding aligns with the auditory processing physiology.

cross Agentic Retrieval-Augmented Generation for Time Series Analysis

Authors: Chidaksh Ravuru, Sagar Srinivas Sakhinana, Venkataramana Runkana

Abstract: Time series modeling is crucial for many applications, however, it faces challenges such as complex spatio-temporal dependencies and distribution shifts in learning from historical context to predict task-specific outcomes. To address these challenges, we propose a novel approach using an agentic Retrieval-Augmented Generation (RAG) framework for time series analysis. The framework leverages a hierarchical, multi-agent architecture where the master agent orchestrates specialized sub-agents and delegates the end-user request to the relevant sub-agent. The sub-agents utilize smaller, pre-trained language models (SLMs) customized for specific time series tasks through fine-tuning using instruction tuning and direct preference optimization, and retrieve relevant prompts from a shared repository of prompt pools containing distilled knowledge about historical patterns and trends to improve predictions on new data. Our proposed modular, multi-agent RAG approach offers flexibility and achieves state-of-the-art performance across major time series tasks by tackling complex challenges more effectively than task-specific customized methods across benchmark datasets.

cross Active learning of digenic functions with boolean matrix logic programming

Authors: Lun Ai, Stephen H. Muggleton, Shi-shun Liang, Geoff S. Baldwin

Abstract: We apply logic-based machine learning techniques to facilitate cellular engineering and drive biological discovery, based on comprehensive databases of metabolic processes called genome-scale metabolic network models (GEMs). Predicted host behaviours are not always correctly described by GEMs. Learning the intricate genetic interactions within GEMs presents computational and empirical challenges. To address these, we describe a novel approach called Boolean Matrix Logic Programming (BMLP) by leveraging boolean matrices to evaluate large logic programs. We introduce a new system, $BMLP_{active}$, which efficiently explores the genomic hypothesis space by guiding informative experimentation through active learning. In contrast to sub-symbolic methods, $BMLP_{active}$ encodes a state-of-the-art GEM of a widely accepted bacterial host in an interpretable and logical representation using datalog logic programs. Notably, $BMLP_{active}$ can successfully learn the interaction between a gene pair with fewer training examples than random experimentation, overcoming the increase in experimental design space. $BMLP_{active}$ enables rapid optimisation of metabolic models and offers a realistic approach to a self-driving lab for microbial engineering.

cross Reconstruction-based Multi-Normal Prototypes Learning for Weakly Supervised Anomaly Detection

Authors: Zhijin Dong, Hongzhi Liu, Boyuan Ren, Weimin Xiong, Zhonghai Wu

Abstract: Anomaly detection is a crucial task in various domains. Most of the existing methods assume the normal sample data clusters around a single central prototype while the real data may consist of multiple categories or subgroups. In addition, existing methods always assume all unlabeled data are normal while they inevitably contain some anomalous samples. To address these issues, we propose a reconstruction-based multi-normal prototypes learning framework that leverages limited labeled anomalies in conjunction with abundant unlabeled data for anomaly detection. Specifically, we assume the normal sample data may satisfy multi-modal distribution, and utilize deep embedding clustering and contrastive learning to learn multiple normal prototypes to represent it. Additionally, we estimate the likelihood of each unlabeled sample being normal based on the multi-normal prototypes, guiding the training process to mitigate the impact of contaminated anomalies in the unlabeled data. Extensive experiments on various datasets demonstrate the superior performance of our method compared to state-of-the-art techniques.

cross Artificial intelligence for science: The easy and hard problems

Authors: Ruairidh M. Battleday, Samuel J. Gershman

Abstract: A suite of impressive scientific discoveries have been driven by recent advances in artificial intelligence. These almost all result from training flexible algorithms to solve difficult optimization problems specified in advance by teams of domain scientists and engineers with access to large amounts of data. Although extremely useful, this kind of problem solving only corresponds to one part of science - the "easy problem." The other part of scientific research is coming up with the problem itself - the "hard problem." Solving the hard problem is beyond the capacities of current algorithms for scientific discovery because it requires continual conceptual revision based on poorly defined constraints. We can make progress on understanding how humans solve the hard problem by studying the cognitive science of scientists, and then use the results to design new computational agents that automatically infer and update their scientific paradigms.

cross Unveiling the Statistical Foundations of Chain-of-Thought Prompting Methods

Authors: Xinyang Hu, Fengzhuo Zhang, Siyu Chen, Zhuoran Yang

Abstract: Chain-of-Thought (CoT) prompting and its variants have gained popularity as effective methods for solving multi-step reasoning problems using pretrained large language models (LLMs). In this work, we analyze CoT prompting from a statistical estimation perspective, providing a comprehensive characterization of its sample complexity. To this end, we introduce a multi-step latent variable model that encapsulates the reasoning process, where the latent variable encodes the task information. Under this framework, we demonstrate that when the pretraining dataset is sufficiently large, the estimator formed by CoT prompting is equivalent to a Bayesian estimator. This estimator effectively solves the multi-step reasoning problem by aggregating a posterior distribution inferred from the demonstration examples in the prompt. Moreover, we prove that the statistical error of the CoT estimator can be decomposed into two main components: (i) a prompting error, which arises from inferring the true task using CoT prompts, and (ii) the statistical error of the pretrained LLM. We establish that, under appropriate assumptions, the prompting error decays exponentially to zero as the number of demonstrations increases. Additionally, we explicitly characterize the approximation and generalization errors of the pretrained LLM. Notably, we construct a transformer model that approximates the target distribution of the multi-step reasoning problem with an error that decreases exponentially in the number of transformer blocks. Our analysis extends to other variants of CoT, including Self-Consistent CoT, Tree-of-Thought, and Selection-Inference, offering a broad perspective on the efficacy of these methods. We also provide numerical experiments to validate the theoretical findings.

cross A Joint Learning Model with Variational Interaction for Multilingual Program Translation

Authors: Yali Du, Hui Sun, Ming Li

Abstract: Programs implemented in various programming languages form the foundation of software applications. To alleviate the burden of program migration and facilitate the development of software systems, automated program translation across languages has garnered significant attention. Previous approaches primarily focus on pairwise translation paradigms, learning translation between pairs of languages using bilingual parallel data. However, parallel data is difficult to collect for some language pairs, and the distribution of program semantics across languages can shift, posing challenges for pairwise program translation. In this paper, we argue that jointly learning a unified model to translate code across multiple programming languages is superior to separately learning from bilingual parallel data. We propose Variational Interaction for Multilingual Program Translation~(VIM-PT), a disentanglement-based generative approach that jointly trains a unified model for multilingual program translation across multiple languages. VIM-PT disentangles code into language-shared and language-specific features, using variational inference and interaction information with a novel lower bound, then achieves program translation through conditional generation. VIM-PT demonstrates four advantages: 1) captures language-shared information more accurately from various implementations and improves the quality of multilingual program translation, 2) mines and leverages the capability of non-parallel data, 3) addresses the distribution shift of program semantics across languages, 4) and serves as a unified model, reducing deployment complexity.

cross A Survey on Reinforcement Learning Applications in SLAM

Authors: Mohammad Dehghani Tezerjani, Mohammad Khoshnazar, Mohammadhamed Tangestanizadeh, Qing Yang

Abstract: The emergence of mobile robotics, particularly in the automotive industry, introduces a promising era of enriched user experiences and adept handling of complex navigation challenges. The realization of these advancements necessitates a focused technological effort and the successful execution of numerous intricate tasks, particularly in the critical domain of Simultaneous Localization and Mapping (SLAM). Various artificial intelligence (AI) methodologies, such as deep learning and reinforcement learning, present viable solutions to address the challenges in SLAM. This study specifically explores the application of reinforcement learning in the context of SLAM. By enabling the agent (the robot) to iteratively interact with and receive feedback from its environment, reinforcement learning facilitates the acquisition of navigation and mapping skills, thereby enhancing the robot's decision-making capabilities. This approach offers several advantages, including improved navigation proficiency, increased resilience, reduced dependence on sensor precision, and refinement of the decision-making process. The findings of this study, which provide an overview of reinforcement learning's utilization in SLAM, reveal significant advancements in the field. The investigation also highlights the evolution and innovative integration of these techniques.

cross Exploring the Potential of Synthetic Data to Replace Real Data

Authors: Hyungtae Lee, Yan Zhang, Heesung Kwon, Shuvra S. Bhattacharrya

Abstract: The potential of synthetic data to replace real data creates a huge demand for synthetic data in data-hungry AI. This potential is even greater when synthetic data is used for training along with a small number of real images from domains other than the test domain. We find that this potential varies depending on (i) the number of cross-domain real images and (ii) the test set on which the trained model is evaluated. We introduce two new metrics, the train2test distance and $\text{AP}_\text{t2t}$, to evaluate the ability of a cross-domain training set using synthetic data to represent the characteristics of test instances in relation to training performance. Using these metrics, we delve deeper into the factors that influence the potential of synthetic data and uncover some interesting dynamics about how synthetic data impacts training performance. We hope these discoveries will encourage more widespread use of synthetic data.

cross General targeted machine learning for modern causal mediation analysis

Authors: Richard Liu, Nicholas T. Williams, Kara E. Rudolph, Iv\'an D\'iaz

Abstract: Causal mediation analyses investigate the mechanisms through which causes exert their effects, and are therefore central to scientific progress. The literature on the non-parametric definition and identification of mediational effects in rigourous causal models has grown significantly in recent years, and there has been important progress to address challenges in the interpretation and identification of such effects. Despite great progress in the causal inference front, statistical methodology for non-parametric estimation has lagged behind, with few or no methods available for tackling non-parametric estimation in the presence of multiple, continuous, or high-dimensional mediators. In this paper we show that the identification formulas for six popular non-parametric approaches to mediation analysis proposed in recent years can be recovered from just two statistical estimands. We leverage this finding to propose an all-purpose one-step estimation algorithm that can be coupled with machine learning in any mediation study that uses any of these six definitions of mediation. The estimators have desirable properties, such as $\sqrt{n}$-convergence and asymptotic normality. Estimating the first-order correction for the one-step estimator requires estimation of complex density ratios on the potentially high-dimensional mediators, a challenge that is solved using recent advancements in so-called Riesz learning. We illustrate the properties of our methods in a simulation study and illustrate its use on real data to estimate the extent to which pain management practices mediate the total effect of having a chronic pain disorder on opioid use disorder.

cross Relationships are Complicated! An Analysis of Relationships Between Datasets on the Web

Authors: Kate Lin, Tarfah Alrashed, Natasha Noy

Abstract: The Web today has millions of datasets, and the number of datasets continues to grow at a rapid pace. These datasets are not standalone entities; rather, they are intricately connected through complex relationships. Semantic relationships between datasets provide critical insights for research and decision-making processes. In this paper, we study dataset relationships from the perspective of users who discover, use, and share datasets on the Web: what relationships are important for different tasks? What contextual information might users want to know? We first present a comprehensive taxonomy of relationships between datasets on the Web and map these relationships to user tasks performed during dataset discovery. We develop a series of methods to identify these relationships and compare their performance on a large corpus of datasets generated from Web pages with schema.org markup. We demonstrate that machine-learning based methods that use dataset metadata achieve multi-class classification accuracy of 90%. Finally, we highlight gaps in available semantic markup for datasets and discuss how incorporating comprehensive semantics can facilitate the identification of dataset relationships. By providing a comprehensive overview of dataset relationships at scale, this paper sets a benchmark for future research.

cross KGPrune: a Web Application to Extract Subgraphs of Interest from Wikidata with Analogical Pruning

Authors: Pierre Monnin, Cherif-Hassan Nousradine, Lucas Jarnac, Laurel Zuckerman, Miguel Couceiro

Abstract: Knowledge graphs (KGs) have become ubiquitous publicly available knowledge sources, and are nowadays covering an ever increasing array of domains. However, not all knowledge represented is useful or pertaining when considering a new application or specific task. Also, due to their increasing size, handling large KGs in their entirety entails scalability issues. These two aspects asks for efficient methods to extract subgraphs of interest from existing KGs. To this aim, we introduce KGPrune, a Web Application that, given seed entities of interest and properties to traverse, extracts their neighboring subgraphs from Wikidata. To avoid topical drift, KGPrune relies on a frugal pruning algorithm based on analogical reasoning to only keep relevant neighbors while pruning irrelevant ones. The interest of KGPrune is illustrated by two concrete applications, namely, bootstrapping an enterprise KG and extracting knowledge related to looted artworks.

cross Bridging the Gap: Unpacking the Hidden Challenges in Knowledge Distillation for Online Ranking Systems

Authors: Nikhil Khani, Shuo Yang, Aniruddh Nath, Yang Liu, Pendo Abbo, Li Wei, Shawn Andrews, Maciej Kula, Jarrod Kahn, Zhe Zhao, Lichan Hong, Ed Chi

Abstract: Knowledge Distillation (KD) is a powerful approach for compressing a large model into a smaller, more efficient model, particularly beneficial for latency-sensitive applications like recommender systems. However, current KD research predominantly focuses on Computer Vision (CV) and NLP tasks, overlooking unique data characteristics and challenges inherent to recommender systems. This paper addresses these overlooked challenges, specifically: (1) mitigating data distribution shifts between teacher and student models, (2) efficiently identifying optimal teacher configurations within time and budgetary constraints, and (3) enabling computationally efficient and rapid sharing of teacher labels to support multiple students. We present a robust KD system developed and rigorously evaluated on multiple large-scale personalized video recommendation systems within Google. Our live experiment results demonstrate significant improvements in student model performance while ensuring consistent and reliable generation of high quality teacher labels from a continuous data stream of data.

cross On-Chip Learning with Memristor-Based Neural Networks: Assessing Accuracy and Efficiency Under Device Variations, Conductance Errors, and Input Noise

Authors: M. Reza Eslami, Dhiman Biswas, Soheib Takhtardeshir, Sarah S. Sharif, Yaser M. Banad

Abstract: This paper presents a memristor-based compute-in-memory hardware accelerator for on-chip training and inference, focusing on its accuracy and efficiency against device variations, conductance errors, and input noise. Utilizing realistic SPICE models of commercially available silver-based metal self-directed channel (M-SDC) memristors, the study incorporates inherent device non-idealities into the circuit simulations. The hardware, consisting of 30 memristors and 4 neurons, utilizes three different M-SDC structures with tungsten, chromium, and carbon media to perform binary image classification tasks. An on-chip training algorithm precisely tunes memristor conductance to achieve target weights. Results show that incorporating moderate noise (<15%) during training enhances robustness to device variations and noisy input data, achieving up to 97% accuracy despite conductance variations and input noises. The network tolerates a 10% conductance error without significant accuracy loss. Notably, omitting the initial memristor reset pulse during training considerably reduces training time and energy consumption. The hardware designed with chromium-based memristors exhibits superior performance, achieving a training time of 2.4 seconds and an energy consumption of 18.9 mJ. This research provides insights for developing robust and energy-efficient memristor-based neural networks for on-chip learning in edge applications.

cross Model-Based Reinforcement Learning for Control of Strongly-Disturbed Unsteady Aerodynamic Flows

Authors: Zhecheng Liu (University of California, Los Angeles), Diederik Beckers (California Institute of Technology), Jeff D. Eldredge (University of California, Los Angeles)

Abstract: The intrinsic high dimension of fluid dynamics is an inherent challenge to control of aerodynamic flows, and this is further complicated by a flow's nonlinear response to strong disturbances. Deep reinforcement learning, which takes advantage of the exploratory aspects of reinforcement learning (RL) and the rich nonlinearity of a deep neural network, provides a promising approach to discover feasible control strategies. However, the typical model-free approach to reinforcement learning requires a significant amount of interaction between the flow environment and the RL agent during training, and this high training cost impedes its development and application. In this work, we propose a model-based reinforcement learning (MBRL) approach by incorporating a novel reduced-order model as a surrogate for the full environment. The model consists of a physics-augmented autoencoder, which compresses high-dimensional CFD flow field snaphsots into a three-dimensional latent space, and a latent dynamics model that is trained to accurately predict the long-time dynamics of trajectories in the latent space in response to action sequences. The robustness and generalizability of the model is demonstrated in two distinct flow environments, a pitching airfoil in a highly disturbed environment and a vertical-axis wind turbine in a disturbance-free environment. Based on the trained model in the first problem, we realize an MBRL strategy to mitigate lift variation during gust-airfoil encounters. We demonstrate that the policy learned in the reduced-order environment translates to an effective control strategy in the full CFD environment.

cross Bandwidth-Aware and Overlap-Weighted Compression for Communication-Efficient Federated Learning

Authors: Zichen Tang, Junlin Huang, Rudan Yan, Yuxin Wang, Zhenheng Tang, Shaohuai Shi, Amelie Chi Zhou, Xiaowen Chu

Abstract: Current data compression methods, such as sparsification in Federated Averaging (FedAvg), effectively enhance the communication efficiency of Federated Learning (FL). However, these methods encounter challenges such as the straggler problem and diminished model performance due to heterogeneous bandwidth and non-IID (Independently and Identically Distributed) data. To address these issues, we introduce a bandwidth-aware compression framework for FL, aimed at improving communication efficiency while mitigating the problems associated with non-IID data. First, our strategy dynamically adjusts compression ratios according to bandwidth, enabling clients to upload their models at a close pace, thus exploiting the otherwise wasted time to transmit more data. Second, we identify the non-overlapped pattern of retained parameters after compression, which results in diminished client update signals due to uniformly averaged weights. Based on this finding, we propose a parameter mask to adjust the client-averaging coefficients at the parameter level, thereby more closely approximating the original updates, and improving the training convergence under heterogeneous environments. Our evaluations reveal that our method significantly boosts model accuracy, with a maximum improvement of 13% over the uncompressed FedAvg. Moreover, it achieves a $3.37\times$ speedup in reaching the target accuracy compared to FedAvg with a Top-K compressor, demonstrating its effectiveness in accelerating convergence with compression. The integration of common compression techniques into our framework further establishes its potential as a versatile foundation for future cross-device, communication-efficient FL research, addressing critical challenges in FL and advancing the field of distributed machine learning.

cross Benchmarking Reinforcement Learning Methods for Dexterous Robotic Manipulation with a Three-Fingered Gripper

Authors: Elizabeth Cutler, Yuning Xing, Tony Cui, Brendan Zhou, Koen van Rijnsoever, Ben Hart, David Valencia, Lee Violet C. Ong, Trevor Gee, Minas Liarokapis, Henry Williams

Abstract: Reinforcement Learning (RL) training is predominantly conducted in cost-effective and controlled simulation environments. However, the transfer of these trained models to real-world tasks often presents unavoidable challenges. This research explores the direct training of RL algorithms in controlled yet realistic real-world settings for the execution of dexterous manipulation. The benchmarking results of three RL algorithms trained on intricate in-hand manipulation tasks within practical real-world contexts are presented. Our study not only demonstrates the practicality of RL training in authentic real-world scenarios, facilitating direct real-world applications, but also provides insights into the associated challenges and considerations. Additionally, our experiences with the employed experimental methods are shared, with the aim of empowering and engaging fellow researchers and practitioners in this dynamic field of robotics.

cross Learning effective pruning at initialization from iterative pruning

Authors: Shengkai Liu, Yaofeng Cheng, Fusheng Zha, Wei Guo, Lining Sun, Zhenshan Bing, Chenguang Yang

Abstract: Pruning at initialization (PaI) reduces training costs by removing weights before training, which becomes increasingly crucial with the growing network size. However, current PaI methods still have a large accuracy gap with iterative pruning, especially at high sparsity levels. This raises an intriguing question: can we get inspiration from iterative pruning to improve the PaI performance? In the lottery ticket hypothesis, the iterative rewind pruning (IRP) finds subnetworks retroactively by rewinding the parameter to the original initialization in every pruning iteration, which means all the subnetworks are based on the initial state. Here, we hypothesise the surviving subnetworks are more important and bridge the initial feature and their surviving score as the PaI criterion. We employ an end-to-end neural network (\textbf{AutoS}parse) to learn this correlation, input the model's initial features, output their score and then prune the lowest score parameters before training. To validate the accuracy and generalization of our method, we performed PaI across various models. Results show that our approach outperforms existing methods in high-sparsity settings. Notably, as the underlying logic of model pruning is consistent in different models, only one-time IRP on one model is needed (e.g., once IRP on ResNet-18/CIFAR-10, AutoS can be generalized to VGG-16/CIFAR-10, ResNet-18/TinyImageNet, et al.). As the first neural network-based PaI method, we conduct extensive experiments to validate the factors influencing this approach. These results reveal the learning tendencies of neural networks and provide new insights into our understanding and research of PaI from a practical perspective. Our code is available at: https://github.com/ChengYaofeng/AutoSparse.git.

URLs: https://github.com/ChengYaofeng/AutoSparse.git.

cross Quartered Chirp Spectral Envelope for Whispered vs Normal Speech Classification

Authors: S. Johanan Joysingh, P. Vijayalakshmi, T. Nagarajan

Abstract: Whispered speech as an acceptable form of human-computer interaction is gaining traction. Systems that address multiple modes of speech require a robust front-end speech classifier. Performance of whispered vs normal speech classification drops in the presence of additive white Gaussian noise, since normal speech takes on some of the characteristics of whispered speech. In this work, we propose a new feature named the quartered chirp spectral envelope, a combination of the chirp spectrum and the quartered spectral envelope, to classify whispered and normal speech. The chirp spectrum can be fine-tuned to obtain customized features for a given task, and the quartered spectral envelope has been proven to work especially well for the current task. The feature is trained on a one dimensional convolutional neural network, that captures the trends in the spectral envelope. The proposed system performs better than the state of the art, in the presence of white noise.

cross GPU-Accelerated Counterfactual Regret Minimization

Authors: Juho Kim

Abstract: Counterfactual regret minimization (CFR) is a family of algorithms of no-regret learning dynamics capable of solving large-scale imperfect information games. There has been a notable lack of work on making CFR more computationally efficient. We propose implementing this algorithm as a series of dense and sparse matrix and vector operations, thereby making it highly parallelizable for a graphical processing unit. Our experiments show that our implementation performs up to about 352.5 times faster than OpenSpiel's Python implementation and up to about 22.2 times faster than OpenSpiel's C++ implementation and the speedup becomes more pronounced as the size of the game being solved grows.

cross MaskCycleGAN-based Whisper to Normal Speech Conversion

Authors: K. Rohith Gupta, K. Ramnath, S. Johanan Joysingh, P. Vijayalakshmi, T. Nagarajan

Abstract: Whisper to normal speech conversion is an active area of research. Various architectures based on generative adversarial networks have been proposed in the recent past. Especially, recent study shows that MaskCycleGAN, which is a mask guided, and cyclic consistency keeping, generative adversarial network, performs really well for voice conversion from spectrogram representations. In the current work we present a MaskCycleGAN approach for the conversion of whispered speech to normal speech. We find that tuning the mask parameters, and pre-processing the signal with a voice activity detector provides superior performance when compared to the existing approach. The wTIMIT dataset is used for evaluation. Objective metrics such as PESQ and G-Loss are used to evaluate the converted speech, along with subjective evaluation using mean opinion score. The results show that the proposed approach offers considerable benefits.

cross From Rule-Based Models to Deep Learning Transformers Architectures for Natural Language Processing and Sign Language Translation Systems: Survey, Taxonomy and Performance Evaluation

Authors: Nada Shahin, Leila Ismail

Abstract: With the growing Deaf and Hard of Hearing population worldwide and the persistent shortage of certified sign language interpreters, there is a pressing need for an efficient, signs-driven, integrated end-to-end translation system, from sign to gloss to text and vice-versa. There has been a wealth of research on machine translations and related reviews. However, there are few works on sign language machine translation considering the particularity of the language being continuous and dynamic. This paper aims to address this void, providing a retrospective analysis of the temporal evolution of sign language machine translation algorithms and a taxonomy of the Transformers architectures, the most used approach in language translation. We also present the requirements of a real-time Quality-of-Service sign language ma-chine translation system underpinned by accurate deep learning algorithms. We propose future research directions for sign language translation systems.

cross CL4KGE: A Curriculum Learning Method for Knowledge Graph Embedding

Authors: Yang Liu, Chuan Zhou, Peng Zhang, Yanan Cao, Yongchao Liu, Zhao Li, Hongyang Chen

Abstract: Knowledge graph embedding (KGE) constitutes a foundational task, directed towards learning representations for entities and relations within knowledge graphs (KGs), with the objective of crafting representations comprehensive enough to approximate the logical and symbolic interconnections among entities. In this paper, we define a metric Z-counts to measure the difficulty of training each triple ($<$head entity, relation, tail entity$>$) in KGs with theoretical analysis. Based on this metric, we propose \textbf{CL4KGE}, an efficient \textbf{C}urriculum \textbf{L}earning based training strategy for \textbf{KGE}. This method includes a difficulty measurer and a training scheduler that aids in the training of KGE models. Our approach possesses the flexibility to act as a plugin within a wide range of KGE models, with the added advantage of adaptability to the majority of KGs in existence. The proposed method has been evaluated on popular KGE models, and the results demonstrate that it enhances the state-of-the-art methods. The use of Z-counts as a metric has enabled the identification of challenging triples in KGs, which helps in devising effective training strategies.

cross From Bias to Balance: Detecting Facial Expression Recognition Biases in Large Multimodal Foundation Models

Authors: Kaylee Chhua, Zhoujinyi Wen, Vedant Hathalia, Kevin Zhu, Sean O'Brien

Abstract: This study addresses the racial biases in facial expression recognition (FER) systems within Large Multimodal Foundation Models (LMFMs). Despite advances in deep learning and the availability of diverse datasets, FER systems often exhibit higher error rates for individuals with darker skin tones. Existing research predominantly focuses on traditional FER models (CNNs, RNNs, ViTs), leaving a gap in understanding racial biases in LMFMs. We benchmark four leading LMFMs: GPT-4o, PaliGemma, Gemini, and CLIP to assess their performance in facial emotion detection across different racial demographics. A linear classifier trained on CLIP embeddings obtains accuracies of 95.9\% for RADIATE, 90.3\% for Tarr, and 99.5\% for Chicago Face. Furthermore, we identify that Anger is misclassified as Disgust 2.1 times more often in Black Females than White Females. This study highlights the need for fairer FER systems and establishes a foundation for developing unbiased, accurate FER technologies. Visit https://kvjvhub.github.io/FERRacialBias/ for further information regarding the biases within facial expression recognition.

URLs: https://kvjvhub.github.io/FERRacialBias/

cross Intraoperative Glioma Segmentation with YOLO + SAM for Improved Accuracy in Tumor Resection

Authors: Samir Kassam, Angelo Markham, Katie Vo, Yashas Revanakara, Michael Lam, Kevin Zhu

Abstract: Gliomas, a common type of malignant brain tumor, present significant surgical challenges due to their similarity to healthy tissue. Preoperative Magnetic Resonance Imaging (MRI) images are often ineffective during surgery due to factors such as brain shift, which alters the position of brain structures and tumors. This makes real-time intraoperative MRI (ioMRI) crucial, as it provides updated imaging that accounts for these shifts, ensuring more accurate tumor localization and safer resections. This paper presents a deep learning pipeline combining You Only Look Once Version 8 (YOLOv8) and Segment Anything Model Vision Transformer-base (SAM ViT-b) to enhance glioma detection and segmentation during ioMRI. Our model was trained using the Brain Tumor Segmentation 2021 (BraTS 2021) dataset, which includes standard magnetic resonance imaging (MRI) images, and noise-augmented MRI images that simulate ioMRI images. Noised MRI images are harder for a deep learning pipeline to segment, but they are more representative of surgical conditions. Achieving a Dice Similarity Coefficient (DICE) score of 0.79, our model performs comparably to state-of-the-art segmentation models tested on noiseless data. This performance demonstrates the model's potential to assist surgeons in maximizing tumor resection and improving surgical outcomes.

cross Data downlink prioritization using image classification on-board a 6U CubeSat

Authors: Keenan A. A. Chatar, Ezra Fielding, Kei Sano, Kentaro Kitamura

Abstract: Nanosatellites are proliferating as low-cost dedicated sensing systems with lean development cycles. Kyushu Institute of Technology and collaborators have launched a joint venture for a nanosatellite mission, VERTECS. The primary mission is to elucidate the formation history of stars by observing the optical-wavelength cosmic background radiation. The VERTECS satellite will be equipped with a small-aperture telescope and a high-precision attitude control system to capture the cosmic data for analysis on the ground. However, nanosatellites are limited by their onboard memory resources and downlink speed capabilities. Additionally, due to a limited number of ground stations, the satellite mission will face issues meeting the required data budget for mission success. To alleviate this issue, we propose an on-orbit system to autonomously classify and then compress desirable image data for data downlink prioritization and optimization. The system comprises a prototype Camera Controller Board (CCB) which carries a Raspberry Pi Compute Module 4 which is used for classification and compression. The system uses a lightweight Convolutional Neural Network (CNN) model to classify and determine the desirability of captured image data. The model is designed to be lean and robust to reduce the computational and memory load on the satellite. The model is trained and tested on a novel star field dataset consisting of data captured by the Sloan Digital Sky Survey (SDSS). The dataset is meant to simulate the expected data produced by the 6U satellite. The compression step implements GZip, RICE or HCOMPRESS compression, which are standards for astronomical data. Preliminary testing on the proposed CNN model results in a classification accuracy of about 100\% on the star field dataset, with compression ratios of 3.99, 5.16 and 5.43 for GZip, RICE and HCOMPRESS that were achieved on tested FITS image data.

cross Advancing Adversarial Suffix Transfer Learning on Aligned Large Language Models

Authors: Hongfu Liu, Yuxi Xie, Ye Wang, Michael Shieh

Abstract: Language Language Models (LLMs) face safety concerns due to potential misuse by malicious users. Recent red-teaming efforts have identified adversarial suffixes capable of jailbreaking LLMs using the gradient-based search algorithm Greedy Coordinate Gradient (GCG). However, GCG struggles with computational inefficiency, limiting further investigations regarding suffix transferability and scalability across models and data. In this work, we bridge the connection between search efficiency and suffix transferability. We propose a two-stage transfer learning framework, DeGCG, which decouples the search process into behavior-agnostic pre-searching and behavior-relevant post-searching. Specifically, we employ direct first target token optimization in pre-searching to facilitate the search process. We apply our approach to cross-model, cross-data, and self-transfer scenarios. Furthermore, we introduce an interleaved variant of our approach, i-DeGCG, which iteratively leverages self-transferability to accelerate the search process. Experiments on HarmBench demonstrate the efficiency of our approach across various models and domains. Notably, our i-DeGCG outperforms the baseline on Llama2-chat-7b with ASRs of $43.9$ ($+22.2$) and $39.0$ ($+19.5$) on valid and test sets, respectively. Further analysis on cross-model transfer indicates the pivotal role of first target token optimization in leveraging suffix transferability for efficient searching.

cross Learning Robust Reward Machines from Noisy Labels

Authors: Roko Parac, Lorenzo Nodari, Leo Ardon, Daniel Furelos-Blanco, Federico Cerutti, Alessandra Russo

Abstract: This paper presents PROB-IRM, an approach that learns robust reward machines (RMs) for reinforcement learning (RL) agents from noisy execution traces. The key aspect of RM-driven RL is the exploitation of a finite-state machine that decomposes the agent's task into different subtasks. PROB-IRM uses a state-of-the-art inductive logic programming framework robust to noisy examples to learn RMs from noisy traces using the Bayesian posterior degree of beliefs, thus ensuring robustness against inconsistencies. Pivotal for the results is the interleaving between RM learning and policy learning: a new RM is learned whenever the RL agent generates a trace that is believed not to be accepted by the current RM. To speed up the training of the RL agent, PROB-IRM employs a probabilistic formulation of reward shaping that uses the posterior Bayesian beliefs derived from the traces. Our experimental analysis shows that PROB-IRM can learn (potentially imperfect) RMs from noisy traces and exploit them to train an RL agent to solve its tasks successfully. Despite the complexity of learning the RM from noisy traces, agents trained with PROB-IRM perform comparably to agents provided with handcrafted RMs.

cross Literary and Colloquial Dialect Identification for Tamil using Acoustic Features

Authors: M. Nanmalar, P. Vijayalakshmi, T. Nagarajan

Abstract: The evolution and diversity of a language is evident from it's various dialects. If the various dialects are not addressed in technological advancements like automatic speech recognition and speech synthesis, there is a chance that these dialects may disappear. Speech technology plays a role in preserving various dialects of a language from going extinct. In order to build a full fledged automatic speech recognition system that addresses various dialects, an Automatic Dialect Identification (ADI) system acting as the front end is required. This is similar to how language identification systems act as front ends to automatic speech recognition systems that handle multiple languages. The current work proposes a way to identify two popular and broadly classified Tamil dialects, namely literary and colloquial Tamil. Acoustical characteristics rather than phonetics and phonotactics are used, alleviating the requirement of language-dependant linguistic tools. Hence one major advantage of the proposed method is that it does not require an annotated corpus, hence it can be easily adapted to other languages. Gaussian Mixture Models (GMM) using Mel Frequency Cepstral Coefficient (MFCC) features are used to perform the classification task. The experiments yielded an error rate of 12%. Vowel nasalization, as being the reason for this good performance, is discussed. The number of mixture models for the GMM is varied and the performance is analysed.

cross Towards turbine-location-aware multi-decadal wind power predictions with CMIP6

Authors: Nina Effenberger, Nicole Ludwig

Abstract: With the increasing amount of renewable energy in the grid, long-term wind power forecasting for multiple decades becomes more critical. In these long-term forecasts, climate data is essential as it allows us to account for climate change. Yet the resolution of climate models is often very coarse. In this paper, we show that by including turbine locations when downscaling with Gaussian Processes, we can generate valuable aggregate wind power predictions despite the low resolution of the CMIP6 climate models. This work is a first step towards multi-decadal turbine-location-aware wind power forecasting using global climate model output.

cross Development of Large Annotated Music Datasets using HMM-based Forced Viterbi Alignment

Authors: S. Johanan Joysingh, P. Vijayalakshmi, T. Nagarajan

Abstract: Datasets are essential for any machine learning task. Automatic Music Transcription (AMT) is one such task, where considerable amount of data is required depending on the way the solution is achieved. Considering the fact that a music dataset, complete with audio and its time-aligned transcriptions would require the effort of people with musical experience, it could be stated that the task becomes even more challenging. Musical experience is required in playing the musical instrument(s), and in annotating and verifying the transcriptions. We propose a method that would help in streamlining this process, making the task of obtaining a dataset from a particular instrument easy and efficient. We use predefined guitar exercises and hidden Markov model(HMM) based forced viterbi alignment to accomplish this. The guitar exercises are designed to be simple. Since the note sequence are already defined, HMM based forced viterbi alignment provides time-aligned transcriptions of these audio files. The onsets of the transcriptions are manually verified and the labels are accurate up to 10ms, averaging at 5ms. The contributions of the proposed work is two fold, i) a well streamlined and efficient method for generating datasets for any instrument, especially monophonic and, ii) an acoustic plectrum guitar dataset containing wave files and transcriptions in the form of label files. This method will aid as a preliminary step towards building concrete datasets for building AMT systems for different instruments.

cross SpikingSSMs: Learning Long Sequences with Sparse and Parallel Spiking State Space Models

Authors: Shuaijie Shen, Chao Wang, Renzhuo Huang, Yan Zhong, Qinghai Guo, Zhichao Lu, Jianguo Zhang, Luziwei Leng

Abstract: Known as low energy consumption networks, spiking neural networks (SNNs) have gained a lot of attention within the past decades. While SNNs are increasing competitive with artificial neural networks (ANNs) for vision tasks, they are rarely used for long sequence tasks, despite their intrinsic temporal dynamics. In this work, we develop spiking state space models (SpikingSSMs) for long sequence learning by leveraging on the sequence learning abilities of state space models (SSMs). Inspired by dendritic neuron structure, we hierarchically integrate neuronal dynamics with the original SSM block, meanwhile realizing sparse synaptic computation. Furthermore, to solve the conflict of event-driven neuronal dynamics with parallel computing, we propose a light-weight surrogate dynamic network which accurately predicts the after-reset membrane potential and compatible to learnable thresholds, enabling orders of acceleration in training speed compared with conventional iterative methods. On the long range arena benchmark task, SpikingSSM achieves competitive performance to state-of-the-art SSMs meanwhile realizing on average 90\% of network sparsity. On language modeling, our network significantly surpasses existing spiking large language models (spikingLLMs) on the WikiText-103 dataset with only a third of the model size, demonstrating its potential as backbone architecture for low computation cost LLMs.

cross Targetin the partition function of chemically disordered materials with a generative approach based on inverse variational autoencoders

Authors: Maciej J. Karcz, Luca Messina, Eiji Kawasaki, Emeric Bourasseau

Abstract: Computing atomic-scale properties of chemically disordered materials requires an efficient exploration of their vast configuration space. Traditional approaches such as Monte Carlo or Special Quasirandom Structures either entail sampling an excessive amount of configurations or do not ensure that the configuration space has been properly covered. In this work, we propose a novel approach where generative machine learning is used to yield a representative set of configurations for accurate property evaluation and provide accurate estimations of atomic-scale properties with minimal computational cost. Our method employs a specific type of variational autoencoder with inverse roles for the encoder and decoder, enabling the application of an unsupervised active learning scheme that does not require any initial training database. The model iteratively generates configuration batches, whose properties are computed with conventional atomic-scale methods. These results are then fed back into the model to estimate the partition function, repeating the process until convergence. We illustrate our approach by computing point-defect formation energies and concentrations in (U, Pu)O2 mixed-oxide fuels. In addition, the ML model provides valuable insights into the physical factors influencing the target property. Our method is generally applicable to explore other properties, such as atomic-scale diffusion coefficients, in ideally or non-ideally disordered materials like high-entropy alloys.

cross Domain-decoupled Physics-informed Neural Networks with Closed-form Gradients for Fast Model Learning of Dynamical Systems

Authors: Henrik Krauss, Tim-Lukas Habich, Max Bartholdt, Thomas Seel, Moritz Schappler

Abstract: Physics-informed neural networks (PINNs) are trained using physical equations and can also incorporate unmodeled effects by learning from data. PINNs for control (PINCs) of dynamical systems are gaining interest due to their prediction speed compared to classical numerical integration methods for nonlinear state-space models, making them suitable for real-time control applications. We introduce the domain-decoupled physics-informed neural network (DD-PINN) to address current limitations of PINC in handling large and complex nonlinear dynamic systems. The time domain is decoupled from the feed-forward neural network to construct an Ansatz function, allowing for calculation of gradients in closed form. This approach significantly reduces training times, especially for large dynamical systems, compared to PINC, which relies on graph-based automatic differentiation. Additionally, the DD-PINN inherently fulfills the initial condition and supports higher-order excitation inputs, simplifying the training process and enabling improved prediction accuracy. Validation on three systems - a nonlinear mass-spring-damper, a five-mass-chain, and a two-link robot - demonstrates that the DD-PINN achieves significantly shorter training times. In cases where the PINC's prediction diverges, the DD-PINN's prediction remains stable and accurate due to higher physics loss reduction or use of a higher-order excitation input. The DD-PINN allows for fast and accurate learning of large dynamical systems previously out of reach for the PINC.

cross Earth Observation Satellite Scheduling with Graph Neural Networks

Authors: Antoine Jacquet, Guillaume Infantes, Nicolas Meuleau, Emmanuel Benazera, St\'ephanie Roussel, Vincent Baudoui, Jonathan Guerra

Abstract: The Earth Observation Satellite Planning (EOSP) is a difficult optimization problem with considerable practical interest. A set of requested observations must be scheduled on an agile Earth observation satellite while respecting constraints on their visibility window, as well as maneuver constraints that impose varying delays between successive observations. In addition, the problem is largely oversubscribed: there are much more candidate observations than what can possibly be achieved. Therefore, one must select the set of observations that will be performed while maximizing their weighted cumulative benefit, and propose a feasible schedule for these observations. As previous work mostly focused on heuristic and iterative search algorithms, this paper presents a new technique for selecting and scheduling observations based on Graph Neural Networks (GNNs) and Deep Reinforcement Learning (DRL). GNNs are used to extract relevant information from the graphs representing instances of the EOSP, and DRL drives the search for optimal schedules. Our simulations show that it is able to learn on small problem instances and generalize to larger real-world instances, with very competitive performance compared to traditional approaches.

cross The Benefits of Balance: From Information Projections to Variance Reduction

Authors: Lang Liu, Ronak Mehta, Soumik Pal, Zaid Harchaoui

Abstract: Data balancing across multiple modalities/sources appears in various forms in several foundation models (e.g., CLIP and DINO) achieving universal representation learning. We show that this iterative algorithm, usually used to avoid representation collapse, enjoys an unsuspected benefit: reducing the variance of estimators that are functionals of the empirical distribution over these sources. We provide non-asymptotic bounds quantifying this variance reduction effect and relate them to the eigendecays of appropriately defined Markov operators. We explain how various forms of data balancing in contrastive multimodal learning and self-supervised clustering can be interpreted as instances of this variance reduction scheme.

cross Interactive dense pixel visualizations for time series and model attribution explanations

Authors: Udo Schlegel, Daniel A. Keim

Abstract: The field of Explainable Artificial Intelligence (XAI) for Deep Neural Network models has developed significantly, offering numerous techniques to extract explanations from models. However, evaluating explanations is often not trivial, and differences in applied metrics can be subtle, especially with non-intelligible data. Thus, there is a need for visualizations tailored to explore explanations for domains with such data, e.g., time series. We propose DAVOTS, an interactive visual analytics approach to explore raw time series data, activations of neural networks, and attributions in a dense-pixel visualization to gain insights into the data, models' decisions, and explanations. To further support users in exploring large datasets, we apply clustering approaches to the visualized data domains to highlight groups and present ordering strategies for individual and combined data exploration to facilitate finding patterns. We visualize a CNN trained on the FordA dataset to demonstrate the approach.

cross MMASD+: A Novel Dataset for Privacy-Preserving Behavior Analysis of Children with Autism Spectrum Disorder

Authors: Pavan Uttej Ravva, Behdokht Kiafar, Pinar Kullu, Jicheng Li, Anjana Bhat, Roghayeh Leila Barmaki

Abstract: Autism spectrum disorder (ASD) is characterized by significant challenges in social interaction and comprehending communication signals. Recently, therapeutic interventions for ASD have increasingly utilized Deep learning powered-computer vision techniques to monitor individual progress over time. These models are trained on private, non-public datasets from the autism community, creating challenges in comparing results across different models due to privacy-preserving data-sharing issues. This work introduces MMASD+. MMASD+ consists of diverse data modalities, including 3D-Skeleton, 3D Body Mesh, and Optical Flow data. It integrates the capabilities of Yolov8 and Deep SORT algorithms to distinguish between the therapist and children, addressing a significant barrier in the original dataset. Additionally, a Multimodal Transformer framework is proposed to predict 11 action types and the presence of ASD. This framework achieves an accuracy of 95.03% for predicting action types and 96.42% for predicting ASD presence, demonstrating over a 10% improvement compared to models trained on single data modalities. These findings highlight the advantages of integrating multiple data modalities within the Multimodal Transformer framework.

cross SiHGNN: Leveraging Properties of Semantic Graphs for Efficient HGNN Acceleration

Authors: Runzhen Xue, Mingyu Yan, Dengke Han, Zhimin Tang, Xiaochun Ye, Dongrui Fan

Abstract: Heterogeneous Graph Neural Networks (HGNNs) have expanded graph representation learning to heterogeneous graph fields. Recent studies have demonstrated their superior performance across various applications, including medical analysis and recommendation systems, often surpassing existing methods. However, GPUs often experience inefficiencies when executing HGNNs due to their unique and complex execution patterns. Compared to traditional Graph Neural Networks, these patterns further exacerbate irregularities in memory access. To tackle these challenges, recent studies have focused on developing domain-specific accelerators for HGNNs. Nonetheless, most of these efforts have concentrated on optimizing the datapath or scheduling data accesses, while largely overlooking the potential benefits that could be gained from leveraging the inherent properties of the semantic graph, such as its topology, layout, and generation. In this work, we focus on leveraging the properties of semantic graphs to enhance HGNN performance. First, we analyze the Semantic Graph Build (SGB) stage and identify significant opportunities for data reuse during semantic graph generation. Next, we uncover the phenomenon of buffer thrashing during the Graph Feature Processing (GFP) stage, revealing potential optimization opportunities in semantic graph layout. Furthermore, we propose a lightweight hardware accelerator frontend for HGNNs, called SiHGNN. This accelerator frontend incorporates a tree-based Semantic Graph Builder for efficient semantic graph generation and features a novel Graph Restructurer for optimizing semantic graph layouts. Experimental results show that SiHGNN enables the state-of-the-art HGNN accelerator to achieve an average performance improvement of 2.95$\times$.

cross Data-Driven Nonlinear Deformation Design of 3D-Printable Shells

Authors: Samuel Silverman, Kelsey L. Snapp, Keith A. Brown, Emily Whiting

Abstract: Designing and fabricating structures with specific mechanical properties requires understanding the intricate relationship between design parameters and performance. Understanding the design-performance relationship becomes increasingly complicated for nonlinear deformations. Though successful at modeling elastic deformations, simulation-based techniques struggle to model large elastoplastic deformations exhibiting plasticity and densification. We propose a neural network trained on experimental data to learn the design-performance relationship between 3D-printable shells and their compressive force-displacement behavior. Trained on thousands of physical experiments, our network aids in both forward and inverse design to generate shells exhibiting desired elastoplastic and hyperelastic deformations. We validate a subset of generated designs through fabrication and testing. Furthermore, we demonstrate the network's inverse design efficacy in generating custom shells for several applications.

cross Few-Shot Unsupervised Implicit Neural Shape Representation Learning with Spatial Adversaries

Authors: Amine Ouasfi, Adnane Boukhayma

Abstract: Implicit Neural Representations have gained prominence as a powerful framework for capturing complex data modalities, encompassing a wide range from 3D shapes to images and audio. Within the realm of 3D shape representation, Neural Signed Distance Functions (SDF) have demonstrated remarkable potential in faithfully encoding intricate shape geometry. However, learning SDFs from sparse 3D point clouds in the absence of ground truth supervision remains a very challenging task. While recent methods rely on smoothness priors to regularize the learning, our method introduces a regularization term that leverages adversarial samples around the shape to improve the learned SDFs. Through extensive experiments and evaluations, we illustrate the efficacy of our proposed method, highlighting its capacity to improve SDF learning with respect to baselines and the state-of-the-art using synthetic and real data.

cross Force-Guided Bridge Matching for Full-Atom Time-Coarsened Dynamics of Peptides

Authors: Ziyang Yu, Wenbing Huang, Yang Liu

Abstract: Molecular Dynamics (MD) simulations are irreplaceable and ubiquitous in fields of materials science, chemistry, pharmacology just to name a few. Conventional MD simulations are plagued by numerical stability as well as long equilibration time issues, which limits broader applications of MD simulations. Recently, a surge of deep learning approaches have been devised for time-coarsened dynamics, which learns the state transition mechanism over much larger time scales to overcome these limitations. However, only a few methods target the underlying Boltzmann distribution by resampling techniques, where proposals are rarely accepted as new states with low efficiency. In this work, we propose a force-guided bridge matching model, FBM, a novel framework that first incorporates physical priors into bridge matching for full-atom time-coarsened dynamics. With the guidance of our well-designed intermediate force field, FBM is feasible to target the Boltzmann-like distribution by direct inference without extra steps. Experiments on small peptides verify our superiority in terms of comprehensive metrics and demonstrate transferability to unseen peptide systems.

cross Low-Budget Simulation-Based Inference with Bayesian Neural Networks

Authors: Arnaud Delaunoy, Maxence de la Brassinne Bonardeaux, Siddharth Mishra-Sharma, Gilles Louppe

Abstract: Simulation-based inference methods have been shown to be inaccurate in the data-poor regime, when training simulations are limited or expensive. Under these circumstances, the inference network is particularly prone to overfitting, and using it without accounting for the computational uncertainty arising from the lack of identifiability of the network weights can lead to unreliable results. To address this issue, we propose using Bayesian neural networks in low-budget simulation-based inference, thereby explicitly accounting for the computational uncertainty of the posterior approximation. We design a family of Bayesian neural network priors that are tailored for inference and show that they lead to well-calibrated posteriors on tested benchmarks, even when as few as $O(10)$ simulations are available. This opens up the possibility of performing reliable simulation-based inference using very expensive simulators, as we demonstrate on a problem from the field of cosmology where single simulations are computationally expensive. We show that Bayesian neural networks produce informative and well-calibrated posterior estimates with only a few hundred simulations.

cross Exploiting Approximate Symmetry for Efficient Multi-Agent Reinforcement Learning

Authors: Batuhan Yardim, Niao He

Abstract: Mean-field games (MFG) have become significant tools for solving large-scale multi-agent reinforcement learning problems under symmetry. However, the assumption of exact symmetry limits the applicability of MFGs, as real-world scenarios often feature inherent heterogeneity. Furthermore, most works on MFG assume access to a known MFG model, which might not be readily available for real-world finite-agent games. In this work, we broaden the applicability of MFGs by providing a methodology to extend any finite-player, possibly asymmetric, game to an "induced MFG". First, we prove that $N$-player dynamic games can be symmetrized and smoothly extended to the infinite-player continuum via explicit Kirszbraun extensions. Next, we propose the notion of $\alpha,\beta$-symmetric games, a new class of dynamic population games that incorporate approximate permutation invariance. For $\alpha,\beta$-symmetric games, we establish explicit approximation bounds, demonstrating that a Nash policy of the induced MFG is an approximate Nash of the $N$-player dynamic game. We show that TD learning converges up to a small bias using trajectories of the $N$-player game with finite-sample guarantees, permitting symmetrized learning without building an explicit MFG model. Finally, for certain games satisfying monotonicity, we prove a sample complexity of $\widetilde{\mathcal{O}}(\varepsilon^{-6})$ for the $N$-agent game to learn an $\varepsilon$-Nash up to symmetrization bias. Our theory is supported by evaluations on MARL benchmarks with thousands of agents.

cross On latent dynamics learning in nonlinear reduced order modeling

Authors: Nicola Farenga, Stefania Fresca, Simone Brivio, Andrea Manzoni

Abstract: In this work, we present the novel mathematical framework of latent dynamics models (LDMs) for reduced order modeling of parameterized nonlinear time-dependent PDEs. Our framework casts this latter task as a nonlinear dimensionality reduction problem, while constraining the latent state to evolve accordingly to an (unknown) dynamical system. A time-continuous setting is employed to derive error and stability estimates for the LDM approximation of the full order model (FOM) solution. We analyze the impact of using an explicit Runge-Kutta scheme in the time-discrete setting, resulting in the $\Delta\text{LDM}$ formulation, and further explore the learnable setting, $\Delta\text{LDM}_\theta$, where deep neural networks approximate the discrete LDM components, while providing a bounded approximation error with respect to the FOM. Moreover, we extend the concept of parameterized Neural ODE - recently proposed as a possible way to build data-driven dynamical systems with varying input parameters - to be a convolutional architecture, where the input parameters information is injected by means of an affine modulation mechanism, while designing a convolutional autoencoder neural network able to retain spatial-coherence, thus enhancing interpretability at the latent level. Numerical experiments, including the Burgers' and the advection-reaction-diffusion equations, demonstrate the framework's ability to obtain, in a multi-query context, a time-continuous approximation of the FOM solution, thus being able to query the LDM approximation at any given time instance while retaining a prescribed level of accuracy. Our findings highlight the remarkable potential of the proposed LDMs, representing a mathematically rigorous framework to enhance the accuracy and approximation capabilities of reduced order modeling for time-dependent parameterized PDEs.

cross Automatic 8-tissue Segmentation for 6-month Infant Brains

Authors: Yilan Dong (School of Biomedical Engineering & Imaging Sciences, King's College London, London, United Kingdom, Department of Forensic and Neurodevelopmental Science, Institute of Psychiatry, Psychology & Neuroscience, King's College London, London, United Kingdom), Vanessa Kyriakopoulou (School of Biomedical Engineering & Imaging Sciences, King's College London, London, United Kingdom, Department of Forensic and Neurodevelopmental Science, Institute of Psychiatry, Psychology & Neuroscience, King's College London, London, United Kingdom), Irina Grigorescu (School of Biomedical Engineering & Imaging Sciences, King's College London, London, United Kingdom), Grainne McAlonan (Department of Forensic and Neurodevelopmental Science, Institute of Psychiatry, Psychology & Neuroscience, King's College London, London, United Kingdom), Dafnis Batalle (School of Biomedical Engineering & Imaging Sciences, King's College London, London, United Kingdom, Department of Forensic and Neurodevelopmental Science, Institute of Psychiatry, Psychology & Neuroscience, King's College London, London, United Kingdom), Maria Deprez (School of Biomedical Engineering & Imaging Sciences, King's College London, London, United Kingdom)

Abstract: Numerous studies have highlighted that atypical brain development, particularly during infancy and toddlerhood, is linked to an increased likelihood of being diagnosed with a neurodevelopmental condition, such as autism. Accurate brain tissue segmentations for morphological analysis are essential in numerous infant studies. However, due to ongoing white matter (WM) myelination changing tissue contrast in T1- and T2-weighted images, automatic tissue segmentation in 6-month infants is particularly difficult. On the other hand, manual labelling by experts is time-consuming and labor-intensive. In this study, we propose the first 8-tissue segmentation pipeline for six-month-old infant brains. This pipeline utilizes domain adaptation (DA) techniques to leverage our longitudinal data, including neonatal images segmented with the neonatal Developing Human Connectome Project structural pipeline. Our pipeline takes raw 6-month images as inputs and generates the 8-tissue segmentation as outputs, forming an end-to-end segmentation pipeline. The segmented tissues include WM, gray matter (GM), cerebrospinal fluid (CSF), ventricles, cerebellum, basal ganglia, brainstem, and hippocampus/amygdala. Cycle-Consistent Generative Adversarial Network (CycleGAN) and Attention U-Net were employed to achieve the image contrast transformation between neonatal and 6-month images and perform tissue segmentation on the synthesized 6-month images (neonatal images with 6-month intensity contrast), respectively. Moreover, we incorporated the segmentation outputs from Infant Brain Extraction and Analysis Toolbox (iBEAT) and another Attention U-Net to further enhance the performance and construct the end-to-end segmentation pipeline. Our evaluation with real 6-month images achieved a DICE score of 0.92, an HD95 of 1.6, and an ASSD of 0.42.

cross DCT-CryptoNets: Scaling Private Inference in the Frequency Domain

Authors: Arjun Roy, Kaushik Roy

Abstract: The convergence of fully homomorphic encryption (FHE) and machine learning offers unprecedented opportunities for private inference of sensitive data. FHE enables computation directly on encrypted data, safeguarding the entire machine learning pipeline, including data and model confidentiality. However, existing FHE-based implementations for deep neural networks face significant challenges in computational cost, latency, and scalability, limiting their practical deployment. This paper introduces DCT-CryptoNets, a novel approach that leverages frequency-domain learning to tackle these issues. Our method operates directly in the frequency domain, utilizing the discrete cosine transform (DCT) commonly employed in JPEG compression. This approach is inherently compatible with remote computing services, where images are usually transmitted and stored in compressed formats. DCT-CryptoNets reduces the computational burden of homomorphic operations by focusing on perceptually relevant low-frequency components. This is demonstrated by substantial latency reduction of up to 5.3$\times$ compared to prior work on image classification tasks, including a novel demonstration of ImageNet inference within 2.5 hours, down from 12.5 hours compared to prior work on equivalent compute resources. Moreover, DCT-CryptoNets improves the reliability of encrypted accuracy by reducing variability (e.g., from $\pm$2.5\% to $\pm$1.0\% on ImageNet). This study demonstrates a promising avenue for achieving efficient and practical privacy-preserving deep learning on high resolution images seen in real-world applications.

replace A Note on Knowledge Distillation Loss Function for Object Classification

Authors: Defang Chen

Abstract: This research note provides a quick introduction to the knowledge distillation loss function used in object classification. In particular, we discuss its connection to a previously proposed logits matching loss function. We further treat knowledge distillation as a specific form of output regularization and demonstrate its connection to label smoothing and entropy-based regularization.

replace Diffusion Tensor Estimation with Uncertainty Calibration

Authors: Davood Karimi, Simon K. Warfield, Ali Gholipour

Abstract: It is highly desirable to know how uncertain a model's predictions are, especially for models that are complex and hard to understand as in deep learning. Although there has been a growing interest in using deep learning methods in diffusion-weighted MRI, prior works have not addressed the issue of model uncertainty. Here, we propose a deep learning method to estimate the diffusion tensor and compute the estimation uncertainty. Data-dependent uncertainty is computed directly by the network and learned via loss attenuation. Model uncertainty is computed using Monte Carlo dropout. We also propose a new method for evaluating the quality of predicted uncertainties. We compare the new method with the standard least-squares tensor estimation and bootstrap-based uncertainty computation techniques. Our experiments show that when the number of measurements is small the deep learning method is more accurate and its uncertainty predictions are better calibrated than the standard methods. We show that the estimation uncertainties computed by the new method can highlight the model's biases, detect domain shift, and reflect the strength of noise in the measurements. Our study shows the importance and practical value of modeling prediction uncertainties in deep learning-based diffusion MRI analysis.

replace The merged-staircase property: a necessary and nearly sufficient condition for SGD learning of sparse functions on two-layer neural networks

Authors: Emmanuel Abbe, Enric Boix-Adsera, Theodor Misiakiewicz

Abstract: It is currently known how to characterize functions that neural networks can learn with SGD for two extremal parameterizations: neural networks in the linear regime, and neural networks with no structural constraints. However, for the main parametrization of interest (non-linear but regular networks) no tight characterization has yet been achieved, despite significant developments. We take a step in this direction by considering depth-2 neural networks trained by SGD in the mean-field regime. We consider functions on binary inputs that depend on a latent low-dimensional subspace (i.e., small number of coordinates). This regime is of interest since it is poorly understood how neural networks routinely tackle high-dimensional datasets and adapt to latent low-dimensional structure without suffering from the curse of dimensionality. Accordingly, we study SGD-learnability with $O(d)$ sample complexity in a large ambient dimension $d$. Our main results characterize a hierarchical property, the "merged-staircase property", that is both necessary and nearly sufficient for learning in this setting. We further show that non-linear training is necessary: for this class of functions, linear methods on any feature map (e.g., the NTK) are not capable of learning efficiently. The key tools are a new "dimension-free" dynamics approximation result that applies to functions defined on a latent space of low-dimension, a proof of global convergence based on polynomial identity testing, and an improvement of lower bounds against linear methods for non-almost orthogonal functions.

replace Assessing Lower Limb Strength using Internet-of-Things Enabled Chair

Authors: Chelsea Yeh, Hanna Kaitlin Dy, Phillip Schodinger, Hudson Kaleb Dy

Abstract: This project describes the application of the technologies of Machine Learning and Internet-of-Things to assess the lower limb strength of individuals undergoing rehabilitation or therapy. Specifically, it seeks to measure and assess the progress of individuals by sensors attached to chairs and processing the data through Google GPU Tensorflow CoLab. Pressure sensors are attached to various locations on a chair, including but not limited to the seating area, backrest, hand rests, and legs. Sensor data from the individual performing both sit-to-stand transition and stand-to-sit transition provides a time series dataset regarding the pressure distribution and vibratory motion on the chair. The dataset and timing information can then be fed into a machine learning model to estimate the relative strength and weakness during various phases of the movement.

replace Adaptive Log-Euclidean Metrics for SPD Matrix Learning

Authors: Ziheng Chen, Yue Song, Tianyang Xu, Zhiwu Huang, Xiao-Jun Wu, Nicu Sebe

Abstract: Symmetric Positive Definite (SPD) matrices have received wide attention in machine learning due to their intrinsic capacity to encode underlying structural correlation in data. Many successful Riemannian metrics have been proposed to reflect the non-Euclidean geometry of SPD manifolds. However, most existing metric tensors are fixed, which might lead to sub-optimal performance for SPD matrix learning, especially for deep SPD neural networks. To remedy this limitation, we leverage the commonly encountered pullback techniques and propose Adaptive Log-Euclidean Metrics (ALEMs), which extend the widely used Log-Euclidean Metric (LEM). Compared with the previous Riemannian metrics, our metrics contain learnable parameters, which can better adapt to the complex dynamics of Riemannian neural networks with minor extra computations. We also present a complete theoretical analysis to support our ALEMs, including algebraic and Riemannian properties. The experimental and theoretical results demonstrate the merit of the proposed metrics in improving the performance of SPD neural networks. The efficacy of our metrics is further showcased on a set of recently developed Riemannian building blocks, including Riemannian batch normalization, Riemannian Residual blocks, and Riemannian classifiers.

replace Active and Passive Causal Inference Learning

Authors: Daniel Jiwoong Im, Kyunghyun Cho

Abstract: This paper serves as a starting point for machine learning researchers, engineers and students who are interested in but not yet familiar with causal inference. We start by laying out an important set of assumptions that are collectively needed for causal identification, such as exchangeability, positivity, consistency and the absence of interference. From these assumptions, we build out a set of important causal inference techniques, which we do so by categorizing them into two buckets; active and passive approaches. We describe and discuss randomized controlled trials and bandit-based approaches from the active category. We then describe classical approaches, such as matching and inverse probability weighting, in the passive category, followed by more recent deep learning based algorithms. By finishing the paper with some of the missing aspects of causal inference from this paper, such as collider biases, we expect this paper to provide readers with a diverse set of starting points for further reading and research in causal inference and discovery.

replace Irregular Traffic Time Series Forecasting Based on Asynchronous Spatio-Temporal Graph Convolutional Network

Authors: Weijia Zhang, Le Zhang, Jindong Han, Hao Liu, Yanjie Fu, Jingbo Zhou, Yu Mei, Hui Xiong

Abstract: Accurate traffic forecasting is crucial for the development of Intelligent Transportation Systems (ITS), playing a pivotal role in modern urban traffic management. Traditional forecasting methods, however, struggle with the irregular traffic time series resulting from adaptive traffic signal controls, presenting challenges in asynchronous spatial dependency, irregular temporal dependency, and predicting variable-length sequences. To this end, we propose an Asynchronous Spatio-tEmporal graph convolutional nEtwoRk (ASeer) tailored for irregular traffic time series forecasting. Specifically, we first propose an Asynchronous Graph Diffusion Network to capture the spatial dependency between asynchronously measured traffic states regulated by adaptive traffic signals. After that, to capture the temporal dependency within irregular traffic state sequences, a personalized time encoding is devised to embed the continuous time signals. Then, we propose a Transformable Time-aware Convolution Network, which adapts meta-filters for time-aware convolution on the sequences with inconsistent temporal flow. Additionally, a Semi-Autoregressive Prediction Network, comprising a state evolution unit and a semi-autoregressive predictor, is designed to predict variable-length traffic sequences effectively and efficiently. Extensive experiments on a newly established benchmark demonstrate the superiority of ASeer compared with twelve competitive baselines across six metrics.

replace Revisiting LARS for Large Batch Training Generalization of Neural Networks

Authors: Khoi Do, Duong Nguyen, Hoa Nguyen, Long Tran-Thanh, Nguyen-Hoang Tran, Quoc-Viet Pham

Abstract: This paper explores Large Batch Training techniques using layer-wise adaptive scaling ratio (LARS) across diverse settings, uncovering insights. LARS algorithms with warm-up tend to be trapped in sharp minimizers early on due to redundant ratio scaling. Additionally, a fixed steep decline in the latter phase restricts deep neural networks from effectively navigating early-phase sharp minimizers. Building on these findings, we propose Time Varying LARS (TVLARS), a novel algorithm that replaces warm-up with a configurable sigmoid-like function for robust training in the initial phase. TVLARS promotes gradient exploration early on, surpassing sharp optimizers and gradually transitioning to LARS for robustness in later phases. Extensive experiments demonstrate that TVLARS consistently outperforms LARS and LAMB in most cases, with up to 2\% improvement in classification scenarios. Notably, in all self-supervised learning cases, TVLARS dominates LARS and LAMB with performance improvements of up to 10\%.

replace Anti-Matthew FL: Bridging the Performance Gap in Federated Learning to Counteract the Matthew Effect

Authors: Jiashi Gao, Xin Yao, Xuetao Wei

Abstract: Federated learning (FL) stands as a paradigmatic approach that facilitates model training across heterogeneous and diverse datasets originating from various data providers. However, conventional FLs fall short of achieving consistent performance, potentially leading to performance degradation for clients who are disadvantaged in data resources. Influenced by the Matthew effect, deploying a performance-imbalanced global model in applications further impedes the generation of high-quality data from disadvantaged clients, exacerbating the disparities in data resources among clients. In this work, we propose anti-Matthew fairness for the global model at the client level, requiring equal accuracy and equal decision bias across clients. To balance the trade-off between achieving anti-Matthew fairness and performance optimality, we formalize the anti-Matthew effect federated learning (anti-Matthew FL) as a multi-constrained multi-objectives optimization (MCMOO) problem and propose a three-stage multi-gradient descent algorithm to obtain the Pareto optimality. We theoretically analyze the convergence and time complexity of our proposed algorithms. Additionally, through extensive experimentation, we demonstrate that our proposed anti-Matthew FL outperforms other state-of-the-art FL algorithms in achieving a high-performance global model while effectively bridging performance gaps among clients. We hope this work provides valuable insights into the manifestation of the Matthew effect in FL and other decentralized learning scenarios and can contribute to designing fairer learning mechanisms, ultimately fostering societal welfare.

replace FERI: A Multitask-based Fairness Achieving Algorithm with Applications to Fair Organ Transplantation

Authors: Can Li, Dejian Lai, Xiaoqian Jiang, Kai Zhang

Abstract: Liver transplantation often faces fairness challenges across subgroups defined by sensitive attributes such as age group, gender, and race/ethnicity. Machine learning models for outcome prediction can introduce additional biases. Therefore, we introduce Fairness through the Equitable Rate of Improvement in Multitask Learning (FERI) algorithm for fair predictions of graft failure risk in liver transplant patients. FERI constrains subgroup loss by balancing learning rates and preventing subgroup dominance in the training process. Our results show that FERI maintained high predictive accuracy with AUROC and AUPRC comparable to baseline models. More importantly, FERI demonstrated an ability to improve fairness without sacrificing accuracy. Specifically, for the gender, FERI reduced the demographic parity disparity by 71.74%, and for the age group, it decreased the equalized odds disparity by 40.46%. Therefore, the FERI algorithm advanced fairness-aware predictive modeling in healthcare and provides an invaluable tool for equitable healthcare systems.

replace When Fairness Meets Privacy: Exploring Privacy Threats in Fair Binary Classifiers via Membership Inference Attacks

Authors: Huan Tian, Guangsheng Zhang, Bo Liu, Tianqing Zhu, Ming Ding, Wanlei Zhou

Abstract: Previous studies have developed fairness methods for biased models that exhibit discriminatory behaviors towards specific subgroups. While these models have shown promise in achieving fair predictions, recent research has identified their potential vulnerability to score-based membership inference attacks (MIAs). In these attacks, adversaries can infer whether a particular data sample was used during training by analyzing the model's prediction scores. However, our investigations reveal that these score-based MIAs are ineffective when targeting fairness-enhanced models in binary classifications. The attack models trained to launch the MIAs degrade into simplistic threshold models, resulting in lower attack performance. Meanwhile, we observe that fairness methods often lead to prediction performance degradation for the majority subgroups of the training data. This raises the barrier to successful attacks and widens the prediction gaps between member and non-member data. Building upon these insights, we propose an efficient MIA method against fairness-enhanced models based on fairness discrepancy results (FD-MIA). It leverages the difference in the predictions from both the original and fairness-enhanced models and exploits the observed prediction gaps as attack clues. We also explore potential strategies for mitigating privacy leakages. Extensive experiments validate our findings and demonstrate the efficacy of the proposed method.

replace Enhanced Latent Multi-view Subspace Clustering

Authors: Long Shi, Lei Cao, Jun Wang, Badong Chen

Abstract: Latent multi-view subspace clustering has been demonstrated to have desirable clustering performance. However, the original latent representation method vertically concatenates the data matrices from multiple views into a single matrix along the direction of dimensionality to recover the latent representation matrix, which may result in an incomplete information recovery. To fully recover the latent space representation, we in this paper propose an Enhanced Latent Multi-view Subspace Clustering (ELMSC) method. The ELMSC method involves constructing an augmented data matrix that enhances the representation of multi-view data. Specifically, we stack the data matrices from various views into the block-diagonal locations of the augmented matrix to exploit the complementary information. Meanwhile, the non-block-diagonal entries are composed based on the similarity between different views to capture the consistent information. In addition, we enforce a sparse regularization for the non-diagonal blocks of the augmented self-representation matrix to avoid redundant calculations of consistency information. Finally, a novel iterative algorithm based on the framework of Alternating Direction Method of Multipliers (ADMM) is developed to solve the optimization problem for ELMSC. Extensive experiments on real-world datasets demonstrate that our proposed ELMSC is able to achieve higher clustering performance than some state-of-art multi-view clustering methods.

replace Deep Reinforcement Learning for Multi-Truck Vehicle Routing Problems with Multi-Leg Demand Routes

Authors: Joshua Levin, Randall Correll, Takanori Ide, Takafumi Suzuki, Takaho Saito, Alan Arai

Abstract: Deep reinforcement learning (RL) has been shown to be effective in producing approximate solutions to some vehicle routing problems (VRPs), especially when using policies generated by encoder-decoder attention mechanisms. While these techniques have been quite successful for relatively simple problem instances, there are still under-researched and highly complex VRP variants for which no effective RL method has been demonstrated. In this work we focus on one such VRP variant, which contains multiple trucks and multi-leg routing requirements. In these problems, demand is required to move along sequences of nodes, instead of just from a start node to an end node. With the goal of making deep RL a viable strategy for real-world industrial-scale supply chain logistics, we develop new extensions to existing encoder-decoder attention models which allow them to handle multiple trucks and multi-leg routing requirements. Our models have the advantage that they can be trained for a small number of trucks and nodes, and then embedded into a large supply chain to yield solutions for larger numbers of trucks and nodes. We test our approach on a real supply chain environment arising in the operations of Japanese automotive parts manufacturer Aisin Corporation, and find that our algorithm outperforms Aisin's previous best solution.

replace Natural Mitigation of Catastrophic Interference: Continual Learning in Power-Law Learning Environments

Authors: Atith Gandhi, Raj Sanjay Shah, Vijay Marupudi, Sashank Varma

Abstract: Neural networks often suffer from catastrophic interference (CI): performance on previously learned tasks drops off significantly when learning a new task. This contrasts strongly with humans, who can continually learn new tasks without appreciably forgetting previous tasks. Prior work has explored various techniques for mitigating CI and promoting continual learning such as regularization, rehearsal, generative replay, and context-specific components. This paper takes a different approach, one guided by cognitive science research showing that in naturalistic environments, the probability of encountering a task decreases as a power-law of the time since it was last performed. We argue that techniques for mitigating CI should be compared against the intrinsic mitigation in simulated naturalistic learning environments. Thus, we evaluate the extent of the natural mitigation of CI when training models in power-law environments, similar to those humans face. Our results show that natural rehearsal environments are better at mitigating CI than existing methods, calling for the need for better evaluation processes. The benefits of this environment include simplicity, rehearsal that is agnostic to both tasks and models, and the lack of a need for extra neural circuitry. In addition, we explore popular mitigation techniques in power-law environments to create new baselines for continual learning research.

replace Nonlinear subspace clustering by functional link neural networks

Authors: Long Shi, Lei Cao, Zhongpu Chen, Badong Chen, Yu Zhao

Abstract: Nonlinear subspace clustering based on a feed-forward neural network has been demonstrated to provide better clustering accuracy than some advanced subspace clustering algorithms. While this approach demonstrates impressive outcomes, it involves a balance between effectiveness and computational cost. In this study, we employ a functional link neural network to transform data samples into a nonlinear domain. Subsequently, we acquire a self-representation matrix through a learning mechanism that builds upon the mapped samples. As the functional link neural network is a single-layer neural network, our proposed method achieves high computational efficiency while ensuring desirable clustering performance. By incorporating the local similarity regularization to enhance the grouping effect, our proposed method further improves the quality of the clustering results. Additionally, we introduce a convex combination subspace clustering scheme, which combining a linear subspace clustering method with the functional link neural network subspace clustering approach. This combination approach allows for a dynamic balance between linear and nonlinear representations. Extensive experiments confirm the advancement of our methods. The source code will be released on https://lshi91.github.io/ soon.

URLs: https://lshi91.github.io/

replace Synergistic eigenanalysis of covariance and Hessian matrices for enhanced binary classification

Authors: Agus Hartoyo, Jan Argasi\'nski, Aleksandra Trenk, Kinga Przybylska, Anna B{\l}asiak, Alessandro Crimi

Abstract: Covariance and Hessian matrices have been analyzed separately in the literature for classification problems. However, integrating these matrices has the potential to enhance their combined power in improving classification performance. We present a novel approach that combines the eigenanalysis of a covariance matrix evaluated on a training set with a Hessian matrix evaluated on a deep learning model to achieve optimal class separability in binary classification tasks. Our approach is substantiated by formal proofs that establish its capability to maximize between-class mean distance and minimize within-class variances, particularly under ideal data conditions such as isotropy around class means and dominant leading eigenvalues. By projecting data into the combined space of the most relevant eigendirections from both matrices, we achieve optimal class separability as per the linear discriminant analysis (LDA) criteria. Empirical validation across neural and health datasets consistently supports our theoretical framework and demonstrates that our method outperforms established methods. Our method stands out by addressing both LDA criteria, unlike PCA and the Hessian method, which predominantly emphasize one criterion each. This comprehensive approach captures intricate patterns and relationships, enhancing classification performance. Furthermore, through the utilization of both LDA criteria, our method outperforms LDA itself by leveraging higher-dimensional feature spaces, in accordance with Cover's theorem, which favors linear separability in higher dimensions. Our method also surpasses kernel-based methods and manifold learning techniques in performance. Additionally, our approach sheds light on complex DNN decision-making, rendering them comprehensible within a 2D space.

replace A StrongREJECT for Empty Jailbreaks

Authors: Alexandra Souly, Qingyuan Lu, Dillon Bowen, Tu Trinh, Elvis Hsieh, Sana Pandey, Pieter Abbeel, Justin Svegliato, Scott Emmons, Olivia Watkins, Sam Toyer

Abstract: Most jailbreak papers claim the jailbreaks they propose are highly effective, often boasting near-100% attack success rates. However, it is perhaps more common than not for jailbreak developers to substantially exaggerate the effectiveness of their jailbreaks. We suggest this problem arises because jailbreak researchers lack a standard, high-quality benchmark for evaluating jailbreak performance, leaving researchers to create their own. To create a benchmark, researchers must choose a dataset of forbidden prompts to which a victim model will respond, along with an evaluation method that scores the harmfulness of the victim model's responses. We show that existing benchmarks suffer from significant shortcomings and introduce the StrongREJECT benchmark to address these issues. StrongREJECT's dataset contains prompts that victim models must answer with specific, harmful information, while its automated evaluator measures the extent to which a response gives useful information to forbidden prompts. In doing so, the StrongREJECT evaluator achieves state-of-the-art agreement with human judgments of jailbreak effectiveness. Notably, we find that existing evaluation methods significantly overstate jailbreak effectiveness compared to human judgments and the StrongREJECT evaluator. We describe a surprising and novel phenomenon that explains this discrepancy: jailbreaks bypassing a victim model's safety fine-tuning tend to reduce its capabilities. Together, our findings underscore the need for researchers to use a high-quality benchmark, such as StrongREJECT, when developing new jailbreak attacks. We release the StrongREJECT code and data at https://strong-reject.readthedocs.io/en/latest/.

URLs: https://strong-reject.readthedocs.io/en/latest/.

replace Predicting O-GlcNAcylation Sites in Mammalian Proteins with Transformers and RNNs Trained with a New Loss Function

Authors: Pedro Seber

Abstract: Glycosylation, a protein modification, has multiple essential functional and structural roles. O-GlcNAcylation, a subtype of glycosylation, has the potential to be an important target for therapeutics, but methods to reliably predict O-GlcNAcylation sites had not been available until 2023; a 2021 review correctly noted that published models were insufficient and failed to generalize. Moreover, many are no longer usable. In 2023, a considerably better RNN model with an F$_1$ score of 36.17% and an MCC of 34.57% on a large dataset was published. This article first sought to improve these metrics using transformer encoders. While transformers displayed high performance on this dataset, their performance was inferior to that of the previously published RNN. We then created a new loss function, which we call the weighted focal differentiable MCC, to improve the performance of classification models. RNN models trained with this new function display superior performance to models trained using the weighted cross-entropy loss; this new function can also be used to fine-tune trained models. A two-cell RNN trained with this loss achieves state-of-the-art performance in O-GlcNAcylation site prediction with an F$_1$ score of 38.88% and an MCC of 38.20% on that large dataset.

replace Causal Multi-Label Feature Selection in Federated Setting

Authors: Yukun Song, Dayuan Cao, Jiali Miao, Shuai Yang, Kui Yu

Abstract: Multi-label feature selection serves as an effective mean for dealing with high-dimensional multi-label data. To achieve satisfactory performance, existing methods for multi-label feature selection often require the centralization of substantial data from multiple sources. However, in Federated setting, centralizing data from all sources and merging them into a single dataset is not feasible. To tackle this issue, in this paper, we study a challenging problem of causal multi-label feature selection in federated setting and propose a Federated Causal Multi-label Feature Selection (FedCMFS) algorithm with three novel subroutines. Specifically, FedCMFS first uses the FedCFL subroutine that considers the correlations among label-label, label-feature, and feature-feature to learn the relevant features (candidate parents and children) of each class label while preserving data privacy without centralizing data. Second, FedCMFS employs the FedCFR subroutine to selectively recover the missed true relevant features. Finally, FedCMFS utilizes the FedCFC subroutine to remove false relevant features. The extensive experiments on 8 datasets have shown that FedCMFS is effect for causal multi-label feature selection in federated setting.

replace Localising the Seizure Onset Zone from Single-Pulse Electrical Stimulation Responses with a CNN Transformer

Authors: Jamie Norris, Aswin Chari, Dorien van Blooijs, Gerald Cooray, Karl Friston, Martin Tisdall, Richard Rosch

Abstract: Epilepsy is one of the most common neurological disorders, often requiring surgical intervention when medication fails to control seizures. For effective surgical outcomes, precise localisation of the epileptogenic focus - often approximated through the Seizure Onset Zone (SOZ) - is critical yet remains a challenge. Active probing through electrical stimulation is already standard clinical practice for identifying epileptogenic areas. Our study advances the application of deep learning for SOZ localisation using Single-Pulse Electrical Stimulation (SPES) responses, with two key contributions. Firstly, we implement an existing deep learning model to compare two SPES analysis paradigms: divergent and convergent. These paradigms evaluate outward and inward effective connections, respectively. We assess the generalisability of these models to unseen patients and electrode placements using held-out test sets. Our findings reveal a notable improvement in moving from a divergent (AUROC: 0.574) to a convergent approach (AUROC: 0.666), marking the first application of the latter in this context. Secondly, we demonstrate the efficacy of CNN Transformers with cross-channel attention in handling heterogeneous electrode placements, increasing the AUROC to 0.730. These findings represent a significant step in modelling patient-specific intracranial EEG electrode placements in SPES. Future work will explore integrating these models into clinical decision-making processes to bridge the gap between deep learning research and practical healthcare applications.

replace Mitigating the Impact of Outlier Channels for Language Model Quantization with Activation Regularization

Authors: Aniruddha Nrusimha, Mayank Mishra, Naigang Wang, Dan Alistarh, Rameswar Panda, Yoon Kim

Abstract: We consider the problem of accurate quantization for language models, where both the weights and activations are uniformly quantized to 4 bits per parameter, the lowest bitwidth format natively supported by GPU hardware. In this context, the key challenge is activation quantization: it is known that language models contain outlier channels whose values on average are orders of magnitude higher than than other channels, which prevents accurate low-bitwidth quantization with known techniques. We systematically study this phenomena and find that these outlier channels emerge early in training, and that they occur more frequently in layers with residual streams. We then propose a simple strategy which regularizes a layer's inputs via quantization-aware training (QAT) and its outputs via activation kurtosis regularization. We show that regularizing both the inputs and outputs is crucial for preventing a model's "migrating" the difficulty in input quantization to the weights, which makes post-training quantization (PTQ) of weights more difficult. When combined with weight PTQ, we show that our approach can obtain a W4A4 model that performs competitively to the standard-precision W16A16 baseline.

replace Inference-Time Rule Eraser: Fair Recognition via Distilling and Removing Biased Rules

Authors: Yi Zhang, Dongyuan Lu, Jitao Sang

Abstract: Machine learning models often make predictions based on biased features such as gender, race, and other social attributes, posing significant fairness risks, especially in societal applications, such as hiring, banking, and criminal justice. Traditional approaches to addressing this issue involve retraining or fine-tuning neural networks with fairness-aware optimization objectives. However, these methods can be impractical due to significant computational resources, complex industrial tests, and the associated CO2 footprint. Additionally, regular users often fail to fine-tune models because they lack access to model parameters In this paper, we introduce the Inference-Time Rule Eraser (Eraser), a novel method designed to address fairness concerns by removing biased decision-making rules from deployed models during inference without altering model weights. We begin by establishing a theoretical foundation for modifying model outputs to eliminate biased rules through Bayesian analysis. Next, we present a specific implementation of Eraser that involves two stages: (1) distilling the biased rules from the deployed model into an additional patch model, and (2) removing these biased rules from the output of the deployed model during inference. Extensive experiments validate the effectiveness of our approach, showcasing its superior performance in addressing fairness concerns in AI systems.

replace Predictive Modeling of Flexible EHD Pumps using Kolmogorov-Arnold Networks

Authors: Yanhong Peng, Yuxin Wang, Fangchao Hu, Miao He, Zebing Mao, Xia Huang, Jun Ding

Abstract: We present a novel approach to predicting the pressure and flow rate of flexible electrohydrodynamic pumps using the Kolmogorov-Arnold Network. Inspired by the Kolmogorov-Arnold representation theorem, KAN replaces fixed activation functions with learnable spline-based activation functions, enabling it to approximate complex nonlinear functions more effectively than traditional models like Multi-Layer Perceptron and Random Forest. We evaluated KAN on a dataset of flexible EHD pump parameters and compared its performance against RF, and MLP models. KAN achieved superior predictive accuracy, with Mean Squared Errors of 12.186 and 0.001 for pressure and flow rate predictions, respectively. The symbolic formulas extracted from KAN provided insights into the nonlinear relationships between input parameters and pump performance. These findings demonstrate that KAN offers exceptional accuracy and interpretability, making it a promising alternative for predictive modeling in electrohydrodynamic pumping.

replace Glauber Generative Model: Discrete Diffusion Models via Binary Classification

Authors: Harshit Varma, Dheeraj Nagaraj, Karthikeyan Shanmugam

Abstract: We introduce the Glauber Generative Model (GGM), a new class of discrete diffusion models, to obtain new samples from a distribution given samples from a discrete space. GGM deploys a discrete Markov chain called the heat bath dynamics (or the Glauber dynamics) to denoise a sequence of noisy tokens to a sample from a joint distribution of discrete tokens. Our novel conceptual framework provides an exact reduction of the task of learning the denoising Markov chain to solving a class of binary classification tasks. More specifically, the model learns to classify a given token in a noisy sequence as signal or noise. In contrast, prior works on discrete diffusion models either solve regression problems to learn importance ratios, or minimize loss functions given by variational approximations. We apply GGM to language modeling and image generation, where images are discretized using image tokenizers like VQGANs. We show that it outperforms existing discrete diffusion models in language generation, and demonstrates strong performance for image generation without using dataset-specific image tokenizers. We also show that our model is capable of performing well in zero-shot control settings like text and image infilling.

replace Conformal Depression Prediction

Authors: Yonghong Li, Xiuzhuang Zhou

Abstract: While existing depression prediction methods based on deep learning show promise, their practical application is hindered by the lack of trustworthiness, as these deep models are often deployed as black box models, leaving us uncertain on the confidence of their predictions. For high-risk clinical applications like depression prediction, uncertainty quantification is essential in decision-making. In this paper, we introduce conformal depression prediction (CDP), a depression prediction method with uncertainty quantification based on conformal prediction (CP), giving valid confidence intervals with theoretical coverage guarantees for the model predictions. CDP is a plug-and-play module that requires neither model retraining nor an assumption about the depression data distribution. As CDP provides only an average coverage guarantee across all inputs rather than per-input performance guarantee, we further propose CDP-ACC, an improved conformal prediction with approximate conditional coverage. CDP-ACC firstly estimates the prediction distribution through neighborhood relaxation, and then introduces a conformal score function by constructing nested sequences, so as to provide a tighter prediction interval adaptive to specific input. We empirically demonstrate the application of CDP in uncertainty-aware facial depression prediction, as well as the effectiveness and superiority of CDP-ACC on the AVEC 2013 and AVEC 2014 datasets. Our code is publicly available at https://github.com/PushineLee/CDP.

URLs: https://github.com/PushineLee/CDP.

replace Enhancing Sign Language Detection through Mediapipe and Convolutional Neural Networks (CNN)

Authors: Aditya Raj Verma, Gagandeep Singh, Karnim Meghwal, Banawath Ramji, Praveen Kumar Dadheech

Abstract: This research combines MediaPipe and CNNs for the efficient and accurate interpretation of ASL dataset for the real-time detection of sign language. The system presented here captures and processes hands' gestures in real time. the intended purpose was to create a very easy, accurate, and fast way of entering commands without the necessity of touching something.MediaPipe supports one of the powerful frameworks in real-time hand tracking capabilities for the ability to capture and preprocess hand movements, which increases the accuracy of the gesture recognition system. Actually, the integration of CNN with the MediaPipe results in higher efficiency in using the model of real-time processing.The accuracy achieved by the model on ASL datasets is 99.12\%.The model was tested using American Sign Language (ASL) datasets. The results were then compared to those of existing methods to evaluate how well it performed, using established evaluation techniques. The system will have applications in the communication, education, and accessibility domains. Making systems such as described in this paper even better will assist people with hearing impairment and make things accessible to them. We tested the recognition and translation performance on an ASL dataset and achieved better accuracy over previous models.It is meant to the research is to identify the characters that American signs recognize using hand images taken from a web camera by based on mediapipe and CNNs

replace Parallelizing Linear Transformers with the Delta Rule over Sequence Length

Authors: Songlin Yang, Bailin Wang, Yu Zhang, Yikang Shen, Yoon Kim

Abstract: Transformers with linear attention (i.e., linear transformers) and state-space models have recently been suggested as a viable linear-time alternative to transformers with softmax attention. However, these models still underperform transformers especially on tasks that require in-context retrieval. While more expressive variants of linear transformers which replace the additive outer-product update in linear transformers with the delta rule have been found to be more effective at associative recall, existing algorithms for training such models do not parallelize over sequence length and are thus inefficient to train on modern hardware. This work describes a hardware-efficient algorithm for training linear transformers with the delta rule, which exploits a memory-efficient representation for computing products of Householder matrices. This algorithm allows us to scale up DeltaNet to standard language modeling settings. We train a 1.3B model for 100B tokens and find that it outperforms recent linear-time baselines such as Mamba and GLA in terms of perplexity and zero-shot performance on downstream tasks (including on tasks that focus on recall). We also experiment with two hybrid models which combine DeltaNet layers with (1) sliding-window attention layers every other layer or (2) two global attention layers, and find that these hybrid models outperform strong transformer baselines.

replace On Newton's Method to Unlearn Neural Networks

Authors: Nhung Bui, Xinyang Lu, Rachael Hwee Ling Sim, See-Kiong Ng, Bryan Kian Hsiang Low

Abstract: With the widespread applications of neural networks (NNs) trained on personal data, machine unlearning has become increasingly important for enabling individuals to exercise their personal data ownership, particularly the "right to be forgotten" from trained NNs. Since retraining is computationally expensive, we seek approximate unlearning algorithms for NNs that return identical models to the retrained oracle. While Newton's method has been successfully used to approximately unlearn linear models, we observe that adapting it for NN is challenging due to degenerate Hessians that make computing Newton's update impossible. Additionally, we show that when coupled with popular techniques to resolve the degeneracy, Newton's method often incurs offensively large norm updates and empirically degrades model performance post-unlearning. To address these challenges, we propose CureNewton's method, a principle approach that leverages cubic regularization to handle the Hessian degeneracy effectively. The added regularizer eliminates the need for manual finetuning and affords a natural interpretation within the unlearning context. Experiments across different models and datasets show that our method can achieve competitive unlearning performance to the state-of-the-art algorithm in practical unlearning settings, while being theoretically justified and efficient in running time.

replace STD-PLM: Understanding Both Spatial and Temporal Properties of Spatial-Temporal Data with PLM

Authors: YiHeng Huang, Xiaowei Mao, Shengnan Guo, Yubin Chen, Junfeng Shen, Tiankuo Li, Youfang Lin, Huaiyu Wan

Abstract: Spatial-temporal forecasting and imputation are important for real-world intelligent systems. Most existing methods are tailored for individual forecasting or imputation tasks but are not designed for both. Additionally, they are less effective for zero-shot and few-shot learning. While pre-trained language model (PLM) have exhibited strong pattern recognition and reasoning abilities across various tasks, including few-shot and zero-shot learning, their applications in spatial-temporal data understanding has been constrained by insufficient modeling of complex correlations such as the temporal correlations, spatial connectivity, non-pairwise and high-order spatial-temporal correlations within data. In this paper, we propose STD-PLM for understanding both spatial and temporal properties of \underline{S}patial-\underline{T}emporal \underline{D}ata with \underline{PLM}, which is capable of implementing both spatial-temporal forecasting and imputation tasks. STD-PLM understands spatial-temporal correlations via explicitly designed spatial and temporal tokenizers. Topology-aware node embeddings are designed for PLM to comprehend and exploit the topology structure of data in inductive manner. Furthermore, to mitigate the efficiency issues introduced by the PLM, we design a sandglass attention module (SGA) combined with a specific constrained loss function, which significantly improves the model's efficiency while ensuring performance. Extensive experiments demonstrate that STD-PLM exhibits competitive performance and generalization capabilities across the forecasting and imputation tasks on various datasets. Moreover, STD-PLM achieves promising results on both few-shot and zero-shot tasks.

replace Fast Matrix Multiplications for Lookup Table-Quantized LLMs

Authors: Han Guo, William Brandon, Radostin Cholakov, Jonathan Ragan-Kelley, Eric P. Xing, Yoon Kim

Abstract: The deployment of large language models (LLMs) is often constrained by memory bandwidth, where the primary bottleneck is the cost of transferring model parameters from the GPU's global memory to its registers. When coupled with custom kernels that fuse the dequantization and matmul operations, weight-only quantization can thus enable faster inference by reducing the amount of memory movement. However, developing high-performance kernels for weight-quantized LLMs presents substantial challenges, especially when the weights are compressed to non-evenly-divisible bit widths (e.g., 3 bits) with non-uniform, lookup table (LUT) quantization. This paper describes FLUTE, a flexible lookup table engine for LUT-quantized LLMs, which uses offline restructuring of the quantized weight matrix to minimize bit manipulations associated with unpacking, and vectorization and duplication of the lookup table to mitigate shared memory bandwidth constraints. At batch sizes < 32 and quantization group size of 128 (typical in LLM inference), the FLUTE kernel can be 2-4x faster than existing GEMM kernels. As an application of FLUTE, we explore a simple extension to lookup table-based NormalFloat quantization and apply it to quantize LLaMA3 to various configurations, obtaining competitive quantization performance against strong baselines while obtaining an end-to-end throughput increase of 1.5 to 2 times.

replace A Comprehensive Survey on Kolmogorov Arnold Networks (KAN)

Authors: Yuntian Hou, Di Zhang

Abstract: Through this comprehensive survey of Kolmogorov-Arnold Networks(KAN), we have gained a thorough understanding of its theoretical foundation, architectural design, application scenarios, and current research progress. KAN, with its unique architecture and flexible activation functions, excels in handling complex data patterns and nonlinear relationships, demonstrating wide-ranging application potential. While challenges remain, KAN is poised to pave the way for innovative solutions in various fields, potentially revolutionizing how we approach complex computational problems.

replace STAMP: Outlier-Aware Test-Time Adaptation with Stable Memory Replay

Authors: Yongcan Yu, Lijun Sheng, Ran He, Jian Liang

Abstract: Test-time adaptation (TTA) aims to address the distribution shift between the training and test data with only unlabeled data at test time. Existing TTA methods often focus on improving recognition performance specifically for test data associated with classes in the training set. However, during the open-world inference process, there are inevitably test data instances from unknown classes, commonly referred to as outliers. This paper pays attention to the problem that conducts both sample recognition and outlier rejection during inference while outliers exist. To address this problem, we propose a new approach called STAble Memory rePlay (STAMP), which performs optimization over a stable memory bank instead of the risky mini-batch. In particular, the memory bank is dynamically updated by selecting low-entropy and label-consistent samples in a class-balanced manner. In addition, we develop a self-weighted entropy minimization strategy that assigns higher weight to low-entropy samples. Extensive results demonstrate that STAMP outperforms existing TTA methods in terms of both recognition and outlier detection performance. The code is released at https://github.com/yuyongcan/STAMP.

URLs: https://github.com/yuyongcan/STAMP.

replace Exploring Cross-model Neuronal Correlations in the Context of Predicting Model Performance and Generalizability

Authors: Haniyeh Ehsani Oskouie, Lionel Levine, Majid Sarrafzadeh

Abstract: As Artificial Intelligence (AI) models are increasingly integrated into critical systems, the need for a robust framework to establish the trustworthiness of AI is increasingly paramount. While collaborative efforts have established conceptual foundations for such a framework, there remains a significant gap in developing concrete, technically robust methods for assessing AI model quality and performance. A critical drawback in the traditional methods for assessing the validity and generalizability of models is their dependence on internal developer datasets, rendering it challenging to independently assess and verify their performance claims. This paper introduces a novel approach for assessing a newly trained model's performance based on another known model by calculating correlation between neural networks. The proposed method evaluates correlations by determining if, for each neuron in one network, there exists a neuron in the other network that produces similar output. This approach has implications for memory efficiency, allowing for the use of smaller networks when high correlation exists between networks of different sizes. Additionally, the method provides insights into robustness, suggesting that if two highly correlated networks are compared and one demonstrates robustness when operating in production environments, the other is likely to exhibit similar robustness. This contribution advances the technical toolkit for responsible AI, supporting more comprehensive and nuanced evaluations of AI models to ensure their safe and effective deployment. Code is available at https://github.com/aheldis/Cross-model-correlation.git.

URLs: https://github.com/aheldis/Cross-model-correlation.git.

replace NoRA: Nested Low-Rank Adaptation for Efficient Fine-Tuning Large Models

Authors: Cheng Lin, Lujun Li, Dezhi Li, Jie Zou, Wei Xue, Yike Guo

Abstract: In this paper, we introduce Nested Low-Rank Adaptation (NoRA), a novel approach to parameter-efficient fine-tuning that extends the capabilities of Low-Rank Adaptation (LoRA) techniques. Vanilla LoRA overlooks pre-trained weight inheritance and still requires fine-tuning numerous parameters. To addresses these issues, our NoRA adopts a dual-layer nested structure with Singular Value Decomposition (SVD), effectively leveraging original matrix knowledge while reducing tunable parameters. Specifically, NoRA freezes the outer LoRA weights and utilizes an inner LoRA design, providing enhanced control over model optimization. This approach allows the model to more precisely adapt to specific tasks while maintaining a compact parameter space. By freezing outer LoRA weights and using an inner LoRA design, NoRA enables precise task adaptation with a compact parameter space. Evaluations on tasks including commonsense reasoning with large language models, fine-tuning vision-language models, and subject-driven generation demonstrate NoRA's superiority over LoRA and its variants. Code will be released upon acceptance.

replace Do Neural Scaling Laws Exist on Graph Self-Supervised Learning?

Authors: Qian Ma, Haitao Mao, Jingzhe Liu, Zhehua Zhang, Chunlin Feng, Yu Song, Yihan Shao, Yao Ma

Abstract: Self-supervised learning~(SSL) is essential to obtain foundation models in NLP and CV domains via effectively leveraging knowledge in large-scale unlabeled data. The reason for its success is that a suitable SSL design can help the model to follow the neural scaling law, i.e., the performance consistently improves with increasing model and dataset sizes. However, it remains a mystery whether existing SSL in the graph domain can follow the scaling behavior toward building Graph Foundation Models~(GFMs) with large-scale pre-training. In this study, we examine whether existing graph SSL techniques can follow the neural scaling behavior with the potential to serve as the essential component for GFMs. Our benchmark includes comprehensive SSL technique implementations with analysis conducted on both the conventional SSL setting and many new settings adopted in other domains. Surprisingly, despite the SSL loss continuously decreasing, no existing graph SSL techniques follow the neural scaling behavior on the downstream performance. The model performance only merely fluctuates on different data scales and model scales. Instead of the scales, the key factors influencing the performance are the choices of model architecture and pretext task design. This paper examines existing SSL techniques for the feasibility of Graph SSL techniques in developing GFMs and opens a new direction for graph SSL design with the new evaluation prototype. Our code implementation is available online to ease reproducibility on https://github.com/GraphSSLScaling/GraphSSLScaling.

URLs: https://github.com/GraphSSLScaling/GraphSSLScaling.

replace ALIAS: DAG Learning with Efficient Unconstrained Policies

Authors: Bao Duong, Hung Le, Thin Nguyen

Abstract: Recently, reinforcement learning (RL) has proved a promising alternative for conventional local heuristics in score-based approaches to learning directed acyclic causal graphs (DAGs) from observational data. However, the intricate acyclicity constraint still challenges the efficient exploration of the vast space of DAGs in existing methods. In this study, we introduce ALIAS (reinforced dAg Learning wIthout Acyclicity conStraints), a novel approach to causal discovery powered by the RL machinery. Our method features an efficient policy for generating DAGs in just a single step with an optimal quadratic complexity, fueled by a novel parametrization of DAGs that directly translates a continuous space to the space of all DAGs, bypassing the need for explicitly enforcing acyclicity constraints. This approach enables us to navigate the search space more effectively by utilizing policy gradient methods and established scoring functions. In addition, we provide compelling empirical evidence for the strong performance of ALIAS in comparison with state-of-the-arts in causal discovery over increasingly difficult experiment conditions on both synthetic and real datasets.

replace Data Augmentation for Continual RL via Adversarial Gradient Episodic Memory

Authors: Sihao Wu, Xingyu Zhao, Xiaowei Huang

Abstract: Data efficiency of learning, which plays a key role in the Reinforcement Learning (RL) training process, becomes even more important in continual RL with sequential environments. In continual RL, the learner interacts with non-stationary, sequential tasks and is required to learn new tasks without forgetting previous knowledge. However, there is little work on implementing data augmentation for continual RL. In this paper, we investigate the efficacy of data augmentation for continual RL. Specifically, we provide benchmarking data augmentations for continual RL, by (1) summarising existing data augmentation methods and (2) including a new augmentation method for continual RL: Adversarial Augmentation with Gradient Episodic Memory (Adv-GEM). Extensive experiments show that data augmentations, such as random amplitude scaling, state-switch, mixup, adversarial augmentation, and Adv-GEM, can improve existing continual RL algorithms in terms of their average performance, catastrophic forgetting, and forward transfer, on robot control tasks. All data augmentation methods are implemented as plug-in modules for trivial integration into continual RL methods.

replace Time Series Analysis for Education: Methods, Applications, and Future Directions

Authors: Shengzhong Mao, Chaoli Zhang, Yichi Song, Jindong Wang, Xiao-Jun Zeng, Zenglin Xu, Qingsong Wen

Abstract: Recent advancements in the collection and analysis of sequential educational data have brought time series analysis to a pivotal position in educational research, highlighting its essential role in facilitating data-driven decision-making. However, there is a lack of comprehensive summaries that consolidate these advancements. To the best of our knowledge, this paper is the first to provide a comprehensive review of time series analysis techniques specifically within the educational context. We begin by exploring the landscape of educational data analytics, categorizing various data sources and types relevant to education. We then review four prominent time series methods-forecasting, classification, clustering, and anomaly detection-illustrating their specific application points in educational settings. Subsequently, we present a range of educational scenarios and applications, focusing on how these methods are employed to address diverse educational tasks, which highlights the practical integration of multiple time series methods to solve complex educational problems. Finally, we conclude with a discussion on future directions, including personalized learning analytics, multimodal data fusion, and the role of large language models (LLMs) in educational time series. The contributions of this paper include a detailed taxonomy of educational data, a synthesis of time series techniques with specific educational applications, and a forward-looking perspective on emerging trends and future research opportunities in educational analysis. The related papers and resources are available and regularly updated at the project page.

replace Improving Water Quality Time-Series Prediction in Hong Kong using Sentinel-2 MSI Data and Google Earth Engine Cloud Computing

Authors: Rohin Sood, Kevin Zhu

Abstract: Effective water quality monitoring in coastal regions is crucial due to the progressive deterioration caused by pollution and human activities. To address this, this study develops time-series models to predict chlorophyll-a (Chl-a), suspended solids (SS), and turbidity using Sentinel-2 satellite data and Google Earth Engine (GEE) in the coastal regions of Hong Kong. Leveraging Long Short-Term Memory (LSTM) Recurrent Neural Networks, the study incorporates extensive temporal datasets to enhance prediction accuracy. The models utilize spectral data from Sentinel-2, focusing on optically active components, and demonstrate that selected variables closely align with the spectral characteristics of Chl-a and SS. The results indicate improved predictive performance over previous methods, highlighting the potential for remote sensing technology in continuous and comprehensive water quality assessment.

replace An Item Response Theory-based R Module for Algorithm Portfolio Analysis

Authors: Brodie Oldfield, Sevvandi Kandanaarachchi, Ziqi Xu, Mario Andr\'es Mu\~noz

Abstract: Experimental evaluation is crucial in AI research, especially for assessing algorithms across diverse tasks. Many studies often evaluate a limited set of algorithms, failing to fully understand their strengths and weaknesses within a comprehensive portfolio. This paper introduces an Item Response Theory (IRT) based analysis tool for algorithm portfolio evaluation called AIRT-Module. Traditionally used in educational psychometrics, IRT models test question difficulty and student ability using responses to test questions. Adapting IRT to algorithm evaluation, the AIRT-Module contains a Shiny web application and the R package airt. AIRT-Module uses algorithm performance measures to compute anomalousness, consistency, and difficulty limits for an algorithm and the difficulty of test instances. The strengths and weaknesses of algorithms are visualised using the difficulty spectrum of the test instances. AIRT-Module offers a detailed understanding of algorithm capabilities across varied test instances, thus enhancing comprehensive AI method assessment. It is available at https://sevvandi.shinyapps.io/AIRT/ .

URLs: https://sevvandi.shinyapps.io/AIRT/

replace A domain decomposition-based autoregressive deep learning model for unsteady and nonlinear partial differential equations

Authors: Sheel Nidhan, Haoliang Jiang, Lalit Ghule, Clancy Umphrey, Rishikesh Ranade, Jay Pathak

Abstract: In this paper, we propose a domain-decomposition-based deep learning (DL) framework, named transient-CoMLSim, for accurately modeling unsteady and nonlinear partial differential equations (PDEs). The framework consists of two key components: (a) a convolutional neural network (CNN)-based autoencoder architecture and (b) an autoregressive model composed of fully connected layers. Unlike existing state-of-the-art methods that operate on the entire computational domain, our CNN-based autoencoder computes a lower-dimensional basis for solution and condition fields represented on subdomains. Timestepping is performed entirely in the latent space, generating embeddings of the solution variables from the time history of embeddings of solution and condition variables. This approach not only reduces computational complexity but also enhances scalability, making it well-suited for large-scale simulations. Furthermore, to improve the stability of our rollouts, we employ a curriculum learning (CL) approach during the training of the autoregressive model. The domain-decomposition strategy enables scaling to out-of-distribution domain sizes while maintaining the accuracy of predictions -- a feature not easily integrated into popular DL-based approaches for physics simulations. We benchmark our model against two widely-used DL architectures, Fourier Neural Operator (FNO) and U-Net, and demonstrate that our framework outperforms them in terms of accuracy, extrapolation to unseen timesteps, and stability for a wide range of use cases.

replace-cross Does Audio Deepfake Detection Generalize?

Authors: Nicolas M. M\"uller, Pavel Czempin, Franziska Dieckmann, Adam Froghyar, Konstantin B\"ottinger

Abstract: Current text-to-speech algorithms produce realistic fakes of human voices, making deepfake detection a much-needed area of research. While researchers have presented various techniques for detecting audio spoofs, it is often unclear exactly why these architectures are successful: Preprocessing steps, hyperparameter settings, and the degree of fine-tuning are not consistent across related work. Which factors contribute to success, and which are accidental? In this work, we address this problem: We systematize audio spoofing detection by re-implementing and uniformly evaluating architectures from related work. We identify overarching features for successful audio deepfake detection, such as using cqtspec or logspec features instead of melspec features, which improves performance by 37% EER on average, all other factors constant. Additionally, we evaluate generalization capabilities: We collect and publish a new dataset consisting of 37.9 hours of found audio recordings of celebrities and politicians, of which 17.2 hours are deepfakes. We find that related work performs poorly on such real-world data (performance degradation of up to one thousand percent). This may suggest that the community has tailored its solutions too closely to the prevailing ASVSpoof benchmark and that deepfakes are much harder to detect outside the lab than previously thought.

replace-cross MPC-Pipe: an Efficient Pipeline Scheme for Secure Multi-party Machine Learning Inference

Authors: Yongqin Wang, Rachit Rajat, Murali Annavaram

Abstract: Multi-party computing (MPC) has been gaining popularity as a secure computing model over the past few years. However, prior works have demonstrated that MPC protocols still pay substantial performance penalties compared to plaintext, particularly when applied to ML algorithms. The overhead is due to added computation and communication costs. Prior studies, as well as our own analysis, found that most MPC protocols today sequentially perform communication and computation. The participating parties must compute on their shares first and then perform data communication to allow the distribution of new secret shares before proceeding to the next computation step. In this work, we show that serialization is unnecessary, particularly in the context of ML computations (both in Convolutional neural networks and in Transformer-based models). We demonstrate that it is possible to carefully orchestrate the computation and communication steps to overlap. We propose MPC-Pipe, an efficient MPC system for both training and inference of ML workloads, which pipelines computations and communications in an MPC protocol during the online phase. MPC-Pipe proposes three pipeline schemes to optimize the online phase of ML in the semi-honest majority adversary setting. We implement MPC-Pipe by augmenting a modified version of CrypTen, which separates online and offline phases. We evaluate the end-to-end system performance benefits of the online phase of MPC using deep neural networks (VGG16, ResNet50) and Transformers using different network settings. We show that MPC-Pipe can improve the throughput and latency of ML workloads.

replace-cross Deep R Programming

Authors: Marek Gagolewski

Abstract: Deep R Programming is a comprehensive and in-depth introductory course on one of the most popular languages for data science. It equips ambitious students, professionals, and researchers with the knowledge and skills to become independent users of this potent environment so that they can tackle any problem related to data wrangling and analytics, numerical computing, statistics, and machine learning. This textbook is a non-profit project. Its online and PDF versions are freely available at .

URLs: https://deepr.gagolewski.com/>.

replace-cross Variational Autoencoding of Dental Point Clouds

Authors: Johan Ziruo Ye, Thomas {\O}rkild, Peter Lempel S{\o}ndergaard, S{\o}ren Hauberg

Abstract: Digital dentistry has made significant advancements, yet numerous challenges remain. This paper introduces the FDI 16 dataset, an extensive collection of tooth meshes and point clouds. Additionally, we present a novel approach: Variational FoldingNet (VF-Net), a fully probabilistic variational autoencoder for point clouds. Notably, prior latent variable models for point clouds lack a one-to-one correspondence between input and output points. Instead, they rely on optimizing Chamfer distances, a metric that lacks a normalized distributional counterpart, rendering it unsuitable for probabilistic modeling. We replace the explicit minimization of Chamfer distances with a suitable encoder, increasing computational efficiency while simplifying the probabilistic extension. This allows for straightforward application in various tasks, including mesh generation, shape completion, and representation learning. Empirically, we provide evidence of lower reconstruction error in dental reconstruction and interpolation, showcasing state-of-the-art performance in dental sample generation while identifying valuable latent representations

replace-cross Causal structure learning with momentum: Sampling distributions over Markov Equivalence Classes of DAGs

Authors: Moritz Schauer, Marcel Wien\"obst

Abstract: In the context of inferring a Bayesian network structure (directed acyclic graph, DAG for short), we devise a non-reversible continuous time Markov chain, the ``Causal Zig-Zag sampler'', that targets a probability distribution over classes of observationally equivalent (Markov equivalent) DAGs. The classes are represented as completed partially directed acyclic graphs (CPDAGs). The non-reversible Markov chain relies on the operators used in Chickering's Greedy Equivalence Search (GES) and is endowed with a momentum variable, which improves mixing significantly as we show empirically. The possible target distributions include posterior distributions based on a prior over DAGs and a Markov equivalent likelihood. We offer an efficient implementation wherein we develop new algorithms for listing, counting, uniformly sampling, and applying possible moves of the GES operators, all of which significantly improve upon the state-of-the-art run-time.

replace-cross Estimating optical vegetation indices and biophysical variables for temperate forests with Sentinel-1 SAR data using machine learning techniques: A case study for Czechia

Authors: Daniel Paluba, Bertrand Le Saux, P\v{r}emysl Stych

Abstract: Current optical vegetation indices (VIs) for monitoring forest ecosystems are well established and widely used in various applications, but can be limited by atmospheric effects such as clouds. In contrast, synthetic aperture radar (SAR) data can offer insightful and systematic forest monitoring with complete time series (TS) due to signal penetration through clouds and day and night image acquisitions. This study aims to address the limitations of optical satellite data by using SAR data as an alternative for estimating optical VIs for forests through machine learning (ML). While this approach is less direct and likely only feasible through the power of ML, it raises the scientific question of whether enough relevant information is contained in the SAR signal to accurately estimate VIs. This work covers the estimation of TS of four VIs (LAI, FAPAR, EVI and NDVI) using multitemporal Sentinel-1 SAR and ancillary data. The study focused on both healthy and disturbed temperate forest areas in Czechia for the year 2021, while ground truth labels generated from Sentinel-2 multispectral data. This was enabled by creating a paired multi-modal TS dataset in Google Earth Engine (GEE), including temporally and spatially aligned Sentinel-1, Sentinel-2, DEM, weather and land cover datasets. The inclusion of DEM-derived auxiliary features and additional meteorological information, further improved the results. In the comparison of ML models, the traditional ML algorithms, RFR and XGBoost slightly outperformed the AutoML approach, auto-sklearn, for all VIs, achieving high accuracies ($R^2$ between 70-86%) and low errors (0.055-0.29 of MAE). In general, up to 240 measurements per year and a spatial resolution of 20 m can be achieved using estimated SAR-based VIs with high accuracy. A great advantage of the SAR-based VI is the ability to detect abrupt forest changes with sub-weekly temporal accuracy.

replace-cross Graph GOSPA metric: a metric to measure the discrepancy between graphs of different sizes

Authors: Jinhao Gu, \'Angel F. Garc\'ia-Fern\'andez, Robert E. Firth, Lennart Svensson

Abstract: This paper proposes a metric to measure the dissimilarity between graphs that may have a different number of nodes. The proposed metric extends the generalised optimal subpattern assignment (GOSPA) metric, which is a metric for sets, to graphs. The proposed graph GOSPA metric includes costs associated with node attribute errors for properly assigned nodes, missed and false nodes and edge mismatches between graphs. The computation of this metric is based on finding the optimal assignments between nodes in the two graphs, with the possibility of leaving some of the nodes unassigned. We also propose a lower bound for the metric, which is also a metric for graphs and is computable in polynomial time using linear programming. The metric is first derived for undirected unweighted graphs and it is then extended to directed and weighted graphs. The properties of the metric are demonstrated via simulated and empirical datasets.

replace-cross LQ-LoRA: Low-rank Plus Quantized Matrix Decomposition for Efficient Language Model Finetuning

Authors: Han Guo, Philip Greengard, Eric P. Xing, Yoon Kim

Abstract: We propose a simple approach for memory-efficient adaptation of pretrained language models. Our approach uses an iterative algorithm to decompose each pretrained matrix into a high-precision low-rank component and a memory-efficient quantized component. During finetuning, the quantized component remains fixed and only the low-rank component is updated. We present an integer linear programming formulation of the quantization component which enables dynamic configuration of quantization parameters (e.g., bit-width, block size) for each matrix given an overall target memory budget. We further explore a data-aware version of the algorithm which uses an approximation of the Fisher information matrix to weight the reconstruction objective during matrix decomposition. Experiments on finetuning RoBERTa and LLaMA-2 (7B and 70B) demonstrate that our low-rank plus quantized matrix decomposition approach (LQ-LoRA) outperforms strong QLoRA and GPTQ-LoRA baselines and enables aggressive quantization to sub-3 bits with only minor performance degradations. When finetuned on a language modeling calibration dataset, LQ-LoRA can also be used for model compression; in this setting our 2.75-bit LLaMA-2-70B model (which has 2.85 bits on average when including the low-rank components and requires 27GB of GPU memory) performs respectably compared to the 16-bit baseline.

replace-cross Creating Temporally Correlated High-Resolution Profiles of Load Injection Using Constrained Generative Adversarial Networks

Authors: Hritik Gopal Shah, Behrouz Azimian, Anamitra Pal

Abstract: Traditional smart meters, which measure energy usage every 15 minutes or more and report it at least a few hours later, lack the granularity needed for real-time decision-making. To address this practical problem, we introduce a new method using generative adversarial networks (GAN) that enforces temporal consistency on its high-resolution outputs via hard inequality constraints using convex optimization. A unique feature of our GAN model is that it is trained solely on slow timescale aggregated historical energy data obtained from smart meters. The results demonstrate that the model can successfully create minute-by-minute temporally correlated profiles of power usage from 15-minute interval average power consumption information. This innovative approach, emphasizing inter-neuron constraints, offers a promising avenue for improved high-speed state estimation in distribution systems and enhances the applicability of data-driven solutions for monitoring and subsequently controlling such systems.

replace-cross Conditional Stochastic Interpolation for Generative Learning

Authors: Ding Huang, Jian Huang, Ting Li, Guohao Shen

Abstract: We propose a conditional stochastic interpolation (CSI) method for learning conditional distributions. CSI is based on estimating probability flow equations or stochastic differential equations that transport a reference distribution to the target conditional distribution. This is achieved by first learning the conditional drift and score functions based on CSI, which are then used to construct a deterministic process governed by an ordinary differential equation or a diffusion process for conditional sampling. In our proposed approach, we incorporate an adaptive diffusion term to address the instability issues arising in the diffusion process. We derive explicit expressions of the conditional drift and score functions in terms of conditional expectations, which naturally lead to an nonparametric regression approach to estimating these functions. Furthermore, we establish nonasymptotic error bounds for learning the target conditional distribution. We illustrate the application of CSI on image generation using a benchmark image dataset.

replace-cross Frustrated Random Walks: A Fast Method to Compute Node Distances on Hypergraphs

Authors: Enzhi Li, Scott Nickleach, Bilal Fadlallah

Abstract: A hypergraph is a generalization of a graph that arises naturally when attribute-sharing among entities is considered. Compared to graphs, hypergraphs have the distinct advantage that they contain explicit communities and are more convenient to manipulate. An open problem in hypergraph research is how to accurately and efficiently calculate node distances on hypergraphs. Estimating node distances enables us to find a node's nearest neighbors, which has important applications in such areas as recommender system, targeted advertising, etc. In this paper, we propose using expected hitting times of random walks to compute hypergraph node distances. We note that simple random walks (SRW) cannot accurately compute node distances on highly complex real-world hypergraphs, which motivates us to introduce frustrated random walks (FRW) for this task. We further benchmark our method against DeepWalk, and show that while the latter can achieve comparable results, FRW has a distinct computational advantage in cases where the number of targets is fairly small. For such cases, we show that FRW runs in significantly shorter time than DeepWalk. Finally, we analyze the time complexity of our method, and show that for large and sparse hypergraphs, the complexity is approximately linear, rendering it superior to the DeepWalk alternative.

replace-cross A Comprehensive Survey of Cross-Domain Policy Transfer for Embodied Agents

Authors: Haoyi Niu, Jianming Hu, Guyue Zhou, Xianyuan Zhan

Abstract: The burgeoning fields of robot learning and embodied AI have triggered an increasing demand for large quantities of data. However, collecting sufficient unbiased data from the target domain remains a challenge due to costly data collection processes and stringent safety requirements. Consequently, researchers often resort to data from easily accessible source domains, such as simulation and laboratory environments, for cost-effective data acquisition and rapid model iteration. Nevertheless, the environments and embodiments of these source domains can be quite different from their target domain counterparts, underscoring the need for effective cross-domain policy transfer approaches. In this paper, we conduct a systematic review of existing cross-domain policy transfer methods. Through a nuanced categorization of domain gaps, we encapsulate the overarching insights and design considerations of each problem setting. We also provide a high-level discussion about the key methodologies used in cross-domain policy transfer problems. Lastly, we summarize the open challenges that lie beyond the capabilities of current paradigms and discuss potential future directions in this field.

replace-cross From Variability to Stability: Advancing RecSys Benchmarking Practices

Authors: Valeriy Shevchenko, Nikita Belousov, Alexey Vasilev, Vladimir Zholobov, Artyom Sosedka, Natalia Semenova, Anna Volodkevich, Andrey Savchenko, Alexey Zaytsev

Abstract: In the rapidly evolving domain of Recommender Systems (RecSys), new algorithms frequently claim state-of-the-art performance based on evaluations over a limited set of arbitrarily selected datasets. However, this approach may fail to holistically reflect their effectiveness due to the significant impact of dataset characteristics on algorithm performance. Addressing this deficiency, this paper introduces a novel benchmarking methodology to facilitate a fair and robust comparison of RecSys algorithms, thereby advancing evaluation practices. By utilizing a diverse set of $30$ open datasets, including two introduced in this work, and evaluating $11$ collaborative filtering algorithms across $9$ metrics, we critically examine the influence of dataset characteristics on algorithm performance. We further investigate the feasibility of aggregating outcomes from multiple datasets into a unified ranking. Through rigorous experimental analysis, we validate the reliability of our methodology under the variability of datasets, offering a benchmarking strategy that balances quality and computational demands. This methodology enables a fair yet effective means of evaluating RecSys algorithms, providing valuable guidance for future research endeavors.

replace-cross Data-Driven Dynamic Friction Models based on Recurrent Neural Networks

Authors: Joaquin Garcia-Suarez

Abstract: In this letter, it is demonstrated that Recurrent Neural Networks (RNNs) based on Gated Recurrent Unit (GRU) architecture, possess the capability to learn the complex dynamics of rate-and-state friction (RSF) laws from synthetic data. The data employed for training the network is generated through the application of traditional RSF equations coupled with either the aging law or the slip law for state evolution. A novel aspect of this approach is the formulation of a loss function that explicitly accounts for the direct effect by means of automatic differentiation. It is found that the GRU-based RNNs effectively learns to predict changes in the friction coefficient resulting from velocity jumps (with and without noise in the target data), thereby showcasing the potential of machine learning models in capturing and simulating the physics of frictional processes. Current limitations and challenges are discussed.

replace-cross RAVEL: Evaluating Interpretability Methods on Disentangling Language Model Representations

Authors: Jing Huang, Zhengxuan Wu, Christopher Potts, Mor Geva, Atticus Geiger

Abstract: Individual neurons participate in the representation of multiple high-level concepts. To what extent can different interpretability methods successfully disentangle these roles? To help address this question, we introduce RAVEL (Resolving Attribute-Value Entanglements in Language Models), a dataset that enables tightly controlled, quantitative comparisons between a variety of existing interpretability methods. We use the resulting conceptual framework to define the new method of Multi-task Distributed Alignment Search (MDAS), which allows us to find distributed representations satisfying multiple causal criteria. With Llama2-7B as the target language model, MDAS achieves state-of-the-art results on RAVEL, demonstrating the importance of going beyond neuron-level analyses to identify features distributed across activations. We release our benchmark at https://github.com/explanare/ravel.

URLs: https://github.com/explanare/ravel.

replace-cross Structured Deep Neural Networks-Based Backstepping Trajectory Tracking Control for Lagrangian Systems

Authors: Jiajun Qian, Liang Xu, Xiaoqiang Ren, Xiaofan Wang

Abstract: Deep neural networks (DNN) are increasingly being used to learn controllers due to their excellent approximation capabilities. However, their black-box nature poses significant challenges to closed-loop stability guarantees and performance analysis. In this paper, we introduce a structured DNN-based controller for the trajectory tracking control of Lagrangian systems using backing techniques. By properly designing neural network structures, the proposed controller can ensure closed-loop stability for any compatible neural network parameters. In addition, improved control performance can be achieved by further optimizing neural network parameters. Besides, we provide explicit upper bounds on tracking errors in terms of controller parameters, which allows us to achieve the desired tracking performance by properly selecting the controller parameters. Furthermore, when system models are unknown, we propose an improved Lagrangian neural network (LNN) structure to learn the system dynamics and design the controller. We show that in the presence of model approximation errors and external disturbances, the closed-loop stability and tracking control performance can still be guaranteed. The effectiveness of the proposed approach is demonstrated through simulations.

replace-cross Learning to Decode Collaboratively with Multiple Language Models

Authors: Shannon Zejiang Shen, Hunter Lang, Bailin Wang, Yoon Kim, David Sontag

Abstract: We propose a method to teach multiple large language models (LLM) to collaborate by interleaving their generations at the token level. We model the decision of which LLM generates the next token as a latent variable. By optimizing the marginal likelihood of a training set under our latent variable model, the base LLM automatically learns when to generate itself and when to call on one of the ``assistant'' language models to generate, all without direct supervision. Token-level collaboration during decoding allows for a fusion of each model's expertise in a manner tailored to the specific task at hand. Our collaborative decoding is especially useful in cross-domain settings where a generalist base LLM learns to invoke domain expert models. On instruction-following, domain-specific QA, and reasoning tasks, we show that the performance of the joint system exceeds that of the individual models. Through qualitative analysis of the learned latent decisions, we show models trained with our method exhibit several interesting collaboration patterns, e.g., template-filling. Our code is available at https://github.com/clinicalml/co-llm.

URLs: https://github.com/clinicalml/co-llm.

replace-cross Riemannian Flow Matching Policy for Robot Motion Learning

Authors: Max Braun, No\'emie Jaquier, Leonel Rozo, Tamim Asfour

Abstract: We introduce Riemannian Flow Matching Policies (RFMP), a novel model for learning and synthesizing robot visuomotor policies. RFMP leverages the efficient training and inference capabilities of flow matching methods. By design, RFMP inherits the strengths of flow matching: the ability to encode high-dimensional multimodal distributions, commonly encountered in robotic tasks, and a very simple and fast inference process. We demonstrate the applicability of RFMP to both state-based and vision-conditioned robot motion policies. Notably, as the robot state resides on a Riemannian manifold, RFMP inherently incorporates geometric awareness, which is crucial for realistic robotic tasks. To evaluate RFMP, we conduct two proof-of-concept experiments, comparing its performance against Diffusion Policies. Although both approaches successfully learn the considered tasks, our results show that RFMP provides smoother action trajectories with significantly lower inference times.

replace-cross Compressed Federated Reinforcement Learning with a Generative Model

Authors: Ali Beikmohammadi, Sarit Khirirat, Sindri Magn\'usson

Abstract: Reinforcement learning has recently gained unprecedented popularity, yet it still grapples with sample inefficiency. Addressing this challenge, federated reinforcement learning (FedRL) has emerged, wherein agents collaboratively learn a single policy by aggregating local estimations. However, this aggregation step incurs significant communication costs. In this paper, we propose CompFedRL, a communication-efficient FedRL approach incorporating both \textit{periodic aggregation} and (direct/error-feedback) compression mechanisms. Specifically, we consider compressed federated $Q$-learning with a generative model setup, where a central server learns an optimal $Q$-function by periodically aggregating compressed $Q$-estimates from local agents. For the first time, we characterize the impact of these two mechanisms (which have remained elusive) by providing a finite-time analysis of our algorithm, demonstrating strong convergence behaviors when utilizing either direct or error-feedback compression. Our bounds indicate improved solution accuracy concerning the number of agents and other federated hyperparameters while simultaneously reducing communication costs. To corroborate our theory, we also conduct in-depth numerical experiments to verify our findings, considering Top-$K$ and Sparsified-$K$ sparsification operators.

replace-cross Attack on Scene Flow using Point Clouds

Authors: Haniyeh Ehsani Oskouie, Mohammad-Shahram Moin, Shohreh Kasaei

Abstract: Deep neural networks have made significant advancements in accurately estimating scene flow using point clouds, which is vital for many applications like video analysis, action recognition, and navigation. The robustness of these techniques, however, remains a concern, particularly in the face of adversarial attacks that have been proven to deceive state-of-the-art deep neural networks in many domains. Surprisingly, the robustness of scene flow networks against such attacks has not been thoroughly investigated. To address this problem, the proposed approach aims to bridge this gap by introducing adversarial white-box attacks specifically tailored for scene flow networks. Experimental results show that the generated adversarial examples obtain up to 33.7 relative degradation in average end-point error on the KITTI and FlyingThings3D datasets. The study also reveals the significant impact that attacks targeting point clouds in only one dimension or color channel have on average end-point error. Analyzing the success and failure of these attacks on the scene flow networks and their 2D optical flow network variants shows a higher vulnerability for the optical flow networks. Code is available at https://github.com/aheldis/Attack-on-Scene-Flow-using-Point-Clouds.git.

URLs: https://github.com/aheldis/Attack-on-Scene-Flow-using-Point-Clouds.git.

replace-cross Baseline Results for Selected Nonlinear System Identification Benchmarks

Authors: Max D. Champneys, Gerben I. Beintema, Roland T\'oth, Maarten Schoukens, Timothy J. Rogers

Abstract: Nonlinear system identification remains an important open challenge across research and academia. Large numbers of novel approaches are seen published each year, each presenting improvements or extensions to existing methods. It is natural, therefore, to consider how one might choose between these competing models. Benchmark datasets provide one clear way to approach this question. However, to make meaningful inference based on benchmark performance it is important to understand how well a new method performs comparatively to results available with well-established methods. This paper presents a set of ten baseline techniques and their relative performances on five popular benchmarks. The aim of this contribution is to stimulate thought and discussion regarding objective comparison of identification methodologies.

replace-cross Prompt Exploration with Prompt Regression

Authors: Michael Feffer, Ronald Xu, Yuekai Sun, Mikhail Yurochkin

Abstract: In the advent of democratized usage of large language models (LLMs), there is a growing desire to systematize LLM prompt creation and selection processes beyond iterative trial-and-error. Prior works majorly focus on searching the space of prompts without accounting for relations between prompt variations. Here we propose a framework, Prompt Exploration with Prompt Regression (PEPR), to predict the effect of prompt combinations given results for individual prompt elements as well as a simple method to select an effective prompt for a given use-case. We evaluate our approach with open-source LLMs of different sizes on several different tasks.

replace-cross Local Causal Discovery for Structural Evidence of Direct Discrimination

Authors: Jacqueline Maasch, Kyra Gan, Violet Chen, Agni Orfanoudaki, Nil-Jana Akpinar, Fei Wang

Abstract: Identifying the causal pathways of unfairness is a critical objective in improving policy design and algorithmic decision-making. Prior work in causal fairness analysis often requires knowledge of the causal graph, hindering practical applications in complex or low-knowledge domains. Moreover, global discovery methods that learn causal structure from data can result in unstable performance with finite samples, potentially leading to contradictory fairness conclusions. To mitigate these issues, we introduce local discovery for direct discrimination (LD3): a method that uncovers structural evidence of direct discrimination by identifying the causal parents of an outcome variable. LD3 performs a linear number of conditional independence tests relative to variable set size, and allows for latent confounding under the sufficient condition that no parent of the outcome is latent. We show that LD3 returns a valid adjustment set (VAS) under a new graphical criterion for the weighted controlled direct effect, a qualitative indicator of direct discrimination. LD3 limits unnecessary adjustment, providing interpretable VAS for assessing unfairness. We use LD3 to analyze causal fairness in two complex decision systems: criminal recidivism prediction and liver transplant allocation. LD3 was more time-efficient and returned more plausible results on real-world data than baselines, which took 46x to 5870x longer to execute.

replace-cross Research on the Spatial Data Intelligent Foundation Model

Authors: Shaohua Wang (State Key Laboratory of Remote Sensing Science, Aerospace Information Research Institute, Chinese Academy of Sciences), Xing Xie (Microsoft Research Asia), Yong Li (Tsinghua University), Danhuai Guo (Beijing University of Chemical Technology), Zhi Cai (Beijing University of Technology), Yu Liu (Peking University), Yang Yue (Shenzhen University), Xiao Pan (Shijiazhuang Railway University), Feng Lu (Institute of Geographic Sciences and Natural Resources Research, Chinese Academy of Sciences), Huayi Wu (Wuhan University), Zhipeng Gui (Wuhan University), Zhiming Ding (Research Institute of Software, Chinese Academy of Sciences), Bolong Zheng (Huazhong University of Science and Technology), Fuzheng Zhang (Fast Natural Language Processing Center and Audio Center), Jingyuan Wang (Beihang University), Zhengchao Chen (State Key Laboratory of Remote Sensing Science, Aerospace Information Research Institute, Chinese Academy of Sciences), Hao Lu (SuperMap Software Co. Ltd), Jiayi Li (Wuhan University), Peng Yue (Wuhan University), Wenhao Yu (China University of Geosciences), Yao Yao (China University of Geosciences), Leilei Sun (Beihang University), Yong Zhang (Beijing University of Technology), Longbiao Chen (Xiamen University), Xiaoping Du (Key Laboratory of Digital Geography, Chinese Academy of Sciences), Xiang Li (East China Normal University), Xueying Zhang (Nanjing Normal University), Kun Qin (Wuhan University), Zhaoya Gong (Peking University), Weihua Dong (Beijing Normal University), Xiaofeng Meng (Renmin University of China)

Abstract: This report focuses on spatial data intelligent large models, delving into the principles, methods, and cutting-edge applications of these models. It provides an in-depth discussion on the definition, development history, current status, and trends of spatial data intelligent large models, as well as the challenges they face. The report systematically elucidates the key technologies of spatial data intelligent large models and their applications in urban environments, aerospace remote sensing, geography, transportation, and other scenarios. Additionally, it summarizes the latest application cases of spatial data intelligent large models in themes such as urban development, multimodal systems, remote sensing, smart transportation, and resource environments. Finally, the report concludes with an overview and outlook on the development prospects of spatial data intelligent large models.

replace-cross Quest: Query-Aware Sparsity for Efficient Long-Context LLM Inference

Authors: Jiaming Tang, Yilong Zhao, Kan Zhu, Guangxuan Xiao, Baris Kasikci, Song Han

Abstract: As the demand for long-context large language models (LLMs) increases, models with context windows of up to 128K or 1M tokens are becoming increasingly prevalent. However, long-context LLM inference is challenging since the inference speed decreases significantly as the sequence length grows. This slowdown is primarily caused by loading a large KV cache during self-attention. Previous works have shown that a small portion of critical tokens will dominate the attention outcomes. However, we observe the criticality of a token highly depends on the query. To this end, we propose Quest, a query-aware KV cache selection algorithm. Quest keeps track of the minimal and maximal Key values in KV cache pages and estimates the criticality of a given page using Query vectors. By only loading the Top-K critical KV cache pages for attention, Quest significantly speeds up self-attention without sacrificing accuracy. We show that Quest can achieve up to 2.23x self-attention speedup, which reduces inference latency by 7.03x while performing well on tasks with long dependencies with negligible accuracy loss. Code is available at http://github.com/mit-han-lab/Quest .

URLs: http://github.com/mit-han-lab/Quest

replace-cross Tractable Equilibrium Computation in Markov Games through Risk Aversion

Authors: Eric Mazumdar, Kishan Panaganti, Laixi Shi

Abstract: A significant roadblock to the development of principled multi-agent reinforcement learning is the fact that desired solution concepts like Nash equilibria may be intractable to compute. To overcome this obstacle, we take inspiration from behavioral economics and show that -- by imbuing agents with important features of human decision-making like risk aversion and bounded rationality -- a class of risk-averse quantal response equilibria (RQE) become tractable to compute in all $n$-player matrix and finite-horizon Markov games. In particular, we show that they emerge as the endpoint of no-regret learning in suitably adjusted versions of the games. Crucially, the class of computationally tractable RQE is independent of the underlying game structure and only depends on agents' degree of risk-aversion and bounded rationality. To validate the richness of this class of solution concepts we show that it captures peoples' patterns of play in a number of 2-player matrix games previously studied in experimental economics. Furthermore, we give a first analysis of the sample complexity of computing these equilibria in finite-horizon Markov games when one has access to a generative model and validate our findings on a simple multi-agent reinforcement learning benchmark.

replace-cross Dr.E Bridges Graphs with Large Language Models through Words

Authors: Zipeng Liu, Likang Wu, Ming He, Zhong Guan, Hongke Zhao, Nan Feng

Abstract: Significant efforts have been dedicated to integrating the powerful Large Language Models (LLMs) with diverse modalities, particularly focusing on the fusion of language, vision and audio data. However, the graph-structured data, which is inherently rich in structural and domain-specific knowledge, has not yet been gracefully adapted to LLMs. Existing methods either describe the graph with raw text, suffering the loss of graph structural information, or feed Graph Neural Network (GNN) embeddings into LLMs at the cost of losing explainable prompt semantics. To bridge this gap, we introduce an end-to-end modality-aligning framework for LLM-graph alignment: Dual-Residual Vector Quantized-Variational AutoEncoder, namely Dr.E. Our approach is purposefully designed to facilitate token-level alignment with LLMs, enabling an effective translation of the intrinsic `language' of graphs into comprehensible natural language. We also manage to enhance LLMs' more robust structural understanding of graphs by incorporating multiple views of the central nodes based on their surrounding nodes at various distances. Our experimental evaluations on standard graph tasks demonstrate competitive performance against other state-of-the-art (SOTA) approaches. Additionally, our framework ensures certain visual interpretability, efficiency, and robustness, marking the promising successful endeavor to achieve token-level alignment between LLMs and GNNs. Our code is available at: https://anonymous.4open.science/r/dre-817.

URLs: https://anonymous.4open.science/r/dre-817.

replace-cross BayTTA: Uncertainty-aware medical image classification with optimized test-time augmentation using Bayesian model averaging

Authors: Zeinab Sherkatghanad, Moloud Abdar, Mohammadreza Bakhtyari, Pawel Plawiak, Vladimir Makarenkov

Abstract: Test-time augmentation (TTA) is a well-known technique employed during the testing phase of computer vision tasks. It involves aggregating multiple augmented versions of input data. Combining predictions using a simple average formulation is a common and straightforward approach after performing TTA. This paper introduces a novel framework for optimizing TTA, called BayTTA (Bayesian-based TTA), which is based on Bayesian Model Averaging (BMA). First, we generate a prediction list associated with different variations of the input data created through TTA. Then, we use BMA to combine predictions weighted by the respective posterior probabilities. Such an approach allows one to take into account model uncertainty, and thus to enhance the predictive performance of the related machine learning or deep learning model. We evaluate the performance of BayTTA on various public data, including three medical image datasets comprising skin cancer, breast cancer, and chest X-ray images and two well-known gene editing datasets, CRISPOR and GUIDE-seq. Our experimental results indicate that BayTTA can be effectively integrated into state-of-the-art deep learning models used in medical image analysis as well as into some popular pre-trained CNN models such as VGG-16, MobileNetV2, DenseNet201, ResNet152V2, and InceptionRes-NetV2, leading to the enhancement in their accuracy and robustness performance. The source code of the proposed BayTTA method is freely available at: \underline {https://github.com/Z-Sherkat/BayTTA}.

URLs: https://github.com/Z-Sherkat/BayTTA

replace-cross TAPVid-3D: A Benchmark for Tracking Any Point in 3D

Authors: Skanda Koppula, Ignacio Rocco, Yi Yang, Joe Heyward, Jo\~ao Carreira, Andrew Zisserman, Gabriel Brostow, Carl Doersch

Abstract: We introduce a new benchmark, TAPVid-3D, for evaluating the task of long-range Tracking Any Point in 3D (TAP-3D). While point tracking in two dimensions (TAP) has many benchmarks measuring performance on real-world videos, such as TAPVid-DAVIS, three-dimensional point tracking has none. To this end, leveraging existing footage, we build a new benchmark for 3D point tracking featuring 4,000+ real-world videos, composed of three different data sources spanning a variety of object types, motion patterns, and indoor and outdoor environments. To measure performance on the TAP-3D task, we formulate a collection of metrics that extend the Jaccard-based metric used in TAP to handle the complexities of ambiguous depth scales across models, occlusions, and multi-track spatio-temporal smoothness. We manually verify a large sample of trajectories to ensure correct video annotations, and assess the current state of the TAP-3D task by constructing competitive baselines using existing tracking models. We anticipate this benchmark will serve as a guidepost to improve our ability to understand precise 3D motion and surface deformation from monocular video. Code for dataset download, generation, and model evaluation is available at https://tapvid3d.github.io

URLs: https://tapvid3d.github.io

replace-cross Generating $SROI^-$ Ontologies via Knowledge Graph Query Embedding Learning

Authors: Yunjie He, Daniel Hernandez, Mojtaba Nayyeri, Bo Xiong, Yuqicheng Zhu, Evgeny Kharlamov, Steffen Staab

Abstract: Query embedding approaches answer complex logical queries over incomplete knowledge graphs (KGs) by computing and operating on low-dimensional vector representations of entities, relations, and queries. However, current query embedding models heavily rely on excessively parameterized neural networks and cannot explain the knowledge learned from the graph. We propose a novel query embedding method, AConE, which explains the knowledge learned from the graph in the form of $SROI^-$ description logic axioms while being more parameter-efficient than most existing approaches. AConE associates queries to a $SROI^-$ description logic concept. Every $SROI^-$ concept is embedded as a cone in complex vector space, and each $SROI^-$ relation is embedded as a transformation that rotates and scales cones. We show theoretically that AConE can learn $SROI^-$ axioms, and defines an algebra whose operations correspond one to one to $SROI^-$ description logic concept constructs. Our empirical study on multiple query datasets shows that AConE achieves superior results over previous baselines with fewer parameters. Notably on the WN18RR dataset, AConE achieves significant improvement over baseline models. We provide comprehensive analyses showing that the capability to represent axioms positively impacts the results of query answering.

replace-cross Pareto Front Approximation for Multi-Objective Session-Based Recommender Systems

Authors: Timo Wilm, Philipp Normann, Felix Stepprath

Abstract: This work introduces MultiTRON, an approach that adapts Pareto front approximation techniques to multi-objective session-based recommender systems using a transformer neural network. Our approach optimizes trade-offs between key metrics such as click-through and conversion rates by training on sampled preference vectors. A significant advantage is that after training, a single model can access the entire Pareto front, allowing it to be tailored to meet the specific requirements of different stakeholders by adjusting an additional input vector that weights the objectives. We validate the model's performance through extensive offline and online evaluation. For broader application and research, the source code is made available at https://github.com/otto-de/MultiTRON. The results confirm the model's ability to manage multiple recommendation objectives effectively, offering a flexible tool for diverse business needs.

URLs: https://github.com/otto-de/MultiTRON.

replace-cross Polyp SAM 2: Advancing Zero shot Polyp Segmentation in Colorectal Cancer Detection

Authors: Mobina Mansoori, Sajjad Shahabodini, Jamshid Abouei, Konstantinos N. Plataniotis, Arash Mohammadi

Abstract: Polyp segmentation plays a crucial role in the early detection and diagnosis of colorectal cancer. However, obtaining accurate segmentations often requires labor-intensive annotations and specialized models. Recently, Meta AI Research released a general Segment Anything Model 2 (SAM 2), which has demonstrated promising performance in several segmentation tasks. In this manuscript, we evaluate the performance of SAM 2 in segmenting polyps under various prompted settings. We hope this report will provide insights to advance the field of polyp segmentation and promote more interesting work in the future. This project is publicly available at https://github.com/ sajjad-sh33/Polyp-SAM-2.

URLs: https://github.com/

replace-cross Bayesian Learning in a Nonlinear Multiscale State-Space Model

Authors: Nayely V\'elez-Cruz, Manfred D. Laubichler

Abstract: The ubiquity of multiscale interactions in complex systems is well-recognized, with development and heredity serving as a prime example of how processes at different temporal scales influence one another. This work introduces a novel multiscale state-space model to explore the dynamic interplay between systems interacting across different time scales, with feedback between each scale. We propose a Bayesian learning framework to estimate unknown states by learning the unknown process noise covariances within this multiscale model. We develop a Particle Gibbs with Ancestor Sampling (PGAS) algorithm for inference and demonstrate through simulations the efficacy of our approach.

replace-cross Development of a Large Language Model-based Multi-Agent Clinical Decision Support System for Korean Triage and Acuity Scale (KTAS)-Based Triage and Treatment Planning in Emergency Departments

Authors: Seungjun Han, Wongyung Choi

Abstract: Emergency department (ED) overcrowding and the complexity of rapid decision-making in critical care settings pose significant challenges to healthcare systems worldwide. While clinical decision support systems (CDSS) have shown promise, the integration of large language models (LLMs) offers new possibilities for enhancing triage accuracy and clinical decision-making. This study presents an LLM-driven CDSS designed to assist ED physicians and nurses in patient triage, treatment planning, and overall emergency care management. We developed a multi-agent CDSS utilizing Llama-3-70b as the base LLM, orchestrated by CrewAI and Langchain. The system comprises four AI agents emulating key ED roles: Triage Nurse, Emergency Physician, Pharmacist, and ED Coordinator. It incorporates the Korean Triage and Acuity Scale (KTAS) for triage assessment and integrates with the RxNorm API for medication management. The model was evaluated using the Asclepius dataset, with performance assessed by a clinical emergency medicine specialist. The CDSS demonstrated high accuracy in triage decision-making compared to the baseline of a single-agent system. Furthermore, the system exhibited strong performance in critical areas, including primary diagnosis, critical findings identification, disposition decision-making, treatment planning, and resource allocation. Our multi-agent CDSS demonstrates significant potential for supporting comprehensive emergency care management. By leveraging state-of-the-art AI technologies, this system offers a scalable and adaptable tool that could enhance emergency medical care delivery, potentially alleviating ED overcrowding and improving patient outcomes. This work contributes to the growing field of AI applications in emergency medicine and offers a promising direction for future research and clinical implementation.

replace-cross PolyRouter: A Multi-LLM Querying System

Authors: Dimitris Stripelis, Zijian Hu, Jipeng Zhang, Zhaozhuo Xu, Alay Dilipbhai Shah, Han Jin, Yuhang Yao, Salman Avestimehr, Chaoyang He

Abstract: With the rapid growth of Large Language Models (LLMs) across various domains, numerous new LLMs have emerged, each possessing domain-specific expertise. This proliferation has highlighted the need for quick, high-quality, and cost-effective LLM query response methods. Yet, no single LLM exists to efficiently balance this trilemma. Some models are powerful but extremely costly, while others are fast and inexpensive but qualitatively inferior. To address this challenge, we present PolyRouter, a non-monolithic LLM querying system that seamlessly integrates various LLM experts into a single query interface and dynamically routes incoming queries to the most high-performant expert based on query's requirements. Through extensive experiments, we demonstrate that when compared to standalone expert models, PolyRouter improves query efficiency by up to 40%, and leads to significant cost reductions of up to 30%, while maintaining or enhancing model performance by up to 10%.

replace-cross EX-DRL: Hedging Against Heavy Losses with EXtreme Distributional Reinforcement Learning

Authors: Parvin Malekzadeh, Zissis Poulos, Jacky Chen, Zeyu Wang, Konstantinos N. Plataniotis

Abstract: Recent advancements in Distributional Reinforcement Learning (DRL) for modeling loss distributions have shown promise in developing hedging strategies in derivatives markets. A common approach in DRL involves learning the quantiles of loss distributions at specified levels using Quantile Regression (QR). This method is particularly effective in option hedging due to its direct quantile-based risk assessment, such as Value at Risk (VaR) and Conditional Value at Risk (CVaR). However, these risk measures depend on the accurate estimation of extreme quantiles in the loss distribution's tail, which can be imprecise in QR-based DRL due to the rarity and extremity of tail data, as highlighted in the literature. To address this issue, we propose EXtreme DRL (EX-DRL), which enhances extreme quantile prediction by modeling the tail of the loss distribution with a Generalized Pareto Distribution (GPD). This method introduces supplementary data to mitigate the scarcity of extreme quantile observations, thereby improving estimation accuracy through QR. Comprehensive experiments on gamma hedging options demonstrate that EX-DRL improves existing QR-based models by providing more precise estimates of extreme quantiles, thereby improving the computation and reliability of risk metrics for complex financial risk management.

replace-cross GNN: Graph Neural Network and Large Language Model for Data Discovery

Authors: Thomas Hoang

Abstract: Our algorithm GNN: Graph Neural Network and Large Language Model for Data Discovery inherit the benefits of \cite{hoang2024plod} (PLOD: Predictive Learning Optimal Data Discovery), \cite{Hoang2024BODBO} (BOD: Blindly Optimal Data Discovery) in terms of overcoming the challenges of having to predefine utility function and the human input for attribute ranking, which helps prevent the time-consuming loop process. In addition to these previous works, our algorithm GNN leverages the advantages of graph neural networks and large language models to understand text type values that cannot be understood by PLOD and MOD, thus making the task of predicting outcomes more reliable. GNN could be seen as an extension of PLOD in terms of understanding the text type value and the user's preferences, not only numerical values but also text values, making the promise of data science and analytics purposes.

replace-cross Enhancing Uplift Modeling in Multi-Treatment Marketing Campaigns: Leveraging Score Ranking and Calibration Techniques

Authors: Yoon Tae Park, Ting Xu, Mohamed Anany

Abstract: Uplift modeling is essential for optimizing marketing strategies by selecting individuals likely to respond positively to specific marketing campaigns. This importance escalates in multi-treatment marketing campaigns, where diverse treatment is available and we may want to assign the customers to treatment that can make the most impact. While there are existing approaches with convenient frameworks like Causalml, there are potential spaces to enhance the effect of uplift modeling in multi treatment cases. This paper introduces a novel approach to uplift modeling in multi-treatment campaigns, leveraging score ranking and calibration techniques to improve overall performance of the marketing campaign. We review existing uplift models, including Meta Learner frameworks (S, T, X), and their application in real-world scenarios. Additionally, we delve into insights from multi-treatment studies to highlight the complexities and potential advancements in the field. Our methodology incorporates Meta-Learner calibration and a scoring rank-based offer selection strategy. Extensive experiment results with real-world datasets demonstrate the practical benefits and superior performance of our approach. The findings underscore the critical role of integrating score ranking and calibration techniques in refining the performance and reliability of uplift predictions, thereby advancing predictive modeling in marketing analytics and providing actionable insights for practitioners seeking to optimize their campaign strategies.

replace-cross Verifiable cloud-based variational quantum algorithms

Authors: Junhong Yang, Banghai Wang, Junyu Quan, Qin Li

Abstract: Variational quantum algorithms (VQAs) have shown potential for quantum advantage with noisy intermediate-scale quantum (NISQ) devices for quantum machine learning (QML). However, given the high cost and limited availability of quantum resources, delegating VQAs via cloud networks is a more practical solution for clients with limited quantum capabilities. Recently, Shingu et al.[Physical Review A, 105, 022603 (2022)] proposed a variational secure cloud quantum computing protocol, utilizing ancilla-driven quantum computation (ADQC) for cloud-based VQAs with minimal quantum resource consumption. However, their protocol lacks verifiability, which exposes it to potential malicious behaviors by the server. Additionally, channel loss requires frequent re-delegation as the size of the delegated variational circuit grows, complicating verification due to increased circuit complexity. This paper introduces a new protocol to address these challenges and enhance both verifiability and tolerance to channel loss in cloud-based VQAs.

replace-cross Improved identification of breakpoints in piecewise regression and its applications

Authors: Taehyeong Kim, Hyungu Lee, Hayoung Choi

Abstract: Identifying breakpoints in piecewise regression is critical in enhancing the reliability and interpretability of data fitting. In this paper, we propose novel algorithms based on the greedy algorithm to accurately and efficiently identify breakpoints in piecewise polynomial regression. The algorithm updates the breakpoints to minimize the error by exploring the neighborhood of each breakpoint. It has a fast convergence rate and stability to find optimal breakpoints. Moreover, it can determine the optimal number of breakpoints. The computational results for real and synthetic data show that its accuracy is better than any existing methods. The real-world datasets demonstrate that breakpoints through the proposed algorithm provide valuable data information.

replace-cross Enhancing Robustness of Human Detection Algorithms in Maritime SAR through Augmented Aerial Images to Simulate Weather Conditions

Authors: Miguel Tjia, Artem Kim, Elaine Wynette Wijaya, Hanna Tefara, Kevin Zhu

Abstract: 7,651 cases of Search and Rescue Missions (SAR) were reported by the United States Coast Guard in 2024, with over 1322 SAR helicopters deployed in the 6 first months alone. Through the utilizations of YOLO, we were able to run different weather conditions and lighting from our augmented dataset for training. YOLO then utilizes CNNs to apply a series of convolutions and pooling layers to the input image, where the convolution layers are able to extract the main features of the image. Through this, our YOLO model is able to learn to differentiate different objects which may considerably improve its accuracy, possibly enhancing the efficiency of SAR operations through enhanced detection accuracy. This paper aims to improve the model's accuracy of human detection in maritime SAR by evaluating a robust datasets containing various elevations and geological locations, as well as through data augmentation which simulates different weather and lighting. We observed that models trained on augmented datasets outperformed their non-augmented counterparts in which the human recall scores ranged from 0.891 to 0.911 with an improvement rate of 3.4\% on the YOLOv5l model. Results showed that these models demonstrate greater robustness to real-world conditions in varying of weather, brightness, tint, and contrast.

replace-cross Consistent machine learning for topology optimization with microstructure-dependent neural network material models

Authors: Harikrishnan Vijayakumaran, Jonathan B. Russ, Glaucio H. Paulino, Miguel A. Bessa

Abstract: Additive manufacturing methods together with topology optimization have enabled the creation of multiscale structures with controlled spatially-varying material microstructure. However, topology optimization or inverse design of such structures in the presence of nonlinearities remains a challenge due to the expense of computational homogenization methods and the complexity of differentiably parameterizing the microstructural response. A solution to this challenge lies in machine learning techniques that offer efficient, differentiable mappings between the material response and its microstructural descriptors. This work presents a framework for designing multiscale heterogeneous structures with spatially varying microstructures by merging a homogenization-based topology optimization strategy with a consistent machine learning approach grounded in hyperelasticity theory. We leverage neural architectures that adhere to critical physical principles such as polyconvexity, objectivity, material symmetry, and thermodynamic consistency to supply the framework with a reliable constitutive model that is dependent on material microstructural descriptors. Our findings highlight the potential of integrating consistent machine learning models with density-based topology optimization for enhancing design optimization of heterogeneous hyperelastic structures under finite deformations.

replace-cross SONICS: Synthetic Or Not -- Identifying Counterfeit Songs

Authors: Md Awsafur Rahman, Zaber Ibn Abdul Hakim, Najibul Haque Sarker, Bishmoy Paul, Shaikh Anowarul Fattah

Abstract: The recent surge in AI-generated songs presents exciting possibilities and challenges. While these tools democratize music creation, they also necessitate the ability to distinguish between human-composed and AI-generated songs for safeguarding artistic integrity and content curation. Existing research and datasets in fake song detection only focus on singing voice deepfake detection (SVDD), where the vocals are AI-generated but the instrumental music is sourced from real songs. However, this approach is inadequate for contemporary end-to-end AI-generated songs where all components (vocals, lyrics, music, and style) could be AI-generated. Additionally, existing datasets lack lyrics-music diversity, long-duration songs, and open fake songs. To address these gaps, we introduce SONICS, a novel dataset for end-to-end Synthetic Song Detection (SSD), comprising over 97k songs with over 49k synthetic songs from popular platforms like Suno and Udio. Furthermore, we highlight the importance of modeling long-range temporal dependencies in songs for effective authenticity detection, an aspect overlooked in existing methods. To capture these patterns, we propose a novel model, SpecTTTra, that is up to 3 times faster and 6 times more memory efficient compared to popular CNN and Transformer-based models while maintaining competitive performance. Finally, we offer both AI-based and Human evaluation benchmarks, addressing another deficiency in current research.

replace-cross Foundation Models for Music: A Survey

Authors: Yinghao Ma, Anders {\O}land, Anton Ragni, Bleiz MacSen Del Sette, Charalampos Saitis, Chris Donahue, Chenghua Lin, Christos Plachouras, Emmanouil Benetos, Elio Quinton, Elona Shatri, Fabio Morreale, Ge Zhang, Gy\"orgy Fazekas, Gus Xia, Huan Zhang, Ilaria Manco, Jiawen Huang, Julien Guinot, Liwei Lin, Luca Marinelli, Max W. Y. Lam, Megha Sharma, Qiuqiang Kong, Roger B. Dannenberg, Ruibin Yuan, Shangda Wu, Shih-Lun Wu, Shuqi Dai, Shun Lei, Shiyin Kang, Simon Dixon, Wenhu Chen, Wenhao Huang, Xingjian Du, Xingwei Qu, Xu Tan, Yizhi Li, Zeyue Tian, Zhiyong Wu, Zhizheng Wu, Ziyang Ma, Ziyu Wang

Abstract: In recent years, foundation models (FMs) such as large language models (LLMs) and latent diffusion models (LDMs) have profoundly impacted diverse sectors, including music. This comprehensive review examines state-of-the-art (SOTA) pre-trained models and foundation models in music, spanning from representation learning, generative learning and multimodal learning. We first contextualise the significance of music in various industries and trace the evolution of AI in music. By delineating the modalities targeted by foundation models, we discover many of the music representations are underexplored in FM development. Then, emphasis is placed on the lack of versatility of previous methods on diverse music applications, along with the potential of FMs in music understanding, generation and medical application. By comprehensively exploring the details of the model pre-training paradigm, architectural choices, tokenisation, finetuning methodologies and controllability, we emphasise the important topics that should have been well explored, like instruction tuning and in-context learning, scaling law and emergent ability, as well as long-sequence modelling etc. A dedicated section presents insights into music agents, accompanied by a thorough analysis of datasets and evaluations essential for pre-training and downstream tasks. Finally, by underscoring the vital importance of ethical considerations, we advocate that following research on FM for music should focus more on such issues as interpretability, transparency, human responsibility, and copyright issues. The paper offers insights into future challenges and trends on FMs for music, aiming to shape the trajectory of human-AI collaboration in the music realm.