Authors: Qiyang Sun, Alican Akman, Bj\"orn W. Schuller
Abstract: The continuous development of artificial intelligence (AI) theory has propelled this field to unprecedented heights, owing to the relentless efforts of scholars and researchers. In the medical realm, AI takes a pivotal role, leveraging robust machine learning (ML) algorithms. AI technology in medical imaging aids physicians in X-ray, computed tomography (CT) scans, and magnetic resonance imaging (MRI) diagnoses, conducts pattern recognition and disease prediction based on acoustic data, delivers prognoses on disease types and developmental trends for patients, and employs intelligent health management wearable devices with human-computer interaction technology to name but a few. While these well-established applications have significantly assisted in medical field diagnoses, clinical decision-making, and management, collaboration between the medical and AI sectors faces an urgent challenge: How to substantiate the reliability of decision-making? The underlying issue stems from the conflict between the demand for accountability and result transparency in medical scenarios and the black-box model traits of AI. This article reviews recent research grounded in explainable artificial intelligence (XAI), with an emphasis on medical practices within the visual, audio, and multimodal perspectives. We endeavour to categorise and synthesise these practices, aiming to provide support and guidance for future researchers and healthcare professionals.
Authors: Henan Sun, Xunkai Li, Daohan Su, Junyi Han, Rong-Hua Li, Guoren Wang
Abstract: In recent years, Graph Neural Networks (GNNs) have made significant advances in processing structured data. However, most of them primarily adopted a model-centric approach, which simplifies graphs by converting it into undirected formats and emphasizes model designs. This approach is inherently constrained in real-world applications due to inevitable information loss in simple undirected graphs and data-driven model optimization dilemmas associated with exceeding the upper bounds of representational capacity. As a result, there has been a shift toward data-centric methods that prioritize improving graph quality and representation. Specifically, various types of graphs can be derived from naturally structured data, including heterogeneous graphs, hypergraphs, and directed graphs. Among these, directed graphs offer distinct advantages in topological systems by modeling causal relationships, and directed GNNs have been extensively studied in recent years. However, a comprehensive survey of this emerging topic is still lacking. Therefore, we aim to provide a comprehensive review of directed graph learning, with a particular focus on a data-centric perspective. Specifically, we first introduce a novel taxonomy for existing studies. Subsequently, we re-examine these methods from the data-centric perspective, with an emphasis on understanding and improving data representation. It demonstrates that a deep understanding of directed graphs and its quality plays a crucial role in model performance. Additionally, we explore the diverse applications of directed GNNs across 10+ domains, highlighting their broad applicability. Finally, we identify key opportunities and challenges within the field, offering insights that can guide future research and development in directed graph learning.
Authors: Roy Fairstein, Dan Vilenchik, Kobi Gal
Abstract: Participatory Budgeting (PB) offers a democratic process for communities to allocate public funds across various projects through voting. In practice, PB organizers face challenges in selecting aggregation rules either because they are not familiar with the literature and the exact details of every existing rule or because no existing rule echoes their expectations. This paper presents a novel data-driven approach utilizing machine learning to address this challenge. By training neural networks on PB instances, our approach learns aggregation rules that balance social welfare, representation, and other societal beneficial goals. It is able to generalize from small-scale synthetic PB examples to large, real-world PB instances. It is able to learn existing aggregation rules but also generate new rules that adapt to diverse objectives, providing a more nuanced, compromise-driven solution for PB processes. The effectiveness of our approach is demonstrated through extensive experiments with synthetic and real-world PB data, and can expand the use and deployment of PB solutions.
Authors: Swayambhoo Jain, Ravi Raju, Bo Li, Zoltan Csaki, Jonathan Li, Kaizhao Liang, Guoyao Feng, Urmish Thakkar, Anand Sampat, Raghu Prabhakar, Sumati Jairath
Abstract: Large Language Models (LLMs) have achieved remarkable advancements, but their monolithic nature presents challenges in terms of scalability, cost, and customization. This paper introduces the Composition of Experts (CoE), a modular compound AI system leveraging multiple expert LLMs. CoE leverages a router to dynamically select the most appropriate expert for a given input, enabling efficient utilization of resources and improved performance. We formulate the general problem of training a CoE and discuss inherent complexities associated with it. We propose a two-step routing approach to address these complexities that first uses a router to classify the input into distinct categories followed by a category-to-expert mapping to obtain desired experts. CoE offers a flexible and cost-effective solution to build compound AI systems. Our empirical evaluation demonstrates the effectiveness of CoE in achieving superior performance with reduced computational overhead. Given that CoE comprises of many expert LLMs it has unique system requirements for cost-effective serving. We present an efficient implementation of CoE leveraging SambaNova SN40L RDUs unique three-tiered memory architecture. CoEs obtained using open weight LLMs Qwen/Qwen2-7B-Instruct, google/gemma-2-9b-it, google/gemma-2-27b-it, meta-llama/Llama-3.1-70B-Instruct and Qwen/Qwen2-72B-Instruct achieve a score of $59.4$ with merely $31$ billion average active parameters on Arena-Hard and a score of $9.06$ with $54$ billion average active parameters on MT-Bench.
Authors: David Zollikofer, B\'eni Egressy, Frederik Benzing, Matthias Otth, Roger Wattenhofer
Abstract: Several self-supervised learning (SSL) approaches have shown that redundancy reduction in the feature embedding space is an effective tool for representation learning. However, these methods consider a narrow notion of redundancy, focusing on pairwise correlations between features. To address this limitation, we formalize the notion of embedding space redundancy and introduce redundancy measures that capture more complex, higher-order dependencies. We mathematically analyze the relationships between these metrics, and empirically measure these redundancies in the embedding spaces of common SSL methods. Based on our findings, we propose Self Supervised Learning with Predictability Minimization (SSLPM) as a method for reducing redundancy in the embedding space. SSLPM combines an encoder network with a predictor engaging in a competitive game of reducing and exploiting dependencies respectively. We demonstrate that SSLPM is competitive with state-of-the-art methods and find that the best performing SSL methods exhibit low embedding space redundancy, suggesting that even methods without explicit redundancy reduction mechanisms perform redundancy reduction implicitly.
Authors: Sumeet Ramesh Motwani, Chandler Smith, Rocktim Jyoti Das, Markian Rybchuk, Philip H. S. Torr, Ivan Laptev, Fabio Pizzati, Ronald Clark, Christian Schroeder de Witt
Abstract: Enabling effective collaboration among LLMs is a crucial step toward developing autonomous systems capable of solving complex problems. While LLMs are typically used as single-model generators, where humans critique and refine their outputs, the potential for jointly-trained collaborative models remains largely unexplored. Despite promising results in multi-agent communication and debate settings, little progress has been made in training models to work together on tasks. In this paper, we present a first step toward "Multi-agent LLM training" (MALT) on reasoning problems. Our approach employs a sequential multi-agent setup with heterogeneous LLMs assigned specialized roles: a generator, verifier, and refinement model iteratively solving problems. We propose a trajectory-expansion-based synthetic data generation process and a credit assignment strategy driven by joint outcome based rewards. This enables our post-training setup to utilize both positive and negative trajectories to autonomously improve each model's specialized capabilities as part of a joint sequential system. We evaluate our approach across MATH, GSM8k, and CQA, where MALT on Llama 3.1 8B models achieves relative improvements of 14.14%, 7.12%, and 9.40% respectively over the same baseline model. This demonstrates an early advance in multi-agent cooperative capabilities for performance on mathematical and common sense reasoning questions. More generally, our work provides a concrete direction for research around multi-agent LLM training approaches.
Authors: Manpreet Kaur, Ankur Tomar, Srijan Mishra, Shashwat Verma
Abstract: Deep Learning methods are highly local and sensitive to the domain of data they are trained with. Even a slight deviation from the domain distribution affects prediction accuracy of deep networks significantly. In this work, we have investigated a set of techniques aimed at increasing accuracy of generator networks which perform translation from one domain to the other in an adversarial setting. In particular, we experimented with activations, the encoder-decoder network architectures, and introduced a Loss called cyclic loss to constrain the Generator network so that it learns effective source-target translation. This machine learning problem is motivated by myriad applications that can be derived from domain adaptation networks like generating labeled data from synthetic inputs in an unsupervised fashion, and using these translation network in conjunction with the original domain network to generalize deep learning networks across domains.
Authors: Hossein Moosaei, Milan Hlad\'ik, Ahmad Mousavi, Zheming Gao, Haojie Fu
Abstract: Binary classification tasks with imbalanced classes pose significant challenges in machine learning. Traditional classifiers often struggle to accurately capture the characteristics of the minority class, resulting in biased models with subpar predictive performance. In this paper, we introduce a novel approach to tackle this issue by leveraging Universum points to support the minority class within quadratic twin support vector machine models. Unlike traditional classifiers, our models utilize quadratic surfaces instead of hyperplanes for binary classification, providing greater flexibility in modeling complex decision boundaries. By incorporating Universum points, our approach enhances classification accuracy and generalization performance on imbalanced datasets. We generated four artificial datasets to demonstrate the flexibility of the proposed methods. Additionally, we validated the effectiveness of our approach through empirical evaluations on benchmark datasets, showing superior performance compared to conventional classifiers and existing methods for imbalanced classification.
Authors: Blaise Munyampirwa, Vihan Lakshman, Benjamin Coleman
Abstract: Driven by recent breakthrough advances in neural representation learning, approximate near-neighbor (ANN) search over vector embeddings has emerged as a critical computational workload. With the introduction of the seminal Hierarchical Navigable Small World (HNSW) algorithm, graph-based indexes have established themseves as the overwhelmingly dominant paradigm for efficient and scalable ANN search. As the name suggests, HNSW searches a layered hierarchical graph to quickly identify neighborhoods of similar points to a given query vector. But is this hierarchy even necessary? A rigorous experimental analysis to answer this question would provide valuable insights into the nature of algorithm design for ANN search and motivate directions for future work in this increasingly crucial domain. To that end, we conduct an extensive benchmarking study covering more large-scale datasets than prior investigations of this question. We ultimately find that a flat graph retains all of the benefits of HNSW on high-dimensional datasets, with latency and recall performance essentially \emph{identical} to the original algorithm but with less memory overhead. Furthermore, we go a step further and study \emph{why} the hierarchy of HNSW provides no benefit in high dimensions, hypothesizing that navigable small world graphs contain a well-connected, frequently traversed ``highway" of hub nodes that maintain the same purported function as the hierarchical layers. We present compelling empirical evidence that the \emph{Hub Highway Hypothesis} holds for real datasets and investigate the mechanisms by which the highway forms. The implications of this hypothesis may also provide future research directions in developing enhancements to graph-based ANN search.
Authors: Junbo Shen, Bing Xue, Thomas Kannampallil, Chenyang Lu, Joanna Abraham
Abstract: Early detection of surgical complications allows for timely therapy and proactive risk mitigation. Machine learning (ML) can be leveraged to identify and predict patient risks for postoperative complications. We developed and validated the effectiveness of predicting postoperative complications using a novel surgical Variational Autoencoder (surgVAE) that uncovers intrinsic patterns via cross-task and cross-cohort presentation learning. This retrospective cohort study used data from the electronic health records of adult surgical patients over four years (2018 - 2021). Six key postoperative complications for cardiac surgery were assessed: acute kidney injury, atrial fibrillation, cardiac arrest, deep vein thrombosis or pulmonary embolism, blood transfusion, and other intraoperative cardiac events. We compared prediction performances of surgVAE against widely-used ML models and advanced representation learning and generative models under 5-fold cross-validation. 89,246 surgeries (49% male, median (IQR) age: 57 (45-69)) were included, with 6,502 in the targeted cardiac surgery cohort (61% male, median (IQR) age: 60 (53-70)). surgVAE demonstrated superior performance over existing ML solutions across all postoperative complications of cardiac surgery patients, achieving macro-averaged AUPRC of 0.409 and macro-averaged AUROC of 0.831, which were 3.4% and 3.7% higher, respectively, than the best alternative method (by AUPRC scores). Model interpretation using Integrated Gradients highlighted key risk factors based on preoperative variable importance. surgVAE showed excellent discriminatory performance for predicting postoperative complications and addressing the challenges of data complexity, small cohort sizes, and low-frequency positive events. surgVAE enables data-driven predictions of patient risks and prognosis while enhancing the interpretability of patient risk profiles.
Authors: Philippe Brouillard, Chandler Squires, Jonas Wahl, Konrad P. Kording, Karen Sachs, Alexandre Drouin, Dhanya Sridhar
Abstract: Causal discovery aims to automatically uncover causal relationships from data, a capability with significant potential across many scientific disciplines. However, its real-world applications remain limited. Current methods often rely on unrealistic assumptions and are evaluated only on simple synthetic toy datasets, often with inadequate evaluation metrics. In this paper, we substantiate these claims by performing a systematic review of the recent causal discovery literature. We present applications in biology, neuroscience, and Earth sciences - fields where causal discovery holds promise for addressing key challenges. We highlight available simulated and real-world datasets from these domains and discuss common assumption violations that have spurred the development of new methods. Our goal is to encourage the community to adopt better evaluation practices by utilizing realistic datasets and more adequate metrics.
Authors: Shinjan Ghosh, Julian Busch, Georgia Olympia Brikis, Biswadip Dey
Abstract: Design exploration or optimization using computational fluid dynamics (CFD) is commonly used in the industry. Geometric variation is a key component of such design problems, especially in turbulent flow scenarios, which involves running costly simulations at every design iteration. While parametric RANS-PINN type approaches have been proven to make effective turbulent surrogates, as a means of predicting unknown Reynolds number flows for a given geometry at near real-time, geometry aware physics informed surrogates with the ability to predict varying geometries are a relatively less studied topic. A novel geometry aware parametric PINN surrogate model has been created, which can predict flow fields for NACA 4 digit airfoils in turbulent conditions, for unseen shapes as well as inlet flow conditions. A local+global approach for embedding has been proposed, where known global design parameters for an airfoil as well as local SDF values can be used as inputs to the model along with velocity inlet/Reynolds number ($\mathcal{R}_e$) to predict the flow fields. A RANS formulation of the Navier-Stokes equations with a 2-equation k-epsilon turbulence model has been used for the PDE losses, in addition to limited CFD data from 8 different NACA airfoils for training. The models have then been validated with unknown NACA airfoils at unseen Reynolds numbers.
Authors: Jinming Xing, Ruilin Xing, Yan Sun
Abstract: Missing data is a pervasive challenge in wireless networks and many other domains, often compromising the performance of machine learning and deep learning models. To address this, we propose a novel framework, FGATT, that combines the Fuzzy Graph Attention Network (FGAT) with the Transformer encoder to perform robust and accurate data imputation. FGAT leverages fuzzy rough sets and graph attention mechanisms to capture spatial dependencies dynamically, even in scenarios where predefined spatial information is unavailable. The Transformer encoder is employed to model temporal dependencies, utilizing its self-attention mechanism to focus on significant time-series patterns. A self-adaptive graph construction method is introduced to enable dynamic connectivity learning, ensuring the framework's applicability to a wide range of wireless datasets. Extensive experiments demonstrate that our approach outperforms state-of-the-art methods in imputation accuracy and robustness, particularly in scenarios with substantial missing data. The proposed model is well-suited for applications in wireless sensor networks and IoT environments, where data integrity is critical.
Authors: Lifan Yuan, Wendi Li, Huayu Chen, Ganqu Cui, Ning Ding, Kaiyan Zhang, Bowen Zhou, Zhiyuan Liu, Hao Peng
Abstract: Different from its counterpart outcome reward models (ORMs), which evaluate the entire responses, a process reward model (PRM) scores a reasoning trajectory step by step, providing denser and more fine grained rewards. However, training a PRM requires labels annotated at every intermediate step, presenting significant challenges for both manual and automatic data collection. This paper aims to address this challenge. Both theoretically and empirically, we show that an \textit{implicit PRM} can be obtained at no additional cost, by simply training an ORM on the cheaper response-level labels. The only assumption is to parameterize the outcome reward as the log-likelihood ratios of the policy and reference models, which can be optimized regardless of the specific choice of loss objectives. In experiments, we instantiate our implicit PRMs with various objectives and evaluate their performance on MATH. We show that our implicit PRM outperforms a strong MCTS-based baseline \textit{\'a la} Math-Shepherd using less than $1/38$ of the training data. Its performance can be further improved with majority voting. We further find that scaling up instructions and responses benefits our implicit PRM, and the latter brings a larger gain. Particularly, we find that our implicit PRM, when instantiated with the cross-entropy (CE) loss, is more data-efficient and can keep improving generation models even when trained with only one response per instruction, the setup that suffers from extreme data scarcity and imbalance. Further, instructions should be relevant to downstream tasks while the diversity of responses does not bring gains. Surprisingly, training on extra Math-Shepherd step labels brings no further improvements to our implicit PRM trained on only outcome data. We hope that our work will encourage a rethinking of PRM training approaches and contribute to making training PRMs more accessible.
Authors: Mert Gurbuzbalaban, Mohammad Rafiqul Islam, Xiaoyu Wang, Lingjiong Zhu
Abstract: Langevin algorithms are popular Markov Chain Monte Carlo methods for Bayesian learning, particularly when the aim is to sample from the posterior distribution of a parametric model, given the input data and the prior distribution over the model parameters. Their stochastic versions such as stochastic gradient Langevin dynamics (SGLD) allow iterative learning based on randomly sampled mini-batches of large datasets and are scalable to large datasets. However, when data is decentralized across a network of agents subject to communication and privacy constraints, standard SGLD algorithms cannot be applied. Instead, we employ decentralized SGLD (DE-SGLD) algorithms, where Bayesian learning is performed collaboratively by a network of agents without sharing individual data. Nonetheless, existing DE-SGLD algorithms induce a bias at every agent that can negatively impact performance; this bias persists even when using full batches and is attributable to network effects. Motivated by the EXTRA algorithm and its generalizations for decentralized optimization, we propose the generalized EXTRA stochastic gradient Langevin dynamics, which eliminates this bias in the full-batch setting. Moreover, we show that, in the mini-batch setting, our algorithm provides performance bounds that significantly improve upon those of standard DE-SGLD algorithms in the literature. Our numerical results also demonstrate the efficiency of the proposed approach.
Authors: Trenton Chang, Lindsay Warrenburg, Sae-Hwan Park, Ravi B. Parikh, Maggie Makar, Jenna Wiens
Abstract: In many settings, machine learning models may be used to inform decisions that impact individuals or entities who interact with the model. Such entities, or agents, may game model decisions by manipulating their inputs to the model to obtain better outcomes and maximize some utility. We consider a multi-agent setting where the goal is to identify the "worst offenders:" agents that are gaming most aggressively. However, identifying such agents is difficult without knowledge of their utility function. Thus, we introduce a framework in which each agent's tendency to game is parameterized via a scalar. We show that this gaming parameter is only partially identifiable. By recasting the problem as a causal effect estimation problem where different agents represent different "treatments," we prove that a ranking of all agents by their gaming parameters is identifiable. We present empirical results in a synthetic data study validating the usage of causal effect estimation for gaming detection and show in a case study of diagnosis coding behavior in the U.S. that our approach highlights features associated with gaming.
Authors: Ryan Yu, Mateusz Nowak, Qintong Xie, Michelle Yilin Feng, Peter Chin
Abstract: Current approximate Coarse Correlated Equilibria (CCE) algorithms struggle with equilibrium approximation for games in large stochastic environments but are theoretically guaranteed to converge to a strong solution concept. In contrast, modern Reinforcement Learning (RL) algorithms provide faster training yet yield weaker solutions. We introduce Exp3-IXrl - a blend of RL and game-theoretic approach, separating the RL agent's action selection from the equilibrium computation while preserving the integrity of the learning process. We demonstrate that our algorithm expands the application of equilibrium approximation algorithms to new environments. Specifically, we show the improved performance in a complex and adversarial cybersecurity network environment - the Cyber Operations Research Gym - and in the classical multi-armed bandit settings.
Authors: Ihab Tabbara, Hussein Sibai
Abstract: Safety filters in control systems correct nominal controls that violate safety constraints. Designing such filters as functions of visual observations in uncertain and complex environments is challenging. Several deep learning-based approaches to tackle this challenge have been proposed recently. However, formally verifying that the learned filters satisfy critical properties that enable them to guarantee the safety of the system is currently beyond reach. Instead, in this work, motivated by the success of ensemble methods in reinforcement learning, we empirically investigate the efficacy of ensembles in enhancing the accuracy and the out-of-distribution generalization of such filters, as a step towards more reliable ones. We experiment with diverse pre-trained vision representation models as filter backbones, training approaches, and output aggregation techniques. We compare the performance of ensembles with different configurations against each other, their individual member models, and large single-model baselines in distinguishing between safe and unsafe states and controls in the DeepAccident dataset. Our results show that diverse ensembles have better state and control classification accuracies compared to individual models.
Authors: Soheila Sadeghi
Abstract: In the dynamic landscape of project management, scope changes are an inevitable reality that can significantly impact project performance. These changes, whether initiated by stakeholders, external factors, or internal project dynamics, can lead to cost overruns and schedule delays. Accurately predicting the consequences of these changes is crucial for effective project control and informed decision-making. This study aims to develop predictive models to estimate the impact of scope changes on project cost and schedule using machine learning techniques. The research utilizes a comprehensive dataset containing detailed information on project tasks, including the Work Breakdown Structure (WBS), task type, productivity rate, estimated cost, actual cost, duration, task dependencies, scope change magnitude, and scope change timing. Multiple machine learning models are developed and evaluated to predict the impact of scope changes on project cost and schedule. These models include Linear Regression, Decision Tree, Ridge Regression, Random Forest, Gradient Boosting, and XGBoost. The dataset is split into training and testing sets, and the models are trained using the preprocessed data. Model robustness and generalization are assessed using cross-validation techniques. To evaluate the performance of models, we use Mean Squared Error (MSE) and R2. Residual plots are generated to assess the goodness of fit and identify any patterns or outliers. Hyperparameter tuning is performed to optimize the XGBoost model and improve its predictive accuracy. The study identifies the most influential project attributes in determining the magnitude of cost and schedule deviations caused by scope modifications. It is identified that productivity rate, scope change magnitude, task dependencies, estimated cost, actual cost, duration, and specific WBS elements are powerful predictors.
Authors: Kou Tian, Chentao Yue, Changyang She, Yonghui Li, Branka Vucetic
Abstract: This paper presents a novel auto-encoder based end-to-end channel encoding and decoding. It integrates deep reinforcement learning (DRL) and graph neural networks (GNN) in code design by modeling the generation of code parity-check matrices as a Markov Decision Process (MDP), to optimize key coding performance metrics such as error-rates and code algebraic properties. An edge-weighted GNN (EW-GNN) decoder is proposed, which operates on the Tanner graph with an iterative message-passing structure. Once trained on a single linear block code, the EW-GNN decoder can be directly used to decode other linear block codes of different code lengths and code rates. An iterative joint training of the DRL-based code designer and the EW-GNN decoder is performed to optimize the end-end encoding and decoding process. Simulation results show the proposed auto-encoder significantly surpasses several traditional coding schemes at short block lengths, including low-density parity-check (LDPC) codes with the belief propagation (BP) decoding and the maximum-likelihood decoding (MLD), and BCH with BP decoding, offering superior error-correction capabilities while maintaining low decoding complexity.
Authors: Anubha Mahajan, Shreya Hegde, Ethan Shay, Daniel Wu, Aviva Prins
Abstract: In India, the majority of farmers are classified as small or marginal, making their livelihoods particularly vulnerable to economic losses due to market saturation and climate risks. Effective crop planning can significantly impact their expected income, yet existing decision support systems (DSS) often provide generic recommendations that fail to account for real-time market dynamics and the interactions among multiple farmers. In this paper, we evaluate the viability of three multi-agent reinforcement learning (MARL) approaches for optimizing total farmer income and promoting fairness in crop planning: Independent Q-Learning (IQL), where each farmer acts independently without coordination, Agent-by-Agent (ABA), which sequentially optimizes each farmer's policy in relation to the others, and the Multi-agent Rollout Policy, which jointly optimizes all farmers' actions for global reward maximization. Our results demonstrate that while IQL offers computational efficiency with linear runtime, it struggles with coordination among agents, leading to lower total rewards and an unequal distribution of income. Conversely, the Multi-agent Rollout policy achieves the highest total rewards and promotes equitable income distribution among farmers but requires significantly more computational resources, making it less practical for large numbers of agents. ABA strikes a balance between runtime efficiency and reward optimization, offering reasonable total rewards with acceptable fairness and scalability. These findings highlight the importance of selecting appropriate MARL approaches in DSS to provide personalized and equitable crop planning recommendations, advancing the development of more adaptive and farmer-centric agricultural decision-making systems.
Authors: Juncheng Dong, Zihao Wu, Hamid Jafarkhani, Ali Pezeshki, Vahid Tarokh
Abstract: Many challenges in science and engineering, such as drug discovery and communication network design, involve optimizing complex and expensive black-box functions across vast search spaces. Thus, it is essential to leverage existing data to avoid costly active queries of these black-box functions. To this end, while Offline Black-Box Optimization (BBO) is effective for deterministic problems, it may fall short in capturing the stochasticity of real-world scenarios. To address this, we introduce Stochastic Offline BBO (SOBBO), which tackles both black-box objectives and uncontrolled uncertainties. We propose two solutions: for large-data regimes, a differentiable surrogate allows for gradient-based optimization, while for scarce-data regimes, we directly estimate gradients under conservative field constraints, improving robustness, convergence, and data efficiency. Numerical experiments demonstrate the effectiveness of our approach on both synthetic and real-world tasks.
Authors: Abdullah Al Mamun (Graduate Student, Glenn Department of Civil Engineering, Clemson University), Abyad Enan (Graduate Student, Glenn Department of Civil Engineering, Clemson University), Debbie A. Indah (Graduate Student, Department of Engineering, South Carolina State University), Judith Mwakalonge (Professor, Department of Engineering, South Carolina State University), Gurcan Comert (Associate Professor, Computational Data Science and Engineering Department, North Carolina A&T State University), Mashrur Chowdhury (Professor, Glenn Department of Civil Engineering, Clemson University)
Abstract: This study investigates crash severity risk modeling strategies for work zones involving large vehicles (i.e., trucks, buses, and vans) when there are crash data imbalance between low-severity (LS) and high-severity (HS) crashes. We utilized crash data, involving large vehicles in South Carolina work zones for the period between 2014 and 2018, which included 4 times more LS crashes compared to HS crashes. The objective of this study is to explore crash severity prediction performance of various models under different feature selection and data balancing techniques. The findings of this study highlight a disparity between LS and HS predictions, with less-accurate prediction of HS crashes compared to LS crashes due to class imbalance and feature overlaps between LS and HS crashes. Combining features from multiple feature selection techniques: statistical correlation, feature importance, recursive elimination, statistical tests, and mutual information, slightly improves HS crash prediction performance. Data balancing techniques such as NearMiss-1 and RandomUnderSampler, maximize HS recall when paired with certain prediction models, such as Bayesian Mixed Logit (BML), NeuralNet, and RandomForest, making them suitable for HS crash prediction. Conversely, RandomOverSampler, HS Class Weighting, and Kernel-based Synthetic Minority Oversampling (K-SMOTE), used with certain prediction models such as BML, CatBoost, and LightGBM, achieve a balanced performance, defined as achieving an equitable trade-off between LS and HS prediction performance metrics. These insights provide safety analysts with guidance to select models, feature selection techniques, and data balancing techniques that align with their specific safety objectives, offering a robust foundation for enhancing work-zone crash severity prediction.
Authors: Mingming Zhang, Jiahao Hu, Pengfei Shi, Ningtao Wang, Ruizhe Gao, Guandong Sun, Feng Zhao, Yulin kang, Xing Fu, Weiqiang Wang, Junbo Zhao
Abstract: Tabular data plays a critical role in real-world financial scenarios. Traditionally, tree models have dominated in handling tabular data. However, financial datasets in the industry often encounter some challenges, such as data heterogeneity, the predominance of numerical features and the large scale of the data, which can range from tens of millions to hundreds of millions of records. These challenges can lead to significant memory and computational issues when using tree-based models. Consequently, there is a growing need for neural network-based solutions that can outperform these models. In this paper, we introduce TKGMLP, an hybrid network for tabular data that combines shallow Kolmogorov Arnold Networks with Gated Multilayer Perceptron. This model leverages the strengths of both architectures to improve performance and scalability. We validate TKGMLP on a real-world credit scoring dataset, where it achieves state-of-the-art results and outperforms current benchmarks. Furthermore, our findings demonstrate that the model continues to improve as the dataset size increases, making it highly scalable. Additionally, we propose a novel feature encoding method for numerical data, specifically designed to address the predominance of numerical features in financial datasets. The integration of this feature encoding method within TKGMLP significantly improves prediction accuracy. This research not only advances table prediction technology but also offers a practical and effective solution for handling large-scale numerical tabular data in various industrial applications.
Authors: Valdemar \v{S}v\'abensk\'y, Conrad Borchers, Elizabeth B. Cloude, Atsushi Shimada
Abstract: In supervised machine learning (SML) research, large training datasets are essential for valid results. However, obtaining primary data in learning analytics (LA) is challenging. Data augmentation can address this by expanding and diversifying data, though its use in LA remains underexplored. This paper systematically compares data augmentation techniques and their impact on prediction performance in a typical LA task: prediction of academic outcomes. Augmentation is demonstrated on four SML models, which we successfully replicated from a previous LAK study based on AUC values. Among 21 augmentation techniques, SMOTE-ENN sampling performed the best, improving the average AUC by 0.01 and approximately halving the training time compared to the baseline models. In addition, we compared 99 combinations of chaining 21 techniques, and found minor, although statistically significant, improvements across models when adding noise to SMOTE-ENN (+0.014). Notably, some augmentation techniques significantly lowered predictive performance or increased performance fluctuation related to random chance. This paper's contribution is twofold. Primarily, our empirical findings show that sampling techniques provide the most statistically reliable performance improvements for LA applications of SML, and are computationally more efficient than deep generation methods with complex hyperparameter settings. Second, the LA community may benefit from validating a recent study through independent replication.
Authors: Md Hafizur Rahman, Md Mashfiq Rizvee, Sumaiya Shomaji, Prabuddha Chakraborty
Abstract: Artificial intelligence (AI) is widely used in various fields including healthcare, autonomous vehicles, robotics, traffic monitoring, and agriculture. Many modern AI applications in these fields are multi-tasking in nature (i.e. perform multiple analysis on same data) and are deployed on resource-constrained edge devices requiring the AI models to be efficient across different metrics such as power, frame rate, and size. For these specific use-cases, in this work, we propose a new paradigm of neural network architecture (ILASH) that leverages a layer sharing concept for minimizing power utilization, increasing frame rate, and reducing model size. Additionally, we propose a novel neural network architecture search framework (ILASH-NAS) for efficient construction of these neural network models for a given set of tasks and device constraints. The proposed NAS framework utilizes a data-driven intelligent approach to make the search efficient in terms of energy, time, and CO2 emission. We perform extensive evaluations of the proposed layer shared architecture paradigm (ILASH) and the ILASH-NAS framework using four open-source datasets (UTKFace, MTFL, CelebA, and Taskonomy). We compare ILASH-NAS with AutoKeras and observe significant improvement in terms of both the generated model performance and neural search efficiency with up to 16x less energy utilization, CO2 emission, and training/search time.
Authors: L. G. A dos Reis, V. L. P. S. Caminha, T. J. P. Penna
Abstract: Symbolic regression is a machine learning technique, and it has seen many advancements in recent years, especially in genetic programming approaches (GPSR). Furthermore, it has been known for many years that constant optimization of parameters, during the evolutionary search, greatly increases GPSR performance However, different authors approach such tasks differently and no consensus exists regarding which methods perform best. In this work, we evaluate eight different parameter optimization methods, applied during evolutionary search, over ten known benchmark problems, in two different scenarios. We also propose using an under-explored metric called Tree Edit Distance (TED), aiming to identify symbolic accuracy. In conjunction with classical error measures, we develop a combined analysis of model performance in symbolic regression. We then show that different constant optimization methods perform better in certain scenarios and that there is no overall best choice for every problem. Finally, we discuss how common metric decisions may be biased and appear to generate better models in comparison.
Authors: Abulikemu Abuduweili, Changliu Liu
Abstract: Adaptive gradient optimization methods, such as Adam, are prevalent in training deep neural networks across diverse machine learning tasks due to their ability to achieve faster convergence. However, these methods often suffer from suboptimal generalization compared to stochastic gradient descent (SGD) and exhibit instability, particularly when training Transformer models. In this work, we show the standard initialization of the second-order moment estimation ($v_0 =0$) as a significant factor contributing to these limitations. We introduce simple yet effective solutions: initializing the second-order moment estimation with non-zero values, using either data-driven or random initialization strategies. Empirical evaluations demonstrate that our approach not only stabilizes convergence but also enhances the final performance of adaptive gradient optimizers. Furthermore, by adopting the proposed initialization strategies, Adam achieves performance comparable to many recently proposed variants of adaptive gradient optimization methods, highlighting the practical impact of this straightforward modification.
Authors: Xiaojie Yang, Hangli Ge, Jiawei Wang, Zipei Fan, Renhe Jiang, Ryosuke Shibasaki, Noboru Koshizuka
Abstract: Large-scale human mobility exhibits spatial and temporal patterns that can assist policymakers in decision making. Although traditional prediction models attempt to capture these patterns, they often interfered by non-periodic public events, such as disasters and occasional celebrations. Since regular human mobility patterns are heavily affected by these events, estimating their causal effects is critical to accurate mobility predictions. Although news articles provide unique perspectives on these events in an unstructured format, processing is a challenge. In this study, we propose a causality-augmented prediction model, called \textbf{CausalMob}, to analyze the causal effects of public events. We first utilize large language models (LLMs) to extract human intentions from news articles and transform them into features that act as causal treatments. Next, the model learns representations of spatio-temporal regional covariates from multiple data sources to serve as confounders for causal inference. Finally, we present a causal effect estimation framework to ensure event features remain independent of confounders during prediction. Based on large-scale real-world data, the experimental results show that the proposed model excels in human mobility prediction, outperforming state-of-the-art models.
Authors: Tony T. Wang, John Hughes, Henry Sleight, Rylan Schaeffer, Rajashree Agrawal, Fazl Barez, Mrinank Sharma, Jesse Mu, Nir Shavit, Ethan Perez
Abstract: Defending large language models against jailbreaks so that they never engage in a broadly-defined set of forbidden behaviors is an open problem. In this paper, we investigate the difficulty of jailbreak-defense when we only want to forbid a narrowly-defined set of behaviors. As a case study, we focus on preventing an LLM from helping a user make a bomb. We find that popular defenses such as safety training, adversarial training, and input/output classifiers are unable to fully solve this problem. In pursuit of a better solution, we develop a transcript-classifier defense which outperforms the baseline defenses we test. However, our classifier defense still fails in some circumstances, which highlights the difficulty of jailbreak-defense even in a narrow domain.
Authors: Dongkwan Kim, Alice Oh
Abstract: Subgraph representation learning has been effective in solving various real-world problems. However, current graph neural networks (GNNs) produce suboptimal results for subgraph-level tasks due to their inability to capture complex interactions within and between subgraphs. To provide a more expressive and efficient alternative, we propose WLKS, a Weisfeiler-Lehman (WL) kernel generalized for subgraphs by applying the WL algorithm on induced $k$-hop neighborhoods. We combine kernels across different $k$-hop levels to capture richer structural information that is not fully encoded in existing models. Our approach can balance expressiveness and efficiency by eliminating the need for neighborhood sampling. In experiments on eight real-world and synthetic benchmarks, WLKS significantly outperforms leading approaches on five datasets while reducing training time, ranging from 0.01x to 0.25x compared to the state-of-the-art.
Authors: Weiche Hsieh, Ziqian Bi, Keyu Chen, Benji Peng, Sen Zhang, Jiawei Xu, Jinlang Wang, Caitlyn Heqi Yin, Yichao Zhang, Pohsun Feng, Yizhu Wen, Tianyang Wang, Ming Li, Chia Xin Liang, Jintao Ren, Qian Niu, Silin Chen, Lawrence K. Q. Yan, Han Xu, Hong-Ming Tseng, Xinyuan Song, Bowen Jing, Junjie Yang, Junhao Song, Junyu Liu, Ming Liu
Abstract: Advancements in artificial intelligence, machine learning, and deep learning have catalyzed the transformation of big data analytics and management into pivotal domains for research and application. This work explores the theoretical foundations, methodological advancements, and practical implementations of these technologies, emphasizing their role in uncovering actionable insights from massive, high-dimensional datasets. The study presents a systematic overview of data preprocessing techniques, including data cleaning, normalization, integration, and dimensionality reduction, to prepare raw data for analysis. Core analytics methodologies such as classification, clustering, regression, and anomaly detection are examined, with a focus on algorithmic innovation and scalability. Furthermore, the text delves into state-of-the-art frameworks for data mining and predictive modeling, highlighting the role of neural networks, support vector machines, and ensemble methods in tackling complex analytical challenges. Special emphasis is placed on the convergence of big data with distributed computing paradigms, including cloud and edge computing, to address challenges in storage, computation, and real-time analytics. The integration of ethical considerations, including data privacy and compliance with global standards, ensures a holistic perspective on data management. Practical applications across healthcare, finance, marketing, and policy-making illustrate the real-world impact of these technologies. Through comprehensive case studies and Python-based implementations, this work equips researchers, practitioners, and data enthusiasts with the tools to navigate the complexities of modern data analytics. It bridges the gap between theory and practice, fostering the development of innovative solutions for managing and leveraging data in the era of artificial intelligence.
Authors: Guanghui Zhu, Zipeng Ji, Jingyan Chen, Limin Wang, Chunfeng Yuan, Yihua Huang
Abstract: GNAS (Graph Neural Architecture Search) has demonstrated great effectiveness in automatically designing the optimal graph neural architectures for multiple downstream tasks, such as node classification and link prediction. However, most existing GNAS methods cannot efficiently handle large-scale graphs containing more than million-scale nodes and edges due to the expensive computational and memory overhead. To scale GNAS on large graphs while achieving better performance, we propose SA-GNAS, a novel framework based on seed architecture expansion for efficient large-scale GNAS. Similar to the cell expansion in biotechnology, we first construct a seed architecture and then expand the seed architecture iteratively. Specifically, we first propose a performance ranking consistency-based seed architecture selection method, which selects the architecture searched on the subgraph that best matches the original large-scale graph. Then, we propose an entropy minimization-based seed architecture expansion method to further improve the performance of the seed architecture. Extensive experimental results on five large-scale graphs demonstrate that the proposed SA-GNAS outperforms human-designed state-of-the-art GNN architectures and existing graph NAS methods. Moreover, SA-GNAS can significantly reduce the search time, showing better search efficiency. For the largest graph with billion edges, SA-GNAS can achieve 2.8 times speedup compared to the SOTA large-scale GNAS method GAUSS. Additionally, since SA-GNAS is inherently parallelized, the search efficiency can be further improved with more GPUs. SA-GNAS is available at https://github.com/PasaLab/SAGNAS.
Authors: Yaxin Liang, Xinshi Li, Xin Huang, Ziqi Zhang, Yue Yao
Abstract: This study proposes an automated data mining framework based on autoencoders and experimentally verifies its effectiveness in feature extraction and data dimensionality reduction. Through the encoding-decoding structure, the autoencoder can capture the data's potential characteristics and achieve noise reduction and anomaly detection, providing an efficient and stable solution for the data mining process. The experiment compared the performance of the autoencoder with traditional dimensionality reduction methods (such as PCA, FA, T-SNE, and UMAP). The results showed that the autoencoder performed best in terms of reconstruction error and root mean square error and could better retain data structure and enhance the generalization ability of the model. The autoencoder-based framework not only reduces manual intervention but also significantly improves the automation of data processing. In the future, with the advancement of deep learning and big data technology, the autoencoder method combined with a generative adversarial network (GAN) or graph neural network (GNN) is expected to be more widely used in the fields of complex data processing, real-time data analysis and intelligent decision-making.
Authors: Ayan Banerjee, Sandeep K. S. Gupta
Abstract: Recovering a physics-driven model, i.e. a governing set of equations of the underlying dynamical systems, from the real-world data has been of recent interest. Most existing methods either operate on simulation data with unrealistically high sampling rates or require explicit measurements of all system variables, which is not amenable in real-world deployments. Moreover, they assume the timestamps of external perturbations to the physical system are known a priori, without uncertainty, implicitly discounting any sensor time-synchronization or human reporting errors. In this paper, we propose a novel liquid time constant neural network (LTC-NN) based architecture to recover underlying model of physical dynamics from real-world data. The automatic differentiation property of LTC-NN nodes overcomes problems associated with low sampling rates, the input dependent time constant in the forward pass of the hidden layer of LTC-NN nodes creates a massive search space of implicit physical dynamics, the physics model solver based data reconstruction loss guides the search for the correct set of implicit dynamics, and the use of the dropout regularization in the dense layer ensures extraction of the sparsest model. Further, to account for the perturbation timing error, we utilize dense layer nodes to search through input shifts that results in the lowest reconstruction loss. Experiments on four benchmark dynamical systems, three with simulation data and one with the real-world data show that the LTC-NN architecture is more accurate in recovering implicit physics model coefficients than the state-of-the-art sparse model recovery approaches. We also introduce four additional case studies (total eight) on real-life medical examples in simulation and with real-world clinical data to show effectiveness of our approach in recovering underlying model in practice.
Authors: Zhongnian Li, Meng Wei, Peng Ying, Tongfeng Sun, Xinzheng Xu
Abstract: Annotating data for sensitive labels (e.g., disease, smoking) poses a potential threats to individual privacy in many real-world scenarios. To cope with this problem, we propose a novel setting to protect privacy of each instance, namely learning from concealed labels for multi-class classification. Concealed labels prevent sensitive labels from appearing in the label set during the label collection stage, which specifies none and some random sampled insensitive labels as concealed labels set to annotate sensitive data. In this paper, an unbiased estimator can be established from concealed data under mild assumptions, and the learned multi-class classifier can not only classify the instance from insensitive labels accurately but also recognize the instance from the sensitive labels. Moreover, we bound the estimation error and show that the multi-class classifier achieves the optimal parametric convergence rate. Experiments demonstrate the significance and effectiveness of the proposed method for concealed labels in synthetic and real-world datasets.
Authors: Zhongnian Li, Meng Wei, Peng Ying, Xinzheng Xu
Abstract: Learning from Multi-Positive and Unlabeled (MPU) data has gradually attracted significant attention from practical applications. Unfortunately, the risk of MPU also suffer from the shift of minimum risk, particularly when the models are very flexible as shown in Fig.\ref{moti}. In this paper, to alleviate the shifting of minimum risk problem, we propose an Example Sieve Approach (ESA) to select examples for training a multi-class classifier. Specifically, we sieve out some examples by utilizing the Certain Loss (CL) value of each example in the training stage and analyze the consistency of the proposed risk estimator. Besides, we show that the estimation error of proposed ESA obtains the optimal parametric convergence rate. Extensive experiments on various real-world datasets show the proposed approach outperforms previous methods.
Authors: Yushuai Ji, Zepeng Liu, Sheng Wang, Yuan Sun, Zhiyong Peng
Abstract: The k-means algorithm can simplify large-scale spatial vectors, such as 2D geo-locations and 3D point clouds, to support fast analytics and learning. However, when processing large-scale datasets, existing k-means algorithms have been developed to achieve high performance with significant computational resources, such as memory and CPU usage time. These algorithms, though effective, are not well-suited for resource-constrained devices. In this paper, we propose a fast, memory-efficient, and cost-predictable k-means called Dask-means. We first accelerate k-means by designing a memory-efficient accelerator, which utilizes an optimized nearest neighbor search over a memory-tunable index to assign spatial vectors to clusters in batches. We then design a lightweight cost estimator to predict the memory cost and runtime of the k-means task, allowing it to request appropriate memory from devices or adjust the accelerator's required space to meet memory constraints, and ensure sufficient CPU time for running k-means. Experiments show that when simplifying datasets with scale such as $10^6$, Dask-means uses less than $30$MB of memory, achieves over $168$ times speedup compared to the widely-used Lloyd's algorithm. We also validate Dask-means on mobile devices, where it demonstrates significant speedup and low memory cost compared to other state-of-the-art (SOTA) k-means algorithms. Our cost estimator estimates the memory cost with a difference of less than $3\%$ from the actual ones and predicts runtime with an MSE up to $33.3\%$ lower than SOTA methods.
Authors: Jan Kadel, August See, Ritwik Sinha, Mathias Fischer
Abstract: Bots constitute a significant portion of Internet traffic and are a source of various issues across multiple domains. Modern bots often become indistinguishable from real users, as they employ similar methods to browse the web, including using real browsers. We address the challenge of bot detection in high-traffic scenarios by analyzing three distinct detection methods. The first method operates on heuristics, allowing for rapid detection. The second method utilizes, well known, technical features, such as IP address, window size, and user agent. It serves primarily for comparison with the third method. In the third method, we rely solely on browsing behavior, omitting all static features and focusing exclusively on how clients behave on a website. In contrast to related work, we evaluate our approaches using real-world e-commerce traffic data, comprising 40 million monthly page visits. We further compare our methods against another bot detection approach, Botcha, on the same dataset. Our performance metrics, including precision, recall, and AUC, reach 98 percent or higher, surpassing Botcha.
Authors: Lillian Muyama, Estelle Lu, Geoffrey Cheminet, Jacques Pouchot, Bastien Rance, Anne-Isabelle Tropeano, Antoine Neuraz, Adrien Coulet
Abstract: Clinical diagnostic guidelines outline the key questions to answer to reach a diagnosis. Inspired by guidelines, we aim to develop a model that learns from electronic health records to determine the optimal sequence of actions for accurate diagnosis. Focusing on anemia and its sub-types, we employ deep reinforcement learning (DRL) algorithms and evaluate their performance on both a synthetic dataset, which is based on expert-defined diagnostic pathways, and a real-world dataset. We investigate the performance of these algorithms across various scenarios. Our experimental results demonstrate that DRL algorithms perform competitively with state-of-the-art methods while offering the significant advantage of progressively generating pathways to the suggested diagnosis, providing a transparent decision-making process that can guide and explain diagnostic reasoning.
Authors: Lei Yu, Hongyang Chen, Jingsong Lv, Linyao Yang
Abstract: Graph Transformers (GTs) have demonstrated significant advantages in graph representation learning through their global attention mechanisms. However, the self-attention mechanism in GTs tends to neglect the inductive biases inherent in graph structures, making it chanllenging to effectively capture essential structural information. To address this issue, we propose a novel approach that integrate graph inductive bias into self-attention mechanisms by leveraging quantum technology for structural encoding. In this paper, we introduce the Graph Quantum Walk Transformer (GQWformer), a groundbreaking GNN framework that utilizes quantum walks on attributed graphs to generate node quantum states. These quantum states encapsulate rich structural attributes and serve as inductive biases for the transformer, thereby enabling the generation of more meaningful attention scores. By subsequently incorporating a recurrent neural network, our design amplifies the model's ability to focus on both local and global information. We conducted comprehensive experiments across five publicly available datasets to evaluate the effectiveness of our model. These results clearly indicate that GQWformer outperforms existing state-of-the-art graph classification algorithms. These findings highlight the significant potential of integrating quantum computing methodologies with traditional GNNs to advance the field of graph representation learning, providing a promising direction for future research and applications.
Authors: Roberto Pereira, Cristian J. Vaca-Rubio, Luis Blanco
Abstract: Federated Learning (FL) has emerged as a solution for distributed model training across decentralized, privacy-preserving devices, but the different energy capacities of participating devices (system heterogeneity) constrain real-world implementations. These energy limitations not only reduce model accuracy but also increase dropout rates, impacting on convergence in practical FL deployments. In this work, we propose LeanFed, an energy-aware FL framework designed to optimize client selection and training workloads on battery-constrained devices. LeanFed leverages adaptive data usage by dynamically adjusting the fraction of local data each device utilizes during training, thereby maximizing device participation across communication rounds while ensuring they do not run out of battery during the process. We rigorously evaluate LeanFed against traditional FedAvg on CIFAR-10 and CIFAR-100 datasets, simulating various levels of data heterogeneity and device participation rates. Results show that LeanFed consistently enhances model accuracy and stability, particularly in settings with high data heterogeneity and limited battery life, by mitigating client dropout and extending device availability. This approach demonstrates the potential of energy-efficient, privacy-preserving FL in real-world, large-scale applications, setting a foundation for robust and sustainable pervasive AI on resource-constrained networks.
Authors: Yao Lyu, Xiangteng Zhang, Shengbo Eben Li, Jingliang Duan, Letian Tao, Qing Xu, Lei He, Keqiang Li
Abstract: Training deep reinforcement learning (RL) agents necessitates overcoming the highly unstable nonconvex stochastic optimization inherent in the trial-and-error mechanism. To tackle this challenge, we propose a physics-inspired optimization algorithm called relativistic adaptive gradient descent (RAD), which enhances long-term training stability. By conceptualizing neural network (NN) training as the evolution of a conformal Hamiltonian system, we present a universal framework for transferring long-term stability from conformal symplectic integrators to iterative NN updating rules, where the choice of kinetic energy governs the dynamical properties of resulting optimization algorithms. By utilizing relativistic kinetic energy, RAD incorporates principles from special relativity and limits parameter updates below a finite speed, effectively mitigating abnormal gradient influences. Additionally, RAD models NN optimization as the evolution of a multi-particle system where each trainable parameter acts as an independent particle with an individual adaptive learning rate. We prove RAD's sublinear convergence under general nonconvex settings, where smaller gradient variance and larger batch sizes contribute to tighter convergence. Notably, RAD degrades to the well-known adaptive moment estimation (ADAM) algorithm when its speed coefficient is chosen as one and symplectic factor as a small positive value. Experimental results show RAD outperforming nine baseline optimizers with five RL algorithms across twelve environments, including standard benchmarks and challenging scenarios. Notably, RAD achieves up to a 155.1% performance improvement over ADAM in Atari games, showcasing its efficacy in stabilizing and accelerating RL training.
Authors: Guang Wu, Yun Wang, Qian Zhou, Ziyang Zhang
Abstract: Accurate photovoltaic (PV) power forecasting is critical for integrating renewable energy sources into the grid, optimizing real-time energy management, and ensuring energy reliability amidst increasing demand. However, existing models often struggle with effectively capturing the complex relationships between target variables and covariates, as well as the interactions between temporal dynamics and multivariate data, leading to suboptimal forecasting accuracy. To address these challenges, we propose a novel model architecture that leverages the iTransformer for feature extraction from target variables and employs long short-term memory (LSTM) to extract features from covariates. A cross-attention mechanism is integrated to fuse the outputs of both models, followed by a Kolmogorov-Arnold network (KAN) mapping for enhanced representation. The effectiveness of the proposed model is validated using publicly available datasets from Australia, with experiments conducted across four seasons. Results demonstrate that the proposed model effectively capture seasonal variations in PV power generation and improve forecasting accuracy.
Authors: Jiamian Hu, Yuanyuan Hong, Yihua Chen, He Wang, Moriaki Yasuhara
Abstract: We present the Noisy Ostracods, a noisy dataset for genus and species classification of crustacean ostracods with specialists' annotations. Over the 71466 specimens collected, 5.58% of them are estimated to be noisy (possibly problematic) at genus level. The dataset is created to addressing a real-world challenge: creating a clean fine-grained taxonomy dataset. The Noisy Ostracods dataset has diverse noises from multiple sources. Firstly, the noise is open-set, including new classes discovered during curation that were not part of the original annotation. The dataset has pseudo-classes, where annotators misclassified samples that should belong to an existing class into a new pseudo-class. The Noisy Ostracods dataset is highly imbalanced with a imbalance factor $\rho$ = 22429. This presents a unique challenge for robust machine learning methods, as existing approaches have not been extensively evaluated on fine-grained classification tasks with such diverse real-world noise. Initial experiments using current robust learning techniques have not yielded significant performance improvements on the Noisy Ostracods dataset compared to cross-entropy training on the raw, noisy data. On the other hand, noise detection methods have underperformed in error hit rate compared to naive cross-validation ensembling for identifying problematic labels. These findings suggest that the fine-grained, imbalanced nature, and complex noise characteristics of the dataset present considerable challenges for existing noise-robust algorithms. By openly releasing the Noisy Ostracods dataset, our goal is to encourage further research into the development of noise-resilient machine learning methods capable of effectively handling diverse, real-world noise in fine-grained classification tasks. The dataset, along with its evaluation protocols, can be accessed at https://github.com/H-Jamieu/Noisy_ostracods.
Authors: Jamie McGowan, Wei Sheng Lai, Weibin Chen, Henry Aldridge, Jools Clarke, Jezabel Garcia, Rui Xia, Yilei Liang, Guillaume Hennequin, Alberto Bernacchia
Abstract: In many domains, the most successful AI models tend to be the largest, indeed often too large to be handled by AI players with limited computational resources. To mitigate this, a number of compression methods have been developed, including methods that prune the network down to high sparsity whilst retaining performance. The best-performing pruning techniques are often those that use second-order curvature information (such as an estimate of the Fisher information matrix) to score the importance of each weight and to predict the optimal compensation for weight deletion. However, these methods are difficult to scale to high-dimensional parameter spaces without making heavy approximations. Here, we propose the FishLeg surgeon (FLS), a new second-order pruning method based on the Fisher-Legendre (FishLeg) optimizer. At the heart of FishLeg is a meta-learning approach to amortising the action of the inverse FIM, which brings a number of advantages. Firstly, the parameterisation enables the use of flexible tensor factorisation techniques to improve computational and memory efficiency without sacrificing much accuracy, alleviating challenges associated with scalability of most second-order pruning methods. Secondly, directly estimating the inverse FIM leads to less sensitivity to the amplification of stochasticity during inversion, thereby resulting in more precise estimates. Thirdly, our approach also allows for progressive assimilation of the curvature into the parameterisation. In the gradual pruning regime, this results in a more efficient estimate refinement as opposed to re-estimation. We find that FishLeg achieves higher or comparable performance against two common baselines in the area, most notably in the high sparsity regime when considering a ResNet18 model on CIFAR-10 (84% accuracy at 95% sparsity vs 60% for OBS) and TinyIM (53% accuracy at 80% sparsity vs 48% for OBS).
Authors: Harish Srinivas, Graham Cormode, Mehrdad Honarkhah, Samuel Lurye, Jonathan Hehir, Lunwen He, George Hong, Ahmed Magdy, Dzmitry Huba, Kaikai Wang, Shen Guo, Shoubhik Bhattacharya
Abstract: Cross-device Federated Analytics (FA) is a distributed computation paradigm designed to answer analytics queries about and derive insights from data held locally on users' devices. On-device computations combined with other privacy and security measures ensure that only minimal data is transmitted off-device, achieving a high standard of data protection. Despite FA's broad relevance, the applicability of existing FA systems is limited by compromised accuracy; lack of flexibility for data analytics; and an inability to scale effectively. In this paper, we describe our approach to combine privacy, scalability, and practicality to build and deploy a system that overcomes these limitations. Our FA system leverages trusted execution environments (TEEs) and optimizes the use of on-device computing resources to facilitate federated data processing across large fleets of devices, while ensuring robust, defensible, and verifiable privacy safeguards. We focus on federated analytics (statistics and monitoring), in contrast to systems for federated learning (ML workloads), and we flag the key differences.
Authors: Ethan Smith, Rami Seid, Alberto Hojel, Paramita Mishra, Jianbo Wu
Abstract: Low-Rank Adaptation (LoRA) and other parameter-efficient fine-tuning (PEFT) methods provide low-memory, storage-efficient solutions for personalizing text-to-image models. However, these methods offer little to no improvement in wall-clock training time or the number of steps needed for convergence compared to full model fine-tuning. While PEFT methods assume that shifts in generated distributions (from base to fine-tuned models) can be effectively modeled through weight changes in a low-rank subspace, they fail to leverage knowledge of common use cases, which typically focus on capturing specific styles or identities. Observing that desired outputs often comprise only a small subset of the possible domain covered by LoRA training, we propose reducing the search space by incorporating a prior over regions of interest. We demonstrate that training a hypernetwork model to generate LoRA weights can achieve competitive quality for specific domains while enabling near-instantaneous conditioning on user input, in contrast to traditional training methods that require thousands of steps.
Authors: Adam Wr\'obel, Miko{\l}aj Janusz, Bartosz Zieli\'nski, Dawid Rymarczyk
Abstract: Deep Learning (DL) models are often black boxes, making their decision-making processes difficult to interpret. This lack of transparency has driven advancements in eXplainable Artificial Intelligence (XAI), a field dedicated to clarifying the reasoning behind DL model predictions. Among these, attribution-based methods such as LRP and GradCAM are widely used, though they rely on approximations that can be imprecise. To address these limitations, we introduce One Matrix to Explain Neural Networks (OMENN), a novel post-hoc method that represents a neural network as a single, interpretable matrix for each specific input. This matrix is constructed through a series of linear transformations that represent the processing of the input by each successive layer in the neural network. As a result, OMENN provides locally precise, attribution-based explanations of the input across various modern models, including ViTs and CNNs. We present a theoretical analysis of OMENN based on dynamic linearity property and validate its effectiveness with extensive tests on two XAI benchmarks, demonstrating that OMENN is competitive with state-of-the-art methods.
Authors: Tom White
Abstract: We present VISTA (Visualization of Internal States and Their Associations), a novel pipeline for visually exploring and interpreting neural network representations. VISTA addresses the challenge of analyzing vast multidimensional spaces in modern machine learning models by mapping representations into a semantic 2D space. The resulting collages visually reveal patterns and relationships within internal representations. We demonstrate VISTA's utility by applying it to sparse autoencoder latents uncovering new properties and interpretations. We review the VISTA methodology, present findings from our case study ( https://got.drib.net/latents/ ), and discuss implications for neural network interpretability across various domains of machine learning.
Authors: Reihaneh Torkzadehmahani, Reza Nasirigerdeh, Georgios Kaissis, Daniel Rueckert, Gintare Karolina Dziugaite, Eleni Triantafillou
Abstract: Machine unlearning refers to removing the influence of a specified subset of training data from a machine learning model, efficiently, after it has already been trained. This is important for key applications, including making the model more accurate by removing outdated, mislabeled, or poisoned data. In this work, we study localized unlearning, where the unlearning algorithm operates on a (small) identified subset of parameters. Drawing inspiration from the memorization literature, we propose an improved localization strategy that yields strong results when paired with existing unlearning algorithms. We also propose a new unlearning algorithm, Deletion by Example Localization (DEL), that resets the parameters deemed-to-be most critical according to our localization strategy, and then finetunes them. Our extensive experiments on different datasets, forget sets and metrics reveal that DEL sets a new state-of-the-art for unlearning metrics, against both localized and full-parameter methods, while modifying a small subset of parameters, and outperforms the state-of-the-art localized unlearning in terms of test accuracy too.
Authors: Tejumade Afonja, Hui-Po Wang, Raouf Kerkouche, Mario Fritz
Abstract: Generating tabular data under differential privacy (DP) protection ensures theoretical privacy guarantees but poses challenges for training machine learning models, primarily due to the need to capture complex structures under noisy supervision signals. Recently, pre-trained Large Language Models (LLMs) -- even those at the scale of GPT-2 -- have demonstrated great potential in synthesizing tabular data. However, their applications under DP constraints remain largely unexplored. In this work, we address this gap by applying DP techniques to the generation of synthetic tabular data. Our findings shows that LLMs face difficulties in generating coherent text when fine-tuned with DP, as privacy budgets are inefficiently allocated to non-private elements like table structures. To overcome this, we propose \ours, a two-stage fine-tuning framework for differentially private tabular data generation. The first stage involves non-private fine-tuning on a pseudo dataset, followed by DP fine-tuning on a private dataset. Our empirical results show that this approach improves performance across various settings and metrics compared to directly fine-tuned LLMs in DP contexts. We release our code and setup at https://github.com/tejuafonja/DP-2Stage.
Authors: Haojie Wang, Zhe Zhang, Haotian Gao, Xiangying Zhang, Zhihang Chen, Xinchong Chen, Yifei Qi, Yan Li, Renxiao Wang
Abstract: Identifying the interaction targets of bioactive compounds is a foundational element for deciphering their pharmacological effects. Target prediction algorithms equip researchers with an effective tool to rapidly scope and explore potential targets. Here, we introduce the COMET, a multi-technological modular target prediction tool that provides comprehensive predictive insights, including similar active compounds, three-dimensional predicted binding modes, and probability scores, all within an average processing time of less than 10 minutes per task. With meticulously curated data, the COMET database encompasses 990,944 drug-target interaction pairs and 45,035 binding pockets, enabling predictions for 2,685 targets, which span confirmed and exploratory therapeutic targets for human diseases. In comparative testing using datasets from ChEMBL and BindingDB, COMET outperformed five other well-known algorithms, offering nearly an 80% probability of accurately identifying at least one true target within the top 15 predictions for a given compound. COMET also features a user-friendly web server, accessible freely at https://www.pdbbind-plus.org.cn/comet.
Authors: \.Ilter Onat Korkmaz, Ya\c{s}ar Cahit Y{\i}ld{\i}r{\i}m, \c{C}a\u{g}{\i}n Ararat, Cem Tekin
Abstract: Learning problems in which multiple conflicting objectives must be considered simultaneously often arise in various fields, including engineering, drug design, and environmental management. Traditional methods for dealing with multiple black-box objective functions, such as scalarization and identification of the Pareto set under the componentwise order, have limitations in incorporating objective preferences and exploring the solution space accordingly. While vector optimization offers improved flexibility and adaptability via specifying partial orders based on ordering cones, current techniques designed for sequential experiments either suffer from high sample complexity or lack theoretical guarantees. To address these issues, we propose Vector Optimization with Gaussian Process (VOGP), a probably approximately correct adaptive elimination algorithm that performs black-box vector optimization using Gaussian process bandits. VOGP allows users to convey objective preferences through ordering cones while performing efficient sampling by exploiting the smoothness of the objective function, resulting in a more effective optimization process that requires fewer evaluations. We establish theoretical guarantees for VOGP and derive information gain-based and kernel-specific sample complexity bounds. We also conduct experiments on both real-world and synthetic datasets to compare VOGP with the state-of-the-art methods.
Authors: Hao Chen, Han Tao, Guo Song, Jie Zhang, Yunlong Yu, Yonghan Dong, Chuang Yang, Lei Bai
Abstract: Atmospheric science is intricately connected with other fields, e.g., geography and aerospace. Most existing approaches involve training a joint atmospheric and geographic model from scratch, which incurs significant computational costs and overlooks the potential for incremental learning of weather variables across different domains. In this paper, we introduce incremental learning to weather forecasting and propose a novel structure that allows for the flexible expansion of variables within the model. Specifically, our method presents a Channel-Adapted MoE (CA-MoE) that employs a divide-and-conquer strategy. This strategy assigns variable training tasks to different experts by index embedding and reduces computational complexity through a channel-wise Top-K strategy. Experiments conducted on the widely utilized ERA5 dataset reveal that our method, utilizing only approximately 15\% of trainable parameters during the incremental stage, attains performance that is on par with state-of-the-art competitors. Notably, in the context of variable incremental experiments, our method demonstrates negligible issues with catastrophic forgetting.
Authors: Hanyu Zhang, Chuck Arvin, Dmitry Efimov, Michael W. Mahoney, Dominique Perrault-Joncas, Shankar Ramasubramanian, Andrew Gordon Wilson, Malcolm Wolff
Abstract: Modern time-series forecasting models often fail to make full use of rich unstructured information about the time series themselves. This lack of proper conditioning can lead to obvious model failures; for example, models may be unaware of the details of a particular product, and hence fail to anticipate seasonal surges in customer demand in the lead up to major exogenous events like holidays for clearly relevant products. To address this shortcoming, this paper introduces a novel forecast post-processor -- which we call LLMForecaster -- that fine-tunes large language models (LLMs) to incorporate unstructured semantic and contextual information and historical data to improve the forecasts from an existing demand forecasting pipeline. In an industry-scale retail application, we demonstrate that our technique yields statistically significantly forecast improvements across several sets of products subject to holiday-driven demand surges.
Authors: Nicol\'as Garc\'ia Trillos, Aditya Kumar Akash, Sixu Li, Konstantin Riedl, Yuhua Zhu
Abstract: Adversarial attacks pose significant challenges in many machine learning applications, particularly in the setting of distributed training and federated learning, where malicious agents seek to corrupt the training process with the goal of jeopardizing and compromising the performance and reliability of the final models. In this paper, we address the problem of robust federated learning in the presence of such attacks by formulating the training task as a bi-level optimization problem. We conduct a theoretical analysis of the resilience of consensus-based bi-level optimization (CB$^2$O), an interacting multi-particle metaheuristic optimization method, in adversarial settings. Specifically, we provide a global convergence analysis of CB$^2$O in mean-field law in the presence of malicious agents, demonstrating the robustness of CB$^2$O against a diverse range of attacks. Thereby, we offer insights into how specific hyperparameter choices enable to mitigate adversarial effects. On the practical side, we extend CB$^2$O to the clustered federated learning setting by proposing FedCB$^2$O, a novel interacting multi-particle system, and design a practical algorithm that addresses the demands of real-world applications. Extensive experiments demonstrate the robustness of the FedCB$^2$O algorithm against label-flipping attacks in decentralized clustered federated learning scenarios, showcasing its effectiveness in practical contexts.
Authors: Andrei Lixandru, Marcel van Gerven, Sergio Pequito
Abstract: Distributed optimization is fundamental to modern machine learning applications like federated learning, but existing methods often struggle with ill-conditioned problems and face stability-versus-speed tradeoffs. We introduce fractional order distributed optimization (FrODO); a theoretically-grounded framework that incorporates fractional-order memory terms to enhance convergence properties in challenging optimization landscapes. Our approach achieves provable linear convergence for any strongly connected network. Through empirical validation, our results suggest that FrODO achieves up to 4 times faster convergence versus baselines on ill-conditioned problems and 2-3 times speedup in federated neural network training, while maintaining stability and theoretical guarantees.
Authors: Hillary Yang
Abstract: Linear regression is a fundamental tool for statistical analysis, which has motivated the development of linear regression methods that satisfy provable privacy guarantees so that the learned model reveals little about any one data point used to construct it. Most existing privacy-preserving linear regression methods rely on the well-established framework of differential privacy, while the newly proposed PAC Privacy has not yet been explored in this context. In this paper, we systematically compare linear regression models trained with differential privacy and PAC privacy across three real-world datasets, observing several key findings that impact the performance of privacy-preserving linear regression.
Authors: Jacob Marks, Brent A. Griffin, Jason J. Corso
Abstract: We introduce a new framework for analyzing classification datasets based on the ratios of reconstruction errors between autoencoders trained on individual classes. This analysis framework enables efficient characterization of datasets on the sample, class, and entire dataset levels. We define reconstruction error ratios (RERs) that probe classification difficulty and allow its decomposition into (1) finite sample size and (2) Bayes error and decision-boundary complexity. Through systematic study across 19 popular visual datasets, we find that our RER-based dataset difficulty probe strongly correlates with error rate for state-of-the-art (SOTA) classification models. By interpreting sample-level classification difficulty as a label mistakenness score, we further find that RERs achieve SOTA performance on mislabel detection tasks on hard datasets under symmetric and asymmetric label noise. Our code is publicly available at https://github.com/voxel51/reconstruction-error-ratios.
URLs: https://github.com/voxel51/reconstruction-error-ratios.
Authors: Saurab Chhachhi, Fei Teng
Abstract: Data is an increasingly vital component of decision making processes across industries. However, data access raises privacy concerns motivating the need for privacy-preserving techniques such as differential privacy. Data markets provide a means to enable wider access as well as determine the appropriate privacy-utility trade-off. Existing data market frameworks either require a trusted third party to perform computationally expensive valuations or are unable to capture the combinatorial nature of data value and do not endogenously model the effect of differential privacy. This paper addresses these shortcomings by proposing a valuation mechanism based on the Wasserstein distance for differentially-private data, and corresponding procurement mechanisms by leveraging incentive mechanism design theory, for task-agnostic data procurement, and task-specific procurement co-optimisation. The mechanisms are reformulated into tractable mixed-integer second-order cone programs, which are validated with numerical studies.
Authors: Hiroki Furuta, Heiga Zen, Dale Schuurmans, Aleksandra Faust, Yutaka Matsuo, Percy Liang, Sherry Yang
Abstract: Large text-to-video models hold immense potential for a wide range of downstream applications. However, these models struggle to accurately depict dynamic object interactions, often resulting in unrealistic movements and frequent violations of real-world physics. One solution inspired by large language models is to align generated outputs with desired outcomes using external feedback. This enables the model to refine its responses autonomously, eliminating extensive manual data collection. In this work, we investigate the use of feedback to enhance the object dynamics in text-to-video models. We aim to answer a critical question: what types of feedback, paired with which specific self-improvement algorithms, can most effectively improve text-video alignment and realistic object interactions? We begin by deriving a unified probabilistic objective for offline RL finetuning of text-to-video models. This perspective highlights how design elements in existing algorithms like KL regularization and policy projection emerge as specific choices within a unified framework. We then use derived methods to optimize a set of text-video alignment metrics (e.g., CLIP scores, optical flow), but notice that they often fail to align with human perceptions of generation quality. To address this limitation, we propose leveraging vision-language models to provide more nuanced feedback specifically tailored to object dynamics in videos. Our experiments demonstrate that our method can effectively optimize a wide variety of rewards, with binary AI feedback driving the most significant improvements in video quality for dynamic interactions, as confirmed by both AI and human evaluations. Notably, we observe substantial gains when using reward signals derived from AI feedback, particularly in scenarios involving complex interactions between multiple objects and realistic depictions of objects falling.
Authors: Hayden McTavish, Jon Donnelly, Margo Seltzer, Cynthia Rudin
Abstract: Many important datasets contain samples that are missing one or more feature values. Maintaining the interpretability of machine learning models in the presence of such missing data is challenging. Singly or multiply imputing missing values complicates the model's mapping from features to labels. On the other hand, reasoning on indicator variables that represent missingness introduces a potentially large number of additional terms, sacrificing sparsity. We solve these problems with M-GAM, a sparse, generalized, additive modeling approach that incorporates missingness indicators and their interaction terms while maintaining sparsity through l0 regularization. We show that M-GAM provides similar or superior accuracy to prior methods while significantly improving sparsity relative to either imputation or naive inclusion of indicator variables.
Authors: Jinqi Xiao, Miao Yin, Yu Gong, Xiao Zang, Jian Ren, Bo Yuan
Abstract: Attention-based vision models, such as Vision Transformer (ViT) and its variants, have shown promising performance in various computer vision tasks. However, these emerging architectures suffer from large model sizes and high computational costs, calling for efficient model compression solutions. To date, pruning ViTs has been well studied, while other compression strategies that have been widely applied in CNN compression, e.g., model factorization, is little explored in the context of ViT compression. This paper explores an efficient method for compressing vision transformers to enrich the toolset for obtaining compact attention-based vision models. Based on the new insight on the multi-head attention layer, we develop a highly efficient ViT compression solution, which outperforms the state-of-the-art pruning methods. For compressing DeiT-small and DeiT-base models on ImageNet, our proposed approach can achieve 0.45% and 0.76% higher top-1 accuracy even with fewer parameters. Our finding can also be applied to improve the customization efficiency of text-to-image diffusion models, with much faster training (up to $2.6\times$ speedup) and lower extra storage cost (up to $1927.5\times$ reduction) than the existing works.
Authors: Mahdi Rezapour
Abstract: Recommendation systems have become the fundamental services to facilitate users information access. Generally, recommendation system works by filtering historical behaviors to understand and learn users preferences. With the growth of online information, recommendations have become of crucial importance in information filtering to prevent the information overload problem. In this study, we considered hybrid post-fusion of two approaches of collaborative filtering, by using sequences of watched movies and considering the related movies rating. After considering both techniques and applying the weights matrix, the recommendations would be modified to correspond to the users preference as needed. We discussed that various weights would be set based on use cases. For instance, in cases where we have the rating for most classes, we will assign a higher weight to the rating matrix and in case where the rating is unavailable for the majority of cases, the higher weights might be assigned to the sequential dataset. An extensive discussion is made in the context of this paper. Sequential type of the watched movies was used in conjunction of the rating as especially that model might be inadequate in distinguishing users long-term preference and that does not account for the rating of the watched movies and thus that model along might not suffice. Extensive discussion was made regarding the literature and methodological approach to solve the problem.
Authors: Menghan Wang, Yuchen Guo, Duanfeng Zhang, Jianian Jin, Minnie Li, Dan Schonfeld, Shawn Zhou
Abstract: How to leverage large language model's superior capability in e-commerce recommendation has been a hot topic. In this paper, we propose LLM-PKG, an efficient approach that distills the knowledge of LLMs into product knowledge graph (PKG) and then applies PKG to provide explainable recommendations. Specifically, we first build PKG by feeding curated prompts to LLM, and then map LLM response to real enterprise products. To mitigate the risks associated with LLM hallucination, we employ rigorous evaluation and pruning methods to ensure the reliability and availability of the KG. Through an A/B test conducted on an e-commerce website, we demonstrate the effectiveness of LLM-PKG in driving user engagements and transactions significantly.
Authors: Manal Mehdaoui, Amine Abouaomar
Abstract: Deep Reinforcement Learning (DRL) is a powerful tool used for addressing complex challenges in mobile networks. This paper investigates the application of two DRL models, on-policy and off-policy, in the field of resource allocation for Open Radio Access Networks (O-RAN). The on-policy model is the Proximal Policy Optimization (PPO), and the off-policy model is the Sample Efficient Actor-Critic with Experience Replay (ACER), which focuses on resolving the challenges of resource allocation associated with a Quality of Service (QoS) application that has strict requirements. Motivated by the original work of Nessrine Hammami and Kim Khoa Nguyen, this study is a replication to validate and prove the findings. Both PPO and ACER are used within the same experimental setup to assess their performance in a scenario of latency-sensitive and latency-tolerant users and compare them. The aim is to verify the efficacy of on-policy and off-policy DRL models in the context of O-RAN resource allocation. Results from this replication contribute to the ongoing scientific research and offer insights into the reproducibility and generalizability of the original research. This analysis reaffirms that both on-policy and off-policy DRL models have better performance than greedy algorithms in O-RAN settings. In addition, it confirms the original observations that the on-policy model (PPO) gives a favorable balance between energy consumption and user latency, while the off-policy model (ACER) shows a faster convergence. These findings give good insights to optimize resource allocation strategies in O-RANs. Index Terms: 5G, O-RAN, resource allocation, ML, DRL, PPO, ACER.
Authors: Yiyuan Pan, Yunzhe Xu, Zhe Liu, Hesheng Wang
Abstract: Humans navigate unfamiliar environments using the capabilities of episodic simulation and episodic memory. Developing imagination-based memory, analogous to episodic simulation and episodic memory, can enhance embodied agents' comprehension of the complex relationship between environments and objects. However, existing Vision-and-Language Navigation (VLN) agents fail to perform the aforementioned mechanism. We propose a novel architecture to help agents build a recurrent imaginative memory system. Specifically, the agent can maintain a reality-imagination hybrid global memory during navigation and expand the memory map through imaginative mechanisms and navigation actions. Correspondingly, we design a series of pre-training tasks to help the agent acquire fine-grained imaginative abilities. Our agents improve the state-of-the-art (SoTA) success rate (SR) by 7% while simultaneously imagining high-fidelity RGB representations for future scenes.
Authors: Siddhant Dutta, Nouhaila Innan, Sadok Ben Yahia, Muhammad Shafique, David Esteban Bernal Neira
Abstract: The integration of fully homomorphic encryption (FHE) in federated learning (FL) has led to significant advances in data privacy. However, during the aggregation phase, it often results in performance degradation of the aggregated model, hindering the development of robust representational generalization. In this work, we propose a novel multimodal quantum federated learning framework that utilizes quantum computing to counteract the performance drop resulting from FHE. For the first time in FL, our framework combines a multimodal quantum mixture of experts (MQMoE) model with FHE, incorporating multimodal datasets for enriched representation and task-specific learning. Our MQMoE framework enhances performance on multimodal datasets and combined genomics and brain MRI scans, especially for underrepresented categories. Our results also demonstrate that the quantum-enhanced approach mitigates the performance degradation associated with FHE and improves classification accuracy across diverse datasets, validating the potential of quantum interventions in enhancing privacy in FL.
Authors: Marin Jezid\v{z}i\'c, Matej Mihel\v{c}i\'c
Abstract: We explore diverse representations of speech audio, and their effect on a performance of late fusion ensemble of E-Branchformer models, applied to Automatic Speech Recognition (ASR) task. Although it is generally known that ensemble methods often improve the performance of the system even for speech recognition, it is very interesting to explore how ensembles of complex state-of-the-art models, such as medium-sized and large E-Branchformers, cope in this setting when their base models are trained on diverse representations of the input speech audio. The results are evaluated on four widely-used benchmark datasets: \textit{Librispeech, Aishell, Gigaspeech}, \textit{TEDLIUMv2} and show that improvements of $1\% - 14\%$ can still be achieved over the state-of-the-art models trained using comparable techniques on these datasets. A noteworthy observation is that such ensemble offers improvements even with the use of language models, although the gap is closing.
Authors: Jordan Jomsky, Zongyu Li, Yiren Zhang, Jia Guo
Abstract: The growing global aging population necessitates enhanced methods for assessing brain aging and related neurodegenerative changes. Brain Age Gap Estimation (BrainAGE) offers a neuroimaging biomarker for understanding these changes by predicting brain age from MRI scans. Current approaches primarily use T1-weighted magnetic resonance imaging (T1w MRI) data, capturing only structural brain information. To address the lack of functional data, we integrated AI-generated Cerebral Blood Volume (AICBV) with T1w MRI, combining both structural and functional metrics. We developed a deep learning model using a VGG-based architecture to predict brain age. Our model achieved a mean absolute error (MAE) of 3.95 years and a correlation of \(R^2 = 0.94\) on the test set (\(n = 288\)), outperforming existing models trained on similar data. We have further created gradient-based class activation maps (Grad-CAM) to visualize the regions of the brain that most influenced the model's predictions, providing interpretable insights into the structural and functional contributors to brain aging.
Authors: Boya Zeng, Yida Yin, Zhuang Liu
Abstract: A recent study has shown that large-scale visual datasets are very biased: they can be easily classified by modern neural networks. However, the concrete forms of bias among these datasets remain unclear. In this study, we propose a framework to identify the unique visual attributes distinguishing these datasets. Our approach applies various transformations to extract semantic, structural, boundary, color, and frequency information from datasets, and assess how much each type of information reflects their bias. We further decompose their semantic bias with object-level analysis, and leverage natural language methods to generate detailed, open-ended descriptions of each dataset's characteristics. Our work aims to help researchers understand the bias in existing large-scale pre-training datasets, and build more diverse and representative ones in the future. Our project page and code are available at http://boyazeng.github.io/understand_bias .
Authors: Diaa E. Habibi, Gert Aarts, Lingxiao Wang, Kai Zhou
Abstract: The probability distribution effectively sampled by a complex Langevin process for theories with a sign problem is not known a priori and notoriously hard to understand. Diffusion models, a class of generative AI, can learn distributions from data. In this contribution, we explore the ability of diffusion models to learn the distributions created by a complex Langevin process.
Authors: Poorya Aghaomidi, Ge Wang
Abstract: Accurate sleep stage classification is essential for understanding sleep disorders and improving overall health. This study proposes a novel three-stage approach for sleep stage classification using ECG signals, offering a more accessible alternative to traditional methods that often rely on complex modalities like EEG. In Stages 1 and 2, we initialize the weights of two networks, which are then integrated in Stage 3 for comprehensive classification. In the first phase, we estimate key features using Feature Imitating Networks (FINs) to achieve higher accuracy and faster convergence. The second phase focuses on identifying the N1 sleep stage through the time-frequency representation of ECG signals. Finally, the third phase integrates models from the previous stages and employs a Kolmogorov-Arnold Network (KAN) to classify five distinct sleep stages. Additionally, data augmentation techniques, particularly SMOTE, are used in enhancing classification capabilities for underrepresented stages like N1. Our results demonstrate significant improvements in the classification performance, with an overall accuracy of 80.79% an overall kappa of 0.73. The model achieves specific accuracies of 86.70% for Wake, 60.36% for N1, 83.89% for N2, 84.85% for N3, and 87.16% for REM. This study emphasizes the importance of weight initialization and data augmentation in optimizing sleep stage classification with ECG signals.
Authors: Nono SC Merleau, Miguel O'Malley, \'Erika Rold\'an, Sayan Mukherjee
Abstract: Higher-dimensional sliding puzzles are constructed on the vertices of a $d$-dimensional hypercube, where $2^d-l$ vertices are distinctly coloured. Rings with the same colours are initially set randomly on the vertices of the hypercube. The goal of the puzzle is to move each of the $2^d-l$ rings to pre-defined target vertices on the cube. In this setting, the $k$-rule constraint represents a generalisation of edge collision for the movement of colours between vertices, allowing movement only when a hypercube face of dimension $k$ containing a ring is completely free of other rings. Starting from an initial configuration, what is the minimum number of moves needed to make ring colours match the vertex colours? An algorithm that provides us with such a number is called God's algorithm. When such an algorithm exists, it does not have a polynomial time complexity, at least in the case of the 15-puzzle corresponding to $k=1$ in the cubical puzzle. This paper presents a comprehensive computational study of different scenarios of the higher-dimensional puzzle. A benchmark of three computational techniques, an exact algorithm (the A* search) and two approximately optimal search techniques (an evolutionary algorithm (EA) and reinforcement learning (RL)) is presented in this work. The experiments show that all three methods can successfully solve the puzzle of dimension three for different face dimensions and across various difficulty levels. When the dimension increases, the A* search fails, and RL and EA methods can still provide a generally acceptable solution, i.e. a distribution of a number of moves with a median value of less than $30$. Overall, the EA method consistently requires less computational time, while failing in most cases to minimise the number of moves for the puzzle dimensions $d=4$ and $d=5$.
Authors: Audrey Huang, Adam Block, Dylan J. Foster, Dhruv Rohatgi, Cyril Zhang, Max Simchowitz, Jordan T. Ash, Akshay Krishnamurthy
Abstract: Recent work in language modeling has raised the possibility of self-improvement, where a language models evaluates and refines its own generations to achieve higher performance without external feedback. It is impossible for this self-improvement to create information that is not already in the model, so why should we expect that this will lead to improved capabilities? We offer a new perspective on the capabilities of self-improvement through a lens we refer to as sharpening. Motivated by the observation that language models are often better at verifying response quality than they are at generating correct responses, we formalize self-improvement as using the model itself as a verifier during post-training in order to ``sharpen'' the model to one placing large mass on high-quality sequences, thereby amortizing the expensive inference-time computation of generating good sequences. We begin by introducing a new statistical framework for sharpening in which the learner aims to sharpen a pre-trained base policy via sample access, and establish fundamental limits. Then we analyze two natural families of self-improvement algorithms based on SFT and RLHF.
Authors: Gustavo P. C. P. da Luz, Gabriel Massuyoshi Sato, Luis Fernando Gomez Gonzalez, Juliana Freitag Borin
Abstract: The increasing urbanization and the growing number of vehicles in cities have underscored the need for efficient parking management systems. Traditional smart parking solutions often rely on sensors or cameras for occupancy detection, each with its limitations. Recent advancements in deep learning have introduced new YOLO models (YOLOv8, YOLOv9, YOLOv10, and YOLOv11), but these models have not been extensively evaluated in the context of smart parking systems, particularly when combined with Region of Interest (ROI) selection for object detection. Existing methods still rely on fixed polygonal ROI selections or simple pixel-based modifications, which limit flexibility and precision. This work introduces a novel approach that integrates Internet of Things, Edge Computing, and Deep Learning concepts, by using the latest YOLO models for vehicle detection. By exploring both edge and cloud computing, it was found that inference times on edge devices ranged from 1 to 92 seconds, depending on the hardware and model version. Additionally, a new pixel-wise post-processing ROI selection method is proposed for accurately identifying regions of interest to count vehicles in parking lot images. The proposed system achieved 99.68% balanced accuracy on a custom dataset of 3,484 images, offering a cost-effective smart parking solution that ensures precise vehicle detection while preserving data privacy
Authors: Aditya Dutt, Ishikaa Lunawat, Manpreet Kaur
Abstract: Large Foundation Models like Dust3r can produce high quality outputs such as pointmaps, camera intrinsics, and depth estimation, given stereo-image pairs as input. However, the application of these outputs on tasks like Visual Localization requires a large amount of inference time and compute resources. To address these limitations, in this paper, we propose the use of a knowledge distillation pipeline, where we aim to build a student-teacher model with Dust3r as the teacher and explore multiple architectures of student models that are trained using the 3D reconstructed points output by Dust3r. Our goal is to build student models that can learn scene-specific representations and output 3D points with replicable performance such as Dust3r. The data set we used to train our models is 12Scenes. We test two main architectures of models: a CNN-based architecture and a Vision Transformer based architecture. For each architecture, we also compare the use of pre-trained models against models built from scratch. We qualitatively compare the reconstructed 3D points output by the student model against Dust3r's and discuss the various features learned by the student model. We also perform ablation studies on the models through hyperparameter tuning. Overall, we observe that the Vision Transformer presents the best performance visually and quantitatively.
Authors: Gary A. McCully, John D. Hastings, Shengjie Xu
Abstract: This study examines the impact of data snooping on neural networks for vulnerability detection in lifted code, building on previous research which used word2vec, and unidirectional and bidirectional transformer-based embeddings. The research specifically focuses on how model performance is affected when embedding models are trained on datasets, including samples also used for neural network training and validation. The results show that introducing data snooping did not significantly alter model performance, suggesting that data snooping had a minimal impact or that samples randomly dropped as part of the methodology contained hidden features critical to achieving optimal performance. In addition, the findings reinforce the conclusions of previous research, which found that models trained with GPT-2 embeddings consistently outperformed neural networks trained with other embeddings. The fact that this holds even when data snooping is introduced into the embedding model indicates GPT-2's robustness in representing complex code features, even under less-than-ideal conditions.
Authors: Wuyue Yang, Liangrong Peng, Guojie Li, Liu Hong
Abstract: Maximum entropy principle (MEP) offers an effective and unbiased approach to inferring unknown probability distributions when faced with incomplete information, while neural networks provide the flexibility to learn complex distributions from data. This paper proposes a novel neural network architecture, the MEP-Net, which combines the MEP with neural networks to generate probability distributions from moment constraints. We also provide a comprehensive overview of the fundamentals of the maximum entropy principle, its mathematical formulations, and a rigorous justification for its applicability for non-equilibrium systems based on the large deviations principle. Through fruitful numerical experiments, we demonstrate that the MEP-Net can be particularly useful in modeling the evolution of probability distributions in biochemical reaction networks and in generating complex distributions from data.
Authors: Kee Siong Ng, Samuel Yang-Zhao, Timothy Cadogan-Cowper
Abstract: The AI safety literature is full of examples of powerful AI agents that, in blindly pursuing a specific and usually narrow objective, ends up with unacceptable and even catastrophic collateral damage to others. In this paper, we consider the problem of social harms that can result from actions taken by learning and utility-maximising agents in a multi-agent environment. The problem of measuring social harms or impacts in such multi-agent settings, especially when the agents are artificial generally intelligent (AGI) agents, was listed as an open problem in Everitt et al, 2018. We attempt a partial answer to that open problem in the form of market-based mechanisms to quantify and control the cost of such social harms. The proposed setup captures many well-studied special cases and is more general than existing formulations of multi-agent reinforcement learning with mechanism design in two ways: (i) the underlying environment is a history-based general reinforcement learning environment like in AIXI; (ii) the reinforcement-learning agents participating in the environment can have different learning strategies and planning horizons. To demonstrate the practicality of the proposed setup, we survey some key classes of learning algorithms and present a few applications, including a discussion of the Paperclips problem and pollution control with a cap-and-trade system.
Authors: Richard Cloete, Peter Vere\v{s}, Abraham Loeb
Abstract: The Legacy Survey of Space and Time, to be conducted with the Vera C. Rubin Observatory, is poised to revolutionize our understanding of the Solar System by providing an unprecedented wealth of data on various objects, including the elusive interstellar objects (ISOs). Detecting and classifying ISOs is crucial for studying the composition and diversity of materials from other planetary systems. However, the rarity and brief observation windows of ISOs, coupled with the vast quantities of data to be generated by LSST, create significant challenges for their identification and classification. This study aims to address these challenges by exploring the application of machine learning algorithms to the automated classification of ISO tracklets in simulated LSST data. We employed various machine learning algorithms, including random forests (RFs), stochastic gradient descent (SGD), gradient boosting machines (GBMs), and neural networks (NNs), to classify ISO tracklets in simulated LSST data. We demonstrate that GBM and RF algorithms outperform SGD and NN algorithms in accurately distinguishing ISOs from other Solar System objects. RF analysis shows that many derived Digest2 values are more important than direct observables in classifying ISOs from the LSST tracklets. The GBM model achieves the highest precision, recall, and F1 score, with values of 0.9987, 0.9986, and 0.9987, respectively. These findings lay the foundation for the development of an efficient and robust automated system for ISO discovery using LSST data, paving the way for a deeper understanding of the materials and processes that shape planetary systems beyond our own. The integration of our proposed machine learning approach into the LSST data processing pipeline will optimize the survey's potential for identifying these rare and valuable objects, enabling timely follow-up observations and further characterization.
Authors: Zeqing Zhang, Guangze Zheng, Xuebo Ji, Guanqi Chen, Ruixing Jia, Wentao Chen, Guanhua Chen, Liangjun Zhang, Jia Pan
Abstract: Granular materials (GMs) are ubiquitous in daily life. Understanding their properties is also important, especially in agriculture and industry. However, existing works require dedicated measurement equipment and also need large human efforts to handle a large number of particles. In this paper, we introduce a method for estimating the relative values of particle size and density from the video of the interaction with GMs. It is trained on a visuo-haptic learning framework inspired by a contact model, which reveals the strong correlation between GM properties and the visual-haptic data during the probe-dragging in the GMs. After training, the network can map the visual modality well to the haptic signal and implicitly characterize the relative distribution of particle properties in its latent embeddings, as interpreted in that contact model. Therefore, we can analyze GM properties using the trained encoder, and only visual information is needed without extra sensory modalities and human efforts for labeling. The presented GM property estimator has been extensively validated via comparison and ablation experiments. The generalization capability has also been evaluated and a real-world application on the beach is also demonstrated. Experiment videos are available at \url{https://sites.google.com/view/gmwork/vhlearning} .
Authors: Guangyu Zhao, Kewei Lian, Haowei Lin, Haobo Fu, Qiang Fu, Shaofei Cai, Zihao Wang, Yitao Liang
Abstract: A glowing body of work has emerged focusing on instruction-following policies for open-world agents, aiming to better align the agent's behavior with human intentions. However, the performance of these policies is highly susceptible to the initial prompt, which leads to extra efforts in selecting the best instructions. We propose a framework named Preference Goal Tuning (PGT). PGT allows an instruction following policy to interact with the environment to collect several trajectories, which will be categorized into positive and negative samples based on preference. Then we use preference learning to fine-tune the initial goal latent representation with the categorized trajectories while keeping the policy backbone frozen. The experiment result shows that with minimal data and training, PGT achieves an average relative improvement of 72.0% and 81.6% over 17 tasks in 2 different foundation policies respectively, and outperforms the best human-selected instructions. Moreover, PGT surpasses full fine-tuning in the out-of-distribution (OOD) task-execution environments by 13.4%, indicating that our approach retains strong generalization capabilities. Since our approach stores a single latent representation for each task independently, it can be viewed as an efficient method for continual learning, without the risk of catastrophic forgetting or task interference. In short, PGT enhances the performance of agents across nearly all tasks in the Minecraft Skillforge benchmark and demonstrates robustness to the execution environment.
Authors: Junqiu Yu, Xinlin Ren, Yongchong Gu, Haitao Lin, Tianyu Wang, Yi Zhu, Hang Xu, Yu-Gang Jiang, Xiangyang Xue, Yanwei Fu
Abstract: Language-guided robotic grasping is a rapidly advancing field where robots are instructed using human language to grasp specific objects. However, existing methods often depend on dense camera views and struggle to quickly update scenes, limiting their effectiveness in changeable environments. In contrast, we propose SparseGrasp, a novel open-vocabulary robotic grasping system that operates efficiently with sparse-view RGB images and handles scene updates fastly. Our system builds upon and significantly enhances existing computer vision modules in robotic learning. Specifically, SparseGrasp utilizes DUSt3R to generate a dense point cloud as the initialization for 3D Gaussian Splatting (3DGS), maintaining high fidelity even under sparse supervision. Importantly, SparseGrasp incorporates semantic awareness from recent vision foundation models. To further improve processing efficiency, we repurpose Principal Component Analysis (PCA) to compress features from 2D models. Additionally, we introduce a novel render-and-compare strategy that ensures rapid scene updates, enabling multi-turn grasping in changeable environments. Experimental results show that SparseGrasp significantly outperforms state-of-the-art methods in terms of both speed and adaptability, providing a robust solution for multi-turn grasping in changeable environment.
Authors: Harrison Delecki, Sydney M. Katz, Mykel J. Kochenderfer
Abstract: Estimating the probability of failure is a critical step in developing safety-critical autonomous systems. Direct estimation methods such as Monte Carlo sampling are often impractical due to the rarity of failures in these systems. Existing importance sampling approaches do not scale to sequential decision-making systems with large state spaces and long horizons. We propose an adaptive importance sampling algorithm to address these limitations. Our method minimizes the forward Kullback-Leibler divergence between a state-dependent proposal distribution and a relaxed form of the optimal importance sampling distribution. Our method uses Markov score ascent methods to estimate this objective. We evaluate our approach on four sequential systems and show that it provides more accurate failure probability estimates than baseline Monte Carlo and importance sampling techniques. This work is open sourced.
Authors: Chengpeng Fu, Tong Li, Hao Chen, Wen Du, Zhidong He
Abstract: Epidemic prediction is of practical significance in public health, enabling early intervention, resource allocation, and strategic planning. However, privacy concerns often hinder the sharing of health data among institutions, limiting the development of accurate prediction models. In this paper, we develop a general privacy-preserving framework for node-level epidemic prediction on networks based on federated learning (FL). We frame the spatio-temporal spread of epidemics across multiple data-isolated subnetworks, where each node state represents the aggregate epidemic severity within a community. Then, both the pure temporal LSTM model and the spatio-temporal model i.e., Spatio-Temporal Graph Attention Network (STGAT) are proposed to address the federated epidemic prediction. Extensive experiments are conducted on various epidemic processes using a practical airline network, offering a comprehensive assessment of FL efficacy under diverse scenarios. By introducing the efficacy energy metric to measure system robustness under various client configurations, we systematically explore key factors influencing FL performance, including client numbers, aggregation strategies, graph partitioning, missing infectious reports. Numerical results manifest that STGAT excels in capturing spatio-temporal dependencies in dynamic processes whereas LSTM performs well in simpler pattern. Moreover, our findings highlight the importance of balancing feature consistency and volume uniformity among clients, as well as the prediction dilemma between information richness and intrinsic stochasticity of dynamic processes. This study offers practical insights into the efficacy of FL scenario in epidemic management, demonstrates the potential of FL to address broader collective dynamics.
Authors: Ruichen Jiang, Aryan Mokhtari, Francisco Patitucci
Abstract: We study the problem of finding an $\epsilon$-first-order stationary point (FOSP) of a smooth function, given access only to gradient information. The best-known gradient query complexity for this task, assuming both the gradient and Hessian of the objective function are Lipschitz continuous, is ${O}(\epsilon^{-7/4})$. In this work, we propose a method with a gradient complexity of ${O}(d^{1/4}\epsilon^{-13/8})$, where $d$ is the problem dimension, leading to an improved complexity when $d = {O}(\epsilon^{-1/2})$. To achieve this result, we design an optimization algorithm that, underneath, involves solving two online learning problems. Specifically, we first reformulate the task of finding a stationary point for a nonconvex problem as minimizing the regret in an online convex optimization problem, where the loss is determined by the gradient of the objective function. Then, we introduce a novel optimistic quasi-Newton method to solve this online learning problem, with the Hessian approximation update itself framed as an online learning problem in the space of matrices. Beyond improving the complexity bound for achieving an $\epsilon$-FOSP using a gradient oracle, our result provides the first guarantee suggesting that quasi-Newton methods can potentially outperform gradient descent-type methods in nonconvex settings.
Authors: Zixuan Hu, Yongxian Wei, Li Shen, Chun Yuan, Dacheng Tao
Abstract: Large Language Models (LLMs) such as ChatGPT demonstrate strong few-shot adaptability without requiring fine-tuning, positioning them ideal for data-limited and real-time applications. However, this adaptability has not yet been replicated in current Visual Foundation Models (VFMs), which require explicit fine-tuning with sufficient tuning data. Besides, the pretraining-finetuning paradigm has led to the surge of numerous task-specific modular components, such as Low-Rank Adaptation (LoRA). For the first time, we explore the potential of reusing diverse pre-tuned LoRAs without accessing their original training data, to achieve tuning-free few-shot adaptation in VFMs. Our framework, LoRA Recycle, distills a meta-LoRA from diverse pre-tuned LoRAs with a meta-learning objective, using surrogate data generated inversely from pre-tuned LoRAs themselves. The VFM, once equipped with the meta-LoRA, is empowered to solve new few-shot tasks in a single forward pass, akin to the in-context learning of LLMs. Additionally, we incorporate a double-efficient mechanism tailored to our framework, significantly accelerating the meta-training process while maintaining or even improving performance. Extensive experiments across various few-shot classification benchmarks across both in- and cross-domain scenarios demonstrate the superiority of our framework.
Authors: Quanjiang Guo, Yihong Dong, Ling Tian, Zhao Kang, Yu Zhang, Sijie Wang
Abstract: Despite the recent success of two-stage prototypical networks in few-shot named entity recognition (NER), challenges such as over/under-detected false spans in the span detection stage and unaligned entity prototypes in the type classification stage persist. Additionally, LLMs have not proven to be effective few-shot information extractors in general. In this paper, we propose an approach called Boundary-Aware LLMs for Few-Shot Named Entity Recognition to address these issues. We introduce a boundary-aware contrastive learning strategy to enhance the LLM's ability to perceive entity boundaries for generalized entity spans. Additionally, we utilize LoRAHub to align information from the target domain to the source domain, thereby enhancing adaptive cross-domain classification capabilities. Extensive experiments across various benchmarks demonstrate that our framework outperforms prior methods, validating its effectiveness. In particular, the proposed strategies demonstrate effectiveness across a range of LLM architectures. The code and data are released on https://github.com/UESTC-GQJ/BANER.
Authors: Fnu Neha, Deepshikha Bhati, Deepak Kumar Shukla, Sonavi Makarand Dalvi, Nikolaos Mantzou, Safa Shubbar
Abstract: Medical imaging is essential in healthcare to provide key insights into patient anatomy and pathology, aiding in diagnosis and treatment. Non-invasive techniques such as X-ray, Magnetic Resonance Imaging (MRI), Computed Tomography (CT), and Ultrasound (US), capture detailed images of organs, tissues, and abnormalities. Effective analysis of these images requires precise segmentation to delineate regions of interest (ROI), such as organs or lesions. Traditional segmentation methods, relying on manual feature-extraction, are labor-intensive and vary across experts. Recent advancements in Artificial Intelligence (AI) and Deep Learning (DL), particularly convolutional models such as U-Net and its variants (U-Net++ and U-Net 3+), have transformed medical image segmentation (MIS) by automating the process and enhancing accuracy. These models enable efficient, precise pixel-wise classification across various imaging modalities, overcoming the limitations of manual segmentation. This review explores various medical imaging techniques, examines the U-Net architectures and their adaptations, and discusses their application across different modalities. It also identifies common challenges in MIS and proposes potential solutions.
Authors: Pengjie Zhou, Haoyu Wei, Huiming Zhang
Abstract: Reinforcement Learning (RL) is a widely researched area in artificial intelligence that focuses on teaching agents decision-making through interactions with their environment. A key subset includes stochastic multi-armed bandit (MAB) and continuum-armed bandit (SCAB) problems, which model sequential decision-making under uncertainty. This review outlines the foundational models and assumptions of bandit problems, explores non-asymptotic theoretical tools like concentration inequalities and minimax regret bounds, and compares frequentist and Bayesian algorithms for managing exploration-exploitation trade-offs. We also extend the discussion to $K$-armed contextual bandits and SCAB, examining their methodologies, regret analyses, and discussing the relation between the SCAB problems and the functional data analysis. Finally, we highlight recent advances and ongoing challenges in the field.
Authors: Sepand Dyanatkar, Angran Li, Alexander Dungate
Abstract: Climate change's destruction of marine biodiversity is threatening communities and economies around the world which rely on healthy oceans for their livelihoods. The challenge of applying computer vision to niche, real-world domains such as ocean conservation lies in the dynamic and diverse environments where traditional top-down learning struggle with long-tailed distributions, generalization, and domain transfer. Scalable species identification for ocean monitoring is particularly difficult due to the need to adapt models to new environments and identify rare or unseen species. To overcome these limitations, we propose leveraging bottom-up, open-domain learning frameworks as a resilient, scalable solution for image and video analysis in marine applications. Our preliminary demonstration uses pretrained vision-language models (VLMs) combined with retrieval-augmented generation (RAG) as grounding, leaving the door open for numerous architectural, training and engineering optimizations. We validate this approach through a preliminary application in classifying fish from video onboard fishing vessels, demonstrating impressive emergent retrieval and prediction capabilities without domain-specific training or knowledge of the task itself.
Authors: Maximilian Schenke, Shalbus Bukarov
Abstract: The discipline of automatic control is making increased use of concepts that originate from the domain of machine learning. Herein, reinforcement learning (RL) takes an elevated role, as it is inherently designed for sequential decision making, and can be applied to optimal control problems without the need for a plant system model. To advance education of control engineers and operators in this field, this contribution targets an RL framework that can be applied to educational hardware provided by the Lucas-N\"ulle company. Specifically, the goal of inverted pendulum control is pursued by means of RL, including both, swing-up and stabilization within a single holistic design approach. Herein, the actual learning is enabled by separating corresponding computations from the real-time control computer and outsourcing them to a different hardware. This distributed architecture, however, necessitates communication of the involved components, which is realized via CAN bus. The experimental proof of concept is presented with an applied safeguarding algorithm that prevents the plant from being operated harmfully during the trial-and-error training phase.
Authors: Indronil Bhattacharjee, Al-Mahmud, Tareq Mahmud
Abstract: Diabetic Retinopathy is one of the most familiar diseases and is a diabetes complication that affects eyes. Initially, diabetic retinopathy may cause no symptoms or only mild vision problems. Eventually, it can cause blindness. So early detection of symptoms could help to avoid blindness. In this paper, we present some experiments on some features of diabetic retinopathy, like properties of exudates, properties of blood vessels and properties of microaneurysm. Using the features, we can classify healthy, mild non-proliferative, moderate non-proliferative, severe non-proliferative and proliferative stages of DR. Support Vector Machine, Random Forest and Naive Bayes classifiers are used to classify the stages. Finally, Random Forest is found to be the best for higher accuracy, sensitivity and specificity of 76.5%, 77.2% and 93.3% respectively.
Authors: Yasser Khalafaoui (Alteca), Basarab Matei (Alteca), Martino Lovisetto (Alteca), Nistor Grozavu (CY)
Abstract: Recently, deep matrix factorization has been established as a powerful model for unsupervised tasks, achieving promising results, especially for multi-view clustering. However, existing methods often lack effective feature selection mechanisms and rely on empirical hyperparameter selection. To address these issues, we introduce a novel Deep Matrix Factorization with Adaptive Weights for Multi-View Clustering (DMFAW). Our method simultaneously incorporates feature selection and generates local partitions, enhancing clustering results. Notably, the features weights are controlled and adjusted by a parameter that is dynamically updated using Control Theory inspired mechanism, which not only improves the model's stability and adaptability to diverse datasets but also accelerates convergence. A late fusion approach is then proposed to align the weighted local partitions with the consensus partition. Finally, the optimization problem is solved via an alternating optimization algorithm with theoretically guaranteed convergence. Extensive experiments on benchmark datasets highlight that DMFAW outperforms state-of-the-art methods in terms of clustering performance.
Authors: Maximilian E. Tschuchnig, Philipp Steininger, Michael Gadermayr
Abstract: Computer-Assisted Interventions enable clinicians to perform precise, minimally invasive procedures, often relying on advanced imaging methods. Cone-beam computed tomography (CBCT) can be used to facilitate computer-assisted interventions, despite often suffering from artifacts that pose challenges for accurate interpretation. While the degraded image quality can affect image analysis, the availability of high quality, preoperative scans offers potential for improvements. Here we consider a setting where preoperative CT and intraoperative CBCT scans are available, however, the alignment (registration) between the scans is imperfect to simulate a real world scenario. We propose a multimodal learning method that fuses roughly aligned CBCT and CT scans and investigate the effect on segmentation performance. For this experiment we use synthetically generated data containing real CT and synthetic CBCT volumes with corresponding voxel annotations. We show that this fusion setup improves segmentation performance in $18$ out of $20$ investigated setups.
Authors: Yasser Khalafaoui (Alteca), Martino Lovisetto (Alteca), Basarab Matei (CY), Nistor Grozavu (CY)
Abstract: The increasing availability and diversity of multimodal data in recommender systems offer new avenues for enhancing recommendation accuracy and user satisfaction. However, these systems must contend with high-dimensional, sparse user-item rating matrices, where reconstructing the matrix with only small subsets of preferred items for each user poses a significant challenge. To address this, we propose CADMR, a novel autoencoder-based multimodal recommender system framework. CADMR leverages multi-head cross-attention mechanisms and Disentangled Learning to effectively integrate and utilize heterogeneous multimodal data in reconstructing the rating matrix. Our approach first disentangles modality-specific features while preserving their interdependence, thereby learning a joint latent representation. The multi-head cross-attention mechanism is then applied to enhance user-item interaction representations with respect to the learned multimodal item latent representations. We evaluate CADMR on three benchmark datasets, demonstrating significant performance improvements over state-of-the-art methods.
Authors: Alejandro Mendoza Barrionuevo, Samuel Yanes Luis, Daniel Guti\'errez Reina, Sergio L. Toral Mar\'in
Abstract: This paper presents a model-free deep reinforcement learning framework for informative path planning with heterogeneous fleets of autonomous surface vehicles to locate and collect plastic waste. The system employs two teams of vehicles: scouts and cleaners. Coordination between these teams is achieved through a deep reinforcement approach, allowing agents to learn strategies to maximize cleaning efficiency. The primary objective is for the scout team to provide an up-to-date contamination model, while the cleaner team collects as much waste as possible following this model. This strategy leads to heterogeneous teams that optimize fleet efficiency through inter-team cooperation supported by a tailored reward function. Different trainings of the proposed algorithm are compared with other state-of-the-art heuristics in two distinct scenarios, one with high convexity and another with narrow corridors and challenging access. According to the obtained results, it is demonstrated that deep reinforcement learning based algorithms outperform other benchmark heuristics, exhibiting superior adaptability. In addition, training with greedy actions further enhances performance, particularly in scenarios with intricate layouts.
Authors: Yi Zeng, Jinwei Li, Hui Zhu, Shukuan Lu, Jianfeng Li, Xiran Cai
Abstract: Passive acoustic mapping (PAM) is a promising tool for monitoring acoustic cavitation activities in the applications of ultrasound therapy. Data-adaptive beamformers for PAM have better image quality compared to the time exposure acoustics (TEA) algorithms. However, the computational cost of data-adaptive beamformers is considerably expensive. In this work, we develop a deep beamformer based on a generative adversarial network, which can switch between different transducer arrays and reconstruct high-quality PAM images directly from radio frequency ultrasound signals with low computational cost. The deep beamformer was trained on the dataset consisting of simulated and experimental cavitation signals of single and multiple microbubble clouds measured by different (linear and phased) arrays covering 1-15 MHz. We compared the performance of the deep beamformer to TEA and three different data-adaptive beamformers using the simulated and experimental test dataset. Compared with TEA, the deep beamformer reduced the energy spread area by 18.9%-65.0% and improved the image signal-to-noise ratio by 9.3-22.9 dB in average for the different arrays in our data. Compared to the data-adaptive beamformers, the deep beamformer reduced the computational cost by three orders of magnitude achieving 10.5 ms image reconstruction speed in our data, while the image quality was as good as that of the data-adaptive beamformers. These results demonstrated the potential of the deep beamformer for high-resolution monitoring of microbubble cavitation activities for ultrasound therapy.
Authors: Mehmet Arda Eren, Erhan Oztop
Abstract: In self-supervised robot learning, robots actively explore their environments and generate data by acting on entities in the environment. Therefore, an exploration policy is desired that ensures sample efficiency to minimize robot execution costs while still providing accurate learning. For this purpose, the robotic community has adopted Intrinsic Motivation (IM)-based approaches such as Learning Progress (LP). On the machine learning front, Active Learning (AL) has been used successfully, especially for classification tasks. In this work, we develop a novel AL framework geared towards robotics regression tasks, such as action-effect prediction and, more generally, for world model learning, which we call MUSEL - Model Uncertainty for Sample Efficient Learning. MUSEL aims to extract model uncertainty from the total uncertainty estimate given by a suitable learning engine by making use of earning progress and input diversity and use it to improve sample efficiency beyond the state-of-the-art action-effect prediction methods. We demonstrate the feasibility of our model by using a Stochastic Variational Gaussian Process (SVGP) as the learning engine and testing the system on a set of robotic experiments in simulation. The efficacy of MUSEL is demonstrated by comparing its performance to standard methods used in robot action-effect learning. In a robotic tabletop environment in which a robot manipulator is tasked with learning the effect of its actions, the experiments show that MUSEL facilitates higher accuracy in learning action effects while ensuring sample efficiency.
Authors: Ziyang Cheng, Xiangyu Tian, Ruomin Sui, Tiemin Li, Yao Jiang
Abstract: Accurate grasp force control is one of the key skills for ensuring successful and damage-free robotic grasping of objects. Although existing methods have conducted in-depth research on slip detection and grasping force planning, they often overlook the issue of adaptive tracking of the actual force to the target force when handling objects with different material properties. The optimal parameters of a force tracking controller are significantly influenced by the object's stiffness, and many adaptive force tracking algorithms rely on stiffness estimation. However, real-world objects often exhibit viscous, plastic, or other more complex nonlinear time-varying behaviors, and existing studies provide insufficient support for these materials in terms of stiffness definition and estimation. To address this, this paper introduces the concept of generalized stiffness, extending the definition of stiffness to nonlinear time-varying grasp system models, and proposes an online generalized stiffness estimator based on Long Short-Term Memory (LSTM) networks. Based on generalized stiffness, this paper proposes an adaptive parameter adjustment strategy using a PI controller as an example, enabling dynamic force tracking for objects with varying characteristics. Experimental results demonstrate that the proposed method achieves high precision and short probing time, while showing better adaptability to non-ideal objects compared to existing methods. The method effectively solves the problem of grasp force tracking in unknown, nonlinear, and time-varying grasp systems, enhancing the robotic grasping ability in unstructured environments.
Authors: Sergey Zinchenko, Sergey Iazov
Abstract: We propose a novel model for learned query optimization which provides query hints leading to better execution plans. The model addresses the three key challenges in learned hint-based query optimization: reliable hint recommendation (ensuring non-degradation of query latency), efficient hint exploration, and fast inference. We provide an in-depth analysis of existing NN-based approaches to hint-based optimization and experimentally confirm the named challenges for them. Our alternative solution consists of a new inference schema based on an ensemble of context-aware models and a graph storage for reliable hint suggestion and fast inference, and a budget-controlled training procedure with a local search algorithm that solves the issue of exponential search space exploration. In experiments on standard benchmarks, our model demonstrates optimization capability close to the best achievable with coarse-grained hints. Controlling the degree of parallelism (query dop) in addition to operator-related hints enables our model to achieve 3x latency improvement on JOB benchmark which sets a new standard for optimization. Our model is interpretable and easy to debug, which is particularly important for deployment in production.
Authors: Valentin Braeutigam, Vanessa Wirth, Ingrid Ullmann, Christian Sch\"u{\ss}ler, Martin Vossiek, Matthias Berking, Bernhard Egger
Abstract: The 3D reconstruction of faces gains wide attention in computer vision and is used in many fields of application, for example, animation, virtual reality, and even forensics. This work is motivated by monitoring patients in sleep laboratories. Due to their unique characteristics, sensors from the radar domain have advantages compared to optical sensors, namely penetration of electrically non-conductive materials and independence of light. These advantages of radar signals unlock new applications and require adaptation of 3D reconstruction frameworks. We propose a novel model-based method for 3D reconstruction from radar images. We generate a dataset of synthetic radar images with a physics-based but non-differentiable radar renderer. This dataset is used to train a CNN-based encoder to estimate the parameters of a 3D morphable face model. Whilst the encoder alone already leads to strong reconstructions of synthetic data, we extend our reconstruction in an Analysis-by-Synthesis fashion to a model-based autoencoder. This is enabled by learning the rendering process in the decoder, which acts as an object-specific differentiable radar renderer. Subsequently, the combination of both network parts is trained to minimize both, the loss of the parameters and the loss of the resulting reconstructed radar image. This leads to the additional benefit, that at test time the parameters can be further optimized by finetuning the autoencoder unsupervised on the image loss. We evaluated our framework on generated synthetic face images as well as on real radar images with 3D ground truth of four individuals.
Authors: Shabnam Fazliani, Mohammad Mowlavi Sorond, Arsalan Masoudifard
Abstract: The advent of smart contracts has enabled the rapid rise of Decentralized Finance (DeFi) on the Ethereum blockchain, offering substantial rewards in financial innovation and inclusivity. However, this growth has also introduced significant security risks, including the proliferation of illicit accounts involved in fraudulent activities. Traditional detection methods are limited by the scarcity of labeled data and the evolving tactics of malicious actors. In this paper, we propose a novel Self-Learning Ensemble-based Illicit account Detection (SLEID) framework to address these challenges. SLEID employs an Isolation Forest for initial outlier detection and a self-training mechanism to iteratively generate pseudo-labels for unlabeled accounts, thereby enhancing detection accuracy. Extensive experiments demonstrate that SLEID significantly outperforms traditional supervised approaches and recent semi-supervised models, achieving superior precision, recall, and F1-scores, particularly in detecting illicit accounts. Compared to state-of-the-art methods, our approach achieves better detection performance while reducing reliance on labeled data. The results affirm SLEID's efficacy as a robust solution for safeguarding the DeFi ecosystem and mitigating risks posed by malicious accounts.
Authors: Sebastian Hirt, Lukas Theiner, Rolf Findeisen
Abstract: Closed-loop performance of sequential decision making algorithms, such as model predictive control, depends strongly on the parameters of cost functions, models, and constraints. Bayesian optimization is a common approach to learning these parameters based on closed-loop experiments. However, traditional Bayesian optimization approaches treat the learning problem as a black box, ignoring valuable information and knowledge about the structure of the underlying problem, resulting in slow convergence and high experimental resource use. We propose a time-series-informed optimization framework that incorporates intermediate performance evaluations from early iterations of each experimental episode into the learning procedure. Additionally, probabilistic early stopping criteria are proposed to terminate unpromising experiments, significantly reducing experimental time. Simulation results show that our approach achieves baseline performance with approximately half the resources. Moreover, with the same resource budget, our approach outperforms the baseline in terms of final closed-loop performance, highlighting its efficiency in sequential decision making scenarios.
Authors: Kanav Singh Rana, Nitu Kumari
Abstract: A Transformer-based Koopman autoencoder is proposed for linearizing Fisher's reaction-diffusion equation. The primary focus of this study is on using deep learning techniques to find complex spatiotemporal patterns in the reaction-diffusion system. The emphasis is on not just solving the equation but also transforming the system's dynamics into a more comprehensible, linear form. Global coordinate transformations are achieved through the autoencoder, which learns to capture the underlying dynamics by training on a dataset with 60,000 initial conditions. Extensive testing on multiple datasets was used to assess the efficacy of the proposed model, demonstrating its ability to accurately predict the system's evolution as well as to generalize. We provide a thorough comparison study, comparing our suggested design to a few other comparable methods using experiments on various PDEs, such as the Kuramoto-Sivashinsky equation and the Burger's equation. Results show improved accuracy, highlighting the capabilities of the Transformer-based Koopman autoencoder. The proposed architecture in is significantly ahead of other architectures, in terms of solving different types of PDEs using a single architecture. Our method relies entirely on the data, without requiring any knowledge of the underlying equations. This makes it applicable to even the datasets where the governing equations are not known.
Authors: Sunil Mucesh, William G. Hartley, Ciar\'an M. Gilligan-Lee, Ofer Lahav
Abstract: Understanding how galaxies form and evolve is at the heart of modern astronomy. With the advent of large-scale surveys and simulations, remarkable progress has been made in the last few decades. Despite this, the physical processes behind the phenomena, and particularly their importance, remain far from known, as correlations have primarily been established rather than the underlying causality. We address this challenge by applying the causal inference framework. Specifically, we tackle the fundamental open question of whether galaxy formation and evolution depends more on nature (i.e., internal processes) or nurture (i.e., external processes), by estimating the causal effect of environment on star-formation rate in the IllustrisTNG simulations. To do so, we develop a comprehensive causal model and employ cutting-edge techniques from epidemiology to overcome the long-standing problem of disentangling nature and nurture. We find that the causal effect is negative and substantial, with environment suppressing the SFR by a maximal factor of $\sim100$. While the overall effect at $z=0$ is negative, in the early universe, environment is discovered to have a positive impact, boosting star formation by a factor of $\sim10$ at $z\sim1$ and by even greater amounts at higher redshifts. Furthermore, we show that: (i) nature also plays an important role, as ignoring it underestimates the causal effect in intermediate-density environments by a factor of $\sim2$, (ii) controlling for the stellar mass at a snapshot in time, as is common in the literature, is not only insufficient to disentangle nature and nurture but actually has an adverse effect, though (iii) stellar mass is an adequate proxy of the effects of nature. Finally, this work may prove a useful blueprint for extracting causal insights in other fields that deal with dynamical systems with closed feedback loops, such as the Earth's climate.
Authors: Shai Shalev-Shwartz, Amnon Shashua, Gal Beniamini, Yoav Levine, Or Sharir, Noam Wies, Ido Ben-Shaul, Tomer Nussbaum, Shir Granot Peled
Abstract: Artificial Expert Intelligence (AEI) seeks to transcend the limitations of both Artificial General Intelligence (AGI) and narrow AI by integrating domain-specific expertise with critical, precise reasoning capabilities akin to those of top human experts. Existing AI systems often excel at predefined tasks but struggle with adaptability and precision in novel problem-solving. To overcome this, AEI introduces a framework for ``Probably Approximately Correct (PAC) Reasoning". This paradigm provides robust theoretical guarantees for reliably decomposing complex problems, with a practical mechanism for controlling reasoning precision. In reference to the division of human thought into System 1 for intuitive thinking and System 2 for reflective reasoning~\citep{tversky1974judgment}, we refer to this new type of reasoning as System 3 for precise reasoning, inspired by the rigor of the scientific method. AEI thus establishes a foundation for error-bounded, inference-time learning.
Authors: Chenguang Huang, Shengchao Yan, Wolfram Burgard
Abstract: Dynamic scene understanding remains a persistent challenge in robotic applications. Early dynamic mapping methods focused on mitigating the negative influence of short-term dynamic objects on camera motion estimation by masking or tracking specific categories, which often fall short in adapting to long-term scene changes. Recent efforts address object association in long-term dynamic environments using neural networks trained on synthetic datasets, but they still rely on predefined object shapes and categories. Other methods incorporate visual, geometric, or semantic heuristics for the association but often lack robustness. In this work, we introduce BYE, a class-agnostic, per-scene point cloud encoder that removes the need for predefined categories, shape priors, or extensive association datasets. Trained on only a single sequence of exploration data, BYE can efficiently perform object association in dynamically changing scenes. We further propose an ensembling scheme combining the semantic strengths of Vision Language Models (VLMs) with the scene-specific expertise of BYE, achieving a 7% improvement and a 95% success rate in object association tasks. Code and dataset are available at https://byencoder.github.io.
Authors: Caixin Kang, Yubo Chen, Shouwei Ruan, Shiji Zhao, Ruochen Zhang, Jiayi Wang, Shan Fu, Xingxing Wei
Abstract: With the rise of deep learning, facial recognition technology has seen extensive research and rapid development. Although facial recognition is considered a mature technology, we find that existing open-source models and commercial algorithms lack robustness in certain real-world Out-of-Distribution (OOD) scenarios, raising concerns about the reliability of these systems. In this paper, we introduce OODFace, which explores the OOD challenges faced by facial recognition models from two perspectives: common corruptions and appearance variations. We systematically design 30 OOD scenarios across 9 major categories tailored for facial recognition. By simulating these challenges on public datasets, we establish three robustness benchmarks: LFW-C/V, CFP-FP-C/V, and YTF-C/V. We then conduct extensive experiments on 19 different facial recognition models and 3 commercial APIs, along with extended experiments on face masks, Vision-Language Models (VLMs), and defense strategies to assess their robustness. Based on the results, we draw several key insights, highlighting the vulnerability of facial recognition systems to OOD data and suggesting possible solutions. Additionally, we offer a unified toolkit that includes all corruption and variation types, easily extendable to other datasets. We hope that our benchmarks and findings can provide guidance for future improvements in facial recognition model robustness.
Authors: Andreas C. Schneider, Valentin Neuhaus, David A. Ehrlich, Abdullah Makkeh, Alexander S. Ecker, Viola Priesemann, Michael Wibral
Abstract: In modern deep neural networks, the learning dynamics of the individual neurons is often obscure, as the networks are trained via global optimization. Conversely, biological systems build on self-organized, local learning, achieving robustness and efficiency with limited global information. We here show how self-organization between individual artificial neurons can be achieved by designing abstract bio-inspired local learning goals. These goals are parameterized using a recent extension of information theory, Partial Information Decomposition (PID), which decomposes the information that a set of information sources holds about an outcome into unique, redundant and synergistic contributions. Our framework enables neurons to locally shape the integration of information from various input classes, i.e. feedforward, feedback, and lateral, by selecting which of the three inputs should contribute uniquely, redundantly or synergistically to the output. This selection is expressed as a weighted sum of PID terms, which, for a given problem, can be directly derived from intuitive reasoning or via numerical optimization, offering a window into understanding task-relevant local information processing. Achieving neuron-level interpretability while enabling strong performance using local learning, our work advances a principled information-theoretic foundation for local learning strategies.
Authors: Paul D\"utting, Federico Fusco, Silvio Lattanzi, Ashkan Norouzi-Fard, Ola Svensson, Morteza Zadimoghaddam
Abstract: In this work, we study online submodular maximization, and how the requirement of maintaining a stable solution impacts the approximation. In particular, we seek bounds on the best-possible approximation ratio that is attainable when the algorithm is allowed to make at most a constant number of updates per step. We show a tight information-theoretic bound of $\tfrac{2}{3}$ for general monotone submodular functions, and an improved (also tight) bound of $\tfrac{3}{4}$ for coverage functions. Since both these bounds are attained by non poly-time algorithms, we also give a poly-time randomized algorithm that achieves a $0.51$-approximation. Combined with an information-theoretic hardness of $\tfrac{1}{2}$ for deterministic algorithms from prior work, our work thus shows a separation between deterministic and randomized algorithms, both information theoretically and for poly-time algorithms.
Authors: Yaron Veksler, Sharon Hornstein, Han Wang, Maria Laura Delle Monache, Daniel Urieli
Abstract: The proliferation of Connected Automated Vehicles represents an unprecedented opportunity for improving driving efficiency and alleviating traffic congestion. However, existing research fails to address realistic multi-lane highway scenarios without assuming connectivity, perception, and control capabilities that are typically unavailable in current vehicles. This paper proposes a novel AI system that is the first to improve highway traffic efficiency compared with human-like traffic in realistic, simulated multi-lane scenarios, while relying on existing connectivity, perception, and control capabilities. At the core of our approach is a reinforcement learning based controller that dynamically communicates time-headways to automated vehicles near bottlenecks based on real-time traffic conditions. These desired time-headways are then used by Adaptive Cruise Control (ACC) systems to adjust their following distance. By (i) integrating existing traffic estimation technology and low-bandwidth vehicle-to-infrastructure connectivity, (ii) leveraging safety-certified ACC systems, and (iii) targeting localized bottleneck challenges that can be addressed independently in different locations, we propose a practical, safe, and scalable system that can positively impact numerous road users.
Authors: Andrew Wagenmaker, Lu Mi, Marton Rozsa, Matthew S. Bull, Karel Svoboda, Kayvon Daie, Matthew D. Golub, Kevin Jamieson
Abstract: Recent advances in techniques for monitoring and perturbing neural populations have greatly enhanced our ability to study circuits in the brain. In particular, two-photon holographic optogenetics now enables precise photostimulation of experimenter-specified groups of individual neurons, while simultaneous two-photon calcium imaging enables the measurement of ongoing and induced activity across the neural population. Despite the enormous space of potential photostimulation patterns and the time-consuming nature of photostimulation experiments, very little algorithmic work has been done to determine the most effective photostimulation patterns for identifying the neural population dynamics. Here, we develop methods to efficiently select which neurons to stimulate such that the resulting neural responses will best inform a dynamical model of the neural population activity. Using neural population responses to photostimulation in mouse motor cortex, we demonstrate the efficacy of a low-rank linear dynamical systems model, and develop an active learning procedure which takes advantage of low-rank structure to determine informative photostimulation patterns. We demonstrate our approach on both real and synthetic data, obtaining in some cases as much as a two-fold reduction in the amount of data required to reach a given predictive power. Our active stimulation design method is based on a novel active learning procedure for low-rank regression, which may be of independent interest.
Authors: Zhaohui Yang, Wei Xu, Le Liang, Yuanhao Cui, Zhijin Qin, Merouane Debbah
Abstract: Combining wireless communication with large artificial intelligence (AI) models can open up a myriad of novel application scenarios. In sixth generation (6G) networks, ubiquitous communication and computing resources allow large AI models to serve democratic large AI models-related services to enable real-time applications like autonomous vehicles, smart cities, and Internet of Things (IoT) ecosystems. However, the security considerations and sustainable communication resources limit the deployment of large AI models over distributed wireless networks. This paper provides a comprehensive overview of privacy, security, and trustworthy for distributed wireless large AI model (WLAM). In particular, the detailed privacy and security are analysis for distributed WLAM is fist revealed. The classifications and theoretical findings about privacy and security in distributed WLAM are discussed. Then the trustworthy and ethics for implementing distributed WLAM are described. Finally, the comprehensive applications of distributed WLAM is provided in the aspect of electromagnetic signal processing.
Authors: Quang H. Nguyen, Hoang Phan, Khoa D. Doan
Abstract: Diffusion models have shown remarkable abilities in generating realistic and high-quality images from text prompts. However, a trained model remains black-box; little do we know about the role of its components in exhibiting a concept such as objects or styles. Recent works employ causal tracing to localize layers storing knowledge in generative models without showing how those layers contribute to the target concept. In this work, we approach the model interpretability problem from a more general perspective and pose a question: \textit{``How do model components work jointly to demonstrate knowledge?''}. We adapt component attribution to decompose diffusion models, unveiling how a component contributes to a concept. Our framework allows effective model editing, in particular, we can erase a concept from diffusion models by removing positive components while remaining knowledge of other concepts. Surprisingly, we also show there exist components that contribute negatively to a concept, which has not been discovered in the knowledge localization approach. Experimental results confirm the role of positive and negative components pinpointed by our framework, depicting a complete view of interpreting generative models. Our code is available at \url{https://github.com/mail-research/CAD-attribution4diffusion}
URLs: https://github.com/mail-research/CAD-attribution4diffusion
Authors: Alexander Denker, Johannes Hertrich, Zeljko Kereta, Silvia Cipiccia, Ecem Erin, Simon Arridge
Abstract: Ptychography is a coherent diffraction imaging method that uses phase retrieval techniques to reconstruct complex-valued images. It achieves this by sequentially illuminating overlapping regions of a sample with a coherent beam and recording the diffraction pattern. Although this addresses traditional imaging system challenges, it is computationally intensive and highly sensitive to noise, especially with reduced illumination overlap. Data-driven regularisation techniques have been applied in phase retrieval to improve reconstruction quality. In particular, plug-and-play (PnP) offers flexibility by integrating data-driven denoisers as implicit priors. In this work, we propose a half-quadratic splitting framework for using PnP and other data-driven priors for ptychography. We evaluate our method both on natural images and real test objects to validate its effectiveness for ptychographic image reconstruction.
Authors: Gokul Puthumanaillam, Jae Hyuk Song, Nurzhan Yesmagambet, Shinkyu Park, Melkior Ornik
Abstract: Autonomous agents operating in adversarial scenarios face a fundamental challenge: while they may know their adversaries' high-level objectives, such as reaching specific destinations within time constraints, the exact policies these adversaries will employ remain unknown. Traditional approaches address this challenge by treating the adversary's state as a partially observable element, leading to a formulation as a Partially Observable Markov Decision Process (POMDP). However, the induced belief-space dynamics in a POMDP require knowledge of the system's transition dynamics, which, in this case, depend on the adversary's unknown policy. Our key observation is that while an adversary's exact policy is unknown, their behavior is necessarily constrained by their mission objectives and the physical environment, allowing us to characterize the space of possible behaviors without assuming specific policies. In this paper, we develop Task-Aware Behavior Fields (TAB-Fields), a representation that captures adversary state distributions over time by computing the most unbiased probability distribution consistent with known constraints. We construct TAB-Fields by solving a constrained optimization problem that minimizes additional assumptions about adversary behavior beyond mission and environmental requirements. We integrate TAB-Fields with standard planning algorithms by introducing TAB-conditioned POMCP, an adaptation of Partially Observable Monte Carlo Planning. Through experiments in simulation with underwater robots and hardware implementations with ground robots, we demonstrate that our approach achieves superior performance compared to baselines that either assume specific adversary policies or neglect mission constraints altogether. Evaluation videos and code are available at https://tab-fields.github.io.
Authors: Abhas Kumar, Kapil Pathak, Rajesh Kavuru, Prabhakar Srinivasan
Abstract: This paper analyzes the performance of Small Language Models (SLMs) and Vision Language Models (VLMs) and evaluates the trade-off between model performance and carbon emissions across 4 essential tasks: Image Captioning, Visual Question Answering (VQA), Dialogue Summarization and Text-to-SQL conversion. Various SLMs and VLMs belonging to the Qwen and LLaMA architecture family are chosen and variants based on model size in terms of the number of parameters, quantization level and fine-tuning parameters are evaluated. The model variant's performance and carbon emissions are calculated. To quantify the trade-off between model performance and carbon emissions, we introduce a novel metric called CEGI (Carbon Efficient Gain Index). This metric represents the carbon emission per unit percentage gain per million trainable parameters . This metric provides a normalized measure to compare model's efficiency in terms of performance improvement relative to their environmental cost. The experiment's outcome demonstrates that fine-tuning SLMs and VLMs can achieve performance levels comparable to Large Language Models (LLMs) while producing significantly less carbon emissions. Our findings suggest that the marginal gains in accuracy from larger models do not justify the substantial increase in carbon emissions. Leveraging lower-bit quantization levels, the proposed metric further enhances energy efficiency without compromising performance. This study highlights balancing high performance and environmental sustainability. It offers a valuable metric for selecting models suitable for environmentally-friendly AI development.
Authors: Marco Molinari, Vladimir Tregubiak, Victor Shao, Abhimanyu Pandey, Mateusz Mikolajczak, Sebasti\~ao Kuznetsov Ryder Torres Pereira
Abstract: Determining company similarity is a vital task in finance, underpinning hedging, risk management, portfolio diversification, and more. Practitioners often rely on sector and industry classifications to gauge similarity, such as SIC-codes and GICS-codes, the former being used by the U.S. Securities and Exchange Commission (SEC), and the latter widely used by the investment community. Clustering embeddings of company descriptions has been proposed as a potential technique for determining company similarity, but the lack of interpretability in token embeddings poses a significant barrier to adoption in high-stakes contexts. Sparse Autoencoders have shown promise in enhancing the interpretability of Large Language Models by decomposing LLM activations into interpretable features. In this paper, we explore the use of SAE features in measuring company similarity and benchmark them against (1) SIC codes and (2) Major Group codes. We conclude that SAE features can reproduce and even surpass sector classifications in quantifying fundamental characteristics of companies, evaluated by the correlation of monthly returns, a proxy for similarity, and PnL from cointegration.
Authors: Kai Sun, Siyan Xue, Fuchun Sun, Haoran Sun, Yu Luo, Ling Wang, Siyuan Wang, Na Guo, Lei Liu, Tian Zhao, Xinzhou Wang, Lei Yang, Shuo Jin, Jun Yan, Jiahong Dong
Abstract: Recent advancements in deep learning have significantly revolutionized the field of clinical diagnosis and treatment, offering novel approaches to improve diagnostic precision and treatment efficacy across diverse clinical domains, thus driving the pursuit of precision medicine. The growing availability of multi-organ and multimodal datasets has accelerated the development of large-scale Medical Multimodal Foundation Models (MMFMs). These models, known for their strong generalization capabilities and rich representational power, are increasingly being adapted to address a wide range of clinical tasks, from early diagnosis to personalized treatment strategies. This review offers a comprehensive analysis of recent developments in MMFMs, focusing on three key aspects: datasets, model architectures, and clinical applications. We also explore the challenges and opportunities in optimizing multimodal representations and discuss how these advancements are shaping the future of healthcare by enabling improved patient outcomes and more efficient clinical workflows.
Authors: Gianluca Manzan, Daniele Tantari
Abstract: Restricted Boltzmann Machines (RBMs) are generative models designed to learn from data with a rich underlying structure. In this work, we explore a teacher-student setting where a student RBM learns from examples generated by a teacher RBM, with a focus on the effect of the unit priors on learning efficiency. We consider a parametric class of priors that interpolate between continuous (Gaussian) and binary variables. This approach models various possible choices of visible units, hidden units, and weights for both the teacher and student RBMs. By analyzing the phase diagram of the posterior distribution in both the Bayes optimal and mismatched regimes, we demonstrate the existence of a triple point that defines the critical dataset size necessary for learning through generalization. The critical size is strongly influenced by the properties of the teacher, and thus the data, but is unaffected by the properties of the student RBM. Nevertheless, a prudent choice of student priors can facilitate training by expanding the so-called signal retrieval region, where the machine generalizes effectively.
Authors: Yiftach Edelstein, Or Patashnik, Dana Cohen-Bar, Lihi Zelnik-Manor
Abstract: Advancements in text-to-image diffusion models have led to significant progress in fast 3D content creation. One common approach is to generate a set of multi-view images of an object, and then reconstruct it into a 3D model. However, this approach bypasses the use of a native 3D representation of the object and is hence prone to geometric artifacts and limited in controllability and manipulation capabilities. An alternative approach involves native 3D generative models that directly produce 3D representations. These models, however, are typically limited in their resolution, resulting in lower quality 3D objects. In this work, we bridge the quality gap between methods that directly generate 3D representations and ones that reconstruct 3D objects from multi-view images. We introduce a multi-view to multi-view diffusion model called Sharp-It, which takes a 3D consistent set of multi-view images rendered from a low-quality object and enriches its geometric details and texture. The diffusion model operates on the multi-view set in parallel, in the sense that it shares features across the generated views. A high-quality 3D model can then be reconstructed from the enriched multi-view set. By leveraging the advantages of both 2D and 3D approaches, our method offers an efficient and controllable method for high-quality 3D content creation. We demonstrate that Sharp-It enables various 3D applications, such as fast synthesis, editing, and controlled generation, while attaining high-quality assets.
Authors: Gregory Dexter, Petros Drineas, Rajiv Khanna
Abstract: We provide space complexity lower bounds for data structures that approximate logistic loss up to $\epsilon$-relative error on a logistic regression problem with data $\mathbf{X} \in \mathbb{R}^{n \times d}$ and labels $\mathbf{y} \in \{-1,1\}^d$. The space complexity of existing coreset constructions depend on a natural complexity measure $\mu_\mathbf{y}(\mathbf{X})$, first defined in (Munteanu, 2018). We give an $\tilde{\Omega}(\frac{d}{\epsilon^2})$ space complexity lower bound in the regime $\mu_\mathbf{y}(\mathbf{X}) = O(1)$ that shows existing coresets are optimal in this regime up to lower order factors. We also prove a general $\tilde{\Omega}(d\cdot \mu_\mathbf{y}(\mathbf{X}))$ space lower bound when $\epsilon$ is constant, showing that the dependency on $\mu_\mathbf{y}(\mathbf{X})$ is not an artifact of mergeable coresets. Finally, we refute a prior conjecture that $\mu_\mathbf{y}(\mathbf{X})$ is hard to compute by providing an efficient linear programming formulation, and we empirically compare our algorithm to prior approximate methods.
Authors: Yuda Song, Hanlin Zhang, Carson Eisenach, Sham Kakade, Dean Foster, Udaya Ghai
Abstract: Self-improvement is a mechanism in Large Language Model (LLM) pre-training, post-training and test-time inference. We explore a framework where the model verifies its own outputs, filters or reweights data based on this verification, and distills the filtered data. Despite several empirical successes, a fundamental understanding is still lacking. In this work, we initiate a comprehensive, modular and controlled study on LLM self-improvement. We provide a mathematical formulation for self-improvement, which is largely governed by a quantity which we formalize as the generation-verification gap. Through experiments with various model families and tasks, we discover a scaling phenomenon of self-improvement -- a variant of the generation-verification gap scales monotonically with the model pre-training flops. We also examine when self-improvement is possible, an iterative self-improvement procedure, and ways to improve its performance. Our findings not only advance understanding of LLM self-improvement with practical implications, but also open numerous avenues for future research into its capabilities and boundaries.
Authors: Xuanlin Li, Tong Zhao, Xinghao Zhu, Jiuguang Wang, Tao Pang, Kuan Fang
Abstract: Contact-rich bimanual manipulation involves precise coordination of two arms to change object states through strategically selected contacts and motions. Due to the inherent complexity of these tasks, acquiring sufficient demonstration data and training policies that generalize to unseen scenarios remain a largely unresolved challenge. Building on recent advances in planning through contacts, we introduce Generalizable Planning-Guided Diffusion Policy Learning (GLIDE), an approach that effectively learns to solve contact-rich bimanual manipulation tasks by leveraging model-based motion planners to generate demonstration data in high-fidelity physics simulation. Through efficient planning in randomized environments, our approach generates large-scale and high-quality synthetic motion trajectories for tasks involving diverse objects and transformations. We then train a task-conditioned diffusion policy via behavior cloning using these demonstrations. To tackle the sim-to-real gap, we propose a set of essential design options in feature extraction, task representation, action prediction, and data augmentation that enable learning robust prediction of smooth action sequences and generalization to unseen scenarios. Through experiments in both simulation and the real world, we demonstrate that our approach can enable a bimanual robotic system to effectively manipulate objects of diverse geometries, dimensions, and physical properties. Website: https://glide-manip.github.io/
Authors: \'Alvaro Rodr\'iguez Abella, Jo\~ao Pedro Silvestre, Paulo Tabuada
Abstract: A key component of transformers is the attention mechanism orchestrating how each token influences the propagation of every other token through a transformer. In this paper we provide a rigorous, mathematical analysis of the asymptotic properties of attention in transformers. Although we present several results based on different assumptions, all of them point to the same conclusion, all tokens asymptotically converge to each other, a phenomenon that has been empirically reported in the literature. Our findings are carefully compared with existing theoretical results and illustrated by simulations and experimental studies using the GPT-2 model.
Authors: Wenxuan Zhou, Shujian Zhang, Lingxiao Zhao, Tao Meng
Abstract: Reinforcement learning from human feedback (RLHF) has been crucial in aligning large language models (LLMs) with human values. Traditionally, RLHF involves generating responses to a query and using a reward model to assign a reward to the entire response. However, this approach faces challenges due to its reliance on a single, sparse reward, which makes it challenging for the model to identify which parts of the sequence contribute most significantly to the final reward. Recent methods have attempted to address this limitation by introducing token-level rewards. However, these methods often rely on either a trained credit assignment model or AI annotators, raising concerns about the quality and reliability of the rewards. In this paper, we propose token-level reward regularization (T-REG), a novel approach that leverages both sequence-level and token-level rewards for preference optimization. Harnessing the self-refinement capabilities of LLMs, our method uses contrastive prompting to enable LLMs to self-generate token-level rewards. These self-generated rewards then act as reward regularization, guiding the model to more effectively distribute sequence-level rewards across tokens. This facilitates better token-level credit assignment and enhances alignment performance. Experiments on the instruction following benchmarks, including Alpaca Eval 2 and Arena-Hard, show that our method consistently outperforms baseline methods by up to 3.8% and 4.4%, respectively. We will release the code and models at https://github.com/wzhouad/T-REG.
Authors: Medha Pappula, Syed Muhammad Anwar
Abstract: This paper introduces an innovative approach to Attention-deficit/hyperactivity disorder (ADHD) diagnosis by employing deep learning (DL) techniques on electroencephalography (EEG) signals. This method addresses the limitations of current behavior-based diagnostic methods, which often lead to misdiagnosis and gender bias. By utilizing a publicly available EEG dataset and converting the signals into spectrograms, a Resnet-18 convolutional neural network (CNN) architecture was used to extract features for ADHD classification. The model achieved a high precision, recall, and an overall F1 score of 0.9. Feature extraction highlighted significant brain regions (frontopolar, parietal, and occipital lobes) associated with ADHD. These insights guided the creation of a three-part digital diagnostic system, facilitating cost-effective and accessible ADHD screening, especially in school environments. This system enables earlier and more accurate identification of students at risk for ADHD, providing timely support to enhance their developmental outcomes. This study showcases the potential of integrating EEG analysis with DL to enhance ADHD diagnostics, presenting a viable alternative to traditional methods.
Authors: Abdulkader Saoud, Mahmut Alomeyr, Himmet Toprak Kesgin, Mehmet Fatih Amasyali
Abstract: This paper investigates the effectiveness of BERT based models for automated punctuation and capitalization corrections in Turkish texts across five distinct model sizes. The models are designated as Tiny, Mini, Small, Medium, and Base. The design and capabilities of each model are tailored to address the specific challenges of the Turkish language, with a focus on optimizing performance while minimizing computational overhead. The study presents a systematic comparison of the performance metrics precision, recall, and F1 score of each model, offering insights into their applicability in diverse operational contexts. The results demonstrate a significant improvement in text readability and accuracy as model size increases, with the Base model achieving the highest correction precision. This research provides a comprehensive guide for selecting the appropriate model size based on specific user needs and computational resources, establishing a framework for deploying these models in real-world applications to enhance the quality of written Turkish.
Authors: Shubhi Shukla, Subhadeep Dalui, Manaar Alam, Shubhajit Datta, Arijit Mondal, Debdeep Mukhopadhyay, Partha Pratim Chakrabarti
Abstract: Adversarial attacks rely on transferability, where an adversarial example (AE) crafted on a surrogate classifier tends to mislead a target classifier. Recent ensemble methods demonstrate that AEs are less likely to mislead multiple classifiers in an ensemble. This paper proposes a new ensemble training using a Pairwise Adversarially Robust Loss (PARL) that by construction produces an ensemble of classifiers with diverse decision boundaries. PARL utilizes outputs and gradients of each layer with respect to network parameters in every classifier within the ensemble simultaneously. PARL is demonstrated to achieve higher robustness against black-box transfer attacks than previous ensemble methods as well as adversarial training without adversely affecting clean example accuracy. Extensive experiments using standard Resnet20, WideResnet28-10 classifiers demonstrate the robustness of PARL against state-of-the-art adversarial attacks. While maintaining similar clean accuracy and lesser training time, the proposed architecture has a 24.8% increase in robust accuracy ($\epsilon$ = 0.07) from the state-of-the art method.
Authors: Md Ferdous Alam, Yi Wang, Chin-Yi Cheng, Jieliang Luo
Abstract: Volumetric design, also called massing design, is the first and critical step in professional building design which is sequential in nature. As the volumetric design process requires careful design decisions and iterative adjustments, the underlying sequential design process encodes valuable information for designers. Many efforts have been made to automatically generate reasonable volumetric designs, but the quality of the generated design solutions varies, and evaluating a design solution requires either a prohibitively comprehensive set of metrics or expensive human expertise. While previous approaches focused on learning only the final design instead of sequential design tasks, we propose to encode the design knowledge from a collection of expert or high-performing design sequences and extract useful representations using transformer-based models. Later we propose to utilize the learned representations for crucial downstream applications such as design preference evaluation and procedural design generation. We develop the preference model by estimating the density of the learned representations whereas we train an autoregressive transformer model for sequential design generation. We demonstrate our ideas by leveraging a novel dataset of thousands of sequential volumetric designs. Our preference model can compare two arbitrarily given design sequences and is almost $90\%$ accurate in evaluation against random design sequences. Our autoregressive model is also capable of autocompleting a volumetric design sequence from a partial design sequence.
Authors: Jan van Delden, Julius Schultz, Christopher Blech, Sabine C. Langer, Timo L\"uddecke
Abstract: In mechanical structures like airplanes, cars and houses, noise is generated and transmitted through vibrations. To take measures to reduce this noise, vibrations need to be simulated with expensive numerical computations. Deep learning surrogate models present a promising alternative to classical numerical simulations as they can be evaluated magnitudes faster, while trading-off accuracy. To quantify such trade-offs systematically and foster the development of methods, we present a benchmark on the task of predicting the vibration of harmonically excited plates. The benchmark features a total of 12,000 plate geometries with varying forms of beadings, material, boundary conditions, load position and sizes with associated numerical solutions. To address the benchmark task, we propose a new network architecture, named Frequency-Query Operator, which predicts vibration patterns of plate geometries given a specific excitation frequency. Applying principles from operator learning and implicit models for shape encoding, our approach effectively addresses the prediction of highly variable frequency response functions occurring in dynamic systems. To quantify the prediction quality, we introduce a set of evaluation metrics and evaluate the method on our vibrating-plates benchmark. Our method outperforms DeepONets, Fourier Neural Operators and more traditional neural network architectures and can be used for design optimization. Code, dataset and visualizations: https://github.com/ecker-lab/Learning_Vibrating_Plates
URLs: https://github.com/ecker-lab/Learning_Vibrating_Plates
Authors: Mauricio Tec, Ana Trisovic, Michelle Audirac, Sophie Woodward, Jie Kate Hu, Naeem Khoshnevis, Francesca Dominici
Abstract: Spatial confounding poses a significant challenge in scientific studies involving spatial data, where unobserved spatial variables can influence both treatment and outcome, possibly leading to spurious associations. To address this problem, we introduce SpaCE: The Spatial Confounding Environment, the first toolkit to provide realistic benchmark datasets and tools for systematically evaluating causal inference methods designed to alleviate spatial confounding. Each dataset includes training data, true counterfactuals, a spatial graph with coordinates, and smoothness and confounding scores characterizing the effect of a missing spatial confounder. It also includes realistic semi-synthetic outcomes and counterfactuals, generated using state-of-the-art machine learning ensembles, following best practices for causal inference benchmarks. The datasets cover real treatment and covariates from diverse domains, including climate, health and social sciences. SpaCE facilitates an automated end-to-end pipeline, simplifying data loading, experimental setup, and evaluating machine learning and causal inference models. The SpaCE project provides several dozens of datasets of diverse sizes and spatial complexity. It is publicly available as a Python package, encouraging community feedback and contributions.
Authors: Junjie Gao, Xiangyu Zheng, DongDong Wang, Zhixiang Huang, Bangqi Zheng, Kai Yang
Abstract: Uplift modeling comprises a collection of machine learning techniques designed for managers to predict the incremental impact of specific actions on customer outcomes. However, accurately estimating this incremental impact poses significant challenges due to the necessity of determining the difference between two mutually exclusive outcomes for each individual. In our study, we introduce two novel modifications to the established Gradient Boosting Decision Trees (GBDT) technique. These modifications sequentially learn the causal effect, addressing the counterfactual dilemma. Each modification innovates upon the existing technique in terms of the ensemble learning method and the learning objective, respectively. Experiments with large-scale datasets validate the effectiveness of our methods, consistently achieving substantial improvements over baseline models.
Authors: Jianhua Pei, Jingyu Wang, Dongyuan Shi, Ping Wang
Abstract: Power system cyber-physical uncertainties, including measurement ambiguities stemming from cyber attacks and data losses, along with system uncertainties introduced by massive renewables and complex dynamics, reduce the likelihood of enhancing the quality of measurements. Fortunately, denoising diffusion models exhibit powerful learning and generation abilities for the complex underlying physics of the real world. To this end, this paper proposes an improved detection and imputation based two-stage denoising diffusion model (TSDM) to identify and reconstruct the measurements with various cyber-physical uncertainties. The first stage of the model comprises a classifier-guided conditional anomaly detection component, while the second stage involves diffusion-based measurement imputation component. Moreover, the proposed TSDM adopts optimal variance to accelerate the diffusion generation process with subsequence sampling. Extensive numerical case studies demonstrate that the proposed TSDM can accurately recover power system measurements despite renewables-induced strong randomness and highly nonlinear dynamics. Additionally, the proposed TSDM has stronger robustness compared to existing reconstruction networks and exhibits lower computational complexity than general denoising diffusion models.
Authors: Chengting Yu, Fengzhao Zhang, Hanzhi Ma, Aili Wang, Erping Li
Abstract: Traditional end-to-end (E2E) training of deep networks necessitates storing intermediate activations for back-propagation, resulting in a large memory footprint on GPUs and restricted model parallelization. As an alternative, greedy local learning partitions the network into gradient-isolated modules and trains supervisely based on local preliminary losses, thereby providing asynchronous and parallel training methods that substantially reduce memory cost. However, empirical experiments reveal that as the number of segmentations of the gradient-isolated module increases, the performance of the local learning scheme degrades substantially, severely limiting its expansibility. To avoid this issue, we theoretically analyze the greedy local learning from the standpoint of information theory and propose a ContSup scheme, which incorporates context supply between isolated modules to compensate for information loss. Experiments on benchmark datasets (i.e. CIFAR, SVHN, STL-10) achieve SOTA results and indicate that our proposed method can significantly improve the performance of greedy local learning with minimal memory and computational overhead, allowing for the boost of the number of isolated modules. Our codes are available at https://github.com/Tab-ct/ContSup.
Authors: Lingxiao Zhao, Xueying Ding, Leman Akoglu
Abstract: Graph generation has been dominated by autoregressive models due to their simplicity and effectiveness, despite their sensitivity to ordering. Yet diffusion models have garnered increasing attention, as they offer comparable performance while being permutation-invariant. Current graph diffusion models generate graphs in a one-shot fashion, but they require extra features and thousands of denoising steps to achieve optimal performance. We introduce PARD, a Permutation-invariant Auto Regressive Diffusion model that integrates diffusion models with autoregressive methods. PARD harnesses the effectiveness and efficiency of the autoregressive model while maintaining permutation invariance without ordering sensitivity. Specifically, we show that contrary to sets, elements in a graph are not entirely unordered and there is a unique partial order for nodes and edges. With this partial order, PARD generates a graph in a block-by-block, autoregressive fashion, where each block's probability is conditionally modeled by a shared diffusion model with an equivariant network. To ensure efficiency while being expressive, we further propose a higher-order graph transformer, which integrates transformer with PPGN. Like GPT, we extend the higher-order graph transformer to support parallel training of all blocks. Without any extra features, PARD achieves state-of-the-art performance on molecular and non-molecular datasets, and scales to large datasets like MOSES containing 1.9M molecules. Pard is open-sourced at https://github.com/LingxiaoShawn/Pard.
Authors: Jos\'e \'Angel Mart\'in-Baos, Ricardo Garc\'ia-R\'odenas, Luis Rodriguez-Benitez, Michel Bierlaire
Abstract: The application of kernel-based Machine Learning (ML) techniques to discrete choice modelling using large datasets often faces challenges due to memory requirements and the considerable number of parameters involved in these models. This complexity hampers the efficient training of large-scale models. This paper addresses these problems of scalability by introducing the Nystr\"om approximation for Kernel Logistic Regression (KLR) on large datasets. The study begins by presenting a theoretical analysis in which: i) the set of KLR solutions is characterised, ii) an upper bound to the solution of KLR with Nystr\"om approximation is provided, and finally iii) a specialisation of the optimisation algorithms to Nystr\"om KLR is described. After this, the Nystr\"om KLR is computationally validated. Four landmark selection methods are tested, including basic uniform sampling, a k-means sampling strategy, and two non-uniform methods grounded in leverage scores. The performance of these strategies is evaluated using large-scale transport mode choice datasets and is compared with traditional methods such as Multinomial Logit (MNL) and contemporary ML techniques. The study also assesses the efficiency of various optimisation techniques for the proposed Nystr\"om KLR model. The performance of gradient descent, Momentum, Adam, and L-BFGS-B optimisation methods is examined on these datasets. Among these strategies, the k-means Nystr\"om KLR approach emerges as a successful solution for applying KLR to large datasets, particularly when combined with the L-BFGS-B and Adam optimisation methods. The results highlight the ability of this strategy to handle datasets exceeding 200,000 observations while maintaining robust performance.
Authors: Simon Zhang, Cheng Xin, Tamal K. Dey
Abstract: A hypergraph consists of a set of nodes along with a collection of subsets of the nodes called hyperedges. Higher-order link prediction is the task of predicting the existence of a missing hyperedge in a hypergraph. A hyperedge representation learned for higher order link prediction is fully expressive when it does not lose distinguishing power up to an isomorphism. Many existing hypergraph representation learners, are bounded in expressive power by the Generalized Weisfeiler Lehman-1 (GWL-1) algorithm, a generalization of the Weisfeiler Lehman-1 algorithm. However, GWL-1 has limited expressive power. In fact, induced subhypergraphs with identical GWL-1 valued nodes are indistinguishable. Furthermore, message passing on hypergraphs can already be computationally expensive, especially on GPU memory. To address these limitations, we devise a preprocessing algorithm that can identify certain regular subhypergraphs exhibiting symmetry. Our preprocessing algorithm runs once with complexity the size of the input hypergraph. During training, we randomly replace subhypergraphs identified by the algorithm with covering hyperedges to break symmetry. We show that our method improves the expressivity of GWL-1. Our extensive experiments also demonstrate the effectiveness of our approach for higher-order link prediction on both graph and hypergraph datasets with negligible change in computation.
Authors: James Urquhart Allingham, Bruno Kacper Mlodozeniec, Shreyas Padhy, Javier Antor\'an, David Krueger, Richard E. Turner, Eric Nalisnick, Jos\'e Miguel Hern\'andez-Lobato
Abstract: Correctly capturing the symmetry transformations of data can lead to efficient models with strong generalization capabilities, though methods incorporating symmetries often require prior knowledge. While recent advancements have been made in learning those symmetries directly from the dataset, most of this work has focused on the discriminative setting. In this paper, we take inspiration from group theoretic ideas to construct a generative model that explicitly aims to capture the data's approximate symmetries. This results in a model that, given a prespecified but broad set of possible symmetries, learns to what extent, if at all, those symmetries are actually present. Our model can be seen as a generative process for data augmentation. We provide a simple algorithm for learning our generative model and empirically demonstrate its ability to capture symmetries under affine and color transformations, in an interpretable way. Combining our symmetry model with standard generative models results in higher marginal test-log-likelihoods and improved data efficiency.
Authors: Arthur Thuy, Dries F. Benoit
Abstract: Image classification with neural networks (NNs) is widely used in industrial processes, situations where the model likely encounters unknown objects during deployment, i.e., out-of-distribution (OOD) data. Worryingly, NNs tend to make confident yet incorrect predictions when confronted with OOD data. To increase the models' reliability, they should quantify the uncertainty in their own predictions, communicating when the output should (not) be trusted. Deep ensembles, composed of multiple independent NNs, have been shown to perform strongly but are computationally expensive. Recent research has proposed more efficient NN ensembles, namely the snapshot, batch, and multi-input multi-output ensemble. This study investigates the predictive and uncertainty performance of efficient NN ensembles in the context of image classification for industrial processes. It is the first to provide a comprehensive comparison and it proposes a novel Diversity Quality metric to quantify the ensembles' performance on the in-distribution and OOD sets in one single metric. The results highlight the batch ensemble as a cost-effective and competitive alternative to the deep ensemble. It matches the deep ensemble in both uncertainty and accuracy while exhibiting considerable savings in training time, test time, and memory storage.
Authors: Meng Wei, Zhongnian Li, Yong Zhou, Xinzheng Xu
Abstract: Long-tailed data is prevalent in real-world classification tasks and heavily relies on supervised information, which makes the annotation process exceptionally labor-intensive and time-consuming. Unfortunately, despite being a common approach to mitigate labeling costs, existing weakly supervised learning methods struggle to adequately preserve supervised information for tail samples, resulting in a decline in accuracy for the tail classes. To alleviate this problem, we introduce a novel weakly supervised labeling setting called Reduced Label. The proposed labeling setting not only avoids the decline of supervised information for the tail samples, but also decreases the labeling costs associated with long-tailed data. Additionally, we propose an straightforward and highly efficient unbiased framework with strong theoretical guarantees to learn from these Reduced Labels. Extensive experiments conducted on benchmark datasets including ImageNet validate the effectiveness of our approach, surpassing the performance of state-of-the-art weakly supervised methods.
Authors: Boje Deforce, Meng-Chieh Lee, Bart Baesens, Estefan\'ia Serral Asensio, Jaemin Yoo, Leman Akoglu
Abstract: Time series anomaly detection (TSAD) finds many applications such as monitoring environmental sensors, industry KPIs, patient biomarkers, etc. A two-fold challenge for TSAD is a versatile and unsupervised model that can detect various different types of time series anomalies (spikes, discontinuities, trend shifts, etc.) without any labeled data. Modern neural networks have outstanding ability in modeling complex time series. Self-supervised models in particular tackle unsupervised TSAD by transforming the input via various augmentations to create pseudo anomalies for training. However, their performance is sensitive to the choice of augmentation, which is hard to choose in practice, while there exists no effort in the literature on data augmentation tuning for TSAD without labels. Our work aims to fill this gap. We introduce TSAP for TSA "on autoPilot", which can (self-)tune augmentation hyperparameters end-to-end. It stands on two key components: a differentiable augmentation architecture and an unsupervised validation loss to effectively assess the alignment between augmentation type and anomaly type. Case studies show TSAP's ability to effectively select the (discrete) augmentation type and associated (continuous) hyperparameters. In turn, it outperforms established baselines, including SOTA self-supervised models, on diverse TSAD tasks exhibiting different anomaly types.
Authors: Tetsuro Morimura, Mitsuki Sakamoto, Yuu Jinnai, Kenshi Abe, Kaito Ariu
Abstract: Reinforcement learning from human feedback (RLHF) plays a crucial role in aligning language models with human preferences. While the significance of dataset quality is generally recognized, explicit investigations into its impact within the RLHF framework, to our knowledge, have been limited. This paper addresses the issue of text quality within the preference dataset by focusing on direct preference optimization (DPO), an increasingly adopted reward-model-free RLHF method. We confirm that text quality significantly influences the performance of models optimized with DPO more than those optimized with reward-model-based RLHF. Building on this new insight, we propose an extension of DPO, termed filtered direct preference optimization (fDPO). fDPO uses a trained reward model to monitor the quality of texts within the preference dataset during DPO training. Samples of lower quality are discarded based on comparisons with texts generated by the model being optimized, resulting in a more accurate dataset. Experimental results demonstrate that fDPO enhances the final model performance. Our code is available at https://github.com/CyberAgentAILab/filtered-dpo.
Authors: Matthias Chung, Emma Hart, Julianne Chung, Bas Peters, Eldad Haber
Abstract: We consider the solution of nonlinear inverse problems where the forward problem is a discretization of a partial differential equation. Such problems are notoriously difficult to solve in practice and require minimizing a combination of a data-fit term and a regularization term. The main computational bottleneck of typical algorithms is the direct estimation of the data misfit. Therefore, likelihood-free approaches have become appealing alternatives. Nonetheless, difficulties in generalization and limitations in accuracy have hindered their broader utility and applicability. In this work, we use a paired autoencoder framework as a likelihood-free estimator for inverse problems. We show that the use of such an architecture allows us to construct a solution efficiently and to overcome some known open problems when using likelihood-free estimators. In particular, our framework can assess the quality of the solution and improve on it if needed. We demonstrate the viability of our approach using examples from full waveform inversion and inverse electromagnetic imaging.
Authors: Cong Lu, Shengran Hu, Jeff Clune
Abstract: Go-Explore is a powerful family of algorithms designed to solve hard-exploration problems built on the principle of archiving discovered states, and iteratively returning to and exploring from the most promising states. This approach has led to superhuman performance across a wide variety of challenging problems including Atari games and robotic control, but requires manually designing heuristics to guide exploration (i.e., determine which states to save and explore from, and what actions to consider next), which is time-consuming and infeasible in general. To resolve this, we propose Intelligent Go-Explore (IGE) which greatly extends the scope of the original Go-Explore by replacing these handcrafted heuristics with the intelligence and internalized human notions of interestingness captured by giant pretrained foundation models (FMs). This provides IGE with a human-like ability to instinctively identify how interesting or promising any new state is (e.g., discovering new objects, locations, or behaviors), even in complex environments where heuristics are hard to define. Moreover, IGE offers the exciting opportunity to recognize and capitalize on serendipitous discoveries-states encountered during exploration that are valuable in terms of exploration, yet where what makes them interesting was not anticipated by the human user. We evaluate our algorithm on a diverse range of language and vision-based tasks that require search and exploration. Across these tasks, IGE strongly exceeds classic reinforcement learning and graph search baselines, and also succeeds where prior state-of-the-art FM agents like Reflexion completely fail. Overall, Intelligent Go-Explore combines the tremendous strengths of FMs and the powerful Go-Explore algorithm, opening up a new frontier of research into creating more generally capable agents with impressive exploration capabilities.
Authors: Michal Nauman, Mateusz Ostaszewski, Krzysztof Jankowski, Piotr Mi{\l}o\'s, Marek Cygan
Abstract: Sample efficiency in Reinforcement Learning (RL) has traditionally been driven by algorithmic enhancements. In this work, we demonstrate that scaling can also lead to substantial improvements. We conduct a thorough investigation into the interplay of scaling model capacity and domain-specific RL enhancements. These empirical findings inform the design choices underlying our proposed BRO (Bigger, Regularized, Optimistic) algorithm. The key innovation behind BRO is that strong regularization allows for effective scaling of the critic networks, which, paired with optimistic exploration, leads to superior performance. BRO achieves state-of-the-art results, significantly outperforming the leading model-based and model-free algorithms across 40 complex tasks from the DeepMind Control, MetaWorld, and MyoSuite benchmarks. BRO is the first model-free algorithm to achieve near-optimal policies in the notoriously challenging Dog and Humanoid tasks.
Authors: Chenjie Mao, Qiaosheng Zhang
Abstract: This paper proposes the first generic fast convergence result in general function approximation for offline decision making problems, which include offline reinforcement learning (RL) and off-policy evaluation (OPE) as special cases. To unify different settings, we introduce a framework called Decision Making with Offline Feedback (DMOF), which captures a wide range of offline decision making problems. Within this framework, we propose a simple yet powerful algorithm called Empirical Decision with Divergence (EDD), whose upper bound can be termed as a coefficient named Empirical Offline Estimation Coefficient (EOEC). We show that EOEC is instance-dependent and actually measures the correlation of the problem. When assuming partial coverage in the dataset, EOEC will reduce in a rate of $1/N$ where $N$ is the size of the dataset, endowing EDD with a fast convergence guarantee. Finally, we complement the above results with a lower bound in the DMOF framework, which further demonstrates the soundness of our theory.
Authors: Qi Lv, Xiang Deng, Gongwei Chen, Michael Yu Wang, Liqiang Nie
Abstract: While the conditional sequence modeling with the transformer architecture has demonstrated its effectiveness in dealing with offline reinforcement learning (RL) tasks, it is struggle to handle out-of-distribution states and actions. Existing work attempts to address this issue by data augmentation with the learned policy or adding extra constraints with the value-based RL algorithm. However, these studies still fail to overcome the following challenges: (1) insufficiently utilizing the historical temporal information among inter-steps, (2) overlooking the local intrastep relationships among states, actions and return-to-gos (RTGs), (3) overfitting suboptimal trajectories with noisy labels. To address these challenges, we propose Decision Mamba (DM), a novel multi-grained state space model (SSM) with a self-evolving policy learning strategy. DM explicitly models the historical hidden state to extract the temporal information by using the mamba architecture. To capture the relationship among state-action-RTG triplets, a fine-grained SSM module is designed and integrated into the original coarse-grained SSM in mamba, resulting in a novel mamba architecture tailored for offline RL. Finally, to mitigate the overfitting issue on noisy trajectories, a self-evolving policy is proposed by using progressive regularization. The policy evolves by using its own past knowledge to refine the suboptimal actions, thus enhancing its robustness on noisy demonstrations. Extensive experiments on various tasks show that DM outperforms other baselines substantially.
Authors: Jianhua Pei, Cheng Feng, Ping Wang, Hina Tabassum, Dongyuan Shi
Abstract: Deep learning (DL)-based Semantic Communications (SemCom) is becoming critical to maximize overall efficiency of communication networks. Nevertheless, SemCom is sensitive to wireless channel uncertainties, source outliers, and suffer from poor generalization bottlenecks. To address the mentioned challenges, this paper develops a latent diffusion model-enabled SemCom system with three key contributions, i.e., i) to handle potential outliers in the source data, semantic errors obtained by projected gradient descent based on the vulnerabilities of DL models, are utilized to update the parameters and obtain an outlier-robust encoder, ii) a lightweight single-layer latent space transformation adapter completes one-shot learning at the transmitter and is placed before the decoder at the receiver, enabling adaptation for out-of-distribution data and enhancing human-perceptual quality, and iii) an end-to-end consistency distillation (EECD) strategy is used to distill the diffusion models trained in latent space, enabling deterministic single or few-step low-latency denoising in various noisy channels while maintaining high semantic quality. Extensive numerical experiments across different datasets demonstrate the superiority of the proposed SemCom system, consistently proving its robustness to outliers, the capability to transmit data with unknown distributions, and the ability to perform real-time channel denoising tasks while preserving high human perceptual quality, outperforming the existing denoising approaches in semantic metrics like learned perceptual image path similarity (LPIPS).
Authors: Burak Var{\i}c{\i}, Dmitriy Katz-Rogozhnikov, Dennis Wei, Prasanna Sattigeri, Ali Tajer
Abstract: Causal interactions among a group of variables are often modeled by a single causal graph. In some domains, however, these interactions are best described by multiple co-existing causal graphs, e.g., in dynamical systems or genomics. This paper addresses the hitherto unknown role of interventions in learning causal interactions among variables governed by a mixture of causal systems, each modeled by one directed acyclic graph (DAG). Causal discovery from mixtures is fundamentally more challenging than single-DAG causal discovery. Two major difficulties stem from (i)~an inherent uncertainty about the skeletons of the component DAGs that constitute the mixture and (ii)~possibly cyclic relationships across these component DAGs. This paper addresses these challenges and aims to identify edges that exist in at least one component DAG of the mixture, referred to as the true edges. First, it establishes matching necessary and sufficient conditions on the size of interventions required to identify the true edges. Next, guided by the necessity results, an adaptive algorithm is designed that learns all true edges using $O(n^2)$ interventions, where $n$ is the number of nodes. Remarkably, the size of the interventions is optimal if the underlying mixture model does not contain cycles across its components. More generally, the gap between the intervention size used by the algorithm and the optimal size is quantified. It is shown to be bounded by the cyclic complexity number of the mixture model, defined as the size of the minimal intervention that can break the cycles in the mixture, which is upper bounded by the number of cycles among the ancestors of a node.
Authors: Dingwen Zhang, Yan Li, De Cheng, Nannan Wang, Junwei Han
Abstract: To facilitate the evolution of edge intelligence in ever-changing environments, we study on-device incremental learning constrained in limited computation resource in this paper. Current on-device training methods just focus on efficient training without considering the catastrophic forgetting, preventing the model getting stronger when continually exploring the world. To solve this problem, a direct solution is to involve the existing incremental learning mechanisms into the on-device training framework. Unfortunately, such a manner cannot work well as those mechanisms usually introduce large additional computational cost to the network optimization process, which would inevitably exceed the memory capacity of the edge devices. To address this issue, this paper makes an early effort to propose a simple but effective edge-friendly incremental learning framework. Based on an empirical study on the knowledge intensity of the kernel elements of the neural network, we find that the center kernel is the key for maximizing the knowledge intensity for learning new data, while freezing the other kernel elements would get a good balance on the model's capacity for overcoming catastrophic forgetting. Upon this finding, we further design a center-sensitive kernel optimization framework to largely alleviate the cost of the gradient computation and back-propagation. Besides, a dynamic channel element selection strategy is also proposed to facilitate a sparse orthogonal gradient projection for further reducing the optimization complexity, upon the knowledge explored from the new task data. Extensive experiments validate our method is efficient and effective, e.g., our method achieves average accuracy boost of 38.08% with even less memory and approximate computation compared to existing on-device training methods, indicating its significant potential for on-device incremental learning.
Authors: Xisen Jin, Xiang Ren
Abstract: Large Language models (LLMs) suffer from forgetting of upstream data when fine-tuned. Despite efforts on mitigating forgetting, few have investigated whether, and how forgotten upstream examples are dependent on and associated with newly learned tasks. Insights on such associations enable efficient and targeted mitigation of forgetting. In this paper, we empirically analyze forgetting (measured in log-perplexity increase) that occurs in $N$ upstream examples of language modeling or instruction-tuning after fine-tuning LLMs on one of $M$ new tasks, visualized in $M\times N$ matrices. We demonstrate that the matrices display simple low-rank patterns, often well-approximated with multiplicative scalar effects of upstream examples and newly learned tasks. We also examine fine-grained associations with visualization and statistics. Leveraging the low-rank nature of the associations, we predict forgetting of upstream examples when fine-tuning on unseen tasks with matrix completion over the empirical associations. This enables fast identification of most forgotten examples without expensive inference on the entire upstream data. The approach, despite simplicity, outperforms prior approaches that learn semantic relationships of learned tasks and upstream examples with LMs for predicting forgetting. We demonstrate the practical utility of our analysis by showing statistically significantly reduced forgetting as we upweight predicted examples for replay at fine-tuning. Project page: https://inklab.usc.edu/lm-forgetting-prediction/
Authors: Kwanyoung Park, Youngwoon Lee
Abstract: Model-based offline reinforcement learning (RL) is a compelling approach that addresses the challenge of learning from limited, static data by generating imaginary trajectories using learned models. However, these approaches often struggle with inaccurate value estimation from model rollouts. In this paper, we introduce a novel model-based offline RL method, Lower Expectile Q-learning (LEQ), which provides a low-bias model-based value estimation via lower expectile regression of $\lambda$-returns. Our empirical results show that LEQ significantly outperforms previous model-based offline RL methods on long-horizon tasks, such as the D4RL AntMaze tasks, matching or surpassing the performance of model-free approaches and sequence modeling approaches. Furthermore, LEQ matches the performance of state-of-the-art model-based and model-free methods in dense-reward environments across both state-based tasks (NeoRL and D4RL) and pixel-based tasks (V-D4RL), showing that LEQ works robustly across diverse domains. Our ablation studies demonstrate that lower expectile regression, $\lambda$-returns, and critic training on offline data are all crucial for LEQ.
Authors: Rui Wang, Mengshi Qi, Yingxia Shao, Anfu Zhou, Huadong Ma
Abstract: Monitoring blood pressure with non-invasive sensors has gained popularity for providing comfortable user experiences, one of which is a significant function of smart wearables. Although providing a comfortable user experience, such methods are suffering from the demand for a significant amount of realistic data to train an individual model for each subject, especially considering the invasive or obtrusive BP ground-truth measurements. To tackle this challenge, we introduce a novel physics-informed temporal network~(PITN) with adversarial contrastive learning to enable precise BP estimation with very limited data. Specifically, we first enhance the physics-informed neural network~(PINN) with the temporal block for investigating BP dynamics' multi-periodicity for personal cardiovascular cycle modeling and temporal variation. We then employ adversarial training to generate extra physiological time series data, improving PITN's robustness in the face of sparse subject-specific training data. Furthermore, we utilize contrastive learning to capture the discriminative variations of cardiovascular physiologic phenomena. This approach aggregates physiological signals with similar blood pressure values in latent space while separating clusters of samples with dissimilar blood pressure values. Experiments on three widely-adopted datasets with different modailties (\emph{i.e.,} bioimpedance, PPG, millimeter-wave) demonstrate the superiority and effectiveness of the proposed methods over previous state-of-the-art approaches. The code is available at~\url{https://github.com/Zest86/ACL-PITN}.
Authors: Zakaria Patel, Sebastian J. Wetzel
Abstract: It has been demonstrated in many scientific fields that artificial neural networks like autoencoders or Siamese networks encode meaningful concepts in their latent spaces. However, there does not exist a comprehensive framework for retrieving this information in a human-readable form without prior knowledge. In order to extract these concepts, we introduce a framework for finding closed-form interpretations of neurons in latent spaces of artificial neural networks. The interpretation framework is based on embedding trained neural networks into an equivalence class of functions that encode the same concept. We interpret these neural networks by finding an intersection between the equivalence class and human-readable equations defined by a symbolic search space. The approach is demonstrated by retrieving invariants of matrices and conserved quantities of dynamical systems from latent spaces of Siamese neural networks.
Authors: Peyman Milanfar, Mauricio Delbracio
Abstract: Denoising, the process of reducing random fluctuations in a signal to emphasize essential patterns, has been a fundamental problem of interest since the dawn of modern scientific inquiry. Recent denoising techniques, particularly in imaging, have achieved remarkable success, nearing theoretical limits by some measures. Yet, despite tens of thousands of research papers, the wide-ranging applications of denoising beyond noise removal have not been fully recognized. This is partly due to the vast and diverse literature, making a clear overview challenging. This paper aims to address this gap. We present a clarifying perspective on denoisers, their structure, and desired properties. We emphasize the increasing importance of denoising and showcase its evolution into an essential building block for complex tasks in imaging, inverse problems, and machine learning. Despite its long history, the community continues to uncover unexpected and groundbreaking uses for denoising, further solidifying its place as a cornerstone of scientific and engineering practice.
Authors: Heng Rao, Yu Gu, Jason Zipeng Zhang, Ge Yu, Yang Cao, Minghan Chen
Abstract: Biological oscillations are periodic changes in various signaling processes crucial for the proper functioning of living organisms. These oscillations are modeled by ordinary differential equations, with coefficient variations leading to diverse periodic behaviors, typically measured by oscillatory frequencies. This paper explores sampling techniques for neural networks to model the relationship between system coefficients and oscillatory frequency. However, the scarcity of oscillations in the vast coefficient space results in many samples exhibiting non-periodic behaviors, and small coefficient changes near oscillation boundaries can significantly alter oscillatory properties. This leads to non-oscillatory bias and boundary sensitivity, making accurate predictions difficult. While existing importance and uncertainty sampling approaches partially mitigate these challenges, they either fail to resolve the sensitivity problem or result in redundant sampling. To address these limitations, we propose the Hierarchical Gradient-based Genetic Sampling (HGGS) framework, which improves the accuracy of neural network predictions for biological oscillations. The first layer, Gradient-based Filtering, extracts sensitive oscillation boundaries and removes redundant non-oscillatory samples, creating a balanced coarse dataset. The second layer, Multigrid Genetic Sampling, utilizes residual information to refine these boundaries and explore new high-residual regions, increasing data diversity for model training. Experimental results demonstrate that HGGS outperforms seven comparative sampling methods across four biological systems, highlighting its effectiveness in enhancing sampling and prediction accuracy.
Authors: Yuanzhao Zhang, William Gilpin
Abstract: Time-series forecasting is a challenging problem that traditionally requires specialized models custom-trained for the specific task at hand. Recently, inspired by the success of large language models, foundation models pre-trained on vast amounts of time-series data from diverse domains have emerged as a promising candidate for general-purpose time-series forecasting. The defining characteristic of these foundation models is their ability to perform zero-shot learning, that is, forecasting a new system from limited context data without explicit re-training or fine-tuning. Here, we evaluate whether the zero-shot learning paradigm extends to the challenging task of forecasting chaotic systems. Across 135 distinct chaotic dynamical systems and $10^8$ timepoints, we find that foundation models produce competitive forecasts compared to custom-trained models (including NBEATS, TiDE, etc.), particularly when training data is limited. Interestingly, even after point forecasts fail, large foundation models are able to preserve the geometric and statistical properties of the chaotic attractors. We attribute this success to foundation models' ability to perform in-context learning and identify context parroting as a simple mechanism used by these models to capture the long-term behavior of chaotic dynamical systems. Our results highlight the potential of foundation models as a tool for probing nonlinear and complex systems.
Authors: Yan Scholten, Stephan G\"unnemann, Leo Schwinn
Abstract: Comprehensive evaluation of Large Language Models (LLMs) is an open research problem. Existing evaluations rely on deterministic point estimates generated via greedy decoding. However, we find that deterministic evaluations fail to capture the whole output distribution of a model, yielding inaccurate estimations of model capabilities. This is particularly problematic in critical contexts such as unlearning and alignment, where precise model evaluations are crucial. To remedy this, we introduce the first formal probabilistic evaluation framework in LLMs. Namely, we derive novel metrics with high-probability guarantees concerning the output distribution of a model. Our metrics are application-independent and allow practitioners to make more reliable estimates about model capabilities before deployment. Through a case study focused on unlearning, we reveal that deterministic evaluations falsely indicate successful unlearning, whereas our probabilistic evaluations demonstrate that most if not all of the supposedly unlearned information remains accessible in these models. Additionally, we propose a novel unlearning loss based on entropy optimization and adaptive temperature scaling, which significantly improves unlearning in probabilistic settings on recent benchmarks. Our proposed shift from point estimates to probabilistic evaluations of output distributions represents an important step toward comprehensive evaluations of LLMs. Code available at https://github.com/yascho/probabilistic-unlearning.
Authors: Jason Kurz, Sean Oughton, Shitao Liu
Abstract: Operator networks are designed to approximate nonlinear operators, which provide mappings between infinite-dimensional spaces such as function spaces. These networks are playing an increasingly important role in machine learning, with their most notable contributions in the field of scientific computing. Their significance stems from their ability to handle the type of data often encountered in scientific applications. For instance, in climate modeling or fluid dynamics, input data typically consists of discretized continuous fields (like temperature distributions or velocity fields). We introduce the radial basis operator network (RBON), which represents a significant advancement as the first operator network capable of learning an operator in both the time domain and frequency domain when adjusted to accept complex-valued inputs. Despite the small, single hidden-layer structure, the RBON boasts small $L^2$ relative test error for both in- and out-of-distribution data (OOD) of less than $1\times 10^{-7}$ in some benchmark cases. Moreover, the RBON maintains small error on OOD data from entirely different function classes from the training data.
Authors: Kaiyue Wen, Zhiyuan Li, Jason Wang, David Hall, Percy Liang, Tengyu Ma
Abstract: Training language models currently requires pre-determining a fixed compute budget because the typical cosine learning rate schedule depends on the total number of steps. In contrast, the Warmup-Stable-Decay (WSD) schedule uses a constant learning rate to produce a main branch of iterates that can in principle continue indefinitely without a pre-specified compute budget. Then, given any compute budget, one can branch out from the main branch at a proper time with a rapidly decaying learning rate to produce a strong model. Empirically, WSD generates a non-traditional loss curve: the loss remains elevated during the stable phase but sharply declines during the decay phase. Towards explaining this phenomenon, we conjecture that pretraining loss exhibits a river valley landscape, which resembles a deep valley with a river at its bottom. Under this assumption, we show that during the stable phase, the iterate undergoes large oscillations due to the high learning rate, yet it progresses swiftly along the river. During the decay phase, the rapidly dropping learning rate minimizes the iterate's oscillations, moving it closer to the river and revealing true optimization progress. Therefore, the sustained high learning rate phase and fast decaying phase are responsible for progress in the river and the mountain directions respectively, and are both critical. Our analysis predicts phenomenons consistent with empirical observations and shows that this landscape can emerge from pretraining on a simple bi-gram dataset. Inspired by the theory, we introduce WSD-S, a variant of WSD that reuses previous checkpoints' decay phases and keeps only one main branch, where we resume from a decayed checkpoint. WSD-S empirically outperforms WSD and Cyclic-Cosine in obtaining multiple language model checkpoints across various compute budgets in a single run for parameters scaling from 0.1B to 1.2B.
Authors: Sehun Kim
Abstract: Electrocardiogram (ECG) captures the heart's electrical signals, offering valuable information for diagnosing cardiac conditions. However, the scarcity of labeled data makes it challenging to fully leverage supervised learning in medical domain. Self-supervised learning (SSL) offers a promising solution, enabling models to learn from unlabeled data and uncover meaningful patterns. In this paper, we show that masked modeling in the latent space can be a powerful alternative to existing self-supervised methods in the ECG domain. We introduce ECG-JEPA, a SSL model for 12-lead ECG analysis that learns semantic representations of ECG data by predicting in the hidden latent space, bypassing the need to reconstruct raw signals. This approach offers several advantages in the ECG domain: (1) it avoids producing unnecessary details, such as noise, which is common in ECG; and (2) it addresses the limitations of na\"ive L2 loss between raw signals. Another key contribution is the introduction of Cross-Pattern Attention (CroPA), a specialized masked attention mechanism tailored for 12-lead ECG data. ECG-JEPA is trained on the union of several open ECG datasets, totaling approximately 180,000 samples, and achieves state-of-the-art performance in various downstream tasks including ECG classification and feature prediction. Our code is openly available at https://github.com/sehunfromdaegu/ECG_JEPA.
Authors: Juan Sebastian Rojas, Chi-Guhn Lee
Abstract: Average-reward Markov decision processes (MDPs) provide a foundational framework for sequential decision-making under uncertainty. However, average-reward MDPs have remained largely unexplored in reinforcement learning (RL) settings, with the majority of RL-based efforts having been allocated to episodic and discounted MDPs. In this work, we study a unique structural property of average-reward MDPs and utilize it to introduce Reward-Extended Differential (or RED) reinforcement learning: a novel RL framework that can be used to effectively and efficiently solve various subtasks simultaneously in the average-reward setting. We introduce a family of RED learning algorithms for prediction and control, including proven-convergent algorithms for the tabular case. We then showcase the power of these algorithms by demonstrating how they can be used to learn a policy that optimizes, for the first time, the well-known conditional value-at-risk (CVaR) risk measure in a fully-online manner, without the use of an explicit bi-level optimization scheme or an augmented state-space.
Authors: Christofel Rio Goenawan
Abstract: In the modern world, the development of Artificial Intelligence (AI) has contributed to improvements in various areas, including automation, computer vision, fraud detection, and more. AI can be leveraged to enhance the efficiency of Autonomous Smart Traffic Management (ASTM) systems and reduce traffic congestion rates. This paper presents an Autonomous Smart Traffic Management (STM) system that uses AI to improve traffic flow rates. The system employs the YOLO V5 Convolutional Neural Network to detect vehicles in traffic management images. Additionally, it predicts the number of vehicles for the next 12 hours using a Recurrent Neural Network with Long Short-Term Memory (RNN-LSTM). The Smart Traffic Management Cycle Length Analysis manages the traffic cycle length based on these vehicle predictions, aided by AI. From the results of the RNN-LSTM model for predicting vehicle numbers over the next 12 hours, we observe that the model predicts traffic with a Mean Squared Error (MSE) of 4.521 vehicles and a Root Mean Squared Error (RMSE) of 2.232 vehicles. After simulating the STM system in the CARLA simulation environment, we found that the Traffic Management Congestion Flow Rate with ASTM (21 vehicles per minute) is 50\% higher than the rate without STM (around 15 vehicles per minute). Additionally, the Traffic Management Vehicle Pass Delay with STM (5 seconds per vehicle) is 70\% lower than without STM (around 12 seconds per vehicle). These results demonstrate that the STM system using AI can increase traffic flow by 50\% and reduce vehicle pass delays by 70\%.
Authors: Nandan Kumar Jha, Brandon Reagen
Abstract: The pervasiveness of proprietary language models has raised privacy concerns for users' sensitive data, emphasizing the need for private inference (PI), where inference is performed directly on encrypted inputs. However, current PI methods face prohibitively higher communication and latency overheads, primarily due to nonlinear operations. In this paper, we present a comprehensive analysis to understand the role of nonlinearities in transformer-based decoder-only language models. We introduce AERO, a four-step architectural optimization framework that refines the existing LLM architecture for efficient PI by systematically removing nonlinearities such as LayerNorm and GELU and reducing FLOPs counts. For the first time, we propose a Softmax-only architecture with significantly fewer FLOPs tailored for efficient PI. Furthermore, we devise a novel entropy regularization technique to improve the performance of Softmax-only models. AERO achieves up to 4.23$\times$ communication and 1.94$\times$ latency reduction. We validate the effectiveness of AERO by benchmarking it against the state-of-the-art.
Authors: Alexandra Bazarova, Evgenia Romanenkova, Alexey Zaytsev
Abstract: Change point detection (CPD) methods aim to identify abrupt shifts in the distribution of input data streams. Accurate estimators for this task are crucial across various real-world scenarios. Yet, traditional unsupervised CPD techniques face significant limitations, often relying on strong assumptions or suffering from low expressive power due to inherent model simplicity. In contrast, representation learning methods overcome these drawbacks by offering flexibility and the ability to capture the full complexity of the data without imposing restrictive assumptions. However, these approaches are still emerging in the CPD field and lack robust theoretical foundations to ensure their reliability. Our work addresses this gap by integrating the expressive power of representation learning with the groundedness of traditional CPD techniques. We adopt spectral normalization (SN) for deep representation learning in CPD tasks and prove that the embeddings after SN are highly informative for CPD. Our method significantly outperforms current state-of-the-art methods during the comprehensive evaluation via three standard CPD datasets.
Authors: Woosung Koh, Wonbeen Oh, Siyeol Kim, Suhin Shin, Hyeongjin Kim, Jaein Jang, Junghyun Lee, Se-Young Yun
Abstract: Multi-agent reinforcement learning has demonstrated significant potential in addressing complex cooperative tasks across various real-world applications. However, existing MARL approaches often rely on the restrictive assumption that the number of entities (e.g., agents, obstacles) remains constant between training and inference. This overlooks scenarios where entities are dynamically removed or added during the inference trajectory -- a common occurrence in real-world environments like search and rescue missions and dynamic combat situations. In this paper, we tackle the challenge of intra-trajectory dynamic entity composition under zero-shot out-of-domain (OOD) generalization, where such dynamic changes cannot be anticipated beforehand. Our empirical studies reveal that existing MARL methods suffer significant performance degradation and increased uncertainty in these scenarios. In response, we propose FlickerFusion, a novel OOD generalization method that acts as a universally applicable augmentation technique for MARL backbone methods. FlickerFusion stochastically drops out parts of the observation space, emulating being in-domain when inferenced OOD. The results show that FlickerFusion not only achieves superior inference rewards but also uniquely reduces uncertainty vis-\`a-vis the backbone, compared to existing methods. Benchmarks, implementations, and model weights are organized and open-sourced at flickerfusion305.github.io, accompanied by ample demo video renderings.
Authors: Sriniketh Vangaru, Daniel Rosen, Dylan Green, Raphael Rodriguez, Maxwell Wiecek, Amos Johnson, Alyse M. Jones, William C. Headley
Abstract: Technological trends show that Radio Frequency Reinforcement Learning (RFRL) will play a prominent role in the wireless communication systems of the future. Applications of RFRL range from military communications jamming to enhancing WiFi networks. Before deploying algorithms for these purposes, they must be trained in a simulation environment to ensure adequate performance. For this reason, we previously created the RFRL Gym: a standardized, accessible tool for the development and testing of reinforcement learning (RL) algorithms in the wireless communications space. This environment leveraged the OpenAI Gym framework and featured customizable simulation scenarios within the RF spectrum. However, the RFRL Gym was limited to training a single RL agent per simulation; this is not ideal, as most real-world RF scenarios will contain multiple intelligent agents in cooperative, competitive, or mixed settings, which is a natural consequence of spectrum congestion. Therefore, through integration with Ray RLlib, multi-agent reinforcement learning (MARL) functionality for training and assessment has been added to the RFRL Gym, making it even more of a robust tool for RF spectrum simulation. This paper provides an overview of the updated RFRL Gym environment. In this work, the general framework of the tool is described relative to comparable existing resources, highlighting the significant additions and refactoring we have applied to the Gym. Afterward, results from testing various RF scenarios in the MARL environment and future additions are discussed.
Authors: Haiyi Mao, Romain Lopez, Kai Liu, Jan-Christian H\"utter, David Richmond, Panayiotis V. Benos, Lin Qiu
Abstract: The study of cells and their responses to genetic or chemical perturbations promises to accelerate the discovery of therapeutic targets. However, designing adequate and insightful models for such data is difficult because the response of a cell to perturbations essentially depends on its biological context (e.g., genetic background or cell type). For example, while discovering therapeutic targets, one may want to enrich for drugs that specifically target a certain cell type. This challenge emphasizes the need for methods that explicitly take into account potential interactions between drugs and contexts. Towards this goal, we propose a novel Factorized Causal Representation (FCR) learning method that reveals causal structure in single-cell perturbation data from several cell lines. Based on the framework of identifiable deep generative models, FCR learns multiple cellular representations that are disentangled, comprised of covariate-specific ($\mathbf{z}_x$), treatment-specific ($\mathbf{z}_{t}$), and interaction-specific ($\mathbf{z}_{tx}$) blocks. Based on recent advances in non-linear ICA theory, we prove the component-wise identifiability of $\mathbf{z}_{tx}$ and block-wise identifiability of $\mathbf{z}_t$ and $\mathbf{z}_x$. Then, we present our implementation of FCR, and empirically demonstrate that it outperforms state-of-the-art baselines in various tasks across four single-cell datasets.
Authors: Yanke Song, Jonathan Lorraine, Weili Nie, Karsten Kreis, James Lucas
Abstract: Diffusion models achieve high-quality sample generation at the cost of a lengthy multistep inference procedure. To overcome this, diffusion distillation techniques produce student generators capable of matching or surpassing the teacher in a single step. However, the student model's inference speed is limited by the size of the teacher architecture, preventing real-time generation for computationally heavy applications. In this work, we introduce Multi-Student Distillation (MSD), a framework to distill a conditional teacher diffusion model into multiple single-step generators. Each student generator is responsible for a subset of the conditioning data, thereby obtaining higher generation quality for the same capacity. MSD trains multiple distilled students, allowing smaller sizes and, therefore, faster inference. Also, MSD offers a lightweight quality boost over single-student distillation with the same architecture. We demonstrate MSD is effective by training multiple same-sized or smaller students on single-step distillation using distribution matching and adversarial distillation techniques. With smaller students, MSD gets competitive results with faster inference for single-step generation. Using 4 same-sized students, MSD significantly outperforms single-student baseline counterparts and achieves remarkable FID scores for one-step image generation: 1.20 on ImageNet-64x64 and 8.20 on zero-shot COCO2014.
Authors: Chengting Yu, Fengzhao Zhang, Ruizhe Chen, Aili Wang, Zuozhu Liu, Shurun Tan, Er-Ping Li
Abstract: Knowledge Distillation (KD), a learning manner with a larger teacher network guiding a smaller student network, transfers dark knowledge from the teacher to the student via logits or intermediate features, with the aim of producing a well-performed lightweight model. Notably, many subsequent feature-based KD methods outperformed the earliest logit-based KD method and iteratively generated numerous state-of-the-art distillation methods. Nevertheless, recent work has uncovered the potential of the logit-based method, bringing the simple KD form based on logits back into the limelight. Features or logits? They partially implement the KD with entirely distinct perspectives; therefore, choosing between logits and features is not straightforward. This paper provides a unified perspective of feature alignment in order to obtain a better comprehension of their fundamental distinction. Inheriting the design philosophy and insights of feature-based and logit-based methods, we introduce a block-wise logit distillation framework to apply implicit logit-based feature alignment by gradually replacing teacher's blocks as intermediate stepping-stone models to bridge the gap between the student and the teacher. Our method obtains comparable or superior results to state-of-the-art distillation methods. This paper demonstrates the great potential of combining logit and features, and we hope it will inspire future research to revisit KD from a higher vantage point.
Authors: Omar Naim, Guilhem Fouilh\'e, Nicholas Asher
Abstract: In context learning (ICL) is an attractive method of solving a wide range of problems. Inspired by Garg et al. (2022), we look closely at ICL in a variety of train and test settings for several transformer models of different sizes trained from scratch. Our study complements prior work by pointing out several systematic failures of these models to generalize to data not in the training distribution, thereby showing some limitations of ICL. We find that models adopt a strategy for this task that is very different from standard solutions.
Authors: Kaizhao Liang, Lizhang Chen, Bo Liu, Qiang Liu
Abstract: AdamW has been the default optimizer for transformer pretraining. For many years, our community searches for faster and more stable optimizers with only constraint positive outcomes. In this work, we propose a \textbf{single-line modification in Pytorch} to any momentum-based optimizer, which we rename Cautious Optimizer, e.g. C-AdamW and C-Lion. Our theoretical result shows that this modification preserves Adam's Hamiltonian function and it does not break the convergence guarantee under the Lyapunov analysis. In addition, a whole new family of optimizers is revealed by our theoretical insight. Among them, we pick the simplest one for empirical experiments, showing speed-up on Llama and MAE pretraining up to $1.47\times$. Code is available at https://github.com/kyleliang919/C-Optim
Authors: Yuncong Yang, Xiao Han, Yidong Chai, Reza Ebrahimi, Rouzbeh Behnia, Balaji Padmanabhan
Abstract: Recent privacy regulations (e.g., GDPR) grant data subjects the `Right to Be Forgotten' (RTBF) and mandate companies to fulfill data erasure requests from data subjects. However, companies encounter great challenges in complying with the RTBF regulations, particularly when asked to erase specific training data from their well-trained predictive models. While researchers have introduced machine unlearning methods aimed at fast data erasure, these approaches often overlook maintaining model performance (e.g., accuracy), which can lead to financial losses and non-compliance with RTBF obligations. This work develops a holistic machine learning-to-unlearning framework, called Ensemble-based iTerative Information Distillation (ETID), to achieve efficient data erasure while preserving the business value of predictive models. ETID incorporates a new ensemble learning method to build an accurate predictive model that can facilitate handling data erasure requests. ETID also introduces an innovative distillation-based unlearning method tailored to the constructed ensemble model to enable efficient and effective data erasure. Extensive experiments demonstrate that ETID outperforms various state-of-the-art methods and can deliver high-quality unlearned models with efficiency. We also highlight ETID's potential as a crucial tool for fostering a legitimate and thriving market for data and predictive services.
Authors: Ahmad Ahmad, Mehdi Kermanshah, Kevin Leahy, Zachary Serlin, Ho Chit Siu, Makai Mann, Cristian-Ioan Vasile, Roberto Tron, Calin Belta
Abstract: In this paper, we tackle the challenging problem of delayed rewards in reinforcement learning (RL). While Proximal Policy Optimization (PPO) has emerged as a leading Policy Gradient method, its performance can degrade under delayed rewards. We introduce two key enhancements to PPO: a hybrid policy architecture that combines an offline policy (trained on expert demonstrations) with an online PPO policy, and a reward shaping mechanism using Time Window Temporal Logic (TWTL). The hybrid architecture leverages offline data throughout training while maintaining PPO's theoretical guarantees. Building on the monotonic improvement framework of Trust Region Policy Optimization (TRPO), we prove that our approach ensures improvement over both the offline policy and previous iterations, with a bounded performance gap of $(2\varsigma\gamma\alpha^2)/(1-\gamma)^2$, where $\alpha$ is the mixing parameter, $\gamma$ is the discount factor, and $\varsigma$ bounds the expected advantage. Additionally, we prove that our TWTL-based reward shaping preserves the optimal policy of the original problem. TWTL enables formal translation of temporal objectives into immediate feedback signals that guide learning. We demonstrate the effectiveness of our approach through extensive experiments on an inverted pendulum and a lunar lander environments, showing improvements in both learning speed and final performance compared to standard PPO and offline-only approaches.
Authors: Erin Carson, Xinye Chen, Cheng Kang
Abstract: The success of large language models (LLMs) for time series has been demonstrated in previous work. Utilizing a symbolic time series representation, one can efficiently bridge the gap between LLMs and time series. However, the remaining challenge is to exploit the semantic information hidden in time series by using symbols or existing tokens of LLMs, while aligning the embedding space of LLMs according to the hidden information of time series. The symbolic time series approximation (STSA) method called adaptive Brownian bridge-based symbolic aggregation (ABBA) shows outstanding efficacy in preserving salient time series features by modeling time series patterns in terms of amplitude and period while using existing tokens of LLMs. In this paper, we introduce a method, called LLM-ABBA, that integrates ABBA into large language models for various downstream time series tasks. By symbolizing time series, LLM-ABBA compares favorably to the recent state-of-the-art (SOTA) in UCR and three medical time series classification tasks. Meanwhile, a fixed-polygonal chain trick in ABBA is introduced to \kc{avoid obvious drifting} during prediction tasks by significantly mitigating the effects of cumulative error arising from misused symbols during the transition from symbols to numerical values. In time series regression tasks, LLM-ABBA achieves the new SOTA on Time Series Extrinsic Regression (TSER) benchmarks. LLM-ABBA also shows competitive prediction capability compared to recent SOTA time series prediction results. We believe this framework can also seamlessly extend to other time series tasks.
Authors: Vishnou Vinayagame, Gregory Senay, Luis Mart\'i
Abstract: Mathematical reasoning capabilities are increasing with tool-augmented language agents, but methods often rely either on closed-source or large models, external data, or extensive prompt engineering. This work introduces MATATA, a novel cost-effective method to train LLM agents for tabular data problems through reasoning, planning, and tool use. With a progressive self-improvement paradigm and an iterative weak supervision, it empowers 3.8B/8B Small Language Models (SLMs), particularly suited for local hosting and sensitive business contexts where data privacy is crucial. By employing a flexible and reusable tools across different datasets, it achieves robust performance with effective scalability across shared tasks. Experiments show that MATATA reaches state-of-the-art performances on FinQA and TAT-QA among reasoning frameworks based on open-source models. Moreover, MATATA models compete with GPT-4 based frameworks on TabMWP, while being SLMs.
Authors: Akhiad Bercovich, Tomer Ronen, Talor Abramovich, Nir Ailon, Nave Assaf, Mohammad Dabbah, Ido Galil, Amnon Geifman, Yonatan Geifman, Izhak Golan, Netanel Haber, Ehud Karpas, Roi Koren, Itay Levy, Pavlo Molchanov, Shahar Mor, Zach Moshe, Najeeb Nabwani, Omri Puny, Ran Rubin, Itamar Schen, Ido Shahaf, Oren Tropp, Omer Ullman Argov, Ran Zilberstein, Ran El-Yaniv
Abstract: Large language models (LLMs) have demonstrated remarkable capabilities, but their adoption is limited by high computational costs during inference. While increasing parameter counts enhances accuracy, it also widens the gap between state-of-the-art capabilities and practical deployability. We present Puzzle, a framework to accelerate LLM inference on specific hardware while preserving their capabilities. Through an innovative application of neural architecture search (NAS) at an unprecedented scale, Puzzle systematically optimizes models with tens of billions of parameters under hardware constraints. Our approach utilizes blockwise local knowledge distillation (BLD) for parallel architecture exploration and employs mixed-integer programming for precise constraint optimization. We demonstrate the real-world impact of our framework through Llama-3.1-Nemotron-51B-Instruct (Nemotron-51B), a publicly available model derived from Llama-3.1-70B-Instruct. Nemotron-51B achieves a 2.17x inference throughput speedup, fitting on a single NVIDIA H100 GPU while preserving 98.4% of the original model's capabilities. Nemotron-51B currently stands as the most accurate language model capable of inference on a single GPU with large batch sizes. Remarkably, this transformation required just 45B training tokens, compared to over 15T tokens used for the 70B model it was derived from. This establishes a new paradigm where powerful models can be optimized for efficient deployment with only negligible compromise of their capabilities, demonstrating that inference performance, not parameter count alone, should guide model selection. With the release of Nemotron-51B and the presentation of the Puzzle framework, we provide practitioners immediate access to state-of-the-art language modeling capabilities at significantly reduced computational costs.
Authors: Jiani Yan, Charles Rahal
Abstract: This short Correspondence critiques the classic dichotomization of prediction error into reducible and irreducible components, noting that certain types of error can be eliminated at differential speeds. We propose an improved analytical framework that better distinguishes epistemic from aleatoric uncertainty, emphasizing that predictability depends on information sets and cautioning against premature claims of unpredictability.
Authors: Leila Gheisi, Henry Chu, Raju Gottumukkala, Yan Luo, Xingquan Zhu, Mengyu Wang, Min Shi
Abstract: The use of artificial intelligence (AI) in automated disease classification significantly reduces healthcare costs and improves the accessibility of services. However, this transformation has given rise to concerns about the fairness of AI, which disproportionately affects certain groups, particularly patients from underprivileged populations. Recently, a number of methods and large-scale datasets have been proposed to address group performance disparities. Although these methods have shown effectiveness in disease classification tasks, they may fall short in ensuring fair prediction of disease progression, mainly because of limited longitudinal data with diverse demographics available for training a robust and equitable prediction model. In this paper, we introduce TransFair to enhance demographic fairness in progression prediction for ocular diseases. TransFair aims to transfer a fairness-enhanced disease classification model to the task of progression prediction with fairness preserved. Specifically, we train a fair EfficientNet, termed FairEN, equipped with a fairness-aware attention mechanism using extensive data for ocular disease classification. Subsequently, this fair classification model is adapted to a fair progression prediction model through knowledge distillation, which aims to minimize the latent feature distances between the classification and progression prediction models. We evaluate FairEN and TransFair for fairness-enhanced ocular disease classification and progression prediction using both two-dimensional (2D) and 3D retinal images. Extensive experiments and comparisons with models with and without considering fairness learning show that TransFair effectively enhances demographic equity in predicting ocular disease progression.
Authors: Jingyang Xiang, Sai Qian Zhang
Abstract: Rotating the activation and weight matrices to reduce the influence of outliers in large language models (LLMs) has recently attracted significant attention, particularly in the context of model quantization. Prior studies have shown that in low-precision quantization scenarios, such as 4-bit weights and 4-bit activations (W4A4), randomized Hadamard transforms can achieve significantly higher accuracy than randomized orthogonal transforms. Notably, the reason behind this phenomena remains unknown. In this paper, we find that these transformations show substantial improvement in eliminating outliers for common tokens and achieve similar quantization error. The primary reason for the accuracy difference lies in the fact that randomized Hadamard transforms can slightly reduce the quantization error for tokens with massive activations while randomized orthogonal transforms increase the quantization error. Due to the extreme rarity of these tokens and their critical impact on model accuracy, we consider this a long-tail optimization problem, and therefore construct a simple yet effective method: a weighted loss function. Additionally, we propose an optimization strategy for the rotation matrix that involves alternating optimization of quantization parameters while employing orthogonal Procrustes transforms to refine the rotation matrix. This makes the distribution of the rotated activation values more conducive to quantization, especially for tokens with massive activations. Our method enhances the Rotated LLMs by achieving dual free, Outlier-Free and Massive Activation-Free, dubbed as DFRot. Extensive experiments demonstrate the effectiveness and efficiency of DFRot. By tuning the rotation matrix using just a single sample, DFRot achieves a perplexity improvement of 0.25 and 0.21 on W4A4KV4 and W4A4KV16, respectively, for LLaMA3-8B, a model known for its quantization challenges.
Authors: Katarzyna Janocha, Annabel Ling, Alice Godson, Yulia Lampi, Simon Bornschein, Nils Y. Hammerla
Abstract: Cell and immunotherapy offer transformative potential for treating diseases like cancer and autoimmune disorders by modulating the immune system. The development of these therapies is resource-intensive, with the majority of drug candidates failing to progress beyond laboratory testing. While recent advances in machine learning have revolutionised areas such as protein engineering, applications in immunotherapy remain limited due to the scarcity of large-scale, standardised datasets and the complexity of cellular systems. In this work, we address these challenges by leveraging a high-throughput experimental platform to generate data suitable for fine-tuning protein language models. We demonstrate how models fine-tuned using a preference task show surprising correlations to biological assays, and how they can be leveraged for few-shot hit maturation in CARs. This proof-of-concept presents a novel pathway for applying ML to immunotherapy and could generalise to other therapeutic modalities.
Authors: Sebastian Peitz, Sedjro Salomon Hotegni
Abstract: Simultaneously considering multiple objectives in machine learning has been a popular approach for several decades, with various benefits for multi-task learning, the consideration of secondary goals such as sparsity, or multicriteria hyperparameter tuning. However - as multi-objective optimization is significantly more costly than single-objective optimization - the recent focus on deep learning architectures poses considerable additional challenges due to the very large number of parameters, strong nonlinearities and stochasticity. This survey covers recent advancements in the area of multi-objective deep learning. We introduce a taxonomy of existing methods - based on the type of training algorithm as well as the decision maker's needs - before listing recent advancements, and also successful applications. All three main learning paradigms supervised learning, unsupervised learning and reinforcement learning are covered, and we also address the recently very popular area of generative modeling.
Authors: Jan Pablo Burgard, Jo\~ao Vitor Pamplona
Abstract: In this paper, we propose FairML.jl, a Julia package providing a framework for fair classification in machine learning. In this framework, the fair learning process is divided into three stages. Each stage aims to reduce unfairness, such as disparate impact and disparate mistreatment, in the final prediction. For the preprocessing stage, we present a resampling method that addresses unfairness coming from data imbalances. The in-processing phase consist of a classification method. This can be either one coming from the MLJ.jl package, or a user defined one. For this phase, we incorporate fair ML methods that can handle unfairness to a certain degree through their optimization process. In the post-processing, we discuss the choice of the cut-off value for fair prediction. With simulations, we show the performance of the single phases and their combinations.
Authors: Zhengnan Li, Haoxuan Li, Hao Wang, Jun Fang, Duoyin Li Yunxiao Qin
Abstract: Time series forecasting (TSF) plays a crucial role in various domains, including web data analysis, energy consumption prediction, and weather forecasting. While Multi-Layer Perceptrons (MLPs) are lightweight and effective for capturing temporal dependencies, they are prone to overfitting when used to model inter-channel dependencies. In this paper, we investigate the overfitting problem in channel-wise MLPs using Rademacher complexity theory, revealing that extreme values in time series data exacerbate this issue. To mitigate this issue, we introduce a novel Simplex-MLP layer, where the weights are constrained within a standard simplex. This strategy encourages the model to learn simpler patterns and thereby reducing overfitting to extreme values. Based on the Simplex-MLP layer, we propose a novel \textbf{F}requency \textbf{S}implex \textbf{MLP} (FSMLP) framework for time series forecasting, comprising of two kinds of modules: \textbf{S}implex \textbf{C}hannel-\textbf{W}ise MLP (SCWM) and \textbf{F}requency \textbf{T}emporal \textbf{M}LP (FTM). The SCWM effectively leverages the Simplex-MLP to capture inter-channel dependencies, while the FTM is a simple yet efficient temporal MLP designed to extract temporal information from the data. Our theoretical analysis shows that the upper bound of the Rademacher Complexity for Simplex-MLP is lower than that for standard MLPs. Moreover, we validate our proposed method on seven benchmark datasets, demonstrating significant improvements in forecasting accuracy and efficiency, while also showcasing superior scalability. Additionally, we demonstrate that Simplex-MLP can improve other methods that use channel-wise MLP to achieve less overfitting and improved performance. Code are available \href{https://github.com/FMLYD/FSMLP}{\textcolor{red}{here}}.
Authors: Sebastian Baunsgaard, Sebastian B. Wrede, P{\i}nar Tozun
Abstract: Automatic Speech Recognition (ASR) has increased in popularity in recent years. The evolution of processor and storage technologies has enabled more advanced ASR mechanisms, fueling the development of virtual assistants such as Amazon Alexa, Apple Siri, Microsoft Cortana, and Google Home. The interest in such assistants, in turn, has amplified the novel developments in ASR research. However, despite this popularity, there has not been a detailed training efficiency analysis of modern ASR systems. This mainly stems from: the proprietary nature of many modern applications that depend on ASR, like the ones listed above; the relatively expensive co-processor hardware that is used to accelerate ASR by big vendors to enable such applications; and the absence of well-established benchmarks. The goal of this paper is to address the latter two of these challenges. The paper first describes an ASR model, based on a deep neural network inspired by recent work in this domain, and our experiences building it. Then we evaluate this model on three CPU-GPU co-processor platforms that represent different budget categories. Our results demonstrate that utilizing hardware acceleration yields good results even without high-end equipment. While the most expensive platform (10X price of the least expensive one) converges to the initial accuracy target 10-30% and 60-70% faster than the other two, the differences among the platforms almost disappear at slightly higher accuracy targets. In addition, our results further highlight both the difficulty of evaluating ASR systems due to the complex, long, and resource intensive nature of the model training in this domain, and the importance of establishing benchmarks for ASR.
Authors: Songtao Lu
Abstract: Nonconvex constrained optimization problems can be used to model a number of machine learning problems, such as multi-class Neyman-Pearson classification and constrained Markov decision processes. However, such kinds of problems are challenging because both the objective and constraints are possibly nonconvex, so it is difficult to balance the reduction of the loss value and reduction of constraint violation. Although there are a few methods that solve this class of problems, all of them are double-loop or triple-loop algorithms, and they require oracles to solve some subproblems up to certain accuracy by tuning multiple hyperparameters at each iteration. In this paper, we propose a novel gradient descent and perturbed ascent (GDPA) algorithm to solve a class of smooth nonconvex inequality constrained problems. The GDPA is a primal-dual algorithm, which only exploits the first-order information of both the objective and constraint functions to update the primal and dual variables in an alternating way. The key feature of the proposed algorithm is that it is a single-loop algorithm, where only two step-sizes need to be tuned. We show that under a mild regularity condition GDPA is able to find Karush-Kuhn-Tucker (KKT) points of nonconvex functional constrained problems with convergence rate guarantees. To the best of our knowledge, it is the first single-loop algorithm that can solve the general nonconvex smooth problems with nonconvex inequality constraints. Numerical results also showcase the superiority of GDPA compared with the best-known algorithms (in terms of both stationarity measure and feasibility of the obtained solutions).
Authors: Martin Grohe
Abstract: We analyse the power of graph neural networks (GNNs) in terms of Boolean circuit complexity and descriptive complexity. We prove that the graph queries that can be computed by a polynomial-size bounded-depth family of GNNs are exactly those definable in the guarded fragment GFO+C of first-order logic with counting and with built-in relations. This puts GNNs in the circuit complexity class (non-uniform) $\text{TC}^0$. Remarkably, the GNN families may use arbitrary real weights and a wide class of activation functions that includes the standard ReLU, logistic "sigmoid", and hyperbolic tangent functions. If the GNNs are allowed to use random initialisation and global readout (both standard features of GNNs widely used in practice), they can compute exactly the same queries as bounded depth Boolean circuits with threshold gates, that is, exactly the queries in $\text{TC}^0$. Moreover, we show that queries computable by a single GNN with piecewise linear activations and rational weights are definable in GFO+C without built-in relations. Therefore, they are contained in uniform $\text{TC}^0$.
Authors: Zhenqiao Song, Lei Li
Abstract: Designing protein sequences with desired biological function is crucial in biology and chemistry. Recent machine learning methods use a surrogate sequence-function model to replace the expensive wet-lab validation. How can we efficiently generate diverse and novel protein sequences with high fitness? In this paper, we propose IsEM-Pro, an approach to generate protein sequences towards a given fitness criterion. At its core, IsEM-Pro is a latent generative model, augmented by combinatorial structure features from a separately learned Markov random fields (MRFs). We develop an Monte Carlo Expectation-Maximization method (MCEM) to learn the model. During inference, sampling from its latent space enhances diversity while its MRFs features guide the exploration in high fitness regions. Experiments on eight protein sequence design tasks show that our IsEM-Pro outperforms the previous best methods by at least 55% on average fitness score and generates more diverse and novel protein sequences.
Authors: Payam Abdisarabshali, Nicholas Accurso, Filippo Malandra, Weifeng Su, Seyyedali Hosseinalipour
Abstract: Federated learning (FL) is a popular distributed machine learning (ML) technique. However, implementation of FL over 5G-and-beyond wireless networks faces key challenges caused by (i) dynamics of the wireless network conditions and (ii) the coexistence of multiple FL-services in the system, which are not jointly considered in prior works. We first take a closer look into these challenges and unveil nuanced phenomena called over-/under-provisioning of resources and perspective-driven load balancing. We then take the first steps towards addressing these phenomena by proposing a novel distributed ML architecture called elastic FL (EFL). EFL unleashes the full potential of Open RAN (O-RAN) systems and introduces an elastic resource provisioning methodology to execute FL-services. It further constitutes a multi-time-scale FL management system that introduces three dedicated network control functionalities tailored for FL-services, including (i) non-real-time (non-RT) system descriptor, which trains ML-based applications to predicted both system and FL-related dynamics and parameters; (ii) near-RT FL controller, which handles O-RAN slicing and mobility management for the seamless execution of FL-services; (iii) FL MAC scheduler, which conducts real-time resource allocation to the end clients of various FL-services. We finally prototype EFL to demonstrate its potential in improving the performance of FL-services.
Authors: Shuang Cui, Kai Han, Jing Tang, Xueying Li, Aakas Zhiyuli, Hanxiao Li
Abstract: Submodular maximization has found extensive applications in various domains within the field of artificial intelligence, including but not limited to machine learning, computer vision, and natural language processing. With the increasing size of datasets in these domains, there is a pressing need to develop efficient and parallelizable algorithms for submodular maximization. One measure of the parallelizability of a submodular maximization algorithm is its adaptive complexity, which indicates the number of sequential rounds where a polynomial number of queries to the objective function can be executed in parallel. In this paper, we study the problem of non-monotone submodular maximization subject to a knapsack constraint, and propose the first combinatorial algorithm achieving an $(8+\epsilon)$-approximation under $\mathcal{O}(\log n)$ adaptive complexity, which is \textit{optimal} up to a factor of $\mathcal{O}(\log\log n)$. Moreover, we also propose the first algorithm with both provable approximation ratio and sublinear adaptive complexity for the problem of non-monotone submodular maximization subject to a $k$-system constraint. As a by-product, we show that our two algorithms can also be applied to the special case of submodular maximization subject to a cardinality constraint, and achieve performance bounds comparable with those of state-of-the-art algorithms. Finally, the effectiveness of our approach is demonstrated by extensive experiments on real-world applications.
Authors: Alessandro Scagliotti, Sara Farinelli
Abstract: In this paper, we consider the problem of recovering the $W_2$-optimal transport map T between absolutely continuous measures $\mu,\nu\in\mathcal{P}(\mathbb{R}^n)$ as the flow of a linear-control neural ODE, where the control depends only on the time variable and takes values in a finite-dimensional space. We first show that, under suitable assumptions on $\mu,\nu$ and on the controlled vector fields governing the neural ODE, the optimal transport map is contained in the $C^0_c$-closure of the flows generated by the system. Then, we tackle the problem under the assumption that only discrete approximations of $\mu_N,\nu_N$ of the original measures $\mu,\nu$ are available: we formulate approximated optimal control problems, and we show that their solutions give flows that approximate the original optimal transport map $T$. In the framework of generative models, the approximating flow constructed here can be seen as a `Normalizing Flow', which usually refers to the task of providing invertible transport maps between probability measures by means of deep neural networks. We propose an iterative numerical scheme based on the Pontryagin Maximum Principle for the resolution of the optimal control problem, resulting in a method for the practical computation of the approximated optimal transport map, and we test it on a two-dimensional example.
Authors: Amirhosein Chahe, Chenan Wang, Abhishek Jeyapratap, Kaidi Xu, Lifeng Zhou
Abstract: This paper introduces an attacking mechanism to challenge the resilience of autonomous driving systems. Specifically, we manipulate the decision-making processes of an autonomous vehicle by dynamically displaying adversarial patches on a screen mounted on another moving vehicle. These patches are optimized to deceive the object detection models into misclassifying targeted objects, e.g., traffic signs. Such manipulation has significant implications for critical multi-vehicle interactions such as intersection crossing and lane changing, which are vital for safe and efficient autonomous driving systems. Particularly, we make four major contributions. First, we introduce a novel adversarial attack approach where the patch is not co-located with its target, enabling more versatile and stealthy attacks. Moreover, our method utilizes dynamic patches displayed on a screen, allowing for adaptive changes and movement, enhancing the flexibility and performance of the attack. To do so, we design a Screen Image Transformation Network (SIT-Net), which simulates environmental effects on the displayed images, narrowing the gap between simulated and real-world scenarios. Further, we integrate a positional loss term into the adversarial training process to increase the success rate of the dynamic attack. Finally, we shift the focus from merely attacking perceptual systems to influencing the decision-making algorithms of self-driving systems. Our experiments demonstrate the first successful implementation of such dynamic adversarial attacks in real-world autonomous driving scenarios, paving the way for advancements in the field of robust and secure autonomous driving.
Authors: Cheng Li, Mengzhou Chen, Jindong Wang, Sunayana Sitaram, Xing Xie
Abstract: Large language models (LLMs) are reported to be partial to certain cultures owing to the training data dominance from the English corpora. Since multilingual cultural data are often expensive to collect, existing efforts handle this by prompt engineering or culture-specific pre-training. However, they might overlook the knowledge deficiency of low-resource culture and require extensive computing resources. In this paper, we propose CultureLLM, a cost-effective solution to incorporate cultural differences into LLMs. CultureLLM adopts World Value Survey (WVS) as seed data and generates semantically equivalent training data via the proposed semantic data augmentation. Using only 50 seed samples from WVS with augmented data, we fine-tune culture-specific LLMs and one unified model (CultureLLM-One) for 9 cultures covering rich and low-resource languages. Extensive experiments on 60 culture-related datasets demonstrate that CultureLLM significantly outperforms various counterparts such as GPT-3.5 (by 8.1%) and Gemini Pro (by 9.5%) with comparable performance to GPT-4 or even better. Our human study shows that the generated samples are semantically equivalent to the original samples, providing an effective solution for LLMs augmentation. Code is released at https://github.com/Scarelette/CultureLLM.
Authors: Rishabh Goel
Abstract: The advent of Large Language Models (LLMs) is promising and LLMs have been applied to numerous fields. However, it is not trivial to implement LLMs in the medical field, due to the high standards for precision and accuracy. Currently, the diagnosis of medical ailments must be done by hand, as it is costly to build a sufficiently broad LLM that can diagnose a wide range of diseases. Here, we explore the use of vector databases and embedding models as a means of encoding and classifying text with medical text data without the need to train a new model altogether. We used various LLMs to generate the medical data, then encoded the data with a text embedding model and stored it in a vector database. We hypothesized that higher embedding dimensions coupled with descriptive data in the vector database would lead to better classifications and designed a robustness test to test our hypothesis. By using vector databases and text embedding models to classify a clinician's notes on a patient presenting with a certain ailment, we showed that these tools can be successful at classifying medical text data. We found that a higher embedding dimension did indeed yield better results, however, querying with simple data in the database was optimal for performance. We have shown in this study the applicability of text embedding models and vector databases on a small scale, and our work lays the groundwork for applying these tools on a larger scale.
Authors: Johannes Schneider, Pauline Kuss, Rene Abraham, Christian Meske
Abstract: Generative Artificial Intelligence (GenAI), specifically large language models like ChatGPT, has swiftly entered organizations without adequate governance, posing both opportunities and risks. Despite extensive debates on GenAI's transformative nature and regulatory measures, limited research addresses organizational governance, encompassing technical and business perspectives. Although numerous frameworks for governance of AI exist, it is not clear to what extent they apply to GenAI. Our review paper fills this gap by surveying recent works with the purpose of better understanding fundamental characteristics of GenAI and adjusting prior frameworks specifically towards GenAI governance within companies. To do so, it extends Nickerson's framework development processes to include prior conceptualizations. Our framework outlines the scope, objectives, and governance mechanisms tailored to harness business opportunities as well as mitigate risks associated with GenAI integration. Our research contributes a focused approach to GenAI governance, offering practical insights for companies navigating the challenges of GenAI adoption and highlighting research gaps.
Authors: Yao Fu, Dong-Ki Kim, Jaekyeom Kim, Sungryull Sohn, Lajanugen Logeswaran, Kyunghoon Bae, Honglak Lee
Abstract: Recent advances in large language models (LLMs) have empowered AI agents capable of performing various sequential decision-making tasks. However, effectively guiding LLMs to perform well in unfamiliar domains like web navigation, where they lack sufficient knowledge, has proven to be difficult with the demonstration-based in-context learning paradigm. In this paper, we introduce a novel framework, called AutoGuide, which addresses this limitation by automatically generating context-aware guidelines from offline experiences. Importantly, each context-aware guideline is expressed in concise natural language and follows a conditional structure, clearly describing the context where it is applicable. As a result, our guidelines facilitate the provision of relevant knowledge for the agent's current decision-making process, overcoming the limitations of the conventional demonstration-based learning paradigm. Our evaluation demonstrates that AutoGuide significantly outperforms competitive baselines in complex benchmark domains, including real-world web navigation.
Authors: Zhiwei Zhao
Abstract: Delicate cloth simulations have long been desired in computer graphics. Various methods were proposed to improve engaged force interactions, collision handling, and numerical integrations. Deep learning has the potential to achieve fast and real-time simulation, but common neural network structures often demand many parameters to capture cloth dynamics. This paper proposes a physics-embedded learning framework that directly encodes physical features of cloth simulation. The convolutional neural network is used to represent spatial correlations of the mass-spring system, after which three branches are designed to learn linear, nonlinear, and time derivate features of cloth physics. The framework can also integrate with other external forces and collision handling through either traditional simulators or sub neural networks. The model is tested across different cloth animation cases, without training with new data. Agreement with baselines and predictive realism successfully validate its generalization ability. Inference efficiency of the proposed model also defeats traditional physics simulation. This framework is also designed to easily integrate with other visual refinement techniques like wrinkle carving, which leaves significant chances to incorporate prevailing macing learning techniques in 3D cloth amination.
Authors: Alina Petukhova, Jo\~ao P. Matos-Carvalho, Nuno Fachada
Abstract: Text clustering is an important method for organising the increasing volume of digital content, aiding in the structuring and discovery of hidden patterns in uncategorised data. The effectiveness of text clustering largely depends on the selection of textual embeddings and clustering algorithms. This study argues that recent advancements in large language models (LLMs) have the potential to enhance this task. The research investigates how different textual embeddings, particularly those utilised in LLMs, and various clustering algorithms influence the clustering of text datasets. A series of experiments were conducted to evaluate the impact of embeddings on clustering results, the role of dimensionality reduction through summarisation, and the adjustment of model size. The findings indicate that LLM embeddings are superior at capturing subtleties in structured language. OpenAI's GPT-3.5 Turbo model yields better results in three out of five clustering metrics across most tested datasets. Most LLM embeddings show improvements in cluster purity and provide a more informative silhouette score, reflecting a refined structural understanding of text data compared to traditional methods. Among the more lightweight models, BERT demonstrates leading performance. Additionally, it was observed that increasing model dimensionality and employing summarisation techniques do not consistently enhance clustering efficiency, suggesting that these strategies require careful consideration for practical application. These results highlight a complex balance between the need for refined text representation and computational feasibility in text clustering applications. This study extends traditional text clustering frameworks by integrating embeddings from LLMs, offering improved methodologies and suggesting new avenues for future research in various types of textual analysis.
Authors: Zirui Qiu, Hassan Rivaz, Yiming Xiao
Abstract: Purpose: As visual inspection is an inherent process during radiological screening, the associated eye gaze data can provide valuable insights into relevant clinical decisions. As deep learning has become the state-of-the-art for computer-assisted diagnosis, integrating human behavior, such as eye gaze data, into these systems is instrumental to help align machine predictions with clinical diagnostic criteria, thus enhancing the quality of automatic radiological diagnosis. Methods: We propose a novel deep learning framework for joint disease diagnosis and prediction of corresponding clinical visual attention maps for chest X-ray scans. Specifically, we introduce a new dual-encoder multi-task UNet, which leverages both a DenseNet201 backbone and a Residual and Squeeze-and-Excitation block-based encoder to extract diverse features for visual attention map prediction, and a multi-scale feature-fusion classifier to perform disease classification. To tackle the issue of asynchronous training schedules of individual tasks in multi-task learning, we proposed a multi-stage cooperative learning strategy, with contrastive learning for feature encoder pretraining to boost performance. Results: Our proposed method is shown to significantly outperform existing techniques for chest X-ray diagnosis (AUC=0.93) and the quality of visual attention map prediction (Correlation coefficient=0.58). Conclusion: Benefiting from the proposed multi-task multi-stage cooperative learning, our technique demonstrates the benefit of integrating clinicians' eye gaze into clinical AI systems to boost performance and potentially explainability.
Authors: Mitja Briscik, Gabriele Tazza, Marie-Agnes Dillies, L\'aszl\'o Vid\'acs, S\'ebastien Dejean
Abstract: Advances in high-throughput technologies have originated an ever-increasing availability of omics datasets. The integration of multiple heterogeneous data sources is currently an issue for biology and bioinformatics. Multiple kernel learning (MKL) has shown to be a flexible and valid approach to consider the diverse nature of multi-omics inputs, despite being an underused tool in genomic data mining. We provide novel MKL approaches based on different kernel fusion strategies. To learn from the meta-kernel of input kernels, we adapted unsupervised integration algorithms for supervised tasks with support vector machines. We also tested deep learning architectures for kernel fusion and classification. The results show that MKL-based models can outperform more complex, state-of-the-art, supervised multi-omics integrative approaches. Multiple kernel learning offers a natural framework for predictive models in multi-omics data. It proved to provide a fast and reliable solution that can compete with and outperform more complex architectures. Our results offer a direction for bio-data mining research, biomarker discovery and further development of methods for heterogeneous data integration.
Authors: Juno Nam, Jiayu Peng, Rafael G\'omez-Bombarelli
Abstract: Machine learning interatomic potentials (MLIPs) have become a workhorse of modern atomistic simulations, and recently published universal MLIPs, pre-trained on large datasets, have demonstrated remarkable accuracy and generalizability. However, the computational cost of MLIPs limits their applicability to chemically disordered systems requiring large simulation cells or to sample-intensive statistical methods. Here, we report the use of continuous and differentiable alchemical degrees of freedom in atomistic materials simulations, exploiting the fact that graph neural network MLIPs represent discrete elements as real-valued tensors. The proposed method introduces alchemical atoms with corresponding weights into the input graph, alongside modifications to the message-passing and readout mechanisms of MLIPs, and allows smooth interpolation between the compositional states of materials. The end-to-end differentiability of MLIPs enables efficient calculation of the gradient of energy with respect to the compositional weights. With this modification, we propose methodologies for optimizing the composition of solid solutions towards target macroscopic properties, characterizing order and disorder in multicomponent oxides, and conducting alchemical free energy simulations to quantify the free energy of vacancy formation and composition changes. The approach offers an avenue for extending the capabilities of universal MLIPs in the modeling of compositional disorder and characterizing the phase stability of complex materials systems.
Authors: Gabriel Lopes Cardoso, Dami\'an Mayorga Pe\~na, Suresh Nampuri
Abstract: We study the integrability of two-dimensional theories that are obtained by a dimensional reduction of certain four-dimensional gravitational theories describing the coupling of Maxwell fields and neutral scalar fields to gravity in the presence of a potential for the neutral scalar fields. For a certain solution subspace, we demonstrate partial integrability by showing that a subset of the equations of motion in two dimensions are the compatibility conditions for a linear system. Subsequently, we study the integrability of these two-dimensional models from a complementary one-dimensional point of view, framed in terms of Liouville integrability. In this endeavour, we employ various machine learning techniques to systematise our search for numerical Lax pair matrices for these models, as well as conserved currents expressed as functions of phase space variables.
Authors: Matthew J. Colbrook, Catherine Drysdale, Andrew Horning
Abstract: We introduce the Rigged Dynamic Mode Decomposition (Rigged DMD) algorithm, which computes generalized eigenfunction decompositions of Koopman operators. By considering the evolution of observables, Koopman operators transform complex nonlinear dynamics into a linear framework suitable for spectral analysis. While powerful, traditional Dynamic Mode Decomposition (DMD) techniques often struggle with continuous spectra. Rigged DMD addresses these challenges with a data-driven methodology that approximates the Koopman operator's resolvent and its generalized eigenfunctions using snapshot data from the system's evolution. At its core, Rigged DMD builds wave-packet approximations for generalized Koopman eigenfunctions and modes by integrating Measure-Preserving Extended Dynamic Mode Decomposition with high-order kernels for smoothing. This provides a robust decomposition encompassing both discrete and continuous spectral elements. We derive explicit high-order convergence theorems for generalized eigenfunctions and spectral measures. Additionally, we propose a novel framework for constructing rigged Hilbert spaces using time-delay embedding, significantly extending the algorithm's applicability (Rigged DMD can be used with any rigging). We provide examples, including systems with a Lebesgue spectrum, integrable Hamiltonian systems, the Lorenz system, and a high-Reynolds number lid-driven flow in a two-dimensional square cavity, demonstrating Rigged DMD's convergence, efficiency, and versatility. This work paves the way for future research and applications of decompositions with continuous spectra.
Authors: Joongwon Lee, Wonho Zhung, Jisu Seo, Woo Youn Kim
Abstract: A remarkable advance in geometric deep generative models with accumulated structural data enables structure-based drug design (SBDD) with target protein information only. However, most existing models struggle to address multi-objectives simultaneously while performing well only in their specialized tasks. Here, we present BInD, a diffusion model with knowledge-based guidance for multi-objective SBDD. BInD is designed to co-generate molecules and their interactions with a target protein to consider all key objectives equally well, including target-specific interactions, molecular properties, and local geometry. Comprehensive evaluations show that BInD achieves robust performance for all objectives while outperforming or matching state-of-the-art methods for each. Finally, we propose a train-free optimization method empowered by retrieving target-specific interactions, highlighting the role of non-covalent interactions in achieving higher selectivity and binding affinities to a target protein.
Authors: B\'alint M\'at\'e, Fran\c{c}ois Fleuret, Tristan Bereau
Abstract: Thermodynamic integration (TI) offers a rigorous method for estimating free-energy differences by integrating over a sequence of interpolating conformational ensembles. However, TI calculations are computationally expensive and typically limited to coupling a small number of degrees of freedom due to the need to sample numerous intermediate ensembles with sufficient conformational-space overlap. In this work, we propose to perform TI along an alchemical pathway represented by a trainable neural network, which we term Neural TI. Critically, we parametrize a time-dependent Hamiltonian interpolating between the interacting and non-interacting systems, and optimize its gradient using a score matching objective. The ability of the resulting energy-based diffusion model to sample all intermediate ensembles allows us to perform TI from a single reference calculation. We apply our method to Lennard-Jones fluids, where we report accurate calculations of the excess chemical potential, demonstrating that Neural TI reproduces the underlying changes in free energy without the need for simulations at interpolating Hamiltonians.
Authors: Ziliang Zhang, Huaming Yu, Danqin Ren
Abstract: Traditional wave forecasting models, although based on energy conservation equations, are computationally expensive. On the other hand, existing deep learning geophysical fluid models, while computationally efficient, often suffer from issues such as energy dissipation in long-term forecasts. This paper proposes a novel energy-balanced deep learning wave forecasting model called OceanCastNet (OCN). By incorporating wind fields at the current, previous, and future time steps, as well as wave fields at the current and previous time steps as input variables, OCN maintains energy balance within the model. Furthermore, the model employs adaptive Fourier operators as its core components and designs a masked loss function to better handle the impact of land-sea boundaries. A series of experiments on the ERA5 dataset demonstrate that OCN can achieve short-term forecast accuracy comparable to traditional models while exhibiting an understanding of the wave generation process. In comparative experiments under both normal and extreme conditions, OCN consistently outperforms the widely used WaveWatch III model in the industry. Even after long-term forecasting, OCN maintains a stable and energy-rich state. By further constructing a simple meteorological model, OCN-wind, which considers energy balance, this paper confirms the importance of energy constraints for improving the long-term forecast performance of deep learning meteorological models. This finding provides new ideas for future research on deep learning geophysical fluid models.
Authors: Liliang Ren, Yang Liu, Yadong Lu, Yelong Shen, Chen Liang, Weizhu Chen
Abstract: Efficiently modeling sequences with infinite context length has long been a challenging problem. Previous approaches have either suffered from quadratic computational complexity or limited extrapolation ability in length generalization. In this work, we present Samba, a simple hybrid architecture that layer-wise combines Mamba, a selective State Space Model (SSM), with Sliding Window Attention (SWA). Samba selectively compresses a given sequence into recurrent hidden states while still maintaining the ability to precisely recall recent memories with the attention mechanism. We scale Samba up to 3.8B parameters with 3.2T training tokens and demonstrate that it significantly outperforms state-of-the-art models across a variety of benchmarks. Pretrained on sequences of 4K length, Samba shows improved perplexity in context lengths of up to 1M in zero-shot. When finetuned on 4K-length sequences, Samba efficiently extrapolates to a 256K context length with perfect memory recall on the Passkey Retrieval task, and exhibits superior retrieval extrapolation on the challenging Phonebook task compared to full-attention models. As a linear-time sequence model, Samba achieves a 3.73x higher throughput compared to Transformers with grouped-query attention for user prompts of 128K length, and a 3.64x speedup when generating 64K tokens with unlimited streaming. Our code for training on open source data is publicly available at https://github.com/microsoft/Samba.
Authors: Megan Ayers, Luke Sanford, Margaret Roberts, Eddie Yang
Abstract: Experimental methods for estimating the impacts of text on human evaluation have been widely used in the social sciences. However, researchers in experimental settings are usually limited to testing a small number of pre-specified text treatments. While efforts to mine unstructured texts for features that causally affect outcomes have been ongoing in recent years, these models have primarily focused on the topics or specific words of text, which may not always be the mechanism of the effect. We connect these efforts with NLP interpretability techniques and present a method for flexibly discovering clusters of similar text phrases that are predictive of human reactions to texts using convolutional neural networks. When used in an experimental setting, this method can identify text treatments and their effects under certain assumptions. We apply the method to two datasets. The first enables direct validation of the model's ability to detect phrases known to cause the outcome. The second demonstrates its ability to flexibly discover text treatments with varying textual structures. In both cases, the model learns a greater variety of text treatments compared to benchmark methods, and these text features quantitatively meet or exceed the ability of benchmark methods to predict the outcome.
Authors: Yifan Yang, Kai Zhen, Ershad Banijamal, Athanasios Mouchtaris, Zheng Zhang
Abstract: Fine-tuning large language models (LLMs) has achieved remarkable performance across various natural language processing tasks, yet it demands more and more memory as model sizes keep growing. To address this issue, the recently proposed Memory-efficient Zeroth-order (MeZO) methods attempt to fine-tune LLMs using only forward passes, thereby avoiding the need for a backpropagation graph. However, significant performance drops and a high risk of divergence have limited their widespread adoption. In this paper, we propose the Adaptive Zeroth-order Tensor-Train Adaption (AdaZeta) framework, specifically designed to improve the performance and convergence of the ZO methods. To enhance dimension-dependent ZO estimation accuracy, we introduce a fast-forward, low-parameter tensorized adapter. To tackle the frequently observed divergence issue in large-scale ZO fine-tuning tasks, we propose an adaptive query number schedule that guarantees convergence. Detailed theoretical analysis and extensive experimental results on Roberta-Large and Llama-2-7B models substantiate the efficacy of our AdaZeta framework in terms of accuracy, memory efficiency, and convergence speed.
Authors: Mingrui Zhang, Chunyang Wang, Stephan Kramer, Joseph G. Wallwork, Siyi Li, Jiancheng Liu, Xiang Chen, Matthew D. Piggott
Abstract: Solving complex Partial Differential Equations (PDEs) accurately and efficiently is an essential and challenging problem in all scientific and engineering disciplines. Mesh movement methods provide the capability to improve the accuracy of the numerical solution without increasing the overall mesh degree of freedom count. Conventional sophisticated mesh movement methods are extremely expensive and struggle to handle scenarios with complex boundary geometries. However, existing learning-based methods require re-training from scratch given a different PDE type or boundary geometry, which limits their applicability, and also often suffer from robustness issues in the form of inverted elements. In this paper, we introduce the Universal Mesh Movement Network (UM2N), which -- once trained -- can be applied in a non-intrusive, zero-shot manner to move meshes with different size distributions and structures, for solvers applicable to different PDE types and boundary geometries. UM2N consists of a Graph Transformer (GT) encoder for extracting features and a Graph Attention Network (GAT) based decoder for moving the mesh. We evaluate our method on advection and Navier-Stokes based examples, as well as a real-world tsunami simulation case. Our method outperforms existing learning-based mesh movement methods in terms of the benchmarks described above. In comparison to the conventional sophisticated Monge-Amp\`ere PDE-solver based method, our approach not only significantly accelerates mesh movement, but also proves effective in scenarios where the conventional method fails. Our project page is at https://erizmr.github.io/UM2N/.
Authors: Xiaoyue Xu, Qinyuan Ye, Xiang Ren
Abstract: We introduce Lifelong ICL, a problem setting that challenges long-context language models (LMs) to learn a sequence of language tasks through in-context learning (ICL). We further introduce Task Haystack, an evaluation suite dedicated to assessing and diagnosing how long-context LMs utilizes contexts in Lifelong ICL. When given a task instruction and test inputs, long-context LMs are expected to leverage the relevant demonstrations in the Lifelong ICL prompt, avoid distraction and interference from other tasks, and achieve test accuracies that are not significantly worse than those of the Single-task ICL baseline. Task Haystack draws inspiration from the widely-adopted "needle-in-a-haystack" (NIAH) evaluation, but presents distinct new challenges. It requires models (1) to utilize the contexts at a deeper level, rather than resorting to simple copying and pasting; (2) to navigate through long streams of evolving topics and tasks, proxying the complexities and dynamism of contexts in real-world scenarios. Additionally, Task Haystack inherits the controllability of NIAH, providing model developers with tools and visualizations to identify model vulnerabilities effectively. We benchmark 14 long-context LMs using Task Haystack, finding that frontier models like GPT-4o still struggle with the setting, failing on 15% of cases on average. Most open-weight models further lack behind by a large margin, with failure rates reaching up to 61%. In our controlled analysis, we identify factors such as distraction and recency bias as contributors to these failure cases. Further, performance declines when task instructions are paraphrased at test time or when ICL demonstrations are repeated excessively, raising concerns about the robustness, instruction understanding, and true context utilization of long-context LMs.
Authors: Majid Ghasemi, Dariush Ebrahimi
Abstract: Reinforcement Learning (RL), a subfield of Artificial Intelligence (AI), focuses on training agents to make decisions by interacting with their environment to maximize cumulative rewards. This paper provides an overview of RL, covering its core concepts, methodologies, and resources for further learning. It offers a thorough explanation of fundamental components such as states, actions, policies, and reward signals, ensuring readers develop a solid foundational understanding. Additionally, the paper presents a variety of RL algorithms, categorized based on the key factors such as model-free, model-based, value-based, policy-based, and other key factors. Resources for learning and implementing RL, such as books, courses, and online communities are also provided. By offering a clear, structured introduction, this paper aims to simplify the complexities of RL for beginners, providing a straightforward pathway to understanding.
Authors: Mincheol Park, Nishad Maskara, Marcin Kalinowski, Mikhail D. Lukin
Abstract: Reliable quantum computation requires systematic identification and correction of errors that occur and accumulate in quantum hardware. To diagnose and correct such errors, standard quantum error-correcting protocols utilize $\textit{global}$ error information across the system obtained by mid-circuit readout of ancillary qubits. We investigate circuit-level error-correcting protocols that are measurement-free and based on $\textit{local}$ error information. Such a local error correction (LEC) circuit consists of faulty multi-qubit gates to perform both syndrome extraction and ancilla-controlled error removal. We develop and implement a reinforcement learning framework that takes a fixed set of faulty gates as inputs and outputs an optimized LEC circuit. To evaluate this approach, we quantitatively characterize an extension of logical qubit lifetime by a noisy LEC circuit. For the 2D classical Ising model and 4D toric code, our optimized LEC circuit performs better at extending a memory lifetime compared to a conventional LEC circuit based on Toom's rule in a sub-threshold gate error regime. We further show that such circuits can be used to reduce the rate of mid-circuit readouts to preserve a 2D toric code memory. Finally, we discuss the application of the LEC protocol on dissipative preparation of quantum states with topological phases.
Authors: Chayadon Lumbut, Donlapark Ponnoprat
Abstract: This study investigates privacy leakage in dimensionality reduction methods through a novel machine learning-based reconstruction attack. Employing an informed adversary threat model, we develop a neural network capable of reconstructing high-dimensional data from low-dimensional embeddings. We evaluate six popular dimensionality reduction techniques: PCA, sparse random projection (SRP), multidimensional scaling (MDS), Isomap, t-SNE, and UMAP. Using both MNIST and NIH Chest X-ray datasets, we perform a qualitative analysis to identify key factors affecting reconstruction quality. Furthermore, we assess the effectiveness of an additive noise mechanism in mitigating these reconstruction attacks. Our experimental results on both datasets reveal that the attack is effective against deterministic methods (PCA and Isomap), but ineffective against methods that employ random initialization (SRP, MDS, t-SNE and UMAP). When adding the images with large noises before performing PCA or Isomap, the attack produced severely distorted reconstructions. In contrast, for the other four methods, the reconstructions still show some recognizable features, though they bear little resemblance to the original images.
Authors: Yuejiang Liu, Jubayer Ibn Hamid, Annie Xie, Yoonho Lee, Maximilian Du, Chelsea Finn
Abstract: Predicting and executing a sequence of actions without intermediate replanning, known as action chunking, is increasingly used in robot learning from human demonstrations. Yet, its reported effects on the learned policy are inconsistent: some studies find it crucial for achieving strong results, while others observe decreased performance. In this paper, we first dissect how action chunking impacts the divergence between a learner and a demonstrator. We find that action chunking allows the learner to better capture the temporal dependencies in demonstrations but at the cost of reduced reactivity in stochastic environments. To address this tradeoff, we propose Bidirectional Decoding (BID), a test-time inference algorithm that bridges action chunking with closed-loop operations. BID samples multiple predictions at each time step and searches for the optimal one based on two criteria: (i) backward coherence, which favors samples that align with previous decisions; (ii) forward contrast, which seeks samples of high likelihood for future plans. By coupling decisions within and across action chunks, BID promotes consistency over time while maintaining reactivity to unexpected changes. Experimental results show that BID boosts the performance of two state-of-the-art generative policies across seven simulation benchmarks and two real-world tasks. Code and videos are available at https://bid-robot.github.io.
Authors: Enrique Luna Villagomez, Vladimir Mahalec
Abstract: This work introduces a novel control-aware distributed process monitoring methodology based on modules comprised of clusters of interacting measurements. The methodology relies on the process flow diagram (PFD) and control system structure without requiring cross-correlation data to create monitoring modules. The methodology is validated on the Tennessee Eastman Process benchmark using full Principal Component Analysis (f-PCA) in the monitoring modules. The results are comparable to nonlinear techniques implemented in a centralized manner such as Kernel PCA (KPCA), Autoencoders (AE), and Recurrent Neural Networks (RNN), or distributed techniques like the Distributed Canonical Correlation Analysis (DCCA). Temporal plots of fault detection by different modules show clearly the magnitude and propagation of the fault through each module, pinpointing the module where the fault originates, and separating controllable faults from other faults. This information, combined with PCA contribution plots, helps detection and identification as effectively as more complex nonlinear centralized or distributed methods.
Authors: Tiansheng Huang, Sihao Hu, Fatih Ilhan, Selim Furkan Tekin, Ling Liu
Abstract: Recent research demonstrates that the nascent fine-tuning-as-a-service business model exposes serious safety concerns -- fine-tuning over a few harmful data uploaded by the users can compromise the safety alignment of the model. The attack, known as harmful fine-tuning attack, has raised a broad research interest among the community. However, as the attack is still new, \textbf{we observe that there are general misunderstandings within the research community.} To clear up concern, this paper provide a comprehensive overview to three aspects of harmful fine-tuning: attacks setting, defense design and evaluation methodology. Specifically, we first present the threat model of the problem, and introduce the harmful fine-tuning attack and its variants. Then we systematically survey the existing literature on attacks/defenses/mechanical analysis of the problem. Finally, we introduce the evaluation methodology and outline future research directions that might contribute to the development of the field. Additionally, we present a list of questions of interest, which might be useful to refer to when reviewers in the peer review process question the realism of the experiment/attack/defense setting. A curated list of relevant papers is maintained and made accessible at: https://github.com/git-disl/awesome_LLM-harmful-fine-tuning-papers.
URLs: https://github.com/git-disl/awesome_LLM-harmful-fine-tuning-papers.
Authors: Juno Nam, Sulin Liu, Gavin Winter, KyuJung Jun, Soojung Yang, Rafael G\'omez-Bombarelli
Abstract: We introduce LiFlow, a generative framework to accelerate molecular dynamics (MD) simulations for crystalline materials that formulates the task as conditional generation of atomic displacements. The model uses flow matching, with a Propagator submodel to generate atomic displacements and a Corrector to locally correct unphysical geometries, and incorporates an adaptive prior based on the Maxwell-Boltzmann distribution to account for chemical and thermal conditions. We benchmark LiFlow on a dataset comprising 25-ps trajectories of lithium diffusion across 4,186 solid-state electrolyte (SSE) candidates at four temperatures. The model obtains a consistent Spearman rank correlation of 0.7-0.8 for lithium mean squared displacement (MSD) predictions on unseen compositions. Furthermore, LiFlow generalizes from short training trajectories to larger supercells and longer simulations while maintaining high accuracy. With speed-ups of up to 600,000$\times$ compared to first-principles methods, LiFlow enables scalable simulations at significantly larger length and time scales.
Authors: Lei Xu, Mohammed Asad Karim, Saket Dingliwal, Aparna Elangovan
Abstract: Large language models (LLMs) can generate fluent summaries across domains using prompting techniques, reducing the need to train models for summarization applications. However, crafting effective prompts that guide LLMs to generate summaries with the appropriate level of detail and writing style remains a challenge. In this paper, we explore the use of salient information extracted from the source document to enhance summarization prompts. We show that adding keyphrases in prompts can improve ROUGE F1 and recall, making the generated summaries more similar to the reference and more complete. The number of keyphrases can control the precision-recall trade-off. Furthermore, our analysis reveals that incorporating phrase-level salient information is superior to word- or sentence-level. However, the impact on hallucination is not universally positive across LLMs. To conduct this analysis, we introduce Keyphrase Signal Extractor (SigExt), a lightweight model that can be finetuned to extract salient keyphrases. By using SigExt, we achieve consistent ROUGE improvements across datasets and open-weight and proprietary LLMs without any LLM customization. Our findings provide insights into leveraging salient information in building prompt-based summarization systems. We release our code at \url{https://github.com/amazon-science/SigExt}
Authors: Alireza Rezazadeh, Zichao Li, Wei Wei, Yujia Bao
Abstract: Recent advancements in large language models have significantly improved their context windows, yet challenges in effective long-term memory management remain. We introduce MemTree, an algorithm that leverages a dynamic, tree-structured memory representation to optimize the organization, retrieval, and integration of information, akin to human cognitive schemas. MemTree organizes memory hierarchically, with each node encapsulating aggregated textual content, corresponding semantic embeddings, and varying abstraction levels across the tree's depths. Our algorithm dynamically adapts this memory structure by computing and comparing semantic embeddings of new and existing information to enrich the model's context-awareness. This approach allows MemTree to handle complex reasoning and extended interactions more effectively than traditional memory augmentation methods, which often rely on flat lookup tables. Evaluations on benchmarks for multi-turn dialogue understanding and document question answering show that MemTree significantly enhances performance in scenarios that demand structured memory management.
Authors: Nataliya Sevryugina, Serena Costanzo, Stephen de Bruyn Kops, Colm-cille Caulfield, Iraj Mortazavi, Taraneh Sayadi
Abstract: Computational Fluid Dynamics (CFD) is an indispensable method of fluid modelling in engineering applications, reducing the need for physical prototypes and testing for tasks such as design optimisation and performance analysis. Depending on the complexity of the system under consideration, models ranging from low to high fidelity can be used for prediction, allowing significant speed-up. However, the choice of model requires information about the actual dynamics of the flow regime. Correctly identifying the regions/clusters of flow that share the same dynamics has been a challenging research topic to date. In this study, we propose a novel hybrid approach to flow clustering. It consists of characterising each sample point of the system with equation-based features, i.e. features are budgets that represent the contribution of each term from the original governing equation to the local dynamics at each sample point. This was achieved by applying the Sparse Identification of Nonlinear Dynamical systems (SINDy) method pointwise to time evolution data. The method proceeds with equation-based clustering using the Girvan-Newman algorithm. This allows the detection of communities that share the same physical dynamics. The algorithm is implemented in both Eulerian and Lagrangian frameworks. In the Lagrangian, i.e. dynamic approach, the clustering is performed on the trajectory of each point, allowing the change of clusters to be represented also in time. The performance of the algorithm is first tested on a flow around a cylinder. The construction of the dynamic clusters in this test case clearly shows the evolution of the wake from the steady state solution through the transient to the oscillatory solution. Dynamic clustering was then successfully tested on turbulent flow data. Two distinct and well-defined clusters were identified and their temporal evolution was reconstructed.
Authors: Ming Yin, Jingyang Zhang, Jingwei Sun, Minghong Fang, Hai Li, Yiran Chen
Abstract: Model merging is an emerging technique that integrates multiple models fine-tuned on different tasks to create a versatile model that excels in multiple domains. This scheme, in the meantime, may open up backdoor attack opportunities where one single malicious model can jeopardize the integrity of the merged model. Existing works try to demonstrate the risk of such attacks by assuming substantial computational resources, focusing on cases where the attacker can fully fine-tune the pre-trained model. Such an assumption, however, may not be feasible given the increasing size of machine learning models. In practice where resources are limited and the attacker can only employ techniques like Low-Rank Adaptation (LoRA) to produce the malicious model, it remains unclear whether the attack can still work and pose threats. In this work, we first identify that the attack efficacy is significantly diminished when using LoRA for fine-tuning. Then, we propose LoBAM, a method that yields high attack success rate with minimal training resources. The key idea of LoBAM is to amplify the malicious weights in an intelligent way that effectively enhances the attack efficacy. We demonstrate that our design can lead to improved attack success rate through both theoretical proof and extensive empirical experiments across various model merging scenarios. Moreover, we show that our method has strong stealthiness and is difficult to detect.
Authors: Yuan Lu
Abstract: This paper introduces the Density-Calibrated Conformal Quantile Regression (CQR-d) method, a novel approach for constructing prediction intervals that adapts to varying uncertainty across the feature space. Building upon conformal quantile regression, CQR-d incorporates local information through a weighted combination of local and global conformity scores, where the weights are determined by local data density. We prove that CQR-d provides valid marginal coverage at level $1 - \alpha - \epsilon$, where $\epsilon$ represents a small tolerance from numerical optimization. Through extensive simulation studies and an application to the a heteroscedastic dataset available in R, we demonstrate that CQR-d maintains the desired coverage while producing substantially narrower prediction intervals compared to standard conformal quantile regression (CQR). The method's effectiveness is particularly pronounced in settings with clear local uncertainty patterns, making it a valuable tool for prediction tasks in heterogeneous data environments.
Authors: Taesung Kwon, Jong Chul Ye
Abstract: In this paper, we propose a novel framework for solving high-definition video inverse problems using latent image diffusion models. Building on recent advancements in spatio-temporal optimization for video inverse problems using image diffusion models, our approach leverages latent-space diffusion models to achieve enhanced video quality and resolution. To address the high computational demands of processing high-resolution frames, we introduce a pseudo-batch consistent sampling strategy, allowing efficient operation on a single GPU. Additionally, to improve temporal consistency, we present batch-consistent inversion, an initialization technique that incorporates informative latents from the measurement frame. By integrating with SDXL, our framework achieves state-of-the-art video reconstruction across a wide range of spatio-temporal inverse problems, including complex combinations of frame averaging and various spatial degradations, such as deblurring, super-resolution, and inpainting. Unlike previous methods, our approach supports multiple aspect ratios (landscape, vertical, and square) and delivers HD-resolution reconstructions (exceeding 1280x720) in under 2.5 minutes on a single NVIDIA 4090 GPU.
Authors: Xiaoyan Xing, Konrad Groh, Sezer Karaoglu, Theo Gevers, Anand Bhattad
Abstract: We introduce LumiNet, a novel architecture that leverages generative models and latent intrinsic representations for effective lighting transfer. Given a source image and a target lighting image, LumiNet synthesizes a relit version of the source scene that captures the target's lighting. Our approach makes two key contributions: a data curation strategy from the StyleGAN-based relighting model for our training, and a modified diffusion-based ControlNet that processes both latent intrinsic properties from the source image and latent extrinsic properties from the target image. We further improve lighting transfer through a learned adaptor (MLP) that injects the target's latent extrinsic properties via cross-attention and fine-tuning. Unlike traditional ControlNet, which generates images with conditional maps from a single scene, LumiNet processes latent representations from two different images - preserving geometry and albedo from the source while transferring lighting characteristics from the target. Experiments demonstrate that our method successfully transfers complex lighting phenomena including specular highlights and indirect illumination across scenes with varying spatial layouts and materials, outperforming existing approaches on challenging indoor scenes using only images as input.
Authors: Ivory Yang, Weicheng Ma, Soroush Vosoughi
Abstract: The preservation and revitalization of endangered and extinct languages is a meaningful endeavor, conserving cultural heritage while enriching fields like linguistics and anthropology. However, these languages are typically low-resource, making their reconstruction labor-intensive and costly. This challenge is exemplified by N\"ushu, a rare script historically used by Yao women in China for self-expression within a patriarchal society. To address this challenge, we introduce N\"ushuRescue, an AI-driven framework designed to train large language models (LLMs) on endangered languages with minimal data. N\"ushuRescue automates evaluation and expands target corpora to accelerate linguistic revitalization. As a foundational component, we developed NCGold, a 500-sentence N\"ushu-Chinese parallel corpus, the first publicly available dataset of its kind. Leveraging GPT-4-Turbo, with no prior exposure to N\"ushu and only 35 short examples from NCGold, N\"ushuRescue achieved 48.69\% translation accuracy on 50 withheld sentences and generated NCSilver, a set of 98 newly translated modern Chinese sentences of varying lengths. A sample of both NCGold and NCSilver is included in the Supplementary Materials. Additionally, we developed FastText-based and Seq2Seq models to further support research on N\"ushu. N\"ushuRescue provides a versatile and scalable tool for the revitalization of endangered languages, minimizing the need for extensive human input.
Authors: Wenxuan Huang, Zijie Zhai, Yunhang Shen, Shaoshen Cao, Fei Zhao, Xiangfeng Xu, Zheyu Ye, Shaohui Lin
Abstract: Multimodal Large Language Models (MLLMs) have achieved remarkable success in vision understanding, reasoning, and interaction. However, the inference computation and memory increase progressively with the generation of output tokens during decoding, directly affecting the efficacy of MLLMs. Existing methods attempt to reduce the vision context redundancy to achieve efficient MLLMs. Unfortunately, the efficiency benefits of the vision context reduction in the prefill stage gradually diminish during the decoding stage. To address this problem, we proposed a dynamic vision-language context sparsification framework Dynamic-LLaVA, which dynamically reduces the redundancy of vision context in the prefill stage and decreases the memory and computation overhead of the generated language context during decoding. Dynamic-LLaVA designs a tailored sparsification inference scheme for different inference modes, i.e., prefill, decoding with and without KV cache, to achieve efficient inference of MLLMs. In practice, Dynamic-LLaVA can reduce computation consumption by $\sim$75\% in the prefill stage. Meanwhile, throughout the entire generation process of MLLMs, Dynamic-LLaVA reduces the $\sim$50\% computation consumption under decoding without KV cache, while saving $\sim$50\% GPU memory overhead when decoding with KV cache, due to the vision-language context sparsification. Extensive experiments also demonstrate that Dynamic-LLaVA achieves efficient inference for MLLMs with negligible understanding and generation ability degradation or even performance gains compared to the full-context inference baselines. Code is available at https://github.com/Osilly/dynamic_llava .
Authors: 01. AI, :, Alan Wake, Albert Wang, Bei Chen, C. X. Lv, Chao Li, Chengen Huang, Chenglin Cai, Chujie Zheng, Daniel Cooper, Ethan Dai, Fan Zhou, Feng Hu, Heng Ji, Howard Qiu, Jiangcheng Zhu, Jun Tian, Katherine Su, Lihuan Zhang, Liying Li, Ming Song, Mou Li, Peng Liu, Qichen Hu, Shawn Wang, Shijun Zhou, Shiyong Li, Tianhang Zhu, Wen Xie, Xiang He, Xiaobo Chen, Xiaohui Hu, Xiaoyi Ren, Xinyao Niu, Yanpeng Li, Yongke Zhao, Yongzhen Luo, Yuchi Xu, Yuxuan Sha, Zhaodong Yan, Zhiyuan Liu, Zirui Zhang
Abstract: This technical report presents Yi-Lightning, our latest flagship large language model (LLM). It achieves exceptional performance, ranking 6th overall on Chatbot Arena, with particularly strong results (2nd to 4th place) in specialized categories including Chinese, Math, Coding, and Hard Prompts. Yi-Lightning leverages an enhanced Mixture-of-Experts (MoE) architecture, featuring advanced expert segmentation and routing mechanisms coupled with optimized KV-caching techniques. Our development process encompasses comprehensive pre-training, supervised fine-tuning (SFT), and reinforcement learning from human feedback (RLHF), where we devise deliberate strategies for multi-stage training, synthetic data construction, and reward modeling. Furthermore, we implement RAISE (Responsible AI Safety Engine), a four-component framework to address safety issues across pre-training, post-training, and serving phases. Empowered by our scalable super-computing infrastructure, all these innovations substantially reduce training, deployment and inference costs while maintaining high-performance standards. With further evaluations on public academic benchmarks, Yi-Lightning demonstrates competitive performance against top-tier LLMs, while we observe a notable disparity between traditional, static benchmark results and real-world, dynamic human preferences. This observation prompts a critical reassessment of conventional benchmarks' utility in guiding the development of more intelligent and powerful AI systems for practical applications. Yi-Lightning is now available through our developer platform at https://platform.lingyiwanwu.com.
Authors: Kaixin Wu, Yixin Ji, Zeyuan Chen, Qiang Wang, Cunxiang Wang, Hong Liu, Baijun Ji, Jia Xu, Zhongyi Liu, Jinjie Gu, Yuan Zhou, Linjian Mo
Abstract: Relevance modeling between queries and items stands as a pivotal component in commercial search engines, directly affecting the user experience. Given the remarkable achievements of large language models (LLMs) in various natural language processing (NLP) tasks, LLM-based relevance modeling is gradually being adopted within industrial search systems. Nevertheless, foundational LLMs lack domain-specific knowledge and do not fully exploit the potential of in-context learning. Furthermore, structured item text remains underutilized, and there is a shortage in the supply of corresponding queries and background knowledge. We thereby propose CPRM (Continual Pre-training for Relevance Modeling), a framework designed for the continual pre-training of LLMs to address these issues. Our CPRM framework includes three modules: 1) employing both queries and multi-field item to jointly pre-train for enhancing domain knowledge, 2) applying in-context pre-training, a novel approach where LLMs are pre-trained on a sequence of related queries or items, and 3) conducting reading comprehension on items to produce associated domain knowledge and background information (e.g., generating summaries and corresponding queries) to further strengthen LLMs. Results on offline experiments and online A/B testing demonstrate that our model achieves convincing performance compared to strong baselines.
Authors: Hong Lin, Shixin Wan, Zhongle Xie, Ke Chen, Meihui Zhang, Lidan Shou, Gang Chen
Abstract: Over the recent years, Shapley value (SV), a solution concept from cooperative game theory, has found numerous applications in data analytics (DA). This paper provides the first comprehensive study of SV used throughout the DA workflow, which involves three main steps: data fabric, data exploration, and result reporting. We summarize existing versatile forms of SV used in these steps by a unified definition and clarify the essential functionalities that SV can provide for data scientists. We categorize the arts in this field based on the technical challenges they tackled, which include computation efficiency, approximation error, privacy preservation, and appropriate interpretations. We discuss these challenges and analyze the corresponding solutions. We also implement SVBench, the first open-sourced benchmark for developing SV applications, and conduct experiments on six DA tasks to validate our analysis and discussions. Based on the qualitative and quantitative results, we identify the limitations of current efforts for applying SV to DA and highlight the directions of future research and engineering.
Authors: Koen Minartz, Fleur Hendriks, Simon Martinus Koop, Alessandro Corbetta, Vlado Menkovski
Abstract: Understanding the dynamics of pedestrian crowds is an outstanding challenge crucial for designing efficient urban infrastructure and ensuring safe crowd management. To this end, both small-scale laboratory and large-scale real-world measurements have been used. However, these approaches respectively lack statistical resolution and parametric controllability, both essential to discovering physical relationships underlying the complex stochastic dynamics of crowds. Here, we establish an investigation paradigm that offers laboratory-like controllability, while ensuring the statistical resolution of large-scale real-world datasets. Using our data-driven Neural Crowd Simulator (NeCS), which we train on large-scale data and validate against key statistical features of crowd dynamics, we show that we can perform effective surrogate crowd dynamics experiments without training on specific scenarios. We not only reproduce known experimental results on pairwise avoidance, but also uncover the vision-guided and topological nature of N-body interactions. These findings show how virtual experiments based on neural simulation enable data-driven scientific discovery.
Authors: Wenhan Dong, Chao Lin, Xinlei He, Xinyi Huang, Shengmin Xu
Abstract: Privacy-preserving federated learning (PPFL) aims to train a global model for multiple clients while maintaining their data privacy. However, current PPFL protocols exhibit one or more of the following insufficiencies: considerable degradation in accuracy, the requirement for sharing keys, and cooperation during the key generation or decryption processes. As a mitigation, we develop the first protocol that utilizes neural networks to implement PPFL, as well as incorporating an Aggregatable Hybrid Encryption scheme tailored to the needs of PPFL. We name these networks as Homomorphic Adversarial Networks (HANs) which demonstrate that neural networks are capable of performing tasks similar to multi-key homomorphic encryption (MK-HE) while solving the problems of key distribution and collaborative decryption. Our experiments show that HANs are robust against privacy attacks. Compared with non-private federated learning, experiments conducted on multiple datasets demonstrate that HANs exhibit a negligible accuracy loss (at most 1.35%). Compared to traditional MK-HE schemes, HANs increase encryption aggregation speed by 6,075 times while incurring a 29.2 times increase in communication overhead.