Authors: Zhiming Xu, Suorong Yang, Baile Xu, Jian Zhao, Furao Shen
Abstract: Class-incremental learning (CIL) aims to acquire new classes incrementally while conserving historical knowledge. Although existing pre-trained model (PTM) based methods perform excellently in CIL, it is still preferable to fine-tune them on downstream incremental tasks, which contain massive patterns unknown to the PTMs. However, fine-tuning on task streams can lead to catastrophic forgetting that erases the knowledge stored in the PTMs. This paper proposes the Dual Prototype network for Task-wise Adaption (DPTA) of PTM-based CIL. For each incremental learning task, a task-wise adapter module is built to fine-tune the PTM, where a center-adapt loss forces the representation to be more centrally clustered and class-separable. The dual prototype network improves the prediction process by enabling test-time adapter selection: the raw prototypes deduce several candidate task indices for a test sample to select suitable adapter modules for the PTM, and the augmented prototypes, which can separate highly correlated classes, are used to determine the final result. Experiments on several benchmark datasets demonstrate the state-of-the-art performance of DPTA. The code will be open-sourced after the paper is published.
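To make the two-stage inference concrete, here is a minimal sketch of dual-prototype prediction as described above. It is our illustration, not the authors' code: all names (z_raw_fn, z_aug_fns, proto_task, and so on) are hypothetical, and plain Euclidean distances stand in for whatever metric DPTA actually uses.

```python
import numpy as np

def dual_prototype_predict(z_raw_fn, z_aug_fns, raw_protos, aug_protos,
                           proto_task, proto_class, x, top_k=2):
    """Hypothetical interface: z_raw_fn embeds x with the frozen PTM;
    z_aug_fns[t] embeds x with the PTM plus task-t adapter; *_protos are
    (n_protos, d) arrays with task/class labels per row."""
    z = z_raw_fn(x)
    d_raw = np.linalg.norm(raw_protos - z, axis=1)
    # Stage 1: raw prototypes shortlist candidate task indices.
    cand_tasks = np.unique(proto_task[np.argsort(d_raw)[:top_k]])
    best_class, best_dist = None, np.inf
    # Stage 2: augmented prototypes under each candidate adapter decide.
    for t in cand_tasks:
        z_t = z_aug_fns[t](x)
        mask = proto_task == t
        d = np.linalg.norm(aug_protos[mask] - z_t, axis=1)
        if d.min() < best_dist:
            best_dist, best_class = d.min(), proto_class[mask][np.argmin(d)]
    return best_class
```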
Authors: Xuanbing Zhu, Dunbin Shen, Zhongwen Rao, Huiyi Ma, Yingguang Hao, Hongyu Wang
Abstract: Multivariate time series data provide a robust framework for future prediction by leveraging information across multiple dimensions, ensuring broad applicability in practical scenarios. However, their high dimensionality and mixed patterns pose significant challenges for establishing an interpretable, explicit mapping between historical and future series and for extracting long-range feature dependencies. To address these challenges, we propose a channel-time dual unmixing network for multivariate time series forecasting (named MTS-UNMixer), which decomposes the entire series into critical bases and coefficients across both the time and channel dimensions. This approach establishes a robust sharing mechanism between historical and future series, enabling accurate representation and enhancing physical interpretability. Specifically, MTS-UNMixer represents sequences over time as a mixture of multiple trends and cycles, with the time-correlated representation coefficients shared across both historical and future time periods. In contrast, sequences over channels can be decomposed into multiple tick-wise bases, which characterize the channel correlations and are shared across the whole series. To estimate the shared time-dependent coefficients, a vanilla Mamba network is employed, leveraging its alignment with directional causality. Conversely, a bidirectional Mamba network is utilized to model the shared channel-correlated bases, accommodating noncausal relationships. Experimental results show that MTS-UNMixer significantly outperforms existing methods on multiple benchmark datasets. The code is available at https://github.com/ZHU-0108/MTS-UNMixers.
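The unmixing idea reduces, in its simplest form, to factorizing the series into bases times coefficients with one factor shared between history and future. The toy sketch below is ours: an SVD stands in for the learned unmixing, and a random matrix stands in for the Mamba-extrapolated future bases, so it illustrates only the sharing mechanism, not the architecture.

```python
import numpy as np

rng = np.random.default_rng(0)
T_hist, T_fut, C_ch, r = 96, 24, 8, 3           # lengths, channels, rank

X_hist = rng.standard_normal((T_hist, C_ch))    # stand-in for real data
# Time-dimension unmixing: temporal bases (trends/cycles) x coefficients.
U, S, Vt = np.linalg.svd(X_hist, full_matrices=False)
bases_time = U[:, :r] * S[:r]                   # (T_hist, r) temporal bases
coeffs = Vt[:r]                                 # (r, C_ch) shared over time
# If future temporal bases were extrapolated (the paper learns this with
# Mamba networks), the forecast reuses the *shared* coefficients:
bases_fut = rng.standard_normal((T_fut, r))     # placeholder extrapolation
X_fut_hat = bases_fut @ coeffs                  # (T_fut, C_ch) forecast
print(X_fut_hat.shape)
```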
Authors: Debo Cheng (University of South Australia), Ziqi Xu (RMIT University), Jiuyong Li (University of South Australia), Lin Liu (University of South Australia), Thuc duy Le (University of South Australia), Xudong Guo (University of South Australia), Shichao Zhang (Guangxi Normal University)
Abstract: Querying causal effects from time-series data is important across various fields, including healthcare, economics, climate science, and epidemiology. However, this task becomes complex in the presence of time-varying latent confounders, which affect both treatment and outcome variables over time and can introduce bias into causal effect estimation. Traditional instrumental variable (IV) methods are limited in addressing such complexities due to the need for predefined IVs or strong assumptions that do not hold in dynamic settings. To tackle these issues, we develop a novel Time-varying Conditional Instrumental Variable (CIV) method for debiasing causal effect estimation, referred to as TDCIV. TDCIV leverages Long Short-Term Memory (LSTM) and Variational Autoencoder (VAE) models to disentangle and learn the representations of the time-varying CIV and its conditioning set from proxy variables without prior knowledge. Under the assumptions of the Markov property and availability of proxy variables, we theoretically establish the validity of these learned representations for addressing the biases from time-varying latent confounders, thus enabling accurate causal effect estimation. Our proposed TDCIV is the first to effectively learn a time-varying CIV and its associated conditioning set without relying on domain-specific knowledge.
Authors: Pirzada Suhail, Hao Tang, Amit Sethi
Abstract: Neural networks have emerged as powerful tools across various applications, yet their decision-making process often remains opaque, leading to them being perceived as "black boxes." This opacity raises concerns about their interpretability and reliability, especially in safety-critical scenarios. Network inversion techniques offer a solution by allowing us to peek inside these black boxes, revealing the features and patterns learned by the networks behind their decision-making processes, thereby providing valuable insights into how neural networks arrive at their conclusions and making them more interpretable and trustworthy. This paper presents a simple yet effective approach to network inversion using a meticulously conditioned generator that learns the data distribution in the input space of the trained neural network, enabling the reconstruction of inputs that would most likely lead to the desired outputs. To capture the diversity in the input space for a given output, instead of simply revealing the conditioning labels to the generator, we encode the conditioning label information into vectors and intermediate matrices and further minimize the cosine similarity between features of the generated images. Additionally, we incorporate feature orthogonality as a regularization term to boost image diversity: it penalises deviations of the Gram matrix of the features from the identity matrix, ensuring orthogonality and promoting distinct, non-redundant representations for each label. The paper concludes by exploring immediate applications of the proposed network inversion approach in interpretability, out-of-distribution detection, and training data reconstruction.
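As a minimal sketch of the orthogonality regularizer described above (our reading, in PyTorch; the authors' exact formulation may differ), one can penalize the deviation of the batch Gram matrix from the identity:

```python
import torch

def orthogonality_penalty(features: torch.Tensor) -> torch.Tensor:
    """features: (batch, dim) generator features for one batch of images."""
    f = torch.nn.functional.normalize(features, dim=1)  # row-normalize
    gram = f @ f.t()                                    # (batch, batch)
    eye = torch.eye(gram.size(0), device=gram.device)
    return ((gram - eye) ** 2).mean()  # penalize off-diagonal similarity

feats = torch.randn(16, 128, requires_grad=True)
loss = orthogonality_penalty(feats)
loss.backward()
```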
Authors: Gili Rosenberg, J. Kyle Brubaker, Martin J. A. Schuetz, Elton Yechao Zhu, Serdar Kadıoğlu, Sima E. Borujeni, Helmut G. Katzgraber
Abstract: Pruning neural networks, which involves removing a fraction of their weights, can often maintain high accuracy while significantly reducing model complexity, at least up to a certain limit. We present a neural network pruning technique that builds upon the Combinatorial Brain Surgeon, but solves an optimization problem over a subset of the network weights in an iterative, block-wise manner using block coordinate descent. The iterative, block-based nature of this pruning technique, which we dub "iterative Combinatorial Brain Surgeon" (iCBS), allows for scalability to very large models, including large language models (LLMs), that may not be feasible with a one-shot combinatorial optimization approach. When applied to large models like Mistral and DeiT, iCBS achieves higher performance metrics at the same density levels compared to existing pruning methods such as Wanda. This demonstrates the effectiveness of this iterative, block-wise pruning method in compressing and optimizing the performance of large deep learning models, even while optimizing over only a small fraction of the weights. Moreover, our approach allows for a quality-time (or cost) tradeoff that is not available when using a one-shot pruning technique alone. The block-wise formulation of the optimization problem enables the use of hardware accelerators, potentially offsetting the increased computational costs compared to one-shot pruning methods like Wanda. In particular, the optimization problem solved for each block is quantum-amenable in that it could, in principle, be solved by a quantum computer.
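A schematic of the block coordinate descent loop might look as follows. This is not the paper's implementation: we substitute a separable OBS-style saliency w_i^2 * h_i for the coupled per-block combinatorial objective, so repeated passes only matter in the real, coupled setting.

```python
import numpy as np

def icbs_like_prune(w, h, density, block_size=64, passes=2):
    """Return a keep-mask at the target density (illustrative only).
    w: flat weights; h: diagonal Hessian estimates (positive)."""
    w, h = np.asarray(w, float), np.asarray(h, float)
    mask = np.zeros(w.size, dtype=bool)
    for _ in range(passes):  # passes interact only with a coupled objective
        for start in range(0, w.size, block_size):
            end = min(start + block_size, w.size)
            cost = (w[start:end] ** 2) * h[start:end]  # damage if removed
            k = int(round(density * (end - start)))    # block keep-budget
            block_mask = np.zeros(end - start, dtype=bool)
            block_mask[np.argsort(cost)[::-1][:k]] = True  # keep costliest
            mask[start:end] = block_mask
    return mask

w = np.random.randn(1024)
h = np.abs(np.random.randn(1024)) + 1e-3
print(icbs_like_prune(w, h, density=0.5).mean())  # ~0.5
```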
Authors: Armin W. Thomas, Rom Parnichkun, Alexander Amini, Stefano Massaroli, Michael Poli
Abstract: Iterative improvement of model architectures is fundamental to deep learning: Transformers first enabled scaling, and recent advances in model hybridization have pushed the quality-efficiency frontier. However, optimizing architectures remains challenging and expensive. Current automated or manual approaches fall short, largely due to limited progress in the design of search spaces and due to the simplicity of resulting patterns and heuristics. In this work, we propose a new approach for the synthesis of tailored architectures (STAR). Our approach combines a novel search space based on the theory of linear input-varying systems, supporting a hierarchical numerical encoding into architecture genomes. STAR genomes are automatically refined and recombined with gradient-free, evolutionary algorithms to optimize for multiple model quality and efficiency metrics. Using STAR, we optimize large populations of new architectures, leveraging diverse computational units and interconnection patterns, improving over highly-optimized Transformers and striped hybrid models on the frontier of quality, parameter size, and inference cache for autoregressive language modeling.
Authors: Indranil Halder
Abstract: Diffusion-based generative models demonstrate a transition from memorizing the training dataset to a non-memorization regime as the size of the training set increases. Here, we begin by introducing a mathematically precise definition of this transition in terms of a relative distance: the model is said to be in the non-memorization/`generalization' regime if the generated distribution is almost surely far from the probability distribution associated with a Gaussian kernel approximation to the training dataset, relative to the sampling distribution. Then, we develop an analytically tractable diffusion model and establish a lower bound on Kullback-Leibler divergence between the generated and sampling distribution. The model also features the transition, according to our definition in terms of the relative distance, when the training data is sampled from an isotropic Gaussian distribution. Further, our study reveals that this transition occurs when the individual distance between the generated and underlying sampling distribution begins to decrease with the addition of more training samples. This is to be contrasted with an alternative scenario, where the model's memorization performance degrades, but generalization performance doesn't improve. We also provide empirical evidence indicating that realistic diffusion models exhibit the same alignment of scales.
Authors: Meghan Plumridge, Rasmus Maråk, Chiara Ceccobello, Pablo Gómez, Gabriele Meoni, Filip Svoboda, Nicholas D. Lane
Abstract: Segmentation of Earth observation (EO) satellite data is critical for natural hazard analysis and disaster response. However, processing EO data at ground stations introduces delays due to data transmission bottlenecks and communication windows. Using segmentation models capable of near-real-time data analysis onboard satellites can therefore improve response times. This study presents a proof-of-concept using MobileSAM, a lightweight, pre-trained segmentation model, onboard Unibap iX10-100 satellite hardware. We demonstrate the segmentation of water bodies from Sentinel-2 satellite imagery and integrate MobileSAM with PASEOS, an open-source Python module that simulates satellite operations. This integration allows us to evaluate MobileSAM's performance under simulated conditions of a satellite constellation. Our research investigates the potential of fine-tuning MobileSAM in a decentralised way onboard multiple satellites in rapid response to a disaster. Our findings show that MobileSAM can be rapidly fine-tuned and benefits from decentralised learning, considering the constraints imposed by the simulated orbital environment. We observe improvements in segmentation performance with minimal training data and fast fine-tuning when satellites frequently communicate model updates. This study contributes to the field of onboard AI by emphasising the benefits of decentralised learning and fine-tuning pre-trained models for rapid response scenarios. Our work builds on recent related research at a critical time; as extreme weather events increase in frequency and magnitude, rapid response with onboard data analysis is essential.
Authors: Allan M. de Souza, Filipe Maciel, Joahannes B. D. da Costa, Luiz F. Bittencourt, Eduardo Cerqueira, Antonio A. F. Loureiro, Leandro A. Villas
Abstract: Federated Learning (FL) is a distributed approach to collaboratively training machine learning models. FL requires a high level of communication between the devices and a central server, thus imposing several challenges, including communication bottlenecks and network scalability. This article introduces ACSP-FL (https://github.com/AllanMSouza/ACSP-FL), a solution to reduce the overall communication and computation costs of training a model in FL environments. ACSP-FL employs a client selection strategy that dynamically adapts the number of devices training the model and the number of rounds required to achieve convergence. Moreover, ACSP-FL enables model personalization to improve clients' performance. A use case based on human activity recognition datasets shows the impact and benefits of ACSP-FL compared to state-of-the-art approaches. Experimental evaluations show that ACSP-FL minimizes the overall communication and computation overheads of training a model and converges the system efficiently. In particular, ACSP-FL reduces communication by up to 95% compared to literature approaches, while providing good convergence even in scenarios where data is distributed in a non-independent and non-identical way across client devices.
Authors: Christopher Holder, Anthony Bagnall
Abstract: Time series data has become increasingly prevalent across numerous domains, driving a growing demand for time series machine learning techniques. Among these, time series clustering (TSCL) stands out as one of the most popular machine learning tasks. TSCL serves as a powerful exploratory analysis tool and is also employed as a preprocessing step or subroutine for various tasks, including anomaly detection, segmentation, and classification. The most popular TSCL algorithms are either fast (in terms of run time) but perform poorly on benchmark problems, or perform well on benchmarks but scale poorly. We present a new TSCL algorithm, the $k$-means (K) accelerated (A) Stochastic subgradient (S) Barycentre (B) Average (A) (KASBA) clustering algorithm. KASBA is a $k$-means clustering algorithm that uses the Move-Split-Merge (MSM) elastic distance at all stages of clustering, applies randomised stochastic subgradient descent to find barycentre centroids, links each stage of clustering to accelerate convergence, and exploits the metric property of the MSM distance to avoid a large proportion of distance calculations. It is a versatile and scalable clusterer designed for real-world TSCL applications, allowing practitioners to balance run time and clustering performance. We demonstrate through extensive experimentation that KASBA produces significantly better clustering than the faster state-of-the-art clusterers and offers orders-of-magnitude improvements in run time over the most performant $k$-means alternatives.
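A bare-bones skeleton of this style of elastic k-means is sketched below (ours, not the KASBA release). A Euclidean placeholder stands in for the MSM distance and its subgradient, and the acceleration tricks (linked stages, metric-based distance skipping) are omitted.

```python
import numpy as np

def elastic_kmeans(X, k, dist, n_iter=10, lr=0.1, seed=0):
    rng = np.random.default_rng(seed)
    centroids = X[rng.choice(len(X), k, replace=False)].copy()
    for _ in range(n_iter):
        # Assignment step under the (pluggable) elastic distance.
        labels = np.array([np.argmin([dist(x, c) for c in centroids])
                           for x in X])
        # Stochastic subgradient barycentre step: nudge each centroid
        # toward one randomly drawn member of its cluster.
        for j in range(k):
            members = np.flatnonzero(labels == j)
            if members.size:
                x = X[rng.choice(members)]
                centroids[j] += lr * (x - centroids[j])  # Euclidean surrogate
    return centroids, labels

X = np.random.randn(60, 32)                      # 60 short series
cents, labels = elastic_kmeans(X, k=3, dist=lambda a, b: np.linalg.norm(a - b))
```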
Authors: Hongni Jin, Kenneth M. Merz Jr
Abstract: A key step in interpreting gas-phase ion mobility coupled with mass spectrometry (IM-MS) data for unknown structure prediction involves identifying the most favorable protonated structure. In the gas phase, the site of protonation is determined using proton affinity (PA) measurements. Currently, mass spectrometry and ab initio computational methods are widely used to evaluate PA; however, both methods are resource-intensive and time-consuming. Therefore, there is a critical need for efficient methods to estimate PA, enabling the rapid identification of the most favorable protonation site in complex organic molecules with multiple proton binding sites. In this work, we developed a fast and accurate method for PA prediction by using multiple descriptors in combination with machine learning (ML) models. Using a comprehensive set of 186 descriptors, our model demonstrated strong predictive performance, with an R2 of 0.96 and an MAE of 2.47 kcal/mol, comparable to experimental uncertainty. Furthermore, we designed quantum circuits as feature encoders for a classical neural network. To evaluate the effectiveness of this hybrid quantum-classical model, we compared its performance with traditional ML models using a reduced feature set derived from the full set. The results showed that this hybrid model achieved performance consistently comparable to traditional ML models with the same reduced feature set on both a noiseless simulator and real quantum hardware, highlighting the potential of quantum machine learning for accurate and efficient PA predictions.
Authors: Ahmad Ahmad, Mehdi Kermanshah, Kevin Leahy, Zachary Serlin, Ho Chit Siu, Makai Mann, Cristian-Ioan Vasile, Roberto Tron, Calin Belta
Abstract: In this paper, we tackle the challenging problem of delayed rewards in reinforcement learning (RL). While Proximal Policy Optimization (PPO) has emerged as a leading Policy Gradient method, its performance can degrade under delayed rewards. We introduce two key enhancements to PPO: a hybrid policy architecture that combines an offline policy (trained on expert demonstrations) with an online PPO policy, and a reward shaping mechanism using Time Window Temporal Logic (TWTL). The hybrid architecture leverages offline data throughout training while maintaining PPO's theoretical guarantees. Building on the monotonic improvement framework of Trust Region Policy Optimization (TRPO), we prove that our approach ensures improvement over both the offline policy and previous iterations, with a bounded performance gap of $(2\varsigma\gamma\alpha^2)/(1-\gamma)^2$, where $\alpha$ is the mixing parameter, $\gamma$ is the discount factor, and $\varsigma$ bounds the expected advantage. Additionally, we prove that our TWTL-based reward shaping preserves the optimal policy of the original problem. TWTL enables formal translation of temporal objectives into immediate feedback signals that guide learning. We demonstrate the effectiveness of our approach through extensive experiments on inverted pendulum and lunar lander environments, showing improvements in both learning speed and final performance compared to standard PPO and offline-only approaches.
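The hybrid-policy idea can be pictured as a simple mixture: sample from the offline (demonstration-trained) policy with probability $\alpha$ and from the online PPO policy otherwise, annealing $\alpha$ as training progresses. The sketch below is ours; the interfaces are assumptions, not taken from the paper.

```python
import numpy as np

class HybridPolicy:
    """Mixture of an offline (expert-cloned) and an online (PPO) policy."""

    def __init__(self, offline_policy, online_policy, alpha=0.5):
        self.offline = offline_policy    # callable: state -> action
        self.online = online_policy      # callable: state -> action
        self.alpha = alpha               # mixing parameter from the bound

    def act(self, state, rng=None):
        rng = np.random.default_rng() if rng is None else rng
        if rng.random() < self.alpha:
            return self.offline(state)
        return self.online(state)

    def anneal(self, decay=0.99):
        self.alpha *= decay              # shift weight toward online PPO
```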
Authors: Shuhua Yu, Ding Zhou, Cong Xie, An Xu, Zhi Zhang, Xin Liu, Soummya Kar
Abstract: Pre-training Transformer models is resource-intensive, and recent studies have shown that sign momentum is an efficient technique for training large-scale deep learning models, particularly Transformers. However, its application in distributed training or federated learning remains underexplored. This paper investigates a novel communication-efficient distributed sign momentum method with local updates. Our proposed method allows for a broad class of base optimizers for local updates, and uses sign momentum in global updates, where momentum is generated from differences accumulated during local steps. We evaluate our method on the pre-training of various GPT-2 models, and the empirical results show significant improvement compared to other distributed methods with local updates. Furthermore, by approximating the sign operator with a randomized version that acts as a continuous analog in expectation, we present an $O(1/\sqrt{T})$ convergence for one instance of the proposed method for nonconvex smooth functions.
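A server-side view of the update might look like the sketch below (our rendering, not the paper's code): workers run local steps with any base optimizer, and the server keeps momentum over the averaged local differences and applies only its sign.

```python
import numpy as np

def global_sign_momentum_step(x_global, x_locals, m, lr=0.01, beta=0.9):
    """x_global: (d,) global params; x_locals: list of worker params after
    their local steps; m: (d,) momentum buffer."""
    delta = np.mean(x_locals, axis=0) - x_global  # aggregated local progress
    m = beta * m + (1.0 - beta) * delta           # momentum on differences
    x_global = x_global + lr * np.sign(m)         # sign step, signSGD-style
    return x_global, m
```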
Authors: Andreas Karatzas, Dimitrios Stamoulis, Iraklis Anagnostopoulos
Abstract: Modern edge data centers simultaneously handle multiple Deep Neural Networks (DNNs), leading to significant challenges in workload management. Thus, current management systems must leverage the architectural heterogeneity of new embedded systems to efficiently handle multi-DNN workloads. This paper introduces RankMap, a priority-aware manager specifically designed for multi-DNN tasks on heterogeneous embedded devices. RankMap addresses the extensive solution space of multi-DNN mapping through stochastic space exploration combined with a performance estimator. Experimental results show that RankMap achieves 3.6x higher average throughput compared to existing methods, while preventing DNN starvation under heavy workloads and improving the prioritization of specified DNNs by 57.5x.
Authors: Soheila Sadeghi
Abstract: Accurate forecasting of project performance metrics is crucial for successfully managing and delivering urban road reconstruction projects. Traditional methods often rely on static baseline plans and fail to consider the dynamic nature of project progress and external factors. This research proposes a machine learning-based approach to forecast project performance metrics, such as cost variance and earned value, for each Work Breakdown Structure (WBS) category in an urban road reconstruction project. The proposed model utilizes time series forecasting techniques, including Autoregressive Integrated Moving Average (ARIMA) and Long Short-Term Memory (LSTM) networks, to predict future performance based on historical data and project progress. The model also incorporates external factors, such as weather patterns and resource availability, as features to enhance the accuracy of forecasts. By harnessing the predictive power of machine learning, the performance forecasting model enables proactive identification of potential deviations from the baseline plan, which allows project managers to take timely corrective actions. The research aims to validate the effectiveness of the proposed approach using a case study of an urban road reconstruction project, comparing the model's forecasts with actual project performance data. The findings of this research contribute to the advancement of project management practices in the construction industry, offering a data-driven solution for improving project performance monitoring and control.
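For the ARIMA component, a minimal statsmodels sketch is shown below; the WBS-level series is synthetic, since the paper's project data are not public, and the order (1, 1, 1) is an illustrative choice.

```python
import numpy as np
from statsmodels.tsa.arima.model import ARIMA

rng = np.random.default_rng(1)
cost_variance = np.cumsum(rng.normal(0, 1.0, 48))  # 48 periods, one WBS item

model = ARIMA(cost_variance, order=(1, 1, 1))      # (p, d, q)
fit = model.fit()
forecast = fit.forecast(steps=6)                   # next 6 reporting periods
print(forecast)
```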
Authors: Alan Oursland
Abstract: We present empirical evidence that neural networks with ReLU and Absolute Value activations learn distance-based representations. We independently manipulate both distance and intensity properties of internal activations in trained models, finding that both architectures are highly sensitive to small distance-based perturbations while maintaining robust performance under large intensity-based perturbations. These findings challenge the prevailing intensity-based interpretation of neural network activations and offer new insights into their learning and decision-making processes.
Authors: Shu Wan, Reepal Shah, Qi Deng, John Sabo, Huan Liu, K. Selçuk
Abstract: Streamflow plays an essential role in the sustainable planning and management of national water resources. Traditional hydrologic modeling approaches simulate streamflow by establishing connections across multiple physical processes, such as rainfall and runoff. These data, inherently connected both spatially and temporally, possess intrinsic causal relations that can be leveraged for robust and accurate forecasting. Recently, spatio-temporal graph neural networks (STGNNs) have been adopted, excelling in various domains, such as urban traffic management, weather forecasting, and pandemic control, and they also promise advances in streamflow management. However, learning causal relationships directly from vast observational data is theoretically and computationally challenging. In this study, we employ a river flow graph as prior knowledge to facilitate the learning of the causal structure and then use the learned causal graph to predict streamflow at targeted sites. The proposed model, Causal Streamflow Forecasting (CSF), is tested in a real-world study in the Brazos River basin in Texas. Our results demonstrate that our method outperforms regular spatio-temporal graph neural networks and achieves higher computational efficiency compared to traditional simulation methods. By effectively integrating river flow graphs with STGNNs, this research offers a novel approach to streamflow prediction, showcasing the potential of combining advanced neural network techniques with domain-specific knowledge for enhanced performance in hydrologic modeling.
Authors: Yuanyuan Qi, Jueqing Lu, Xiaohao Yang, Joanne Enticott, Lan Du
Abstract: The primary challenge of multi-label active learning, distinguishing it from multi-class active learning, lies in assessing the informativeness of an indefinite number of labels while also accounting for the inherent label correlations. Existing studies either require substantial computational resources to leverage correlations or fail to fully explore label dependencies. Additionally, real-world scenarios often require addressing intrinsic biases stemming from imbalanced data distributions. In this paper, we propose a new multi-label active learning strategy to address both challenges. Our method incorporates progressively updated positive and negative correlation matrices to capture co-occurrence and disjoint relationships within the label space of annotated samples, enabling a holistic assessment of uncertainty rather than treating labels as isolated elements. Furthermore, alongside diversity, our model employs ensemble pseudo labeling and beta scoring rules to address data imbalances. Extensive experiments on four realistic datasets demonstrate that our strategy consistently achieves more reliable and superior performance compared to several established methods.
Authors: Tian Ye, Rajgopal Kannan, Viktor Prasanna
Abstract: Adversarial training has emerged as an effective approach to train robust neural network models that are resistant to adversarial attacks, even in low-label regimes where labeled data is scarce. In this paper, we introduce a novel semi-supervised adversarial training approach that enhances both robustness and natural accuracy by generating effective adversarial examples. Our method begins by applying linear interpolation between clean and adversarial examples to create interpolated adversarial examples that cross decision boundaries by a controlled margin. This sample-aware strategy tailors adversarial examples to the characteristics of each data point, enabling the model to learn from the most informative perturbations. Additionally, we propose a global epsilon scheduling strategy that progressively adjusts the upper bound of perturbation strengths during training. The combination of these strategies allows the model to develop increasingly complex decision boundaries with better robustness and natural accuracy. Empirical evaluations show that our approach effectively enhances performance against various adversarial attacks, such as PGD and AutoAttack.
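The two ingredients named above reduce to a few lines each. The sketch below is our schematic; the parameter names and the linear schedule are illustrative choices, not the paper's exact settings.

```python
import numpy as np

def interpolate_adversarial(x_clean, x_adv, lam):
    """lam in [0, 1]: 0 -> clean sample, 1 -> full adversarial sample,
    so lam controls the margin by which the example crosses the boundary."""
    return (1.0 - lam) * x_clean + lam * x_adv

def epsilon_schedule(step, total_steps, eps_min=1 / 255, eps_max=8 / 255):
    """Global schedule: linearly grow the perturbation upper bound."""
    t = min(step / max(total_steps, 1), 1.0)
    return eps_min + t * (eps_max - eps_min)
```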
Authors: Xiaojie Yu, Haibo Zhang, Lizhi Peng, Fengyang Sun, Jeremiah Deng
Abstract: ReduNet is a deep neural network model that leverages the principle of maximal coding rate \textbf{redu}ction to transform original data samples into a low-dimensional, linear discriminative feature representation. Unlike traditional deep learning frameworks, ReduNet constructs its parameters explicitly layer by layer, with each layer's parameters derived based on the features transformed from the preceding layer. Rather than directly using labels, ReduNet uses the similarity between each category's spanned subspace and the data samples for feature updates at each layer. This may lead to features being updated in the wrong direction, impairing the correct construction of network parameters and reducing the network's convergence speed. To address this issue, based on the geometric interpretation of the network parameters, this paper presents ESS-ReduNet to enhance the separability of each category's subspace by dynamically controlling the expansion of the overall spanned space of the samples. Meanwhile, label knowledge is incorporated with Bayesian inference to encourage the decoupling of subspaces. Finally, stability, as assessed by the condition number, serves as an auxiliary criterion for halting training. Experiments on the ESR, HAR, Covertype, and Gas datasets demonstrate that ESS-ReduNet achieves more than 10x improvement in convergence compared to ReduNet. Notably, on the ESR dataset, the features transformed by ESS-ReduNet achieve a 47\% improvement in SVM classification accuracy.
Authors: Shuli Jiang, Qiuyi (Richard) Zhang, Gauri Joshi
Abstract: We study a classical problem in private prediction: computing an $(m\epsilon, \delta)$-differentially private majority of $K$ $(\epsilon, \Delta)$-differentially private algorithms for $1 \leq m \leq K$ and $1 > \delta \geq \Delta \geq 0$. Standard methods such as subsampling or randomized response are widely used, but do they provide optimal privacy-utility tradeoffs? To answer this, we introduce the Data-dependent Randomized Response Majority (DaRRM) algorithm. It is parameterized by a data-dependent noise function $\gamma$, and enables efficient utility optimization over the class of all private algorithms, encompassing those standard methods. We show that the utility of an $(m\epsilon, \delta)$-private majority algorithm can be maximized tractably by solving an optimization problem for any $m \leq K$, via a novel structural result that reduces the infinitely many privacy constraints to a polynomial set. In some settings, we show that DaRRM provably enjoys a privacy gain of a factor of 2 over common baselines, with fixed utility. Lastly, we demonstrate the strong empirical effectiveness of our first-of-its-kind privacy-constrained utility optimization for ensembling labels for private prediction from private teachers in image classification. Notably, our DaRRM framework with an optimized $\gamma$ exhibits substantial utility gains when compared against several baselines.
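As we read it, the DaRRM template outputs the true majority with a data-dependent probability $\gamma(\cdot)$ and a uniformly random answer otherwise; a constant stands in for the optimized $\gamma$ in this sketch of ours.

```python
import numpy as np

def darrm_majority(votes, gamma_fn, rng=None):
    """votes: K private binary outputs; gamma_fn: data-dependent noise
    function (placeholder here; the paper optimizes it)."""
    rng = np.random.default_rng() if rng is None else rng
    votes = np.asarray(votes)
    frac_ones = votes.mean()               # data-dependent statistic
    majority = int(frac_ones >= 0.5)
    if rng.random() < gamma_fn(frac_ones):
        return majority                    # report the true majority
    return int(rng.random() < 0.5)         # randomized-response fallback

out = darrm_majority([1, 0, 1, 1, 0], gamma_fn=lambda p: 0.7)
```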
Authors: Xiaoxuan Li, Yao Liu, Ruoyu Wang, Lina Yao
Abstract: As the significance of understanding the cause-and-effect relationships among variables increases in the development of modern systems and algorithms, learning causality from observational data has become a preferred and efficient approach over conducting randomized control trials. However, purely observational data may be insufficient to reconstruct the true causal graph. Consequently, many researchers have tried to utilise some form of prior knowledge to improve the causal discovery process. In this context, the impressive capabilities of large language models (LLMs) have emerged as a promising alternative to the costly acquisition of prior expert knowledge. In this work, we further explore the potential of using LLMs to enhance causal discovery approaches, particularly focusing on score-based methods, and we propose a general framework to utilise the capacity of not only one but multiple LLMs to augment the discovery process.
Authors: Mingsen Du, Yanxuan Wei, Xiangwei Zheng, Cun Ji
Abstract: Recently, time series classification has attracted the attention of a large number of researchers, and hundreds of methods have been proposed. However, these methods often ignore the spatial correlations among dimensions and the local correlations among features. To address this issue, the causal and local correlations based network (CaLoNet) is proposed in this study for multivariate time series classification. First, pairwise spatial correlations between dimensions are modeled using causality modeling to obtain the graph structure. Then, a relationship extraction network is used to fuse local correlations to obtain long-term dependency features. Finally, the graph structure and long-term dependency features are integrated into the graph neural network. Experiments on the UEA datasets show that CaLoNet can obtain competitive performance compared with state-of-the-art methods.
Authors: Zan Ahmad, Shiyi Chen, Minglang Yin, Avisha Kumar, Nicolas Charon, Natalia Trayanova, Mauro Maggioni
Abstract: A computed approximation of the solution operator to a system of partial differential equations (PDEs) is needed in various areas of science and engineering. Neural operators have been shown to be quite effective at predicting these solution generators after training on high-fidelity ground truth data (e.g., numerical simulations). However, in order to generalize well to unseen spatial domains, neural operators must be trained on an extensive amount of geometrically varying data samples that may not be feasible to acquire or simulate in certain contexts (e.g., patient-specific medical data or large-scale, computationally intensive simulations). We propose that, in order to learn a PDE solution operator that generalizes across multiple domains without needing to sample data expressive enough to cover all possible geometries, we can instead train a latent neural operator on just a few ground truth solution fields diffeomorphically mapped from different geometric/spatial domains to a fixed reference configuration. Furthermore, the form of the solutions depends on the choice of mapping to and from the reference domain. We emphasize that preserving properties of the differential operator when constructing these mappings can significantly reduce the data requirement for achieving an accurate model, due to the regularity of the solution fields that the latent neural operator is trained on. We provide motivating numerical experiments that demonstrate an extreme case of this consideration by exploiting the conformal invariance of the Laplacian.
Authors: Mingsen Du, Meng Chen, Yongjian Li, Cun Ji, Shoushui Wei
Abstract: Multivariate time series (MTS) classification is widely applied in fields such as industry, healthcare, and finance, aiming to extract key features from complex time series data for accurate decision-making and prediction. However, existing methods for MTS often struggle due to the challenges of effectively modeling high-dimensional data and the lack of labeled data, resulting in poor classification performance. To address this issue, we propose a method based on heterogeneous relationships of subjects and shapelets for semi-supervised MTS classification. This method offers a novel perspective by integrating various types of additional information while capturing the relationships between them. Specifically, we first utilize a contrast temporal self-attention module to obtain sparse MTS representations, and then model the similarities between these representations using soft dynamic time warping to construct a similarity graph. Secondly, we learn the shapelets for different subject types, incorporating both the subject features and their shapelets as additional information to further refine the similarity graph, ultimately generating a heterogeneous graph. Finally, we use a dual-level graph attention network to obtain predictions. Through this method, we successfully transform the dataset into a heterogeneous graph that integrates multiple types of additional information, achieving precise semi-supervised node classification. Experiments on the Human Activity Recognition, sleep stage classification, and University of East Anglia datasets demonstrate that our method outperforms current state-of-the-art methods in MTS classification tasks, validating its superiority.
Authors: Anmol Dwivedi, Ali Tajer, Santiago Paternain, Nurali Virani
Abstract: The electricity grid's resiliency and climate change strongly impact one another, due to an array of technical and policy-related decisions that affect both. This paper introduces a physics-informed machine learning-based framework to enhance the grid's resiliency. Specifically, when encountering disruptive events, this paper designs remedial control actions to prevent blackouts. The proposed Physics-Guided Reinforcement Learning (PG-RL) framework determines effective real-time remedial line-switching actions, considering their impact on power balance, system security, and grid reliability. To identify an effective blackout mitigation policy, PG-RL leverages power-flow sensitivity factors to guide the RL exploration during agent training. Comprehensive evaluations using the Grid2Op platform demonstrate that incorporating physical signals into RL significantly improves resource utilization within electric grids and achieves better blackout mitigation policies - both of which are critical in addressing climate change.
Authors: Yi Ren, Ruge Xu, Xinfei Guo, Weikang Qian
Abstract: A widely used technique in designing energy-efficient deep neural network (DNN) accelerators is quantization. Recent progress in this direction has reduced the bitwidths used in DNNs down to 2. Meanwhile, many prior works apply approximate multipliers (AppMuls) in designing DNN accelerators to lower their energy consumption. Unfortunately, these works still assume a bitwidth much larger than 2, which falls far behind the state of the art in the quantization area and even challenges the meaningfulness of applying AppMuls in DNN accelerators, since a high-bitwidth AppMul consumes much more energy than a low-bitwidth exact multiplier! Thus, an important problem to study is: can approximate multipliers be effectively applied to quantized DNN models with very low bitwidths? In this work, we give an affirmative answer to this question and present a systematic solution that achieves it: FAMES, a fast approximate multiplier substitution method for mixed-precision DNNs. Our experiments demonstrate an average 28.67% energy reduction on state-of-the-art mixed-precision quantized models with bitwidths as low as 2 bits and accuracy losses kept under 1%. Additionally, our approach is up to 300x faster than previous genetic algorithm-based methods.
Authors: Rahul Pandey, Ziwei Zhu, Hemant Purohit
Abstract: Effective labeled data collection plays a critical role in developing and fine-tuning robust streaming analytics systems. However, continuously labeling documents to filter relevant information poses significant challenges, such as a limited labeling budget or a lack of high-quality labels. There is a need for efficient human-in-the-loop machine learning (HITL-ML) designs to improve streaming analytics systems. One particular HITL-ML approach is online active learning, which involves iteratively selecting a small set of the most informative documents for labeling to enhance the ML model's performance. The performance of such algorithms can be affected by human errors in labeling. To address these challenges, we propose ORIS, a method to perform Online active learning using Reinforcement learning-based Inclusive Sampling of documents for labeling. ORIS develops a novel Deep Q-Network-based strategy to sample incoming documents in a way that minimizes human errors in labeling and enhances the ML model's performance. We evaluate the ORIS method on emotion recognition tasks, where it outperforms traditional baselines in terms of both human labeling performance and ML model performance.
Authors: Temitope Adeyeha, Chetraj Pandey, Berkay Aydin
Abstract: Accurate and reliable predictions of solar flares are essential due to their potentially significant impact on Earth and space-based infrastructure. Although deep learning models have shown notable predictive capabilities in this domain, current evaluations often focus on accuracy while neglecting interpretability and reliability--factors that are especially critical in operational settings. To address this gap, we propose a novel proximity-based framework for analyzing post hoc explanations to assess the interpretability of deep learning models for solar flare prediction. Our study compares two models trained on full-disk line-of-sight (LoS) magnetogram images to predict $\geq$M-class solar flares within a 24-hour window. We employ the Guided Gradient-weighted Class Activation Mapping (Guided Grad-CAM) method to generate attribution maps from these models, which we then analyze to gain insights into their decision-making processes. To support the evaluation of explanations in operational systems, we introduce a proximity-based metric that quantitatively assesses the accuracy and relevance of local explanations when regions of interest are known. Our findings indicate that the models' predictions align with active region characteristics to varying degrees, offering valuable insights into their behavior. This framework enhances the evaluation of model interpretability in solar flare forecasting and supports the development of more transparent and reliable operational systems.
Authors: Shuhei Watanabe
Abstract: Expected Improvement (EI) is arguably the most widely used acquisition function in Bayesian optimization. However, it is often challenging to enhance performance with EI due to its sensitivity to numerical precision. Hutter et al. (2009) tackled this problem by using a Gaussian process (GP) trained on the log-transformed objective function, and reported that this trick improves the predictive accuracy of the GP, leading to substantially better performance. Although Hutter et al. (2009) offered the closed form of their EI, its intermediate derivation has not been provided so far. In this paper, we give a friendly derivation of their proposition.
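For orientation, the quantity being derived follows from the standard partial expectation of a lognormal variable (our restatement, not necessarily the paper's notation): with GP posterior mean $\mu(x)$ and standard deviation $\sigma(x)$ on $\log f$, incumbent best value $f_{\min}$, and $z = (\ln f_{\min} - \mu(x))/\sigma(x)$,

$$\mathrm{EI}_{\log}(x) = \mathbb{E}\left[\max\left(0,\, f_{\min} - e^{g(x)}\right)\right] = f_{\min}\,\Phi(z) - \exp\left(\mu(x) + \tfrac{\sigma^2(x)}{2}\right)\Phi(z - \sigma(x)),$$

where $g(x)$ denotes the GP posterior over the log objective and $\Phi$ is the standard normal CDF.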
Authors: Wanxue Dong, Maria De-arteaga, Maytal Saar-Tsechansky
Abstract: Biased human decisions have consequential impacts across various domains, yielding unfair treatment of individuals and resulting in suboptimal outcomes for organizations and society. In recognition of this fact, organizations regularly design and deploy interventions aimed at mitigating these biases. However, measuring human decision biases remains an important but elusive task. Organizations are frequently concerned with mistaken decisions disproportionately affecting one group. In practice, however, this is typically not possible to assess due to the scarcity of a gold standard: a label that indicates what the correct decision would have been. In this work, we propose a machine learning-based framework to assess bias in human-generated decisions when gold standard labels are scarce. We provide theoretical guarantees and empirical evidence demonstrating the superiority of our method over existing alternatives. This proposed methodology establishes a foundation for transparency in human decision-making, carrying substantial implications for managerial duties, and offering potential for alleviating algorithmic biases when human decisions are used as labels to train algorithms.
Authors: Emma Reyner-Fuentes, Esther Rituerto-Gonzalez, Carmen Pelaez-Moreno
Abstract: This study addresses the critical issue of gender-based violence's (GBV) impact on women's mental health. GBV, encompassing physical and sexual aggression, often results in long-lasting adverse effects for the victims, including anxiety, depression, post-traumatic stress disorder (PTSD), and substance abuse. Artificial Intelligence (AI)-based speech technologies have proven valuable for mental health assessments. However, these technologies experience performance challenges when confronted with speakers whose data has not been used for training. Our research presents a novel approach to speaker-agnostic detection of the gender-based violence victim condition (GBVVC), focusing on the development of robust AI models capable of generalizing across diverse speakers. Leveraging advanced deep learning models and domain-adversarial training techniques, we minimize the influence of speaker identity, achieving a 26.95% relative reduction in speaker identification ability while improving GBVVC detection accuracy by a relative 6.37%. This shows that models can focus on discriminative paralinguistic biomarkers that enhance GBVVC prediction and reduce the impact of subject-specific traits. Additionally, our model's predictions moderately correlate with pre-clinical PTSD symptoms, emphasizing the link between GBV and mental health. This work paves the way for AI-powered tools to aid mental health professionals in addressing this societal issue, offering a promising baseline for further research.
Authors: Dimitris Michailidis, Willem Röpke, Diederik M. Roijers, Sennay Ghebreab, Fernando P. Santos
Abstract: Multi-Objective Reinforcement Learning (MORL) aims to learn a set of policies that optimize trade-offs between multiple, often conflicting objectives. MORL is computationally more complex than single-objective RL, particularly as the number of objectives increases. Additionally, when objectives involve the preferences of agents or groups, ensuring fairness is socially desirable. This paper introduces a principled algorithm that incorporates fairness into MORL while improving scalability to many-objective problems. We propose using Lorenz dominance to identify policies with equitable reward distributions and introduce $\lambda$-Lorenz dominance to enable flexible fairness preferences. We release a new, large-scale real-world transport planning environment and demonstrate that our method encourages the discovery of fair policies, showing improved scalability in two large cities (Xi'an and Amsterdam). Our methods outperform common multi-objective approaches, particularly in high-dimensional objective spaces.
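Lorenz dominance itself is easy to operationalize: sort each reward vector ascending, take cumulative sums (the Lorenz curve), and require one curve to weakly dominate the other pointwise with at least one strict inequality. A small check of ours:

```python
import numpy as np

def lorenz_dominates(r1, r2):
    """True if reward vector r1 Lorenz-dominates r2."""
    c1 = np.cumsum(np.sort(np.asarray(r1, float)))  # Lorenz curve of r1
    c2 = np.cumsum(np.sort(np.asarray(r2, float)))  # Lorenz curve of r2
    return bool(np.all(c1 >= c2) and np.any(c1 > c2))

print(lorenz_dominates([3, 3, 3], [1, 3, 5]))  # True: same total, more equal
```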
Authors: Milin Zhang, Mohammad Abdi, Venkat R. Dasari, Francesco Restuccia
Abstract: Semantic Edge Computing (SEC) and Semantic Communications (SemComs) have been proposed as viable approaches to achieve real-time edge-enabled intelligence in sixth-generation (6G) wireless networks. On the one hand, SemCom leverages the strength of Deep Neural Networks (DNNs) to encode and communicate only the semantic information, while making it robust to channel distortions by compensating for wireless effects, ultimately improving communication efficiency. On the other hand, SEC leverages distributed DNNs to divide the computation of a DNN across different devices based on their computational and networking constraints. Although significant progress has been made in both fields, the literature lacks a systematic view connecting them. In this work, we fill this gap by unifying the SEC and SemCom fields. We summarize the research problems in these two fields and provide a comprehensive review of the state of the art, with a focus on their technical strengths and challenges.
Authors: Jie-Jing Shao, Hao-Ran Hao, Xiao-Wen Yang, Yu-Feng Li
Abstract: Recent learning-to-imitation methods have shown promising results in planning via imitating within the observation-action space. However, their ability in open environments remains constrained, particularly in long-horizon tasks. In contrast, traditional symbolic planning excels in long-horizon tasks through logical reasoning over human-defined symbolic spaces but struggles to handle observations beyond symbolic states, such as high-dimensional visual inputs encountered in real-world scenarios. In this work, we draw inspiration from abductive learning and introduce a novel framework \textbf{AB}ductive \textbf{I}mitation \textbf{L}earning (ABIL) that integrates the benefits of data-driven learning and symbolic-based reasoning, enabling long-horizon planning. Specifically, we employ abductive reasoning to understand the demonstrations in symbolic space and design the principles of sequential consistency to resolve the conflicts between perception and reasoning. ABIL generates predicate candidates to facilitate the perception from raw observations to symbolic space without laborious predicate annotations, providing a groundwork for symbolic planning. With the symbolic understanding, we further develop a policy ensemble whose base policies are built with different logical objectives and managed through symbolic reasoning. Experiments show that our proposal successfully grounds observations in task-relevant symbols to assist imitation learning. Importantly, ABIL demonstrates significantly improved data efficiency and generalization across various long-horizon tasks, highlighting it as a promising solution for long-horizon planning. Project website: \url{https://www.lamda.nju.edu.cn/shaojj/KDD25_ABIL/}.
Authors: Aladin Djuhera, Vlad C. Andrei, Amin Seffo, Holger Boche, Walid Saad
Abstract: Path planning is a complex problem for many practical applications, particularly in robotics. Existing algorithms, however, are exhaustive in nature and become increasingly complex when additional side constraints are incorporated alongside distance minimization. In this paper, a novel approach using vision language models (VLMs) is proposed for enabling path planning in complex wireless-aware environments. To this end, insights from a digital twin (DT) with real-world wireless ray tracing data are explored in order to guarantee an average path gain threshold while minimizing the trajectory length. First, traditional approaches such as A* are compared to several wireless-aware extensions, and an optimal iterative dynamic programming approach (DP-WA*) is derived, which fully takes into account all path gains and distance metrics within the DT. On the basis of these baselines, the role of VLMs as an alternative assistant for path planning is investigated, and a strategic chain-of-thought tasking (SCoTT) approach is proposed. SCoTT divides the complex planning task into several subproblems and solves each with advanced CoT prompting. Results show that SCoTT achieves very close average path gains compared to DP-WA* while at the same time yielding consistently shorter path lengths. The results also show that VLMs can be used to accelerate DP-WA* by efficiently reducing the algorithm's search space and thus saving up to 62\% in execution time. This work underscores the potential of VLMs in future digital systems as capable assistants for solving complex tasks, while enhancing user interaction and accelerating rapid prototyping under diverse wireless constraints.
Authors: Abhay Kumar Pathak, Mrityunjay Chaubey, Manjari Gupta
Abstract: Cardiovascular disease refers to any critical condition that impacts the heart. Because heart diseases can be life-threatening, researchers are focusing on designing smart systems that accurately diagnose them from electronic health data with the aid of machine learning algorithms. Heart disease classification using machine learning (ML) algorithms such as Support Vector Machines (SVM), Naïve Bayes (NB), Decision Trees (DTs) and Random Forests (RFs) is often hindered by overfitting, and these ML algorithms need extensive hyperparameter tuning. Random Search offers a faster and more efficient exploration of the hyperparameter space, but it may overlook optimal regions. Grid Search, though exhaustive, is computationally expensive and inefficient, particularly with high-dimensional data. To address these limitations, we propose Randomized-Grid Search, a novel hybrid optimization method that combines the global exploration strengths of Random Search with the focused, exhaustive search of Grid Search in the most promising regions. This hybrid approach efficiently balances exploration and exploitation. The proposed method optimizes the hyperparameters of a Decision Tree model and is applied to the UCI heart disease dataset for classification. It enhances model performance, providing improved accuracy, generalization, and computational efficiency. Experimental results demonstrate that Randomized-Grid Search outperforms traditional methods by significant margins, offering a more effective solution for machine learning applications in healthcare diagnosis.
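A plausible rendering of the hybrid in scikit-learn (our sketch, not the authors' code, with a stand-in dataset and invented search ranges): a coarse RandomizedSearchCV pass locates a promising region, then a focused GridSearchCV refines around the best configuration.

```python
from sklearn.datasets import load_breast_cancer  # stand-in for the UCI heart data
from sklearn.model_selection import GridSearchCV, RandomizedSearchCV
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)
tree = DecisionTreeClassifier(random_state=0)

# Phase 1: coarse random exploration of the full hyperparameter space.
rand = RandomizedSearchCV(
    tree,
    {"max_depth": list(range(1, 30)),
     "min_samples_split": list(range(2, 50))},
    n_iter=20, cv=5, random_state=0,
).fit(X, y)

# Phase 2: exhaustive grid refinement around the best random draw.
d = rand.best_params_["max_depth"]
s = rand.best_params_["min_samples_split"]
grid = GridSearchCV(
    tree,
    {"max_depth": [max(1, d - 2), d, d + 2],
     "min_samples_split": [max(2, s - 5), s, s + 5]},
    cv=5,
).fit(X, y)
print(grid.best_params_, grid.best_score_)
```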
Authors: Zhouxing Shi, Cho-Jui Hsieh, Huan Zhang
Abstract: We study the problem of learning Lyapunov-stable neural controllers which provably satisfy the Lyapunov asymptotic stability condition within a region-of-attraction. Compared to previous works which commonly used counterexample guided training on this task, we develop a new and generally formulated certified training framework named CT-BaB, and we optimize for differentiable verified bounds, to produce verification-friendly models. In order to handle the relatively large region-of-interest, we propose a novel framework of training-time branch-and-bound to dynamically maintain a training dataset of subregions throughout training, such that the hardest subregions are iteratively split into smaller ones whose verified bounds can be computed more tightly to ease the training. We demonstrate that our new training framework can produce models which can be more efficiently verified at test time. On the largest 2D quadrotor dynamical system, verification for our model is more than 5X faster compared to the baseline, while our size of region-of-attraction is 16X larger than the baseline.
Authors: Da Chang, Deliang Wang, Xiao Yang
Abstract: Weight initialization significantly impacts the convergence and performance of neural networks. While traditional methods like Xavier and Kaiming initialization are widely used, they often fall short for spiking neural networks (SNNs), which have distinct requirements compared to artificial neural networks (ANNs). To address this, we introduce \textbf{IKUN}, a variance-stabilizing initialization method integrated with surrogate gradient functions, specifically designed for SNNs. \textbf{IKUN} stabilizes signal propagation, accelerates convergence, and enhances generalization. Experiments show \textbf{IKUN} improves training efficiency by up to \textbf{50\%}, achieving \textbf{95\%} training accuracy and \textbf{91\%} generalization accuracy. Hessian analysis reveals that \textbf{IKUN}-trained models converge to flatter minima, characterized by Hessian eigenvalues near zero on the positive side, promoting better generalization. The method is open-sourced for further exploration: \href{https://github.com/MaeChd/SurrogateVarStabe}{https://github.com/MaeChd/SurrogateVarStabe}.
URLs: https://github.com/MaeChd/SurrogateVarStabe
Authors: Melda Yeghaian, Zuhir Bodalal, Daan van den Broek, John B A G Haanen, Regina G H Beets-Tan, Stefano Trebeschi, Marcel A J van Gerven
Abstract: Purpose: Analyzing noninvasive longitudinal and multimodal data using artificial intelligence could potentially transform immunotherapy for cancer patients, paving the way towards precision medicine. Methods: In this study, we integrated pre- and on-treatment blood measurements, prescribed medications, and CT-based volumes of organs from a large pan-cancer cohort of 694 patients treated with immunotherapy to predict short- and long-term overall survival. By leveraging a combination of recent developments, different variants of our extended multimodal transformer-based simple temporal attention (MMTSimTA) network were trained end-to-end to predict mortality at three, six, nine and twelve months. These models were also compared to baseline methods based on intermediate and late fusion integration. Results: The strongest prognostic performance was demonstrated by the extended transformer-based multimodal model, with areas under the curve (AUCs) of $0.84 \pm 0.04$, $0.83 \pm 0.02$, $0.82 \pm 0.02$, and $0.81 \pm 0.03$ for 3-, 6-, 9-, and 12-month survival prediction, respectively. Conclusion: Our findings suggest that analyzing integrated early treatment data has potential for predicting the survival of immunotherapy patients. Integrating complementary noninvasive modalities into a jointly trained model, using our extended transformer-based architecture, demonstrated improved multimodal prognostic performance, especially in short-term survival prediction.
Authors: Marius Tacke, Matthias Busch, Kevin Linka, Christian J. Cyron, Roland C. Aydin
Abstract: Datasets often incorporate various functional patterns related to different aspects or regimes, which are typically not equally present throughout the dataset. We propose a novel, general-purpose partitioning algorithm that utilizes competition between models to detect and separate these functional patterns. This competition is induced by multiple models iteratively submitting their predictions for the dataset, with the best prediction for each data point being rewarded with training on that data point. This reward mechanism amplifies each model's strengths and encourages specialization in different patterns. The specializations can then be translated into a partitioning scheme. The amplification of each model's strengths inverts the active learning paradigm: while active learning typically focuses the training of models on their weaknesses to minimize the number of required training data points, our concept reinforces the strengths of each model, thus specializing them. We validate our concept -- called active partitioning -- with various datasets with clearly distinct functional patterns, such as mechanical stress and strain data in a porous structure. The active partitioning algorithm produces valuable insights into the datasets' structure, which can serve various further applications. As a demonstration of one exemplary usage, we set up modular models consisting of multiple expert models, each learning a single partition, and compare their performance on more than twenty popular regression problems with single models learning all partitions simultaneously. Our results show significant improvements, with up to 54% loss reduction, confirming our partitioning algorithm's utility.
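A compact sketch of the competition loop (ours; the model class, round counts, and winner-takes-the-point reward are illustrative choices, not the paper's configuration):

```python
import numpy as np
from sklearn.linear_model import SGDRegressor

def active_partition(X, y, n_models=2, rounds=20, seed=0):
    rng = np.random.default_rng(seed)
    models = []
    for i in range(n_models):                # warm-start on random subsets
        idx = rng.choice(len(X), size=10, replace=False)
        models.append(SGDRegressor(random_state=i).fit(X[idx], y[idx]))
    for _ in range(rounds):
        errors = np.stack([(m.predict(X) - y) ** 2 for m in models])
        winners = errors.argmin(axis=0)      # best model per data point
        for i, m in enumerate(models):       # reward: train on points won
            if np.any(winners == i):
                m.partial_fit(X[winners == i], y[winners == i])
    return winners                           # partition labels per point

X = np.random.randn(300, 2)
y = np.where(X[:, 0] > 0, 2 * X[:, 1], -2 * X[:, 1])  # two regimes
print(np.bincount(active_partition(X, y)))
```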
Authors: L. Klochko, M. d'Aquin, A. Togo, L. Chaput
Abstract: Machine learning promises to accelerate material discovery by enabling high-throughput prediction of desirable macro-properties from atomic-level descriptors or structures. However, the limited data available about precise values of these properties has been a barrier, leading to predictive models with limited precision or generalizability. This is particularly true of lattice thermal conductivity (LTC): existing datasets of precise (ab initio, DFT-based) computed values are limited to a few dozen materials with little variability. Based on such datasets, we study the impact of transfer learning on both the precision and generalizability of a deep learning model (ParAIsite). We start from an existing model (MEGNet~\cite{Chen2019}) and show that improvements are obtained by fine-tuning a pre-trained version on different tasks. Interestingly, we also show that a much greater improvement is obtained when first fine-tuning it on a large dataset of low-quality approximations of LTC (based on the AGL model) and then applying a second phase of fine-tuning with our high-quality, smaller-scale datasets. The promising results obtained pave the way not only towards a greater ability to explore large databases in search of low thermal conductivity materials but also to methods enabling increasingly precise predictions in areas where quality data are rare.
Authors: Mohit Apte, Ketan Kale, Pranav Datar, Pratiksha Deshmukh
Abstract: This paper explores the application of a reinforcement learning (RL) framework using the Q-Learning algorithm to enhance dynamic pricing strategies in the retail sector. Unlike traditional pricing methods, which often rely on static demand models, our RL approach continuously adapts to evolving market dynamics, offering a more flexible and responsive pricing strategy. By creating a simulated retail environment, we demonstrate how RL effectively addresses real-time changes in consumer behavior and market conditions, leading to improved revenue outcomes. Our results illustrate that the RL model not only surpasses traditional methods in terms of revenue generation but also provides insights into the complex interplay of price elasticity and consumer demand. This research underlines the significant potential of applying artificial intelligence in economic decision-making, paving the way for more sophisticated, data-driven pricing models in various commercial domains.
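A toy tabular Q-learning loop in this spirit; the linear demand model, state/price grids, and hyperparameters are illustrative assumptions, not the paper's environment:

    import numpy as np

    prices = np.linspace(5, 15, 11)          # candidate price actions
    n_states, n_actions = 10, len(prices)    # states: e.g., discretized demand level
    Q = np.zeros((n_states, n_actions))
    alpha, gamma, eps = 0.1, 0.95, 0.1
    rng = np.random.default_rng(0)

    def step(state, price):
        # Assumed price-elastic demand with noise; revenue as reward.
        demand = max(0.0, 20 - 1.5 * price + rng.normal(0, 1))
        reward = price * demand
        next_state = min(n_states - 1, int(demand) // 2)
        return next_state, reward

    state = 0
    for t in range(10000):
        a = rng.integers(n_actions) if rng.random() < eps else int(Q[state].argmax())
        next_state, r = step(state, prices[a])
        Q[state, a] += alpha * (r + gamma * Q[next_state].max() - Q[state, a])
        state = next_state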
Authors: Shun Hu
Abstract: In the era of 5G communication, the knowledge of channel state information (CSI) is crucial for enhancing network performance. This paper explores the utilization of language models for spatial CSI prediction within MIMO-OFDM systems. We begin by outlining the significance of accurate CSI in enabling advanced functionalities such as adaptive modulation. We review existing methodologies for CSI estimation, emphasizing the shift from traditional to data-driven approaches. We then propose a novel framework for spatial CSI prediction using realistic environment information, and experimental results demonstrate its effectiveness. This research paves the way for innovative strategies in managing wireless networks.
Authors: Xinyu Su, Feng Liu, Yanchuan Chang, Egemen Tanin, Majid Sarvi, Jianzhong Qi
Abstract: Traffic forecasting is an important problem in the operation and optimisation of transportation systems. State-of-the-art solutions train machine learning models by minimising the mean forecasting errors on the training data. The trained models often favour periodic events instead of aperiodic ones in their prediction results, as periodic events often prevail in the training data. While offering critical optimisation opportunities, aperiodic events such as traffic incidents may be missed by the existing models. To address this issue, we propose DualCast -- a model framework to enhance the learning capability of traffic forecasting models, especially for aperiodic events. DualCast adopts a dual-branch architecture to disentangle traffic signals into two types, one reflecting intrinsic spatial-temporal patterns and the other reflecting external environment contexts, including aperiodic events. We further propose a cross-time attention mechanism to capture high-order spatial-temporal relationships from both periodic and aperiodic patterns. DualCast is versatile. We integrate it with recent traffic forecasting models, consistently reducing their forecasting errors by up to 9.6% on multiple real datasets.
Authors: Zechen Liu
Abstract: Personalized Federated Learning (PFL) allows clients to cooperatively train a personalized model without disclosing their private datasets. However, PFL suffers from non-IID data, heterogeneous devices, lack of fairness, and unclear contribution assessment, challenges that call for the interpretability of deep learning models. These challenges impose new demands on interpretability methods: low cost, privacy preservation, and detailed information. No existing interpretability method satisfies all three. In this paper, we propose a novel interpretability method, \emph{FreqX}, by introducing signal processing and information theory. Our experiments show that the explanation results of FreqX contain both attribution information and concept information. FreqX runs at least 10 times faster than the baselines that contain concept information.
Authors: Ryan Lucas, Rahul Mazumder
Abstract: We present SNOWS, a one-shot post-training pruning framework aimed at reducing the cost of vision network inference without retraining. Current leading one-shot pruning methods minimize the layer-wise least-squares reconstruction error, which does not take into account deeper network representations. We propose to optimize a more global reconstruction objective. This objective accounts for nonlinear activations deep in the network to obtain a better proxy for the network loss. This nonlinear objective leads to a more challenging optimization problem -- we demonstrate it can be solved efficiently using a specialized second-order optimization framework. A key innovation of our framework is the use of Hessian-free optimization to compute exact Newton descent steps without needing to compute or store the full Hessian matrix. A distinct advantage of SNOWS is that it can be readily applied on top of any sparse mask derived from prior methods, readjusting their weights to exploit nonlinearities in deep feature representations. SNOWS obtains state-of-the-art results on various one-shot pruning benchmarks including residual networks and Vision Transformers (ViT/B-16 and ViT/L-16, with 86M and 304M parameters, respectively).
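The Hessian-free ingredient is easy to sketch: exact Hessian-vector products via double backpropagation, plus a few conjugate-gradient iterations to approximate a damped Newton step. A didactic PyTorch version (the paper's specialized framework will differ):

    import torch

    def newton_step_cg(loss, params, iters=10, damping=1e-3):
        # Approximate (H + damping*I)^{-1} (-g) with conjugate gradients,
        # using exact Hessian-vector products; the Hessian is never stored.
        grads = torch.autograd.grad(loss, params, create_graph=True)
        g = torch.cat([x.reshape(-1) for x in grads])

        def hvp(v):
            Hv = torch.autograd.grad(g @ v, params, retain_graph=True)
            return torch.cat([x.reshape(-1) for x in Hv]) + damping * v

        d = torch.zeros_like(g)
        r = -g.detach().clone()
        p = r.clone()
        rs = r @ r
        for _ in range(iters):
            Ap = hvp(p)
            step = rs / (p @ Ap)
            d = d + step * p
            r = r - step * Ap
            rs_new = r @ r
            p = r + (rs_new / rs) * p
            rs = rs_new
        return d  # approximate damped Newton direction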
Authors: Shivam Pal, Aishwarya Gupta, Saqib Sarwar, Piyush Rai
Abstract: Federated Learning (FL) has emerged as a promising method to collaboratively learn from decentralized and heterogeneous data available at different clients without the requirement of data ever leaving the clients. Recent works on FL have advocated taking a Bayesian approach, as it offers a principled way to account for model and predictive uncertainty by learning a posterior distribution for the client and/or server models. Moreover, Bayesian FL also naturally enables personalization to handle data heterogeneity across the different clients by having each client learn its own distinct personalized model. In particular, the hierarchical Bayesian approach enables all the clients to learn their personalized models while also taking into account the commonalities via a prior distribution provided by the server. However, despite their promise, Bayesian approaches for FL can be computationally expensive and can have high communication costs as well because of the requirement of computing and sending the posterior distributions. We present a novel Bayesian FL method using an efficient second-order optimization approach, with a computational cost similar to first-order optimization methods like Adam, while providing the various benefits of the Bayesian approach for FL (e.g., uncertainty, personalization) and being significantly more efficient and accurate than SOTA Bayesian FL methods (in both standard and personalized FL settings). Our method achieves improved predictive accuracies and better uncertainty estimates compared to baselines that include both optimization-based and Bayesian FL methods.
Authors: Tina A. Dardeno, Lawrence A. Bull, Nikolaos Dervilis, Keith Worden
Abstract: Despite recent advances in population-based structural health monitoring (PBSHM), knowledge transfer between highly-disparate structures (i.e., heterogeneous populations) remains a challenge. It has been proposed that heterogeneous transfer may be accomplished via intermediate structures that bridge the gap in information between the structures of interest. A key aspect of the technique is the idea that by varying parameters such as material properties and geometry, one structure can be continuously morphed into another. The current work demonstrates the development of these interpolating structures, via case studies involving the parameterisation of (and transfer between) a simple, simulated 'bridge' and 'aeroplane'. The facetious question 'When is a bridge not an aeroplane?' has been previously asked in the context of predicting positive transfer based on structural similarity. While the obvious answer to this question is 'Always,' the current work demonstrates that in some cases positive transfer can be achieved between highly-disparate systems.
Authors: Ao Shen, Zhiyao Li, Mingyu Gao
Abstract: Serving numerous users and requests concurrently requires good fairness in Large Language Model (LLM) serving systems. This ensures that, at the same cost, the system can meet the Service Level Objectives (SLOs) of more users, such as time to first token (TTFT) and time between tokens (TBT), rather than allowing a few users to experience performance far exceeding the SLOs. To achieve better fairness, the preemption-based scheduling policy dynamically adjusts the priority of each request to maintain balance during runtime. However, existing systems tend to overly prioritize throughput, overlooking the overhead caused by preemption-induced context switching, which is crucial for maintaining fairness through priority adjustments. In this work, we identify three main challenges that cause this overhead: 1) inadequate I/O utilization, 2) GPU idleness, and 3) unnecessary I/O transmission during multi-turn conversations. Our key insight is that the block-based KV cache memory policy in existing systems, while achieving near-zero memory waste, leads to discontinuity and insufficient granularity in the KV cache memory. To address this, we introduce FastSwitch, a fairness-aware serving system that not only aligns with the existing KV cache memory allocation policy but also mitigates context-switching overhead. Our evaluation shows that FastSwitch outperforms the state-of-the-art LLM serving system vLLM with speedups of 1.4-11.2x across different tail TTFT and TBT.
Authors: Rui Li, Marcus Klasson, Arno Solin, Martin Trapp
Abstract: The rising interest in Bayesian deep learning (BDL) has led to a plethora of methods for estimating the posterior distribution. However, efficient computation of inferences, such as predictions, has been largely overlooked, with Monte Carlo integration remaining the standard. In this work we examine streamlining prediction in BDL through a single forward pass without sampling. For this, we use local linearisation of activation functions and local Gaussian approximations at linear layers, allowing us to analytically compute an approximation to the posterior predictive distribution. We showcase our approach for both MLPs and transformers, such as ViT and GPT-2, and assess its performance on regression and classification tasks.
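The core computation in such single-pass schemes is pushing a Gaussian N(mu, Sigma) through the network: exactly through a linear layer, and via local linearisation at mu through a nonlinearity. A minimal numpy sketch (illustrative, not the authors' code):

    import numpy as np

    def push_linear(mu, Sigma, W, b):
        # Exact: y = Wx + b maps N(mu, Sigma) to N(W mu + b, W Sigma W^T).
        return W @ mu + b, W @ Sigma @ W.T

    def push_activation(mu, Sigma, f, df):
        # Local linearisation f(x) ~= f(mu) + J (x - mu), J diagonal for
        # element-wise activations.
        J = np.diag(df(mu))
        return f(mu), J @ Sigma @ J.T

    # Example: one hidden layer with tanh.
    mu, Sigma = np.zeros(3), 0.1 * np.eye(3)
    W, b = np.random.randn(4, 3), np.zeros(4)
    mu, Sigma = push_linear(mu, Sigma, W, b)
    mu, Sigma = push_activation(mu, Sigma, np.tanh, lambda x: 1 - np.tanh(x) ** 2)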
Authors: Ronghui Xu, Hanyin Cheng, Chenjuan Guo, Hongfan Gao, Jilin Hu, Sean Bin Yang, Bin Yang
Abstract: Developing effective path representations has become increasingly essential across various fields within intelligent transportation. Although pre-trained path representation learning models have shown improved performance, they predominantly focus on the topological structures from single modality data, i.e., road networks, overlooking the geometric and contextual features associated with path-related images, e.g., remote sensing images. Similar to human understanding, integrating information from multiple modalities can provide a more comprehensive view, enhancing both representation accuracy and generalization. However, variations in information granularity impede the semantic alignment of road network-based paths (road paths) and image-based paths (image paths), while the heterogeneity of multi-modal data poses substantial challenges for effective fusion and utilization. In this paper, we propose a novel Multi-modal, Multi-granularity Path Representation Learning Framework (MM-Path), which can learn a generic path representation by integrating modalities from both road paths and image paths. To enhance the alignment of multi-modal data, we develop a multi-granularity alignment strategy that systematically associates nodes, road sub-paths, and road paths with their corresponding image patches, ensuring the synchronization of both detailed local information and broader global contexts. To address the heterogeneity of multi-modal data effectively, we introduce a graph-based cross-modal residual fusion component designed to comprehensively fuse information across different modalities and granularities. Finally, we conduct extensive experiments on two large-scale real-world datasets under two downstream tasks, validating the effectiveness of the proposed MM-Path. This is an extended version of the paper accepted by KDD 2025.
Authors: Xinyu Wang, Yiyang Peng, Wei Ma
Abstract: Ubiquitous mobile devices have catalyzed the development of vehicle crowd sensing (VCS). In particular, vehicle sensing systems show great potential in the flexible acquisition of spatio-temporal urban data through built-in sensors under diverse sensing scenarios. However, vehicle systems often exhibit biased coverage due to the heterogeneous nature of trip requests and routes. To achieve a high sensing coverage, a critical challenge lies in optimally relocating vehicles to minimize the divergence between vehicle distributions and target sensing distributions. Conventional approaches typically employ a two-stage predict-then-optimize (PTO) process: first predicting real-time vehicle distributions and subsequently generating an optimal relocation strategy based on the predictions. However, this approach can lead to suboptimal decision-making due to the propagation of errors from upstream prediction. To this end, we develop an end-to-end Smart Predict-then-Optimize (SPO) framework by integrating optimization into prediction within the deep learning architecture, and the entire framework is trained by minimizing the task-specific matching divergence rather than the upstream prediction error. Methodologically, we formulate the vehicle relocation problem as a quadratic program (QP) and incorporate a novel unrolling approach based on the Alternating Direction Method of Multipliers (ADMM) within the SPO framework to compute gradients of the QP layer, facilitating backpropagation and gradient-based optimization for end-to-end learning. The effectiveness of the proposed framework is validated on real-world taxi datasets in Hong Kong. Utilizing the alternating differentiation method, the general SPO framework presents a novel concept of addressing decision-making problems with uncertainty, demonstrating significant potential for advancing applications in intelligent transportation systems.
Authors: Yasin I. Tepeli, Mathijs de Wolf, Joana P. Goncalves
Abstract: Selection bias poses a critical challenge for fairness in machine learning, as models trained on data that is less representative of the population might exhibit undesirable behavior for underrepresented profiles. Semi-supervised learning strategies like self-training can mitigate selection bias by incorporating unlabeled data into model training to gain further insight into the distribution of the population. However, conventional self-training seeks to include high-confidence data samples, which may reinforce existing model bias and compromise effectiveness. We propose Metric-DST, a diversity-guided self-training strategy that leverages metric learning and its implicit embedding space to counter confidence-based bias through the inclusion of more diverse samples. Metric-DST learned more robust models in the presence of selection bias for generated and real-world datasets with induced bias, as well as a molecular biology prediction task with intrinsic bias. The Metric-DST learning strategy offers a flexible and widely applicable solution to mitigate selection bias and enhance fairness of machine learning models.
Authors: Marco Pasini, Javier Nistal, Stefan Lattner, George Fazekas
Abstract: Autoregressive models are typically applied to sequences of discrete tokens, but recent research indicates that generating sequences of continuous embeddings in an autoregressive manner is also feasible. However, such Continuous Autoregressive Models (CAMs) can suffer from a decline in generation quality over extended sequences due to error accumulation during inference. We introduce a novel method to address this issue by injecting random noise into the input embeddings during training. This procedure makes the model robust against varying error levels at inference. We further reduce error accumulation through an inference procedure that introduces low-level noise. Experiments on musical audio generation show that CAM substantially outperforms existing autoregressive and non-autoregressive approaches while preserving audio quality over extended sequences. This work paves the way for generating continuous embeddings in a purely autoregressive setting, opening new possibilities for real-time and interactive generative applications.
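The training-time trick is simple to sketch: perturb the ground-truth previous embeddings before feeding them back, so the model learns to tolerate its own inference-time errors. A hedged, illustrative PyTorch snippet (the noise schedule and model interface are assumptions):

    import torch

    def noisy_teacher_forcing_loss(model, embeddings, sigma_max=0.3):
        # embeddings: (batch, seq_len, dim) ground-truth continuous targets.
        # Sample a per-sequence noise level so the model sees varying error
        # magnitudes, mimicking accumulated inference error (assumed scheme).
        sigma = sigma_max * torch.rand(embeddings.size(0), 1, 1)
        noisy_inputs = embeddings + sigma * torch.randn_like(embeddings)
        pred = model(noisy_inputs[:, :-1])        # predict the next embedding
        return torch.nn.functional.mse_loss(pred, embeddings[:, 1:])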
Authors: Abhijith S, Arjun Rajesh, Mansi Manoj, Sandra Davis Kollannur, Sujitta R V, Jerrin Thomas Panachakel
Abstract: Myocardial infarction (MI), commonly known as a heart attack, is a critical health condition caused by restricted blood flow to the heart. Early-stage detection through continuous ECG monitoring is essential to minimize irreversible damage. This review explores advancements in MI classification methodologies for wearable devices, emphasizing their potential in real-time monitoring and early diagnosis. It critically examines traditional approaches, such as morphological filtering and wavelet decomposition, alongside cutting-edge techniques, including Convolutional Neural Networks (CNNs) and VLSI-based methods. By synthesizing findings on machine learning, deep learning, and hardware innovations, this paper highlights their strengths, limitations, and future prospects. The integration of these techniques into wearable devices offers promising avenues for efficient, accurate, and energy-aware MI detection, paving the way for next-generation wearable healthcare solutions.
Authors: Jos\'e Fernando N\'u\~nez, Jamie Arjona, Javier B\'ejar
Abstract: Deep learning models need a sufficient amount of data to find the hidden patterns in it. Generative modeling aims to learn the data distribution, allowing us to sample more data and augment the original dataset. In the context of physiological data, and more specifically electrocardiogram (ECG) data, with its sensitive nature and expensive data collection, we can exploit generative models to enlarge existing datasets and improve downstream tasks, in our case heart-rhythm classification. In this work, we explore the usefulness of synthetic data generated with different deep learning generative models, namely Diffweave, Time-Diffusion, and Time-VQVAE, to obtain better classification results on two open-source multivariate ECG datasets. Moreover, we also investigate the effects of transfer learning, by fine-tuning a synthetically pre-trained model and then progressively adding increasing proportions of real data. We conclude that although the synthetic samples resemble the real ones, the classification improvement from simply augmenting the real dataset is barely noticeable on individual datasets; however, when both datasets are merged, the results show an increase across all metrics for the classifiers using synthetic samples as augmented data. The fine-tuning results show the Time-VQVAE generative model to be superior to the others, but not powerful enough to approach a classifier trained with real data only. In addition, methods and metrics for measuring the closeness between synthetic and real data have been explored as a by-product of the main research questions of this study.
Authors: Emily Williams, Amanda Howard, Brek Meuris, Panos Stinis
Abstract: Physics-informed deep operator networks (DeepONets) have emerged as a promising approach toward numerically approximating the solution of partial differential equations (PDEs). In this work, we aim to develop further understanding of what is being learned by physics-informed DeepONets by assessing the universality of the extracted basis functions and demonstrating their potential toward model reduction with spectral methods. Results provide clarity about measuring the performance of a physics-informed DeepONet through the decays of singular values and expansion coefficients. In addition, we propose a transfer learning approach for improving training for physics-informed DeepONets between parameters of the same PDE as well as across different, but related, PDEs where these models struggle to train well. This approach results in significant error reduction and learned basis functions that are more effective in representing the solution of a PDE.
Authors: Chen Xu, Qiang Wang, Lijun Sun
Abstract: Accurate travel time estimation is essential for navigation and itinerary planning. While existing research employs probabilistic modeling to assess travel time uncertainty and account for correlations between multiple trips, modeling the temporal variability of multi-trip travel time distributions remains a significant challenge. Capturing the evolution of joint distributions requires large, well-organized datasets; however, real-world trip data are often temporally sparse and spatially unevenly distributed. To address this issue, we propose SPTTE, a spatiotemporal probabilistic framework that models the evolving joint distribution of multi-trip travel times by formulating the estimation task as a spatiotemporal stochastic process regression problem with fragmented observations. SPTTE incorporates an RNN-based temporal Gaussian process parameterization to regularize sparse observations and capture temporal dependencies. Additionally, it employs a prior-based heterogeneity smoothing strategy to correct unreliable learning caused by unevenly distributed trips, effectively modeling temporal variability under sparse and uneven data distributions. Evaluations on real-world datasets demonstrate that SPTTE outperforms state-of-the-art deterministic and probabilistic methods by over 10.13%. Ablation studies and visualizations further confirm the effectiveness of the model components.
Authors: Erin Carson, Xinye Chen, Cheng Kang
Abstract: The success of large language models (LLMs) for time series has been demonstrated in previous work. Utilizing a symbolic time series representation, one can efficiently bridge the gap between LLMs and time series. However, the remaining challenge is to exploit the semantic information hidden in time series by using symbols or existing tokens of LLMs, while aligning the embedding space of LLMs according to the hidden information of time series. The symbolic time series approximation (STSA) method called adaptive Brownian bridge-based symbolic aggregation (ABBA) shows outstanding efficacy in preserving salient time series features by modeling time series patterns in terms of amplitude and period while using existing tokens of LLMs. In this paper, we introduce a method, called LLM-ABBA, that integrates ABBA into large language models for various downstream time series tasks. By symbolizing time series, LLM-ABBA compares favorably to the recent state-of-the-art (SOTA) in UCR and three medical time series classification tasks. Meanwhile, a fixed-polygonal chain trick in ABBA is introduced to avoid obvious drifting during prediction tasks by significantly mitigating the effects of cumulative error arising from misused symbols during the transition from symbols to numerical values. In time series regression tasks, LLM-ABBA achieves the new SOTA on Time Series Extrinsic Regression (TSER) benchmarks. LLM-ABBA also shows competitive prediction capability compared to recent SOTA time series prediction results. We believe this framework can also seamlessly extend to other time series tasks.
Authors: Yichen Wang, Jie Wang, Fulin Wang, Xiang Li, Hao Yin, Bhiksha Raj
Abstract: In recent years, graph representation learning has undergone a paradigm shift, driven by the emergence and proliferation of graph neural networks (GNNs) and their heterogeneous counterparts. Heterogeneous GNNs have shown remarkable success in extracting low-dimensional embeddings from complex graphs that encompass diverse entity types and relationships. While meta-path-based techniques have long been recognized for their ability to capture semantic affinities among nodes, their dependence on manual specification poses a significant limitation. In contrast, matrix-focused methods accelerate processing by utilizing structural cues but often overlook contextual richness. In this paper, we challenge the current paradigm by introducing ontology as a fundamental semantic primitive within complex graphs. Our goal is to integrate the strengths of both matrix-centric and meta-path-based approaches into a unified framework. We propose perturbation Ontology-based Graph Attention Networks (POGAT), a novel methodology that combines ontology subgraphs with an advanced self-supervised learning paradigm to achieve a deep contextual understanding. The core innovation of POGAT lies in our enhanced homogeneous perturbing scheme designed to generate rigorous negative samples, encouraging the model to explore minimal contextual features more thoroughly. Through extensive empirical evaluations, we demonstrate that POGAT significantly outperforms state-of-the-art baselines, achieving a groundbreaking improvement of up to 10.78\% in F1-score for the critical task of link prediction and 12.01\% in Micro-F1 for the critical task of node classification.
Authors: Borna Sayedana, Peter E. Caines, Aditya Mahajan
Abstract: In this paper, we investigate the concentration properties of cumulative rewards in Markov Decision Processes (MDPs), focusing on both asymptotic and non-asymptotic settings. We introduce a unified approach to characterize reward concentration in MDPs, covering both infinite-horizon settings (i.e., average and discounted reward frameworks) and finite-horizon setting. Our asymptotic results include the law of large numbers, the central limit theorem, and the law of iterated logarithms, while our non-asymptotic bounds include Azuma-Hoeffding-type inequalities and a non-asymptotic version of the law of iterated logarithms. Additionally, we explore two key implications of our results. First, we analyze the sample path behavior of the difference in rewards between any two stationary policies. Second, we show that two alternative definitions of regret for learning policies proposed in the literature are rate-equivalent. Our proof techniques rely on a novel martingale decomposition of cumulative rewards, properties of the solution to the policy evaluation fixed-point equation, and both asymptotic and non-asymptotic concentration results for martingale difference sequences.
Authors: Tien Vu-Van, Dat Du Thanh, Nguyen Ho, Mai Vu
Abstract: Convolutional Neural Networks (CNNs) achieve high performance in image classification tasks but are challenging to deploy on resource-limited hardware due to their large model sizes. To address this issue, we leverage Mutual Information, a metric that provides valuable insights into how deep learning models retain and process information through measuring the shared information between input features or output labels and network layers. In this study, we propose a structured filter-pruning approach for CNNs that identifies and selectively retains the most informative features in each layer. Our approach successively evaluates each layer by ranking the importance of its feature maps based on Conditional Mutual Information (CMI) values, computed using a matrix-based R\'enyi $\alpha$-order entropy numerical method. We propose several formulations of CMI to capture correlation among features across different layers. We then develop various strategies to determine the cutoff point for CMI values to prune unimportant features. This approach allows parallel pruning in both forward and backward directions and significantly reduces model size while preserving accuracy. Tested on the VGG16 architecture with the CIFAR-10 dataset, the proposed method reduces the number of filters by more than a third, with only a 0.32% drop in test accuracy.
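The matrix-based Rényi α-order entropies behind such CMI scores are computed from eigenvalues of normalized Gram matrices, with joint entropies via Hadamard products. A compact sketch of that standard estimator (the paper's exact CMI formulations across layers will differ):

    import numpy as np

    def rbf_gram(X, sigma=1.0):
        # X: (n_samples, n_features) activations of one feature map.
        d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
        return np.exp(-d2 / (2 * sigma ** 2))

    def renyi_entropy(K, alpha=1.01):
        A = K / np.trace(K)                      # normalize to unit trace
        lam = np.linalg.eigvalsh(A)
        lam = lam[lam > 1e-12]
        return np.log2((lam ** alpha).sum()) / (1.0 - alpha)

    def cmi(Kx, Ky, Kz, alpha=1.01):
        # I(X;Y|Z) = S(X,Z) + S(Y,Z) - S(Z) - S(X,Y,Z);
        # joint entropies use Hadamard products of Gram matrices.
        S = lambda K: renyi_entropy(K, alpha)
        return S(Kx * Kz) + S(Ky * Kz) - S(Kz) - S(Kx * Ky * Kz)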
Authors: Shah Md Ahasan Siddique, Ragib Tahshin Rinath, Shakil Mosharrof, Syed Tanjib Mahmud, Sakib Ahmed
Abstract: The search for evidence of past life on Mars presents a tremendous challenge that requires very advanced robotic technologies to overcome. Current digital microscopic imagers and spectrometers used for astrobiological examination suffer from limitations such as insufficient resolution, narrow detection range, and lack of portability. To overcome these challenges, this research presents modifications to the Phoenix rover that expand its capability to detect a broader spectrum of biosignatures on Mars. One of the notable improvements comprises the integration of advanced digital microscopic imagers and spectrometers, enabling high-resolution examination of soil samples. Additionally, the mechanical components of the device have been reinforced to enhance maneuverability and optimize subsurface sampling capabilities. Empirical investigations have demonstrated that Phoenix can navigate diverse geological environments and procure samples for biomolecular analysis. The biomolecular instrumentation and hybrid analytical methods showcased in this study demonstrate considerable potential for future astrobiology missions on Mars. The system could be further enhanced by broadening the range of detectable biomarkers and biosignatures.
Authors: Zhixu Tao, Ian Mason, Sanjeev Kulkarni, Xavier Boix
Abstract: Task Arithmetic is a model merging technique that enables the combination of multiple models' capabilities into a single model through simple arithmetic in the weight space, without the need for additional fine-tuning or access to the original training data. However, the factors that determine the success of Task Arithmetic remain unclear. In this paper, we examine Task Arithmetic for multi-task learning by framing it as a one-shot Federated Learning problem. We demonstrate that Task Arithmetic is mathematically equivalent to the commonly used algorithm in Federated Learning, called Federated Averaging (FedAvg). By leveraging well-established theoretical results from FedAvg, we identify two key factors that impact the performance of Task Arithmetic: data heterogeneity and training heterogeneity. To mitigate these challenges, we adapt several algorithms from Federated Learning to improve the effectiveness of Task Arithmetic. Our experiments demonstrate that applying these algorithms can often significantly boost performance of the merged model compared to the original Task Arithmetic approach. This work bridges Task Arithmetic and Federated Learning, offering new theoretical perspectives on Task Arithmetic and improved practical methodologies for model merging.
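The claimed equivalence is easiest to see in code: with the scaling chosen so that lambda = 1, adding the mean task vector to the base weights recovers a FedAvg-style average of the fine-tuned models. A minimal sketch over parameter dictionaries (interfaces assumed):

    def merge_task_arithmetic(base, finetuned, lam=1.0):
        # base and each finetuned[i] map parameter names to arrays/tensors.
        n = len(finetuned)
        merged = {}
        for name, w0 in base.items():
            task_vector = sum(ft[name] - w0 for ft in finetuned) / n
            merged[name] = w0 + lam * task_vector  # lam=1 gives the mean of fine-tuned weights
        return merged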
Authors: Cheng Tang, Zhishuai Liu, Pan Xu
Abstract: The Distributionally Robust Markov Decision Process (DRMDP) is a popular framework for addressing dynamics shift in reinforcement learning by learning policies robust to the worst-case transition dynamics within a constrained set. However, solving its dual optimization oracle poses significant challenges, limiting theoretical analysis and computational efficiency. The recently proposed Robust Regularized Markov Decision Process (RRMDP) replaces the uncertainty set constraint with a regularization term on the value function, offering improved scalability and theoretical insights. Yet, existing RRMDP methods rely on unstructured regularization, often leading to overly conservative policies by considering transitions that are unrealistic. To address these issues, we propose a novel framework, the $d$-rectangular linear robust regularized Markov decision process ($d$-RRMDP), which introduces a linear latent structure into both transition kernels and regularization. For the offline RL setting, where an agent learns robust policies from a pre-collected dataset in the nominal environment, we develop a family of algorithms, Robust Regularized Pessimistic Value Iteration (R2PVI), employing linear function approximation and $f$-divergence based regularization terms on transition kernels. We provide instance-dependent upper bounds on the suboptimality gap of R2PVI policies, showing these bounds depend on how well the dataset covers state-action spaces visited by the optimal robust policy under robustly admissible transitions. This term is further shown to be fundamental to $d$-RRMDPs via information-theoretic lower bounds. Finally, numerical experiments validate that R2PVI learns robust policies and is computationally more efficient than methods for constrained DRMDPs.
Authors: Zhi Zhang, Jiayi Shen, Congfeng Cao, Gaole Dai, Shiji Zhou, Qizhe Zhang, Shanghang Zhang, Ekaterina Shutova
Abstract: Advancing towards generalist agents necessitates the concurrent processing of multiple tasks using a unified model, thereby underscoring the growing significance of simultaneous model training on multiple downstream tasks. A common issue in multi-task learning is the occurrence of gradient conflict, which leads to potential competition among different tasks during joint training. This competition often results in improvements in one task at the expense of deterioration in another. Although several optimization methods have been developed to address this issue by manipulating task gradients for better task balancing, they cannot decrease the incidence of gradient conflict. In this paper, we systematically investigate the occurrence of gradient conflict across different methods and propose a strategy to reduce such conflicts through sparse training (ST), wherein only a portion of the model's parameters are updated during training while keeping the rest unchanged. Our extensive experiments demonstrate that ST effectively mitigates conflicting gradients and leads to superior performance. Furthermore, ST can be easily integrated with gradient manipulation techniques, thus enhancing their effectiveness.
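A minimal sketch of ST with a magnitude-based selection criterion (the criterion and keep ratio are assumptions): fix a parameter mask once, then zero the gradients of frozen entries at every step.

    import torch

    def make_masks(model, keep_ratio=0.1):
        # Keep the top-|w| fraction of entries per tensor (assumed criterion).
        masks = {}
        for name, p in model.named_parameters():
            k = max(1, int(keep_ratio * p.numel()))
            thresh = p.detach().abs().flatten().kthvalue(p.numel() - k + 1).values
            masks[name] = (p.detach().abs() >= thresh).float()
        return masks

    def apply_masks(model, masks):
        # Call after loss.backward(), before optimizer.step():
        # frozen parameters receive zero gradient and stay unchanged.
        for name, p in model.named_parameters():
            if p.grad is not None:
                p.grad.mul_(masks[name])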
Authors: Anya Shchetkina, Ron Berman
Abstract: Targeting and personalization policies can be used to improve outcomes beyond the uniform policy that assigns the best performing treatment in an A/B test to everyone. Personalization relies on the presence of heterogeneity of treatment effects, yet, as we show in this paper, heterogeneity alone is not sufficient for personalization to be successful. We develop a statistical model to quantify "actionable heterogeneity," or the conditions when personalization is likely to outperform the best uniform policy. We show that actionable heterogeneity can be visualized as crossover interactions in outcomes across treatments and depends on three population-level parameters: within-treatment heterogeneity, cross-treatment correlation, and the variation in average responses. Our model can be used to predict the expected gain from personalization prior to running an experiment and also allows for sensitivity analysis, providing guidance on how changing treatments can affect the personalization gain. To validate our model, we apply five common personalization approaches to two large-scale field experiments with many interventions that encouraged flu vaccination. We find an 18% gain from personalization in one and a more modest 4% gain in the other, which is consistent with our model. Counterfactual analysis shows that this difference in the gains from personalization is driven by a drastic difference in within-treatment heterogeneity. However, reducing cross-treatment correlation holds a larger potential to further increase personalization gains. Our findings provide a framework for assessing the potential from personalization and offer practical recommendations for improving gains from targeting in multi-intervention settings.
Authors: Hyewon Jeong, Suyeol Yun, Hammaad Adam
Abstract: Electrocardiograms (ECGs) are an established technique to screen for abnormal cardiac signals. Recent work has established that it is possible to detect arrhythmia directly from the ECG signal using deep learning algorithms. While a few prior approaches with contrastive learning have been successful, the best way to define a positive sample remains an open question. In this project, we investigate several ways to define positive samples, and assess which approach yields the best performance in a downstream task of classifying arrhythmia. We explore spatiotemporal invariances, generic augmentations, demographic similarities, cardiac rhythms, and wave attributes of ECG as potential ways to match positive samples. We then evaluate each strategy with downstream task performance, and find that learned representations invariant to patient identity are powerful in arrhythmia detection. Our code is available at: https://github.com/mandiehyewon/goodviews_ecg.git
Authors: Rutuja Gurav, Elena Massara, Xiaomei Song, Kimberly Sinclair, Edward Brown, Matt Kusner, Bala Poduval, Atilim Gunes Baydin
Abstract: Extended human presence beyond low-Earth orbit (BLEO) during missions to the Moon and Mars will pose significant challenges in the near future. A primary health risk associated with these missions is radiation exposure, primarily from galactic cosmic rays (GCRs) and solar proton events (SPEs). While GCRs present a more consistent, albeit modulated threat, SPEs are harder to predict and can deliver acute doses over short periods. Currently, NASA utilizes analytical tools for monitoring the space radiation environment in order to make immediate decisions on sheltering astronauts. However, this reactive approach could be significantly enhanced by predictive models that can forecast radiation exposure in advance, ideally hours ahead of major events, while providing estimates of prediction uncertainty to improve decision-making. In this work, we present a machine learning approach for forecasting radiation exposure in BLEO using multimodal time-series data including direct solar imagery from the Solar Dynamics Observatory, X-ray flux measurements from GOES missions, and radiation dose measurements from the BioSentinel satellite that was launched as part of the Artemis 1 mission. To our knowledge, this is the first time full-disk solar imagery has been used to forecast radiation exposure. We demonstrate that our model can predict the onset of increased radiation due to an SPE event, as well as the radiation decay profile after an event has occurred.
Authors: Wei Peng, Kang Liu, Jiaxi Shi, Jianchen Hu
Abstract: Electroencephalography (EEG)-based motor imagery (MI) classification is a critical and challenging task in brain-computer interface (BCI) technology, which plays a significant role in assisting patients with functional impairments to regain mobility. We present a novel multi-scale atrous convolutional neural network (CNN) model called the EEG-dilated convolution network (EEG-DCNet) to enhance the accuracy and efficiency of EEG-based MI classification tasks. We incorporate the $1\times1$ convolutional layer and utilize the multi-branch parallel atrous convolutional architecture in EEG-DCNet to capture the highly nonlinear characteristics and multi-scale features of the EEG signals. Moreover, we utilize a sliding window to enhance the temporal consistency and an attention mechanism to improve the accuracy of recognizing user intentions. The experimental results (on the BCI-IV-2a, BCI-IV-2b, and High-Gamma datasets) show that EEG-DCNet outperforms existing state-of-the-art (SOTA) approaches in terms of classification accuracy and Kappa scores. Furthermore, since EEG-DCNet requires fewer parameters, the training efficiency and memory consumption are also improved. The experiment code is open-sourced at \href{https://github.com/Kanyooo/EEG-DCNet}{here}.
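A simplified version of a multi-branch atrous convolution block with 1x1 fusion in PyTorch; the channel counts and dilation rates are placeholders rather than the paper's configuration:

    import torch
    import torch.nn as nn

    class MultiScaleAtrousBlock(nn.Module):
        def __init__(self, c_in, c_branch=8, dilations=(1, 2, 4)):
            super().__init__()
            # Parallel 3x3 branches with growing dilation capture multi-scale
            # features at the same output resolution (padding == dilation).
            self.branches = nn.ModuleList(
                nn.Conv2d(c_in, c_branch, 3, padding=d, dilation=d)
                for d in dilations)
            self.fuse = nn.Conv2d(c_branch * len(dilations), c_in, 1)  # 1x1 fusion

        def forward(self, x):
            return self.fuse(torch.cat([b(x) for b in self.branches], dim=1))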
Authors: Simon Ouellette
Abstract: ARC-AGI is an open-world problem domain in which the ability to generalize out-of-distribution is a crucial quality. Under the program induction paradigm, we present a series of experiments that reveal the efficiency and generalization characteristics of various neurally-guided program induction approaches. The three paradigms we consider are Learning the grid space, Learning the program space, and Learning the transform space. We implement and experiment thoroughly on the first two, and retain the second one for ARC-AGI submission. After identifying the strengths and weaknesses of both of these approaches, we suggest the third as a potential solution, and run preliminary experiments.
Authors: Martyna Poziomska, Marian Dovgialo, Przemys{\l}aw Olbratowski, Pawe{\l} Niedbalski, Pawe{\l} Ogniewski, Joanna Zych, Jacek Rogala, Jaros{\l}aw \.Zygierewicz
Abstract: This study investigates the impact of quantity and diversity of data on the performance of various machine-learning models for detecting general EEG pathology. We utilized an EEG dataset of 2,993 recordings from Temple University Hospital and a dataset of 55,787 recordings from Elmiko Biosignals sp. z o.o. The latter contains data from 39 hospitals and a diverse patient set with varied conditions. Thus, we introduce the Elmiko dataset -- the largest publicly available EEG corpus. Our findings show that small and consistent datasets enable a wide range of models to achieve high accuracy; however, variations in pathological conditions, recording protocols, and labeling standards lead to significant performance degradation. Nonetheless, increasing the number of available recordings improves predictive accuracy and may even compensate for data diversity, particularly in neural networks based on attention mechanisms or transformer architectures. A meta-model that combined these networks with a gradient-boosting approach using handcrafted features demonstrated superior performance across varied datasets.
Authors: Yue Wang, Xu Cao, Yaojun Hu, Haochao Ying, James Matthew Rehg, Jimeng Sun, Jian Wu, Jintai Chen
Abstract: Electrocardiogram (ECG), a non-invasive and affordable tool for cardiac monitoring, is highly sensitive in detecting acute heart attacks. However, due to the lengthy nature of ECG recordings, numerous machine learning methods have been developed for automated heart disease detection to reduce human workload. Despite these efforts, performance remains suboptimal. A key obstacle is the inherent complexity of ECG data, which includes heterogeneity (e.g., varying sampling rates), high levels of noise, demographic-related pattern shifts, and intricate rhythm-event associations. To overcome these challenges, this paper introduces AnyECG, a foundational model designed to extract robust representations from any real-world ECG data. Specifically, a tailored ECG Tokenizer encodes each fixed-duration ECG fragment into a token and, guided by proxy tasks, converts noisy, continuous ECG features into discrete, compact, and clinically meaningful local rhythm codes. These codes encapsulate basic morphological, frequency, and demographic information (e.g., sex), effectively mitigating signal noise. We further pre-train the AnyECG to learn rhythmic pattern associations across ECG tokens, enabling the capture of cardiac event semantics. By being jointly pre-trained on diverse ECG data sources, AnyECG is capable of generalizing across a wide range of downstream tasks where ECG signals are recorded from various devices and scenarios. Experimental results in anomaly detection, arrhythmia detection, corrupted lead generation, and ultra-long ECG signal analysis demonstrate that AnyECG learns common ECG knowledge from data and significantly outperforms cutting-edge methods in each respective task.
Authors: Ali Asgar Chandanwala, Srutakirti Bhowmik, Parna Chaudhury, Sheena Christabel Pravin
Abstract: Applications in behavioural research, human-computer interaction, and mental health depend on the ability to recognize emotions. In order to improve the accuracy of emotion recognition using electroencephalography (EEG) data, this work presents a hybrid quantum deep learning technique. Conventional EEG-based emotion recognition techniques are limited by noise and high-dimensional data complexity, which make feature extraction difficult. To tackle these issues, our method combines traditional deep learning classification with quantum-enhanced feature extraction. To identify important brain wave patterns, bandpass filtering and Welch's method are used as preprocessing techniques on the EEG data. Intricate inter-band interactions that are essential for determining emotional states are captured by mapping frequency band power attributes (delta, theta, alpha, and beta) to quantum representations. Entanglement and rotation gates are used in a hybrid quantum circuit to maximize the model's sensitivity to EEG patterns associated with different emotions. Promising results from evaluation on a test dataset indicate the model's potential for accurate emotion recognition. The model will be extended for real-time applications and multi-class categorization in future work, which could improve EEG-based mental health screening instruments. This method offers a promising tool for applications in adaptive human-computer systems and mental health monitoring by showcasing the possibilities of fusing traditional deep learning with quantum processing for reliable, scalable emotion recognition.
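The preprocessing step maps naturally to scipy: bandpass-filter the EEG, then integrate the Welch power spectral density over the canonical bands. A short sketch with placeholder sampling rate and band edges:

    import numpy as np
    from scipy.signal import welch

    BANDS = {"delta": (1, 4), "theta": (4, 8), "alpha": (8, 13), "beta": (13, 30)}

    def band_powers(eeg, fs=128):
        # eeg: (n_channels, n_samples). Returns (n_channels, 4) band-power features.
        freqs, psd = welch(eeg, fs=fs, nperseg=fs * 2, axis=-1)
        feats = []
        for lo, hi in BANDS.values():
            idx = (freqs >= lo) & (freqs < hi)
            feats.append(np.trapz(psd[:, idx], freqs[idx], axis=-1))
        return np.stack(feats, axis=-1)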
Authors: Veronica Henao Isaza, David Aguillon, Carlos Andres Tobon Quintero, Francisco Lopera, John Fredy Ochoa Gomez
Abstract: Background: Dementia, marked by cognitive decline, is a global health challenge. Alzheimer's disease (AD), the leading type, accounts for ~70% of cases. Electroencephalography (EEG) measures show promise in identifying AD risk, but obtaining large samples for reliable comparisons is challenging. Objective: This study integrates signal processing, harmonization, and statistical techniques to enhance sample size and improve AD risk classification reliability. Methods: We used advanced EEG preprocessing, feature extraction, harmonization, and propensity score matching (PSM) to balance healthy non-carriers (HC) and asymptomatic E280A mutation carriers (ACr). Data from four databases were harmonized to adjust site effects while preserving covariates like age and sex. PSM ratios (2:1, 5:1, 10:1) were applied to assess sample size impact on model performance. The final dataset underwent machine learning analysis with decision trees and cross-validation for robust results. Results: Balancing sample sizes via PSM significantly improved classification accuracy, ranging from 0.92 to 0.96 across ratios. This approach enabled precise risk identification even with limited samples. Conclusion: Integrating data processing, harmonization, and balancing techniques improves AD risk classification accuracy, offering potential for other neurodegenerative diseases.
Authors: Keshav Kumar, Ravindranath Chowdary
Abstract: Research papers are well-structured documents. They use text, figures, equations, tables, etc., to convey their ideas and findings, and are divided into sections like Introduction, Model, and Experiments, which deal with different aspects of research. Characteristics like these set research papers apart from ordinary documents and allow us to significantly improve their summarization. In this paper, we propose a novel system, SlideSpwan, that takes the PDF of a research document as input and generates a quality presentation providing its summary in a visual and concise fashion. The system first converts the PDF of the paper to an XML document that holds structural information about the various elements. Then a machine learning model, trained on the PS5K dataset and the Aminer 9.5K Insights dataset (which we introduce), is used to predict the salience of each sentence in the paper. Sentences for slides are selected using ILP and clustered based on their similarity, with each cluster given a suitable title. Finally, a slide is generated by placing any graphical element referenced in the selected sentences next to them. Experiments on a test set of 650 paper-slide pairs demonstrate that our system generates presentations of better quality.
Authors: Arnaud Delorme, Dung Truong, Luca Pion-Tonachini, Scott Makeig
Abstract: ICLabel is an important plug-in function in EEGLAB, the most widely used software for EEG data processing. A powerful approach to automated processing of EEG data involves decomposing the data by Independent Component Analysis (ICA) and then classifying the resulting independent components (ICs) using ICLabel. While EEGLAB pipelines support high-performance computing (HPC) platforms running the open-source Octave interpreter, the ICLabel plug-in is incompatible with Octave because of its specialized neural network architecture. To enhance cross-platform compatibility, we developed a Python version of ICLabel that uses standard EEGLAB data structures. We compared the MATLAB and Python implementations of ICLabel on data from 14 subjects. ICLabel returns, for each ICA component, the likelihood of classification into 7 component classes. The returned IC classifications were virtually identical between Python and MATLAB, with differences in classification percentage below 0.001%.
Authors: Abel C. H. Chen
Abstract: With the maturation of quantum computing technology, research has gradually shifted towards exploring its applications. Alongside the rise of artificial intelligence, various machine learning methods have been developed into quantum circuits and algorithms. Among them, Quantum Neural Networks (QNNs) can map inputs to quantum circuits through Feature Maps (FMs) and adjust parameter values via variational models, making them applicable in regression and classification tasks. However, designing a FM that is suitable for a given application problem is a significant challenge. In light of this, we propose an Enhanced Quantum Neural Network (EQNN), which includes an Enhanced Feature Map (EFM) designed in this research. This EFM effectively maps input variables to a value range more suitable for quantum computing, serving as the input to the variational model to improve accuracy. In our experiments, we use mobile data usage prediction as a case study, recommending appropriate rate plans based on users' mobile data usage. The proposed EQNN is compared with current mainstream QNNs, and experimental results show that the EQNN achieves higher accuracy with fewer quantum logic gates and converges to the optimal solution faster under different optimization algorithms.
Authors: Zhe Zhao, Jingping Xu, Ce Wang, Yaping Yang
Abstract: Analytic continuation aims to reconstruct real-time spectral functions from imaginary-time Green's functions; however, this process is notoriously ill-posed and challenging to solve. We propose a novel neural network architecture, named the Feature Learning Network (FL-net), to enhance the prediction accuracy of spectral functions, achieving an improvement of at least $20\%$ over traditional methods, such as the Maximum Entropy Method (MEM), and previous neural network approaches. Furthermore, we develop an analytical method to evaluate the robustness of the proposed network. Using this method, we demonstrate that increasing the hidden dimensionality of FL-net, while leading to lower loss, results in decreased robustness. Overall, our model provides valuable insights into effectively addressing the complex challenges associated with analytic continuation.
Authors: Md. Naimur Rahman, Shafak Shahriar Sozol, Md. Samsuzzaman, Md. Shahin Hossin, Mohammad Tariqul Islam, S. M. Taohidul Islam, Md. Maniruzzaman
Abstract: In the modern agricultural industry, technology plays a crucial role in the advancement of cultivation. To increase crop productivity, the soil requires specific characteristics. For watermelon cultivation, the soil needs to be sandy and warm, with proper irrigation. This research aims to design and implement an intelligent IoT-based soil characterization system for watermelon fields. The developed IoT-based system measures soil moisture, temperature, and pH using different sensors, and the sensor data is uploaded to the cloud via Arduino and Raspberry Pi, from where users can obtain the data through a mobile application and a webpage developed for this system. To ensure the precision of the framework, this study compares the soil parameter readings from existing field soil meters, the values obtained from the sensor-integrated IoT system, and data obtained from a soil science laboratory. Excessive soil salinity affects watermelon yield. This paper proposes a model for measuring soil salinity based on soil resistivity, establishing the relationship between the two from laboratory data using an artificial neural network (ANN).
Authors: Uditha Muthumala, Yuxuan Zhang, Luciano Sebastian Martinez-Rau, Sebastian Bader
Abstract: This paper compares machine learning approaches with different input data formats for the classification of acoustic emission (AE) signals. AE signals are a promising monitoring technique in many structural health monitoring applications. Machine learning has been demonstrated as an effective data analysis method, classifying different AE signals according to the damage mechanism they represent. These classifications can be performed based on the entire AE waveform or specific features that have been extracted from it. However, it is currently unknown which of these approaches is preferred. With the goal of model deployment on resource-constrained embedded Internet of Things (IoT) systems, this work evaluates and compares both approaches in terms of classification accuracy, memory requirement, processing time, and energy consumption. To accomplish this, features are extracted and carefully selected, neural network models are designed and optimized for each input data scenario, and the models are deployed on a low-power IoT node. The comparative analysis reveals that all models can achieve high classification accuracies of over 99\%, but that embedded feature extraction is computationally expensive. Consequently, models utilizing the raw AE signal as input have the fastest processing speed and thus the lowest energy consumption, which comes at the cost of a larger memory requirement.
Authors: Mohammed Riadh Berramdane (IFPEN), Alexandre Battiston (IFPEN), Michele Bardi (IFPEN), Nicolas Blet (LEMTA), Benjamin R\'emy (LEMTA), Matthieu Urbain (LEMTA)
Abstract: Facing the thermal management challenges of Wide Bandgap (WBG) semiconductors, this study highlights the use of ARX parametric models, which provide accurate temperature predictions without requiring detailed understanding of component thickness disparities or material physical properties, relying solely on experimental measurements. These parametric models emerge as a reliable alternative to FEM simulations and conventional thermal models, significantly simplifying system identification while ensuring high result accuracy.
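An ARX model predicts the output (here, temperature) from lagged outputs and inputs by linear least squares. A compact numpy sketch with assumed model orders and signal names:

    import numpy as np

    def fit_arx(y, u, na=2, nb=2):
        # y: 1-D array of measured temperature; u: 1-D array of the input
        # (e.g., dissipated power). Model (assumed orders na, nb):
        #   y[t] = sum_i a_i * y[t-i] + sum_j b_j * u[t-j] + e[t]
        n = max(na, nb)
        rows = [np.concatenate([y[t - na:t][::-1], u[t - nb:t][::-1]])
                for t in range(n, len(y))]
        Phi, target = np.asarray(rows), y[n:]
        theta, *_ = np.linalg.lstsq(Phi, target, rcond=None)
        return theta[:na], theta[na:]   # AR coefficients, input coefficients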
Authors: Ryan Dempsey, Jonathan Ethier, Halim Yanikomeroglu
Abstract: Radio deployments and spectrum planning can benefit from path loss predictions. Obstructions along a communications link are often considered implicitly or through derived metrics such as representative clutter height or total obstruction depth. In this paper, we propose a path-specific path loss prediction method that uses convolutional neural networks to automatically perform feature extraction from high-resolution obstruction height maps. Our methods result in low prediction error in a variety of environments without requiring derived obstruction metrics.
Authors: D\'enes Berta, Balduin Katzer, Katrin Schulz, P\'eter Dus\'an Isp\'anovity
Abstract: Acoustic emission signals have been shown to accompany avalanche-like events in materials, such as dislocation avalanches in crystalline solids, collapse of voids in porous matter, or domain wall movement in ferroics. The data provided by acoustic emission measurements is tremendously rich, but it is rather challenging to precisely connect it to the characteristics of the triggering avalanche. In our work we propose a machine learning-based method with which one can infer microscopic details of dislocation avalanches in micropillar compression tests from acoustic emission data alone. As demonstrated in the paper, this approach is suitable for predicting the force-time response, as it provides outstanding predictions for the temporal location of avalanches and can also predict the magnitude of individual deformation events. Various descriptors (including frequency-dependent and frequency-independent ones) are utilised in our machine learning approach, and their importance in the prediction is analysed. The transferability of the method to other specimen sizes is also demonstrated, and possible applications in more generic settings are discussed.
Authors: Shijian Deng, Wentian Zhao, Yu-Jhe Li, Kun Wan, Daniel Miranda, Ajinkya Kale, Yapeng Tian
Abstract: Self-improvement in multimodal large language models (MLLMs) is crucial for enhancing their reliability and robustness. However, current methods often rely heavily on MLLMs themselves as judges, leading to high computational costs and potential pitfalls like reward hacking and model collapse. This paper introduces a novel, model-level judge-free self-improvement framework. Our approach employs a controlled feedback mechanism while eliminating the need for MLLMs in the verification loop. We generate preference learning pairs using a controllable hallucination mechanism and optimize data quality by leveraging lightweight, contrastive language-image encoders to evaluate and reverse pairs when necessary. Evaluations across public benchmarks and our newly introduced IC dataset designed to challenge hallucination control demonstrate that our model outperforms conventional techniques. We achieve superior precision and recall with significantly lower computational demands. This method offers an efficient pathway to scalable self-improvement in MLLMs, balancing performance gains with reduced resource requirements.
Authors: Peng Cui, Guande He, Dan Zhang, Zhijie Deng, Yinpeng Dong, Jun Zhu
Abstract: Datasets collected from the open world unavoidably suffer from various forms of randomness or noise, leading to the ubiquity of aleatoric (data) uncertainty. Quantifying such uncertainty is particularly pivotal for object detection, where images contain multi-scale objects with occlusion, obscureness, and even noisy annotations, in contrast to images with centric and similar-scale objects in classification. This paper suggests modeling and exploiting the uncertainty inherent in object detection data with vision foundation models and develops a data-centric reliable training paradigm. Technically, we propose to estimate the data uncertainty of each object instance based on the feature space of vision foundation models, which are trained on ultra-large-scale datasets and able to exhibit universal data representation. In particular, we assume a mixture-of-Gaussian structure of the object features and devise Mahalanobis distance-based measures to quantify the data uncertainty. Furthermore, we suggest two crucial and practical usages of the estimated uncertainty: 1) defining an uncertainty-aware sample filter that abandons noisy and redundant instances to avoid over-fitting, and 2) defining a sample-adaptive regularizer that balances easy/hard samples for adaptive training. The estimated aleatoric uncertainty serves as an extra level of annotation of the dataset, so it can be utilized in a plug-and-play manner with any model. Extensive empirical studies verify the effectiveness of the proposed aleatoric uncertainty measure on various advanced detection models and challenging benchmarks.
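One way to read the core estimator, sketched under our own assumptions (per-class Gaussians fitted on foundation-model features, with uncertainty as the distance to the nearest class):

import numpy as np

def fit_class_gaussians(features, labels):
    stats = {}
    for c in np.unique(labels):
        f = features[labels == c]
        mu = f.mean(axis=0)
        cov = np.cov(f, rowvar=False) + 1e-6 * np.eye(f.shape[1])  # ridge
        stats[c] = (mu, np.linalg.inv(cov))
    return stats

def mahalanobis_uncertainty(x, stats):
    # Distance to the nearest class Gaussian; larger = more uncertain.
    dists = [np.sqrt((x - mu) @ prec @ (x - mu)) for mu, prec in stats.values()]
    return min(dists)

rng = np.random.default_rng(0)
feats = rng.normal(size=(200, 8))        # placeholder foundation-model features
labels = rng.integers(0, 3, size=200)    # placeholder class labels
stats = fit_class_gaussians(feats, labels)
print(mahalanobis_uncertainty(rng.normal(size=8), stats))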
Authors: Yaya Etiabi, Eslam Eldeeb, Mohammad Shehab, Wafa Njima, Hirley Alves, Mohamed-Slim Alouini, El Mehdi Amhoud
Abstract: Accurate indoor localization remains challenging due to variations in wireless signal environments and limited data availability. This paper introduces MetaGraphLoc, a novel system leveraging sensor fusion, graph neural networks (GNNs), and meta-learning to overcome these limitations. MetaGraphLoc integrates received signal strength indicator measurements with inertial measurement unit data to enhance localization accuracy. Our proposed GNN architecture, featuring dynamic edge construction (DEC), captures the spatial relationships between access points and underlying data patterns. MetaGraphLoc employs a meta-learning framework to adapt the GNN model to new environments with minimal data collection, significantly reducing calibration efforts. Extensive evaluations demonstrate the effectiveness of MetaGraphLoc. Data fusion reduces localization error by 15.92%, underscoring its importance. The GNN with DEC outperforms traditional deep neural networks by up to 30.89% in terms of accuracy. Furthermore, the meta-learning approach enables efficient adaptation to new environments, minimizing data collection requirements. These advancements position MetaGraphLoc as a promising solution for indoor localization, paving the way for improved navigation and location-based services in ever-evolving Internet of Things networks.
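A minimal sketch of dynamic edge construction as we read it, connecting each node to its k nearest neighbours in feature space; k and the feature dimensions are placeholders:

import torch

def dynamic_edges(node_feats: torch.Tensor, k: int = 3) -> torch.Tensor:
    """Return a (2, N*k) edge_index of k-nearest-neighbour edges."""
    dist = torch.cdist(node_feats, node_feats)          # (N, N) pairwise
    dist.fill_diagonal_(float("inf"))                   # exclude self-loops
    nn_idx = dist.topk(k, largest=False).indices        # (N, k) neighbours
    src = torch.arange(node_feats.size(0)).repeat_interleave(k)
    return torch.stack([src, nn_idx.reshape(-1)])

feats = torch.randn(6, 4)  # 6 access points, 4-dim RSSI/IMU features (toy)
print(dynamic_edges(feats, k=2))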
Authors: Kun Liu, Jin Zhao
Abstract: In the field of finance, the prediction of individual credit default is of vital importance. However, existing methods face problems such as insufficient interpretability and transparency, as well as limited performance when dealing with high-dimensional and nonlinear data. To address these issues, this paper introduces a method based on Kolmogorov-Arnold Networks (KANs). KANs are a new type of neural network architecture with learnable activation functions and no linear weights, with potential advantages in handling complex multi-dimensional data. Specifically, this paper applies KANs to the field of individual credit risk prediction for the first time and constructs the Kolmogorov-Arnold Credit Default Predict (KACDP) model. Experiments show that the KACDP model outperforms mainstream credit default prediction models in performance metrics (ROC_AUC and F1 values). Meanwhile, through methods such as feature attribution scores and visualization of the model structure, the model's decision-making process and the importance of different features are clearly demonstrated, providing a transparent and interpretable decision-making basis for financial institutions and meeting the industry's strict requirements for model interpretability. In conclusion, the KACDP model constructed in this paper exhibits excellent predictive performance and satisfactory interpretability in individual credit risk prediction, providing an effective way to address the limitations of existing methods and offering a new and practical credit risk prediction tool for financial institutions.
Authors: Yong-Yeon Jo, Byeong Tak Lee, Beom Joon Kim, Jeong-Ho Hong, Hak Seung Lee, Joon-myoung Kwon
Abstract: Online Test-Time Adaptation (OTTA) enhances model robustness by updating pre-trained models with unlabeled data during testing. In healthcare, OTTA is vital for real-time tasks like predicting blood pressure from biosignals, which demand continuous adaptation. We introduce a new test-time scenario with streams of unlabeled samples and occasional labeled samples. Our framework combines supervised and self-supervised learning, employing a dual-queue buffer and weighted batch sampling to balance data types. Experiments show improved accuracy and adaptability under real-world conditions.
Authors: Emanuele Aiello, Umberto Michieli, Diego Valsesia, Mete Ozay, Enrico Magli
Abstract: Personalized image generation requires text-to-image generative models that capture the core features of a reference subject to allow for controlled generation across different contexts. Existing methods face challenges due to complex training requirements, high inference costs, limited flexibility, or a combination of these issues. In this paper, we introduce DreamCache, a scalable approach for efficient and high-quality personalized image generation. By caching a small number of reference image features from a subset of layers and a single timestep of the pretrained diffusion denoiser, DreamCache enables dynamic modulation of the generated image features through lightweight, trained conditioning adapters. DreamCache achieves state-of-the-art image and text alignment, utilizing an order of magnitude fewer extra parameters, and is both more computationally effective and versatile than existing models.
Authors: Jiahan Li, Chaoran Cheng, Jianzhu Ma, Ge Liu
Abstract: Shape assembly, which aims to reassemble separate parts into a complete object, has gained significant interest in recent years. Existing methods primarily rely on networks to predict the poses of individual parts, but often fail to effectively capture the geometric interactions between the parts and their poses. In this paper, we present the Geometric Point Attention Transformer (GPAT), a network specifically designed to address the challenges of reasoning about geometric relationships. In the geometric point attention module, we integrate both global shape information and local pairwise geometric features, along with poses represented as rotation and translation vectors for each part. To enable iterative updates and dynamic reasoning, we introduce a geometric recycling scheme, where each prediction is fed into the next iteration for refinement. We evaluate our model on both the semantic and geometric assembly tasks, showing that it outperforms previous methods in absolute pose estimation, achieving accurate pose predictions and high alignment accuracy.
Authors: Selim Furkan Tekin, Fatih Ilhan, Tiansheng Huang, Sihao Hu, Zachary Yahn, Ling Liu
Abstract: Alignment of pre-trained LLMs using instruction-based datasets is critical for creating fine-tuned models that reflect human preference. A growing number of alignment-based fine-tuning algorithms and benchmarks have emerged recently, fueling efforts toward effective alignment of pre-trained LLMs to ensure helpful, harmless, and honest answers from both open-source and closed-source LLMs. This paper tackles this problem by developing an alignment fusion approach, coined $H^3$Fusion, with three unique characteristics. First, $H^3$Fusion ensembles multiple individually aligned LLMs to create a final fine-tuned alignment model with enhanced capabilities beyond those of individual models, delivering robust alignment by promoting helpful, harmless, and honest fusion. Second, $H^3$Fusion leverages the mixture-of-experts (MoE) methodology in two steps. We first freeze the multi-head attention weights of each individual model while tuning the FFN layer during alignment fusion. Then we merge the aligned model weights with an expert router according to the type of input instruction and dynamically select a subset of experts that are best suited for producing the output response. Finally, we boost the performance of the resulting $H^3$Fusion model by introducing gating loss and regularization terms. The former penalizes the selection errors of the expert router, and the latter mediates the drift of expert weights during fine-tuning and dynamically adjusts the fusion behavior of the resulting model by canalizing the activations on the experts. Extensive evaluations on three benchmark datasets show that $H^3$Fusion is more helpful, less harmful, and more honest in two respects: it outperforms each individually aligned model by $11.37\%$, and it provides stronger robustness compared to the state-of-the-art LLM ensemble approaches by $13.77\%$. Code is available at github.com/sftekin/h3fusion.
Authors: Jiangbin Zheng, Ge Wang, Han Zhang, Stan Z. Li
Abstract: Computational protein design (CPD) offers transformative potential for bioengineering, but current deep CPD models, focused on universal domains, struggle with function-specific designs. This work introduces a novel CPD paradigm tailored for functional design tasks, particularly for enzymes, a key protein class often lacking specific application efficiency. To address structural data scarcity, we present CrossDesign, a domain-adaptive framework that leverages pretrained protein language models (PPLMs). By aligning protein structures with sequences, CrossDesign transfers pretrained knowledge to structure models, overcoming the limitations of limited structural data. The framework combines autoregressive (AR) and non-autoregressive (NAR) states in its encoder-decoder architecture, applying it to enzyme datasets and pan-proteins. Experimental results highlight CrossDesign's superior performance and robustness, especially with out-of-domain enzymes. Additionally, the model excels in fitness prediction when tested on large-scale mutation data, showcasing its stability.
Authors: Jiangbin Zheng, Qianhui Xu, Ruichen Xia, Stan Z. Li
Abstract: Identifying T-cell receptors (TCRs) that interact with antigenic peptides provides the technical basis for developing vaccines and immunotherapies. The emergent deep learning methods excel at learning antigen binding patterns from known TCRs but struggle with novel or sparsely represented antigens. However, binding specificity for unseen antigens or exogenous peptides is critical. We introduce a domain-adaptive peptide-agnostic learning framework DapPep for universal TCR-antigen binding affinity prediction to address this challenge. The lightweight self-attention architecture combines a pre-trained protein language model with an inner-loop self-supervised regime to enable robust TCR-peptide representations. Extensive experiments on various benchmarks demonstrate that DapPep consistently outperforms existing tools, showcasing robust generalization capability, especially for data-scarce settings and unseen peptides. Moreover, DapPep proves effective in challenging clinical tasks such as sorting reactive T cells in tumor neoantigen therapy and identifying key positions in 3D structures.
Authors: Aman Sinha, Payam Nikdel, Supratik Paul, Shimon Whiteson
Abstract: Ensuring the safety of autonomous vehicles (AVs) requires both accurate estimation of their performance and efficient discovery of potential failure cases. This paper introduces Bayesian adaptive multifidelity sampling (BAMS), which leverages the power of adaptive Bayesian sampling to achieve efficient discovery while simultaneously estimating the rate of adverse events. BAMS prioritizes exploration of regions with potentially low performance, leading to the identification of novel and critical scenarios that traditional methods might miss. Using real-world AV data we demonstrate that BAMS discovers 10 times as many issues as Monte Carlo (MC) and importance sampling (IS) baselines, while at the same time generating rate estimates with variances 15 and 6 times narrower than MC and IS baselines respectively.
Authors: Jeovane Honorio Alves, Radu State, Cinthia Obladen de Almendra Freitas, Jean Paul Barddal
Abstract: In an era of information overload, manually annotating the vast and growing corpus of documents and scholarly papers is increasingly impractical. Automated keyphrase extraction addresses this challenge by identifying representative terms within texts. However, most existing methods focus on short documents (up to 512 tokens), leaving a gap in processing long-context documents. In this paper, we introduce LongKey, a novel framework for extracting keyphrases from lengthy documents, which uses an encoder-based language model to capture extended text intricacies. LongKey uses a max-pooling embedder to enhance keyphrase candidate representation. Validated on the comprehensive LDKP datasets and six diverse, unseen datasets, LongKey consistently outperforms existing unsupervised and language model-based keyphrase extraction methods. Our findings demonstrate LongKey's versatility and superior performance, marking an advancement in keyphrase extraction for varied text lengths and domains.
Authors: Marco Colussi, Sergio Mascetti, Jose Dolz, Christian Desrosiers
Abstract: The remarkable progress in deep learning (DL) showcases outstanding results in various computer vision tasks. However, adaptation to real-time variations in data distributions remains an important challenge. Test-Time Training (TTT) was proposed as an effective solution to this issue, which increases the generalization ability of trained models by adding an auxiliary task at train time and then using its loss at test time to adapt the model. Inspired by the recent achievements of contrastive representation learning in unsupervised tasks, we propose ReC-TTT, a test-time training technique that can adapt a DL model to new unseen domains by generating discriminative views of the input data. ReC-TTT uses cross-reconstruction as an auxiliary task between a frozen encoder and two trainable encoders, taking advantage of a single shared decoder. This enables, at test time, to adapt the encoders to extract features that will be correctly reconstructed by the decoder that, in this phase, is frozen on the source domain. Experimental results show that ReC-TTT achieves better results than other state-of-the-art techniques in most domain shift classification challenges.
Authors: Haniyeh Ehsani Oskouie, Christina Chance, Claire Huang, Margaret Capetz, Elizabeth Eyeson, Majid Sarrafzadeh
Abstract: Content moderation and toxicity classification represent critical tasks with significant social implications. However, studies have shown that major classification models exhibit tendencies to magnify or reduce biases and potentially overlook or disadvantage certain marginalized groups within their classification processes. Researchers suggest that the positionality of annotators influences the gold-standard labels from which the models learn, propagating annotators' bias. To further investigate the impact of annotator positionality, we fine-tune BERTweet and HateBERT on the dataset while using topic-modeling strategies for content moderation. The results indicate that fine-tuning the models on specific topics results in a notable improvement in the F1 score of the models when compared to the predictions generated by other prominent classification models such as GPT-4, PerspectiveAPI, and RewireAPI. These findings further reveal that state-of-the-art large language models exhibit significant limitations in accurately detecting and interpreting text toxicity compared with earlier methodologies. Code is available at https://github.com/aheldis/Toxicity-Classification.git.
URLs: https://github.com/aheldis/Toxicity-Classification.git
Authors: Meng Wang, Zach Noonan, Pnina Gershon, Shannon C. Roberts
Abstract: Predicting crash likelihood in complex driving environments is essential for improving traffic safety and advancing autonomous driving. Previous studies have used statistical models and deep learning to predict crashes based on semantic, contextual, or driving features, but none have examined the combined influence of these factors, termed roadway complexity in this study. This paper introduces a two-stage framework that integrates roadway complexity features for crash prediction. In the first stage, an encoder extracts hidden contextual information from these features, generating complexity-infused features. The second stage uses both original and complexity-infused features to predict crash likelihood, achieving an accuracy of 87.98% with original features alone and 90.15% with the added complexity-infused features. Ablation studies confirm that a combination of semantic, driving, and contextual features yields the best results, emphasizing their role in capturing roadway complexity. Additionally, complexity index annotations generated by Large Language Models outperform those by Amazon Mechanical Turk, highlighting the potential of automated tools for accurate, scalable crash prediction systems.
Authors: Muhammad Waseem Akram, Marco Vannucci, Giorgio Buttazzo, Valentina Colla, Stefano Roccella, Andrea Vannini, Giovanni Caruso, Simone Nesi, Alessandra Francini, Luca Sebastiani
Abstract: The leaf area index determines crop health and growth. Traditional methods for calculating it are time-consuming, destructive, costly, and limited in scale. In this study, we automate the index estimation using drone image data of grapevine plants and a machine learning model. Traditional feature extraction and deep learning methods are used to obtain helpful information from the data and enhance the performance of the different machine learning models employed for leaf area index prediction. The results showed that deep learning-based feature extraction is more effective than traditional methods. The new approach is a significant improvement over older methods, offering a faster, non-destructive, and cost-effective leaf area index calculation that enhances precision agriculture practices.
Authors: Yannay Alon, Steve Hanneke, Shay Moran, Uri Shalit
Abstract: Classic supervised learning involves algorithms trained on $n$ labeled examples to produce a hypothesis $h \in \mathcal{H}$ aimed at performing well on unseen examples. Meta-learning extends this by training across $n$ tasks, with $m$ examples per task, producing a hypothesis class $\mathcal{H}$ within some meta-class $\mathbb{H}$. This setting applies to many modern problems such as in-context learning, hypernetworks, and learning-to-learn. A common method for evaluating the performance of supervised learning algorithms is through their learning curve, which depicts the expected error as a function of the number of training examples. In meta-learning, the learning curve becomes a two-dimensional learning surface, which evaluates the expected error on unseen domains for varying values of $n$ (number of tasks) and $m$ (number of training examples). Our findings characterize the distribution-free learning surfaces of meta-Empirical Risk Minimizers when either $m$ or $n$ tend to infinity: we show that the number of tasks must increase inversely with the desired error. In contrast, we show that the number of examples exhibits very different behavior: it satisfies a dichotomy where every meta-class conforms to one of the following conditions: (i) either $m$ must grow inversely with the error, or (ii) a \emph{finite} number of examples per task suffices for the error to vanish as $n$ goes to infinity. This finding illustrates and characterizes cases in which a small number of examples per task is sufficient for successful learning. We further refine this for positive values of $\varepsilon$ and identify for each $\varepsilon$ how many examples per task are needed to achieve an error of $\varepsilon$ in the limit as the number of tasks $n$ goes to infinity. We achieve this by developing a necessary and sufficient condition for meta-learnability using a bounded number of examples per domain.
Authors: Daniel Weitekamp, Erik Harpstead, Kenneth Koedinger
Abstract: AI2T is an interactively teachable AI for authoring intelligent tutoring systems (ITSs). Authors tutor AI2T by providing a few step-by-step solutions and then grading AI2T's own problem-solving attempts. From just 20-30 minutes of interactive training, AI2T can induce robust rules for step-by-step solution tracking (i.e., model-tracing). As AI2T learns it can accurately estimate its certainty of performing correctly on unseen problem steps using STAND: a self-aware precondition learning algorithm that outperforms state-of-the-art methods like XGBoost. Our user study shows that authors can use STAND's certainty heuristic to estimate when AI2T has been trained on enough diverse problems to induce correct and complete model-tracing programs. AI2T-induced programs are more reliable than hallucination-prone LLMs and prior authoring-by-tutoring approaches. With its self-aware induction of hierarchical rules, AI2T offers a path toward trustable data-efficient authoring-by-tutoring for complex ITSs that normally require as many as 200-300 hours of programming per hour of instruction.
Authors: Sankalp Sinha, Mohammad Sadil Khan, Muhammad Usama, Shino Sam, Didier Stricker, Sk Aziz Ali, Muhammad Zeshan Afzal
Abstract: Generating high-fidelity 3D content from text prompts remains a significant challenge in computer vision due to the limited size, diversity, and annotation depth of the existing datasets. To address this, we introduce MARVEL-40M+, an extensive dataset with 40 million text annotations for over 8.9 million 3D assets aggregated from seven major 3D datasets. Our contribution is a novel multi-stage annotation pipeline that integrates open-source pretrained multi-view VLMs and LLMs to automatically produce multi-level descriptions, ranging from detailed (150-200 words) to concise semantic tags (10-20 words). This structure supports both fine-grained 3D reconstruction and rapid prototyping. Furthermore, we incorporate human metadata from source datasets into our annotation pipeline to add domain-specific information in our annotation and reduce VLM hallucinations. Additionally, we develop MARVEL-FX3D, a two-stage text-to-3D pipeline. We fine-tune Stable Diffusion with our annotations and use a pretrained image-to-3D network to generate 3D textured meshes within 15s. Extensive evaluations show that MARVEL-40M+ significantly outperforms existing datasets in annotation quality and linguistic diversity, achieving win rates of 72.41% by GPT-4 and 73.40% by human evaluators.
Authors: Seungyeon Kim, Wheesung Lee, Sung-Ho Ahn, Do-Eun Lee, Tae-Rin Lee
Abstract: Accurate prediction of cerebral blood flow is essential for the diagnosis and treatment of cerebrovascular diseases. Traditional computational methods, however, often incur significant computational costs, limiting their practicality in real-time clinical applications. This paper proposes a graph neural network (GNN) to predict blood flow and pressure in previously unseen cerebral vascular network structures that were not included in training data. The GNN was developed using clinical datasets from patients with stenosis, featuring complex and abnormal vascular geometries. Additionally, the GNN model was trained on data incorporating a wide range of inflow conditions, vessel topologies, and network connectivities to enhance its generalization capability. The approach achieved Pearson's correlation coefficients of 0.727 for pressure and 0.824 for flow rate, with sufficient training data. These findings demonstrate the potential of the GNN for real-time cerebrovascular diagnostics, particularly in handling intricate and pathological vascular networks.
Authors: Zhenyu Yu
Abstract: The forest serves as the most significant terrestrial carbon stock mechanism, effectively reducing atmospheric CO$_2$ concentrations and mitigating climate change. Remote sensing provides high data accuracy and enables large-scale observations. Optical images facilitate long-term monitoring, which is crucial for future carbon stock estimation studies. This study focuses on Huize County, Qujing City, Yunnan Province, China, utilizing GF-1 WFV satellite imagery. The KD-VGG and KD-UNet modules were introduced for initial feature extraction, and the improved implicit diffusion model (IIDM) was proposed. The results showed: (1) The VGG module improved initial feature extraction, improving accuracy and reducing inference time with optimized model parameters. (2) The cross-attention + MLPs module enabled effective feature fusion, establishing critical relationships between global and local features and achieving high-accuracy estimation. (3) The IIDM model, a novel contribution, demonstrated the highest estimation accuracy with an RMSE of 12.17\%, an improvement of 41.69\% to 42.33\% over the regression model. In carbon stock estimation, the generative model excelled in extracting deeper features, significantly outperforming other models and demonstrating the feasibility of AI-generated content in quantitative remote sensing. The 16-meter resolution estimates provide a robust basis for tailoring forest carbon sink regulations, enhancing regional carbon stock management.
Authors: Tian Bai, Ying Jin
Abstract: Model selection/optimization in conformal inference is challenging, since it may break the exchangeability between labeled and unlabeled data. We study this problem in the context of conformal selection, which uses conformal p-values to select ``interesting'' instances with large unobserved labels from a pool of unlabeled data, while controlling the false discovery rate (FDR) in finite samples. For validity, existing solutions require the model choice to be independent of the data used to construct the p-values and calibrate the selection set. However, when presented with many model choices and limited labeled data, it is desirable to (i) select the best model in a data-driven manner, and (ii) mitigate power loss due to sample splitting. This paper presents OptCS, a general framework that allows valid statistical testing (selection) after flexible data-driven model optimization. We introduce general conditions under which OptCS constructs valid conformal p-values despite substantial data reuse and handles complex p-value dependencies to maintain finite-sample FDR control via a novel multiple testing procedure. We instantiate this general recipe to propose three FDR-controlling procedures, each optimizing the models differently: (i) selecting the most powerful one among multiple pre-trained candidate models, (ii) using all data for model fitting without sample splitting, and (iii) combining full-sample model fitting and selection. We demonstrate the efficacy of our methods via simulation studies and real applications in drug discovery and alignment of large language models in radiology report generation.
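For orientation, the generic split-conformal selection recipe that OptCS builds on (standard conformal p-values followed by Benjamini-Hochberg; this is not the OptCS procedure itself) can be sketched as:

import numpy as np

def conformal_pvalues(calib_scores, test_scores):
    # p_j = (1 + #{calibration scores >= test score j}) / (n + 1),
    # so larger ("more interesting") test scores yield smaller p-values.
    n = len(calib_scores)
    return np.array([(1 + np.sum(calib_scores >= s)) / (n + 1)
                     for s in test_scores])

def benjamini_hochberg(pvals, q=0.1):
    # Reject the k smallest p-values, k = max{i : p_(i) <= q*i/m}.
    order = np.argsort(pvals)
    thresh = q * np.arange(1, len(pvals) + 1) / len(pvals)
    passed = np.where(pvals[order] <= thresh)[0]
    if passed.size == 0:
        return np.array([], dtype=int)
    return order[: passed.max() + 1]   # indices of selected instances

rng = np.random.default_rng(1)
calib = rng.normal(size=100)             # scores on labeled calibration data
test = rng.normal(loc=1.0, size=20)      # unlabeled pool, shifted upward
print(benjamini_hochberg(conformal_pvalues(calib, test), q=0.2))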
Authors: Andreas Madsen
Abstract: As machine learning becomes more widespread and is used in more critical applications, it is important to provide explanations for these models to prevent unintended behavior. Unfortunately, many current interpretability methods struggle with faithfulness. Therefore, this Ph.D. thesis investigates the question "How to provide and ensure faithful explanations for complex general-purpose neural NLP models?" The main thesis is that we should develop new paradigms in interpretability. This is achieved by first developing solid faithfulness metrics and then applying the lessons learned from this investigation to develop new paradigms. The two new paradigms explored are faithfulness measurable models (FMMs) and self-explanations. The idea of self-explanations is to have large language models explain themselves; we identify that current models are not capable of doing this consistently, but we suggest how it could be achieved. The idea of FMMs is to create models designed such that measuring faithfulness is cheap and precise, making it possible to optimize an explanation towards maximum faithfulness; FMMs are thus designed to be explained. We find that FMMs yield explanations that are near the theoretical optimum in terms of faithfulness. Overall, across all investigations of faithfulness, results show that post-hoc and intrinsic explanations are by default model- and task-dependent. However, this was not the case when using FMMs, even with the same post-hoc explanation methods. This shows that even simple modifications to the model, such as randomly masking the training dataset as done in FMMs, can drastically change the situation and result in consistently faithful explanations. This answers the question of how to provide and ensure faithful explanations.
Authors: Weiwen Yuan, Jinke Ren, Chongjie Wang, Ruichen Zhang, Jun Wei, Dong In Kim, Shuguang Cui
Abstract: Semantic communication has emerged as a promising technology for enhancing communication efficiency. However, most existing research emphasizes single-task reconstruction, neglecting model adaptability and generalization across multi-task systems. In this paper, we propose a novel generative semantic communication system that supports both image reconstruction and segmentation tasks. Our approach builds upon semantic knowledge bases (KBs) at both the transmitter and receiver, with each semantic KB comprising a source KB and a task KB. The source KB at the transmitter leverages a hierarchical Swin-Transformer, a generative AI scheme, to extract multi-level features from the input image. Concurrently, the counterpart source KB at the receiver utilizes hierarchical residual blocks to generate task-specific knowledge. Furthermore, the two task KBs adopt a semantic similarity model to map different task requirements into pre-defined task instructions, thereby facilitating the feature selection of the source KBs. Additionally, we develop a unified residual block-based joint source and channel coding (JSCC) encoder and two task-specific JSCC decoders to achieve the two image tasks. In particular, a generative diffusion model is adopted to construct the JSCC decoder for the image reconstruction task. Experimental results demonstrate that our multi-task generative semantic communication system outperforms previous single-task communication systems in terms of peak signal-to-noise ratio and segmentation accuracy.
Authors: Jonathan Soriano, Srinath Saikrishnan, Vikram Seenivasan, Bernie Boscoe, Jack Singal, Tuan Do
Abstract: In this work, we explore methods to improve galaxy redshift predictions by combining different ground truths. Traditional machine learning models rely on training sets with known spectroscopic redshifts, which are precise but only represent a limited sample of galaxies. To make redshift models more generalizable to the broader galaxy population, we investigate transfer learning and directly combining ground truth redshifts derived from photometry and spectroscopy. We use the COSMOS2020 survey to create a dataset, TransferZ, which includes photometric redshift estimates derived from up to 35 imaging filters using template fitting. This dataset spans a wider range of galaxy types and colors compared to spectroscopic samples, though its redshift estimates are less accurate. We first train a base neural network on TransferZ and then refine it using transfer learning on a dataset of galaxies with more precise spectroscopic redshifts (GalaxiesML). In addition, we train a neural network on a combined dataset of TransferZ and GalaxiesML. Both methods reduce bias by $\sim$ 5x, RMS error by $\sim$ 1.5x, and catastrophic outlier rates by 1.3x on GalaxiesML, compared to a baseline trained only on TransferZ. However, we also find a reduction in performance for RMS and bias when evaluated on TransferZ data. Overall, our results demonstrate these approaches can meet cosmological requirements.
Authors: Yalcin Tur, Vedat Cicek, Tufan Cinar, Elif Keles, Bradlay D. Allen, Hatice Savas, Gorkem Durak, Alpay Medetalibeyoglu, Ulas Bagci
Abstract: Pulmonary Embolism (PE) is a serious cardiovascular condition that remains a leading cause of mortality and critical illness, underscoring the need for enhanced diagnostic strategies. Conventional clinical methods have limited success in predicting the 30-day in-hospital mortality of PE patients. In this study, we present a new algorithm, called PEP-Net, for 30-day mortality prediction of PE patients based on initial imaging data (CT). It opportunistically integrates a 3D Residual Network (3DResNet) with the Extreme Gradient Boosting (XGBoost) algorithm, using patient-level binary labels without annotations of the emboli or their extent. Our proposed system offers a comprehensive prediction strategy by handling class imbalance, reducing overfitting via regularization, and reducing prediction variance for more stable predictions. PEP-Net was tested on a cohort of 193 volumetric CT scans of patients diagnosed with acute PE, and it demonstrated superior performance, significantly outperforming baseline models (76-78\%) with an accuracy of 94.5\% (+/-0.3) when the input is the lung region (Lung-ROI) and 94.0\% (+/-0.7) for the heart region (Cardiac-ROI). Our results advance PE prognostics by using only initial imaging data, setting a new benchmark in the field. While purely deep learning models have become the go-to choice for many medical classification (diagnostic) tasks, the combined ResNet and XGBoost model outperforms sole deep learning models here, likely because the available data are insufficient for purely deep models.
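A hedged sketch of the hybrid design, pairing a frozen 3D ResNet feature extractor with XGBoost; the backbone choice, input shapes, and labels below are placeholders rather than the authors' configuration:

import torch
import torchvision.models.video as video_models
from xgboost import XGBClassifier

backbone = video_models.r3d_18(weights=None)   # generic 3D ResNet backbone
backbone.fc = torch.nn.Identity()              # expose 512-dim pooled features
backbone.eval()

# Placeholder CT volumes: (batch, channels, depth, height, width).
volumes = torch.randn(16, 3, 16, 112, 112)
with torch.no_grad():
    feats = backbone(volumes).numpy()          # (16, 512) feature matrix

labels = torch.randint(0, 2, (16,)).numpy()    # synthetic 30-day mortality labels
clf = XGBClassifier(n_estimators=50, max_depth=3)
clf.fit(feats, labels)                         # boosted trees on CNN features
print(clf.predict_proba(feats[:2]))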
Authors: Akshat Sharma, Hangliang Ding, Jianping Li, Neel Dani, Minjia Zhang
Abstract: How to efficiently serve LLMs in practice has become exceptionally challenging due to their prohibitive memory and computation requirements. In this study, we investigate optimizing the KV cache, whose memory footprint poses a critical bottleneck in LLM inference, especially when dealing with long context tasks. To tackle the challenge, we introduce MiniKV, a KV cache optimization method that simultaneously preserves long context task accuracy while significantly reducing KV cache size via a novel 2-bit layer-discriminative KV cache. More importantly, we develop specialized CUDA kernels to make MiniKV compatible with FlashAttention. Experiments on a wide range of long context tasks show that MiniKV effectively achieves 86% KV cache compression ratio while recovering over 98.5% of accuracy, outperforming state-of-the-art methods while achieving excellent measured system performance improvements.
Authors: Weiqin Zhao, Ziyu Guo, Yinshuang Fan, Yuming Jiang, Maximus Yeung, Lequan Yu
Abstract: Due to the large size and lack of fine-grained annotation, Whole Slide Images (WSIs) analysis is commonly approached as a Multiple Instance Learning (MIL) problem. However, previous studies only learn from training data, posing a stark contrast to how human clinicians teach each other and reason about histopathologic entities and factors. Here we present a novel knowledge concept-based MIL framework, named ConcepPath to fill this gap. Specifically, ConcepPath utilizes GPT-4 to induce reliable diseasespecific human expert concepts from medical literature, and incorporate them with a group of purely learnable concepts to extract complementary knowledge from training data. In ConcepPath, WSIs are aligned to these linguistic knowledge concepts by utilizing pathology vision-language model as the basic building component. In the application of lung cancer subtyping, breast cancer HER2 scoring, and gastric cancer immunotherapy-sensitive subtyping task, ConcepPath significantly outperformed previous SOTA methods which lack the guidance of human expert knowledge.
Authors: Yifan Zhang
Abstract: The rapid advancement of large language models (LLMs) such as GPT-3, PaLM, and Llama has significantly transformed natural language processing, showcasing remarkable capabilities in understanding and generating language. However, these models often struggle with tasks requiring complex reasoning, particularly in mathematical problem-solving, due in part to the scarcity of large-scale, high-quality, domain-specific datasets necessary for training sophisticated reasoning abilities. To address this limitation, we introduce Template-based Data Generation (TDG), a novel approach that leverages LLMs (GPT-4) to automatically generate parameterized meta-templates, which are then used to synthesize a vast array of high-quality problems and solutions. Leveraging TDG, we create TemplateMath Part I: TemplateGSM, a dataset comprising over 7 million synthetically generated grade school math problems--each accompanied by code-based and natural language solutions--with the potential to generate an effectively unlimited number more. This dataset alleviates the scarcity of large-scale mathematical datasets and serves as a valuable resource for pre-training, fine-tuning, and evaluating LLMs in mathematical reasoning. Our method not only enables the generation of virtually infinite data but also elevates data augmentation to a new level by using GPT-4 for meta-template generation, ensuring diverse and high-quality problem structures. The TemplateMath Part I: TemplateGSM dataset is publicly available at https://huggingface.co/datasets/math-ai/TemplateGSM. The code is available at https://github.com/iiis-ai/TemplateMath.
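Template-based generation can be illustrated with a deliberately simple toy template (not one of the paper's GPT-4-produced meta-templates):

import random

TEMPLATE = ("{name} has {a} apples and buys {b} more. "
            "How many apples does {name} have now?")

def generate_problem(rng):
    # Sample parameters, then render the question and both solution formats.
    a, b = rng.randint(2, 50), rng.randint(2, 50)
    name = rng.choice(["Ava", "Ben", "Chen"])
    question = TEMPLATE.format(name=name, a=a, b=b)
    code_solution = f"answer = {a} + {b}"
    text_solution = f"{name} has {a} + {b} = {a + b} apples."
    return {"question": question, "code": code_solution, "answer": text_solution}

rng = random.Random(42)
for problem in (generate_problem(rng) for _ in range(2)):
    print(problem)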
URLs: https://huggingface.co/datasets/math-ai/TemplateGSM, https://github.com/iiis-ai/TemplateMath
Authors: Silvan K\"aser, Debasish Koner, Markus Meuwly
Abstract: Atomistic simulations are a powerful tool for studying the dynamics of molecules, proteins, and materials on wide time and length scales. Their reliability and predictiveness, however, depend directly on the accuracy of the underlying potential energy surface (PES). Guided by the principle of parsimony, this work introduces KerNN, a combined kernel/neural network-based approach to represent molecular PESs. Compared to state-of-the-art neural network PESs, the number of learnable parameters of KerNN is significantly reduced. This speeds up training and evaluation times by several orders of magnitude while retaining high prediction accuracy. Importantly, using kernels as the features also improves the extrapolation capabilities of KerNN far beyond the coverage provided by the training data, which solves a general problem of NN-based PESs. KerNN applied to spectroscopy and reaction dynamics shows excellent performance on test set statistics and observables, including vibrational bands computed from classical and quantum simulations.
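The kernels-as-features idea can be sketched as follows, with a toy surrogate energy function and an assumed set of reference points standing in for molecular data:

import numpy as np
from sklearn.neural_network import MLPRegressor

def kernel_features(X, refs, gamma=0.5):
    # RBF kernel k(x, r) = exp(-gamma * ||x - r||^2) for each reference r;
    # these similarities become the (few) input features of a small NN.
    d2 = ((X[:, None, :] - refs[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * d2)

rng = np.random.default_rng(0)
X = rng.uniform(-1, 1, size=(500, 3))          # toy "coordinates"
energy = np.sin(X).sum(axis=1)                 # toy potential energy surface
refs = X[:10]                                  # assumed reference geometries

model = MLPRegressor(hidden_layer_sizes=(16,), max_iter=3000, random_state=0)
model.fit(kernel_features(X, refs), energy)
print(model.score(kernel_features(X, refs), energy))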
Authors: Muhammad Al-Zafar Khan, Jamal Al-Karaki, Marwan Omar
Abstract: In this study, we consider a real-world application of quantum machine learning (QML) techniques to study water quality in the U20A region in Durban, South Africa. Specifically, we applied the quantum support vector classifier (QSVC) and quantum neural network (QNN), and we showed that the QSVC is easier to implement and yields higher accuracy. The QSVC models were applied with three kernels: linear, polynomial, and radial basis function (RBF), and the polynomial and RBF kernels were shown to have exactly the same performance. The QNN model was applied with different optimizers, learning rates, noise levels on the circuit components, and weight initializations, but it persistently ran into the dead neuron problem. Thus, the QNN was compared only by accuracy and loss; the model performed best with the Adam optimizer, but still worse than the QSVC.
Authors: Felix Igelbrink, Marian Renz, Martin G\"unther, Piper Powell, Lennart Niecksch, Oscar Lima, Martin Atzmueller, Joachim Hertzberg
Abstract: Semantic mapping is a key component of robots operating in and interacting with objects in structured environments. Traditionally, geometric and knowledge representations within a semantic map have only been loosely integrated. However, recent advances in deep learning now allow full integration of prior knowledge, represented as knowledge graphs or language concepts, into sensor data processing and semantic mapping pipelines. Semantic scene graphs and language models enable modern semantic mapping approaches to incorporate graph-based prior knowledge or to leverage the rich information in human language both during and after the mapping process. This has sparked substantial advances in semantic mapping, leading to previously impossible novel applications. This survey reviews these recent developments comprehensively, with a focus on online integration of knowledge into semantic mapping. We specifically focus on methods using semantic scene graphs to integrate symbolic prior knowledge and language models to capture implicit common-sense knowledge and natural language concepts.
Authors: Ehsan Kabir, Austin R. J. Downey, Jason D. Bakos, David Andrews, Miaoqing Huang
Abstract: Transformer neural networks (TNN) excel in natural language processing (NLP), machine translation, and computer vision (CV) without relying on recurrent or convolutional layers. However, they have high computational and memory demands, particularly on resource-constrained devices like FPGAs. Moreover, transformer models vary in processing time across applications, requiring custom models with specific parameters. Designing custom accelerators for each model is complex and time-intensive. Some custom accelerators exist with no runtime adaptability, and they often rely on sparse matrices to reduce latency. However, hardware designs become more challenging due to the need for application-specific sparsity patterns. This paper introduces ADAPTOR, a runtime-adaptive accelerator for dense matrix computations in transformer encoders and decoders on FPGAs. ADAPTOR enhances the utilization of processing elements and on-chip memory, enhancing parallelism and reducing latency. It incorporates efficient matrix tiling to distribute resources across FPGA platforms and is fully quantized for computational efficiency and portability. Evaluations on Xilinx Alveo U55C data center cards and embedded platforms like VC707 and ZCU102 show that our design is 1.2$\times$ and 2.87$\times$ more power efficient than the NVIDIA K80 GPU and the i7-8700K CPU respectively. Additionally, it achieves a speedup of 1.7 to 2.25$\times$ compared to some state-of-the-art FPGA-based accelerators.
Authors: Mohamad Abubaker, Zubayda Alsadder, Hamed Abdelhaq, Maik Boltes, Ahmed Alia
Abstract: The automatic detection of pedestrian heads in crowded environments is essential for crowd analysis and management tasks, particularly in high-risk settings such as railway platforms and event entrances. These environments, characterized by dense crowds and dynamic movements, are underrepresented in public datasets, posing challenges for existing deep learning models. To address this gap, we introduce the Railway Platforms and Event Entrances-Heads (RPEE-Heads) dataset, a novel, diverse, high-resolution, and accurately annotated resource. It includes 109,913 annotated pedestrian heads across 1,886 images from 66 video recordings, with an average of 56.2 heads per image. Annotations include bounding boxes for visible head regions. In addition to introducing the RPEE-Heads dataset, this paper evaluates eight state-of-the-art object detection algorithms using the RPEE-Heads dataset and analyzes the impact of head size on detection accuracy. The experimental results show that You Only Look Once v9 and Real-Time Detection Transformer outperform the other algorithms, achieving mean average precisions of 90.7% and 90.8%, with inference times of 11 and 14 milliseconds, respectively. Moreover, the findings underscore the need for specialized datasets like RPEE-Heads for training and evaluating accurate models for head detection in railway platforms and event entrances. The dataset and pretrained models are available at https://doi.org/10.34735/ped.2024.2.
Authors: Samuele Pasini, Jinhan Kim, Tommaso Aiello, Rocio Cabrera Lozoya, Antonino Sabetta, Paolo Tonella
Abstract: Large Language Models (LLMs) are increasingly used in software development to generate functions, such as attack detectors, that implement security requirements. However, LLMs struggle to generate accurate code, resulting, e.g., in attack detectors that miss well-known attacks when used in practice. This is most likely due to the LLM lacking knowledge about some existing attacks and to the generated code not being evaluated in real usage scenarios. We propose a novel approach integrating Retrieval Augmented Generation (RAG) and Self-Ranking into the LLM pipeline. RAG enhances the robustness of the output by incorporating external knowledge sources, while the Self-Ranking technique, inspired by the concept of Self-Consistency, generates multiple reasoning paths and ranks them to select the most robust detector. Our extensive empirical study targets code generated by LLMs to detect two prevalent injection attacks in web security: Cross-Site Scripting (XSS) and SQL injection (SQLi). Results show a significant improvement in detection performance compared to baselines, with an increase of up to 71 and 37 percentage points in the F2-Score for XSS and SQLi detection, respectively.
Authors: Aladin Djuhera, Vlad C. Andrei, Mohsen Pourghasemian, Haris Gacanin, Holger Boche, Walid Saad
Abstract: Multi-task large language models (MTLLMs) are important for many applications at the wireless edge, where users demand specialized models to handle multiple tasks efficiently. However, training MTLLMs is complex and exhaustive, particularly when tasks are subject to change. Recently, the concept of model fusion via task vectors has emerged as an efficient approach for combining fine-tuning parameters to produce an MTLLM. In this paper, the problem of enabling edge users to collaboratively craft such MTLLMs via task vectors is studied, under the assumption of worst-case adversarial attacks. To this end, the influence of adversarial noise on multi-task model fusion is first investigated, and a relationship between the so-called weight disentanglement error and the mean squared error (MSE) is derived. Using hypothesis testing, it is directly shown that the MSE increases interference between task vectors, thereby rendering model fusion ineffective. Then, a novel resilient MTLLM fusion (R-MTLLMF) scheme is proposed, which leverages insights about the LLM architecture and fine-tuning process to safeguard task vector aggregation under adversarial noise by realigning the MTLLM. The proposed R-MTLLMF is then compared under both worst-case and ideal transmission scenarios to study the impact of the wireless channel. Extensive model fusion experiments with vision LLMs demonstrate R-MTLLMF's effectiveness, achieving close-to-baseline performance across eight different tasks in ideal noise scenarios and significantly outperforming unprotected model fusion in worst-case scenarios. The results further advocate for additional physical layer protection for a holistic approach to resilience, from both a wireless and an LLM perspective.
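Plain task-vector arithmetic, the fusion step this work builds on, can be sketched as follows (the R-MTLLMF realignment defense itself is not reproduced; the scaling factor and toy weights are placeholders):

import torch

def task_vector(finetuned, base):
    # A task vector is the parameter delta produced by fine-tuning.
    return {k: finetuned[k] - base[k] for k in base}

def fuse(base, task_vectors, lam=0.3):
    # Multi-task fusion: add scaled task vectors onto the base weights.
    fused = {k: v.clone() for k, v in base.items()}
    for tv in task_vectors:
        for k in fused:
            fused[k] += lam * tv[k]
    return fused

base = {"w": torch.zeros(4)}
ft_a = {"w": torch.tensor([1.0, 0.0, 0.0, 0.0])}   # task A fine-tune (toy)
ft_b = {"w": torch.tensor([0.0, 1.0, 0.0, 0.0])}   # task B fine-tune (toy)
tvs = [task_vector(ft_a, base), task_vector(ft_b, base)]
print(fuse(base, tvs)["w"])   # multi-task weights from both task vectors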
Authors: Ruslan Idelfonso Magana Vsevolodovna
Abstract: Integrating new features into existing software projects can be a complex and time-consuming process. Feature-Factory leverages Generative AI with WatsonX.ai to automate the analysis, planning, and implementation of feature requests. By combining advanced project parsing, dependency resolution, and AI-generated code, the program ensures seamless integration of features into software systems while maintaining structural integrity. This paper presents the methodology, mathematical model, and results of the Feature-Factory framework.
Authors: Nadine M. Trummer, Amit Reza, Michael A. Steindorfer, Christiane Helling
Abstract: The growing number of man-made debris objects in Earth's orbit poses a threat to active satellite missions due to the risk of collision. Characterizing unknown debris is, therefore, of high interest. Light Curves (LCs) are temporal variations of object brightness and have been shown to contain information such as shape, attitude, and rotational state. Since 2015, the Satellite Laser Ranging (SLR) group of the Space Research Institute (IWF) Graz has been building a space debris LC catalogue. The LCs are captured on a Single Photon basis, which sets them apart from CCD-based measurements. In recent years, Machine Learning (ML) models have emerged as a viable technique for analyzing LCs. This work aims to classify Single Photon Space Debris using the ML framework. We have explored LC classification using k-Nearest Neighbour (k-NN), Random Forest (RDF), XGBoost (XGB), and Convolutional Neural Network (CNN) classifiers in order to assess the difference in performance between traditional and deep models. Instead of performing classification on the direct LC data, we first extracted features from the data using an automated pipeline. We apply our models to three tasks: classifying individual objects, classifying objects grouped into families according to origin (e.g., GLONASS satellites), and classifying objects grouped into general types (e.g., rocket bodies). We successfully classified Space Debris LCs captured on a Single Photon basis, obtaining accuracies as high as 90.7%. Further, our experiments show that the classifiers provide better classification accuracy with automatically extracted features than with other methods.
Authors: Daniel Scalena, Elisabetta Fersini, Malvina Nissim
Abstract: Adapting models to a language that was only partially present in the pre-training data requires fine-tuning, which is expensive in terms of both data and computational resources. As an alternative to fine-tuning, we explore the potential of activation steering-based techniques to enhance model performance on Italian tasks. Through our experiments we show that Italian steering (i) can be successfully applied to different models, (ii) achieves performances comparable to, or even better than, fine-tuned models for Italian, and (iii) yields higher quality and consistency in Italian generations. We also discuss the utility of steering and fine-tuning in the contemporary LLM landscape where models are anyway getting high Italian performances even if not explicitly trained in this language.
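A minimal activation-steering sketch using a forward hook; the model, layer index, and steering vector below are placeholders, and the paper's steering method may differ in detail:

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

name = "gpt2"  # placeholder model, not one used in the paper
tok = AutoTokenizer.from_pretrained(name)
model = AutoModelForCausalLM.from_pretrained(name)

# Placeholder steering direction; in practice it would be derived from
# contrasting activations (e.g., Italian vs. English prompts).
steer = torch.randn(model.config.n_embd) * 0.1

def hook(module, inputs, output):
    # Add the steering vector to the block's hidden states at every position.
    hidden = output[0] if isinstance(output, tuple) else output
    hidden = hidden + steer
    return (hidden,) + output[1:] if isinstance(output, tuple) else hidden

handle = model.transformer.h[6].register_forward_hook(hook)  # mid layer
ids = tok("The weather today is", return_tensors="pt")
print(tok.decode(model.generate(**ids, max_new_tokens=10)[0]))
handle.remove()  # restore the unsteered model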
Authors: Xiaohan Yu, Li Zhang, Xin Zhao, Yue Wang
Abstract: The recent breakthrough of large language models (LLMs) in natural language processing has sparked exploration in recommendation systems; however, their limited domain-specific knowledge remains a critical bottleneck. Specifically, LLMs lack key pieces of information crucial for sequential recommendations, such as user behavior patterns. To address this critical gap, we propose IDLE-Adapter, a novel framework that integrates pre-trained ID embeddings, rich in domain-specific knowledge, into LLMs to improve recommendation accuracy. IDLE-Adapter acts as a bridge, transforming sparse user-item interaction data into dense, LLM-compatible representations through a Pre-trained ID Sequential Model, Dimensionality Alignment, Layer-wise Embedding Refinement, and Layer-wise Distribution Alignment. Furthermore, IDLE-Adapter demonstrates remarkable flexibility by seamlessly integrating ID embeddings from diverse ID-based sequential models and LLM architectures. Extensive experiments across various datasets demonstrate the superiority of IDLE-Adapter, achieving over 10\% and 20\% improvements in HitRate@5 and NDCG@5 metrics, respectively, compared to state-of-the-art methods.
Authors: \v{S}imon Sedl\'a\v{c}ek, Santosh Kesiraju, Alexander Polok, Jan \v{C}ernock\'y
Abstract: This paper investigates a novel approach to end-to-end speech translation (ST) based on aligning frozen pre-trained automatic speech recognition (ASR) and machine translation (MT) models via a small connector module (Q-Former, our Subsampler-Transformer Encoder). This connector bridges the gap between the speech and text modalities, transforming ASR encoder embeddings into the latent representation space of the MT encoder while being the only part of the system optimized during training. Experiments are conducted on the How2 English-Portuguese dataset as we investigate the alignment approach in a small-scale scenario focusing on ST. While keeping the size of the connector module constant and small in comparison (<5% of the size of the larger aligned models), increasing the size and capability of the foundation ASR and MT models universally improves translation results. We also find that the connectors can serve as domain adapters for the foundation MT models, significantly improving translation performance in the aligned ST setting. We conclude that this represents a viable and scalable way of training end-to-end ST systems.
Authors: Esmaeel Mohammadi, Daniel Ortiz-Arroyo, Aviaja Anna Hansen, Mikkel Stokholm-Bjerregaard, Sebastien Gros, Akhil S Anand, Petar Durdevic
Abstract: Wastewater treatment plants face unique challenges for process control due to their complex dynamics, slow time constants, and stochastic delays in observations and actions. These characteristics make conventional control methods, such as Proportional-Integral-Derivative controllers, suboptimal for achieving efficient phosphorus removal, a critical component of wastewater treatment to ensure environmental sustainability. This study addresses these challenges using a novel deep reinforcement learning approach based on the Soft Actor-Critic algorithm, integrated with a custom simulator designed to model the delayed feedback inherent in wastewater treatment plants. The simulator incorporates Long Short-Term Memory networks for accurate multi-step state predictions, enabling realistic training scenarios. To account for the stochastic nature of delays, agents were trained under three delay scenarios: no delay, constant delay, and random delay. The results demonstrate that incorporating random delays into the reinforcement learning framework significantly improves phosphorus removal efficiency while reducing operational costs. Specifically, the delay-aware agent achieved a 36% reduction in phosphorus emissions, a 55% higher reward, a 77% lower target deviation from the regulatory limit, and 9% lower total costs than traditional control methods in the simulated environment. These findings underscore the potential of reinforcement learning to overcome the limitations of conventional control strategies in wastewater treatment, providing an adaptive and cost-effective solution for phosphorus removal.
Authors: Lara Scavuzzo, Karen Aardal, Neil Yorke-Smith
Abstract: Modern Mixed Integer Linear Programming (MILP) solvers use the Branch-and-Bound algorithm together with a plethora of auxiliary components that speed up the search. In recent years, there has been an explosive development in the use of machine learning for enhancing and supporting these algorithmic components. Within this line of work, we propose a methodology for predicting the optimal objective value, or, equivalently, predicting if the current incumbent is optimal. For this task, we introduce a predictor based on a graph neural network (GNN) architecture, together with a set of dynamic features. Experimental results on diverse benchmarks demonstrate the efficacy of our approach, achieving high accuracy in the prediction task and outperforming existing methods. These findings suggest new opportunities for integrating ML-driven predictions into MILP solvers, enabling smarter decision-making and improved performance.
Authors: Mathurin Videau, Alessandro Leite, Marc Schoenauer, Olivier Teytaud
Abstract: Mixture-of-Experts (MoE) models have shown promising potential for parameter-efficient scaling across various domains. However, their adoption in computer vision remains limited and often requires large-scale datasets comprising billions of samples. In this study, we investigate the integration of MoE within computer vision models and explore various MoE configurations on open datasets. When introducing MoE layers in image classification, the best results are obtained for models with a moderate number of activated parameters per sample. However, such improvements gradually vanish as the number of parameters per sample increases.
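For readers unfamiliar with MoE layers, the sketch below shows a generic top-k gated mixture of experts in PyTorch; the routing scheme, expert count, and sizes are illustrative, not the configurations studied in the paper.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MoELayer(nn.Module):
    """Top-k gated mixture of experts: only k experts run per token."""
    def __init__(self, dim=256, n_experts=8, k=2):
        super().__init__()
        self.k = k
        self.gate = nn.Linear(dim, n_experts)
        self.experts = nn.ModuleList(
            [nn.Sequential(nn.Linear(dim, 4 * dim), nn.GELU(), nn.Linear(4 * dim, dim))
             for _ in range(n_experts)]
        )

    def forward(self, x):                      # x: (tokens, dim)
        scores = self.gate(x)                  # (tokens, n_experts)
        weights, idx = scores.topk(self.k, dim=-1)
        weights = F.softmax(weights, dim=-1)   # normalize over the chosen experts
        out = torch.zeros_like(x)
        for slot in range(self.k):
            for e in range(len(self.experts)):
                mask = idx[:, slot] == e       # tokens routed to expert e in this slot
                if mask.any():
                    out[mask] += weights[mask, slot, None] * self.experts[e](x[mask])
        return out

layer = MoELayer()
tokens = torch.randn(32, 256)
print(layer(tokens).shape)  # torch.Size([32, 256])
```

The "activated parameters per sample" trade-off discussed above corresponds to varying `k` and the expert width while the total parameter count grows with `n_experts`.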
Authors: Luis Eduardo Pessoa, Cristovao Freitas Iglesias Jr, Claudio Miceli
Abstract: Designing resilient Internet of Things (IoT) systems requires i) identification of IoT Critical Objects (ICOs) such as services, devices, and resources, ii) threat analysis, and iii) mitigation strategy selection. However, the traditional process for designing resilient IoT systems is still manual, leading to inefficiencies and increased risks. In addition, while tools such as ChatGPT could support this manual and highly error-prone process, their use raises concerns over data privacy, inconsistent outputs, and internet dependence. Therefore, we propose RITA, an automated, open-source framework that uses a fine-tuned RoBERTa-based Named Entity Recognition (NER) model to identify ICOs from IoT requirement documents, correlate threats, and recommend countermeasures. RITA operates entirely offline and can be deployed on-site, safeguarding sensitive information and delivering consistent outputs that enhance standardization. In our empirical evaluation, RITA outperformed ChatGPT in four of seven ICO categories, particularly in actuator, sensor, network resource, and service identification, using both human-annotated and ChatGPT-generated test data. These findings indicate that RITA can improve resilient IoT design by effectively supporting key security operations, offering a practical solution for developing robust IoT architectures.
Authors: Amruta Parulekar, Abhishek Gupta, Sameep Chattopadhyay, Preethi Jyothi
Abstract: Spontaneous or conversational multilingual speech presents many challenges for state-of-the-art automatic speech recognition (ASR) systems. In this work, we present a new technique AMPS that augments a multilingual multimodal ASR system with paraphrase-based supervision for improved conversational ASR in multiple languages, including Hindi, Marathi, Malayalam, Kannada, and Nyanja. We use paraphrases of the reference transcriptions as additional supervision while training the multimodal ASR model and selectively invoke this paraphrase objective for utterances with poor ASR performance. Using AMPS with a state-of-the-art multimodal model SeamlessM4T, we obtain significant relative reductions in word error rates (WERs) of up to 5%. We present detailed analyses of our system using both objective and human evaluation metrics.
Authors: Denys Rozumnyi, Nadine Bertsch, Othman Sbai, Filippo Arcadu, Yuhua Chen, Artsiom Sanakoyeu, Manoj Kumar, Catherine Herold, Robin Kips
Abstract: Tracking the full body motions of users in XR (AR/VR) devices is a fundamental challenge to bring a sense of authentic social presence. Due to the absence of dedicated leg sensors, currently available body tracking methods adopt a synthesis approach to generate plausible motions given a 3-point signal from the head and controller tracking. In order to enable mixed reality features, modern XR devices are capable of estimating depth information of the headset surroundings using available sensors combined with dedicated machine learning models. Such egocentric depth sensing cannot drive the body directly, as it is not registered and is incomplete due to limited field-of-view and body self-occlusions. For the first time, we propose to leverage the available depth sensing signal combined with self-supervision to learn a multi-modal pose estimation model capable of tracking full body motions in real time on XR devices. We demonstrate how current 3-point motion synthesis models can be extended to point cloud modalities using a semantic point cloud encoder network combined with a residual network for multi-modal pose estimation. These modules are trained jointly in a self-supervised way, leveraging a combination of real unregistered point clouds and simulated data obtained from motion capture. We compare our approach against several state-of-the-art systems for XR body tracking and show that our method accurately tracks a diverse range of body motions. XR-MBT tracks legs in XR for the first time, whereas traditional synthesis approaches based on partial body tracking are blind to the legs.
Authors: Jiahan Li, Tong Chen, Shitong Luo, Chaoran Cheng, Jiaqi Guan, Ruihan Guo, Sheng Wang, Ge Liu, Jian Peng, Jianzhu Ma
Abstract: Peptides, short chains of amino acids, interact with target proteins, making them a unique class of protein-based therapeutics for treating human diseases. Recently, deep generative models have shown great promise in peptide generation. However, several challenges remain in designing effective peptide binders. First, not all residues contribute equally to peptide-target interactions. Second, the generated peptides must adopt valid geometries due to the constraints of peptide bonds. Third, realistic tasks for peptide drug development are still lacking. To address these challenges, we introduce PepHAR, a hot-spot-driven autoregressive generative model for designing peptides targeting specific proteins. Building on the observation that certain hot spot residues have higher interaction potentials, we first use an energy-based density model to fit and sample these key residues. Next, to ensure proper peptide geometry, we autoregressively extend peptide fragments by estimating dihedral angles between residue frames. Finally, we apply an optimization process to iteratively refine fragment assembly, ensuring correct peptide structures. By combining hot spot sampling with fragment-based extension, our approach enables de novo peptide design tailored to a target protein and allows the incorporation of key hot spot residues into peptide scaffolds. Extensive experiments, including peptide design and peptide scaffold generation, demonstrate the strong potential of PepHAR in computational peptide binder design.
Authors: Javier Huertas-Tato, Adri\'an Gir\'on-Jim\'enez, Alejandro Mart\'in, David Camacho
Abstract: Authorship entangles style and content. Authors frequently write about the same topics in the same style, so when different authors write about the exact same topic, the easiest way to distinguish them is by understanding the nuances of their style. Modern neural models for authorship can pick up these features using contrastive learning; however, some amount of content leakage is always present. Our aim is to reduce this inevitable impact and the correlation between content and authorship. We present a technique that uses contrastive learning (InfoNCE) with additional hard negatives synthetically created using a semantic similarity model. This disentanglement technique aims to distance the content embedding space from the style embedding space, leading to embeddings more informed by style. We demonstrate the performance with ablations on two different datasets and compare them on out-of-domain challenges. Improvements are clearly shown on challenging evaluations of prolific authors, with up to a 10% increase in accuracy when the settings are particularly hard. Trials on challenges also demonstrate that the method preserves its zero-shot capabilities after fine-tuning.
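A hedged sketch of the training objective: InfoNCE extended with synthetic hard negatives, assuming a separate semantic-similarity model supplies them (here they are random placeholders).

```python
import torch
import torch.nn.functional as F

def info_nce_with_hard_negatives(anchor, positive, hard_negatives, temperature=0.07):
    """anchor/positive: (B, D); hard_negatives: (B, N, D) from a similarity model."""
    anchor = F.normalize(anchor, dim=-1)
    positive = F.normalize(positive, dim=-1)
    hard_negatives = F.normalize(hard_negatives, dim=-1)
    in_batch = anchor @ positive.T                              # (B, B) similarities
    hard = torch.einsum("bd,bnd->bn", anchor, hard_negatives)   # (B, N) similarities
    logits = torch.cat([in_batch, hard], dim=1) / temperature
    labels = torch.arange(anchor.size(0))  # positives sit on the diagonal
    return F.cross_entropy(logits, labels)

loss = info_nce_with_hard_negatives(
    torch.randn(8, 128), torch.randn(8, 128), torch.randn(8, 4, 128))
print(loss.item())
```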
Authors: Xuandong Zhao, Sam Gunn, Miranda Christ, Jaiden Fairoze, Andres Fabrega, Nicholas Carlini, Sanjam Garg, Sanghyun Hong, Milad Nasr, Florian Tramer, Somesh Jha, Lei Li, Yu-Xiang Wang, Dawn Song
Abstract: As the outputs of generative AI (GenAI) techniques improve in quality, it becomes increasingly challenging to distinguish them from human-created content. Watermarking schemes are a promising approach to address the problem of distinguishing between AI and human-generated content. These schemes embed hidden signals within AI-generated content to enable reliable detection. While watermarking is not a silver bullet for addressing all risks associated with GenAI, it can play a crucial role in enhancing AI safety and trustworthiness by combating misinformation and deception. This paper presents a comprehensive overview of watermarking techniques for GenAI, beginning with the need for watermarking from historical and regulatory perspectives. We formalize the definitions and desired properties of watermarking schemes and examine the key objectives and threat models for existing approaches. Practical evaluation strategies are also explored, providing insights into the development of robust watermarking techniques capable of resisting various attacks. Additionally, we review recent representative works, highlight open challenges, and discuss potential directions for this emerging field. By offering a thorough understanding of watermarking in GenAI, this work aims to guide researchers in advancing watermarking methods and applications, and support policymakers in addressing the broader implications of GenAI.
Authors: David Perera, Fran\c{c}ois Derrida, Th\'eo Mariotte, Ga\"el Richard, Slim Essid
Abstract: Training speech separation models in the supervised setting raises a permutation problem: finding the best assignment between the model predictions and the ground truth separated signals. This inherently ambiguous task is customarily solved using Permutation Invariant Training (PIT). In this article, we instead consider using the Multiple Choice Learning (MCL) framework, which was originally introduced to tackle ambiguous tasks. We demonstrate experimentally on the popular WSJ0-mix and LibriMix benchmarks that MCL matches the performance of PIT, while being computationally advantageous. This opens the door to a promising research direction, as MCL can be naturally extended to handle a variable number of speakers, or to tackle speech separation in the unsupervised setting.
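To make the contrast concrete, the sketch below implements a generic PIT loss (scoring every prediction-target permutation) next to a winner-take-all MCL loss; both are simplified with MSE and are not the paper's exact objectives.

```python
import itertools
import torch
import torch.nn.functional as F

def pit_loss(preds, targets):
    """preds, targets: (batch, n_src, samples); pick the best permutation per example."""
    n_src = preds.size(1)
    per_perm = [F.mse_loss(preds[:, list(p)], targets, reduction="none").mean(dim=(1, 2))
                for p in itertools.permutations(range(n_src))]
    return torch.stack(per_perm, dim=1).min(dim=1).values.mean()

def mcl_loss(hypotheses, target):
    """hypotheses: (batch, n_hyp, samples); train only the closest hypothesis."""
    per_hyp = ((hypotheses - target.unsqueeze(1)) ** 2).mean(dim=-1)
    return per_hyp.min(dim=1).values.mean()  # winner-take-all selection

print(pit_loss(torch.randn(4, 2, 16000), torch.randn(4, 2, 16000)).item())
print(mcl_loss(torch.randn(4, 3, 16000), torch.randn(4, 16000)).item())
```

The computational advantage is visible in the shapes: PIT scores all n! permutations, whereas MCL scores each hypothesis once.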
Authors: Samson Koelle, Marina Meila
Abstract: Isometry pursuit is a convex algorithm for identifying orthonormal column-submatrices of wide matrices. It consists of a novel normalization method followed by multitask basis pursuit. Applied to Jacobians of putative coordinate functions, it helps identify isometric embeddings from within interpretable dictionaries. We provide theoretical and experimental results justifying this method. For problems involving coordinate selection and diversification, it offers a synergistic alternative to greedy and brute force search.
Authors: Siddhant Gupta, Fred Lu, Andrew Barlow, Edward Raff, Francis Ferraro, Cynthia Matuszek, Charles Nicholas, James Holt
Abstract: A strategy used by malicious actors is to "live off the land," where benign systems and tools already available on a victim's systems are used and repurposed for the malicious actor's intent. In this work, we ask if there is a way for anti-virus developers to similarly re-purpose existing work to improve their malware detection capability. We show that this is plausible via YARA rules, which use human-written signatures to detect specific malware families, functionalities, or other markers of interest. By extracting sub-signatures from publicly available YARA rules, we assembled a set of features that can more effectively discriminate malicious samples from benign ones. Our experiments demonstrate that these features add value beyond traditional features on the EMBER 2018 dataset. Manual analysis of the added sub-signatures shows a power-law behavior in a combination of features that are specific and unique, as well as features that occur often. One might expect these features to be of limited value because they are overly specific to unique malware families; we do observe such specificity, yet it proves useful in practice. In addition, we also find sub-signatures that are dual-purpose (e.g., detecting virtual machine environments) or broadly generic (e.g., DLL imports).
Authors: Patrick Mineault, Niccol\`o Zanichelli, Joanne Zichen Peng, Anton Arkhipov, Eli Bingham, Julian Jara-Ettinger, Emily Mackevicius, Adam Marblestone, Marcelo Mattar, Andrew Payne, Sophia Sanborn, Karen Schroeder, Zenna Tavares, Andreas Tolias
Abstract: As AI systems become increasingly powerful, the need for safe AI has become more pressing. Humans are an attractive model for AI safety: as the only known agents capable of general intelligence, they perform robustly even under conditions that deviate significantly from prior experiences, explore the world safely, understand pragmatics, and can cooperate to meet their intrinsic goals. Intelligence, when coupled with cooperation and safety mechanisms, can drive sustained progress and well-being. These properties are a function of the architecture of the brain and the learning algorithms it implements. Neuroscience may thus hold important keys to technical AI safety that are currently underexplored and underutilized. In this roadmap, we highlight and critically evaluate several paths toward AI safety inspired by neuroscience: emulating the brain's representations, information processing, and architecture; building robust sensory and motor systems from imitating brain data and bodies; fine-tuning AI systems on brain data; advancing interpretability using neuroscience methods; and scaling up cognitively-inspired architectures. We make several concrete recommendations for how neuroscience can positively impact AI safety.
Authors: Zhixuan Liang, Yao Mu, Yixiao Wang, Fei Ni, Tianxing Chen, Wenqi Shao, Wei Zhan, Masayoshi Tomizuka, Ping Luo, Mingyu Ding
Abstract: Dexterous manipulation with contact-rich interactions is crucial for advanced robotics. While recent diffusion-based planning approaches show promise for simpler manipulation tasks, they often produce unrealistic ghost states (e.g., the object automatically moves without hand contact) or lack adaptability when handling complex sequential interactions. In this work, we introduce DexDiffuser, an interaction-aware diffusion planning framework for adaptive dexterous manipulation. DexDiffuser models joint state-action dynamics through a dual-phase diffusion process which consists of pre-interaction contact alignment and post-contact goal-directed control, enabling goal-adaptive generalizable dexterous manipulation. Additionally, we incorporate dynamics model-based dual guidance and leverage large language models for automated guidance function generation, enhancing generalizability for physical interactions and facilitating diverse goal adaptation through language cues. Experiments on physical interaction tasks such as door opening, pen and block re-orientation, and hammer striking demonstrate DexDiffuser's effectiveness on goals outside training distributions, achieving over twice the average success rate (59.2% vs. 29.5%) compared to existing methods. Our framework achieves 70.0% success on 30-degree door opening, 40.0% and 36.7% on pen and block half-side re-orientation respectively, and 46.7% on hammer nail half drive, highlighting its robustness and flexibility in contact-rich manipulation.
Authors: Omkar Khade, Shruti Jagdale, Abhishek Phaltankar, Gauri Takalikar, Raviraj Joshi
Abstract: Large Language Models (LLMs) have demonstrated remarkable multilingual capabilities, yet challenges persist in adapting these models for low-resource languages. In this study, we investigate the effects of Low-Rank Adaptation (LoRA) Parameter-Efficient Fine-Tuning (PEFT) on multilingual Gemma models for Marathi, a language with limited resources. Using a translated Alpaca dataset with 52,000 instruction-response pairs, our findings reveal that while evaluation metrics often show a performance decline post-fine-tuning, manual assessments frequently suggest that the fine-tuned models outperform their original counterparts. The observations indicate improvements in target language generation capabilities but a reduction in reasoning abilities following language adaptation. These results underscore the need for improved evaluation methodologies and the creation of high-quality native datasets to accurately assess language-specific model performance in low-resource settings.
Authors: Pedro Delicado, Cristian Pach\'on-Garc\'ia
Abstract: The presence of Artificial Intelligence (AI) in our society is increasing, which brings with it the need to understand the behaviour of AI mechanisms, including machine learning predictive algorithms fed with tabular data, text, or images, among other types of data. This work focuses on the interpretability of predictive models based on functional data. Designing interpretability methods for functional data models implies working with a set of features whose size is infinite. In the context of scalar-on-function regression, we propose an interpretability method based on the Shapley value for continuous games, a mathematical formulation that allows a global payoff to be fairly distributed among a continuous set of players. The method is illustrated through a set of experiments with simulated and real data sets. The open source Python package ShapleyFDA is also presented.
Authors: Shruti Jagdale, Omkar Khade, Gauri Takalikar, Mihir Inamdar, Raviraj Joshi
Abstract: Code-mixing is the practice of using two or more languages in a single sentence, which often occurs in multilingual communities such as India where people commonly speak multiple languages. Classic NLP tools, trained on monolingual data, face challenges when dealing with code-mixed data. Extracting meaningful information from sentences containing multiple languages becomes difficult, particularly in tasks like hate speech detection, due to linguistic variation, cultural nuances, and data sparsity. To address this, we aim to analyze the significance of code-mixed embeddings and evaluate the performance of BERT and HingBERT models (trained on a Hindi-English corpus) in hate speech detection. Our study demonstrates that HingBERT models, benefiting from training on the extensive Hindi-English dataset L3Cube-HingCorpus, outperform BERT models when tested on hate speech text datasets. We also found that code-mixed Hing-FastText performs better than standard English FastText and vanilla BERT models.
Authors: Kieran A. Murphy, Yujing Zhang, Dani S. Bassett
Abstract: Multivariate information theory provides a general and principled framework for understanding how the components of a complex system are connected. Existing analyses are coarse in nature -- built up from characterizations of discrete subsystems -- and can be computationally prohibitive. In this work, we propose to study the continuous space of possible descriptions of a composite system as a window into its organizational structure. A description consists of specific information conveyed about each of the components, and the space of possible descriptions is equivalent to the space of lossy compression schemes of the components. We introduce a machine learning framework to optimize descriptions that extremize key information theoretic quantities used to characterize organization, such as total correlation and O-information. Through case studies on spin systems, Sudoku boards, and letter sequences from natural language, we identify extremal descriptions that reveal how system-wide variation emerges from individual components. By integrating machine learning into a fine-grained information theoretic analysis of composite random variables, our framework opens new avenues for probing the structure of real-world complex systems.
Authors: Nurshat Fateh Ali, Md. Mahdi Mohtasim, Shakil Mosharrof, T. Gopi Krishna
Abstract: This research presents and compares multiple approaches to automate the generation of literature reviews using several Natural Language Processing (NLP) techniques and retrieval-augmented generation (RAG) with a Large Language Model (LLM). The ever-increasing number of research articles poses a huge challenge for manual literature review and has resulted in an increased demand for automation. Developing a system capable of automatically generating literature reviews from only PDF files as input is the primary objective of this research work. The effectiveness of several Natural Language Processing (NLP) strategies, such as the frequency-based method (spaCy), the transformer model (Simple T5), and retrieval-augmented generation (RAG) with a Large Language Model (GPT-3.5-turbo), is evaluated to meet the primary objective. The SciTLDR dataset is chosen for this research experiment and three distinct techniques are utilized to implement three different systems for auto-generating the literature reviews. ROUGE scores are used for the evaluation of all three systems. Based on the evaluation, the Large Language Model GPT-3.5-turbo achieved the highest ROUGE-1 score, 0.364. The transformer model comes in second place and spaCy is in last position. Finally, a graphical user interface is created for the best system, which is based on the large language model.
Authors: Shengqu Cai, Eric Chan, Yunzhi Zhang, Leonidas Guibas, Jiajun Wu, Gordon Wetzstein
Abstract: Text-to-image diffusion models produce impressive results but are frustrating tools for artists who desire fine-grained control. For example, a common use case is to create images of a specific instance in novel contexts, i.e., "identity-preserving generation". This setting, along with many other tasks (e.g., relighting), is a natural fit for image+text-conditional generative models. However, there is insufficient high-quality paired data to train such a model directly. We propose Diffusion Self-Distillation, a method for using a pre-trained text-to-image model to generate its own dataset for text-conditioned image-to-image tasks. We first leverage a text-to-image diffusion model's in-context generation ability to create grids of images and curate a large paired dataset with the help of a Visual-Language Model. We then fine-tune the text-to-image model into a text+image-to-image model using the curated paired dataset. We demonstrate that Diffusion Self-Distillation outperforms existing zero-shot methods and is competitive with per-instance tuning techniques on a wide range of identity-preserving generation tasks, without requiring test-time optimization.
Authors: Aoran Shen, Minghao Dai, Jiacheng Hu, Yingbin Liang, Shiru Wang, Junliang Du
Abstract: In the 21st-century information age, with the development of big data technology, effectively extracting valuable information from massive data has become a key issue. Traditional data mining methods are inadequate when faced with large-scale, high-dimensional and complex data. Especially when labeled data is scarce, their performance is greatly limited. This study optimizes data mining algorithms by introducing semi-supervised learning methods, aiming to improve the algorithm's ability to utilize unlabeled data, thereby achieving more accurate data analysis and pattern recognition under limited labeled data conditions. Specifically, we adopt a self-training method and combine it with a convolutional neural network (CNN) for image feature extraction and classification, and continuously improve the model prediction performance through an iterative process. The experimental results demonstrate that the proposed method significantly outperforms traditional machine learning techniques such as Support Vector Machine (SVM), XGBoost, and Multi-Layer Perceptron (MLP) on the CIFAR-10 image classification dataset. Notable improvements were observed in key performance metrics, including accuracy, recall, and F1 score. Furthermore, the robustness and noise-resistance capabilities of the semi-supervised CNN model were validated through experiments under varying noise levels, confirming its practical applicability in real-world scenarios.
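The self-training loop described above might be sketched as follows, with a logistic regression stand-in for the CNN and a hypothetical 0.9 confidence threshold; the data is synthetic for illustration.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
X_lab, y_lab = rng.normal(size=(50, 20)), rng.integers(0, 2, 50)
X_unlab = rng.normal(size=(500, 20))

model = LogisticRegression(max_iter=1000)
for round_ in range(5):
    model.fit(X_lab, y_lab)
    probs = model.predict_proba(X_unlab)
    conf = probs.max(axis=1)
    keep = conf > 0.9                       # only high-confidence pseudo-labels
    if not keep.any():
        break
    X_lab = np.vstack([X_lab, X_unlab[keep]])
    y_lab = np.concatenate([y_lab, probs[keep].argmax(axis=1)])
    X_unlab = X_unlab[~keep]                # remove absorbed samples from the pool
    print(f"round {round_}: added {keep.sum()} pseudo-labeled samples")
```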
Authors: Luciano Vinas, Arash A. Amini
Abstract: We revisit recent spectral GNN approaches to semi-supervised node classification (SSNC). We posit that state-of-the-art (SOTA) GNN architectures may be over-engineered for common SSNC benchmark datasets (citation networks, page-page networks, etc.). By replacing feature aggregation with a non-parametric learner we are able to streamline the GNN design process and avoid many of the engineering complexities associated with SOTA hyperparameter selection (GNN depth, non-linearity choice, feature dropout probability, etc.). Our empirical experiments suggest conventional methods such as non-parametric regression are well suited for semi-supervised learning on sparse, directed networks and a variety of other graph types commonly found in SSNC benchmarks. Additionally, we bring attention to recent changes in evaluation conventions for SSNC benchmarking and how this may have partially contributed to rising performances over time.
Authors: Zeyu Zhang, Lu Li, Xingyu Ji, Kaiqi Zhao, Xiaofeng Zhu, Philip S. Yu, Jiawei Li, Maojun Wang
Abstract: Signed graphs are powerful models for representing complex relations with both positive and negative connections. Recently, Signed Graph Neural Networks (SGNNs) have emerged as potent tools for analyzing such graphs. To our knowledge, no prior research has been conducted on devising a training plan specifically for SGNNs. The prevailing training approach feeds samples (edges) to models in a random order, resulting in equal contributions from each sample during the training process, but fails to account for varying learning difficulties based on the graph's structure. We contend that SGNNs can benefit from a curriculum that progresses from easy to difficult, similar to human learning. The main challenge is evaluating the difficulty of edges in a signed graph. We address this by theoretically analyzing the difficulty of SGNNs in learning adequate representations for edges in unbalanced cycles and propose a lightweight difficulty measurer. This forms the basis for our innovative Curriculum representation learning framework for Signed Graphs, referred to as CSG. The process involves using the measurer to assign difficulty scores to training samples, adjusting their order using a scheduler, and training the SGNN model accordingly. We empirically evaluate our approach on six real-world signed graph datasets. Our method demonstrates remarkable results, enhancing the accuracy of popular SGNN models by up to 23.7% and showing a reduction of 8.4% in standard deviation, improving model stability.
Authors: Rishav Mukherji, Mark Sch\"one, Khaleelulla Khan Nazeer, Christian Mayr, Anand Subramoney
Abstract: Artificial neural networks open up unprecedented machine learning capabilities at the cost of ever growing computational requirements. Sparsifying the parameters, often achieved through weight pruning, has been identified as a powerful technique to compress the number of model parameters and reduce the computational operations of neural networks. Yet, sparse activations, while omnipresent in both biological neural networks and deep learning systems, have not been fully utilized as a compression technique in deep learning. Moreover, the interaction between sparse activations and weight pruning is not fully understood. In this work, we demonstrate that activity sparsity can compose multiplicatively with parameter sparsity in a recurrent neural network model based on the GRU that is designed to be activity sparse. We achieve up to $20\times$ reduction of computation while maintaining perplexities below $60$ on the Penn Treebank language modeling task. This magnitude of reduction has not been achieved previously with solely sparsely connected LSTMs, and the language modeling performance of our model has not been achieved previously with any sparsely activated recurrent neural networks or spiking neural networks. Neuromorphic computing devices are especially good at taking advantage of the dynamic activity sparsity, and our results provide strong evidence that making deep learning models activity sparse and porting them to neuromorphic devices can be a viable strategy that does not compromise on task performance. Our results also drive further convergence of methods from deep learning and neuromorphic computing for efficient machine learning.
Authors: Naoki Saito, Stefan C. Schonsheck, Eugene Shvarts
Abstract: We propose new scattering networks for signals measured on simplicial complexes, which we call \emph{Multiscale Hodge Scattering Networks} (MHSNs). Our construction is based on multiscale basis dictionaries on simplicial complexes, i.e., the $\kappa$-GHWT and $\kappa$-HGLET, which we recently developed for simplices of dimension $\kappa \in \mathbb{N}$ in a given simplicial complex by generalizing the node-based Generalized Haar-Walsh Transform (GHWT) and Hierarchical Graph Laplacian Eigen Transform (HGLET). The $\kappa$-GHWT and the $\kappa$-HGLET both form redundant sets (i.e., dictionaries) of multiscale basis vectors and the corresponding expansion coefficients of a given signal. Our MHSNs use a layered structure analogous to a convolutional neural network (CNN) to cascade the moments of the modulus of the dictionary coefficients. The resulting features are invariant to reordering of the simplices (i.e., node permutation of the underlying graphs). Importantly, the use of multiscale basis dictionaries in our MHSNs admits a natural pooling operation that is akin to local pooling in CNNs, and which may be performed either locally or per-scale. These pooling operations are harder to define in both traditional scattering networks based on Morlet wavelets, and geometric scattering networks based on Diffusion Wavelets. As a result, we are able to extract a rich set of descriptive yet robust features that can be used along with very simple machine learning methods (i.e., logistic regression or support vector machines) to achieve high-accuracy classification systems with far fewer parameters to train than most modern graph neural networks. Finally, we demonstrate the usefulness of our MHSNs in three distinct types of problems: signal classification, domain (i.e., graph/simplex) classification, and molecular dynamics prediction.
Authors: Florent Forest, Olga Fink
Abstract: Intelligent Fault Diagnosis (IFD) based on deep learning has proven to be an effective and flexible solution, attracting extensive research. Deep neural networks can learn rich representations from vast amounts of representative labeled data for various applications. In IFD, they achieve high classification performance from signals in an end-to-end manner, without requiring extensive domain knowledge. However, deep learning models usually only perform well on the data distribution they have been trained on. When applied to a different distribution, they may experience performance drops. This is also observed in IFD, where assets are often operated in working conditions different from those in which labeled data have been collected. Unsupervised domain adaptation (UDA) deals with the scenario where labeled data are available in a source domain, and only unlabeled data are available in a target domain, where domains may correspond to operating conditions. Recent methods rely on training with confident pseudo-labels for target samples. However, the confidence-based selection of pseudo-labels is hindered by poorly calibrated confidence estimates in the target domain, primarily due to over-confident predictions, which limits the quality of pseudo-labels and leads to error accumulation. In this paper, we propose a novel UDA method called Calibrated Adaptive Teacher (CAT), where we propose to calibrate the predictions of the teacher network throughout the self-training process, leveraging post-hoc calibration techniques. We evaluate CAT on domain-adaptive IFD and perform extensive experiments on the Paderborn benchmark for bearing fault diagnosis under varying operating conditions. Our proposed method achieves state-of-the-art performance on most transfer tasks.
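Temperature scaling is a representative post-hoc calibration technique of the kind the abstract above builds on; the sketch below fits a single scalar temperature on held-out logits (synthetic here) and is not the paper's full method.

```python
import torch
import torch.nn.functional as F

def fit_temperature(logits, labels, steps=200, lr=0.01):
    """Fit T so that softmax(logits / T) is better calibrated on held-out data."""
    log_t = torch.zeros(1, requires_grad=True)   # optimize log T to keep T > 0
    opt = torch.optim.Adam([log_t], lr=lr)
    for _ in range(steps):
        opt.zero_grad()
        loss = F.cross_entropy(logits / log_t.exp(), labels)
        loss.backward()
        opt.step()
    return log_t.exp().item()

val_logits = torch.randn(1000, 10) * 3     # deliberately over-confident synthetic logits
val_labels = torch.randint(0, 10, (1000,))
T = fit_temperature(val_logits, val_labels)
calibrated = F.softmax(val_logits / T, dim=-1)
print("fitted temperature:", T)
```

A fitted T > 1 softens over-confident predictions, which is exactly the failure mode blamed above for low-quality pseudo-labels.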
Authors: Mark Zhao, Emanuel Adamiak, Christos Kozyrakis
Abstract: The input data pipeline is an essential component of each machine learning (ML) training job. It is responsible for reading massive amounts of training data, processing batches of samples using complex transformations, and loading them onto training nodes at low latency and high throughput. Performant input data systems are becoming increasingly critical, driven by skyrocketing data volumes and training throughput demands. Unfortunately, current input data systems cannot fully leverage key performance optimizations, resulting in hugely inefficient infrastructures that require significant resources - or worse - underutilize expensive accelerators. To address these demands, we present cedar, an optimized and unified programming framework for ML input data pipelines. cedar allows users to define input data pipelines using composable operators that support arbitrary ML frameworks and libraries. cedar introduces an extensible optimizer that systematically applies a complex combination of optimizations (e.g., offloading, caching, prefetching, fusion, and reordering). It orchestrates processing across a customizable set of local and distributed compute resources in order to improve processing performance and efficiency, all without user input. Across eight pipelines, cedar improves performance by 1.87x to 10.65x compared to state-of-the-art input data systems.
Authors: Yifan Duan, Guibin Zhang, Shilong Wang, Xiaojiang Peng, Wang Ziqi, Junyuan Mao, Hao Wu, Xinke Jiang, Kun Wang
Abstract: Credit card fraud poses a significant threat to the economy. While Graph Neural Network (GNN)-based fraud detection methods perform well, they often overlook the causal effect of a node's local structure on predictions. This paper introduces a novel method for credit card fraud detection, the Causal Temporal Graph Neural Network (CaT-GNN), which leverages causal invariant learning to reveal inherent correlations within transaction data. By decomposing the problem into discovery and intervention phases, CaT-GNN identifies causal nodes within the transaction graph and applies a causal mixup strategy to enhance the model's robustness and interpretability. CaT-GNN consists of two key components: Causal-Inspector and Causal-Intervener. The Causal-Inspector utilizes attention weights in the temporal attention mechanism to identify causal and environment nodes without introducing additional parameters. Subsequently, the Causal-Intervener performs a causal mixup enhancement on environment nodes based on the set of nodes. Evaluated on three datasets, including a private financial dataset and two public datasets, CaT-GNN demonstrates superior performance over existing state-of-the-art methods. Our findings highlight the potential of integrating causal reasoning with graph neural networks to improve fraud detection capabilities in financial transactions.
Authors: Feras Saad, Jacob Burnim, Colin Carroll, Brian Patton, Urs K\"oster, Rif A. Saurous, Matthew Hoffman
Abstract: Spatiotemporal datasets, which consist of spatially-referenced time series, are ubiquitous in diverse applications, such as air pollution monitoring, disease tracking, and cloud-demand forecasting. As the scale of modern datasets increases, there is a growing need for statistical methods that are flexible enough to capture complex spatiotemporal dynamics and scalable enough to handle many observations. This article introduces the Bayesian Neural Field (BayesNF), a domain-general statistical model that infers rich spatiotemporal probability distributions for data-analysis tasks including forecasting, interpolation, and variography. BayesNF integrates a deep neural network architecture for high-capacity function estimation with hierarchical Bayesian inference for robust predictive uncertainty quantification. Evaluations against prominent baselines show that BayesNF delivers improvements on prediction problems from climate and public health data containing tens to hundreds of thousands of measurements. Accompanying the paper is an open-source software package (https://github.com/google/bayesnf) that runs on GPU and TPU accelerators through the JAX machine learning platform.
Authors: Yaozhong Shi, Angela F. Gao, Zachary E. Ross, Kamyar Azizzadenesheli
Abstract: Regression on function spaces is typically limited to models with Gaussian process priors. We introduce the notion of universal functional regression, in which we aim to learn a prior distribution over non-Gaussian function spaces that remains mathematically tractable for functional regression. To do this, we develop Neural Operator Flows (OpFlow), an infinite-dimensional extension of normalizing flows. OpFlow is an invertible operator that maps the (potentially unknown) data function space into a Gaussian process, allowing for exact likelihood estimation of functional point evaluations. OpFlow enables robust and accurate uncertainty quantification via drawing posterior samples of the Gaussian process and subsequently mapping them into the data function space. We empirically study the performance of OpFlow on regression and generation tasks with data generated from Gaussian processes with known posterior forms and non-Gaussian processes, as well as real-world earthquake seismograms with an unknown closed-form distribution.
Authors: Kaveen Hiniduma, Suren Byna, Jean Luca Bez
Abstract: Artificial Intelligence (AI) applications critically depend on data. Poor quality data produces inaccurate and ineffective AI models that may lead to incorrect or unsafe use. Evaluation of data readiness is a crucial step in improving the quality and appropriateness of data usage for AI. R&D efforts have been spent on improving data quality. However, standardized metrics for evaluating data readiness for use in AI training are still evolving. In this study, we perform a comprehensive survey of metrics used to verify data readiness for AI training. This survey examines more than 140 papers published by ACM Digital Library, IEEE Xplore, journals such as Nature, Springer, and Science Direct, and online articles published by prominent AI experts. This survey aims to propose a taxonomy of data readiness for AI (DRAI) metrics for structured and unstructured datasets. We anticipate that this taxonomy will lead to new standards for DRAI metrics that will be used for enhancing the quality, accuracy, and fairness of AI training and inference.
Authors: Shibo Li, Hengliang Cheng, Weihua Li
Abstract: The widespread application of Electronic Health Records (EHR) data in the medical field has led to early successes in disease risk prediction using deep learning methods. These methods typically require extensive data for training due to their large parameter sets. However, existing works do not exploit the full potential of EHR data. A significant challenge arises from the infrequent occurrence of many medical codes within EHR data, limiting their clinical applicability. Current research often falls short in critical areas: 1) incorporating disease domain knowledge; 2) heterogeneously learning disease representations with rich meanings; 3) capturing the temporal dynamics of disease progression. To overcome these limitations, we introduce a novel heterogeneous graph learning model designed to assimilate disease domain knowledge and elucidate the intricate relationships between drugs and diseases. This model innovatively incorporates temporal data into visit-level embeddings and leverages a time-aware transformer alongside an adaptive attention mechanism to produce patient representations. When evaluated on two healthcare datasets, our approach demonstrated notable enhancements in both prediction accuracy and interpretability over existing methodologies, signifying a substantial advancement towards personalized and proactive healthcare management.
Authors: Hengyue Liang, Le Peng, Ju Sun
Abstract: In selective classification (SC), a classifier abstains from making predictions that are likely to be wrong to avoid excessive errors. To deploy imperfect classifiers -- whether due to intrinsic statistical noise of the data, robustness issues of the classifier, or beyond -- in high-stakes scenarios, SC appears to be an attractive and necessary path to follow. Despite decades of research in SC, most previous SC methods still focus only on the ideal statistical setting, i.e., where the data distribution at deployment is the same as that at training, although practical data can come from the wild. To bridge this gap, in this paper, we propose an SC framework that takes into account distribution shifts, termed generalized selective classification, that covers label-shifted (or out-of-distribution) and covariate-shifted samples, in addition to typical in-distribution samples, the first of its kind in the SC literature. We focus on non-training-based confidence-score functions for generalized SC on deep learning (DL) classifiers, and propose two novel margin-based score functions. Through extensive analysis and experiments, we show that our proposed score functions are more effective and reliable than the existing ones for generalized SC on a variety of classification tasks and DL classifiers. Code is available at https://github.com/sun-umn/sc_with_distshift.
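In the same spirit as the proposed margin-based scores (though not their exact formulation), a simple top-two-logit margin with an abstention threshold can be sketched as:

```python
import torch

def selective_predict(logits, threshold=1.0):
    """Predict only when the gap between the top two logits is large enough."""
    top2 = logits.topk(2, dim=-1).values
    margin = top2[:, 0] - top2[:, 1]            # confidence score
    preds = logits.argmax(dim=-1)
    preds[margin < threshold] = -1              # -1 marks abstention
    return preds, margin

logits = torch.randn(6, 10)
preds, margin = selective_predict(logits)
print(preds, margin)
```

Sweeping the threshold trades coverage for accuracy, which is the core dial in any SC system.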
Authors: Vinod Kumar Chauhan, Lei Clifton, Achille Sala\"un, Huiqi Yvonne Lu, Kim Branson, Patrick Schwab, Gaurav Nigam, David A. Clifton
Abstract: While machine learning algorithms hold promise for personalised medicine, their clinical adoption remains limited, partly due to biases that can compromise the reliability of predictions. In this paper, we focus on sample selection bias (SSB), a specific type of bias where the study population is less representative of the target population, leading to biased and potentially harmful decisions. Despite being well-known in the literature, SSB remains scarcely studied in machine learning for healthcare. Moreover, the existing machine learning techniques try to correct the bias mostly by balancing distributions between the study and the target populations, which may result in a loss of predictive performance. To address these problems, our study illustrates the potential risks associated with SSB by examining SSB's impact on the performance of machine learning algorithms. Most importantly, we propose a new research direction for addressing SSB, based on the target population identification rather than the bias correction. Specifically, we propose two independent networks (T-Net) and a multitasking network (MT-Net) for addressing SSB, where one network/task identifies the target subpopulation which is representative of the study population and the second makes predictions for the identified subpopulation. Our empirical results with synthetic and semi-synthetic datasets highlight that SSB can lead to a large drop in the performance of an algorithm for the target population as compared with the study population, as well as a substantial difference in the performance for the target subpopulations that are representative of the selected and the non-selected patients from the study population. Furthermore, our proposed techniques demonstrate robustness across various settings, including different dataset sizes, event rates, and selection rates, outperforming the existing bias correction techniques.
Authors: Adiba Orzikulova, Jaehyun Kwak, Jaemin Shin, Sung-Ju Lee
Abstract: Many healthcare sensing applications utilize multimodal time-series data from sensors embedded in mobile and wearable devices. Federated Learning (FL), with its privacy-preserving advantages, is particularly well-suited for health applications. However, most multimodal FL methods assume the availability of complete modality data for local training, which is often unrealistic. Moreover, recent approaches tackling incomplete modalities scale poorly and become inefficient as the number of modalities increases. To address these limitations, we propose FLISM, an efficient FL training algorithm with incomplete sensing modalities while maintaining high accuracy. FLISM employs three key techniques: (1) modality-invariant representation learning to extract effective features from clients with a diverse set of modalities, (2) modality quality-aware aggregation to prioritize contributions from clients with higher-quality modality data, and (3) global-aligned knowledge distillation to reduce local update shifts caused by modality differences. Extensive experiments on real-world datasets show that FLISM not only achieves high accuracy but is also faster and more efficient compared with state-of-the-art methods handling incomplete modality problems in FL. We release the code as open-source at https://github.com/AdibaOrz/FLISM.
Authors: Kai Huang, Haoming Wang, Wei Gao
Abstract: Text-to-image diffusion models can be fine-tuned in custom domains to adapt to specific user preferences, but such adaptability has also been utilized for illegal purposes, such as forging public figures' portraits, duplicating copyrighted artworks and generating explicit contents. Existing work focused on detecting the illegally generated contents, but cannot prevent or mitigate illegal adaptations of diffusion models. Other schemes of model unlearning and reinitialization, similarly, cannot prevent users from relearning the knowledge of illegal model adaptation with custom data. In this paper, we present FreezeAsGuard, a new technique that addresses these limitations and enables irreversible mitigation of illegal adaptations of diffusion models. Our approach is that the model publisher selectively freezes tensors in pre-trained diffusion models that are critical to illegal model adaptations, to mitigate the fine-tuned model's representation power in illegal adaptations, but minimize the impact on other legal adaptations. Experiment results in multiple text-to-image application domains show that FreezeAsGuard provides 37% stronger power in mitigating illegal model adaptations compared to competitive baselines, while incurring less than 5% impact on legal model adaptations. The source code is available at: https://github.com/pittisl/FreezeAsGuard.
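The core freezing mechanism can be illustrated generically: selected tensors are excluded from gradient updates. The name-based selection below is a placeholder for the paper's importance-based choice of critical tensors.

```python
import torch.nn as nn

model = nn.Sequential(nn.Linear(64, 64), nn.ReLU(), nn.Linear(64, 10))
frozen_names = {"0.weight"}                   # hypothetical "critical" tensors

for name, param in model.named_parameters():
    param.requires_grad = name not in frozen_names  # frozen tensors get no updates

trainable = [n for n, p in model.named_parameters() if p.requires_grad]
print("still trainable:", trainable)
```

Because optimizers only step parameters with gradients, any fine-tuning run on such a model leaves the frozen tensors at their pre-trained values, which is the irreversibility lever the abstract describes.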
Authors: Tian Wang, Chuang Wang
Abstract: Neural operators, which learn the map from input sequences of observed samples to predicted values, effectively solve PDE problems from data without knowing the explicit equations. Most existing works build the model in the original geometric space, leading to high computational costs when the number of sample points is large. We present the Latent Neural Operator (LNO), which solves PDEs in the latent space. In particular, we first propose Physics-Cross-Attention (PhCA), which transforms representations from the geometric space to the latent space; we then learn the operator in the latent space, and finally recover the real-world geometric space via the inverse PhCA map. Our model retains the flexibility to decode values at any position, not limited to locations defined in the training set, and can therefore naturally perform interpolation and extrapolation tasks, which are particularly useful for inverse problems. Moreover, the proposed LNO improves both prediction accuracy and computational efficiency. Experiments show that LNO reduces GPU memory by 50%, speeds up training 1.8 times, and reaches state-of-the-art accuracy on four out of six benchmarks for forward problems and on a benchmark for an inverse problem. Code is available at https://github.com/L-I-M-I-T/LatentNeuralOperator.
Authors: Dong Liu, Kaiser Pister
Abstract: Many quantization methods have appeared for LLMs, yet few are user-friendly and easy to deploy locally. Packages like TensorRT and Quanto have many underlying structures and self-invoking internal functions, which are not conducive to developers' personalized development or to learning for deployment. Therefore, we develop LLMEasyQuant, a package aimed at easy quantization deployment that is user-friendly and suitable for beginners' learning.
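As an example of the kind of simple, readable routine such a beginner-oriented package might expose (a generic sketch, not LLMEasyQuant's actual API), symmetric per-tensor int8 quantization looks like:

```python
import torch

def quantize_int8(w: torch.Tensor):
    """Map a float tensor symmetrically onto the int8 range [-128, 127]."""
    scale = w.abs().max() / 127.0
    q = torch.clamp((w / scale).round(), -128, 127).to(torch.int8)
    return q, scale

def dequantize(q: torch.Tensor, scale: torch.Tensor):
    return q.float() * scale

w = torch.randn(256, 256)          # stand-in for an LLM weight matrix
q, scale = quantize_int8(w)
err = (dequantize(q, scale) - w).abs().mean()
print(f"mean abs quantization error: {err:.5f}")
```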
Authors: Huifa Li, Jie Fu, Xinpeng Ling, Zhiyu Sun, Kuncan Wang, Zhili Chen
Abstract: The swift advancement of single-cell RNA sequencing (scRNA-seq) technologies enables the investigation of cellular-level tissue heterogeneity. Cell annotation significantly contributes to the extensive downstream analysis of scRNA-seq data. However, the analysis of scRNA-seq data for biological inference presents challenges owing to its intricate and indeterminate data distribution, characterized by a substantial volume and a high frequency of dropout events. Furthermore, the quality of training samples varies greatly, and the performance of popular GNN-based scRNA-seq clustering solutions can be harmed by two types of low-quality training nodes: 1) nodes on the boundary; 2) nodes that contribute little additional information to the graph. To address these problems, we propose a single-cell curriculum learning-based deep graph embedding clustering (scCLG). We first propose a Chebyshev graph convolutional autoencoder with multi-criteria (ChebAE) that combines three optimization objectives, including topology reconstruction loss of cell graphs, zero-inflated negative binomial (ZINB) loss, and clustering loss, to learn cell-cell topology representations. Meanwhile, we employ a selective training strategy to train the GNN based on the features and entropy of nodes and prune the difficult nodes based on the difficulty scores to keep a high-quality graph. Empirical results on a variety of gene expression datasets show that our model outperforms state-of-the-art methods. The code of scCLG will be made publicly available at https://github.com/LFD-byte/scCLG.
Authors: Wall Kim
Abstract: Sequence modeling with State Space models (SSMs) has demonstrated performance surpassing that of Transformers in various tasks, raising expectations for their potential to outperform the Decision Transformer and its enhanced variants in offline reinforcement learning (RL). However, decision models based on Mamba, a state-of-the-art SSM, failed to achieve superior performance compared to these enhanced Decision Transformers. We hypothesize that this limitation arises from information loss during the selective scanning phase. To address this, we propose the Decision MetaMamba (DMM), which augments Mamba with a token mixer in its input layer. This mixer explicitly accounts for the multimodal nature of offline RL inputs, comprising state, action, and return-to-go. The DMM demonstrates improved performance while significantly reducing parameter count compared to prior models. Notably, similar performance gains were achieved using a simple linear token mixer, emphasizing the importance of preserving information from proximate time steps rather than the specific design of the token mixer itself. This novel modification to Mamba's input layer represents a departure from conventional timestamp-based encoding approaches used in Transformers. By enhancing the performance of Mamba in offline RL, characterized by memory efficiency and fast inference, this work opens new avenues for its broader application in future RL research.
Authors: Boyu Chen, Junjie Liu, Zhu Li, Mengyue Yang
Abstract: Probability of necessity and sufficiency (PNS) measures the likelihood of a feature set being both necessary and sufficient for predicting an outcome. It has proven effective in guiding representation learning for unimodal data, enhancing both predictive performance and model robustness. Despite these benefits, extending PNS to multimodal settings remains unexplored. This extension presents unique challenges, as the conditions for PNS estimation, exogeneity and monotonicity, need to be reconsidered in a multimodal context. We address these challenges by first conceptualizing multimodal representations as comprising modality-invariant and modality-specific components. We then analyze how to compute PNS for each component while ensuring non-trivial PNS estimation. Based on these analyses, we formulate tractable optimization objectives that enable multimodal models to learn high-PNS representations. Experiments demonstrate the effectiveness of our method on both synthetic and real-world data.
Authors: Rayhan Zirvi, Bahareh Tolooshams, Anima Anandkumar
Abstract: Recent advancements in diffusion models have been effective in learning data priors for solving inverse problems. They leverage diffusion sampling steps for inducing a data prior while using a measurement guidance gradient at each step to impose data consistency. For general inverse problems, approximations are needed when an unconditionally trained diffusion model is used since the measurement likelihood is intractable, leading to inaccurate posterior sampling. In other words, due to their approximations, these methods fail to preserve the generation process on the data manifold defined by the diffusion prior, leading to artifacts in applications such as image restoration. To enhance the performance and robustness of diffusion models in solving inverse problems, we propose Diffusion State-Guided Projected Gradient (DiffStateGrad), which projects the measurement gradient onto a subspace that is a low-rank approximation of an intermediate state of the diffusion process. DiffStateGrad, as a module, can be added to a wide range of diffusion-based inverse solvers to improve the preservation of the diffusion process on the prior manifold and filter out artifact-inducing components. We highlight that DiffStateGrad improves the robustness of diffusion models in terms of the choice of measurement guidance step size and noise while improving the worst-case performance. Finally, we demonstrate that DiffStateGrad improves upon the state-of-the-art on linear and nonlinear image restoration inverse problems.
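A sketch of the projection idea described above, assuming a single-channel state and an SVD-based low-rank basis; the rank and shapes are illustrative rather than the paper's settings.

```python
import torch

def project_gradient(grad: torch.Tensor, state: torch.Tensor, rank: int = 8):
    """grad, state: (H, W) for one channel; keep only the state's top-`rank` directions."""
    U, S, Vh = torch.linalg.svd(state, full_matrices=False)
    U_r, V_r = U[:, :rank], Vh[:rank].T
    # project rows and columns of the gradient onto the state's subspace
    return U_r @ (U_r.T @ grad @ V_r) @ V_r.T

state = torch.randn(64, 64)   # intermediate diffusion state x_t (one channel)
grad = torch.randn(64, 64)    # measurement-guidance gradient
low_rank_grad = project_gradient(grad, state)
print(low_rank_grad.shape)    # torch.Size([64, 64]), but rank-limited
```

Components of the gradient outside the state's dominant subspace, the ones most likely to push samples off the prior manifold, are filtered out before the guidance step.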
Authors: Rafael Pablos Sarabia, Joachim Nyborg, Morten Birk, Jeppe Liborius Sj{\o}rup, Anders Lillevang Vesterholt, Ira Assent
Abstract: Precipitation nowcasting is crucial across various industries and plays a significant role in mitigating and adapting to climate change. We introduce an efficient deep learning model for precipitation nowcasting, capable of predicting rainfall up to 8 hours in advance with greater accuracy than existing operational physics-based and extrapolation-based models. Our model leverages multi-source meteorological data and physics-based forecasts to deliver high-resolution predictions in both time and space. It captures complex spatio-temporal dynamics through temporal attention networks and is optimized using data quality maps and dynamic thresholds. Experiments demonstrate that our model outperforms the state of the art and highlight its potential for fast, reliable responses to evolving weather conditions.
Authors: Bo Tang, Elias B. Khalil, J\'an Drgo\v{n}a
Abstract: Mixed-integer non-linear programs (MINLPs) arise in various domains, such as energy systems and transportation, but are notoriously difficult to solve. Recent advances in machine learning have led to remarkable successes in optimization tasks, an area broadly known as learning to optimize. This approach includes using predictive models to generate solutions for optimization problems with continuous decision variables, thereby avoiding the need for computationally expensive optimization algorithms. However, applying learning to MINLPs remains challenging primarily due to the presence of integer decision variables, which complicate gradient-based learning. To address this limitation, we propose two differentiable correction layers that generate integer outputs while preserving gradient information. Combined with a soft penalty for constraint violation, our framework can tackle both the integrality and non-linear constraints in a MINLP. Experiments on three problem classes with convex/non-convex objective/constraints and integer/mixed-integer variables show that the proposed learning-based approach consistently produces high-quality solutions for parametric MINLPs extremely quickly. As problem size increases, traditional exact solvers and heuristic methods struggle to find feasible solutions, whereas our approach continues to deliver reliable results. Our work extends the scope of learning-to-optimize to MINLP, paving the way for integrating integer constraints into deep learning models. Our code is available at https://github.com/pnnl/L2O-pMINLP.
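A minimal way to see how a layer can "generate integer outputs while preserving gradient information" is straight-through rounding; the sketch below is a generic illustration of that principle, not necessarily either of the paper's two correction layers.

```python
# Minimal straight-through rounding layer: integer outputs in the forward
# pass, identity gradients in the backward pass. This is one standard way
# to realize a "differentiable correction layer"; the paper's two layers
# may differ in detail.
import torch

class RoundSTE(torch.autograd.Function):
    @staticmethod
    def forward(ctx, x):
        return torch.round(x)      # integer-valued output

    @staticmethod
    def backward(ctx, grad_output):
        return grad_output         # straight-through: pass gradient as-is

x = torch.tensor([0.2, 1.7, 3.49], requires_grad=True)
y = RoundSTE.apply(x)
print(y)                           # tensor([0., 2., 3.])
y.sum().backward()
print(x.grad)                      # tensor([1., 1., 1.])
```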
Authors: Alan T. L. Bacellar, Zachary Susskind, Mauricio Breternitz Jr., Eugene John, Lizy K. John, Priscila M. V. Lima, Felipe M. G. Fran\c{c}a
Abstract: We introduce the Differentiable Weightless Neural Network (DWN), a model based on interconnected lookup tables. Training of DWNs is enabled by a novel Extended Finite Difference technique for approximate differentiation of binary values. We propose Learnable Mapping, Learnable Reduction, and Spectral Regularization to further improve the accuracy and efficiency of these models. We evaluate DWNs in three edge computing contexts: (1) an FPGA-based hardware accelerator, where they demonstrate superior latency, throughput, energy efficiency, and model area compared to state-of-the-art solutions, (2) a low-power microcontroller, where they achieve better accuracy than XGBoost under stringent memory constraints, and (3) ultra-low-cost chips, where they consistently outperform small models in both accuracy and projected hardware area. DWNs also compare favorably against leading approaches for tabular datasets, with a higher average rank. Overall, our work positions DWNs as a pioneering solution for edge-compatible high-throughput neural networks.
Authors: Zhe Li, Xiangfei Qiu, Peng Chen, Yihang Wang, Hanyin Cheng, Yang Shu, Jilin Hu, Chenjuan Guo, Aoying Zhou, Qingsong Wen, Christian S. Jensen, Bin Yang
Abstract: Time Series Forecasting (TSF) is a key functionality in numerous fields, including finance, weather services, and energy management. While new TSF methods are emerging rapidly, many require domain-specific data collection and model training, and they struggle with poor generalization on new domains. Foundation models aim to overcome this limitation. Pre-trained on large-scale language or time series data, they exhibit promising inference capabilities on new or unseen data. This has spurred a surge in new TSF foundation models. We propose a new benchmark, FoundTS, to enable thorough and fair evaluation and comparison of such models. FoundTS covers a variety of TSF foundation models, including those based on large language models and those pretrained on time series. Next, FoundTS supports different forecasting strategies, including zero-shot, few-shot, and full-shot, thereby facilitating more thorough evaluations. Finally, FoundTS offers a pipeline that standardizes evaluation processes such as dataset splitting, loading, normalization, and few-shot sampling, thereby facilitating fair evaluations. Building on this, we report on an extensive evaluation of TSF foundation models on a broad range of datasets from diverse domains and with different statistical characteristics. Specifically, we identify the pros, cons, and inherent limitations of existing foundation models, and we identify directions for future model design. We make our code and datasets available at https://anonymous.4open.science/r/FoundTS-C2B0.
Authors: Kiyam Lin, Benjamin Joachimi, Jason D. McEwen
Abstract: We demonstrate the successful use of scattering representations without further compression for simulation-based inference (SBI) with images (i.e. at the field level), illustrated with a cosmological case study. Scattering representations provide a highly effective representational space for subsequent learning tasks, although the higher-dimensional compressed space introduces challenges. We overcome these through spatial averaging, coupled with more expressive density estimators. Compared to alternative methods, such an approach does not require additional simulations for either training or computing derivatives, is interpretable, and is resilient to covariate shift. As expected, we show that a scattering-only approach extracts more information than traditional second-order summary statistics.
Authors: Zeyu Jia, Jian Qian, Alexander Rakhlin, Chen-Yu Wei
Abstract: We consider realizable contextual bandits with general function approximation, investigating how small reward variance can lead to better-than-minimax regret bounds. Unlike in minimax bounds, we show that the eluder dimension $d_\text{elu}$, a complexity measure of the function class, plays a crucial role in variance-dependent bounds. We consider two types of adversary: (1) Weak adversary: The adversary sets the reward variance before observing the learner's action. In this setting, we prove that a regret of $\Omega(\sqrt{\min\{A,d_\text{elu}\}\Lambda}+d_\text{elu})$ is unavoidable when $d_{\text{elu}}\leq\sqrt{AT}$, where $A$ is the number of actions, $T$ is the total number of rounds, and $\Lambda$ is the total variance over $T$ rounds. For the $A\leq d_\text{elu}$ regime, we derive a nearly matching upper bound $\tilde{O}(\sqrt{A\Lambda}+d_\text{elu})$ for the special case where the variance is revealed at the beginning of each round. (2) Strong adversary: The adversary sets the reward variance after observing the learner's action. We show that a regret of $\Omega(\sqrt{d_\text{elu}\Lambda}+d_\text{elu})$ is unavoidable when $\sqrt{d_\text{elu}\Lambda}+d_\text{elu}\leq\sqrt{AT}$. In this setting, we provide an upper bound of order $\tilde{O}(d_\text{elu}\sqrt{\Lambda}+d_\text{elu})$. Furthermore, we examine the setting where the function class additionally provides distributional information about the reward, as studied by Wang et al. (2024). We demonstrate that the regret bound $\tilde{O}(\sqrt{d_\text{elu}\Lambda}+d_\text{elu})$ established in their work is unimprovable when $\sqrt{d_{\text{elu}}\Lambda}+d_\text{elu}\leq\sqrt{AT}$. However, with a slightly different definition of the total variance, and with the assumption that the reward follows a Gaussian distribution, one can achieve a regret of $\tilde{O}(\sqrt{A\Lambda}+d_\text{elu})$.
Authors: Jiaxuan Gao, Shusheng Xu, Wenjie Ye, Weilin Liu, Chuyi He, Wei Fu, Zhiyu Mei, Guangju Wang, Yi Wu
Abstract: Reward models have become increasingly critical for improving the reasoning capability of LLMs. Existing research has shown that a well-trained reward model can substantially improve model performance at inference time via search. However, the potential of reward models during RL training remains largely under-explored. It is currently unclear whether these reward models can provide additional training signals to enhance the reasoning capabilities of LLMs in RL training that uses sparse success rewards, which verify the correctness of solutions. In this work, we evaluate popular reward models for RL training, including the Outcome-supervised Reward Model (ORM) and the Process-supervised Reward Model (PRM), and train a collection of LLMs for math problems using RL by combining these learned rewards with success rewards. Surprisingly, even though these learned reward models have strong inference-time performance, they may NOT help or may even hurt RL training, producing worse performance than LLMs trained with the success reward only. Our analysis reveals that an LLM can receive high rewards from some of these reward models by repeating correct but unnecessary reasoning steps, leading to a severe reward hacking issue. Therefore, we introduce two novel reward refinement techniques, Clipping and Delta. The key idea is to ensure that the accumulative reward of any reasoning trajectory is upper-bounded, keeping a learned reward model effective without being exploited. We evaluate our techniques with multiple reward models over a set of 1.5B and 7B LLMs on the MATH and GSM8K benchmarks and demonstrate that, with a carefully designed reward function, RL training without any additional supervised tuning can improve all the evaluated LLMs, including the state-of-the-art 7B model Qwen2.5-Math-7B-Instruct.
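To make the key idea concrete, the sketch below shows one plausible reading of the two refinements: Clipping caps each per-step learned reward, and Delta differences consecutive rewards so that the trajectory total telescopes and stays bounded regardless of length. The exact forms and thresholds are our assumptions, not the paper's definitions.

```python
# Sketch of the two reward-refinement ideas as described in the abstract:
# keep the cumulative learned reward of any trajectory upper-bounded.
# Exact forms and thresholds are illustrative assumptions.
import numpy as np

def clipping(step_rewards: np.ndarray, eta: float = 0.0) -> np.ndarray:
    # Cap each per-step reward at eta; with eta <= 0 every step reward is
    # nonpositive, so no trajectory can accumulate reward by adding steps.
    return np.minimum(step_rewards, eta)

def delta(step_rewards: np.ndarray) -> np.ndarray:
    # Replace r_t with r_t - r_{t+1}; the sum telescopes to r_0,
    # which is bounded regardless of trajectory length.
    out = step_rewards.copy()
    out[:-1] -= step_rewards[1:]
    return out

r = np.array([0.9, -0.2, 0.85, 0.9])        # per-step PRM rewards (toy)
print(clipping(r))                           # penalties kept, bonuses capped
print(delta(r), delta(r).sum())              # sum telescopes to r[0]
```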
Authors: Shpresim Sadiku, Moritz Wagner, Sai Ganesh Nagarajan, Sebastian Pokutta
Abstract: We study the problem of finding optimal sparse, manifold-aligned counterfactual explanations for classifiers. Canonically, this can be formulated as an optimization problem with multiple non-convex components, including classifier loss functions and manifold alignment (or \emph{plausibility}) metrics. The added complexity of enforcing \emph{sparsity}, or shorter explanations, complicates the problem further. Existing methods often focus on specific models and plausibility measures, relying on convex $\ell_1$ regularizers to enforce sparsity. In this paper, we tackle the canonical formulation using the accelerated proximal gradient (APG) method, a simple yet efficient first-order procedure capable of handling smooth non-convex objectives and non-smooth $\ell_p$ (where $0 \leq p < 1$) regularizers. This enables our approach to seamlessly incorporate various classifiers and plausibility measures while producing sparser solutions. Our algorithm only requires differentiable data-manifold regularizers and supports box constraints for bounded feature ranges, ensuring the generated counterfactuals remain \emph{actionable}. Finally, experiments on real-world datasets demonstrate that our approach effectively produces sparse, manifold-aligned counterfactual explanations while maintaining proximity to the factual data and computational efficiency.
Authors: Haoxuan Kuang, Kunxiang Deng, Linlin You, Jun Li
Abstract: Electric vehicle charging demand prediction is important for vacant charging pile recommendation and charging infrastructure planning, thus facilitating vehicle electrification and green energy development. The performance of previous spatio-temporal approaches remains unsatisfactory because urban region attributes and multivariate temporal influences are not adequately taken into account. To tackle these issues, we propose a learning approach for citywide electric vehicle charging demand prediction, named CityEVCP. To learn non-pairwise relationships in urban areas, we cluster service areas by the types and numbers of points of interest in the areas and develop attentive hypergraph networks accordingly. Graph attention mechanisms are employed for information propagation between neighboring areas. Additionally, we propose a variable selection network to adaptively learn dynamic auxiliary information and improve the Transformer encoder using gated mechanisms for fluctuating charging time-series data. Experiments on a citywide electric vehicle charging dataset demonstrate the performance of our proposed approach against a broad range of competing baselines. Furthermore, we demonstrate the impact of dynamic influences on prediction results in different areas of the city and the effectiveness of our area clustering method.
Authors: Tian Wang, Chuang Wang
Abstract: Pretraining methods have recently gained increasing attention for solving PDEs with neural operators. They alleviate the data scarcity problem encountered when learning a neural operator for a single PDE by training on large-scale datasets consisting of various PDEs and utilizing shared patterns among different PDEs to improve solution precision. In this work, we propose the Latent Neural Operator Pretraining (LNOP) framework based on the Latent Neural Operator (LNO) backbone. We achieve universal transformation through pretraining on a hybrid time-dependent PDE dataset to extract representations of different physical systems, and solve various time-dependent PDEs in the latent space through finetuning on single-PDE datasets. Our proposed LNOP framework reduces the solution error by 31.7% on four problems, a reduction that can be further improved to 57.1% after finetuning. On out-of-distribution datasets, our LNOP model achieves roughly 50% lower error and 3$\times$ data efficiency on average across different dataset sizes. These results show that our method is more competitive in terms of solution precision, transfer capability and data efficiency compared to non-pretrained neural operators.
Authors: Joey Hong, Anca Dragan, Sergey Levine
Abstract: Value-based reinforcement learning (RL) can in principle learn effective policies for a wide range of multi-turn problems, from games to dialogue to robotic control, including via offline RL from static, previously collected datasets. However, despite the widespread use of policy gradient methods to train large language models for single-turn tasks (e.g., question answering), value-based methods for multi-turn RL in an off-policy or offline setting have proven particularly challenging to scale to large language models. This setting requires effectively leveraging pretraining, scaling to large architectures with billions of parameters, and training on large datasets, all of which represent major challenges for current value-based RL methods. In this work, we propose a novel offline RL algorithm that addresses these drawbacks, casting Q-learning as a modified supervised fine-tuning (SFT) problem where the probabilities of tokens directly translate to Q-values. In this way we obtain an algorithm that smoothly transitions from maximizing the likelihood of the data during pretraining to learning a near-optimal Q-function during finetuning. Our algorithm has strong theoretical foundations, enjoying performance bounds similar to state-of-the-art Q-learning methods, while in practice utilizing an objective that closely resembles SFT. Because of this, our approach can enjoy the full benefits of the pretraining of language models, without the need to reinitialize any weights before RL finetuning, and without the need to initialize new heads for predicting values or advantages. Empirically, we evaluate our method on both pretrained LLMs and VLMs, on a variety of tasks including both natural language dialogue and robotic manipulation and navigation from images.
Authors: Jaehyeok Lee, Keisuke Sakaguchi, JinYeong Bak
Abstract: Self-training approaches for large language models (LLMs) improve reasoning abilities by training the models on their self-generated rationales. Previous approaches have labeled rationales that produce correct answers for a given question as appropriate for training. However, a single measure risks misjudging rationale quality, leading the models to learn flawed reasoning patterns. To address this issue, we propose CREST (Consistency-driven Rationale Evaluation for Self-Training), a self-training framework that further evaluates each rationale through follow-up questions and leverages this evaluation to guide its training. Specifically, we introduce two methods: (1) filtering out rationales that frequently result in incorrect answers on follow-up questions and (2) preference learning based on mixed preferences from rationale evaluation results of both original and follow-up questions. Experiments on three question-answering datasets using open LLMs show that CREST not only improves the logical robustness and correctness of rationales but also improves reasoning abilities compared to previous self-training approaches.
Authors: Tianqu Kang, Zixin Wang, Hengtao He, Jun Zhang, Shenghui Song, Khaled B. Letaief
Abstract: Fine-tuning large pre-trained foundation models (FMs) on distributed edge devices presents considerable computational and privacy challenges. Federated fine-tuning (FedFT) mitigates some privacy issues by facilitating collaborative model training without the need to share raw data. To lessen the computational burden on resource-limited devices, combining low-rank adaptation (LoRA) with federated learning enables parameter-efficient fine-tuning. Additionally, the split FedFT architecture partitions an FM between edge devices and a central server, reducing the necessity for complete model deployment on individual devices. However, the risk of privacy eavesdropping attacks in FedFT remains a concern, particularly in sensitive areas such as healthcare and finance. In this paper, we propose a split FedFT framework with differential privacy (DP) over wireless networks, where the inherent wireless channel noise in the uplink transmission is utilized to achieve DP guarantees without adding extra artificial noise. We investigate the impact of the wireless noise on the convergence performance of the proposed framework. We also show that by updating only one of the low-rank matrices in the split FedFT with DP, the proposed method can mitigate the noise amplification effect. Simulation results demonstrate that the proposed framework achieves higher accuracy under strict privacy budgets compared to baseline methods.
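A toy sketch of the mechanism described above, under our own naming: LoRA's down-projection A is frozen and only B is trained, and simulated channel noise on the transmitted update stands in for the DP mechanism. This illustrates the idea, not the paper's system.

```python
# Toy sketch: train only one LoRA factor, and let (simulated) wireless
# channel noise on the uplink act as the DP mechanism. Names are ours.
import torch
import torch.nn as nn

class SplitLoRALinear(nn.Module):
    """LoRA adapter in which only the up-projection B is trained; freezing A
    is one way to avoid noise amplification when DP noise would otherwise
    enter both low-rank factors."""
    def __init__(self, d_in: int, d_out: int, rank: int = 4):
        super().__init__()
        self.base = nn.Linear(d_in, d_out)
        for p in self.base.parameters():               # frozen pretrained weights
            p.requires_grad_(False)
        self.A = nn.Parameter(torch.randn(rank, d_in) * 0.01,
                              requires_grad=False)     # frozen down-projection
        self.B = nn.Parameter(torch.zeros(d_out, rank))  # the only trained factor

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.base(x) + x @ self.A.T @ self.B.T

def noisy_uplink(update: torch.Tensor, sigma: float = 0.05) -> torch.Tensor:
    # The channel noise itself plays the role of the DP noise,
    # so no extra artificial noise is added.
    return update + sigma * torch.randn_like(update)

layer = SplitLoRALinear(16, 16)
out = layer(torch.randn(2, 16))
noisy_B = noisy_uplink(layer.B.detach())   # what the server would receive
print(out.shape, noisy_B.shape)
```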
Authors: Botao Wang, Jia Li, Heng Chang, Keli Zhang, Fugee Tsung
Abstract: In this work, we discover that causal inference provides a promising approach to capturing heterophilic message-passing in Graph Neural Networks (GNNs). By leveraging cause-effect analysis, we can discern heterophilic edges based on asymmetric node dependency. The learned causal structure offers more accurate relationships among nodes. To reduce the computational complexity, we introduce intervention-based causal inference in graph learning. We first simplify causal analysis on graphs by formulating it as a structural learning model and define the optimization problem within the Bayesian scheme. We then present an analysis of decomposing the optimization target into a consistency penalty and a structure modification based on cause-effect relations. We then estimate this target by conditional entropy and present insights into how conditional entropy quantifies the heterophily. Accordingly, we propose CausalMP, a causal message-passing discovery network for heterophilic graph learning that iteratively learns the explicit causal structure of input graphs. We conduct extensive experiments in both heterophilic and homophilic graph settings. The results demonstrate that our model achieves superior link prediction performance. Training on the causal structure can also enhance node representations in classification tasks across different base models.
Authors: Xu Ouyang, Tao Ge, Thomas Hartvigsen, Zhisong Zhang, Haitao Mi, Dong Yu
Abstract: We reveal that low-bit quantization favors undertrained large language models (LLMs) by observing that models with larger sizes or fewer training tokens experience less quantization-induced degradation (QiD) when applying low-bit quantization, whereas smaller models with extensive training tokens suffer significant QiD. To gain deeper insights into this trend, we study over 1500 quantized LLM checkpoints of various sizes and at different training levels (undertrained or fully trained) in a controlled setting, deriving scaling laws for understanding the relationship between QiD and factors such as the number of training tokens, model size and bit width. With the derived scaling laws, we propose a novel perspective that we can use QiD to measure an LLM's training levels and determine the number of training tokens required for fully training LLMs of various sizes. Moreover, we use the scaling laws to predict the quantization performance of different-sized LLMs trained with 100 trillion tokens. Our projection shows that the low-bit quantization performance of future models, which are expected to be trained with over 100 trillion tokens, may NOT be desirable. This poses a potential challenge for low-bit quantization in the future and highlights the need for awareness of a model's training level when evaluating low-bit quantization research. To facilitate future research on this problem, we release all the 1500+ quantized checkpoints used in this work at https://huggingface.co/Xu-Ouyang.
Authors: Andreas Joseph
Abstract: We present a novel framework for estimation and inference for the broad class of universal approximators. Estimation is based on the decomposition of model predictions into Shapley values. Inference relies on analyzing the bias and variance properties of individual Shapley components. We show that Shapley value estimation is asymptotically unbiased, and we introduce Shapley regressions as a tool to uncover the true data-generating process from noisy data alone. Linear regression emerges as a special case of our framework when the model is linear in its parameters. We present theoretical, numerical, and empirical results for the estimation of heterogeneous treatment effects as our guiding example.
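For a model linear in parameters, each Shapley component has the closed form $\phi_i(x) = \beta_i (x_i - \mathbb{E}[x_i])$; the Shapley-regression idea can then be illustrated by regressing the outcome on these components and checking for unit coefficients. A toy sketch under these assumptions (our illustration, not the paper's estimator):

```python
# Toy illustration of a Shapley regression: decompose predictions into
# Shapley components, then regress the outcome on those components.
# For a linear model, the exact Shapley value of feature i is
# beta_i * (x_i - mean(x_i)).
import numpy as np

rng = np.random.default_rng(0)
n, d = 2000, 3
X = rng.normal(size=(n, d))
beta = np.array([1.5, -2.0, 0.5])
y = X @ beta + rng.normal(scale=0.1, size=n)

beta_hat = np.linalg.lstsq(X, y, rcond=None)[0]   # fitted model
phi = (X - X.mean(axis=0)) * beta_hat             # Shapley components

# Shapley regression: coefficients on phi should be ~1 when the model
# captures the true data-generating process.
coef = np.linalg.lstsq(phi, y - y.mean(), rcond=None)[0]
print(np.round(coef, 3))   # approximately [1., 1., 1.]
```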
Authors: James Daniell, Kazuma Kobayashi, Ayodeji Alajo, Syed Bahauddin Alam
Abstract: The accurate and efficient modeling of nuclear reactor transients is crucial for ensuring safe and optimal reactor operation. Traditional physics-based models, while valuable, can be computationally intensive and may not fully capture the complexities of real-world reactor behavior. This paper introduces a novel hybrid digital twin-focused multi-stage deep learning framework that addresses these limitations, offering a faster and more robust solution for predicting the final steady-state power of reactor transients. By leveraging a combination of feed-forward neural networks with both classification and regression stages, and training on a unique dataset that integrates real-world measurements of reactor power and control states from the Missouri University of Science and Technology Reactor (MSTR) with noise-enhanced simulated data, our approach achieves remarkable accuracy (96% classification, 2.3% MAPE). The incorporation of simulated data with noise significantly improves the model's generalization capabilities, mitigating the risk of overfitting. Designed as a digital twin supporting system, this framework integrates real-time, synchronized predictions of reactor state transitions, enabling dynamic operational monitoring and optimization. This innovative solution not only enables rapid and precise prediction of reactor behavior but also has the potential to revolutionize nuclear reactor operations, facilitating enhanced safety protocols, optimized performance, and streamlined decision-making processes. By aligning data-driven insights with the principles of digital twins, this work lays the groundwork for adaptable and scalable solutions in nuclear system management.
Authors: Zhenwei Lin, Qi Deng
Abstract: In this paper, we introduce faster accelerated primal-dual algorithms for minimizing a convex function subject to strongly convex function constraints. Prior to our work, the best complexity bound was $\mathcal{O}(1/{\varepsilon})$, regardless of the strong convexity of the constraint function. It is unclear whether the strong convexity assumption can enable even better convergence results. To address this issue, we have developed novel techniques to progressively estimate the strong convexity of the Lagrangian function. Our approach, for the first time, effectively leverages the constraint strong convexity, obtaining an improved complexity of $\mathcal{O}(1/\sqrt{\varepsilon})$. This rate matches the complexity lower bound for strongly-convex-concave saddle point optimization and is therefore order-optimal. We show the superior performance of our methods in sparsity-inducing constrained optimization, notably Google's personalized PageRank problem. Furthermore, we show that a restarted version of the proposed methods can effectively identify the optimal solution's sparsity pattern within a finite number of steps, a result that appears to have independent significance.
Authors: Mat\'eo Mahaut, Francesca Franzon, Roberto Dess\`i, Marco Baroni
Abstract: As large pre-trained image-processing neural networks are being embedded in autonomous agents such as self-driving cars or robots, the question arises of how such systems can communicate with each other about the surrounding world, despite their different architectures and training regimes. As a first step in this direction, we systematically explore the task of referential communication in a community of heterogeneous state-of-the-art pre-trained visual networks, showing that they can develop, in a self-supervised way, a shared protocol to refer to a target object among a set of candidates. This shared protocol can also be used, to some extent, to communicate about previously unseen object categories of different granularity. Moreover, a visual network that was not initially part of an existing community can learn the community's protocol with remarkable ease. Finally, we study, both qualitatively and quantitatively, the properties of the emergent protocol, providing some evidence that it is capturing high-level semantic features of objects.
Authors: Sophie Starck, Yadunandan Vivekanand Kini, Jessica Johanna Maria Ritter, Rickmer Braren, Daniel Rueckert, Tamara Mueller
Abstract: Age prediction is an important part of medical assessments and research. It can aid in detecting diseases as well as abnormal ageing by highlighting potential discrepancies between chronological and biological age. To improve understanding of age-related changes in various body parts, we investigate the ageing of the human body on a large scale by using whole-body 3D images. We utilise the Grad-CAM method to determine the body areas most predictive of a person's age. In order to expand our analysis beyond individual subjects, we employ registration techniques to generate population-wide importance maps that show the most predictive areas in the body for a whole cohort of subjects. We show that the investigation of the full 3D volume of the whole body and the population-wide analysis can give important insights into which body parts play the most important roles in predicting a person's age. Our findings reveal three primary areas of interest: the spine, the autochthonous back muscles, and the cardiac region, which exhibits the highest importance. Finally, we investigate differences between subjects that show accelerated and decelerated ageing.
Authors: Wing Keung Cheung, Jeremy Kalindjian, Robert Bell, Arjun Nair, Leon J. Menezes, Riyaz Patel, Simon Wan, Kacy Chou, Jiahang Chen, Ryo Torii, Rhodri H. Davies, James C. Moon, Daniel C. Alexander, Joseph Jacob
Abstract: Early detection and diagnosis of coronary artery disease (CAD) could save lives and reduce healthcare costs. The current clinical practice is to perform CAD diagnosis by analysing medical images from computed tomography coronary angiography (CTCA). Most current approaches utilise deep learning methods but require centerline extraction and multi-planar reconstruction. These indirect methods are not designed in a clinician-friendly manner, and they complicate the interventional procedure. Furthermore, current deep learning methods do not provide exact explainability, which limits their deployment in clinical settings. In this study, we first propose a 3D ResNet-50 deep learning model to directly classify normal subjects and CAD patients on CTCA images, and we then demonstrate that a 2D modified U-Net model can be subsequently employed to segment the coronary arteries. Our proposed approach outperforms the state-of-the-art models by 21.43% in terms of classification accuracy. The classification model with focal loss provides a better and more focused heat map, and the segmentation model provides better explainability than the classification-only model. The proposed holistic approach not only provides a simpler and more clinician-friendly solution but also achieves good classification accuracy and exact explainability for CAD diagnosis.
Authors: Sai Saketh Rambhatla, Ishan Misra
Abstract: We present an automated way to evaluate the text alignment of text-to-image generative diffusion models using standard image-text recognition datasets. Our method, called SelfEval, uses the generative model to compute the likelihood of real images given text prompts, and this likelihood can be used to perform recognition tasks with the generative model. We evaluate generative models on standard datasets created for multimodal text-image discriminative learning and assess fine-grained aspects of their performance: attribute binding, color recognition, counting, shape recognition, and spatial understanding. Existing automated metrics rely on an external pretrained model like CLIP (VLMs) or LLMs and are sensitive to the exact pretrained model and its limitations. SelfEval sidesteps these issues and, to the best of our knowledge, is the first automated metric to show a high degree of agreement with the gold-standard human evaluations for measuring text-faithfulness across multiple generative models, benchmarks, and evaluation metrics. SelfEval also reveals that generative models showcase competitive recognition performance on challenging tasks such as Winoground image-score compared to discriminative models. We hope SelfEval enables easy and reliable automated evaluation for diffusion models.
Authors: Shpresim Sadiku, Moritz Wagner, Sebastian Pokutta
Abstract: Sparse adversarial attacks fool deep neural networks (DNNs) through minimal pixel perturbations, often regularized by the $\ell_0$ norm. Recent efforts have replaced this norm with a structural sparsity regularizer, such as the nuclear group norm, to craft group-wise sparse adversarial attacks. The resulting perturbations are thus explainable and hold significant practical relevance, shedding light on an even greater vulnerability of DNNs. However, crafting such attacks poses an optimization challenge, as it involves computing norms for groups of pixels within a non-convex objective. We address this by presenting a two-phase algorithm that generates group-wise sparse attacks within semantically meaningful areas of an image. Initially, we optimize a quasinorm adversarial loss using the $1/2$-quasinorm proximal operator tailored for non-convex programming. Subsequently, the algorithm transitions to a projected Nesterov's accelerated gradient descent with $2$-norm regularization applied to perturbation magnitudes. Rigorous evaluations on the CIFAR-10 and ImageNet datasets demonstrate a remarkable increase in group-wise sparsity, e.g., $50.9\%$ on CIFAR-10 and $38.4\%$ on ImageNet (average case, targeted attack). This performance improvement is accompanied by significantly faster computation times, improved explainability, and a $100\%$ attack success rate.
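For concreteness, the closed-form "half-thresholding" proximal operator of the $\ell_{1/2}$ quasinorm, following Xu et al. (2012), is the kind of prox step the first phase relies on; variable names and the demo values below are ours.

```python
# Closed-form proximal ("half-thresholding") operator for the l_{1/2}
# quasinorm, following Xu et al. (2012). Names and demo values are ours.
import numpy as np

def half_threshold(z: np.ndarray, lam: float) -> np.ndarray:
    """Elementwise prox of lam * |x|^{1/2} evaluated at z."""
    out = np.zeros_like(z)
    # Entries below this threshold are snapped exactly to zero.
    thresh = (54 ** (1.0 / 3.0) / 4.0) * lam ** (2.0 / 3.0)
    mask = np.abs(z) > thresh
    zm = z[mask]
    phi = np.arccos((lam / 8.0) * (np.abs(zm) / 3.0) ** (-1.5))
    out[mask] = (2.0 / 3.0) * zm * (1.0 + np.cos(2.0 * np.pi / 3.0 - 2.0 * phi / 3.0))
    return out

z = np.linspace(-2, 2, 9)
print(np.round(half_threshold(z, lam=0.5), 3))  # small entries become 0
```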
Authors: Yongqi Dong, Xingmin Lu, Ruohan Li, Wei Song, Bart van Arem, Haneen Farah
Abstract: The burgeoning navigation services using digital maps provide great convenience to drivers. Nevertheless, the presence of anomalies in lane rendering map images occasionally introduces potential hazards, as such anomalies can be misleading to human drivers and consequently contribute to unsafe driving conditions. In response to this concern, and to detect the anomalies accurately and effectively, this paper transforms lane rendering image anomaly detection into a classification problem and proposes a four-phase pipeline to tackle it, consisting of data pre-processing, self-supervised pre-training with the masked image modeling (MiM) method, customized fine-tuning using a cross-entropy-based loss with label smoothing, and post-processing, leveraging state-of-the-art deep learning techniques, especially those involving Transformer models. Various experiments verify the effectiveness of the proposed pipeline. The results indicate that the proposed pipeline exhibits superior performance in lane rendering image anomaly detection, and notably, the self-supervised pre-training with MiM can greatly enhance detection accuracy while significantly reducing the total training time. For instance, employing the Swin Transformer with Uniform Masking as self-supervised pretraining (Swin-Trans-UM) yielded a higher accuracy of 94.77% and an improved Area Under the Curve (AUC) score of 0.9743, compared with the pure Swin Transformer without pre-training (Swin-Trans), which reached an accuracy of 94.01% and an AUC of 0.9498. The fine-tuning epochs were dramatically reduced to 41 from the original 280. In conclusion, the proposed pipeline, with its incorporation of self-supervised pre-training using MiM and other advanced deep learning techniques, emerges as a robust solution for enhancing the accuracy and efficiency of lane rendering image anomaly detection in digital navigation systems.
Authors: Xinglin Zhou, Yifu Yuan, Shaofu Yang, Jianye Hao
Abstract: Hierarchical reinforcement learning (HRL) provides a promising solution for complex, sparse-reward tasks faced by intelligent agents, using a hierarchical framework that divides tasks into subgoals and completes them sequentially. However, current methods struggle to find suitable subgoals for ensuring a stable learning process. Without additional guidance, it is impractical to rely solely on exploration or heuristic methods to determine subgoals in a large goal space. To address this issue, we propose a general hierarchical reinforcement learning framework incorporating human feedback and dynamic distance constraints (MENTOR). MENTOR acts as a "mentor", incorporating human feedback into high-level policy learning to find better subgoals. For the low-level policy, MENTOR designs a dual policy that decouples exploration and exploitation to stabilize training. Furthermore, although humans can simply break down tasks into subgoals to guide the right learning direction, subgoals that are too difficult or too easy can still hinder downstream learning efficiency. We therefore propose the Dynamic Distance Constraint (DDC) mechanism, which dynamically adjusts the space of optional subgoals, so that MENTOR can generate subgoals matching the low-level policy learning process from easy to hard. Extensive experiments demonstrate that MENTOR uses a small amount of human feedback to achieve significant improvements in complex tasks with sparse rewards.
Authors: Daniel Nickelsen, Gernot M\"uller
Abstract: We address the need for forecasting methodologies that handle large uncertainties in electricity prices for continuous intraday markets by incorporating parameter uncertainty and using a broad set of covariables. This study presents the first Bayesian forecasting of electricity prices traded on the German intraday market. Endogenous and exogenous covariables are handled via Orthogonal Matching Pursuit (OMP) and regularising priors. The target variable is the IDFull price index, with forecasts given as posterior predictive distributions. Validation uses the highly volatile 2022 electricity prices, which have seldom been studied. As a benchmark, we use all intraday transactions at the time of forecast to compute a live IDFull value. According to market efficiency, it should not be possible to improve on this last-price benchmark. However, we observe significant improvements in point measures and probability scores, including an average reduction of $5.9\,\%$ in absolute errors and an average increase of $1.7\,\%$ in accuracy when forecasting whether the IDFull exceeds the day-ahead price. Finally, we challenge the use of LASSO in electricity price forecasting, showing that OMP results in superior performance, specifically an average reduction of $22.7\,\%$ in absolute error and $20.2\,\%$ in the continuous ranked probability score.
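As a pointer for the covariable-handling step, a minimal illustration of Orthogonal Matching Pursuit on toy data (not the paper's IDFull setup): OMP greedily selects the regressors most correlated with the current residual.

```python
# Minimal illustration of Orthogonal Matching Pursuit for covariable
# selection, using scikit-learn on toy data (not the paper's IDFull setup).
import numpy as np
from sklearn.linear_model import OrthogonalMatchingPursuit

rng = np.random.default_rng(1)
X = rng.normal(size=(500, 50))           # 50 candidate covariables
beta = np.zeros(50)
beta[[3, 17, 42]] = [2.0, -1.5, 1.0]     # only three are truly active
y = X @ beta + rng.normal(scale=0.5, size=500)

omp = OrthogonalMatchingPursuit(n_nonzero_coefs=3).fit(X, y)
print(np.flatnonzero(omp.coef_))         # likely recovers [ 3 17 42 ]
```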
Authors: Thomas Melistas, Nikos Spyrou, Nefeli Gkouti, Pedro Sanchez, Athanasios Vlontzos, Yannis Panagakis, Giorgos Papanastasiou, Sotirios A. Tsaftaris
Abstract: Generative AI has revolutionised visual content editing, empowering users to effortlessly modify images and videos. However, not all edits are equal. To perform realistic edits in domains such as natural images or medical imaging, modifications must respect causal relationships inherent to the data generation process. Such image editing falls into the counterfactual image generation regime. Evaluating counterfactual image generation is substantially complex: not only does it lack observable ground truths, but it also requires adherence to causal constraints. Although several counterfactual image generation methods and evaluation metrics exist, a comprehensive comparison within a unified setting is lacking. We present a comparison framework to thoroughly benchmark counterfactual image generation methods. We integrate all models that have been used for the task at hand and expand them to novel datasets and causal graphs, demonstrating the superiority of Hierarchical VAEs across most datasets and metrics. Our framework is implemented in a user-friendly Python package that can be extended to incorporate additional SCMs, causal methods, generative models, and datasets for the community to build on. Code: https://github.com/gulnazaki/counterfactual-benchmark.
Authors: Moshe Berchansky, Daniel Fleischer, Moshe Wasserblat, Peter Izsak
Abstract: State-of-the-art performance in QA tasks is currently achieved by systems employing Large Language Models (LLMs); however, these models tend to hallucinate information in their responses. One approach focuses on enhancing the generation process by incorporating attribution from the given input to the output. However, identifying appropriate attributions and verifying their accuracy against a source is a complex task that requires significant improvements in assessing such systems. We introduce an attribution-oriented Chain-of-Thought reasoning method to enhance the accuracy of attributions. This approach focuses the reasoning process on generating an attribution-centric output. Evaluations on two context-enhanced question-answering datasets using GPT-4 demonstrate improved accuracy and correctness of attributions. In addition, the combination of our method with finetuning enhances the response and attribution accuracy of two smaller LLMs, showing their potential to outperform GPT-4 in some cases.
Authors: Jorge F. Urb\'an, Petros Stefanou, Jos\'e A. Pons
Abstract: This study investigates the potential accuracy boundaries of physics-informed neural networks (PINNs), contrasting their approach with previous similar works and traditional numerical methods. We find that selecting improved optimization algorithms significantly enhances the accuracy of the results. Simple modifications to the loss function may also improve precision, offering an additional avenue for enhancement. Despite optimization algorithms having a greater impact on convergence than adjustments to the loss function, practical considerations often favor tweaking the latter due to ease of implementation. On a global scale, the integration of an enhanced optimizer and a marginally adjusted loss function enables a reduction in the loss function by several orders of magnitude across diverse physical problems. Consequently, our results obtained using compact networks (typically comprising 2 or 3 layers of 20-30 neurons) achieve accuracies comparable to finite difference schemes employing thousands of grid points. This study encourages the continued advancement of PINNs and associated optimization techniques for broader applications across various fields.
Authors: Danial Ebrat, Eli Paradalis, Luis Rueda
Abstract: Training reinforcement learning-based recommender systems is often hindered by the lack of dynamic and realistic user interactions. To address this limitation, we introduce Lusifer, a novel environment leveraging Large Language Models (LLMs) to generate simulated user feedback. Lusifer synthesizes user profiles and interaction histories to simulate responses and behaviors toward recommended items, with profiles updated after each rating to reflect evolving user characteristics. Utilizing the MovieLens dataset as a proof of concept, we limited our implementation to the last 40 interactions for each user, representing approximately 39% and 22% of the training sets, to focus on recent user behavior. For consistency, and to gain insights into the performance of traditional methods with limited data, we implemented baseline approaches using the same data subset. Our results demonstrate that Lusifer accurately emulates user behavior and preferences, achieving an RMSE of 1.3 across various test sets even with reduced training data. This paper presents Lusifer's operational pipeline, including prompt generation and iterative user profile updates, and compares its performance against baseline methods. The findings validate Lusifer's ability to produce realistic, dynamic feedback and suggest that it offers a scalable and adjustable framework for user simulation in online reinforcement learning recommender systems in future studies, particularly when training data is limited.
Authors: Dhruvit Patel, Troy Arcomano, Brian Hunt, Istvan Szunyogh, Edward Ott
Abstract: This paper explores the potential of a hybrid modeling approach that combines machine learning (ML) with conventional physics-based modeling for weather prediction beyond the medium range. It extends the work of Arcomano et al. (2022), which tested the approach for short- and medium-range weather prediction, and the work of Arcomano et al. (2023), which investigated its potential for climate modeling. The hybrid model used for the forecast experiments of the paper is based on the low-resolution, simplified-parameterization atmospheric general circulation model SPEEDY. In addition to the hybridized prognostic variables of SPEEDY, the model has three purely ML-based prognostic variables: the 6h cumulative precipitation, the sea surface temperature, and the heat content of the top 300 m layer of the ocean (a new addition compared to the model used in Arcomano et al., 2023). The model has skill in predicting the El Niño cycle and its global teleconnections with precipitation for 3-7 months, depending on the season. The model captures equatorial variability of the precipitation associated with Kelvin and Rossby waves and the MJO. Predictions of precipitation in the equatorial region have skill for 15 days in the East Pacific and 11.5 days in the West Pacific. Though the model has low spatial resolution, for these tasks it has prediction skill comparable to what has been published for high-resolution, purely physics-based, conventional, operational forecast models.
Authors: Ryo Fujii, Masashi Hatano, Hideo Saito, Hiroki Kajita
Abstract: Surgical phase recognition has gained significant attention due to its potential to offer solutions to numerous demands of the modern operating room. However, most existing methods concentrate on minimally invasive surgery (MIS), leaving surgical phase recognition for open surgery understudied. This discrepancy is primarily attributed to the scarcity of publicly available open surgery video datasets for surgical phase recognition. To address this issue, we introduce a new egocentric open surgery video dataset for phase recognition, named EgoSurgery-Phase. This dataset comprises 15 hours of real open surgery videos spanning 9 distinct surgical phases, all captured using an egocentric camera attached to the surgeon's head. In addition to video, EgoSurgery-Phase offers eye-gaze data. As far as we know, it is the first real open surgery video dataset for surgical phase recognition that is publicly available. Furthermore, inspired by the notable success of masked autoencoders (MAEs) in video understanding tasks (e.g., action recognition), we propose a gaze-guided masked autoencoder (GGMAE). Considering that the regions where surgeons' gaze focuses are often critical for surgical phase recognition (e.g., the surgical field), in our GGMAE the gaze information acts as an empirical prior on semantic richness that guides the masking process, promoting better attention to semantically rich spatial regions. GGMAE significantly improves over the previous state-of-the-art recognition method (by 6.4% in Jaccard) and the masked autoencoder-based method (by 3.1% in Jaccard) on EgoSurgery-Phase. The dataset is released at https://github.com/Fujiry0/EgoSurgery.
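The sketch below illustrates one plausible way gaze can act as a prior over the masking process: sample the visible patches with probability proportional to per-patch gaze density. The sampling rule and shapes are our assumptions; GGMAE's actual mechanism may differ.

```python
# Illustrative sketch of gaze-guided masking for an MAE: sample visible
# patches with probability proportional to a gaze heatmap. The exact rule
# in GGMAE may differ; shapes and names are our assumptions.
import torch

def gaze_guided_mask(gaze: torch.Tensor, keep_ratio: float = 0.25):
    """gaze: (num_patches,) nonnegative gaze density per patch.
    Returns indices of visible patches, biased toward high-gaze regions."""
    n_keep = max(1, int(keep_ratio * gaze.numel()))
    probs = gaze + 1e-6                  # avoid zero-probability patches
    keep = torch.multinomial(probs / probs.sum(), n_keep, replacement=False)
    return keep

gaze = torch.rand(196)                   # 14x14 patch grid (toy)
visible = gaze_guided_mask(gaze)
print(visible.shape)                     # torch.Size([49])
```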
Authors: Jingwei Li, Jing Dong, Tianxing He, Jingzhao Zhang
Abstract: Given the rising popularity of AI-generated art and the associated copyright concerns, identifying whether an artwork was used to train a diffusion model is an important research topic. This work approaches the problem from the membership inference attack (MIA) perspective. We first identify a limitation of applying existing MIA methods to proprietary diffusion models: they require access to the internal U-Net. To address this problem, we introduce a novel membership inference attack method that uses only the image-to-image variation API and operates without access to the model's internal U-Net. Our method is based on the intuition that the model can more easily obtain an unbiased noise prediction estimate for images from the training set. By applying the API multiple times to the target image, averaging the outputs, and comparing the result to the original image, our approach can classify whether a sample was part of the training set. We validate our method using DDIM and Stable Diffusion setups and further extend both our approach and existing algorithms to the Diffusion Transformer architecture. Our experimental results consistently outperform previous methods.
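The scoring rule described above is simple to sketch: query the variation API several times, average the outputs, and threshold the distance to the original image. In the snippet, `variation_api` is a placeholder callable standing in for the real endpoint.

```python
# Sketch of the API-only membership score described in the abstract:
# repeated image-to-image variations are averaged and compared with the
# query image. `variation_api` is a placeholder for the actual endpoint.
import numpy as np

def membership_score(image: np.ndarray, variation_api, n_calls: int = 8) -> float:
    """Lower scores suggest the image is closer to the model's denoising
    behavior, i.e. more likely a training member (per the paper's intuition)."""
    variations = [variation_api(image) for _ in range(n_calls)]
    avg = np.mean(variations, axis=0)    # averaging reduces sampling noise
    return float(np.mean((avg - image) ** 2))

# Dummy endpoint for a runnable demo: adds noise around the input.
rng = np.random.default_rng(0)
fake_api = lambda img: img + rng.normal(scale=0.1, size=img.shape)
img = rng.random((32, 32, 3))
print(membership_score(img, fake_api))   # threshold this to classify
```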
Authors: Ryo Fujii, Hideo Saito, Hiroki Kajita
Abstract: Surgical tool detection is a fundamental task for understanding egocentric open surgery videos. However, detecting surgical tools presents significant challenges due to their highly imbalanced class distribution, similar shapes and textures, and heavy occlusion. The lack of a comprehensive large-scale dataset compounds these challenges. In this paper, we introduce EgoSurgery-Tool, an extension of the existing EgoSurgery-Phase dataset, which contains real open surgery videos captured using an egocentric camera attached to the surgeon's head, along with phase annotations. EgoSurgery-Tool has been densely annotated with surgical tools and comprises over 49K surgical tool bounding boxes across 15 categories, constituting a large-scale surgical tool detection dataset. EgoSurgery-Tool also provides annotations for hand detection with over 46K hand bounding boxes, capturing hand-object interactions that are crucial for understanding activities in egocentric open surgery. EgoSurgery-Tool is superior to existing datasets due to its larger scale, greater variety of surgical tools, more annotations, and denser scenes. We conduct a comprehensive analysis of EgoSurgery-Tool using nine popular object detectors to assess their effectiveness in both surgical tool and hand detection. The dataset will be released at https://github.com/Fujiry0/EgoSurgery.
Authors: Rob Cornish
Abstract: We consider the problem of symmetrising a neural network along a group homomorphism: given a homomorphism $\varphi : H \to G$, we would like a procedure that converts $H$-equivariant neural networks to $G$-equivariant ones. We formulate this in terms of Markov categories, which allows us to consider neural networks whose outputs may be stochastic, but with measure-theoretic details abstracted away. We obtain a flexible and compositional framework for symmetrisation that relies on minimal assumptions about the structure of the group and the underlying neural network architecture. Our approach recovers existing canonicalisation and averaging techniques for symmetrising deterministic models, and extends to provide a novel methodology for symmetrising stochastic models also. Beyond this, our findings also demonstrate the utility of Markov categories for addressing complex problems in machine learning in a conceptually clear yet mathematically precise way.
Authors: Igor G. Smit, Jianan Zhou, Robbert Reijnen, Yaoxin Wu, Jian Chen, Cong Zhang, Zaharah Bukhsh, Yingqian Zhang, Wim Nuijten
Abstract: Job shop scheduling problems (JSSPs) represent a critical and challenging class of combinatorial optimization problems. Recent years have witnessed a rapid increase in the application of graph neural networks (GNNs) to solve JSSPs, albeit without a systematic survey of the relevant literature. This paper aims to thoroughly review prevailing GNN methods for different types of JSSPs and the closely related flow-shop scheduling problems (FSPs), especially those leveraging deep reinforcement learning (DRL). We begin by presenting the graph representations of various JSSPs, followed by an introduction to the most commonly used GNN architectures. We then review current GNN-based methods for each problem type, highlighting key technical elements such as graph representations, GNN architectures, GNN tasks, and training algorithms. Finally, we summarize and analyze the advantages and limitations of GNNs in solving JSSPs and provide potential future research opportunities. We hope this survey can motivate and inspire more powerful GNN-based approaches for tackling JSSPs and other scheduling problems.
Authors: Brandon Huang, Chancharik Mitra, Assaf Arbelle, Leonid Karlinsky, Trevor Darrell, Roei Herzig
Abstract: The recent success of interleaved Large Multimodal Models (LMMs) in few-shot learning suggests that in-context learning (ICL) with many examples can be promising for learning new tasks. However, this many-shot multimodal ICL setting has one crucial problem: it is fundamentally limited by the model's context length set at pretraining. The problem is especially prominent in the multimodal domain, which processes both text and images, requiring additional tokens. This motivates the need for a multimodal method to compress many shots into fewer tokens without finetuning. In this work, we enable LMMs to perform multimodal, many-shot in-context learning by leveraging Multimodal Task Vectors (MTV) -- compact implicit representations of in-context examples compressed in the model's attention heads. Specifically, we first demonstrate the existence of such MTV in LMMs and then leverage these extracted MTV to enable many-shot in-context learning for various vision-and-language tasks. Our experiments suggest that MTV can scale in performance with the number of compressed shots and generalize to similar out-of-domain tasks without additional context length for inference. Code: https://github.com/Brandon3964/MultiModal-Task-Vector
Authors: Xinyi Wang, Antonis Antoniades, Yanai Elazar, Alfonso Amayuelas, Alon Albalak, Kexun Zhang, William Yang Wang
Abstract: The impressive capabilities of large language models (LLMs) have sparked debate over whether these models genuinely generalize to unseen tasks or predominantly rely on memorizing vast amounts of pretraining data. To explore this issue, we introduce an extended concept of memorization, distributional memorization, which measures the correlation between the LLM output probabilities and the pretraining data frequency. To effectively capture task-specific pretraining data frequency, we propose a novel task-gram language model, which is built by counting the co-occurrence of semantically related $n$-gram pairs from task inputs and outputs in the pretraining corpus. Using the Pythia models trained on the Pile dataset, we evaluate four distinct tasks: machine translation, factual question answering, world knowledge understanding, and math reasoning. Our findings reveal varying levels of memorization, with the strongest effect observed in factual question answering. Furthermore, while model performance improves across all tasks as LLM size increases, only factual question answering shows an increase in memorization, whereas machine translation and reasoning tasks exhibit greater generalization, producing more novel outputs. This study demonstrates that memorization plays a larger role in simpler, knowledge-intensive tasks, while generalization is the key for harder, reasoning-based tasks, providing a scalable method for analyzing large pretraining corpora in greater depth. We also show the practical implications of our analysis through a novel prompt optimization algorithm.
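A toy sketch of the measurement itself: build task-gram counts from a corpus and correlate them with model output probabilities. The counts and probabilities below are synthetic stand-ins for Pile n-gram statistics and Pythia outputs.

```python
# Toy sketch of measuring distributional memorization: correlate
# task-gram counts from a pretraining corpus with LLM output probabilities.
# Counts and probabilities below are synthetic stand-ins.
from collections import Counter
from scipy.stats import spearmanr

corpus = "paris is the capital of france berlin is the capital of germany".split()
bigrams = Counter(zip(corpus, corpus[1:]))   # pretraining n-gram counts

examples = [("capital", "of"), ("is", "the"), ("france", "berlin")]
counts = [bigrams[pair] for pair in examples]
llm_probs = [0.31, 0.22, 0.04]               # stand-in model probabilities

rho, _ = spearmanr(counts, llm_probs)
print(counts, round(rho, 2))                 # high rho => more memorization
```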
Authors: Xiaogeng Liu, Peiran Li, Edward Suh, Yevgeniy Vorobeychik, Zhuoqing Mao, Somesh Jha, Patrick McDaniel, Huan Sun, Bo Li, Chaowei Xiao
Abstract: In this paper, we propose AutoDAN-Turbo, a black-box jailbreak method that can automatically discover as many jailbreak strategies as possible from scratch, without any human intervention or predefined scopes (e.g., specified candidate strategies), and use them for red-teaming. As a result, AutoDAN-Turbo can significantly outperform baseline methods, achieving a 74.3% higher average attack success rate on public benchmarks. Notably, AutoDAN-Turbo achieves an attack success rate of 88.5 on GPT-4-1106-turbo. In addition, AutoDAN-Turbo is a unified framework that can incorporate existing human-designed jailbreak strategies in a plug-and-play manner. By integrating human-designed strategies, AutoDAN-Turbo can achieve an even higher attack success rate of 93.4 on GPT-4-1106-turbo.
Authors: Daniel Barzilai, Ohad Shamir
Abstract: We provide non-asymptotic, relative deviation bounds for the eigenvalues of empirical covariance and gram matrices in general settings. Unlike typical uniform bounds, which may fail to capture the behavior of smaller eigenvalues, our results provide sharper control across the spectrum. Our analysis is based on a general-purpose theorem that allows one to convert existing uniform bounds into relative ones. The theorems and techniques emphasize simplicity and should be applicable across various settings.
Authors: Chang Deng, Kevin Bello, Pradeep Ravikumar, Bryon Aragam
Abstract: Existing approaches to differentiable structure learning of directed acyclic graphs (DAGs) rely on strong identifiability assumptions in order to guarantee that global minimizers of the acyclicity-constrained optimization problem identify the true DAG. Moreover, it has been observed empirically that the optimizer may exploit undesirable artifacts in the loss function. We explain and remedy these issues by studying the behavior of differentiable acyclicity-constrained programs under general likelihoods with multiple global minimizers. By carefully regularizing the likelihood, it is possible to identify the sparsest model in the Markov equivalence class, even in the absence of an identifiable parametrization. We first study the Gaussian case in detail, showing how proper regularization of the likelihood defines a score that identifies the sparsest model. Assuming faithfulness, it also recovers the Markov equivalence class. These results are then generalized to general models and likelihoods, where the same claims hold. These theoretical results are validated empirically, showing how this can be done using standard gradient-based optimizers, thus paving the way for differentiable structure learning under general models and losses.
Authors: Xinyuan Wang, Victor Shea-Jay Huang, Renmiao Chen, Hao Wang, Chengwei Pan, Lei Sha, Minlie Huang
Abstract: While large language models (LLMs) exhibit remarkable capabilities across various tasks, they encounter potential security risks such as jailbreak attacks, which exploit vulnerabilities to bypass security measures and generate harmful outputs. Existing jailbreak strategies mainly focus on maximizing attack success rate (ASR), frequently neglecting other critical factors, including the relevance of the jailbreak response to the query and the level of stealthiness. This narrow focus on single objectives can result in ineffective attacks that either lack contextual relevance or are easily recognizable. In this work, we introduce BlackDAN, an innovative black-box attack framework with multi-objective optimization, aiming to generate high-quality prompts that effectively facilitate jailbreaking while maintaining contextual relevance and minimizing detectability. BlackDAN leverages Multiobjective Evolutionary Algorithms (MOEAs), specifically the NSGA-II algorithm, to optimize jailbreaks across multiple objectives including ASR, stealthiness, and semantic relevance. By integrating mechanisms like mutation, crossover, and Pareto-dominance, BlackDAN provides a transparent and interpretable process for generating jailbreaks. Furthermore, the framework allows customization based on user preferences, enabling the selection of prompts that balance harmfulness, relevance, and other factors. Experimental results demonstrate that BlackDAN outperforms traditional single-objective methods, yielding higher success rates and improved robustness across various LLMs and multimodal LLMs, while ensuring jailbreak responses are both relevant and less detectable.
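The Pareto-dominance mechanics that NSGA-II builds on can be shown with abstract objective vectors (here, maximize every objective). This is a generic illustration only; it is not BlackDAN's implementation and contains no attack content.

```python
# Pareto dominance and the non-dominated front over generic objective vectors.
def dominates(a, b):
    """a dominates b if a is >= b on every objective and > on at least one."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))

def pareto_front(points):
    """Return the subset of points not dominated by any other point."""
    return [p for p in points
            if not any(dominates(q, p) for q in points if q is not p)]

# Each tuple is a candidate scored on three objectives (higher is better).
candidates = [(0.9, 0.2, 0.5), (0.6, 0.8, 0.7), (0.5, 0.5, 0.4)]
print(pareto_front(candidates))  # the third point is dominated by the second
```

NSGA-II layers fast non-dominated sorting and crowding-distance selection on top of this dominance relation, which is how a single population can be pushed toward several objectives at once.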
Authors: Guibin Zhang, Yanwei Yue, Xiangguo Sun, Guancheng Wan, Miao Yu, Junfeng Fang, Kun Wang, Dawei Cheng
Abstract: Recent advancements in large language model (LLM)-based agents have demonstrated that collective intelligence can significantly surpass the capabilities of individual agents, primarily due to well-crafted inter-agent communication topologies. Despite the diverse and high-performing designs available, practitioners often face confusion when selecting the most effective pipeline for their specific task: \textit{Which topology is the best choice for my task, avoiding unnecessary communication token overhead while ensuring a high-quality solution?} In response to this dilemma, we introduce G-Designer, an adaptive, efficient, and robust solution for multi-agent deployment, which dynamically designs task-aware, customized communication topologies. Specifically, G-Designer models the multi-agent system as a multi-agent network, leveraging a variational graph auto-encoder to encode both the nodes (agents) and a task-specific virtual node, and to decode a task-adaptive and high-performing communication topology. Extensive experiments on six benchmarks showcase that G-Designer is: \textbf{(1) high-performing}, achieving superior results on MMLU with accuracy at $84.50\%$ and on HumanEval with pass@1 at $89.90\%$; \textbf{(2) task-adaptive}, architecting communication protocols tailored to task difficulty, reducing token consumption by up to $95.33\%$ on HumanEval; and \textbf{(3) adversarially robust}, defending against agent adversarial attacks with merely $0.3\%$ accuracy drop.
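A minimal sketch of the variational graph auto-encoder idea follows, with plain linear encoders standing in for graph convolutions; the dimensions and the single virtual task node are illustrative assumptions, not G-Designer's actual architecture.

```python
# Encode node features into latent embeddings, then decode a communication
# topology as sigmoid inner products between embeddings (edge probabilities).
import torch
import torch.nn as nn

class TinyVGAE(nn.Module):
    def __init__(self, in_dim: int, latent_dim: int):
        super().__init__()
        self.mu = nn.Linear(in_dim, latent_dim)
        self.logvar = nn.Linear(in_dim, latent_dim)

    def forward(self, x):                                     # x: (num_nodes, in_dim)
        mu, logvar = self.mu(x), self.logvar(x)
        z = mu + torch.randn_like(mu) * (0.5 * logvar).exp()  # reparameterization trick
        adj = torch.sigmoid(z @ z.T)                          # decoded edge probabilities
        return adj, mu, logvar

# Five agent nodes plus one task-specific virtual node, as described above.
feats = torch.randn(5 + 1, 16)
adj, _, _ = TinyVGAE(16, 8)(feats)
print(adj.shape)  # (6, 6) matrix of who-talks-to-whom probabilities
```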
Authors: K V Srinanda, M Manvith Prabhu, Shyam Lal
Abstract: This manuscript summarizes work on the Capsule Vision Challenge 2024 by MISAHUB. To address the multi-class disease classification task, which is challenging due to the complexity and imbalance in the Capsule Vision challenge dataset, this paper proposes CASCRNet (Capsule endoscopy-Aspp-SCR-Network), a parameter-efficient and novel model that uses Shared Channel Residual (SCR) blocks and Atrous Spatial Pyramid Pooling (ASPP) blocks. Further, the performance of the proposed model is compared with other well-known approaches. The experimental results show that the proposed model provides better disease classification. The proposed model was successful in classifying diseases with an F1 Score of 78.5% and a Mean AUC of 98.3%, which is promising given its compact architecture.
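For context, here is a minimal sketch of a generic ASPP block of the kind the abstract mentions: parallel dilated convolutions whose outputs are concatenated and fused. Channel sizes and dilation rates are illustrative, not CASCRNet's configuration.

```python
# Generic Atrous Spatial Pyramid Pooling: capture multi-scale context with
# parallel dilated 3x3 convolutions, then fuse with a 1x1 convolution.
import torch
import torch.nn as nn

class ASPP(nn.Module):
    def __init__(self, in_ch: int, out_ch: int, rates=(1, 6, 12, 18)):
        super().__init__()
        self.branches = nn.ModuleList([
            nn.Conv2d(in_ch, out_ch, kernel_size=3, padding=r, dilation=r)
            for r in rates  # padding == dilation keeps spatial size fixed
        ])
        self.fuse = nn.Conv2d(out_ch * len(rates), out_ch, kernel_size=1)

    def forward(self, x):
        return self.fuse(torch.cat([b(x) for b in self.branches], dim=1))

y = ASPP(64, 32)(torch.randn(1, 64, 56, 56))
print(y.shape)  # (1, 32, 56, 56)
```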
Authors: Ahmed Temtam, Megan A. Witherow, Liangsuo Ma, M. Shibly Sadique, F. Gerard Moeller, Khan M. Iftekharuddin
Abstract: Understanding the neurobiology of opioid use disorder (OUD) using resting-state functional magnetic resonance imaging (rs-fMRI) may help inform treatment strategies to improve patient outcomes. Recent literature suggests temporal characteristics of rs-fMRI blood oxygenation level-dependent (BOLD) signals may offer complementary information to functional connectivity analysis. However, existing studies of OUD analyze BOLD signals using measures computed across all time points. This study, for the first time in the literature, employs data-driven machine learning (ML) modeling of rs-fMRI BOLD features representing multiple time points to identify region(s) of interest that differentiate OUD subjects from healthy controls (HC). Following the triple network model, we obtain rs-fMRI BOLD features from the default mode network (DMN), salience network (SN), and executive control network (ECN) for 31 OUD and 45 HC subjects. Then, we use the Boruta ML algorithm to identify statistically significant BOLD features that differentiate OUD from HC, identifying the DMN as the most salient functional network for OUD. Furthermore, we conduct brain activity mapping, showing heightened neural activity within the DMN for OUD. We perform 5-fold cross-validation classification (OUD vs. HC) experiments to study the discriminative power of functional network features with and without fusing demographic features. The DMN shows the most discriminative power, achieving mean AUC and F1 scores of 80.91% and 73.97%, respectively, when fusing BOLD and demographic features. Follow-up Boruta analysis using BOLD features extracted from the medial prefrontal cortex, posterior cingulate cortex, and left and right temporoparietal junctions reveals significant features for all four functional hubs within the DMN.
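A minimal sketch of Boruta-style feature selection, using the open-source BorutaPy package with synthetic data standing in for the rs-fMRI BOLD features; this is illustrative of the selection step only, not the study's pipeline.

```python
# Boruta wraps a random forest, compares real features against shuffled
# "shadow" copies, and keeps only features that beat the shadows.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from boruta import BorutaPy

X = np.random.randn(76, 120)          # 76 subjects x 120 BOLD-derived features
y = np.r_[np.ones(31), np.zeros(45)]  # 31 OUD vs. 45 HC labels

rf = RandomForestClassifier(n_jobs=-1, max_depth=5)
selector = BorutaPy(rf, n_estimators='auto', random_state=0)
selector.fit(X, y.astype(int))
print(np.where(selector.support_)[0])  # indices of confirmed features
```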
Authors: Spyridon Kantarelis, Konstantinos Thomas, Vassilis Lyberatos, Edmund Dervakos, Giorgos Stamou
Abstract: Chord progressions encapsulate important information about music, pertaining to its structure and conveyed emotions. They serve as the backbone of musical composition, and in many cases, they are the sole information required for a musician to play along and follow the music. Despite their importance, chord progressions as a data domain remain underexplored. There is a lack of large-scale datasets suitable for deep learning applications, and limited research exploring chord progressions as an input modality. In this work, we present Chordonomicon, a dataset of over 666,000 songs and their chord progressions, annotated with structural parts, genre, and release date, created by scraping various sources of user-generated progressions and associated metadata. We demonstrate the practical utility of the Chordonomicon dataset for classification and generation tasks, and discuss its potential to provide valuable insights to the research community. Chord progressions are unique in their ability to be represented in multiple formats (e.g., text, graph) and in the wealth of information chords convey in a given context, such as their harmonic function. These characteristics make the Chordonomicon an ideal testbed for exploring advanced machine learning techniques, including transformers, graph machine learning, and hybrid systems that combine knowledge representation and machine learning.
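The multiple-representation point is easy to see with a small example: the same progression rendered as a token sequence (text) and as a directed transition graph. The progression below is made up for illustration, not a Chordonomicon entry.

```python
# One chord progression, two views: a text token sequence for sequence
# models, and transition-count edges for graph machine learning.
from collections import Counter

progression = ["C", "G", "Am", "F", "C", "G", "F", "C"]

text_view = " ".join(progression)                 # e.g. input to a transformer
edges = Counter(zip(progression, progression[1:]))  # weighted directed edges

print(text_view)
print(dict(edges))  # {('C', 'G'): 2, ('G', 'Am'): 1, ('Am', 'F'): 1, ...}
```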
Authors: Chenxiao Yu, Zhaotian Weng, Yuangang Li, Zheng Li, Xiyang Hu, Yue Zhao
Abstract: Can Large Language Models (LLMs) accurately predict election outcomes? While LLMs have demonstrated impressive performance in various domains, including healthcare, legal analysis, and creative tasks, their ability to forecast elections remains unknown. Election prediction poses unique challenges, such as limited voter-level data, rapidly changing political landscapes, and the need to model complex human behavior. To address these challenges, we introduce a multi-step reasoning framework designed for political analysis. Our approach is validated on real-world data from the American National Election Studies (ANES) 2016 and 2020, as well as synthetic personas generated by a leading machine learning framework, offering scalable datasets for voter behavior modeling. To capture temporal dynamics, we incorporate candidates' policy positions and biographical details, ensuring that the model adapts to evolving political contexts. Drawing on Chain of Thought prompting, our multi-step reasoning pipeline systematically integrates demographic, ideological, and time-dependent factors, enhancing the model's predictive power.
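A hedged sketch of what such a multi-step chain-of-thought prompt might look like, composing demographic, ideological, and time-dependent fields into an ordered reasoning scaffold; field names and wording are assumptions for illustration, not the paper's actual prompts.

```python
# Compose persona and candidate context into a stepwise reasoning prompt.
def build_prompt(persona: dict, candidates: dict, year: int) -> str:
    return (
        f"Voter profile ({year}): age {persona['age']}, {persona['state']}, "
        f"ideology {persona['ideology']}.\n"
        f"Candidate positions: {candidates}.\n"
        "Step 1: Summarize which issues matter most to this voter.\n"
        "Step 2: Compare each candidate's positions on those issues.\n"
        "Step 3: Account for how the political context has changed since the last election.\n"
        "Step 4: Predict the voter's most likely choice and explain why."
    )

print(build_prompt({"age": 46, "state": "Ohio", "ideology": "moderate"},
                   {"A": "pro-trade", "B": "protectionist"}, 2020))
```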
Authors: Hyun Ryu, Eric Kim
Abstract: Efficient inference in large language models (LLMs) has become a critical focus as their scale and complexity grow. Traditional autoregressive decoding, while effective, suffers from computational inefficiencies due to its sequential token generation process. Speculative decoding addresses this bottleneck by introducing a two-stage framework: drafting and verification. A smaller, efficient model generates a preliminary draft, which is then refined by a larger, more sophisticated model. This paper provides a comprehensive survey of speculative decoding methods, categorizing them into draft-centric and model-centric approaches. We discuss key ideas associated with each method, highlighting their potential for scaling LLM inference. This survey aims to guide future research in optimizing speculative decoding and its integration into real-world LLM applications.
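The draft-then-verify loop can be sketched with toy next-token distributions in place of real models. Acceptance follows the standard speculative-sampling rule of accepting a drafted token with probability min(1, p_target / p_draft); fixed per-position distributions are a simplifying assumption here.

```python
# Minimal speculative-sampling step: a cheap draft distribution proposes up
# to k tokens; the target distribution accepts or rejects each one, and on
# rejection resamples from the residual distribution and stops.
import numpy as np

rng = np.random.default_rng(0)

def speculative_step(p_draft, p_target, k=4, vocab=8):
    accepted = []
    for _ in range(k):
        t = rng.choice(vocab, p=p_draft)                 # draft model proposes
        if rng.random() < min(1.0, p_target[t] / p_draft[t]):
            accepted.append(t)                           # target model agrees
        else:
            resid = np.maximum(p_target - p_draft, 0)    # reject: residual resample
            accepted.append(rng.choice(vocab, p=resid / resid.sum()))
            break
    return accepted

p_d = np.full(8, 1 / 8)                                  # uniform draft
p_t = np.array([.3, .2, .1, .1, .1, .1, .05, .05])       # sharper target
print(speculative_step(p_d, p_t))
```

This construction preserves the target model's output distribution while letting several tokens be generated per expensive verification pass, which is the source of the speedup the survey discusses.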
Authors: Marco Fiandri, Alberto Maria Metelli, Francesco Trovò
Abstract: This paper is in the field of stochastic Multi-Armed Bandits (MABs), i.e., sequential selection techniques that learn online using only the feedback given by the chosen option (a.k.a. $arm$). We study a particular case of the rested bandits in which the arms' expected reward is monotonically non-decreasing and concave. We study the inherent sample complexity of the regret minimization problem by deriving suitable regret lower bounds. Then, we design an algorithm for the rested case, $\textit{R-ed-UCB}$, providing a regret bound that depends on the properties of the instance and, under certain circumstances, is of order $\widetilde{\mathcal{O}}(T^{\frac{2}{3}})$. We empirically compare our algorithm with state-of-the-art methods for non-stationary MABs over several synthetically generated tasks and an online model selection problem for a real-world dataset.
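For intuition about the setting, here is plain UCB1 run on a toy rested, rising bandit; this is the classic baseline only, not the paper's R-ed-UCB, whose index additionally accounts for the non-decreasing, concave reward drift.

```python
# UCB1 on a rested bandit whose expected reward rises and saturates with the
# number of pulls of that arm (rewards are noisy for illustration).
import numpy as np

rng = np.random.default_rng(1)
T, K = 2000, 3
pulls, sums = np.zeros(K), np.zeros(K)

def reward(arm, n):
    """Expected reward grows with this arm's own pull count n, then saturates."""
    return 1 - np.exp(-0.01 * n * (arm + 1)) + 0.1 * rng.standard_normal()

for t in range(1, T + 1):
    if t <= K:
        a = t - 1                                      # play each arm once
    else:
        ucb = sums / pulls + np.sqrt(2 * np.log(t) / pulls)
        a = int(np.argmax(ucb))                        # optimism in the face of uncertainty
    sums[a] += reward(a, pulls[a])
    pulls[a] += 1

print(pulls)  # the fastest-rising arm accumulates the most pulls
```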
Authors: Yiming Ma, Fei Ye, Yi Zhou, Zaixiang Zheng, Dongyu Xue, Quanquan Gu
Abstract: Nature creates diverse proteins through a 'divide-and-assembly' strategy. Inspired by this idea, we introduce ProteinWeaver, a two-stage framework for protein backbone design. Our method first generates individual protein domains and then employs an SE(3) diffusion model to flexibly assemble these domains. A key challenge lies in the assembling step, given the complex and rugged nature of the inter-domain interaction landscape. To address this challenge, we employ preference alignment to discern complex relationships between structure and interaction landscapes through comparative analysis of generated samples. Comprehensive experiments demonstrate that ProteinWeaver: (1) generates high-quality, novel protein backbones through versatile domain assembly; (2) outperforms RFdiffusion, the current state-of-the-art in backbone design, by 13\% and 39\% for long-chain proteins; (3) shows the potential for cooperative function design through illustrative case studies. To sum up, by introducing a 'divide-and-assembly' paradigm, ProteinWeaver advances protein engineering and opens new avenues for functional protein design.