Authors: Kei Itoh
Abstract: Developing strong AI signifies the arrival of the technological singularity, contributing greatly to advancing human civilization and resolving social issues. Neural networks (NNs) and deep learning, which utilizes NNs, are expected to lead to strong AI due to their structures, which mimic biological neural systems. However, the commonly used statistical weight-optimization techniques, such as error backpropagation and loss functions, may hinder this mimicry. This study discusses the information-propagation capabilities and potential practical applications of NNs as neural-system-mimicking structures by solving the handwritten character recognition problem in the Modified National Institute of Standards and Technology (MNIST) database without statistical weight-optimization techniques like error backpropagation. In this study, the NN architecture comprises fully connected layers using step functions as activation functions, with 0-15 hidden layers and no weight updates. The accuracy is calculated by comparing, in terms of vector similarity, the average output vectors of the training data for each label with the output vectors of the test data. The results show that the maximum accuracy achieved is around 80%, indicating that NNs can propagate information correctly without statistical weight optimization. Additionally, the accuracy decreased as the number of hidden layers increased. This is attributed to the decrease in the variance of the output vectors with more hidden layers, suggesting that the output data becomes smooth. The NN and accuracy-calculation methods of this study are simple and leave room for various improvements. Moreover, creating a feedforward NN that repeatedly cycles through 'input -> processing -> output -> environmental response -> input -> ...' could pave the way for practical software applications.
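A minimal Python sketch of the pipeline this abstract describes: a fully connected network with step activations and fixed random weights (no backpropagation or weight updates), where classification compares test outputs against per-label mean output vectors. The layer widths, Gaussian weight initialization, and cosine similarity as the vector-similarity measure are illustrative assumptions, not the paper's exact configuration.

import numpy as np

rng = np.random.default_rng(0)

def step(x):
    return (x > 0).astype(np.float64)  # step activation function

def forward(x, weights):
    for W in weights:
        x = step(x @ W)
    return x

# Fixed random weights that are never updated: 784 inputs -> hidden -> output.
dims = [784, 256, 256, 64]  # illustrative widths; the paper varies 0-15 hidden layers
weights = [rng.normal(0.0, 1.0, (a, b)) for a, b in zip(dims[:-1], dims[1:])]

def fit_prototypes(X_train, y_train, n_classes=10):
    # Average output vector of the training data for each label.
    outs = forward(X_train, weights)
    return np.stack([outs[y_train == c].mean(axis=0) for c in range(n_classes)])

def predict(X_test, prototypes):
    outs = forward(X_test, weights)
    outs /= np.linalg.norm(outs, axis=1, keepdims=True) + 1e-12
    protos = prototypes / (np.linalg.norm(prototypes, axis=1, keepdims=True) + 1e-12)
    return (outs @ protos.T).argmax(axis=1)  # most similar label prototype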
Authors: Aoran Lyu, Shixian Zhao, Chuhua Xian, Zhihao Cen, Hongmin Cai, Guoxin Fang
Abstract: Reduced-order simulation is an emerging method for accelerating physical simulations with high DOFs, and recently developed neural-network-based methods with nonlinear subspaces have been proven effective in diverse applications as more concise subspaces can be detected. However, the complexity and landscape of simulation objectives within the subspace have not been optimized, which leaves room for enhancement of the convergence speed. This work focuses on this point by proposing a general method for finding optimized subspace mappings, enabling further acceleration of neural reduced-order simulations while capturing comprehensive representations of the configuration manifolds. We achieve this by optimizing the Lipschitz energy of the elasticity term in the simulation objective, and incorporating the cubature approximation into the training process to manage the high memory and time demands associated with optimizing the newly introduced energy. Our method is versatile and applicable to both supervised and unsupervised settings for optimizing the parameterizations of the configuration manifolds. We demonstrate the effectiveness of our approach through general cases in both quasi-static and dynamics simulations. Our method achieves acceleration factors of up to 6.83 while consistently preserving comparable simulation accuracy in various cases, including large twisting, bending, and rotational deformations with collision handling. This novel approach offers significant potential for accelerating physical simulations, and can be a good add-on to existing neural-network-based solutions in modeling complex deformable objects.
Authors: Akhil Premkumar
Abstract: We examine the connection between deep learning and information theory through the paradigm of diffusion models. Using well-established principles from non-equilibrium thermodynamics we can characterize the amount of information required to reverse a diffusive process. Neural networks store this information and operate in a manner reminiscent of Maxwell's demon during the generative stage. We illustrate this cycle using a novel diffusion scheme we call the entropy matching model, wherein the information conveyed to the network during training exactly corresponds to the entropy that must be negated during reversal. We demonstrate that this entropy can be used to analyze the encoding efficiency and storage capacity of the network. This conceptual picture blends elements of stochastic optimal control, thermodynamics, information theory, and optimal transport, and raises the prospect of applying diffusion models as a test bench to understand neural networks.
Authors: Sheng Cheng, Deqian Kong, Jianwen Xie, Kookjin Lee, Ying Nian Wu, Yezhou Yang
Abstract: This paper introduces a novel family of deep dynamical models designed to represent continuous-time sequence data. This family of models generates each data point in the time series by a neural emission model, which is a non-linear transformation of a latent state vector. The trajectory of the latent states is implicitly described by a neural ordinary differential equation (ODE), with the initial state following an informative prior distribution parameterized by an energy-based model. Furthermore, we can extend this model to disentangle dynamic states from underlying static factors of variation, represented as time-invariant variables in the latent space. We train the model using maximum likelihood estimation with Markov chain Monte Carlo (MCMC) in an end-to-end manner, without requiring additional assisting components such as an inference network. Our experiments on oscillating systems, videos and real-world state sequences (MuJoCo) illustrate that ODEs with the learnable energy-based prior outperform existing counterparts, and can generalize to new dynamic parameterization, enabling long-horizon predictions.
Authors: Peizhong Ju, Haibo Yang, Jia Liu, Yingbin Liang, Ness Shroff
Abstract: Federated Learning (FL) has gained significant popularity due to its effectiveness in training machine learning models across diverse sites without requiring direct data sharing. While various algorithms along with their optimization analyses have shown that FL with local updates is a communication-efficient distributed learning framework, the generalization performance of FL with local updates has received comparatively less attention. This lack of investigation can be attributed to the complex interplay between data heterogeneity and infrequent communication due to the local updates within the FL framework. This motivates us to investigate a fundamental question in FL: Can we quantify the impact of data heterogeneity and local updates on the generalization performance for FL as the learning process evolves? To this end, we conduct a comprehensive theoretical study of FL's generalization performance using a linear model as the first step, where the data heterogeneity is considered for both the stationary and online/non-stationary cases. By providing closed-form expressions of the model error, we rigorously quantify the impact of the number of the local updates (denoted as $K$) under three settings ($K=1$, $K<\infty$, and $K=\infty$) and show how the generalization performance evolves with the number of rounds $t$. Our investigation also provides a comprehensive understanding of how different configurations (including the number of model parameters $p$ and the number of training samples $n$) contribute to the overall generalization performance, thus shedding new insights (such as benign overfitting) for implementing FL over networks.
Authors: Marko Medvedev, Gal Vardi, Nathan Srebro
Abstract: We consider the overfitting behavior of minimum norm interpolating solutions of Gaussian kernel ridge regression (i.e. kernel ridgeless regression), when the bandwidth or input dimension varies with the sample size. For fixed dimensions, we show that even with varying or tuned bandwidth, the ridgeless solution is never consistent and, at least with large enough noise, always worse than the null predictor. For increasing dimension, we give a generic characterization of the overfitting behavior for any scaling of the dimension with sample size. We use this to provide the first example of benign overfitting using the Gaussian kernel with sub-polynomial scaling dimension. All our results hold under the Gaussian universality ansatz and rely on (non-rigorous) risk predictions in terms of the kernel eigenstructure.
Authors: Veronica Kecki, Alan Said
Abstract: Fairness in AI-driven decision-making systems has become a critical concern, especially when these systems directly affect human lives. This paper explores the public's comprehension of fairness in healthcare recommendations. We conducted a survey where participants selected from four fairness metrics -- Demographic Parity, Equal Accuracy, Equalized Odds, and Positive Predictive Value -- across different healthcare scenarios to assess their understanding of these concepts. Our findings reveal that fairness is a complex and often misunderstood concept, with a generally low level of public understanding regarding fairness metrics in recommender systems. This study highlights the need for enhanced information and education on algorithmic fairness to support informed decision-making in using these systems. Furthermore, the results suggest that a one-size-fits-all approach to fairness may be insufficient, pointing to the importance of context-sensitive designs in developing equitable AI systems.
Authors: Muxing Wang, Pengkun Yang, Lili Su
Abstract: Large-scale multi-agent systems are often deployed across wide geographic areas, where agents interact with heterogeneous environments. There is an emerging interest in understanding the role of heterogeneity in the performance of the federated versions of classic reinforcement learning algorithms. In this paper, we study synchronous federated Q-learning, which aims to learn an optimal Q-function by having $K$ agents average their local Q-estimates per $E$ iterations. We observe an interesting phenomenon concerning the convergence speed in terms of $K$ and $E$. Similar to the homogeneous environment settings, there is a linear speed-up concerning $K$ in reducing the errors that arise from sampling randomness. Yet, in sharp contrast to the homogeneous settings, $E>1$ leads to significant performance degradation. Specifically, we provide a fine-grained characterization of the error evolution in the presence of environmental heterogeneity, showing that the errors decay to zero as the number of iterations $T$ increases. The slow convergence of having $E>1$ turns out to be fundamental rather than an artifact of our analysis. We prove that, for a wide range of stepsizes, the $\ell_{\infty}$ norm of the error cannot decay faster than $\Theta (E/T)$. In addition, our experiments demonstrate that the convergence exhibits an interesting two-phase phenomenon. For any given stepsize, there is a sharp phase-transition of the convergence: the error decays rapidly in the beginning yet later bounces up and stabilizes. Provided that the phase-transition time can be estimated, choosing different stepsizes for the two phases leads to faster overall convergence.
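A minimal sketch of synchronous federated Q-learning as described above: $K$ agents run tabular Q-learning in their own (possibly heterogeneous) environments and average their local Q-estimates every $E$ iterations. The environment interface, uniform behavior policy, and constant stepsize are placeholder assumptions, not the paper's exact setup.

import numpy as np

def federated_q_learning(envs, n_states, n_actions, E, T, gamma=0.9, lr=0.1):
    K = len(envs)                                # one agent per environment
    Q = np.zeros((K, n_states, n_actions))
    states = [env.reset() for env in envs]
    for t in range(T):
        for k, env in enumerate(envs):           # local updates
            s = states[k]
            a = np.random.randint(n_actions)     # uniform behavior policy
            s_next, r = env.step(s, a)
            target = r + gamma * Q[k, s_next].max()
            Q[k, s, a] += lr * (target - Q[k, s, a])
            states[k] = s_next
        if (t + 1) % E == 0:                     # synchronous averaging round
            Q[:] = Q.mean(axis=0)
    return Q.mean(axis=0)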
Authors: Carl De Sousa Trias, Mihai Mitrea, Attilio Fiandrotti, Marco Cagnazzo, Sumanta Chaudhuri, Enzo Tartaglione
Abstract: Nowadays, deep neural networks are used for solving complex tasks in several critical applications and protecting both their integrity and intellectual property rights (IPR) has become of utmost importance. To this end, we advance WaterMAS, a substitutive, white-box neural network watermarking method that improves the trade-off among robustness, imperceptibility, and computational complexity, while making provisions for increased data payload and security. WaterMAS insertion keeps the watermarked weights unchanged while sharpening their underlying gradient space. The robustness is thus ensured by limiting the attack's strength: even small alterations of the watermarked weights would impact the model's performance. The imperceptibility is ensured by inserting the watermark during the training process. The relationship among the WaterMAS data payload, imperceptibility, and robustness properties is discussed. The secret key is represented by the positions of the weights conveying the watermark, randomly chosen through multiple layers of the model. The security is evaluated by investigating the case in which an attacker would intercept the key. The experimental validations consider 5 models and 2 tasks (VGG16, ResNet18, MobileNetV3, SwinT for CIFAR10 image classification, and DeepLabV3 for Cityscapes image segmentation) as well as 4 types of attacks (Gaussian noise addition, pruning, fine-tuning, and quantization). The code will be released open-source upon acceptance of the article.
Authors: Huizhen Yu, Yi Wan, Richard S. Sutton
Abstract: This paper studies asynchronous stochastic approximation (SA) algorithms and their application to reinforcement learning in semi-Markov decision processes (SMDPs) with an average-reward criterion. We first extend Borkar and Meyn's stability proof method to accommodate more general noise conditions, leading to broader convergence guarantees for asynchronous SA algorithms. Leveraging these results, we establish the convergence of an asynchronous SA analogue of Schweitzer's classical relative value iteration algorithm, RVI Q-learning, for finite-space, weakly communicating SMDPs. Furthermore, to fully utilize the SA results in this application, we introduce new monotonicity conditions for estimating the optimal reward rate in RVI Q-learning. These conditions substantially expand the previously considered algorithmic framework, and we address them with novel proof arguments in the stability and convergence analysis of RVI Q-learning.
Authors: Sergio Calvo-Ordo\~nez, Konstantina Palla, Kamil Ciosek
Abstract: Recent work has shown that training wide neural networks with gradient descent is formally equivalent to computing the mean of the posterior distribution in a Gaussian Process (GP) with the Neural Tangent Kernel (NTK) as the prior covariance and zero aleatoric noise \parencite{jacot2018neural}. In this paper, we extend this framework in two ways. First, we show how to deal with non-zero aleatoric noise. Second, we derive an estimator for the posterior covariance, giving us a handle on epistemic uncertainty. Our proposed approach integrates seamlessly with standard training pipelines, as it involves training a small number of additional predictors using gradient descent on a mean squared error loss. We demonstrate the proof-of-concept of our method through empirical evaluation on synthetic regression.
Authors: Yi Xie, Tianyu Qiu, Yun Xiong, Xiuqi Huang, Xiaofeng Gao, Chao Chen
Abstract: Time series analysis and prediction methods currently excel in quantitative analysis, offering accurate future predictions and diverse statistical indicators, but they generally fall short in elucidating the underlying evolution patterns of time series. To gain a more comprehensive understanding and provide insightful explanations, we utilize symbolic regression techniques to derive explicit expressions for the non-linear dynamics in the evolution of time series variables. However, these techniques face challenges in computational efficiency and generalizability across diverse real-world time series data. To overcome these challenges, we propose \textbf{N}eural-\textbf{E}nhanced \textbf{Mo}nte-Carlo \textbf{T}ree \textbf{S}earch (NEMoTS) for time series. NEMoTS leverages the exploration-exploitation balance of Monte-Carlo Tree Search (MCTS), significantly reducing the search space in symbolic regression and improving expression quality. Furthermore, by integrating neural networks with MCTS, NEMoTS not only capitalizes on their superior fitting capabilities to concentrate on more pertinent operations post-search space reduction, but also replaces the complex and time-consuming simulation process, thereby substantially improving computational efficiency and generalizability in time series analysis. NEMoTS offers an efficient and comprehensive approach to time series analysis. Experiments with three real-world datasets demonstrate NEMoTS's significant superiority in performance, efficiency, reliability, and interpretability, making it well-suited for large-scale real-world time series data.
Authors: RenMing Huang, Shaochong Liu, Yunqiang Pei, Peng Wang, Guoqing Wang, Yang Yang, Hengtao Shen
Abstract: In this work, we address the challenging problem of long-horizon goal-reaching policy learning from non-expert, action-free observation data. Unlike fully labeled expert data, our data is more accessible and avoids the costly process of action labeling. Additionally, compared to online learning, which often involves aimless exploration, our data provides useful guidance for more efficient exploration. To achieve our goal, we propose a novel subgoal guidance learning strategy. The motivation behind this strategy is that long-horizon goals offer limited guidance for efficient exploration and accurate state transition. We develop a diffusion strategy-based high-level policy to generate reasonable subgoals as waypoints, preferring states that more easily lead to the final goal. Additionally, we learn state-goal value functions to encourage efficient subgoal reaching. These two components naturally integrate into the off-policy actor-critic framework, enabling efficient goal attainment through informative exploration. We evaluate our method on complex robotic navigation and manipulation tasks, demonstrating a significant performance advantage over existing methods. Our ablation study further shows that our method is robust to observation data with various corruptions.
Authors: Katsuyuki Hagiwara
Abstract: Minimum-norm least squares is an estimation strategy for over-parameterized settings and, in machine learning, is known as a helpful tool for understanding the nature of deep learning. In this paper, to apply it in the context of non-parametric regression problems, we established several methods based on thresholding of SVD (singular value decomposition) components, which we refer to as SVD regression methods. We considered several variants: singular-value-based thresholding, hard thresholding with cross-validation, universal thresholding, and bridge thresholding. Information on output samples is not utilized in the first method, while it is utilized in the other methods. We then applied them to semi-supervised learning, in which unlabeled input samples are incorporated into kernel functions in a regressor. The experimental results for real data showed that, depending on the datasets, the SVD regression methods are superior to a naive ridge regression method. Unfortunately, there was no clear advantage for the methods utilizing information on output samples. Furthermore, depending on the dataset, incorporating unlabeled input samples into the kernels is found to have certain advantages.
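A sketch of one of the simpler variants above, hard-thresholded SVD regression: the minimum-norm least-squares solution is assembled from SVD components, and components with small singular values are discarded. The threshold rule shown is illustrative; the paper studies several choices (singular-value-based, cross-validated, universal, and bridge thresholding).

import numpy as np

def svd_regression(X, y, tau):
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    keep = s > tau                       # hard-threshold small singular values
    coef = (U[:, keep].T @ y) / s[keep]  # projected and rescaled components
    return Vt[keep].T @ coef             # thresholded minimum-norm solution

# Usage: beta = svd_regression(X, y, tau=1e-3); predictions are X_new @ beta.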
Authors: Yin Jin, Ningtao Wang, Ruofan Wu, Pengfei Shi, Xing Fu, Weiqiang Wang
Abstract: Imbalanced data are frequently encountered in real-world classification tasks. Previous works on imbalanced learning mostly focused on learning with a minority class of few samples. However, the notion of imbalance also applies to cases where the minority class contains abundant samples, which is usually the case for industrial applications like fraud detection in the area of financial risk management. In this paper, we take a population-level approach to imbalanced learning by proposing a new formulation called \emph{ultra-imbalanced classification} (UIC). Under UIC, loss functions behave differently even if an infinite amount of training data is available. To understand the intrinsic difficulty of UIC problems, we borrow ideas from information theory and establish a framework to compare different loss functions through the lens of statistical information. A novel learning objective termed Tunable Boosting Loss is developed that is provably resistant to data imbalance under UIC and empirically effective, as verified by extensive experimental studies on both public and industrial datasets.
Authors: Alberto Cattaneo, Stephen Bonner, Thomas Martynec, Carlo Luschi, Ian P Barrett, Daniel Justus
Abstract: Knowledge Graph Completion has been increasingly adopted as a useful method for several tasks in biomedical research, like drug repurposing or drug-target identification. To that end, a variety of datasets and Knowledge Graph Embedding models have been proposed over the years. However, little is known about the properties that render a dataset useful for a given task and, even though theoretical properties of Knowledge Graph Embedding models are well understood, their practical utility in this field remains controversial. We conduct a comprehensive investigation into the topological properties of publicly available biomedical Knowledge Graphs and establish links to the accuracy observed in real-world applications. By releasing all model predictions and a new suite of analysis tools we invite the community to build upon our work and continue improving the understanding of these crucial applications.
Authors: Phairot Autthasan, Rattanaphon Chaisaen, Huy Phan, Maarten De Vos, Theerawit Wilaiprasitporn
Abstract: Recent advances in deep learning (DL) have significantly impacted motor imagery (MI)-based brain-computer interface (BCI) systems, enhancing the decoding of electroencephalography (EEG) signals. However, most studies struggle to identify discriminative patterns across subjects during MI tasks, limiting MI classification performance. In this article, we propose MixNet, a novel classification framework designed to overcome this limitation by utilizing spectral-spatial signals from MI data, along with a multitask learning architecture named MIN2Net, for classification. Here, the spectral-spatial signals are generated using the filter-bank common spatial patterns (FBCSPs) method on MI data. Since the multitask learning architecture is used for the classification task, the learning in each task may exhibit different generalization rates and potential overfitting across tasks. To address this issue, we implement adaptive gradient blending, simultaneously regulating multiple loss weights and adjusting the learning pace for each task based on its generalization/overfitting tendencies. Experimental results on six benchmark data sets of different data sizes demonstrate that MixNet consistently outperforms all state-of-the-art algorithms in subject-dependent and -independent settings. Finally, the low-density EEG MI classification results show that MixNet outperforms all state-of-the-art algorithms, offering promising implications for Internet of Things (IoT) applications, such as lightweight and portable EEG wearable devices based on low-density montages.
Authors: Jiyuan Liu, Xinwang Liu, Siqi Wang, Xingchen Hu, Qing Liao, Xinhang Wan, Yi Zhang, Xin Lv, Kunlun He
Abstract: Vertical federated learning is a natural and elegant approach to integrate multi-view data vertically partitioned across devices (clients) while preserving their privacy. Apart from model training, existing methods require the collaboration of all clients in model inference. However, model inference is likely maintained as a service for a long time, while the collaboration, especially when the clients belong to different organizations, is unpredictable in real-world scenarios (e.g., cancellation of contracts, network unavailability), resulting in inference failures. To address this issue, we propose, in a first attempt, a flexible Active-Passive Federated learning (APFed) framework. Specifically, the active client is the initiator of a learning task and responsible for building the complete model, while the passive clients serve only as assistants. Once the model is built, the active client can make inferences independently. In addition, we instantiate the APFed framework into two classification methods by employing a reconstruction loss and a contrastive loss on the passive clients, respectively. The two methods are tested in a set of experiments and achieve the desired results, validating their effectiveness.
Authors: Clemens Damke, Eyke H\"ullermeier
Abstract: In this work, we study the influence of domain-specific characteristics when defining a meaningful notion of predictive uncertainty on graph data. Previously, the so-called Graph Posterior Network (GPN) model has been proposed to quantify uncertainty in node classification tasks. Given a graph, it uses Normalizing Flows (NFs) to estimate class densities for each node independently and converts those densities into Dirichlet pseudo-counts, which are then dispersed through the graph using the personalized PageRank algorithm. The architecture of GPNs is motivated by a set of three axioms on the properties of its uncertainty estimates. We show that those axioms are not always satisfied in practice and therefore propose the family of Committee-based Uncertainty Quantification Graph Neural Networks (CUQ-GNNs), which combine standard Graph Neural Networks with the NF-based uncertainty estimation of Posterior Networks (PostNets). This approach adapts more flexibly to domain-specific demands on the properties of uncertainty estimates. We compare CUQ-GNN against GPN and other uncertainty quantification approaches on common node classification benchmarks and show that it is effective at producing useful uncertainty estimates.
Authors: Vaiva Pilkauskait\.e, Jevgenij Gamper, Rasa Gini\=unait\.e, Agne Reklait\.e
Abstract: In this study, we evaluate causal inference estimators for online controlled bipartite graph experiments in a real marketplace setting. Our novel contribution is constructing a bipartite graph using in-experiment data, rather than relying on prior knowledge or historical data, the common approach in the literature published to date. We build the bipartite graph from various interactions between buyers and sellers in the marketplace, establishing a novel research direction at the intersection of bipartite experiments and mediation analysis. This approach is crucial for modern marketplaces aiming to evaluate seller-side causal effects in buyer-side experiments, or vice versa. We demonstrate our method using historical buyer-side experiments conducted at Vinted, the largest second-hand marketplace in Europe with over 80M users.
Authors: George Andriopoulos, Zixuan Dong, Li Guo, Zifan Zhao, Keith Ross
Abstract: Recently it has been observed that neural networks exhibit Neural Collapse (NC) during the final stage of training for the classification problem. We empirically show that multivariate regression, as employed in imitation learning and other applications, exhibits Neural Regression Collapse (NRC), a new form of neural collapse: (NRC1) The last-layer feature vectors collapse to the subspace spanned by the $n$ principal components of the feature vectors, where $n$ is the dimension of the targets (for univariate regression, $n=1$); (NRC2) The last-layer feature vectors also collapse to the subspace spanned by the last-layer weight vectors; (NRC3) The Gram matrix for the weight vectors converges to a specific functional form that depends on the covariance matrix of the targets. After empirically establishing the prevalence of (NRC1)-(NRC3) for a variety of datasets and network architectures, we provide an explanation of these phenomena by modeling the regression task in the context of the Unconstrained Feature Model (UFM), in which the last layer feature vectors are treated as free variables when minimizing the loss function. We show that when the regularization parameters in the UFM model are strictly positive, then (NRC1)-(NRC3) also emerge as solutions in the UFM optimization problem. We also show that if the regularization parameters are equal to zero, then there is no collapse. To our knowledge, this is the first empirical and theoretical study of neural collapse in the context of regression. This extension is significant not only because it broadens the applicability of neural collapse to a new category of problems but also because it suggests that the phenomena of neural collapse could be a universal behavior in deep learning.
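One way to probe (NRC1) empirically, sketched below: measure the fraction of last-layer feature energy lying outside the span of the top-$n$ principal components, where $n$ is the target dimension. The specific ratio used here is an illustrative choice, not necessarily the paper's exact statistic.

import numpy as np

def nrc1_residual(H, n):
    # H: (num_samples, feature_dim) last-layer features; n: target dimension.
    Hc = H - H.mean(axis=0)                  # center the features
    _, _, Vt = np.linalg.svd(Hc, full_matrices=False)
    P = Vt[:n].T @ Vt[:n]                    # projector onto top-n principal components
    residual = Hc - Hc @ P                   # component outside the subspace
    return np.linalg.norm(residual) ** 2 / np.linalg.norm(Hc) ** 2

# Collapse corresponds to this ratio approaching zero as training proceeds.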
Authors: Tim Lawson, Lucy Farnik, Conor Houghton, Laurence Aitchison
Abstract: Sparse autoencoders (SAEs) are a promising approach to interpreting the internal representations of transformer language models. However, standard SAEs are trained separately on each transformer layer, making it difficult to use them to study how information flows across layers. To solve this problem, we introduce the multi-layer SAE (MLSAE): a single SAE trained on the residual stream activation vectors from every transformer layer simultaneously. The residual stream is usually understood as preserving information across layers, so we expected to, and did, find individual SAE features that are active at multiple layers. Interestingly, while a single SAE feature is active at different layers for different prompts, for a single prompt, we find that a single feature is far more likely to be active at a single layer. For larger underlying models, we find that the cosine similarities between adjacent layers in the residual stream are higher, so we expect more features to be active at multiple layers. These results show that MLSAEs are a promising method to study information flow in transformers. We release our code to train and analyze MLSAEs at https://github.com/tim-lawson/mlsae.
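A minimal sketch of the multi-layer SAE idea: a single sparse autoencoder trained on residual-stream activations pooled from every layer, so the same feature dictionary is shared across layers. The architecture details (ReLU encoder, L1 sparsity penalty, dimensions) are common SAE conventions assumed here, not necessarily the paper's exact configuration.

import torch
import torch.nn as nn

class MLSAE(nn.Module):
    def __init__(self, d_model, d_hidden):
        super().__init__()
        self.enc = nn.Linear(d_model, d_hidden)
        self.dec = nn.Linear(d_hidden, d_model)

    def forward(self, h):                  # h: residual-stream activations from any layer
        z = torch.relu(self.enc(h))        # sparse feature activations
        return self.dec(z), z

def mlsae_loss(model, resid, l1=1e-3):
    # resid: (n_layers * n_tokens, d_model), activations pooled over all layers
    h_hat, z = model(resid)
    return ((h_hat - resid) ** 2).mean() + l1 * z.abs().mean()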
Authors: Samuel J. Bell, Diane Bouchacourt, Levent Sagun
Abstract: Neural networks can fail when the data contains spurious correlations. To understand this phenomenon, researchers have proposed numerous spurious correlations benchmarks upon which to evaluate mitigation methods. However, we observe that these benchmarks exhibit substantial disagreement, with the best methods on one benchmark performing poorly on another. We explore this disagreement, and examine benchmark validity by defining three desiderata that a benchmark should satisfy in order to meaningfully evaluate methods. Our results have implications for both benchmarks and mitigations: we find that certain benchmarks are not meaningful measures of method performance, and that several methods are not sufficiently robust for widespread use. We present a simple recipe for practitioners to choose methods using the most similar benchmark to their given problem.
Authors: Adir Rahamim, Naomi Saphra, Sara Kangaslahti, Yonatan Belinkov
Abstract: Parameter efficient finetuning methods like low-rank adaptation (LoRA) aim to reduce the computational costs of finetuning pretrained Language Models (LMs). Enabled by these low-rank settings, we propose an even more efficient optimization strategy: Fast Forward, a simple and effective approach to accelerate large segments of training. In a Fast Forward stage, we repeat the most recent optimizer step until the loss stops improving on a tiny validation set. By alternating between regular optimization steps and Fast Forward stages, Fast Forward provides up to an 87\% reduction in FLOPs and up to an 81\% reduction in train time over standard SGD with Adam. We validate Fast Forward by finetuning various models on different tasks and demonstrate that it speeds up training without compromising model performance. Additionally, we analyze when and how to apply Fast Forward.
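A sketch of the Fast Forward stage described above: record the parameter delta produced by the most recent optimizer step, then keep re-applying it while the loss on a tiny validation set keeps improving. The rollback-on-failure logic and the cap on repeats are assumptions about details the abstract leaves open.

import copy
import torch

def fast_forward(model, val_loss_fn, delta, max_repeats=100):
    # delta is the most recent optimizer step, e.g. obtained via:
    #   before = [p.detach().clone() for p in model.parameters()]
    #   optimizer.step()
    #   delta = [p.detach() - b for p, b in zip(model.parameters(), before)]
    best = val_loss_fn(model)
    for _ in range(max_repeats):
        backup = copy.deepcopy(model.state_dict())
        with torch.no_grad():
            for p, d in zip(model.parameters(), delta):
                p.add_(d)                  # repeat the last optimizer step
        loss = val_loss_fn(model)
        if loss >= best:                   # loss stopped improving: roll back
            model.load_state_dict(backup)
            break
        best = loss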
Authors: Coby Penso, Jacob Goldberger
Abstract: This study addresses the problem of calibrating network confidence while adapting a model that was originally trained on a source domain to a target domain using unlabeled samples from the target domain. The absence of labels from the target domain makes it impossible to directly calibrate the adapted network on the target domain. To tackle this challenge, we introduce a calibration procedure that relies on estimating the network's accuracy on the target domain. The network accuracy is first computed on the labeled source data and then is modified to represent the actual accuracy of the model on the target domain. The proposed algorithm calibrates the prediction confidence directly in the target domain by minimizing the disparity between the estimated accuracy and the computed confidence. The experimental results show that our method significantly outperforms existing methods, which rely on importance weighting, across several standard datasets.
Authors: Chengxi Pan, Junshang Chen, Jingrui Ye
Abstract: Optimal selection of optimization algorithms is crucial for training deep learning models. The Adam optimizer has gained significant attention due to its efficiency and wide applicability. However, to enhance the adaptability of optimizers across diverse datasets, we propose an innovative optimization strategy by integrating the 'warped gradient descent' concept from Meta Learning into the Adam optimizer. In the conventional Adam optimizer, gradients are utilized to compute estimates of gradient mean and variance, subsequently updating model parameters. Our approach introduces a learnable distortion matrix, denoted as P, which is employed for linearly transforming gradients. This transformation slightly adjusts gradients during each iteration, enabling the optimizer to better adapt to distinct dataset characteristics. By learning an appropriate distortion matrix P, our method aims to adaptively adjust gradient information across different data distributions, thereby enhancing optimization performance. Our research showcases the potential of this novel approach through theoretical insights and empirical evaluations. Experimental results across various tasks and datasets validate the superiority of our optimizer that integrates the 'warped gradient descent' concept in terms of adaptability. Furthermore, we explore effective strategies for training the adaptation matrix P and identify scenarios where this method can yield optimal results. In summary, this study introduces an innovative approach that merges the 'warped gradient descent' concept from Meta Learning with the Adam optimizer. By introducing a learnable distortion matrix P within the optimizer, we aim to enhance the model's generalization capability across diverse data distributions, thus opening up new possibilities in the field of deep learning optimization.
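A sketch of the core mechanism described above, under stated assumptions: a distortion matrix P linearly transforms the flattened gradient before it enters Adam's moment estimates. How P is parameterized and trained (e.g., against a meta-objective) is left open by the abstract, so P is shown here simply as a stored matrix initialized to the identity.

import torch

class WarpedAdam:
    def __init__(self, param, lr=1e-3, betas=(0.9, 0.999), eps=1e-8):
        d = param.numel()
        self.P = torch.eye(d)              # learnable distortion matrix (identity init)
        self.m = torch.zeros(d)            # first-moment estimate
        self.v = torch.zeros(d)            # second-moment estimate
        self.t = 0
        self.lr, self.betas, self.eps = lr, betas, eps
        self.param = param

    @torch.no_grad()
    def step(self):
        g = self.P @ self.param.grad.flatten()   # warp the gradient
        b1, b2 = self.betas
        self.t += 1
        self.m = b1 * self.m + (1 - b1) * g
        self.v = b2 * self.v + (1 - b2) * g * g
        m_hat = self.m / (1 - b1 ** self.t)      # bias-corrected moments
        v_hat = self.v / (1 - b2 ** self.t)
        update = self.lr * m_hat / (v_hat.sqrt() + self.eps)
        self.param -= update.view_as(self.param)

Note that a full d-by-d matrix is only practical for small parameter blocks; a block-diagonal or per-layer P would be the natural scalable variant.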
Authors: Guoqiang Zhang, Richard Heusdens
Abstract: In this paper, we extend the standard Attention in transformers by exploiting the consensus discrepancy from a distributed optimization perspective, referred to as AttentionX. It is noted that the primal-dual method of multipliers (PDMM) \cite{Zhang16PDMM} is designed to iteratively solve a broad class of distributed optimization problems over a peer-to-peer (P2P) network, where neighbouring nodes gradually reach consensus as specified by predefined linear edge-constraints in the optimization process. In particular, at each iteration of PDMM, each node in a network first performs information-gathering from neighbours and then performs local information-fusion. From a high-level point of view, the $KQ$-softmax-based weighted summation of $V$-representations in Attention corresponds to information-gathering from neighbours, while the feature-processing via the feed-forward network (FFN) in the transformer corresponds to local information-fusion. PDMM exploits the Lagrangian multipliers to capture the historical consensus discrepancy in the form of residual errors of the linear edge-constraints, which plays a crucial role in the algorithm's convergence. Inspired by PDMM, we propose AttentionX to incorporate the consensus discrepancy in the output update-expression of the standard Attention. The consensus discrepancy in AttentionX refers to the difference between the weighted summation of $V$-representations and the scaled $V$-representations themselves. Experiments on ViT and nanoGPT show promising performance.
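The abstract does not give the exact update expression, so the sketch below is one plausible reading: compute the standard attention output, form the consensus discrepancy as the difference between the weighted summation of $V$-representations and a scaled $V$, and feed it back into the output. The scaling factors alpha and eta are assumptions, not the paper's values.

import torch
import torch.nn.functional as F

def attention_x(Q, K, V, alpha=1.0, eta=0.5):
    d = Q.shape[-1]
    A = F.softmax(Q @ K.transpose(-2, -1) / d ** 0.5, dim=-1)
    out = A @ V                            # standard attention output
    discrepancy = out - alpha * V          # consensus discrepancy
    return out + eta * discrepancy         # incorporate the discrepancy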
Authors: William Knottenbelt, Zeyu Gao, Rebecca Wray, Woody Zhidong Zhang, Jiashuai Liu, Mireia Crispin-Ortuzar
Abstract: Survival analysis is a branch of statistics used for modeling the time until a specific event occurs and is widely used in medicine, engineering, finance, and many other fields. When choosing survival models, there is typically a trade-off between performance and interpretability, where the highest performance is achieved by black-box models based on deep learning. This is a major problem in fields such as medicine where practitioners are reluctant to blindly trust black-box models to make important patient decisions. Kolmogorov-Arnold Networks (KANs) were recently proposed as an interpretable and accurate alternative to multi-layer perceptrons (MLPs). We introduce CoxKAN, a Cox proportional hazards Kolmogorov-Arnold Network for interpretable, high-performance survival analysis. We evaluate the proposed CoxKAN on 4 synthetic datasets and 9 real medical datasets. The synthetic experiments demonstrate that CoxKAN accurately recovers interpretable symbolic formulae for the hazard function, and effectively performs automatic feature selection. Evaluation on the 9 real datasets shows that CoxKAN consistently outperforms the Cox proportional hazards model and achieves performance that is superior or comparable to that of tuned MLPs. Furthermore, we find that CoxKAN identifies complex interactions between predictor variables that would be extremely difficult to recognise using existing survival methods, and automatically finds symbolic formulae which uncover the precise effect of important biomarkers on patient risk.
Authors: Muniba Batool, Naveed Ahmed Azam, Jianshen Zhu, Kazuya Haraguchi, Liang Zhao, Tatsuya Akutsu
Abstract: Aqueous solubility (AS) is a key physicochemical property that plays a crucial role in drug discovery and material design. We report a novel unified approach to predict and infer chemical compounds with the desired AS based on simple deterministic graph-theoretic descriptors, multiple linear regression (MLR) and mixed integer linear programming (MILP). Descriptors selected via a forward stepwise procedure enabled the simplest regression model, MLR, to achieve good prediction accuracy compared to existing approaches, with accuracy in the range [0.7191, 0.9377] across 29 diverse datasets. By simulating these descriptors and learning models as MILPs, we inferred mathematically exact and optimal compounds with the desired AS, prescribed structures, and up to 50 non-hydrogen atoms, within a reasonable time range of [6, 1204] seconds. These findings indicate a strong correlation between the simple graph-theoretic descriptors and the AS of compounds, potentially leading to a deeper understanding of their AS without relying on widely used complicated chemical descriptors and complex machine learning models that are computationally expensive, and therefore difficult to use for inference. An implementation of the proposed approach is available at https://github.com/ku-dml/mol-infer/tree/master/AqSol.
URLs: https://github.com/ku-dml/mol-infer/tree/master/AqSol.
Authors: Emma Svensson, Hannah Rosa Friesacher, Susanne Winiwarter, Lewis Mervin, Adam Arany, Ola Engkvist
Abstract: In the early stages of drug discovery, decisions regarding which experiments to pursue can be influenced by computational models. These decisions are critical due to the time-consuming and expensive nature of the experiments. Therefore, it is becoming essential to accurately quantify the uncertainty in machine learning predictions, such that resources can be used optimally and trust in the models improves. While computational methods for drug discovery often suffer from limited data and sparse experimental observations, additional information can exist in the form of censored labels that provide thresholds rather than precise values of observations. However, the standard approaches that quantify uncertainty in machine learning cannot fully utilize censored labels. In this work, we adapt ensemble-based, Bayesian, and Gaussian models with tools to learn from censored labels by using the Tobit model from survival analysis. Our results demonstrate that despite the partial information available in censored labels, they are essential to accurately and reliably model the real pharmaceutical setting.
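A minimal Tobit-style negative log-likelihood for learning from censored labels, of the kind adapted above: exact observations contribute a Gaussian log-density, while right-censored observations (true value known only to exceed the recorded threshold) contribute the log-probability of exceeding it. The sign convention and the (mu, log_sigma) network outputs are assumptions for illustration.

import torch

def tobit_nll(mu, log_sigma, y, censored):
    sigma = log_sigma.exp()
    z = (y - mu) / sigma
    normal = torch.distributions.Normal(0.0, 1.0)
    ll_exact = normal.log_prob(z) - log_sigma          # Gaussian log-density of y
    ll_cens = torch.log1p(-normal.cdf(z) + 1e-12)      # log P(value > threshold y)
    ll = torch.where(censored, ll_cens, ll_exact)      # censored: boolean mask
    return -ll.mean()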
Authors: Daniel R. Clarkson, Lawrence A. Bull, Chandula T. Wickramarachchi, Elizabeth J. Cross, Timothy J. Rogers, Keith Worden, Nikolaos Dervilis, Aidan J. Hughes
Abstract: Regression is a fundamental prediction task common in data-centric engineering applications that involves learning mappings between continuous variables. In many engineering applications (e.g.\ structural health monitoring), feature-label pairs used to learn such mappings are of limited availability which hinders the effectiveness of traditional supervised machine learning approaches. The current paper proposes a methodology for overcoming the issue of data scarcity by combining active learning with hierarchical Bayesian modelling. Active learning is an approach for preferentially acquiring feature-label pairs in a resource-efficient manner. In particular, the current work adopts a risk-informed approach that leverages contextual information associated with regression-based engineering decision-making tasks (e.g.\ inspection and maintenance). Hierarchical Bayesian modelling allows multiple related regression tasks to be learned over a population, capturing local and global effects. The information sharing facilitated by this modelling approach means that information acquired for one engineering system can improve predictive performance across the population. The proposed methodology is demonstrated using an experimental case study. Specifically, multiple regressions are performed over a population of machining tools, where the quantity of interest is the surface roughness of the workpieces. An inspection and maintenance decision process is defined using these regression tasks which is in turn used to construct the active-learning algorithm. The novel methodology proposed is benchmarked against an uninformed approach to label acquisition and independent modelling of the regression tasks. It is shown that the proposed approach has superior performance in terms of expected cost -- maintaining predictive performance while reducing the number of inspections required.
Authors: Marvin Schmitt, Chengkun Li, Aki Vehtari, Luigi Acerbi, Paul-Christian B\"urkner, Stefan T. Radev
Abstract: Bayesian inference often faces a trade-off between computational speed and sampling accuracy. We propose an adaptive workflow that integrates rapid amortized inference with gold-standard MCMC techniques to achieve both speed and accuracy when performing inference on many observed datasets. Our approach uses principled diagnostics to guide the choice of inference method for each dataset, moving along the Pareto front from fast amortized sampling to slower but guaranteed-accurate MCMC when necessary. By reusing computations across steps, our workflow creates synergies between amortized and MCMC-based inference. We demonstrate the effectiveness of this integrated approach on a generalized extreme value task with 1000 observed data sets, showing 90x time efficiency gains while maintaining high posterior quality.
Authors: Shang Xiang, Lin Yao, Zhen Wang, Qifan Yu, Wentan Liu, Wentao Guo, Guolin Ke
Abstract: The field of computer-aided synthesis planning (CASP) has seen rapid advancements in recent years, achieving significant progress across various algorithmic benchmarks. However, chemists often encounter numerous infeasible reactions when using CASP in practice. This article delves into common errors associated with CASP and introduces a product prediction model aimed at enhancing the accuracy of single-step models. While the product prediction model reduces the number of single-step reactions, it integrates multiple single-step models to maintain the overall reaction count and increase reaction diversity. Based on manual analysis and large-scale testing, the product prediction model, combined with the multi-model ensemble approach, has been proven to offer higher feasibility and greater diversity.
Authors: Shuirong Cao, Ruoxi Cheng, Zhiqiang Wang
Abstract: LLMs can exhibit age biases, resulting in unequal treatment of individuals across age groups. While much research has addressed racial and gender biases, age bias remains little explored. The scarcity of instruction-tuning and preference datasets for age bias hampers its detection and measurement, and existing fine-tuning methods seldom address age-related fairness. In this paper, we construct age bias preference datasets and instruction-tuning datasets for RLHF. We introduce ARG, an age fairness reward to reduce differences in the response quality of LLMs across different age groups. Extensive experiments demonstrate that this reward significantly improves response accuracy and reduces performance disparities across age groups. Our source code and datasets are available at the anonymous \href{https://anonymous.4open.science/r/FairRLHF-D445/readme.md}{link}.
URLs: https://anonymous.4open.science/r/FairRLHF-D445/readme.md
Authors: Getachew K Befekadu
Abstract: In this brief paper, we present a naive aggregation algorithm for a typical learning problem with an expert advice setting, in which the task of improving generalization, i.e., model validation, is embedded in the learning process as a sequential decision-making problem. In particular, we consider a class of learning problems involving point estimation for modeling high-dimensional nonlinear functions, where a group of experts update their parameter estimates using the discrete-time version of gradient systems, with a small additive noise term, guided by the corresponding subsample datasets obtained from the original dataset. Here, our main objective is to provide conditions under which such an algorithm will sequentially determine a set of mixing distribution strategies used for aggregating the experts' estimates that ultimately leads to an optimal parameter estimate, i.e., a consensus solution for all experts, which is better than any individual expert's estimate in terms of generalization or learning performance. Finally, as part of this work, we present some numerical results for a typical nonlinear regression problem.
Authors: Maria-Florina Balcan, Anh Tuan Nguyen, Dravyansh Sharma
Abstract: Data-driven algorithm design automatically adapts algorithms to specific application domains, achieving better performance. In the context of parameterized algorithms, this approach involves tuning the algorithm parameters using problem instances drawn from the problem distribution of the target application domain. While empirical evidence supports the effectiveness of data-driven algorithm design, providing theoretical guarantees for several parameterized families remains challenging. This is due to the intricate behaviors of their corresponding utility functions, which typically admit piecewise structures and discontinuities. In this work, we present refined frameworks for providing learning guarantees for parameterized data-driven algorithm design problems in both distributional and online learning settings. For the distributional learning setting, we introduce the Pfaffian GJ framework, an extension of the classical GJ framework, capable of providing learning guarantees for function classes for which the computation involves Pfaffian functions. Unlike the GJ framework, which is limited to function classes with computation characterized by rational functions, our proposed framework can deal with function classes involving Pfaffian functions, which are much more general and widely applicable. We then show that for many parameterized algorithms of interest, their utility function possesses a refined piecewise structure, which automatically translates to learning guarantees using our proposed framework. For the online learning setting, we provide a new tool for verifying the dispersion property of a sequence of loss functions. This sufficient condition allows no-regret learning for sequences of piecewise structured loss functions where the piecewise structure involves Pfaffian transition boundaries.
Authors: Parameswaran Kamalaruban, Yulu Pi, Stuart Burrell, Eleanor Drage, Piotr Skalski, Jason Wong, David Sutton
Abstract: Ensuring fairness in transaction fraud detection models is vital due to the potential harms and legal implications of biased decision-making. Despite extensive research on algorithmic fairness, there is a notable gap in the study of bias in fraud detection models, mainly due to the field's unique challenges. These challenges include the need for fairness metrics that account for fraud data's imbalanced nature and the tradeoff between fraud protection and service quality. To address this gap, we present a comprehensive fairness evaluation of transaction fraud models using public synthetic datasets, marking the first algorithmic bias audit in this domain. Our findings reveal three critical insights: (1) Certain fairness metrics expose significant bias only after normalization, highlighting the impact of class imbalance. (2) Bias is significant in both service quality-related parity metrics and fraud protection-related parity metrics. (3) The fairness through unawareness approach, which involves removing sensitive attributes such as gender, does not improve bias mitigation within these datasets, likely due to the presence of correlated proxies. We also discuss socio-technical fairness-related challenges in transaction fraud models. These insights underscore the need for a nuanced approach to fairness in fraud detection, balancing protection and service quality, and moving beyond simple bias mitigation strategies. Future work must focus on refining fairness metrics and developing methods tailored to the unique complexities of the transaction fraud domain.
Authors: Minh Vu, Konstantinos Slavakis
Abstract: This paper establishes a novel role for Gaussian-mixture models (GMMs) as functional approximators of Q-function losses in reinforcement learning (RL). Unlike the existing RL literature, where GMMs play their typical role as estimates of probability density functions, GMMs here approximate Q-function losses. The new Q-function approximators, coined GMM-QFs, are incorporated in Bellman residuals to promote a Riemannian-optimization task as a novel policy-evaluation step in standard policy-iteration schemes. The paper demonstrates how the hyperparameters (means and covariance matrices) of the Gaussian kernels are learned from the data, thus opening the door of RL to the powerful toolbox of Riemannian optimization. Numerical tests show that with no use of training data, the proposed design outperforms state-of-the-art methods, even deep Q-networks which use training data, on benchmark RL tasks.
Authors: Deniz Koyuncu, Alex Gittens, B\"ulent Yener, Moti Yung
Abstract: Missing data is commonly encountered in practice, and when the missingness is non-ignorable, effective remediation depends on knowledge of the missingness mechanism. Learning the underlying missingness mechanism from the data is not possible in general, so adversaries can exploit this fact by maliciously engineering non-ignorable missingness mechanisms. Such Adversarial Missingness (AM) attacks have only recently been motivated and introduced, and then successfully tailored to mislead causal structure learning algorithms into hiding specific cause-and-effect relationships. However, existing AM attacks assume the modeler (victim) uses full-information maximum likelihood methods to handle the missing data, and are of limited applicability when the modeler uses different remediation strategies. In this work we focus on associational learning in the context of AM attacks. We consider (i) complete case analysis, (ii) mean imputation, and (iii) regression-based imputation as alternative strategies used by the modeler. Instead of combinatorially searching for missing entries, we propose a novel probabilistic approximation by deriving the asymptotic forms of these methods used for handling the missing entries. We then formulate the learning of the adversarial missingness mechanism as a bi-level optimization problem. Experiments on generalized linear models show that AM attacks can be used to change the p-values of features from significant to insignificant in real datasets, such as the California-housing dataset, while using relatively moderate amounts of missingness (<20%). Additionally, we assess the robustness of our attacks against defense strategies based on data valuation.
Authors: Rayna Andreeva, James Ward, Primoz Skraba, Jie Gao, Rik Sarkar
Abstract: Metric magnitude is a measure of the "size" of point clouds with many desirable geometric properties. It has been adapted to various mathematical contexts and recent work suggests that it can enhance machine learning and optimization algorithms. But its usability is limited due to the computational cost when the dataset is large or when the computation must be carried out repeatedly (e.g. in model training). In this paper, we study the magnitude computation problem, and show efficient ways of approximating it. We show that it can be cast as a convex optimization problem, but not as a submodular optimization. The paper describes two new algorithms - an iterative approximation algorithm that converges fast and is accurate, and a subset selection method that makes the computation even faster. It has been previously proposed that the magnitude of the model sequence generated during stochastic gradient descent is correlated with the generalization gap. Extending this result using our more scalable algorithms shows that longer sequences in fact bear higher correlations. We also describe new applications of magnitude in machine learning - as an effective regularizer for neural network training, and as a novel clustering criterion.
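For reference, the magnitude of a finite metric space computed directly from its definition: with similarity matrix Z_ij = exp(-d(x_i, x_j)), the magnitude is the sum of the entries of Z^{-1} (equivalently, the sum of the weighting w solving Zw = 1). This direct O(n^3) computation is the baseline that the approximation algorithms above are designed to avoid.

import numpy as np
from scipy.spatial.distance import cdist

def magnitude(X):
    Z = np.exp(-cdist(X, X))                   # similarity matrix
    w = np.linalg.solve(Z, np.ones(len(X)))    # weighting: Z w = 1
    return w.sum()                             # magnitude of the point cloud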
Authors: Alexandru Vasilache, Jann Krausse, Klaus Knobloch, Juergen Becker
Abstract: Intra-cortical brain-machine interfaces (iBMIs) have the potential to dramatically improve the lives of people with paraplegia by restoring their ability to perform daily activities. However, current iBMIs suffer from scalability and mobility limitations due to bulky hardware and wiring. Wireless iBMIs offer a solution but are constrained by a limited data rate. To overcome this challenge, we are investigating hybrid spiking neural networks for embedded neural decoding in wireless iBMIs. The networks consist of a temporal convolution-based compression followed by recurrent processing and a final interpolation back to the original sequence length. As recurrent units, we explore gated recurrent units (GRUs), leaky integrate-and-fire (LIF) neurons, and a combination of both - spiking GRUs (sGRUs) - and analyze their differences in terms of accuracy, footprint, and activation sparsity. To that end, we train decoders on the "Nonhuman Primate Reaching with Multichannel Sensorimotor Cortex Electrophysiology" dataset and evaluate them using the NeuroBench framework, targeting both tracks of the IEEE BioCAS Grand Challenge on Neural Decoding. Our approach achieves high accuracy in predicting velocities of primate reaching movements from multichannel primary motor cortex recordings while maintaining a low number of synaptic operations, surpassing the current baseline models in the NeuroBench framework. This work highlights the potential of hybrid neural networks to facilitate wireless iBMIs with high decoding precision and a substantial increase in the number of monitored neurons, paving the way toward more advanced neuroprosthetic technologies.
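For concreteness, a minimal leaky integrate-and-fire (LIF) neuron of the kind used as a recurrent unit above: the membrane potential decays, integrates the input current, and emits a spike with a reset when it crosses a threshold. The decay constant, threshold, and hard reset are illustrative conventions, not the paper's trained values.

import numpy as np

def lif_forward(inputs, beta=0.9, threshold=1.0):
    # inputs: (timesteps, features) input currents; returns a binary spike train.
    v = np.zeros(inputs.shape[1])              # membrane potential
    spikes = np.zeros_like(inputs)
    for t, x in enumerate(inputs):
        v = beta * v + x                       # leaky integration
        spikes[t] = (v >= threshold)           # fire when threshold is crossed
        v = np.where(spikes[t] > 0, 0.0, v)    # hard reset after a spike
    return spikes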
Authors: Jason Ramapuram, Federico Danieli, Eeshan Dhekane, Floris Weers, Dan Busbridge, Pierre Ablin, Tatiana Likhomanenko, Jagrit Digani, Zijin Gu, Amitis Shidani, Russ Webb
Abstract: Attention is a key part of the transformer architecture. It is a sequence-to-sequence mapping that transforms each sequence element into a weighted sum of values. The weights are typically obtained as the softmax of dot products between keys and queries. Recent work has explored alternatives to softmax attention in transformers, such as ReLU and sigmoid activations. In this work, we revisit sigmoid attention and conduct an in-depth theoretical and empirical analysis. Theoretically, we prove that transformers with sigmoid attention are universal function approximators and benefit from improved regularity compared to softmax attention. Through detailed empirical analysis, we identify stabilization of large initial attention norms during the early stages of training as a crucial factor for the successful training of models with sigmoid attention, outperforming prior attempts. We also introduce FLASHSIGMOID, a hardware-aware and memory-efficient implementation of sigmoid attention yielding a 17% inference kernel speed-up over FLASHATTENTION2 on H100 GPUs. Experiments across language, vision, and speech show that properly normalized sigmoid attention matches the strong performance of softmax attention on a wide range of domains and scales, which previous attempts at sigmoid attention were unable to fully achieve. Our work unifies prior art and establishes best practices for sigmoid attention as a drop-in softmax replacement in transformers.
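A sketch of sigmoid attention as a drop-in softmax replacement: the row-wise softmax is replaced by an elementwise sigmoid, so attention weights no longer sum to one. The -log(n) bias (n = sequence length), which keeps initial attention norms controlled, is one choice consistent with the stabilization discussion above; treat the exact form as an assumption.

import math
import torch

def sigmoid_attention(Q, K, V):
    n, d = Q.shape[-2], Q.shape[-1]
    scores = Q @ K.transpose(-2, -1) / math.sqrt(d)
    A = torch.sigmoid(scores - math.log(n))    # elementwise; no row normalization
    return A @ V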
Authors: Boris Knyazev, Abhinav Moudgil, Guillaume Lajoie, Eugene Belilovsky, Simon Lacoste-Julien
Abstract: Neural network training can be accelerated when a learnable update rule is used in lieu of classic adaptive optimizers (e.g. Adam). However, learnable update rules can be costly and unstable to train and use. A simpler, recently proposed approach to accelerate training is to use Adam for most of the optimization steps and periodically, only every few steps, nowcast (predict future) parameters. We improve this approach by proposing Neuron interaction and Nowcasting (NiNo) networks. NiNo leverages neuron connectivity and graph neural networks to more accurately nowcast parameters by learning in a supervised way from a set of training trajectories over multiple tasks. We show that in some networks, such as Transformers, neuron connectivity is non-trivial. By accurately modeling neuron connectivity, we allow NiNo to accelerate Adam training by up to 50% in vision and language tasks.
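A minimal sketch of the surrounding training loop, with plain linear extrapolation standing in for the learned NiNo predictor; the extrapolation rule, period, and jump horizon are illustrative assumptions.

    import torch

    def train_with_nowcast(model, loss_fn, data_iter, steps=1000, every=100, horizon=5):
        opt = torch.optim.Adam(model.parameters(), lr=1e-3)
        prev = None
        for step in range(steps):
            x, y = next(data_iter)
            opt.zero_grad()
            loss_fn(model(x), y).backward()
            opt.step()
            if (step + 1) % every == 0:                  # periodic nowcast
                cur = [p.detach().clone() for p in model.parameters()]
                if prev is not None:
                    with torch.no_grad():
                        for p, c, pr in zip(model.parameters(), cur, prev):
                            # jump `horizon` steps along the recent per-step drift
                            p.copy_(c + horizon * (c - pr) / every)
                prev = cur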
Authors: Gabriel Rodrigues Palma, Mariusz Skocze\'n, Phil Maguire
Abstract: The decisions traders make to buy or sell an asset depend on various analyses, with expertise required to identify patterns that can be exploited for profit. In this paper we identify novel features extracted from emergent and well-established financial markets using linear models and Gaussian Mixture Models (GMM) with the aim of finding profitable opportunities. We used approximately six months of data consisting of minute candles from the Bitcoin, Pepecoin, and Nasdaq markets to derive and compare the proposed novel features with commonly used ones. These features were extracted based on the previous 59 minutes for each market and used to generate predictions for the hour ahead. We explored the performance of various machine learning strategies, such as Random Forests (RF) and K-Nearest Neighbours (KNN), to classify market movements. A naive random approach to selecting trading decisions was used as a benchmark, with outcomes assumed to be equally likely. We used a temporal cross-validation approach with test sets of 40%, 30%, and 20% of total hours to evaluate the learning algorithms' performances. Our results showed that filtering the time series facilitates the algorithms' generalisation. The GMM filtering approach revealed that the KNN and RF algorithms produced higher average returns than the random algorithm.
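A minimal sketch of the kind of GMM-based filtering described here, where ambiguous hours are dropped before fitting a classifier; the component count and the confidence-quantile rule are assumptions, not the paper's exact procedure.

    import numpy as np
    from sklearn.mixture import GaussianMixture
    from sklearn.neighbors import KNeighborsClassifier

    def gmm_filter(X, y, n_components=3, keep=0.8):
        gmm = GaussianMixture(n_components=n_components, random_state=0).fit(X)
        conf = gmm.predict_proba(X).max(axis=1)        # assignment confidence
        mask = conf >= np.quantile(conf, 1 - keep)     # drop ambiguous samples
        return X[mask], y[mask]

    rng = np.random.default_rng(0)
    X, y = rng.normal(size=(500, 10)), rng.integers(0, 2, 500)
    clf = KNeighborsClassifier().fit(*gmm_filter(X, y))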
Authors: Sakhinana Sagar Srinivas, Rajat Kumar Sarkar, Venkataramana Runkana
Abstract: Characterization of materials via electron micrographs is an important and challenging task in several materials processing industries. Classification of electron micrographs is complex due to the high intra-class dissimilarity, high inter-class similarity, and multi-spatial scales of patterns. However, existing methods are ineffective in learning complex image patterns. We propose an effective end-to-end electron micrograph representation learning-based framework for nanomaterial identification to overcome the challenges. We demonstrate that our framework outperforms the popular baselines on the open-source datasets in nanomaterials-based identification tasks. The ablation studies are reported in great detail to support the efficacy of our approach.
Authors: Ajay Chatterjee, Srikanth Ranganathan
Abstract: Climate change is a pressing global concern for governments, corporations, and citizens alike. This concern underscores the necessity for these entities to accurately assess the climate impact of manufacturing goods and providing services. Tools like process life cycle analysis (pLCA) are used to evaluate the climate impact of production, use, and disposal, from raw material mining through end-of-life. pLCA further enables practitioners to look deeply into material choices or manufacturing processes for individual parts, sub-assemblies, assemblies, and the final product. Reliable and detailed data on the life cycle stages and processes of the product or service under study are not always available or accessible, resulting in inaccurate assessment of climate impact. To overcome the data limitation and enhance the effectiveness of pLCA to generate an improved environmental impact profile, we are adopting an innovative strategy to identify alternative parts, products, and components that share similarities in terms of their form, function, and performance to serve as qualified substitutes. Focusing on enterprise electronics hardware, we propose a semi-supervised learning-based framework to identify substitute parts that leverages product bill of material (BOM) data and a small amount of component-level qualified substitute data (positive samples) to generate a machine knowledge graph (MKG) and learn effective embeddings of the components that constitute electronic hardware. Our methodology is grounded in attributed graph embeddings and introduces a strategy to generate biased negative samples to significantly enhance the training process. We demonstrate improved performance and generalization over existing published models.
Authors: Federico Spagnolo, Nataliia Molchanova, Mario Ocampo Pineda, Lester Melie-Garcia, Meritxell Bach Cuadra, Cristina Granziera, Vincent Andrearczyk, Adrien Depeursinge
Abstract: To date, several methods have been developed to explain deep learning algorithms for classification tasks. Recently, an adaptation of two of such methods has been proposed to generate instance-level explainable maps in a semantic segmentation scenario, such as multiple sclerosis (MS) lesion segmentation. In the mentioned work, a 3D U-Net was trained and tested for MS lesion segmentation, yielding an F1 score of 0.7006, and a positive predictive value (PPV) of 0.6265. The distribution of values in explainable maps exposed some differences between maps of true and false positive (TP/FP) examples. Inspired by those results, we explore in this paper the use of characteristics of lesion-specific saliency maps to refine segmentation and detection scores. We generate around 21000 maps from as many TP/FP lesions in a batch of 72 patients (training set) and 4868 from the 37 patients in the test set. 93 radiomic features extracted from the first set of maps were used to train a logistic regression model and classify TP versus FP. On the test set, F1 score and PPV were improved by a large margin when compared to the initial model, reaching 0.7450 and 0.7817, with 95% confidence intervals of [0.7358, 0.7547] and [0.7679, 0.7962], respectively. These results suggest that saliency maps can be used to refine prediction scores, boosting a model's performance.
Authors: Rong Han, Xiaohong Liu, Tong Pan, Jing Xu, Xiaoyu Wang, Wuyang Lan, Zhenyu Li, Zixuan Wang, Jiangning Song, Guangyu Wang, Ting Chen
Abstract: Accurately measuring protein-RNA binding affinity is crucial in many biological processes and drug design. Previous computational methods for protein-RNA binding affinity prediction rely on either sequence or structure features and are thus unable to capture the binding mechanisms comprehensively. The recent emerging pre-trained language models trained on massive unsupervised sequences of protein and RNA have shown strong representation ability for various in-domain downstream tasks, including binding site prediction. However, applying different-domain language models collaboratively for complex-level tasks remains unexplored. In this paper, we propose CoPRA to bridge pre-trained language models from different biological domains via Complex structure for Protein-RNA binding Affinity prediction. We demonstrate for the first time that cross-biological modal language models can collaborate to improve binding affinity prediction. We propose a Co-Former to combine the cross-modal sequence and structure information and a bi-scope pre-training strategy for improving Co-Former's interaction understanding. Meanwhile, we build the largest protein-RNA binding affinity dataset PRA310 for performance evaluation. We also test our model on a public dataset for mutation effect prediction. CoPRA reaches state-of-the-art performance on all the datasets. We provide extensive analyses and verify that CoPRA can (1) accurately predict the protein-RNA binding affinity; (2) understand the binding affinity change caused by mutations; and (3) benefit from scaling data and model size.
Authors: Sina Hajer Ahmadi, Amruta Pranadika Mahashabde
Abstract: Residential outdoor water use in North America accounts for nearly 9 billion gallons daily, with approximately 50% of this water wasted due to over-watering, particularly in lawns and gardens. This inefficiency highlights the need for smart, data-driven irrigation systems. Traditional approaches to reducing water wastage have focused on centralized data collection and processing, but such methods can raise privacy concerns and may not account for the diverse environmental conditions across different regions. In this paper, we propose a federated learning-based approach to optimize water usage in residential and agricultural settings. By integrating moisture sensors and actuators with a distributed network of edge devices, our system allows each user to locally train a model on their specific environmental data while sharing only model updates with a central server. This preserves user privacy and enables the creation of a global model that can adapt to varying conditions. Our implementation leverages low-cost hardware, including an Arduino Uno microcontroller and soil moisture sensors, to demonstrate how federated learning can be applied to reduce water wastage while maintaining efficient crop production. The proposed system not only addresses the need for water conservation but also provides a scalable, privacy-preserving solution adaptable to diverse environments.
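A minimal federated-averaging sketch of the described setup, with a linear model standing in for the on-device irrigation model; all shapes and hyperparameters are illustrative assumptions.

    import numpy as np

    def local_train(w, X, y, lr=0.01, epochs=5):
        for _ in range(epochs):                 # linear model, squared loss
            grad = X.T @ (X @ w - y) / len(y)
            w = w - lr * grad
        return w

    def fedavg_round(w_global, clients):
        updates, sizes = [], []
        for X, y in clients:                    # each client keeps (X, y) private
            updates.append(local_train(w_global.copy(), X, y))
            sizes.append(len(y))
        sizes = np.array(sizes, dtype=float)    # only weights reach the server
        return np.average(updates, axis=0, weights=sizes / sizes.sum())

    rng = np.random.default_rng(0)
    clients = [(rng.normal(size=(50, 4)), rng.normal(size=50)) for _ in range(5)]
    w = np.zeros(4)
    for _ in range(10):
        w = fedavg_round(w, clients)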
Authors: Kiran Purohit, Anurag Reddy Parvathgari, Sourangshu Bhattacharya
Abstract: Deep convolutional neural networks (CNNs) have achieved impressive performance in many computer vision tasks. However, their large model sizes require heavy computational resources, making pruning redundant filters from existing pre-trained CNNs an essential task in developing efficient models for resource-constrained devices. Whole-network filter pruning algorithms prune varying fractions of filters from each layer, hence providing greater flexibility. Current whole-network pruning methods are either computationally expensive due to the need to calculate the loss for each pruned filter using a training dataset, or use various heuristic or learned criteria for determining the pruning fractions for each layer. This paper proposes a two-level hierarchical approach for whole-network filter pruning which is efficient and uses the classification loss as the final criterion. The lower-level algorithm (called filter-pruning) uses a sparse-approximation formulation based on linear approximation of filter weights. We explore two algorithms: orthogonal matching pursuit-based greedy selection and a greedy backward pruning approach. The backward pruning algorithm uses a novel closed-form error criterion for efficiently selecting the optimal filter at each stage, thus making the whole algorithm much faster. The higher-level algorithm (called layer-selection) greedily selects the best-pruned layer (pruning using the filter-pruning algorithm) using a global pruning criterion. We propose algorithms for two different global-pruning criteria: (1) layer-wise relative error (HBGS), and (2) final classification error (HBGTS). Our suite of algorithms outperforms state-of-the-art pruning methods on ResNet18, ResNet32, ResNet56, VGG16, and ResNext101. Our method reduces the RAM requirement for ResNext101 from 7.6 GB to 1.5 GB and achieves a 94% reduction in FLOPS without losing accuracy on CIFAR-10.
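A minimal sketch of the OMP-style selection idea on a single layer: treat each flattened filter as a column and greedily pick the subset whose span best reconstructs all filters. This is a generic simultaneous OMP for illustration, not the paper's exact formulation or its closed-form backward criterion.

    import numpy as np

    def omp_select(W, k):
        # W: (filter_dim, n_filters), one flattened conv filter per column
        selected, residual = [], W.copy()
        for _ in range(k):
            scores = np.linalg.norm(W.T @ residual, axis=1)  # per-filter correlation
            scores[selected] = -np.inf                       # forbid re-selection
            selected.append(int(np.argmax(scores)))
            B = W[:, selected]                               # current dictionary
            coeffs, *_ = np.linalg.lstsq(B, W, rcond=None)   # project all filters
            residual = W - B @ coeffs
        return selected

    W = np.random.randn(27, 16)   # e.g. sixteen 3x3x3 filters from one layer
    print(omp_select(W, 4))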
Authors: Cheng Qian, Hainan Zhang, Lei Sha, Zhiming Zheng
Abstract: With the growing deployment of LLMs in daily applications like chatbots and content generation, efforts to ensure outputs align with human values and avoid harmful content have intensified. However, increasingly sophisticated jailbreak attacks threaten this alignment, aiming to induce unsafe outputs. Current defense efforts either focus on prompt rewriting or detection, which are limited in effectiveness due to the varied designs of jailbreak prompts, or on output control and detection, which are computationally expensive as they require LLM inference. Therefore, designing a pre-inference defense method that resists diverse jailbreak prompts is crucial for preventing LLM jailbreak attacks. We observe that jailbreak attacks, safe queries, and harmful queries exhibit different clustering patterns within the LLM's hidden state representation space. This suggests that by leveraging the LLM's hidden state representational capabilities, we can analyze the LLM's forthcoming behavior and proactively intervene for defense. In this paper, we propose a jailbreak attack defense strategy based on a Hidden State Filter (HSF), a lossless architectural defense mechanism that enables the model to preemptively identify and reject adversarial inputs before the inference process begins. We activate its defensive potential through an additional plugin module, effectively framing the defense task as a classification problem. Experimental results on two benchmark datasets, utilizing three different LLMs, show that HSF significantly enhances resilience against six cutting-edge jailbreak attacks. It significantly reduces the success rate of jailbreak attacks while minimally impacting responses to benign user queries, with negligible inference overhead, and outperforms defense baselines. Our code and data are available at https://anonymous.4open.science/r/Hidden-State-Filtering-8652/
URLs: https://anonymous.4open.science/r/Hidden-State-Filtering-8652/
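A minimal sketch of the hidden-state classification idea: extract a hidden state for each prompt and train a lightweight classifier to flag adversarial inputs before generation. The layer choice, last-token pooling, and logistic-regression head are assumptions; the HSF plugin itself may differ in detail.

    import torch
    from sklearn.linear_model import LogisticRegression

    @torch.no_grad()
    def last_token_state(model, tokenizer, prompt, layer=-1):
        ids = tokenizer(prompt, return_tensors="pt").input_ids
        out = model(ids, output_hidden_states=True)     # any HF causal LM
        return out.hidden_states[layer][0, -1].float().cpu().numpy()

    # X = np.stack([last_token_state(model, tok, p) for p in train_prompts])
    # clf = LogisticRegression(max_iter=1000).fit(X, labels)   # 1 = reject
    # if clf.predict(last_token_state(model, tok, query)[None])[0]:
    #     pass  # refuse before any generation happens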
Authors: Tanish Jain
Abstract: This study evaluates the reliability of two deep learning models for skin cancer detection, focusing on their explainability and fairness. Using the HAM10000 dataset of dermatoscopic images, the research assesses two convolutional neural network architectures: a MobileNet-based model and a custom CNN model. Both models are evaluated for their ability to classify skin lesions into seven categories and to distinguish between dangerous and benign lesions. Explainability is assessed using Saliency Maps and Integrated Gradients, with results interpreted by a dermatologist. The study finds that both models generally highlight relevant features for most lesion types, although they struggle with certain classes like seborrheic keratoses and vascular lesions. Fairness is evaluated using the Equalized Odds metric across sex and skin tone groups. While both models demonstrate fairness across sex groups, they show significant disparities in false positive and false negative rates between light and dark skin tones. A Calibrated Equalized Odds postprocessing strategy is applied to mitigate these disparities, resulting in improved fairness, particularly in reducing false negative rate differences. The study concludes that while the models show promise in explainability, further development is needed to ensure fairness across different skin tones. These findings underscore the importance of rigorous evaluation of AI models in medical applications, particularly in diverse population groups.
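For reference, a minimal sketch of the equalized-odds gaps used in such an evaluation: the largest per-group differences in true-positive rate and false-positive rate.

    import numpy as np

    def equalized_odds_gaps(y_true, y_pred, group):
        # returns (max TPR gap, max FPR gap) across demographic groups
        tprs, fprs = [], []
        for g in np.unique(group):
            m = group == g
            tp = np.sum((y_pred == 1) & (y_true == 1) & m)
            fn = np.sum((y_pred == 0) & (y_true == 1) & m)
            fp = np.sum((y_pred == 1) & (y_true == 0) & m)
            tn = np.sum((y_pred == 0) & (y_true == 0) & m)
            tprs.append(tp / max(tp + fn, 1))
            fprs.append(fp / max(fp + tn, 1))
        return max(tprs) - min(tprs), max(fprs) - min(fprs)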
Authors: Guangjing Wang, Hanqing Guo, Yuanda Wang, Bocheng Chen, Ce Zhou, Qiben Yan
Abstract: Smartphones and wearable devices have been integrated into our daily lives, offering personalized services. However, many apps become overprivileged as their collected sensing data contains unnecessary sensitive information. For example, mobile sensing data could reveal private attributes (e.g., gender and age) and unintended sensitive features (e.g., hand gestures when entering passwords). To prevent sensitive information leakage, existing methods must obtain private labels and users need to specify privacy policies. However, they only achieve limited control over information disclosure. In this work, we present Hippo to dissociate hierarchical information including private metadata and multi-grained activity information from the sensing data. Hippo achieves fine-grained control over the disclosure of sensitive information without requiring private labels. Specifically, we design a latent guidance-based diffusion model, which generates multi-grained versions of raw sensor data conditioned on hierarchical latent activity features. Hippo enables users to control the disclosure of sensitive information in sensing data, ensuring their privacy while preserving the necessary features to meet the utility requirements of applications. Hippo is the first unified model that achieves two goals: perturbing the sensitive attributes and controlling the disclosure of sensitive information in mobile sensing data. Extensive experiments show that Hippo can anonymize personal attributes and transform activity information at various resolutions across different types of sensing data.
Authors: Yewen Li, Chaojie Wang, Xiaobo Xia, Xu He, Ruyi An, Dong Li, Tongliang Liu, Bo An, Xinrun Wang
Abstract: Unsupervised out-of-distribution (U-OOD) detection aims to identify OOD data samples with a detector trained solely on unlabeled in-distribution (ID) data. The likelihood function estimated by a deep generative model (DGM) could be a natural detector, but its performance is limited in some popular "hard" benchmarks, such as FashionMNIST (ID) vs. MNIST (OOD). Recent studies have developed various detectors based on DGMs to move beyond likelihood. However, despite their success on "hard" benchmarks, most of them struggle to consistently surpass or match the performance of likelihood on some "non-hard" cases, such as SVHN (ID) vs. CIFAR10 (OOD) where likelihood could be a nearly perfect detector. Therefore, we appeal for more attention to incremental effectiveness on likelihood, i.e., whether a method could always surpass or at least match the performance of likelihood in U-OOD detection. We first investigate the likelihood of variational DGMs and find its detection performance could be improved in two directions: i) alleviating latent distribution mismatch, and ii) calibrating the dataset entropy-mutual integration. Then, we apply two techniques for each direction, specifically post-hoc prior and dataset entropy-mutual calibration. The final method, named Resultant, combines these two directions for better incremental effectiveness compared to either technique alone. Experimental results demonstrate that the Resultant could be a new state-of-the-art U-OOD detector while maintaining incremental effectiveness on likelihood in a wide range of tasks.
Authors: Yejie Wang, Keqing He, Dayuan Fu, Zhuoma Gongque, Heyang Xu, Yanxu Chen, Zhexu Wang, Yujia Fu, Guanting Dong, Muxi Diao, Jingang Wang, Mengdi Zhang, Xunliang Cai, Weiran Xu
Abstract: Recently, there has been a growing interest in studying how to construct better code instruction tuning data. However, we observe that code models trained with these datasets exhibit high performance on HumanEval but perform worse on other benchmarks such as LiveCodeBench. Upon further investigation, we find that many datasets suffer from severe data leakage. After cleaning up most of the leaked data, some well-known high-quality datasets perform poorly. This discovery reveals a new challenge: identifying which datasets genuinely qualify as high-quality code instruction data. To address this, we propose an efficient code data pruning strategy for selecting good samples. Our approach is based on three dimensions: instruction complexity, response quality, and instruction diversity. Based on our selected data, we present XCoder, a family of models finetuned from LLaMA3. Our experiments show that XCoder achieves new state-of-the-art performance using less training data, verifying the effectiveness of our data strategy. Moreover, we perform a comprehensive analysis of the data composition and find that existing code datasets have different characteristics according to their construction methods, which provides new insights for future code LLMs. Our models and dataset are released at https://github.com/banksy23/XCoder
Authors: Anoop R Katti, Rui C. Gonçalves, Rinchin Iakovlev
Abstract: In display advertising, advertisers want to achieve a marketing objective with constraints on budget and cost-per-outcome. This is usually formulated as an optimization problem that maximizes the total utility under constraints. The optimization is carried out in an online fashion in the dual space - for an incoming Ad auction, a bid is placed using an optimal bidding formula, assuming optimal values for the dual variables; based on the outcome of the previous auctions, the dual variables are updated in an online fashion. While this approach is theoretically sound, in practice, the dual variables are not optimal from the beginning, but rather converge over time. Specifically, for the cost-constraint, the convergence is asymptotic. As a result, we find that cost-control is ineffective. In this work, we analyse the shortcomings of the optimal bidding formula and propose a modification that deviates from the theoretical derivation. We simulate various practical scenarios and study the cost-control behaviors of the two algorithms. Through a large-scale evaluation on real-world data, we show that the proposed modification reduces the cost violations by 50%, thereby achieving better cost-control than the theoretical bidding formula.
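For context, a minimal sketch of the standard dual-space bidding loop that this abstract critiques. The bid formula is one common closed form under a cost-per-outcome constraint (the exact form depends on the problem setup), and the subgradient step size is an assumption; the paper's proposed modification is not shown.

    def bid(value, lam):
        # one common optimal-bidding form given dual variable lam
        return value / (1.0 + lam)

    def dual_step(lam, spend, outcomes, target_cpa, lr=0.01):
        # online subgradient update: raise lam when realized cost runs hot
        violation = spend - target_cpa * outcomes
        return max(0.0, lam + lr * violation)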
Authors: Arnold Schwarz, Levente Hernadi, Felix Bießmann, Kristian Hildebrand
Abstract: In this study we provide empirical evidence demonstrating that the quality of training data impacts model performance in Human Pose Estimation (HPE). Inaccurate labels in widely used data sets, ranging from minor errors to severe mislabeling, can negatively influence learning and distort performance metrics. We perform an in-depth analysis of popular HPE data sets to show the extent and nature of label inaccuracies. Our findings suggest that accounting for the impact of faulty labels will facilitate the development of more robust and accurate HPE models for a variety of real-world applications. We show improved performance with cleansed data.
Authors: Celine Reddig, Pawan Goyal, Igor Pontes Duff, Peter Benner
Abstract: Model reduction is an active research field to construct low-dimensional surrogate models of high fidelity to accelerate engineering design cycles. In this work, we investigate model reduction for linear structured systems using dominant reachable and observable subspaces. When the training set (containing all possible interpolation points) is large, these subspaces can be determined by solving many large-scale linear systems. However, for high-fidelity models, this easily becomes computationally intractable. To circumvent this issue, we propose an active sampling strategy to sample only a few points from the given training set, which allows us to estimate those subspaces accurately. To this end, we formulate the identification of the subspaces as the solution of generalized Sylvester equations, guiding us to select the most relevant samples from the training set. Consequently, we construct solutions of the matrix equations in low-rank form, which encode the subspace information. We extensively discuss computational aspects and the efficient use of the low-rank factors in the process of obtaining reduced-order models. We illustrate the proposed active sampling scheme for obtaining reduced-order models via dominant reachable and observable subspaces and compare it with the method in which all points from the training set are taken into account. It is shown that the active sampling strategy can provide a 17x speed-up without sacrificing any noticeable accuracy.
Authors: Shrabani Ghosh
Abstract: A signed graph (SG) is a graph whose edges carry sign information. The sign of an edge can be positive, negative, or neutral. Signed networks are ubiquitous in real-world networks such as social networks, citation networks, and various technical networks. Many network embedding models have been proposed and developed for signed networks, of both homogeneous and heterogeneous types. SG embedding learns low-dimensional vector representations for the nodes of a network, which support many network analysis tasks such as link prediction, node classification, and community detection. In this survey, we perform a comprehensive study of SG embedding methods and applications. We introduce the basic theories and methods of SGs and survey the current state of the art of signed graph embedding methods. In addition, we explore the applications of different types of SG embedding methods in real-world scenarios. As an application, we explore the citation network to analyze authorship networks. We also provide source code and datasets to guide future work. Lastly, we discuss the challenges of SG embedding and forecast various future research directions in this field.
Authors: Taekyun Lee, Juseong Park, Hyeji Kim, Jeffrey G. Andrews
Abstract: Deep neural network (DNN)-based algorithms are emerging as an important tool for many physical and MAC layer functions in future wireless communication systems, including for large multi-antenna channels. However, training such models typically requires a large dataset of high-dimensional channel measurements, which are very difficult and expensive to obtain. This paper introduces a novel method for generating synthetic wireless channel data using diffusion-based models to produce user-specific channels that accurately reflect real-world wireless environments. Our approach employs a conditional denoising diffusion implicit model (cDDIM) framework, effectively capturing the relationship between user location and multi-antenna channel characteristics. We generate synthetic high-fidelity channel samples using user positions as conditional inputs, creating larger augmented datasets to overcome measurement scarcity. The utility of this method is demonstrated through its efficacy in training various downstream tasks such as channel compression and beam alignment. Our approach significantly improves over prior methods, such as adding noise or using generative adversarial networks (GANs), especially in scenarios with limited initial measurements.
Authors: Sarah Condran
Abstract: Detecting false information on social media is critical in mitigating its negative societal impacts. To reduce the propagation of false information, automated detection provides scalable, unbiased, and cost-effective methods. However, we identify three research areas which, once addressed, could improve detection. First, current AI-based solutions often provide a uni-dimensional analysis of a complex, multi-dimensional issue, with solutions differing based on the features used. Furthermore, these methods do not account for the temporal and dynamic changes observed within the document's life cycle. Second, there has been little research on the detection of coordinated information campaigns and on understanding the intent of the actors and the campaign. Third, there is a lack of consideration of cross-platform analysis, with existing datasets focusing on a single platform, such as X, and detection models designed for a specific platform. This work aims to develop methods for effective detection of false information and its propagation. To this end, we first aim to propose the creation of an ensemble multi-faceted framework that leverages multiple aspects of false information. Second, we propose a method to identify actors and their intent when working in coordination to manipulate a narrative. Third, we aim to analyse the impact of cross-platform interactions on the propagation of false information via the creation of a new dataset.
Authors: Eshwar Ram Arunachaleswaran, Natalie Collina, Sampath Kannan, Aaron Roth, Juba Ziani
Abstract: There has been substantial recent concern that pricing algorithms might learn to "collude." Supra-competitive prices can emerge as a Nash equilibrium of repeated pricing games, in which sellers play strategies that threaten to punish competitors who refuse to support high prices, and these strategies can be automatically learned. In fact, a standard economic intuition is that supra-competitive prices emerge from either the use of threats or a failure of one party to optimize their payoff. Is this intuition correct? Would preventing threats in algorithmic decision-making prevent supra-competitive prices when sellers are optimizing for their own revenue? No. We show that supra-competitive prices can emerge even when both players are using algorithms which do not encode threats and which optimize for their own revenue. We study sequential pricing games in which a first mover deploys an algorithm and then a second mover optimizes within the resulting environment. We show that if the first mover deploys any algorithm with a no-regret guarantee, and the second mover even approximately optimizes within this now static environment, monopoly-like prices arise. The result holds for any no-regret learning algorithm deployed by the first mover and for any pricing policy of the second mover that obtains profit at least as high as random pricing would - and hence the result applies even when the second mover is optimizing only within a space of non-responsive pricing distributions which are incapable of encoding threats. In fact, there exists a pair of strategies, neither of which explicitly encodes threats, that forms a Nash equilibrium of the simultaneous pricing game in algorithm space and leads to near-monopoly prices. This suggests that the definition of "algorithmic collusion" may need to be expanded to include strategies without explicitly encoded threats.
Authors: Anna Guo, Razieh Nabi
Abstract: The identification theory for causal effects in directed acyclic graphs (DAGs) with hidden variables is well-developed, but methods for estimating and inferring functionals beyond the g-formula remain limited. Previous studies have proposed semiparametric estimators for identifiable functionals in a broad class of DAGs with hidden variables. While demonstrating double robustness in some models, existing estimators face challenges, particularly with density estimation and numerical integration for continuous variables, and their estimates may fall outside the parameter space of the target estimand. Their asymptotic properties are also underexplored, especially when using flexible statistical and machine learning models for nuisance estimation. This study addresses these challenges by introducing novel one-step corrected plug-in and targeted minimum loss-based estimators of causal effects for a class of DAGs that extend classical back-door and front-door criteria (known as the treatment primal fixability criterion in prior literature). These estimators leverage machine learning to minimize modeling assumptions while ensuring key statistical properties such as asymptotic linearity, double robustness, efficiency, and staying within the bounds of the target parameter space. We establish conditions for nuisance functional estimates in terms of L2(P)-norms to achieve root-n consistent causal effect estimates. To facilitate practical application, we have developed the flexCausal package in R.
Authors: Zhe Xiong, Qiaoqiao Ding, Xiaoqun Zhang
Abstract: Recently, medical image synthesis has gained increasing popularity, along with the rapid development of generative models. Medical image synthesis aims to generate an unacquired image modality, often from other observed data modalities. Synthesized images can be used for clinical diagnostic assistance, data augmentation for model training and validation, or image quality improvement. Meanwhile, flow-based models are among the most successful generative models for their ability to generate realistic and high-quality synthetic images. However, most flow-based models require calculating many flow ordinary differential equation (ODE) evolution steps during the transfer process, so their performance is significantly limited by the heavy computation time incurred by the large number of time iterations. In this paper, we propose a novel flow-based model, namely Discrete Process Matching (DPM), to accomplish bi-modality image transfer tasks. Unlike other flow-matching-based models, we propose to utilize both the forward and backward ODE flows and to enhance consistency on the intermediate images of a few discrete time steps, resulting in a transfer process with far fewer iteration steps while maintaining high-quality generations for both modalities. Our experiments on three datasets of MRI T1/T2 and CT/MRI demonstrate that DPM outperforms other state-of-the-art flow-based methods for bi-modality image synthesis, achieving higher image quality at a lower computation time cost.
Authors: Yudong Chen, Xumei Xi, Christina Lee Yu
Abstract: Matrix completion tackles the task of predicting missing values in a low-rank matrix based on a sparse set of observed entries. It is often assumed that the observation pattern is generated uniformly at random or has a very specific structure tuned to a given algorithm. There is still a gap in our understanding when it comes to arbitrary sampling patterns. Given an arbitrary sampling pattern, we introduce a matrix completion algorithm based on network flows in the bipartite graph induced by the observation pattern. For additive matrices, the particular flow we use is the electrical flow, and we establish error upper bounds customized to each entry as a function of the observation set, along with matching minimax lower bounds. Our results show that the minimax squared error for recovery of a particular entry in the matrix is proportional to the effective resistance of the corresponding edge in the graph. Furthermore, we show that our estimator is equivalent to the least squares estimator. We apply our estimator to the two-way fixed effects model and show that it enables us to accurately infer individual causal effects and the unit-specific and time-specific confounders. For rank-$1$ matrices, we use edge-disjoint paths to form an estimator that achieves minimax optimal estimation when the sampling is sufficiently dense. Our discovery introduces a new family of estimators parametrized by network flows, which provide a fine-grained and intuitive understanding of the impact of the given sampling pattern on the relative difficulty of estimation at an entry-specific level. This graph-based approach allows us to quantify the inherent complexity of matrix completion for individual entries, rather than relying solely on global measures of performance.
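A minimal sketch of the entry-wise quantity the error bounds are stated in: the effective resistance of an observed entry's edge in the bipartite observation graph, computed from the graph Laplacian pseudoinverse via the standard identity R(u, v) = L+_uu + L+_vv - 2 L+_uv.

    import numpy as np

    def effective_resistance(n_rows, n_cols, observed):
        # observed: list of (i, j) entries; nodes are rows 0..n_rows-1
        # followed by columns n_rows..n_rows+n_cols-1
        n = n_rows + n_cols
        L = np.zeros((n, n))
        for i, j in observed:
            u, v = i, n_rows + j
            L[u, u] += 1; L[v, v] += 1
            L[u, v] -= 1; L[v, u] -= 1
        Lp = np.linalg.pinv(L)                 # Laplacian pseudoinverse
        return {(i, j): Lp[i, i] + Lp[n_rows + j, n_rows + j]
                        - 2 * Lp[i, n_rows + j]
                for i, j in observed}

    print(effective_resistance(2, 2, [(0, 0), (0, 1), (1, 0)]))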
Authors: Zhenxiao Zhang, Zhidong Gao, Yuanxiong Guo, Yanmin Gong
Abstract: Motivated by the drawbacks of cloud-based federated learning (FL), cooperative federated edge learning (CFEL) has been proposed to improve efficiency for FL over mobile edge networks, where multiple edge servers collaboratively coordinate the distributed model training across a large number of edge devices. However, CFEL faces critical challenges arising from dynamic and heterogeneous device properties, which slow down the convergence and increase resource consumption. This paper proposes a heterogeneity-aware CFEL scheme called Heterogeneity-Aware Cooperative Edge-based Federated Averaging (HCEF) that aims to maximize the model accuracy while minimizing the training time and energy consumption via adaptive computation and communication compression in CFEL. By theoretically analyzing how local update frequency and gradient compression affect the convergence error bound in CFEL, we develop an efficient online control algorithm for HCEF to dynamically determine local update frequencies and compression ratios for heterogeneous devices. Experimental results show that compared with prior schemes, the proposed HCEF scheme can maintain higher model accuracy while reducing training latency and improving energy efficiency simultaneously.
Authors: Kentaro Hirahara, Chikahito Nakane, Hajime Ebisawa, Tsuyoshi Kuroda, Yohei Iwaki, Tomoyoshi Utsumi, Yuichiro Nomura, Makoto Koike, Hiroshi Mineno
Abstract: In an agricultural field, plant phenotyping using object detection models is gaining attention. However, collecting the training data necessary to create generic and high-precision models is extremely challenging due to the difficulty of annotation and the diversity of domains. Furthermore, it is difficult to transfer training data across different crops, and although machine learning models effective for specific environments, conditions, or crops have been developed, they cannot be widely applied in actual fields. In this study, we propose a generative data augmentation method (D4) for vineyard shoot detection. D4 uses a pre-trained text-guided diffusion model based on a large number of original images culled from video data collected by unmanned ground vehicles or other means, and a small number of annotated datasets. The proposed method generates new annotated images with background information adapted to the target domain while retaining annotation information necessary for object detection. In addition, D4 overcomes the lack of training data in agriculture, including the difficulty of annotation and diversity of domains. We confirmed that this generative data augmentation method improved the mean average precision by up to 28.65% for the BBox detection task and the average precision by up to 13.73% for the keypoint detection task for vineyard shoot detection. Our generative data augmentation method D4 is expected to simultaneously solve the cost and domain diversity issues of training data generation in agriculture and improve the generalization performance of detection models.
Authors: Franziska Griese, Fabian Hoppe, Alexander Rüttgers, Philipp Knechtges
Abstract: The numerical simulation and optimization of technical systems described by partial differential equations is expensive, especially in multi-query scenarios in which the underlying equations have to be solved for different parameters. A comparatively new approach in this context is to combine the good approximation properties of neural networks (for parameter dependence) with the classical finite element method (for discretization). However, instead of considering the solution mapping of the PDE from the parameter space into the FEM-discretized solution space as a purely data-driven regression problem, so-called physically informed regression problems have proven to be useful. In these, the equation residual is minimized during the training of the neural network, i.e. the neural network "learns" the physics underlying the problem. In this paper, we extend this approach to saddle-point and non-linear fluid dynamics problems, namely the stationary Stokes and stationary Navier-Stokes equations, respectively. In particular, we propose a modification of the existing approach: instead of minimizing the plain vanilla equation residual during training, we minimize the equation residual modified by a preconditioner. By analogy with the linear case, this also improves the conditioning in the present non-linear case. Our numerical examples demonstrate that this approach significantly reduces the training effort and greatly increases accuracy and generalizability. Finally, we show the application of the resulting parameterized model to a related inverse problem.
Authors: Anastasios Vlachos, Anastasios Tsiamis, Aren Karapetyan, Efe C. Balta, John Lygeros
Abstract: In this paper, we consider the problem of predicting unknown targets from data. We propose Online Residual Learning (ORL), a method that combines online adaptation with offline-trained predictions. At a lower level, we employ multiple offline predictions generated before or at the beginning of the prediction horizon. We augment every offline prediction by learning its residual error with respect to the true target state online, using the recursive least squares algorithm. At a higher level, we treat the augmented lower-level predictors as experts, adopting the Prediction with Expert Advice framework. We utilize an adaptive softmax weighting scheme to form an aggregate prediction and provide guarantees for ORL in terms of regret. We employ ORL to boost performance in the setting of online pedestrian trajectory prediction. Based on data from the Stanford Drone Dataset, we show that ORL can demonstrate best-of-both-worlds performance.
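A minimal sketch of the two levels: recursive least squares corrects each offline prediction's residual online, and an exponential-weights rule aggregates the corrected experts. The feature map, forgetting factor, and weighting temperature are assumptions for illustration.

    import numpy as np

    class RLSResidual:
        def __init__(self, dim, lam=0.99):
            self.theta, self.P, self.lam = np.zeros(dim), np.eye(dim) * 100.0, lam

        def correct(self, x):                    # estimated residual for features x
            return self.theta @ x

        def update(self, x, err):                # err = true target - offline prediction
            Px = self.P @ x
            k = Px / (self.lam + x @ Px)         # RLS gain
            self.theta += k * (err - self.theta @ x)
            # np.outer(k, Px) equals k x^T P since P stays symmetric
            self.P = (self.P - np.outer(k, Px)) / self.lam

    def aggregate(preds, losses, eta=1.0):       # softmax weighting of experts
        w = np.exp(-eta * (losses - losses.min()))
        return (w / w.sum()) @ preds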
Authors: Ali Khazaee, Abdolreza Mohammadi, Ruairi Oreally
Abstract: Alzheimer's disease (AD) is a neurodegenerative disorder marked by memory loss and cognitive decline, making early detection vital for timely intervention. However, early diagnosis is challenging due to the heterogeneous presentation of symptoms. Resting-state fMRI (rs-fMRI) captures spontaneous brain activity and functional connectivity, which are known to be disrupted in AD and mild cognitive impairment (MCI). Traditional methods, such as Pearson's correlation, have been used to calculate association matrices, but these approaches often overlook the dynamic and non-stationary nature of brain activity. In this study, we introduce a novel method that integrates discrete wavelet transform (DWT) and graph theory to model the dynamic behavior of brain networks. By decomposing rs-fMRI signals using DWT, our approach captures the time-frequency representation of brain activity, allowing for a more nuanced analysis of the underlying network dynamics. Graph theory provides a robust mathematical framework to analyze these complex networks, while machine learning is employed to automate the discrimination of different stages of AD based on learned patterns from different frequency bands. We applied our method to a dataset of rs-fMRI images from the Alzheimer's Disease Neuroimaging Initiative (ADNI) database, demonstrating its potential as an early diagnostic tool for AD and for monitoring disease progression. Our statistical analysis identifies specific brain regions and connections that are affected in AD and MCI, at different frequency bands, offering deeper insights into the disease's impact on brain function.
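A minimal sketch of the DWT-plus-graph pipeline on an ROI-by-time rs-fMRI array: decompose each regional signal into wavelet bands, then build one thresholded correlation graph per band for downstream graph-theoretic analysis. The wavelet family, decomposition level, and correlation threshold are assumptions.

    import numpy as np
    import pywt

    def band_graphs(ts, wavelet="db4", level=4, thresh=0.5):
        # ts: (n_rois, n_timepoints); returns one adjacency matrix per band
        coeffs = [pywt.wavedec(row, wavelet, level=level) for row in ts]
        graphs = []
        for band in range(level + 1):            # approximation + detail bands
            band_mat = np.stack([c[band] for c in coeffs])
            corr = np.corrcoef(band_mat)         # inter-ROI band correlations
            graphs.append((np.abs(corr) > thresh).astype(int))
        return graphs

    rng = np.random.default_rng(0)
    adj_per_band = band_graphs(rng.normal(size=(90, 200)))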
Authors: Yicheng Fu, Raviteja Anantha, Prabal Vashisht, Jianpeng Cheng, Etai Littwin
Abstract: Generating user intent from a sequence of user interface (UI) actions is a core challenge in comprehensive UI understanding. Recent advancements in multimodal large language models (MLLMs) have led to substantial progress in this area, but their demands for extensive model parameters and computing power, together with their high latency, make them impractical for scenarios requiring lightweight, on-device solutions with low latency or heightened privacy. Additionally, the lack of high-quality datasets has hindered the development of such lightweight models. To address these challenges, we propose UI-JEPA, a novel framework that employs masking strategies to learn abstract UI embeddings from unlabeled data through self-supervised learning, combined with an LLM decoder fine-tuned for user intent prediction. We also introduce two new UI-grounded multimodal datasets, "Intent in the Wild" (IIW) and "Intent in the Tame" (IIT), designed for few-shot and zero-shot UI understanding tasks. IIW consists of 1.7K videos across 219 intent categories, while IIT contains 914 videos across 10 categories. We establish the first baselines for these datasets, showing that representations learned using a JEPA-style objective, combined with an LLM decoder, can achieve user intent predictions that match the performance of state-of-the-art large MLLMs, but with significantly reduced annotation and deployment resources. Measured by intent similarity scores, UI-JEPA outperforms GPT-4 Turbo and Claude 3.5 Sonnet by 10.0% and 7.2% respectively, averaged across two datasets. Notably, UI-JEPA achieves this performance with a 50.5x reduction in computational cost and a 6.6x improvement in latency on the IIW dataset. These results underscore the effectiveness of UI-JEPA, highlighting its potential for lightweight, high-performance UI understanding.
Authors: Chenglei Si, Diyi Yang, Tatsunori Hashimoto
Abstract: Recent advancements in large language models (LLMs) have sparked optimism about their potential to accelerate scientific discovery, with a growing number of works proposing research agents that autonomously generate and validate new ideas. Despite this, no evaluations have shown that LLM systems can take the very first step of producing novel, expert-level ideas, let alone perform the entire research process. We address this by establishing an experimental design that evaluates research idea generation while controlling for confounders and performs the first head-to-head comparison between expert NLP researchers and an LLM ideation agent. By recruiting over 100 NLP researchers to write novel ideas and blind reviews of both LLM and human ideas, we obtain the first statistically significant conclusion on current LLM capabilities for research ideation: we find LLM-generated ideas are judged as more novel (p < 0.05) than human expert ideas while being judged slightly weaker on feasibility. Studying our agent baselines closely, we identify open problems in building and evaluating research agents, including failures of LLM self-evaluation and their lack of diversity in generation. Finally, we acknowledge that human judgements of novelty can be difficult, even for experts, and propose an end-to-end study design which recruits researchers to execute these ideas into full projects, enabling us to study whether these novelty and feasibility judgements result in meaningful differences in research outcome.
Authors: Yuan-Hao Wei, Yan-Jie Sun, Chen Zhang
Abstract: Inference and inverse problems are closely related concepts, both fundamentally involving the deduction of unknown causes or parameters from observed data. Bayesian inference, a powerful class of methods, is often employed to solve a variety of problems, including those related to causal inference. Variational inference, a subset of Bayesian inference, is primarily used to efficiently approximate complex posterior distributions. Variational Autoencoders (VAEs), which combine variational inference with deep learning, have become widely applied across various domains. This study explores the potential of VAEs for solving inverse problems, such as Independent Component Analysis (ICA), without relying on an explicit inverse mapping process. Unlike other VAE-based ICA methods, this approach discards the encoder in the VAE architecture, directly setting the latent variables as trainable parameters. In other words, the latent variables are no longer outputs of the encoder but are instead optimized directly through the objective function to converge to appropriate values. We find that, with a suitable prior setup, the latent variables, represented by trainable parameters, can exhibit mutually independent properties as the parameters converge, all without the need for an encoding process. This approach, referred to as the Half-VAE, bypasses the inverse mapping process by eliminating the encoder. This study demonstrates the feasibility of using the Half-VAE to solve ICA without the need for an explicit inverse mapping process.
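A minimal sketch of the encoder-free idea: the latent variables are held as trainable parameters and optimized jointly with a decoder. The Laplace-like penalty standing in for the independence-inducing prior is an assumption for illustration, not the paper's exact objective.

    import torch
    import torch.nn as nn

    n, latent_dim, obs_dim = 256, 2, 8
    X = torch.randn(n, obs_dim)                      # observed mixtures

    Z = nn.Parameter(torch.randn(n, latent_dim))     # one trainable latent per sample
    decoder = nn.Sequential(nn.Linear(latent_dim, 32), nn.Tanh(),
                            nn.Linear(32, obs_dim))
    opt = torch.optim.Adam([Z, *decoder.parameters()], lr=1e-2)

    for _ in range(2000):
        opt.zero_grad()
        recon = ((decoder(Z) - X) ** 2).mean()       # reconstruction term
        prior = Z.abs().mean()                       # assumed Laplace-style prior term
        (recon + 0.1 * prior).backward()
        opt.step()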
Authors: Thivin Anandh, Divij Ghose, Ankit Tyagi, Abhineet Gupta, Suranjan Sarkar, Sashikumaar Ganesan
Abstract: Physics-informed neural networks (PINNs) are able to solve partial differential equations (PDEs) by incorporating the residuals of the PDEs into their loss functions. Variational Physics-Informed Neural Networks (VPINNs) and hp-VPINNs use the variational form of the PDE residuals in their loss function. Although hp-VPINNs have shown promise over traditional PINNs, they suffer from higher training times and lack a framework capable of handling complex geometries, which limits their application to more complex PDEs. As such, hp-VPINNs have not been applied in solving the Navier-Stokes equations, amongst other problems in CFD, thus far. FastVPINNs was introduced to address these challenges by incorporating tensor-based loss computations, significantly improving the training efficiency. Moreover, by using the bilinear transformation, the FastVPINNs framework was able to solve PDEs on complex geometries. In the present work, we extend the FastVPINNs framework to vector-valued problems, with a particular focus on solving the incompressible Navier-Stokes equations for two-dimensional forward and inverse problems, including problems such as the lid-driven cavity flow, the Kovasznay flow, and flow past a backward-facing step for Reynolds numbers up to 200. Our results demonstrate a 2x improvement in training time while maintaining the same order of accuracy compared to PINN algorithms documented in the literature. We further showcase the framework's efficiency in solving inverse problems for the incompressible Navier-Stokes equations by accurately identifying the Reynolds number of the underlying flow. Additionally, the framework's ability to handle complex geometries highlights its potential for broader applications in computational fluid dynamics. This implementation opens new avenues for research on hp-VPINNs, potentially extending their applicability to more complex problems.
Authors: Luis Mayer, Christian Heumann, Matthias Aßenmacher
Abstract: In recent years, large language models (LLMs) have emerged as powerful tools with potential applications in various fields, including software engineering. Within the scope of this research, we evaluate five different state-of-the-art LLMs - Bard, BingChat, ChatGPT, Llama2, and Code Llama - concerning their capabilities for text-to-code generation. In an empirical study, we feed prompts with textual descriptions of coding problems sourced from the programming website LeetCode to the models with the task of creating solutions in Python. Subsequently, the quality of the generated outputs is assessed using the testing functionalities of LeetCode. The results indicate large differences in performance between the investigated models. ChatGPT handles these typical programming challenges by far the most effectively, surpassing even code-specialized models like Code Llama. To gain further insights, we measure the runtime as well as the memory usage of the generated outputs and compare them to other code submissions on LeetCode. A detailed error analysis, encompassing a comparison of the differences concerning correct indentation and form of the generated code as well as an assignment of the incorrectly solved tasks to certain error categories, allows us to obtain a more nuanced picture of the results and the potential for improvement. The results also show a clear pattern of increasingly incorrect code as the models face more context in the form of longer prompts.
Authors: Valentina Vadori, Jean-Marie Graïc, Antonella Peruffo, Giulia Vadori, Livio Finos, Enrico Grisan
Abstract: Delineating and classifying individual cells in microscopy tissue images is a complex task, yet it is a pivotal endeavor in various medical and biological investigations. We propose a new deep learning framework (CISCA) for automatic cell instance segmentation and classification in histological slices to support detailed morphological and structural analysis or straightforward cell counting in digital pathology workflows and brain cytoarchitecture studies. At the core of CISCA lies a network architecture featuring a lightweight U-Net with three heads in the decoder. The first head classifies pixels into boundaries between neighboring cells, cell bodies, and background, while the second head regresses four distance maps along four directions. The network outputs from the first and second heads are integrated through a tailored post-processing step, which ultimately yields the segmentation of individual cells. A third head enables simultaneous classification of cells into relevant classes, if required. We showcase the effectiveness of our method using four datasets, including CoNIC, PanNuke, and MoNuSeg, which are publicly available H&E datasets. Additionally, we introduce CytoDArk0, a novel dataset consisting of Nissl-stained images of the cortex, cerebellum, and hippocampus from mammals belonging to the orders Cetartiodactyla and Primates. We evaluate CISCA in comparison to other state-of-the-art methods, demonstrating CISCA's robustness and accuracy in segmenting and classifying cells across diverse tissue types, magnifications, and staining techniques.
Authors: Malte Luttermann, Ralf Möller, Mattis Hartwig
Abstract: Probabilistic relational models provide a well-established formalism to combine first-order logic and probabilistic models, thereby allowing relationships between objects in a relational domain to be represented. At the same time, the field of artificial intelligence requires increasingly large amounts of relational training data for various machine learning tasks. Collecting real-world data, however, is often challenging due to privacy concerns, data protection regulations, high costs, and so on. To mitigate these challenges, the generation of synthetic data is a promising approach. In this paper, we solve the problem of generating synthetic relational data via probabilistic relational models. In particular, we propose a fully-fledged pipeline to go from relational database to probabilistic relational model, which can then be used to sample new synthetic relational data points from its underlying probability distribution. As part of our proposed pipeline, we introduce a learning algorithm to construct a probabilistic relational model from a given relational database.
Authors: Daniel J. Tan, Qianyi Xu, Kay Choong See, Dilruk Perera, Mengling Feng
Abstract: Multi-organ diseases present significant challenges due to their simultaneous impact on multiple organ systems, necessitating complex and adaptive treatment strategies. Despite recent advancements in AI-powered healthcare decision support systems, existing solutions are limited to individual organ systems. They often ignore the intricate dependencies between organ systems and thereby fail to provide holistic treatment recommendations that are useful in practice. We propose a novel hierarchical multi-agent reinforcement learning (HMARL) framework to address these challenges. This framework uses a dedicated agent for each organ system and models inter-organ dynamics through explicit inter-agent communication channels, enabling coordinated treatment strategies across organs. Furthermore, we introduce a dual-layer state representation technique to contextualize patient conditions at various hierarchical levels, enhancing treatment accuracy and relevance. Through extensive qualitative and quantitative evaluations in managing sepsis (a complex multi-organ disease), our approach demonstrates its ability to learn effective treatment policies that significantly improve patient survival rates. This framework marks a substantial advancement in clinical decision support systems, pioneering a comprehensive approach to multi-organ treatment recommendations.
Authors: Ahmad Mohammad Saber, Amr Youssef, Davor Svetinovic, Hatem Zeineldin, Ehab F. El-Saadany
Abstract: Line Current Differential Relays (LCDRs) are high-speed relays progressively used to protect critical transmission lines. However, LCDRs are vulnerable to cyberattacks. Fault-Masking Attacks (FMAs) are stealthy cyberattacks performed by manipulating the remote measurements of the targeted LCDR to disguise faults on the protected line. Hence, they remain undetected by this LCDR. In this paper, we propose a two-module framework to detect FMAs. The first module is a Mismatch Index (MI) developed from the protected transmission line's equivalent physical model. The MI is triggered only if there is a significant mismatch in the LCDR's local and remote measurements while the LCDR itself is untriggered, which indicates an FMA. After the MI is triggered, the second module, a neural network-based classifier, promptly confirms that the triggering event is a physical fault that lies on the line protected by the LCDR before declaring the occurrence of an FMA. The proposed framework is tested using the IEEE 39-bus benchmark system. Our simulation results confirm that the proposed framework can accurately detect FMAs on LCDRs and is not affected by normal system disturbances, variations, or measurement noise. Our experimental results using OPAL-RT's real-time simulator confirm the proposed solution's real-time performance capability.
Authors: Xueyuan Han, Zinuo Cai, Yichu Zhang, Chongxin Fan, Junhan Liu, Ruhui Ma, Rajkumar Buyya
Abstract: The application of Transformer-based large models has achieved numerous successes in recent years. However, the exponential growth in the parameters of large models introduces formidable memory challenges for edge deployment. Prior works addressing this challenge mainly focus on optimizing the model structure and adopting memory swapping methods. However, the former reduces the inference accuracy, and the latter raises the inference latency. This paper introduces PIPELOAD, a novel memory-efficient pipeline execution mechanism. It reduces memory usage by incorporating dynamic memory management and minimizes inference latency by employing parallel model loading. Based on the PIPELOAD mechanism, we present Hermes, a framework optimized for large model inference on edge devices. We evaluate Hermes on Transformer-based models of different sizes. Our experiments illustrate that Hermes achieves up to a 4.24x increase in inference speed and 86.7% lower memory consumption than the state-of-the-art pipeline mechanism for BERT and ViT models, and a 2.58x increase in inference speed and 90.3% lower memory consumption for GPT-style models.
Authors: Oren Mangoubi, Nisheeth K. Vishnoi
Abstract: We consider the problem of sampling from a log-concave distribution $\pi(\theta) \propto e^{-f(\theta)}$ constrained to a polytope $K:=\{\theta \in \mathbb{R}^d: A\theta \leq b\}$, where $A\in \mathbb{R}^{m\times d}$ and $b \in \mathbb{R}^m$. The fastest-known algorithm \cite{mangoubi2022faster} for the setting when $f$ is $O(1)$-Lipschitz or $O(1)$-smooth runs in roughly $O(md \times md^{\omega -1})$ arithmetic operations, where the $md^{\omega -1}$ term arises because each Markov chain step requires computing a matrix inversion and determinant (here $\omega \approx 2.37$ is the matrix multiplication constant). We present a nearly-optimal implementation of this Markov chain with per-step complexity which is roughly the number of non-zero entries of $A$, while the number of Markov chain steps remains the same. The key technical ingredients are 1) to show that the matrices that arise in this Dikin walk change slowly, 2) to deploy efficient linear solvers that can leverage this slow change to speed up matrix inversion by using information computed in previous steps, and 3) to speed up the computation of the determinantal term in the Metropolis filter step via a randomized Taylor series-based estimator.
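For context, a naive reference implementation of one step of such a Dikin walk (standard Gaussian proposal plus Metropolis filter; a sketch, not the paper's fast algorithm) recomputes the barrier Hessian, its Cholesky factor, and the determinant from scratch each step, which is exactly the per-step cost the paper's slowly-changing-matrix solvers and randomized determinant estimator avoid.

```python
import numpy as np

def hessian(A, b, theta):
    # Hessian of the log-barrier at theta; the slacks s must stay positive.
    s = b - A @ theta
    return A.T @ ((1.0 / s**2)[:, None] * A)

def log_proposal(H, theta, z, r, d):
    # log q(theta -> z) for the Gaussian Dikin proposal, up to a constant.
    _, logdet = np.linalg.slogdet(H)
    diff = z - theta
    return 0.5 * logdet - (d / (2 * r**2)) * diff @ H @ diff

def dikin_step(A, b, theta, f, r=0.5, rng=None):
    rng = np.random.default_rng() if rng is None else rng
    d = theta.size
    H = hessian(A, b, theta)
    L = np.linalg.cholesky(H)
    # Proposal z ~ N(theta, (r^2 / d) * H(theta)^{-1}).
    z = theta + (r / np.sqrt(d)) * np.linalg.solve(L.T, rng.standard_normal(d))
    if np.any(b - A @ z <= 0):          # proposal left the polytope: reject
        return theta
    Hz = hessian(A, b, z)
    log_alpha = (-f(z) + f(theta)
                 + log_proposal(Hz, z, theta, r, d)
                 - log_proposal(H, theta, z, r, d))
    return z if np.log(rng.uniform()) < log_alpha else theta
```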
Authors: Marco Puts, David Salgado, Piet Daas
Abstract: Applying machine learning (ML) to official statistics production presents both opportunities and challenges, and it must be done with statistical rigor. Although machine learning has enjoyed rapid technological advances in recent years, its application often lacks the methodological robustness necessary to produce high-quality statistical results. In order to account for all sources of error in machine learning models, the Total Machine Learning Error (TMLE) is presented as a framework analogous to the Total Survey Error Model used in survey methodology. As a means of ensuring that ML models are both internally and externally valid, the TMLE model addresses issues such as representativeness and measurement errors. Several case studies are presented, illustrating the importance of applying more rigor to the application of machine learning in official statistics.
Authors: Jan Schnabel, Marco Roth
Abstract: Since the entry of kernel theory into the field of quantum machine learning, quantum kernel methods (QKMs) have gained increasing attention with regard to both probing promising applications and delivering intriguing research insights. Two common approaches for computing the underlying Gram matrix have emerged: fidelity quantum kernels (FQKs) and projected quantum kernels (PQKs). Benchmarking these methods is crucial to gain robust insights and to understand their practical utility. In this work, we present a comprehensive large-scale study examining QKMs based on FQKs and PQKs across a manifold of design choices. Our investigation encompasses both classification and regression tasks for five dataset families and 64 datasets, systematically comparing the use of FQKs and PQKs in quantum support vector machines and kernel ridge regression. This resulted in over 20,000 models that were trained and optimized using a state-of-the-art hyperparameter search to ensure robust and comprehensive insights. We delve into the importance of hyperparameters for model performance scores and support our findings through rigorous correlation analyses. In doing so, we also closely inspect two data encoding strategies. Moreover, we provide an in-depth analysis addressing the design freedom of PQKs and explore the underlying principles responsible for learning. Our goal is not to identify the best-performing model for a specific task but to uncover the mechanisms that lead to effective QKMs and reveal universal patterns.
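For reference, the two Gram-matrix constructions being compared are commonly defined as follows (standard forms from the QKM literature; the bandwidth $\gamma$ and the one-qubit projection in the PQK are one common choice among several):

```latex
% Fidelity quantum kernel: state overlap of the data-encoding circuit
k_{\mathrm{FQK}}(x, x') = \big| \langle \psi(x') \,|\, \psi(x) \rangle \big|^2,
\qquad |\psi(x)\rangle = U(x)\,|0\rangle^{\otimes n}

% Projected quantum kernel (a common variant): Gaussian kernel over
% one-qubit reduced density matrices of the encoded state
k_{\mathrm{PQK}}(x, x') = \exp\!\Big( -\gamma \sum_{k=1}^{n} \big\| \rho_k(x) - \rho_k(x') \big\|_F^2 \Big)
```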
Authors: Jiaxing Wu, Lin Ning, Luyang Liu, Harrison Lee, Neo Wu, Chao Wang, Sushant Prakash, Shawn O'Banion, Bradley Green, Jun Xie
Abstract: LLM-powered personalization agent systems employ Large Language Models (LLMs) to predict users' behavior from their past activities. However, their effectiveness often hinges on the ability to leverage extensive, long user histories, which is difficult given the inherent noise and length of such data. Existing pretrained LLMs may generate summaries that are concise but lack the necessary context for downstream tasks, hindering their utility in personalization systems. To address these challenges, we introduce Reinforcement Learning from Prediction Feedback (RLPF). RLPF fine-tunes LLMs to generate concise, human-readable user summaries that are optimized for downstream task performance. By maximizing the usefulness of the generated summaries, RLPF effectively distills extensive user history data while preserving essential information for downstream tasks. Our empirical evaluation demonstrates significant improvements in both extrinsic downstream task utility and intrinsic summary quality, surpassing baseline methods by up to 22% on downstream task performance and achieving an up to 84.59% win rate on Factuality, Abstractiveness, and Readability. RLPF also achieves a remarkable 74% reduction in context length while improving performance on 16 out of 19 unseen tasks and/or datasets, showcasing its generalizability. This approach offers a promising solution for enhancing LLM personalization by effectively transforming long, noisy user histories into informative and human-readable representations.
Authors: Yecheng Wu, Zhuoyang Zhang, Junyu Chen, Haotian Tang, Dacheng Li, Yunhao Fang, Ligeng Zhu, Enze Xie, Hongxu Yin, Li Yi, Song Han, Yao Lu
Abstract: VILA-U is a Unified foundation model that integrates Video, Image, and Language understanding and generation. Traditional visual language models (VLMs) use separate modules for understanding and generating visual content, which can lead to misalignment and increased complexity. In contrast, VILA-U employs a single autoregressive next-token prediction framework for both tasks, eliminating the need for additional components like diffusion models. This approach not only simplifies the model but also achieves near state-of-the-art performance in visual language understanding and generation. The success of VILA-U is attributed to two main factors: the unified vision tower that aligns discrete visual tokens with textual inputs during pretraining, which enhances visual perception; and the finding that autoregressive image generation can achieve quality similar to that of diffusion models when trained on a high-quality dataset. This allows VILA-U to perform comparably to more complex models using a fully token-based autoregressive framework.
Authors: Hanbin Hong, Xinyu Zhang, Binghui Wang, Zhongjie Ba, Yuan Hong
Abstract: Black-box adversarial attacks have demonstrated strong potential to compromise machine learning models by iteratively querying the target model or leveraging transferability from a local surrogate model. Recently, however, such attacks have been effectively mitigated by state-of-the-art (SOTA) defenses, e.g., detection via the pattern of sequential queries, or injecting noise into the model. To the best of our knowledge, we take the first step to study a new paradigm of black-box attacks with provable guarantees -- certifiable black-box attacks that can guarantee the attack success probability (ASP) of adversarial examples before querying the target model. This new black-box attack unveils significant vulnerabilities of machine learning models compared to traditional empirical black-box attacks, e.g., breaking strong SOTA defenses with provable confidence, constructing a space of (infinitely many) adversarial examples with high ASP, and theoretically guaranteeing the ASP of the generated adversarial examples without verification queries to the target model. Specifically, we establish a novel theoretical foundation for ensuring the ASP of the black-box attack with randomized adversarial examples (AEs). Then, we propose several novel techniques to craft the randomized AEs while reducing the perturbation size for better imperceptibility. Finally, we have comprehensively evaluated the certifiable black-box attacks on the CIFAR10/100, ImageNet, and LibriSpeech datasets, while benchmarking with 16 SOTA black-box attacks against various SOTA defenses in the domains of computer vision and speech recognition. Both theoretical and experimental results validate the significance of the proposed attack. The code and all the benchmarks are available at \url{https://github.com/datasec-lab/CertifiedAttack}.
Authors: Akansha Kalra, Daniel S. Brown
Abstract: Reinforcement Learning from Human Feedback (RLHF) has emerged as a popular paradigm for capturing human intent to alleviate the challenges of hand-crafting reward values. Despite the increasing interest in RLHF, most works learn black-box reward functions that, while expressive, are difficult to interpret and often require running the whole costly process of RL before we can even decipher whether these frameworks are actually aligned with human preferences. We propose and evaluate a novel approach for learning expressive and interpretable reward functions from preferences using Differentiable Decision Trees (DDTs). Our experiments across several domains, including CartPole, Visual Gridworld environments, and Atari games, provide evidence that the tree structure of our learned reward function is useful in determining the extent to which the reward function is aligned with human preferences. We also provide experimental evidence that reward DDTs can often achieve competitive RL performance when compared with larger-capacity deep neural network reward functions, and we demonstrate the diagnostic utility of our framework in checking the alignment of learned reward functions. We also observe that the choice between soft and hard (argmax) outputs of a reward DDT reveals a tension between wanting highly shaped rewards to ensure good RL performance and wanting simpler, more interpretable rewards. Videos and code are available at: https://sites.google.com/view/ddt-rlhf
Authors: Sujith K Mandala
Abstract: Fetal health classification is a critical task in obstetrics, enabling early identification and management of potential health problems. However, it remains challenging due to data complexity and limited labeled samples. This research paper presents a novel machine-learning approach for fetal health classification, leveraging a LightGBM classifier trained on a comprehensive dataset. The proposed model achieves an impressive accuracy of 98.31% on a test set. Our findings demonstrate the potential of machine learning in enhancing fetal health classification, offering a more objective and accurate assessment. Notably, our approach combines various features, such as fetal heart rate, uterine contractions, and maternal blood pressure, to provide a comprehensive evaluation. This methodology holds promise for improving early detection and treatment of fetal health issues, ensuring better outcomes for both mothers and babies. Beyond the high accuracy achieved, the novelty of our approach lies in its comprehensive feature selection and assessment methodology. By incorporating multiple data points, our model offers a more holistic and reliable evaluation compared to traditional methods. This research has significant implications in the field of obstetrics, paving the way for advancements in early detection and intervention of fetal health concerns. Future work involves validating the model on a larger dataset and developing a clinical application. Ultimately, we anticipate that our research will revolutionize the assessment and management of fetal health, contributing to improved healthcare outcomes for expectant mothers and their babies.
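To indicate the shape of such a pipeline, here is an illustrative sketch; the file name, column names, and hyperparameters are assumptions based on the commonly used cardiotocography (CTG) dataset, not the paper's exact setup.

```python
import lightgbm as lgb
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

# Hypothetical CSV with CTG-style features and a 3-class target column.
df = pd.read_csv("fetal_health.csv")
X = df.drop(columns=["fetal_health"])
y = df["fetal_health"].astype(int)

X_tr, X_te, y_tr, y_te = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42)

# Gradient-boosted decision trees; hyperparameters are placeholders.
clf = lgb.LGBMClassifier(n_estimators=500, learning_rate=0.05)
clf.fit(X_tr, y_tr)
print("test accuracy:", accuracy_score(y_te, clf.predict(X_te)))
```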
Authors: Giuseppe Bruni, Sepehr Maleki, Senthil K. Krishnababu
Abstract: Applications of deep learning to physical simulations such as Computational Fluid Dynamics have recently experienced a surge in interest, and their viability has been demonstrated in different domains. However, due to the highly complex, turbulent, and three-dimensional flows involved, they have not yet been proven usable for turbomachinery applications. Multi-stage axial compressors for gas turbine applications represent a remarkably challenging case, due to the high dimensionality of the regression of the flow-field from geometrical and operational variables. This paper demonstrates the development and application of a deep learning framework for predictions of the flow field and aerodynamic performance of multi-stage axial compressors. A physics-based dimensionality reduction unlocks the potential for flow-field predictions, as it re-formulates the regression problem from an unstructured to a structured one, as well as reducing the number of degrees of freedom. Compared to traditional "black-box" surrogate models, it provides explainability to the predictions of overall performance by identifying the corresponding aerodynamic drivers. This is applied to model the effect of manufacturing and build variations, as the associated performance scatter is known to have a significant impact on $CO_2$ emissions, therefore posing a challenge of great industrial and environmental relevance. The proposed architecture is proven to achieve an accuracy comparable to that of the CFD benchmark, in real-time, for an industrially relevant application. The deployed model is readily integrated within the manufacturing and build process of gas turbines, thus providing the opportunity to analytically assess the impact on performance with actionable and explainable data.
Authors: Nathael Da Costa, Cyrus Mostajeran, Juan-Pablo Ortega, Salem Said
Abstract: This work aims to prove that the classical Gaussian kernel, when defined on a non-Euclidean symmetric space, is never positive-definite for any choice of parameter. To achieve this goal, the paper develops new geometric and analytical arguments. These provide a rigorous characterization of the positive-definiteness of the Gaussian kernel, which is complete but for a limited number of scenarios in low dimensions that are treated by numerical computations. Chief among these results are the L$^{\!\scriptscriptstyle p}$-$\hspace{0.02cm}$Godement theorems (where $p = 1,2$), which provide verifiable necessary and sufficient conditions for a kernel defined on a symmetric space of non-compact type to be positive-definite. A celebrated theorem, sometimes called the Bochner-Godement theorem, already gives such conditions and is far more general in its scope, but is especially hard to apply. Beyond the connection with the Gaussian kernel, the new results in this work lay out a blueprint for the study of invariant kernels on symmetric spaces, bringing forth specific harmonic analysis tools that suggest many future applications.
Authors: Krzysztof Maziarz, Austin Tripp, Guoqing Liu, Megan Stanley, Shufang Xie, Piotr Gai\'nski, Philipp Seidl, Marwin Segler
Abstract: Automated Synthesis Planning has recently re-emerged as a research area at the intersection of chemistry and machine learning. Despite the appearance of steady progress, we argue that imperfect benchmarks and inconsistent comparisons mask systematic shortcomings of existing techniques, and unnecessarily hamper progress. To remedy this, we present a synthesis planning library with an extensive benchmarking framework, called syntheseus, which promotes best practice by default, enabling consistent, meaningful evaluation of single-step models and multi-step planning algorithms. We demonstrate the capabilities of syntheseus by re-evaluating several previous retrosynthesis algorithms, and find that the ranking of state-of-the-art models changes in controlled evaluation experiments. We end with guidance for future works in this area, and call on the community to engage in the discussion on how to improve benchmarks for synthesis planning.
Authors: Yaodong Yu, Sam Buchanan, Druv Pai, Tianzhe Chu, Ziyang Wu, Shengbang Tong, Hao Bai, Yuexiang Zhai, Benjamin D. Haeffele, Yi Ma
Abstract: In this paper, we contend that a natural objective of representation learning is to compress and transform the distribution of the data, say sets of tokens, towards a low-dimensional Gaussian mixture supported on incoherent subspaces. The goodness of such a representation can be evaluated by a principled measure, called sparse rate reduction, that simultaneously maximizes the intrinsic information gain and extrinsic sparsity of the learned representation. From this perspective, popular deep network architectures, including transformers, can be viewed as realizing iterative schemes to optimize this measure. Particularly, we derive a transformer block from alternating optimization on parts of this objective: the multi-head self-attention operator compresses the representation by implementing an approximate gradient descent step on the coding rate of the features, and the subsequent multi-layer perceptron sparsifies the features. This leads to a family of white-box transformer-like deep network architectures, named CRATE, which are mathematically fully interpretable. We show, by way of a novel connection between denoising and compression, that the inverse to the aforementioned compressive encoding can be realized by the same class of CRATE architectures. Thus, the so-derived white-box architectures are universal to both encoders and decoders. Experiments show that these networks, despite their simplicity, indeed learn to compress and sparsify representations of large-scale real-world image and text datasets, and achieve performance very close to highly engineered transformer-based models: ViT, MAE, DINO, BERT, and GPT2. We believe the proposed computational framework demonstrates great potential in bridging the gap between theory and practice of deep learning, from a unified perspective of data compression. Code is available at: https://ma-lab-berkeley.github.io/CRATE .
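For reference, the compression half of this objective is the lossy coding rate from the rate-reduction literature; a schematic form (with $Z \in \mathbb{R}^{d \times n}$ the token representations and $\varepsilon$ a target quantization precision, per that line of work) is shown below, and the paper's exact objective may differ in detail.

```latex
% Lossy coding rate of the representations (compression term):
R(Z) = \frac{1}{2} \log\det\!\Big( I + \frac{d}{n\varepsilon^2}\, Z Z^{\top} \Big)

% Sparse rate reduction, schematically: compress against learned
% subspaces U while sparsifying, which the attention (MSSA) and MLP
% blocks alternately optimize:
\max_{Z} \;\; \Delta R(Z; U) \;-\; \lambda \,\| Z \|_0
```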
Authors: Nadezda Alexandrovna Knorozova, Alessandro Ronca
Abstract: Recurrent Neural Cascades (RNCs) are recurrent neural networks with no cyclic dependencies among recurrent neurons. This class of recurrent networks has received a lot of attention in practice. Besides training methods for a fixed architecture such as backpropagation, the cascade architecture naturally allows for constructive learning methods, where recurrent nodes are added incrementally one at a time, often yielding smaller networks. Furthermore, acyclicity amounts to a structural prior that, even for the same number of neurons, yields a more favourable sample complexity compared to a fully-connected architecture. A central question is whether the advantages of the cascade architecture come at the cost of reduced expressivity. We provide new insights into this question. We show that the regular languages captured by RNCs with sign and tanh activation and positive recurrent weights are the star-free regular languages. In order to establish our results we developed a novel framework where the capabilities of RNCs are assessed by analysing which semigroups and groups a single neuron is able to implement. A notable implication of our framework is that RNCs can achieve the expressivity of all regular languages by introducing neurons that can implement groups.
Authors: Herbert Woisetschl\"ager, Alexander Isenko, Shiqiang Wang, Ruben Mayer, Hans-Arno Jacobsen
Abstract: Federated Learning (FL) has become an established technique to facilitate privacy-preserving collaborative training across a multitude of clients. However, new approaches to FL often present contributions involving only small deep-learning models and focus on training full models on clients. In the wake of Foundation Models (FM), the reality is different for many deep learning applications. Typically, FMs have already been pre-trained across a wide variety of tasks and can be fine-tuned to specific downstream tasks over significantly smaller datasets than required for full model training. However, access to such datasets is often challenging. By its design, FL can help to open data silos. With this survey, we introduce a novel taxonomy focused on computational and communication efficiency, the vital elements for making use of FMs in FL systems. We discuss the benefits and drawbacks of parameter-efficient fine-tuning (PEFT) for FL applications, elaborate on the readiness of FL frameworks to work with FMs, and provide future research opportunities on how to evaluate generative models in FL as well as the interplay of privacy and PEFT.
Authors: Matthijs Douze, Alexandr Guzhva, Chengqi Deng, Jeff Johnson, Gergely Szilvasy, Pierre-Emmanuel Mazar\'e, Maria Lomeli, Lucas Hosseini, Herv\'e J\'egou
Abstract: Vector databases typically manage large collections of embedding vectors. Currently, AI applications are growing rapidly, and so is the number of embeddings that need to be stored and indexed. The Faiss library is dedicated to vector similarity search, a core functionality of vector databases. Faiss is a toolkit of indexing methods and related primitives used to search, cluster, compress and transform vectors. This paper describes the trade-off space of vector search and the design principles of Faiss in terms of structure, approach to optimization and interfacing. We benchmark key features of the library and discuss a few selected applications to highlight its broad applicability.
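For readers unfamiliar with the library, typical Faiss usage looks as follows (real API calls; the dimensions, index parameters, and random data here are arbitrary placeholders):

```python
import numpy as np
import faiss  # pip install faiss-cpu

d = 64                                           # embedding dimensionality
rng = np.random.default_rng(0)
xb = rng.random((100_000, d), dtype=np.float32)  # database vectors
xq = rng.random((5, d), dtype=np.float32)        # query vectors

# Exact (brute-force) L2 search.
index = faiss.IndexFlatL2(d)
index.add(xb)
D, I = index.search(xq, 4)                       # distances and neighbor ids

# A compressed inverted-file index trading accuracy for speed and memory:
# 256 coarse lists, product quantization with 8 sub-vectors of 8 bits each.
quantizer = faiss.IndexFlatL2(d)
ivf = faiss.IndexIVFPQ(quantizer, d, 256, 8, 8)
ivf.train(xb)
ivf.add(xb)
ivf.nprobe = 16                                  # lists visited per query
D2, I2 = ivf.search(xq, 4)
```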
Authors: Giuseppe Alessio D'Inverno, Jonathan Dong
Abstract: Reservoir Computing (RC) has become popular in recent years due to its fast and efficient computational capabilities. Standard RC has been shown to be equivalent in the asymptotic limit to Recurrent Kernels, which helps in analyzing its expressive power. However, many well-established RC paradigms, such as Leaky RC, Sparse RC, and Deep RC, are yet to be analyzed in such a way. This study aims to fill this gap by providing an empirical analysis of the equivalence of specific RC architectures with their corresponding Recurrent Kernel formulation. We conduct a convergence study by varying the activation function implemented in each architecture. Our study also sheds light on the role of sparse connections in RC architectures and proposes an optimal sparsity level that depends on the reservoir size. Furthermore, our systematic analysis shows that in Deep RC models, convergence is better achieved with successive reservoirs of decreasing sizes.
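As a reminder of the object under study, a minimal leaky echo-state reservoir update takes the standard form sketched below; the sizes, weight scaling, and leak rate are placeholder choices, and the paper's question is when such architectures converge to their Recurrent Kernel limits as the reservoir grows.

```python
import numpy as np

rng = np.random.default_rng(0)
N, n_in, leak = 500, 1, 0.3
W = rng.standard_normal((N, N)) / np.sqrt(N)  # recurrent weights (could be sparsified)
W_in = rng.standard_normal((N, n_in))         # input weights

def step(x, u):
    # Leaky update: x_{t+1} = (1 - a) x_t + a * tanh(W x_t + W_in u_t)
    return (1 - leak) * x + leak * np.tanh(W @ x + W_in @ u)

x = np.zeros(N)
for u_t in np.sin(np.linspace(0, 8 * np.pi, 200)):
    x = step(x, np.array([u_t]))              # drive with a toy input signal
```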
Authors: Julian Rodemann, Hannah Blocher
Abstract: We introduce a framework for benchmarking optimizers according to multiple criteria over various test functions. Based on a recently introduced union-free generic depth function for partial orders/rankings, it fully exploits the ordinal information and allows for incomparability. Our method describes the distribution of all partial orders/rankings, avoiding the notorious shortcomings of aggregation. This makes it possible to identify test functions that produce central or outlying rankings of optimizers and to assess the quality of benchmarking suites.
Authors: Haeun Jeon, Hyunglip Bae, Minsu Park, Chanyeong Kim, Woo Chang Kim
Abstract: In decision-making problems under uncertainty, predicting unknown parameters is often considered independent of the optimization part. Decision-focused Learning (DFL) is a task-oriented framework that integrates prediction and optimization by adapting the predictive model to yield better decisions for the corresponding task. Here, an inevitable challenge arises when computing gradients of the optimal decision with respect to the parameters. Existing research copes with this issue by smoothly reformulating the surrogate optimization problem or by constructing surrogate loss functions that mimic the task loss. However, these approaches apply only to restricted optimization domains. In this paper, we propose the Locally Convex Global Loss Network (LCGLN), a global surrogate loss model that can be implemented in a general DFL paradigm. LCGLN learns the task loss via a partially input-convex neural network, which is guaranteed to be convex in chosen inputs while keeping a non-convex global structure in the other inputs. This enables LCGLN to admit general DFL through a single surrogate loss, without the need to choose appropriate parametric forms. We confirm the effectiveness and flexibility of LCGLN by evaluating our proposed model on three stochastic decision-making problems.
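A minimal sketch of the partially input-convex building block that LCGLN relies on is given below (an illustrative PICNN-style layer, not the paper's exact architecture): nonnegative weights on the propagated hidden state, combined with a convex nondecreasing activation, keep the output convex in the chosen input y, while the context x enters unrestricted.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class PICNNLayer(nn.Module):
    """One layer of z_{k+1} = relu(Wz z_k + Wy y + Wx x + b), convex in y."""
    def __init__(self, dim_z, dim_y, dim_x, dim_out):
        super().__init__()
        self.Wz_raw = nn.Parameter(torch.randn(dim_out, dim_z) * 0.1)
        self.Wy = nn.Linear(dim_y, dim_out, bias=False)  # affine path in y
        self.Wx = nn.Linear(dim_x, dim_out)              # context path, unrestricted

    def forward(self, z, y, x):
        # Softplus keeps the z-path weights nonnegative, so convexity of z
        # in y is preserved through the convex, nondecreasing ReLU.
        return F.relu(F.linear(z, F.softplus(self.Wz_raw)) + self.Wy(y) + self.Wx(x))
```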
Authors: Usman Anwar, Abulhair Saparov, Javier Rando, Daniel Paleka, Miles Turpin, Peter Hase, Ekdeep Singh Lubana, Erik Jenner, Stephen Casper, Oliver Sourbut, Benjamin L. Edelman, Zhaowei Zhang, Mario G\"unther, Anton Korinek, Jose Hernandez-Orallo, Lewis Hammond, Eric Bigelow, Alexander Pan, Lauro Langosco, Tomasz Korbak, Heidi Zhang, Ruiqi Zhong, Se\'an \'O h\'Eigeartaigh, Gabriel Recchia, Giulio Corsi, Alan Chan, Markus Anderljung, Lilian Edwards, Aleksandar Petrov, Christian Schroeder de Witt, Sumeet Ramesh Motwan, Yoshua Bengio, Danqi Chen, Philip H. S. Torr, Samuel Albanie, Tegan Maharaj, Jakob Foerster, Florian Tramer, He He, Atoosa Kasirzadeh, Yejin Choi, David Krueger
Abstract: This work identifies 18 foundational challenges in assuring the alignment and safety of large language models (LLMs). These challenges are organized into three different categories: scientific understanding of LLMs, development and deployment methods, and sociotechnical challenges. Based on the identified challenges, we pose $200+$ concrete research questions.
Authors: Santanu Das, Jatin Batra, Piyush Srivastava
Abstract: In contemporary deep learning practice, models are often trained to near-zero loss, i.e., to nearly interpolate the training data. However, the number of parameters in the model is usually far more than the number of data points $n$, the theoretical minimum needed for interpolation: a phenomenon referred to as overparameterization. In an interesting piece of work that contributes to the considerable research devoted to understanding overparameterization, Bubeck and Sellke showed that for a broad class of covariate distributions (specifically those satisfying a natural notion of concentration of measure), overparameterization is necessary for robust interpolation, i.e., if the interpolating function is required to be Lipschitz. However, their robustness results were proved only in the setting of regression with square loss. In practice, however, many other kinds of losses are used, e.g., cross-entropy loss for classification. In this work, we generalize Bubeck and Sellke's result to Bregman divergence losses, which form a common generalization of square loss and cross-entropy loss. Our generalization relies on identifying a bias-variance-type decomposition that lies at the heart of the proof of Bubeck and Sellke.
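For reference, the Bregman divergence generated by a strictly convex, differentiable $\phi$, together with the two special cases named above, is:

```latex
% Bregman divergence generated by a strictly convex, differentiable \phi:
D_{\phi}(x, y) \;=\; \phi(x) - \phi(y) - \langle \nabla \phi(y),\, x - y \rangle

% \phi(x) = \|x\|_2^2  gives  D_\phi(x, y) = \|x - y\|_2^2  (square loss);
% \phi(p) = \sum_i p_i \log p_i  (negative entropy on the simplex) gives
% the KL divergence, whose minimization matches the cross-entropy loss.
```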
Authors: Francesca Biagini, Lukas Gonon, Niklas Walter
Abstract: Randomised signature has been proposed as a flexible and easily implementable alternative to the well-established path signature. In this article, we employ randomised signature to introduce a generative model for financial time series data in the spirit of reservoir computing. Specifically, we propose a novel Wasserstein-type distance based on discrete-time randomised signatures. This metric on the space of probability measures captures the distance between (conditional) distributions. Its use is justified by our novel universal approximation results for randomised signatures on the space of continuous functions taking the underlying path as an input. We then use our metric as the loss function in a non-adversarial generator model for synthetic time series data based on a reservoir neural stochastic differential equation. We compare the results of our model to benchmarks from the existing literature.
Authors: Yanshu Wang, Wang Li, Zhaoqian Yao, Tong Yang
Abstract: Matrix quantization entails representing matrix elements in a more space-efficient form to reduce storage usage, with dequantization restoring the original matrix for use. We formulate the Quantization Error Minimization (QEM) problem as minimizing the distance between a matrix before and after quantization, under the condition that the quantized matrix occupies the same memory space. Matrix quantization is crucial in various applications, including Large Language Models (LLMs) weight quantization, vector databases, KV cache quantization, graph compression, and image compression. Recent advancements in LLMs, such as GPT-4 and BERT, have highlighted the importance of matrix compression due to the large size of parameters and KV cache, which are stored as matrices. We propose Quantum Entanglement Trees (QET) to address the QEM problem by leveraging the local orderliness of matrix elements, involving iterative element swapping to form a locally ordered matrix. This matrix is then grouped and quantized by columns. To enhance QET, we introduce two optimizations: further quantizing residuals to reduce MSE, and using masking and batch processing to accelerate the algorithm. Experimental results demonstrate that QET can effectively reduce MSE to 5.05%, 13.33%, and 11.89% of the current best method on the LLM dataset, K cache, and V cache, respectively. Our contributions include the abstraction of the QEM problem, the design of the QET algorithm, and the proposal of two optimizations to improve accuracy and speed.
Authors: Tianheng Ling, Chao Qian, Gregor Schiele
Abstract: This paper presents the design of a hardware accelerator for Transformers, optimized for on-device time-series forecasting in AIoT systems. It integrates integer-only quantization and Quantization-Aware Training with optimized hardware designs to realize 6-bit and 4-bit quantized Transformer models, which achieved precision comparable to 8-bit quantized models from related research. Utilizing a complete implementation on an embedded FPGA (Xilinx Spartan-7 XC7S15), we examine the feasibility of deploying Transformer models on embedded IoT devices. This includes a thorough analysis of achievable precision, resource utilization, timing, power, and energy consumption for on-device inference. Our results indicate that while sufficient performance can be attained, the optimization process is not trivial. For instance, reducing the quantization bitwidth does not consistently result in decreased latency or energy consumption, underscoring the necessity of systematically exploring various optimization combinations. Compared to an 8-bit quantized Transformer model in related studies, our 4-bit quantized Transformer model increases test loss by only 0.63%, operates up to 132.33x faster, and consumes 48.19x less energy.
Authors: Zikui Cai, Yaoteng Tan, M. Salman Asif
Abstract: Unauthorized generation of privacy-related and copyrighted content using generative AI is becoming a significant concern for human society, raising ethical, legal, and privacy issues that demand urgent attention. The EU's General Data Protection Regulation (GDPR) includes a "right to be forgotten," which allows individuals to request the deletion of their personal data. However, this primarily applies to data stored in traditional databases, not AI models. Recently, machine unlearning techniques have arisen that attempt to eliminate the influence of sensitive content used during AI model training, but they often require extensive updates to the deployed systems and incur substantial computational costs. In this work, we propose a novel and efficient method called Single Layer Unlearning Gradient (SLUG) that can unlearn targeted information by updating targeted layers of a model using a one-time gradient computation. Our method is highly modular and enables the selective removal of multiple sensitive concepts, such as celebrity names and copyrighted content, from the generated outputs of widely used foundation models (e.g., CLIP) and generative models (e.g., Stable Diffusion). Broadly, our method ensures AI-generated content complies with privacy regulations and intellectual property laws, fostering responsible use of generative models, mitigating legal risks, and promoting a trustworthy, socially responsible AI ecosystem.
Authors: Yang Qiu, Wei Liu, Jun Wang, Ruixuan Li
Abstract: This article introduces PAGE, a parameterized generative interpretive framework. PAGE is capable of providing faithful explanations for any graph neural network without requiring prior knowledge or internal details. Specifically, we train an auto-encoder to generate explanatory substructures by designing an appropriate training strategy. Owing to the dimensionality reduction of features in the latent space of the auto-encoder, it becomes easier to extract the causal features leading to the model's output, which can then be employed to generate explanations. To accomplish this, we introduce an additional discriminator to capture the causality between latent causal features and the model's output. By designing appropriate optimization objectives, the well-trained discriminator can be employed to constrain the encoder to generate enhanced causal features. Finally, these features are mapped to substructures of the input graph through the decoder to serve as explanations. Compared to existing methods, PAGE operates at the sample scale rather than at the level of nodes or edges, eliminating the need for the perturbation or encoding processes seen in previous methods. Experimental results on both artificially synthesized and real-world datasets demonstrate that our approach not only exhibits the highest faithfulness and accuracy but also significantly outperforms baseline models in terms of efficiency.
Authors: Richard Bergna, Sergio Calvo-Ordo\~nez, Felix L. Opolka, Pietro Li\`o, Jose Miguel Hernandez-Lobato
Abstract: We address the problem of learning uncertainty-aware representations for graph-structured data. While Graph Neural Ordinary Differential Equations (GNODE) are effective in learning node representations, they fail to quantify uncertainty. To address this, we introduce Latent Graph Neural Stochastic Differential Equations (LGNSDE), which enhance GNODE by embedding randomness through Brownian motion to quantify uncertainty. We provide theoretical guarantees for LGNSDE and empirically show better performance in uncertainty quantification.
Authors: Chao Wang, Neo Wu, Lin Ning, Jiaxing Wu, Luyang Liu, Jun Xie, Shawn O'Banion, Bradley Green
Abstract: Large language models (LLMs) have shown remarkable capabilities in generating user summaries from a long list of raw user activity data. These summaries capture essential user information such as preferences and interests, and therefore are invaluable for LLM-based personalization applications, such as explainable recommender systems. However, the development of new summarization techniques is hindered by the lack of ground-truth labels, the inherent subjectivity of user summaries, and human evaluation, which is often costly and time-consuming. To address these challenges, we introduce UserSumBench, a benchmark framework designed to facilitate iterative development of LLM-based summarization approaches. This framework offers two key components: (1) a reference-free summary quality metric. We show that this metric is effective and aligned with human preferences across three diverse datasets (MovieLens, Yelp, and Amazon Review). (2) A novel robust summarization method that leverages a time-hierarchical summarizer and a self-critique verifier to produce high-quality summaries while eliminating hallucination. This method serves as a strong baseline for further innovation in summarization techniques.
Authors: Cong Zhang, Shuyi Du, Hongqing Song, Yuhe Wang
Abstract: Estimating spatially distributed information through the interpolation of scattered observation datasets often overlooks the critical role of domain knowledge in understanding spatial dependencies. Additionally, the features of these data sets are typically limited to the spatial coordinates of the scattered observation locations. In this paper, we propose a hybrid framework that integrates data-driven spatial dependency feature extraction with rule-assisted spatial dependency function mapping to augment domain knowledge. We demonstrate the superior performance of our framework in two comparative application scenarios, highlighting its ability to capture more localized spatial features in the reconstructed distribution fields. Furthermore, we underscore its potential to enhance nonlinear estimation capabilities through the application of transformed fuzzy rules and to quantify the inherent uncertainties associated with the observation data sets. Our framework introduces an innovative approach to spatial information estimation by synergistically combining observational data with rule-assisted domain knowledge.
Authors: Hongyuan Su, Yu Zheng, Jingtao Ding, Depeng Jin, Yong Li
Abstract: The facility location problem (FLP) is a classical combinatorial optimization challenge aimed at strategically laying out facilities to maximize their accessibility. In this paper, we propose a reinforcement learning method tailored to solve large-scale urban FLP, capable of producing near-optimal solutions at superfast inference speed. We distill the essential swap operation from local search, and simulate it by intelligently selecting edges on a graph of urban regions, guided by a knowledge-informed graph neural network, thus sidestepping the need for heavy computation of local search. Extensive experiments on four US cities with different geospatial conditions demonstrate that our approach can achieve comparable performance to commercial solvers with less than 5\% accessibility loss, while displaying up to 1000 times speedup. We deploy our model as an online geospatial application at https://huggingface.co/spaces/randommmm/MFLP.
Authors: Raphael Lafargue, Luke Smith, Franck Vermet, Mathias L\"owe, Ian Reid, Vincent Gripon, Jack Valmadre
Abstract: The predominant method for computing confidence intervals (CI) in few-shot learning (FSL) is based on sampling the tasks with replacement, i.e., allowing the same samples to appear in multiple tasks. This makes the CI misleading in that it takes into account the randomness of the sampler but not the data itself. To quantify the extent of this problem, we conduct a comparative analysis between CIs computed with and without replacement. These reveal a notable underestimation by the predominant method. This observation calls for a reevaluation of how we interpret confidence intervals and the resulting conclusions in FSL comparative studies. Our research demonstrates that the use of paired tests can partially address this issue. Additionally, we explore methods to further reduce the (size of the) CI by strategically sampling tasks of a specific size. We also introduce a new optimized benchmark, which can be accessed at https://github.com/RafLaf/FSL-benchmark-again
Authors: Chris Stanford, Suman Adari, Xishun Liao, Yueshuai He, Qinhua Jiang, Chenchen Kuai, Jiaqi Ma, Emmanuel Tung, Yinlong Qian, Lingyi Zhao, Zihao Zhou, Zeeshan Rasheed, Khurram Shafique
Abstract: Collecting real-world mobility data is challenging. It is often fraught with privacy concerns, logistical difficulties, and inherent biases. Moreover, accurately annotating anomalies in large-scale data is nearly impossible, as it demands meticulous effort to distinguish subtle and complex patterns. These challenges significantly impede progress in geospatial anomaly detection research by restricting access to reliable data and complicating the rigorous evaluation, comparison, and benchmarking of methodologies. To address these limitations, we introduce a synthetic mobility dataset, NUMOSIM, that provides a controlled, ethical, and diverse environment for benchmarking anomaly detection techniques. NUMOSIM simulates a wide array of realistic mobility scenarios, encompassing both typical and anomalous behaviours, generated through advanced deep learning models trained on real mobility data. This approach allows NUMOSIM to accurately replicate the complexities of real-world movement patterns while strategically injecting anomalies to challenge and evaluate detection algorithms based on how effectively they capture the interplay between demographic, geospatial, and temporal factors. Our goal is to advance geospatial mobility analysis by offering a realistic benchmark for improving anomaly detection and mobility modeling techniques. To support this, we provide open access to the NUMOSIM dataset, along with comprehensive documentation, evaluation metrics, and benchmark results.
Authors: Lucas Felipe Ferraro Cardoso, Jos\'e de Sousa Ribeiro Filho, Vitor Cirilo Araujo Santos, Regiane Silva Kawasaki Frances, Ronnie Cley de Oliveira Alves
Abstract: Although fundamental to the advancement of Machine Learning, the classic evaluation metrics extracted from the confusion matrix, such as precision and F1, are limited. Such metrics offer only a quantitative view of the models' performance, without considering the complexity of the data or the quality of each hit. To overcome these limitations, recent research has introduced the use of psychometric metrics such as Item Response Theory (IRT), which allows an assessment at the level of latent characteristics of instances. This work investigates how IRT concepts can enrich a confusion matrix in order to identify which model is the most appropriate among options with similar performance. In the study carried out, IRT does not replace, but complements, classical metrics by offering a new layer of evaluation and observation of the fine-grained behavior of models on specific instances. It was also observed, with 97% confidence, that the IRT score provides contributions distinct from those of 66% of the classical metrics analyzed.
Authors: Juho Timonen, Harri L\"ahdesm\"aki
Abstract: Gaussian process (GP) models that combine both categorical and continuous input variables have found use in longitudinal data analysis and computer experiments. However, standard inference for these models has the typical cubic scaling, and common scalable approximation schemes for GPs cannot be applied since the covariance function is non-continuous. In this work, we derive a basis function approximation scheme for mixed-domain covariance functions, which scales linearly with respect to the number of observations and the total number of basis functions. The proposed approach is also naturally applicable to Bayesian GP regression with discrete observation models. We demonstrate the scalability of the approach and compare model reduction techniques for additive GP models in a longitudinal data context. We confirm that we can approximate the exact GP model accurately in a fraction of the runtime compared to fitting the corresponding exact model. In addition, we demonstrate a scalable model reduction workflow for obtaining smaller and more interpretable models when dealing with a large number of candidate predictors.
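For context, the continuous-domain building block that such basis function schemes rest on is the standard reduced-rank expansion shown below (the mixed categorical-continuous extension is the paper's contribution); here $(\lambda_j, \phi_j)$ are Laplacian eigenpairs on a compact domain and $S_\theta$ is the spectral density of the stationary kernel.

```latex
% Reduced-rank basis function approximation of a stationary covariance:
k(x, x') \;\approx\; \sum_{j=1}^{J} S_{\theta}\!\big(\sqrt{\lambda_j}\big)\, \phi_j(x)\, \phi_j(x')
```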
Authors: J. Jon Ryu, Young-Han Kim
Abstract: This paper presents how to perform minimax optimal classification, regression, and density estimation based on fixed-$k$ nearest neighbor (NN) searches. We consider a distributed learning scenario, in which a massive dataset is split into smaller groups, where the $k$-NNs are found for a query point with respect to each subset of data. We propose \emph{optimal} rules to aggregate the fixed-$k$-NN information for classification, regression, and density estimation that achieve minimax optimal rates for the respective problems. We show that the distributed algorithm with a fixed $k$ over a sufficiently large number of groups attains a minimax optimal error rate up to a multiplicative logarithmic factor under some regularity conditions. Roughly speaking, distributed $k$-NN rules with $M$ groups have performance comparable to the standard $\Theta(kM)$-NN rules even for fixed $k$.
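To illustrate the setting (not the paper's optimal aggregation rules, which are its main contribution), here is a sketch of distributed fixed-k NN regression with plain averaging over groups:

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors

def distributed_knn_regress(X, y, x_query, M=10, k=3, seed=0):
    """Split the data into M groups, run a fixed-k NN query in each group,
    then aggregate the per-group estimates (plain mean, for illustration)."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(X))
    estimates = []
    for group in np.array_split(idx, M):
        nn = NearestNeighbors(n_neighbors=k).fit(X[group])
        _, nbrs = nn.kneighbors(x_query.reshape(1, -1))
        estimates.append(y[group][nbrs[0]].mean())
    return float(np.mean(estimates))
```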
Authors: Haoyu Jiang, Jason Xu
Abstract: Stochastic versions of proximal methods have gained much attention in statistics and machine learning. These algorithms tend to admit simple, scalable forms, and enjoy numerical stability via implicit updates. In this work, we propose and analyze a stochastic version of the recently proposed proximal distance algorithm, a class of iterative optimization methods that recover a desired constrained estimation problem as a penalty parameter $\rho \rightarrow \infty$. By uncovering connections to related stochastic proximal methods and interpreting the penalty parameter as the learning rate, we justify heuristics used in practical manifestations of the proximal distance method, establishing their convergence guarantees for the first time. Moreover, we extend recent theoretical devices to establish finite error bounds and a complete characterization of convergence rate regimes. We validate our analysis via a thorough empirical study, also showing that, unsurprisingly, the proposed method outpaces batch versions on popular learning tasks.
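As background, the deterministic proximal distance scheme that the stochastic variant builds on relaxes a constrained problem via a squared-distance penalty and then iterates a project-and-prox (MM) update; schematically:

```latex
% Relax  min_x f(x)  s.t.  x \in C  with a squared-distance penalty:
\min_x \; f(x) + \frac{\rho}{2}\,\mathrm{dist}(x, C)^2,
\qquad
x_{t+1} = \operatorname{prox}_{\rho^{-1} f}\big(P_C(x_t)\big),

% where P_C is the projection onto C and
% \operatorname{prox}_{\rho^{-1} f}(y) = \arg\min_x \Big\{ f(x) + \tfrac{\rho}{2}\|x - y\|^2 \Big\}.
```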
Authors: Niloufar Fakhfour, Mohammad ShahverdiKondori, Sajjad Hashembeiki, Mohammadjavad Norouzi, Hoda Mohammadzade
Abstract: In this paper, we tackle the problem of video alignment, the process of matching the frames of a pair of videos containing similar actions. The main challenge in video alignment is that accurate correspondence must be established despite the differences in the execution processes and appearances of the two videos. We introduce an unsupervised method for alignment that uses global and local features of the frames. In particular, we extract effective features for each video frame by means of three machine vision tools: person detection, pose estimation, and a VGG network. The features are then processed and combined to construct a multidimensional time series that represents the video. The resulting time series are used to align videos of the same actions using a novel version of dynamic time warping named Diagonalized Dynamic Time Warping (DDTW). The main advantage of our approach is that no training is required, which makes it applicable to any new type of action without the need to collect training samples. Additionally, our approach can be used for framewise labeling of action phases in a dataset with only a few labeled videos. For evaluation, we consider video synchronization and phase classification tasks on the Penn Action dataset and a subset of the UCF101 dataset. Also, for an effective evaluation of the video synchronization task, we present a new metric called the Enclosed Area Error (EAE). The results show that our method outperforms previous state-of-the-art methods, such as TCC and other self-supervised and weakly supervised methods.
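As background, the sketch below shows plain DTW restricted to a band around the diagonal (a Sakoe-Chiba-style constraint), which conveys the flavor of biasing the alignment toward the diagonal; the paper's DDTW is its own variant, and this is not its implementation.

```python
import numpy as np

def banded_dtw(seq_a, seq_b, band=10):
    """DTW cost between two feature sequences, only exploring cells within
    `band` of the diagonal; off-band alignments are disallowed (inf)."""
    n, m = len(seq_a), len(seq_b)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(max(1, i - band), min(m, i + band) + 1):
            cost = np.linalg.norm(np.asarray(seq_a[i - 1]) - np.asarray(seq_b[j - 1]))
            D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    return D[n, m]
```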
Authors: Philipp Pilar, Niklas Wahlstr\"om
Abstract: Generative adversarial networks constitute a powerful approach to generative modeling. While generated samples are often indistinguishable from real data, there is no guarantee that they will follow the true data distribution. For scientific applications in particular, it is essential that the true distribution is well captured by the generated distribution. In this work, we propose a method to ensure that the distributions of certain generated data statistics coincide with the respective distributions of the real data. In order to achieve this, we add a new loss term to the generator loss function, which quantifies the difference between these distributions via suitable f-divergences. Kernel density estimation is employed to obtain representations of the true distributions, and to estimate the corresponding generated distributions from minibatch values at each iteration. Compared to other methods, our approach has the advantage that the complete shapes of the distributions are taken into account. We evaluate the method on a synthetic dataset and a real-world dataset and demonstrate improved performance of our approach.
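As an illustration of such an added loss term, the sketch below estimates the KL divergence (one admissible f-divergence) between Gaussian kernel density estimates of a scalar statistic computed on real and generated minibatches; the bandwidth, grid, and the choice of KL are assumptions for the example, not the paper's exact configuration.

```python
import math
import torch

def kde_pdf(x, samples, bw):
    # Gaussian KDE of `samples`, evaluated at the points `x`.
    diff = (x[:, None] - samples[None, :]) / bw
    return torch.exp(-0.5 * diff**2).mean(dim=1) / (bw * math.sqrt(2 * math.pi))

def kl_statistic_loss(stat_real, stat_fake, bw=0.1, grid_size=256):
    # Evaluate both densities on a shared grid and integrate p * log(p / q);
    # gradients flow to the generator through stat_fake.
    both = torch.cat([stat_real, stat_fake])
    lo, hi = both.min() - 3 * bw, both.max() + 3 * bw
    grid = torch.linspace(lo.item(), hi.item(), grid_size)
    p = kde_pdf(grid, stat_real, bw) + 1e-12
    q = kde_pdf(grid, stat_fake, bw) + 1e-12
    dx = (hi - lo) / (grid_size - 1)
    return torch.sum(p * torch.log(p / q)) * dx  # added to the generator loss
```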
Authors: Jiaxing Xu, Qingtian Bian, Xinhang Li, Aihu Zhang, Yiping Ke, Miao Qiao, Wei Zhang, Wei Khang Jeremy Sim, Bal\'azs Guly\'as
Abstract: Functional magnetic resonance imaging (fMRI) is a commonly used technique to measure neural activation. Its application has been particularly important in identifying underlying neurodegenerative conditions such as Parkinson's, Alzheimer's, and Autism. Recent analysis of fMRI data models the brain as a graph and extracts features by graph neural networks (GNNs). However, the unique characteristics of fMRI data require a special design of GNN. Tailoring GNN to generate effective and domain-explainable features remains challenging. In this paper, we propose a contrastive dual-attention block and a differentiable graph pooling method called ContrastPool to better utilize GNN for brain networks, meeting fMRI-specific requirements. We apply our method to 5 resting-state fMRI brain network datasets of 3 diseases and demonstrate its superiority over state-of-the-art baselines. Our case study confirms that the patterns extracted by our method match the domain knowledge in neuroscience literature, and disclose direct and interesting insights. Our contributions underscore the potential of ContrastPool for advancing the understanding of brain networks and neurodegenerative conditions. The source code is available at https://github.com/AngusMonroe/ContrastPool.
Authors: Zihong Sun, Hong Wang, Qi Xie, Yefeng Zheng, Deyu Meng
Abstract: Retinal vessel segmentation is of great clinical significance for the diagnosis of many eye-related diseases, but it remains a formidable challenge due to the intricate vascular morphology. By characterizing the translation symmetry present in retinal vessels, convolutional neural networks (CNNs) have achieved great success in retinal vessel segmentation. However, the rotation-and-scale symmetry, a more widespread image prior in retinal vessels, fails to be characterized by CNNs. Therefore, we propose a rotation-and-scale equivariant Fourier parameterized convolution (RSF-Conv) specifically for retinal vessel segmentation, and provide the corresponding equivariance analysis. As a general module, RSF-Conv can be integrated into existing networks in a plug-and-play manner while significantly reducing the number of parameters. For instance, we replace the traditional convolution filters in U-Net and Iter-Net with RSF-Convs and conduct comprehensive experiments. RSF-Conv+U-Net and RSF-Conv+Iter-Net not only have slight advantages under in-domain evaluation but, more importantly, outperform all comparison methods by a significant margin under out-of-domain evaluation. This indicates the remarkable generalization of RSF-Conv, which holds great practical clinical significance for the prevalent cross-device and cross-hospital challenges in clinical practice. To comprehensively demonstrate the effectiveness of RSF-Conv, we also apply RSF-Conv+U-Net and RSF-Conv+Iter-Net to retinal artery/vein classification and achieve promising performance as well, indicating its clinical application potential.
Authors: Giulio Giacomuzzos, Ruggero Carli, Diego Romeres, Alberto Dalla Libera
Abstract: Learning the inverse dynamics of robots directly from data, adopting a black-box approach, is interesting for several real-world scenarios where limited knowledge about the system is available. In this paper, we propose a black-box model based on Gaussian Process (GP) Regression for the identification of the inverse dynamics of robotic manipulators. The proposed model relies on a novel multidimensional kernel, called the Lagrangian Inspired Polynomial (LIP) kernel. The LIP kernel is based on two main ideas. First, instead of directly modeling the inverse dynamics components, we model the kinetic and potential energy of the system as GPs. The GP prior on the inverse dynamics components is derived from those on the energies by applying the properties of GPs under linear operators. Second, as regards the energy prior definition, we prove a polynomial structure of the kinetic and potential energy, and we derive a polynomial kernel that encodes this property. As a consequence, the proposed model also allows estimation of the kinetic and potential energy without requiring any labels on these quantities. Results in simulation and on two real robotic manipulators, namely a 7 DOF Franka Emika Panda and a 6 DOF MELFA RV4FL, show that the proposed model outperforms state-of-the-art black-box estimators based on both Gaussian Processes and Neural Networks in terms of accuracy, generality, and data efficiency. The experiments on the MELFA robot also demonstrate that our approach achieves performance comparable to fine-tuned model-based estimators, despite requiring less prior information.
Authors: Can Jin, Tianjin Huang, Yihua Zhang, Mykola Pechenizkiy, Sijia Liu, Shiwei Liu, Tianlong Chen
Abstract: The rapid development of large-scale deep learning models questions the affordability of hardware platforms, which necessitates pruning to reduce their computational and memory footprints. Sparse neural networks, the product of such pruning, have demonstrated numerous favorable benefits, such as low complexity and undamaged generalization. Most prominent pruning strategies are devised from a model-centric perspective, focusing on searching for and preserving crucial weights by analyzing network topologies. However, the role of data and its interplay with model-centric pruning has remained relatively unexplored. In this research, we introduce a novel data-model co-design perspective: promoting superior weight sparsity by learning important model topology and adequate input data in a synergetic manner. Specifically, customized Visual Prompts are mounted to upgrade neural Network sparsification in our proposed VPNs framework. As a pioneering effort, this paper conducts systematic investigations of the impact of different visual prompts on model pruning and suggests an effective joint optimization approach. Extensive experiments with 3 network architectures and 8 datasets evidence the substantial performance improvements of VPNs over existing state-of-the-art pruning algorithms. Furthermore, we find that subnetworks discovered by VPNs from pre-trained models enjoy better transferability across diverse downstream scenarios. These insights shed light on new promising possibilities of data-model co-design for vision model sparsification.
Authors: Yiwen Zheng, Prakash Thakolkaran, Agni K. Biswal, Jake A. Smith, Ziheng Lu, Shuxin Zheng, Bichlien H. Nguyen, Siddhant Kumar, Aniruddh Vashisth
Abstract: Vitrimers are a new, exciting class of sustainable polymers with the ability to heal owing to their dynamic covalent adaptive networks, which can undergo associative rearrangement reactions. However, a limited choice of constituent molecules restricts their property space, prohibiting full realization of their potential applications. To overcome this challenge, we couple molecular dynamics (MD) simulations and a novel graph variational autoencoder (VAE) machine learning model for the inverse design of vitrimer chemistries with a desired glass transition temperature (Tg), and we synthesize a novel vitrimer polymer. We build the first vitrimer dataset of one million chemistries and calculate Tg for 8,424 of them by high-throughput MD simulations calibrated by a Gaussian process model. The proposed novel VAE employs dual graph encoders and a latent dimension overlapping scheme that allows for individual representation of multi-component vitrimers. By constructing a continuous latent space containing the necessary information about vitrimers, we demonstrate the high accuracy and efficiency of our framework in discovering novel vitrimers with desirable Tg beyond the training regime. To validate the effectiveness of our framework in experiments, we generate novel vitrimer chemistries with a target Tg = 323 K. By incorporating chemical intuition, we synthesize a vitrimer with a Tg of 311-317 K and experimentally demonstrate its healability and flowability. The proposed framework offers an exciting tool for polymer chemists to design and synthesize novel, sustainable vitrimer polymers for a range of applications.
Authors: Ruoyu Chen, Hua Zhang, Siyuan Liang, Jingzhi Li, Xiaochun Cao
Abstract: Image attribution algorithms aim to identify important regions that are highly relevant to model decisions. Although existing attribution solutions can effectively assign importance to target elements, they still face the following challenges: 1) existing attribution methods generate inaccurate small regions, thus misleading the direction of correct attribution, and 2) the model cannot produce good attribution results for samples with wrong predictions. To address these challenges, this paper re-models the image attribution problem as a submodular subset selection problem, aiming to enhance model interpretability using fewer regions. To address the lack of attention to local regions, we construct a novel submodular function to discover more accurate small interpretation regions. To enhance the attribution effect for all samples, we also impose four different constraints on the selection of sub-regions, i.e., confidence, effectiveness, consistency, and collaboration scores, to assess the importance of various subsets. Moreover, our theoretical analysis substantiates that the proposed function is in fact submodular. Extensive experiments show that the proposed method outperforms SOTA methods on two face datasets (Celeb-A and VGG-Face2) and one fine-grained dataset (CUB-200-2011). For correctly predicted samples, the proposed method improves the Deletion and Insertion scores by an average of 4.9% and 2.5%, respectively, relative to HSIC-Attribution. For incorrectly predicted samples, our method achieves gains of 81.0% and 18.4% over the HSIC-Attribution algorithm in average highest confidence and Insertion score, respectively. The code is released at https://github.com/RuoyuChen10/SMDL-Attribution.
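The selection step itself can be as simple as classical greedy maximization, which carries the usual (1 - 1/e) guarantee for monotone submodular objectives. The sketch below uses a toy coverage objective as a stand-in for the paper's combined confidence/effectiveness/consistency/collaboration score.

```python
def greedy_submodular_select(regions, score, k):
    """Greedy maximization of a monotone submodular set function `score`."""
    selected, remaining = [], list(regions)
    for _ in range(k):
        best = max(remaining, key=lambda r: score(selected + [r]) - score(selected))
        selected.append(best)
        remaining.remove(best)
    return selected

# Toy submodular objective: how many pixels the chosen sub-regions cover.
regions = [set(range(i, i + 30)) for i in range(0, 100, 10)]
coverage = lambda S: len(set().union(*S)) if S else 0
print(greedy_submodular_select(regions, coverage, k=3))
```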
Authors: Ali Aghdaei, Zhuo Feng
Abstract: This work presents inGRASS, a novel algorithm designed for incremental spectral sparsification of large undirected graphs. The proposed inGRASS algorithm is highly scalable and parallel-friendly, having a nearly-linear time complexity for the setup phase and the ability to update the spectral sparsifier in $O(\log N)$ time for each incremental change made to the original graph with $N$ nodes. A key component in the setup phase of inGRASS is a multilevel resistance embedding framework introduced for efficiently identifying spectrally-critical edges and effectively detecting redundant ones, which is achieved by decomposing the initial sparsifier into many node clusters with bounded effective-resistance diameters leveraging a low-resistance-diameter decomposition (LRD) scheme. The update phase of inGRASS exploits low-dimensional node embedding vectors for efficiently estimating the importance and uniqueness of each newly added edge. As demonstrated through extensive experiments, inGRASS achieves speedups of over $200\times$ while retaining comparable solution quality in incremental spectral sparsification of graphs obtained from various datasets, such as circuit simulations, finite element analysis, and social networks.
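The quantity underlying resistance embeddings is the effective resistance, computable exactly (if slowly) from the Laplacian pseudoinverse. The snippet below shows that textbook definition; inGRASS's contribution is precisely avoiding this dense cubic-time computation with its multilevel decomposition.

```python
import numpy as np

def effective_resistance(L, u, v):
    """R_eff(u, v) = (e_u - e_v)^T L^+ (e_u - e_v) for a graph Laplacian L."""
    L_pinv = np.linalg.pinv(L)
    e = np.zeros(L.shape[0])
    e[u], e[v] = 1.0, -1.0
    return float(e @ L_pinv @ e)

# 4-cycle: each edge sees 1 Ohm in parallel with a 3-Ohm path, i.e. 0.75.
A = np.array([[0, 1, 0, 1], [1, 0, 1, 0], [0, 1, 0, 1], [1, 0, 1, 0]], float)
L = np.diag(A.sum(1)) - A
print(effective_resistance(L, 0, 1))  # ~0.75
```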
Authors: Salah Ghamizi, Jun Cao, Aoxiang Ma, Pedro Rodriguez
Abstract: Efficiently solving unbalanced three-phase power flow in distribution grids is pivotal for grid analysis and simulation. There is a pressing need for scalable algorithms capable of handling large-scale unbalanced power grids that can provide accurate and fast solutions. To address this, deep learning techniques, especially Graph Neural Networks (GNNs), have emerged. However, existing literature primarily focuses on balanced networks, leaving a critical gap in supporting unbalanced three-phase power grids. This letter introduces PowerFlowMultiNet, a novel multigraph GNN framework explicitly designed for unbalanced three-phase power grids. The proposed approach models each phase separately in a multigraph representation, effectively capturing the inherent asymmetry in unbalanced grids. A graph embedding mechanism utilizing message passing is introduced to capture spatial dependencies within the power system network. PowerFlowMultiNet outperforms traditional methods and other deep learning approaches in terms of accuracy and computational speed. Rigorous testing reveals significantly lower error rates and a notable hundredfold increase in computational speed for large power networks compared to model-based methods.
Authors: Caio Cesar Daumann, Mauro Donega, Johannes Erdmann, Massimiliano Galli, Jan Lukas Späh, Davide Valsecchi
Abstract: Simulated events are key ingredients in almost all high-energy physics analyses. However, imperfections in the simulation can lead to sizeable differences between the observed data and simulated events. The effects of such mismodelling on relevant observables must be corrected either effectively via scale factors, with weights, or by modifying the distributions of the observables and their correlations. We introduce a correction method that transforms one multidimensional distribution (simulation) into another one (data) using a simple architecture based on a single normalising flow with a boolean condition. We demonstrate the effectiveness of the method on a physics-inspired toy dataset with non-trivial mismodelling of several observables and their correlations.
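To make the architecture concrete, here is a minimal RealNVP-style flow whose coupling networks also see a boolean flag (say 0 = simulation, 1 = data); this is a toy instantiation of a boolean-conditioned normalising flow, not the authors' exact model.

```python
import torch
import torch.nn as nn

class ConditionalCoupling(nn.Module):
    """Affine coupling layer whose scale/shift nets see a boolean condition."""
    def __init__(self, dim, hidden=64):
        super().__init__()
        self.half = dim // 2
        self.net = nn.Sequential(
            nn.Linear(self.half + 1, hidden), nn.ReLU(),
            nn.Linear(hidden, 2 * (dim - self.half)))

    def forward(self, x, cond):
        x1, x2 = x[:, :self.half], x[:, self.half:]
        s, t = self.net(torch.cat([x1, cond], dim=1)).chunk(2, dim=1)
        s = torch.tanh(s)                                 # keep scales bounded
        return torch.cat([x1, x2 * torch.exp(s) + t], dim=1), s.sum(dim=1)

class BooleanConditionedFlow(nn.Module):
    def __init__(self, dim, n_layers=4):
        super().__init__()
        self.layers = nn.ModuleList(ConditionalCoupling(dim) for _ in range(n_layers))

    def log_prob(self, x, cond):
        log_det = 0.0
        for layer in self.layers:
            x, ld = layer(x, cond)
            x = x.flip(dims=[1])          # alternate which half gets transformed
            log_det = log_det + ld
        base = -0.5 * (x ** 2 + torch.log(torch.tensor(2 * torch.pi))).sum(dim=1)
        return base + log_det

flow = BooleanConditionedFlow(dim=4)
x = torch.randn(8, 4)
cond = torch.randint(0, 2, (8, 1)).float()   # 0 = simulation, 1 = data
loss = -flow.log_prob(x, cond).mean()        # fit both samples jointly
```

Correcting simulation then amounts to pushing a simulated event to the latent space with cond = 0 and inverting the flow with cond = 1, which requires also implementing the (analytic) inverse of each coupling layer.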
Authors: Dongwei Jiang, Jingyu Zhang, Orion Weller, Nathaniel Weir, Benjamin Van Durme, Daniel Khashabi
Abstract: Can LLMs consistently improve their previous outputs for better results? For this to be true, LLMs would need to be better at discriminating among previously-generated alternatives, than generating initial responses. We explore the validity of this hypothesis in practice. We first formulate a unified framework that allows us to compare the generative and discriminative capability of any model on any task. In our resulting experimental analysis of several open-source and industrial LLMs, we observe that models are not reliably better at discriminating among previously-generated alternatives than generating initial responses. This finding challenges the notion that LLMs may be able to enhance their performance only through their own judgment.
Authors: Victor Quintas-Martinez, Mohammad Taha Bahadori, Eduardo Santiago, Jeff Mu, Dominik Janzing, David Heckerman
Abstract: Comparing two samples of data, we observe a change in the distribution of an outcome variable. In the presence of multiple explanatory variables, how much of the change can be explained by each possible cause? We develop a new estimation strategy that, given a causal model, combines regression and re-weighting methods to quantify the contribution of each causal mechanism. Our proposed methodology is multiply robust, meaning that it still recovers the target parameter under partial misspecification. We prove that our estimator is consistent and asymptotically normal. Moreover, it can be incorporated into existing frameworks for causal attribution, such as Shapley values, which will inherit the consistency and large-sample distribution properties. Our method demonstrates excellent performance in Monte Carlo simulations, and we show its usefulness in an empirical application. Our method is implemented as part of the Python library DoWhy (arXiv:2011.04216, arXiv:2206.06821).
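A stripped-down version of the regression-plus-re-weighting idea, without the multiply robust machinery, already conveys the estimand: split a mean shift into the part explained by the covariate distribution changing and the part explained by the outcome mechanism changing, using a classifier-based density ratio. Everything below is an illustrative stand-in, not the paper's estimator or its DoWhy implementation.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def decompose_mean_shift(X1, y1, X2, y2):
    """Split E2[Y] - E1[Y] into a P(X)-shift term and a P(Y|X)-shift term."""
    X = np.vstack([X1, X2])
    s = np.r_[np.zeros(len(X1)), np.ones(len(X2))]
    clf = LogisticRegression(max_iter=1000).fit(X, s)
    p = clf.predict_proba(X1)[:, 1]
    w = (p / (1 - p)) * (len(X1) / len(X2))     # density ratio p2(x) / p1(x)
    ey1_reweighted = np.average(y1, weights=w)  # old mechanism under new X
    return {"covariate_shift": ey1_reweighted - y1.mean(),
            "mechanism_shift": y2.mean() - ey1_reweighted,
            "total": y2.mean() - y1.mean()}

rng = np.random.default_rng(0)
X1 = rng.normal(0.0, 1, (5000, 1)); y1 = 2 * X1[:, 0] + rng.normal(0, 1, 5000)
X2 = rng.normal(0.5, 1, (5000, 1)); y2 = 2 * X2[:, 0] + 0.3 + rng.normal(0, 1, 5000)
print(decompose_mean_shift(X1, y1, X2, y2))  # ~1.0 from X, ~0.3 from mechanism
```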
Authors: Moo Jin Kim, Karl Pertsch, Siddharth Karamcheti, Ted Xiao, Ashwin Balakrishna, Suraj Nair, Rafael Rafailov, Ethan Foster, Grace Lam, Pannag Sanketi, Quan Vuong, Thomas Kollar, Benjamin Burchfiel, Russ Tedrake, Dorsa Sadigh, Sergey Levine, Percy Liang, Chelsea Finn
Abstract: Large policies pretrained on a combination of Internet-scale vision-language data and diverse robot demonstrations have the potential to change how we teach robots new skills: rather than training new behaviors from scratch, we can fine-tune such vision-language-action (VLA) models to obtain robust, generalizable policies for visuomotor control. Yet, widespread adoption of VLAs for robotics has been challenging as 1) existing VLAs are largely closed and inaccessible to the public, and 2) prior work fails to explore methods for efficiently fine-tuning VLAs for new tasks, a key component for adoption. Addressing these challenges, we introduce OpenVLA, a 7B-parameter open-source VLA trained on a diverse collection of 970k real-world robot demonstrations. OpenVLA builds on a Llama 2 language model combined with a visual encoder that fuses pretrained features from DINOv2 and SigLIP. As a product of the added data diversity and new model components, OpenVLA demonstrates strong results for generalist manipulation, outperforming closed models such as RT-2-X (55B) by 16.5% in absolute task success rate across 29 tasks and multiple robot embodiments, with 7x fewer parameters. We further show that we can effectively fine-tune OpenVLA for new settings, with especially strong generalization results in multi-task environments involving multiple objects and strong language grounding abilities, and outperform expressive from-scratch imitation learning methods such as Diffusion Policy by 20.4%. We also explore compute efficiency; as a separate contribution, we show that OpenVLA can be fine-tuned on consumer GPUs via modern low-rank adaptation methods and served efficiently via quantization without a hit to downstream success rate. Finally, we release model checkpoints, fine-tuning notebooks, and our PyTorch codebase with built-in support for training VLAs at scale on Open X-Embodiment datasets.
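The low-rank fine-tuning mentioned at the end follows the now-standard quantize-then-adapt pattern from the Hugging Face stack, sketched below. The checkpoint name matches the public OpenVLA release, but the rank, target modules, and training details are illustrative assumptions rather than the paper's exact configuration.

```python
import torch
from transformers import AutoModelForVision2Seq, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model

# Load the base VLA quantized to 4 bits so it fits on a consumer GPU.
bnb = BitsAndBytesConfig(load_in_4bit=True, bnb_4bit_compute_dtype=torch.bfloat16)
model = AutoModelForVision2Seq.from_pretrained(
    "openvla/openvla-7b", quantization_config=bnb, trust_remote_code=True)

# Attach low-rank adapters to the attention projections; only these train.
lora = LoraConfig(r=32, lora_alpha=16, lora_dropout=0.05,
                  target_modules=["q_proj", "k_proj", "v_proj", "o_proj"])
model = get_peft_model(model, lora)
model.print_trainable_parameters()  # a small fraction of the 7B weights
```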
Authors: Ning Jiang, Weiqi Chu, Yao Li
Abstract: Classical compartmental models in epidemiology often assume a homogeneous population for simplicity, which neglects the inherent heterogeneity among individuals. This assumption frequently leads to inaccurate predictions when applied to real-world data. For example, evidence has shown that classical models overestimate the final pandemic size in the H1N1-2009 and COVID-19 outbreaks. To address this issue, we introduce individual mobility as a key factor in disease transmission and control. We characterize disease dynamics using mobility distribution functions for each compartment and propose a mobility-based compartmental model that incorporates population heterogeneity. Our results demonstrate that, for the same basic reproduction number, our mobility-based model predicts a smaller final pandemic size compared to the classical models, effectively addressing the common overestimation problem. Additionally, we infer mobility distributions from the time series of the infected population. We provide sufficient conditions for uniquely identifying the mobility distribution from a dataset and propose a machine-learning-based approach to learn mobility from both synthesized and real-world data.
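One concrete way to instantiate a mobility-structured model is to discretize mobility into levels and let the contact rate scale with mobility on both the susceptible and infectious sides; the equations below are an assumed toy instantiation of that idea, not the paper's exact formulation.

```python
import numpy as np
from scipy.integrate import solve_ivp

m = np.array([0.5, 1.0, 2.0])    # relative mobility of each group
pi = np.array([0.3, 0.5, 0.2])   # population share of each group
beta, gamma = 0.4, 0.2

def rhs(t, x):
    S, I = x[:3], x[3:]
    # Mobility-weighted force of infection: mobile infectious people count more.
    force = beta * (m * I).sum() / (m * pi).sum()
    dS = -m * S * force
    dI = m * S * force - gamma * I
    return np.r_[dS, dI]

x0 = np.r_[pi - 1e-4, np.full(3, 1e-4)]      # seed a small infection
sol = solve_ivp(rhs, (0, 200), x0)
print("final size:", 1 - sol.y[:3, -1].sum())
```

Because highly mobile individuals are infected (and removed) early, effective transmission declines faster than in the homogeneous model, which is the mechanism behind the smaller predicted final size.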
Authors: Stanislav Budzinskiy
Abstract: The article concerns low-rank approximation of matrices generated by sampling a smooth function of two $m$-dimensional variables. We refute an argument made in the literature to prove that, for a specific class of analytic functions, such matrices admit accurate entrywise approximation of rank that is independent of $m$ -- a claim known as "big-data matrices are approximately low-rank". We provide a theoretical explanation of the numerical results presented in support of this claim, describing three narrower classes of functions for which $n \times n$ function-generated matrices can be approximated within an entrywise error of order $\varepsilon$ with rank $\mathcal{O}(\log(n) \varepsilon^{-2} \mathrm{polylog}(\varepsilon^{-1}))$ that is independent of the dimension $m$: (i) functions of the inner product of the two variables, (ii) functions of the Euclidean distance between the variables, and (iii) shift-invariant positive-definite kernels. We extend our argument to tensor-train approximation of tensors generated with functions of the multi-linear product of their $m$-dimensional variables. We discuss our results in the context of low-rank approximation of (a) growing datasets and (b) attention in transformer neural networks.
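A quick numerical illustration of the phenomenon for class (i), functions of the inner product: build f applied entrywise to the inner products of unit vectors in a high ambient dimension m and watch the numerical rank. The snippet uses the spectral (SVD) rank at a relative tolerance as a convenient proxy for the paper's entrywise statements.

```python
import numpy as np

def numerical_rank(A, eps):
    """Smallest r with a rank-r approximation within eps * ||A||_2 (via SVD)."""
    s = np.linalg.svd(A, compute_uv=False)
    return int(np.sum(s > eps * s[0]))

rng = np.random.default_rng(1)
m, eps = 50, 1e-3
for n in [100, 200, 400, 800]:
    X = rng.standard_normal((n, m)); X /= np.linalg.norm(X, axis=1, keepdims=True)
    Y = rng.standard_normal((n, m)); Y /= np.linalg.norm(Y, axis=1, keepdims=True)
    A = np.exp(X @ Y.T)               # f(t) = e^t applied to all inner products
    print(n, numerical_rank(A, eps))  # stays far below n as n grows
```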
Authors: Zedian Shao, Hongbin Liu, Yuepeng Hu, Neil Zhenqiang Gong
Abstract: Multimodal large language models (MLLMs) have become the cornerstone of today's generative AI ecosystem, sparking intense competition among tech giants and startups. In particular, an MLLM generates a text response given a prompt consisting of an image and a question. While state-of-the-art MLLMs use safety filters and alignment techniques to refuse unsafe prompts, in this work, we introduce MLLM-Refusal, the first method that induces refusals for safe prompts. In particular, MLLM-Refusal optimizes a nearly-imperceptible refusal perturbation and adds it to an image, causing target MLLMs to likely refuse a safe prompt containing the perturbed image and a safe question. Specifically, we formulate MLLM-Refusal as a constrained optimization problem and propose an algorithm to solve it. Our method offers competitive advantages for MLLM providers by potentially disrupting the user experience of competing MLLMs, since competing MLLMs' users will receive unexpected refusals when they unwittingly use these perturbed images in their prompts. We evaluate MLLM-Refusal on four MLLMs across four datasets, demonstrating its effectiveness in causing competing MLLMs to refuse safe prompts while not affecting non-competing MLLMs. Furthermore, we explore three potential countermeasures: adding Gaussian noise, DiffPure, and adversarial training. Our results show that although they can mitigate MLLM-Refusal's effectiveness, they also sacrifice the accuracy and/or efficiency of the competing MLLM. The code is available at https://github.com/Sadcardation/MLLM-Refusal.
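The constrained-optimization core is a projected-gradient loop over an image perturbation; the sketch below shows that generic pattern. The loss callback, budget, and step size are placeholders, and a real MLLM attack additionally handles tokenized refusal targets and image preprocessing.

```python
import torch

def pgd_perturbation(model, image, target_loss, eps=8/255, alpha=1/255, steps=200):
    """Optimize an additive delta with ||delta||_inf <= eps.

    `target_loss(model, x)` should be low when the model produces the
    attacker's desired output (e.g., a refusal string); it is a stand-in.
    """
    delta = torch.zeros_like(image, requires_grad=True)
    for _ in range(steps):
        loss = target_loss(model, image + delta)
        loss.backward()
        with torch.no_grad():
            delta -= alpha * delta.grad.sign()                # descend on the loss
            delta.clamp_(-eps, eps)                           # L-inf projection
            delta.copy_((image + delta).clamp(0, 1) - image)  # keep image valid
        delta.grad.zero_()
    return delta.detach()
```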
Authors: Lisanne van Gelderen, Cristian Tejedor-García
Abstract: Parkinson's disease (PD), the second most prevalent neurodegenerative disorder worldwide, frequently presents with early-stage speech impairments. Recent advancements in Artificial Intelligence (AI), particularly deep learning (DL), have significantly enhanced PD diagnosis through the analysis of speech data. Nevertheless, the progress of research is restricted by the limited availability of publicly accessible speech-based PD datasets, primarily due to privacy concerns. The goal of this systematic review is to explore the current landscape of speech-based DL approaches for PD classification, based on 33 scientific works published between January 2020 and March 2024. We discuss their available resources, capabilities, and potential limitations, and issues related to bias, explainability, and privacy. Furthermore, this review provides an overview of publicly accessible speech-based datasets and open-source material for PD. The DL approaches identified are categorized into end-to-end (E2E) learning, transfer learning (TL), and deep acoustic feature extraction (DAFE). Among E2E approaches, Convolutional Neural Networks (CNNs) are prevalent, though Transformers are increasingly popular. E2E approaches face challenges such as limited data and computational resources, especially with Transformers. TL addresses these issues by providing more robust PD diagnosis and better generalizability across languages. DAFE aims to improve the explainability and interpretability of results by examining the specific effects of deep features on both other DL approaches and more traditional machine learning (ML) methods. However, it often underperforms compared to E2E and TL approaches.
Authors: Cheng-Wei Ching, Boyuan Guan, Hailu Xu, Liting Hu
Abstract: The life cycle of machine learning (ML) applications consists of two stages: model development and model deployment. However, traditional ML systems (e.g., training-specific or inference-specific systems) focus on one particular stage or phase of the life cycle of ML applications. These systems often aim at optimizing model training or accelerating model inference, and they frequently assume homogeneous infrastructure, which may not always reflect real-world scenarios that include cloud data centers, local servers, containers, and serverless platforms. We present StraightLine, an end-to-end resource-aware scheduler that schedules the optimal resources (e.g., container, virtual machine, or serverless) for different ML application requests in a hybrid infrastructure. The key innovation is an empirical dynamic placing algorithm that intelligently places requests based on their unique characteristics (e.g., request frequency, input data size, and data distribution). In contrast to existing ML systems, StraightLine offers end-to-end resource-aware placement, and can thereby significantly reduce response time and failure rate for model deployment when facing different computing resources in the hybrid infrastructure.
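To give a flavor of what a placing rule keyed on request characteristics might look like, here is a deliberately simple hand-written placer; the tiers and thresholds are invented for illustration, whereas StraightLine fits its placement decisions to empirically measured behavior.

```python
def place_request(freq_per_min, input_mb):
    """Toy placement rule over a hybrid infrastructure."""
    if freq_per_min < 1 and input_mb < 5:
        return "serverless"       # rare, small requests: pay-per-use wins
    if freq_per_min < 50:
        return "container"        # moderate load: fast scaling matters
    return "virtual_machine"      # hot path: dedicated resources

for req in [(0.2, 1.0), (10, 50), (500, 2)]:
    print(req, "->", place_request(*req))
```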
Authors: Alex Sanchez-Stern, Abhishek Varghese, Zhanna Kaufman, Dylan Zhang, Talia Ringer, Yuriy Brun
Abstract: Formal verification is a promising method for producing reliable software, but the difficulty of manually writing verification proofs severely limits its utility in practice. Recent methods have automated some proof synthesis by guiding a search through the proof space using a theorem prover. Unfortunately, the theorem prover provides only the crudest estimate of progress, resulting in effectively undirected search. To address this problem, we create QEDCartographer, an automated proof-synthesis tool that combines supervised and reinforcement learning to more effectively explore the proof space. QEDCartographer incorporates the proofs' branching structure, enabling reward-free search and overcoming the sparse reward problem inherent to formal verification. We evaluate QEDCartographer using the CoqGym benchmark of 68.5K theorems from 124 open-source Coq projects. QEDCartographer fully automatically proves 21.4% of the test-set theorems. The previous search-based proof-synthesis tools Tok, Tac, ASTactic, Passport, and Proverbot9001, which rely only on supervised learning, prove 9.6%, 9.8%, 10.9%, 12.5%, and 19.8%, respectively. Diva, which combines 62 tools, proves 19.2%. Compared to the most effective prior tool, Proverbot9001, QEDCartographer produces proofs that are 34% shorter and finds them 29% faster, on average over the theorems both tools prove. Together, QEDCartographer and the non-learning-based CoqHammer prove 30.3% of the theorems, while CoqHammer alone proves 26.6%. Our work demonstrates that reinforcement learning is a fruitful research direction for improving proof-synthesis tools' search mechanisms.
Authors: Kejin Wu, Dimitris N. Politis
Abstract: In this paper, we provide a novel Model-free approach based on Deep Neural Networks (DNNs) to accomplish point prediction and prediction intervals under a general regression setting. Usually, people rely on parametric or non-parametric models to bridge dependent and independent variables (Y and X). However, this classical approach relies heavily on correct model specification; even for non-parametric approaches, some additive form is often assumed. A newly proposed Model-free prediction principle sheds light on a prediction procedure without any model assumption. Previous work on this principle has shown better performance than other standard alternatives. Recently, DNNs have received increasing attention due to their great performance in practice. Guided by the Model-free prediction idea, we apply a fully connected feedforward DNN to map X and an appropriate reference random variable Z to Y. The DNN is trained by minimizing a specially designed loss function so that the randomness of Y conditional on X is outsourced to Z through the trained DNN. Our method is more stable and accurate compared to other DNN-based counterparts, especially for optimal point predictions. With a specific prediction procedure, our prediction intervals can capture the estimation variability, yielding a better coverage rate in finite-sample cases. The superior performance of our method is verified by simulation and empirical studies.
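The mapping idea is easy to prototype: feed (X, Z) with Z ~ Uniform(0, 1) through a small network and train with a sample-based proper scoring rule, then read prediction intervals off empirical quantiles over many Z draws. The energy-score loss below is a stand-in for the paper's specially designed loss, and the data are synthetic.

```python
import torch
import torch.nn as nn

net = nn.Sequential(nn.Linear(2, 64), nn.ReLU(), nn.Linear(64, 64),
                    nn.ReLU(), nn.Linear(64, 1))
opt = torch.optim.Adam(net.parameters(), lr=1e-3)

def energy_score_loss(x, y):
    """Proper scoring rule built from two independent reference draws Z."""
    z1, z2 = torch.rand(len(x), 1), torch.rand(len(x), 1)
    y1 = net(torch.cat([x, z1], 1)); y2 = net(torch.cat([x, z2], 1))
    return ((y1 - y).abs() - 0.5 * (y1 - y2).abs()).mean()

# Toy heteroscedastic data: Y = X + (0.5 + X^2) * noise.
x = torch.rand(2048, 1) * 2 - 1
y = x + (0.5 + x ** 2) * torch.randn_like(x)
for _ in range(2000):
    opt.zero_grad(); energy_score_loss(x, y).backward(); opt.step()

# Prediction interval at x = 0.8: sample Z, take empirical quantiles.
with torch.no_grad():
    z = torch.rand(1000, 1)
    samples = net(torch.cat([torch.full((1000, 1), 0.8), z], 1))
    print(torch.quantile(samples, torch.tensor([0.025, 0.5, 0.975])))
```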
Authors: Heath Arthur-Loui, Amina Mollaysa, Michael Krauthammer
Abstract: De novo molecule design has become a highly active research area, advanced significantly through the use of state-of-the-art generative models. Despite these advances, several fundamental questions remain unanswered as the field increasingly focuses on more complex generative models and sophisticated molecular representations as an answer to the challenges of drug design. In this paper, we return to the simplest representation of molecules, and investigate overlooked limitations of classical generative approaches, particularly Variational Autoencoders (VAEs) and auto-regressive models. We propose a hybrid model in the form of a novel regularizer that leverages the strengths of both to improve validity, conditional generation, and style transfer of molecular sequences. Additionally, we provide an in-depth discussion of overlooked assumptions about these models' behaviour.
Authors: Mirgita Frasheri, Arian Bakhtiarnia, Lukas Esterle, Alexandros Iosifidis
Abstract: Countless terms of service (ToS) are signed every day by users all over the world while interacting with all kinds of apps and websites. More often than not, these online contracts, spanning double-digit pages, are signed blindly by users who simply want immediate access to the desired service. What would normally require a consultation with a legal team has now become a mundane activity consisting of a few clicks, where users potentially sign away their rights, for instance in terms of their data privacy, to countless online entities and companies. Large language models (LLMs) are good at parsing long text-based documents and could potentially be adopted to help users deal with dubious clauses in ToS and their underlying privacy policies. To investigate the utility of existing models for this task, we first build a dataset consisting of 12 questions applied individually to a set of privacy policies crawled from popular websites. Thereafter, a series of open-source and commercial chatbots, such as ChatGPT, are queried over each question, with the answers compared to a given ground truth. Our results show that some open-source models are able to provide higher accuracy than some commercial models. However, the best performance is recorded from a commercial chatbot (ChatGPT4). Overall, all models perform only slightly better than random at this task. Consequently, their performance needs to be significantly improved before they can be adopted at large for this purpose.
Authors: Wanli Hong, Shuyang Ling
Abstract: Neural collapse (NC) is a phenomenon that emerges at the terminal phase of training (TPT) of deep neural networks (DNNs). The features of the data in the same class collapse to their respective sample means, and the sample means exhibit a simplex equiangular tight frame (ETF). In the past few years, there has been a surge of works that focus on explaining why NC occurs and how it affects generalization. Since DNNs are notoriously difficult to analyze, most works mainly focus on the unconstrained feature model (UFM). While the UFM explains NC to some extent, it fails to provide a complete picture of how the network architecture and the dataset affect NC. In this work, we focus on shallow ReLU neural networks and try to understand how the width, depth, data dimension, and statistical properties of the training dataset influence neural collapse. We provide a complete characterization of when NC occurs for two- or three-layer neural networks. For two-layer ReLU neural networks, a sufficient condition for the global minimizer of the regularized empirical risk function to exhibit the NC configuration depends on the data dimension, sample size, and signal-to-noise ratio in the data, rather than the network width. For three-layer neural networks, we show that NC occurs as long as the first layer is sufficiently wide. Regarding the connection between NC and generalization, we show that generalization heavily depends on the SNR (signal-to-noise ratio) in the data: even if NC occurs, generalization can still be bad provided the SNR in the data is too low. Our results significantly extend the state-of-the-art theoretical analysis of NC under the UFM by characterizing the emergence of NC under shallow nonlinear networks and showing how it depends on data properties and network architecture.
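The two standard NC diagnostics are easy to compute from features: within-class variability relative to between-class scatter (NC1), and whether the centered class means approach the simplex-ETF geometry with pairwise cosines of -1/(K-1) (NC2). The sketch below, run on idealized collapsed features, is a generic diagnostic rather than anything specific to the paper's analysis.

```python
import numpy as np

def neural_collapse_metrics(features, labels):
    """NC1: tr(Sigma_W Sigma_B^+) / K; NC2: mean cosine between class means."""
    classes = np.unique(labels)
    K = len(classes)
    means = np.stack([features[labels == c].mean(0) for c in classes])
    Sw = sum(np.cov(features[labels == c].T, bias=True) for c in classes) / K
    M = means - features.mean(0)
    Sb = M.T @ M / K
    nc1 = np.trace(Sw @ np.linalg.pinv(Sb)) / K
    Mn = M / np.linalg.norm(M, axis=1, keepdims=True)
    cos = Mn @ Mn.T
    return nc1, cos[~np.eye(K, dtype=bool)].mean()  # ETF predicts -1/(K-1)

# Idealized collapsed features: 3 classes at simplex-ETF vertices plus noise.
rng = np.random.default_rng(0)
verts = np.eye(3) - 1 / 3                    # centered simplex in R^3
labels = np.repeat(np.arange(3), 100)
feats = verts[labels] + 1e-3 * rng.standard_normal((300, 3))
print(neural_collapse_metrics(feats, labels))  # NC1 ~ 0, mean cosine ~ -0.5
```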