DiffSpectralNet : Unveiling the Potential of Diffusion Models for Hyperspectral Image Classification. (arXiv:2312.12441v1 [cs.CV])

Authors: Neetu Sigger, Tuan Thanh Nguyen, Gianluca Tozzi, Quoc-Tuan Vien, Sinh Van Nguyen

Hyperspectral images (HSI) have become popular for analysing remotely sensed images in multiple domains, such as agriculture and medicine. However, existing models struggle with complex relationships and characteristics of spectral-spatial data due to the multi-band nature and data redundancy of hyperspectral data. To address this limitation, we propose a new network called DiffSpectralNet, which combines diffusion and transformer techniques. Our approach involves a two-step process. First, we use an unsupervised learning framework based on the diffusion model to extract both high-level and low-level spectral-spatial features. The diffusion method is capable of extracting diverse and meaningful spectral-spatial features, leading to improvement in HSI classification. Then, we employ a pretrained denoising U-Net to extract intermediate hierarchical features for classification. Finally, we use a supervised transformer-based classifier to perform the HSI classification. Through comprehensive experiments on HSI datasets, we evaluate the classification performance of DiffSpectralNet. The results demonstrate that our framework significantly outperforms existing approaches, achieving state-of-the-art performance.

Enhancing Understanding of Driving Attributes through Quantitative Assessment of Driver Cognition. (arXiv:2312.12443v1 [eess.SP])

Authors: Pallabjyoti Kakoti, Mukesh Kumar Kamti, Rauf Iqbal, Eeshankur Saikia

This paper presents a novel approach for analysing EEG data from drivers in a simulated driving test. We focused on the Hurst exponent, Shannon entropy, and fractal dimension as markers of the nonlinear dynamics of the brain. The results show significant trends: Shannon entropy and fractal dimension exhibit variations during driving condition transitions, whereas the Hurst exponent reflects memory retention, portraying learning patterns. These findings suggest that the tools of Non-linear Dynamical (NLD) Theory can serve as indicators of cognitive state and driving memory changes, both for assessing driver performance and for advancing the understanding of non-linear dynamics of human cognition in the context of driving and beyond. Our study reveals the potential of NLD tools to elucidate brain state and system variances, enabling their integration into current Deep Learning and Machine Learning models. This integration can extend beyond driving applications and be harnessed for cognitive learning, thereby improving overall productivity and accuracy levels.
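
As a rough illustration of one of the markers named above, the following sketch estimates the Shannon entropy of a sampled signal from its amplitude histogram. The abstract does not say which estimator the authors use; the histogram approach, the function name `shannon_entropy`, and the `n_bins` parameter are assumptions made for this example only.

```python
import math
from collections import Counter


def shannon_entropy(signal, n_bins=8):
    """Histogram-based Shannon entropy (in bits) of a sequence of samples.

    Assumption: amplitudes are binned into `n_bins` equal-width bins;
    this is one common estimator, not necessarily the paper's.
    """
    lo, hi = min(signal), max(signal)
    if hi == lo:
        return 0.0  # a constant signal carries no information
    width = (hi - lo) / n_bins
    # Assign each sample to a bin; clamp the maximum into the last bin.
    bins = [min(int((x - lo) / width), n_bins - 1) for x in signal]
    counts = Counter(bins)
    n = len(signal)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())
```

A signal that spreads evenly over all bins reaches the maximum of log2(n_bins) bits, while a constant signal gives zero, so a change in this value between driving conditions would indicate a change in how the amplitudes are distributed.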

Can It Edit? Evaluating the Ability of Large Language Models to Follow Code Editing Instructions. (arXiv:2312.12450v1 [cs.SE])

Authors: Federico Cassano, Luisa Li, Akul Sethi, Noah Shinn, Abby Brennan-Jones, Anton Lozhkov, Carolyn Anderson, Arjun Guha

A significant amount of research is focused on developing and evaluating large language models for a variety of code synthesis tasks. These include synthesizing code from natural language instructions, synthesizing tests from code, and synthesizing explanations of code. In contrast, the behavior of instructional code editing with LLMs is understudied. These are tasks in which the model is instructed to update a block of code provided in a prompt. The editing instruction may ask for a feature to be added or removed, describe a bug and ask for a fix, ask for a different kind of solution, or many other common code editing tasks.

We introduce a carefully crafted benchmark of code editing tasks and use it to evaluate several cutting-edge LLMs. Our evaluation exposes a significant gap between the capabilities of state-of-the-art open and closed models. For example, even GPT-3.5-Turbo is 8.8% better than the best open model at editing code.

We also introduce a new, carefully curated, permissively licensed training set of code edits coupled with natural language instructions. Using this training set, we show that we can fine-tune open Code LLMs to significantly improve their code editing capabilities.

Overdrawing Urns using Categories of Signed Probabilities. (arXiv:2312.12453v1 [math.PR])

Authors: Bart Jacobs (iHub, Radboud University Nijmegen), Dario Stein (iHub, Radboud University Nijmegen)

A basic experiment in probability theory is drawing without replacement from an urn filled with multiple balls of different colours. Clearly, it is physically impossible to overdraw, that is, to draw more balls from the urn than it contains. This paper demonstrates that overdrawing does make sense mathematically, once we allow signed distributions with negative probabilities. A new (conservative) extension of the familiar hypergeometric ('draw-and-delete') distribution is introduced that allows draws of arbitrary sizes, including overdraws. The underlying theory makes use of the dual basis functions of the Bernstein polynomials, which play a prominent role in computer graphics. Negative probabilities are treated systematically in the framework of categorical probability, and the central role of data structures such as multisets and monads is emphasised.
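
For orientation, the classical (non-overdrawn) hypergeometric distribution that the paper extends can be sketched as follows. This covers only the familiar case; the signed-probability extension to overdraws is not reproduced here, and the function name `draw_delete_prob` is chosen for this example.

```python
from math import comb


def draw_delete_prob(urn, draw):
    """Probability of drawing the multiset `draw` from `urn` without replacement.

    Both arguments map colour -> count. Only the classical case is handled:
    an overdraw raises an error instead of returning a signed probability.
    """
    total = sum(urn.values())
    k = sum(draw.values())
    if k > total:
        raise ValueError("overdraw: undefined for the classical distribution")
    ways = 1
    for colour, m in draw.items():
        # comb returns 0 when more balls of a colour are requested than exist.
        ways *= comb(urn.get(colour, 0), m)
    return ways / comb(total, k)
```

For an urn with three red and two blue balls, drawing one of each has probability 3 * 2 / C(5, 2) = 0.6, and the probabilities of all draws of size two sum to one.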

FengWu-4DVar: Coupling the Data-driven Weather Forecasting Model with 4D Variational Assimilation. (arXiv:2312.12455v1 [physics.ao-ph])

Authors: Yi Xiao, Lei Bai, Wei Xue, Kang Chen, Tao Han, Wanli Ouyang

Weather forecasting is a crucial yet highly challenging task. With the maturity of Artificial Intelligence (AI), the emergence of data-driven weather forecasting models has opened up a new paradigm for the development of weather forecasting systems. Despite the significant successes that have been achieved (e.g., surpassing advanced traditional physical models for global medium-range forecasting), existing data-driven weather forecasting models still rely on the analysis fields generated by the traditional assimilation and forecasting system, which hampers the significance of data-driven weather forecasting models regarding both computational cost and forecasting accuracy. In this work, we explore the possibility of coupling the data-driven weather forecasting model with data assimilation by integrating the global AI weather forecasting model, FengWu, with one of the most popular assimilation algorithms, Four-Dimensional Variational (4DVar) assimilation, and develop an AI-based cyclic weather forecasting system, FengWu-4DVar. FengWu-4DVar can incorporate observational data into the data-driven weather forecasting model and consider the temporal evolution of atmospheric dynamics to obtain accurate analysis fields for making predictions in a cycling manner without the help of physical models. Owing to the auto-differentiation ability of deep learning models, FengWu-4DVar eliminates the need to develop the cumbersome adjoint model, which is usually required in the traditional implementation of the 4DVar algorithm. Experiments on the simulated observational dataset demonstrate that FengWu-4DVar is capable of generating reasonable analysis fields for making accurate and efficient iterative predictions.
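
For readers unfamiliar with 4DVar, the textbook strong-constraint cost function is given below. This is the standard form, not necessarily the exact formulation used in the paper. The symbols are the conventional ones and are assumptions here: x_b is the background state, B and R_i are the background and observation error covariances, H_i is the observation operator, y_i are the observations, and M is the forecast model propagating the initial state x_0 to time i.

```latex
\[
J(\mathbf{x}_0) =
  \tfrac{1}{2}\,(\mathbf{x}_0 - \mathbf{x}_b)^{\top}\mathbf{B}^{-1}(\mathbf{x}_0 - \mathbf{x}_b)
  + \tfrac{1}{2}\sum_{i=0}^{N}
    \bigl(\mathcal{H}_i(\mathbf{x}_i) - \mathbf{y}_i\bigr)^{\top}
    \mathbf{R}_i^{-1}
    \bigl(\mathcal{H}_i(\mathbf{x}_i) - \mathbf{y}_i\bigr),
\qquad
\mathbf{x}_i = \mathcal{M}_{0 \to i}(\mathbf{x}_0).
\]
```

Minimising J requires its gradient with respect to x_0, which passes through the forecast model M. In a traditional system this is what the hand-written adjoint model supplies; when M is a neural network, automatic differentiation can provide the same gradient.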

PowerInfer: Fast Large Language Model Serving with a Consumer-grade GPU. (arXiv:2312.12456v1 [cs.LG])

Authors: Yixin Song, Zeyu Mi, Haotong Xie, Haibo Chen

This paper introduces PowerInfer, a high-speed Large Language Model (LLM) inference engine on a personal computer (PC) equipped with a single consumer-grade GPU. The key insight underlying the design of PowerInfer is the high locality inherent in LLM inference, characterized by a power-law distribution in neuron activation. This distribution indicates that a small subset of neurons, termed hot neurons, are consistently activated across inputs, while the majority, cold neurons, vary based on specific inputs. PowerInfer exploits this insight to design a GPU-CPU hybrid inference engine: hot-activated neurons are preloaded onto the GPU for fast access, while cold-activated neurons are computed on the CPU, thus significantly reducing GPU memory demands and CPU-GPU data transfers. PowerInfer further integrates adaptive predictors and neuron-aware sparse operators, optimizing the efficiency of neuron activation and computational sparsity. Evaluation shows that PowerInfer attains an average token generation rate of 13.20 tokens/s, with a peak of 29.08 tokens/s, across various LLMs (including OPT-175B) on a single NVIDIA RTX 4090 GPU, only 18% lower than that achieved by a top-tier server-grade A100 GPU. This significantly outperforms llama.cpp by up to 11.69x while retaining model accuracy.
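
The hot/cold split described above can be illustrated with a minimal sketch. The abstract does not describe the placement policy PowerInfer actually uses; the greedy top-k rule, the function names, and the `gpu_budget` parameter below are assumptions for illustration only.

```python
def split_hot_cold(activation_counts, gpu_budget):
    """Partition neuron indices into (hot, cold) lists.

    Assumption: the `gpu_budget` most frequently activated neurons are
    treated as hot (GPU-resident); all others are cold (CPU-side).
    """
    order = sorted(range(len(activation_counts)),
                   key=lambda i: activation_counts[i], reverse=True)
    hot = sorted(order[:gpu_budget])
    cold = sorted(order[gpu_budget:])
    return hot, cold


def coverage(activation_counts, hot):
    """Fraction of all recorded activations that fall on the hot neurons."""
    total = sum(activation_counts)
    return sum(activation_counts[i] for i in hot) / total
```

With a skewed profile such as `[900, 5, 50, 3, 40, 2]` and a budget of two neurons, the hot set `[0, 2]` covers 95% of all activations, which is the kind of locality a power-law distribution produces.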

Let AI Entertain You: Increasing User Engagement with Generative AI and Rejection Sampling. (arXiv:2312.12457v1 [cs.HC])

Authors: Jingying Zeng, Jaewon Yang, Waleed Malik, Xiao Yan, Richard Huang, Qi He

While generative AI excels in content generation, it does not always increase user engagement. This can be attributed to two main factors. First, generative AI generates content without incorporating explicit or implicit feedback about user interactions. Even if the generated content seems to be more informative or well-written, it does not necessarily lead to an increase in user activities, such as clicks. Second, there is a concern with the quality of the content generative AI produces, which often lacks the distinctiveness and authenticity that human-created content possesses. These two factors can lead to content that fails to meet specific needs and preferences of users, ultimately reducing its potential to be engaging.

This paper presents a generic framework for improving user engagement with generative AI by leveraging user feedback. Our solutions employ rejection sampling, a technique used in reinforcement learning, to boost engagement metrics. We applied the framework to email notification subject line generation for an online social network, and achieved significant engagement metric lift, including +1% Session and +0.4% Weekly Active Users. We believe our work offers a universal framework that enhances user engagement with generative AI, particularly when standard generative AI reaches its limits in terms of enhancing content to be more captivating. To the best of our knowledge, this represents an early milestone in the industry's successful use of generative AI to enhance user engagement.
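
One common reading of rejection sampling in this setting is to generate several candidates, score each with a model of user feedback, and keep only the best one that clears a threshold. The sketch below follows that reading; it is an assumption about the general technique, not the authors' pipeline, and `generate`, `score`, `n_candidates`, and `threshold` are hypothetical names.

```python
import random


def rejection_sample(generate, score, n_candidates=8, threshold=None, rng=None):
    """Return the highest-scoring candidate, or None if all are rejected.

    generate: callable taking a random.Random and returning a candidate
              (stands in for a generative model).
    score:    callable returning a number for a candidate
              (stands in for a predicted engagement signal).
    """
    rng = rng or random.Random(0)
    candidates = [generate(rng) for _ in range(n_candidates)]
    scored = [(score(c), c) for c in candidates]
    if threshold is not None:
        # Reject every candidate whose score falls below the threshold.
        scored = [sc for sc in scored if sc[0] >= threshold]
        if not scored:
            return None
    return max(scored, key=lambda sc: sc[0])[1]
```

Returning `None` when nothing passes lets the caller fall back to an existing, non-generated subject line rather than ship a weak candidate.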

Prediction of Crash Injury Severity in Florida's Interstate-95. (arXiv:2312.12459v1 [cs.LG])

Authors: B M Tazbiul Hassan Anik, Md Mobasshir Rashid, Md Jamil Ahsan

Drivers can sustain serious injuries in traffic accidents. In this study, traffic crashes on Florida's Interstate-95 from 2016 to 2021 were gathered, and several classification methods were used to estimate the severity of driver injuries. Logistic regression was applied for feature selection. To compare model performance, several assessment metrics, such as accuracy, recall, and area under the curve (AUC), were used. The AdaBoost algorithm outperformed the others in terms of recall and AUC. SHAP values were also generated to explain the classification model's results. This analytical study can be used to examine factors that contribute to the severity of driver injuries in crashes.

Democratize with Care: The need for fairness specific features in user-interface based open source AutoML tools. (arXiv:2312.12460v1 [cs.HC])

Authors: Sundaraparipurnan Narayanan

AI is increasingly playing a pivotal role in businesses and organizations, impacting the outcomes and interests of human users. Automated Machine Learning (AutoML) streamlines the machine learning model development process by automating repetitive tasks and making data-driven decisions, enabling even non-experts to construct high-quality models efficiently. This democratization allows more users (including non-experts) to access and utilize state-of-the-art machine-learning expertise. However, AutoML tools may also propagate bias through the way they handle data, model choices, and the optimization approaches adopted. We conducted an experimental study of user-interface-based open source AutoML tools (DataRobot, H2O Studio, Dataiku, and Rapidminer Studio) to examine whether they had features to assist users in developing fairness-aware machine learning models. The experiments covered the following considerations for the evaluation of features: understanding use case context, data representation, feature relevance and sensitivity, data bias and preprocessing techniques, data handling capabilities, training-testing split, hyperparameter handling and constraints, fairness-oriented model development, explainability, and the ability of the user to download and edit models. The results revealed inadequacies in features that could support fairness-aware model development. Further, the results highlight the need to establish certain essential features for promoting fairness in AutoML tools.

Bird Movement Prediction Using Long Short-Term Memory Networks to Prevent Bird Strikes with Low Altitude Aircraft. (arXiv:2312.12461v1 [cs.LG])

Authors: Elaheh Sabziyan Varnousfaderani, Syed A. M. Shihab

The number of collisions between aircraft and birds in the airspace has been increasing at an alarming rate over the past decade due to increasing bird population, air traffic and usage of quieter aircraft. Bird strikes with aircraft are anticipated to increase dramatically when emerging Advanced Air Mobility aircraft start operating in the low altitude airspace where the probability of bird strikes is the highest. Not only can such bird strikes result in human and bird fatalities, but they also cost the aviation industry millions of dollars in damages to aircraft annually. To better understand the causes and effects of bird strikes, research to date has mainly focused on analyzing factors which increase the probability of bird strikes, identifying high risk birds in different locations, predicting the future number of bird strike incidents, and estimating the cost of bird strike damages. However, research on bird movement prediction for use in flight planning algorithms to minimize the probability of bird strikes is very limited. To address this gap in research, we implement four different types of Long Short-Term Memory (LSTM) models to predict bird movement latitudes and longitudes. A publicly available data set on the movement of pigeons is utilized to train the models and evaluate their performance. Using the bird flight track predictions, aircraft departures from Cleveland Hopkins airport are simulated to be delayed by varying amounts to avoid potential bird strikes with aircraft during takeoff. Results demonstrate that the LSTM models can predict bird movement with high accuracy, achieving a Mean Absolute Error of less than 100 meters, outperforming linear and nonlinear regression models. Our findings indicate that incorporating bird movement prediction into flight planning can be highly beneficial.

Towards an End-to-End Artificial Intelligence Driven Global Weather Forecasting System. (arXiv:2312.12462v1 [physics.ao-ph])

Authors: Kun Chen, Lei Bai, Fenghua Ling, Peng Ye, Tao Chen, Kang Chen, Tao Han, Wanli Ouyang

The weather forecasting system is important for science and society, and significant achievements have been made in applying artificial intelligence (AI) to medium-range weather forecasting. However, existing AI-based weather forecasting models still rely on analysis or reanalysis products from the traditional numerical weather prediction (NWP) systems as initial conditions for making predictions, preventing them from being fully independent systems. As a crucial component of an end-to-end global weather forecasting system, data assimilation is vital in generating initial states for forecasting. In this paper, we present an AI-based data assimilation model, i.e., Adas, for global weather variables, which learns to generate the analysis from the background and sparse observations. Different from existing assimilation methods, Adas employs the gated convolution module to handle sparse observations and the gated cross-attention module for capturing the interactions between observations and background efficiently, which are guided by the confidence matrix to represent the availability and quality of observations. Then, we combine Adas with the advanced AI-based weather forecasting model (i.e., FengWu) and construct the first end-to-end AI-based global weather forecasting system: FengWu-Adas. Experiments demonstrate that Adas can assimilate the simulated global observations with the AI-generated background through a one-year simulation and generate high-quality analysis stably in a cyclic manner. Based on the generated analysis, FengWu-Adas exhibits skillful performance and outperforms the Integrated Forecasting System (IFS) in weather forecasting over seven days.

Towards Better Serialization of Tabular Data for Few-shot Classification. (arXiv:2312.12464v1 [cs.LG])

Authors: Sukriti Jaitly, Tanay Shah, Ashish Shugani, Razik Singh Grewal

We present a study on the integration of Large Language Models (LLMs) in tabular data classification, emphasizing an efficient framework. Building upon existing work done in TabLLM (arXiv:2210.10723), we introduce three novel serialization techniques, including the standout LaTeX serialization method. This method significantly boosts the performance of LLMs in processing domain-specific datasets. Our method stands out for its memory efficiency and ability to fully utilize complex data structures. Through extensive experimentation, including various serialization approaches like feature combination and importance, we demonstrate our work's superiority in accuracy and efficiency over traditional models.
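
To make the idea of LaTeX serialization concrete, here is a hypothetical sketch that turns one table row into a small LaTeX `tabular` snippet suitable for inclusion in an LLM prompt. The abstract does not give the exact template the authors use, so the layout below and the name `row_to_latex` are assumptions.

```python
def row_to_latex(row):
    """Serialize one record (column name -> value) as a LaTeX tabular.

    Assumption: a header line of column names followed by one line of
    values; the paper's actual template may differ.
    """
    cols = list(row)
    spec = "|" + "|".join("l" for _ in cols) + "|"
    header = " & ".join(cols)
    values = " & ".join(str(row[c]) for c in cols)
    return "\n".join([
        "\\begin{tabular}{" + spec + "}",
        "\\hline",
        header + " \\\\",
        "\\hline",
        values + " \\\\",
        "\\hline",
        "\\end{tabular}",
    ])
```

The resulting string keeps column names and values aligned in a format that LLMs have likely seen often during pretraining, which is one plausible reason such a serialization could help.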

Learning Flexible Body Collision Dynamics with Hierarchical Contact Mesh Transformer. (arXiv:2312.12467v1 [cs.LG])

Authors: Youn-Yeol Yu, Jeongwhan Choi, Woojin Cho, Kookjin Lee, Nayong Kim, Kiseok Chang, ChangSeung Woo, Ilho Kim, SeokWoo Lee, Joon Young Yang, Sooyoung Yoon, Noseong Park

Recently, many mesh-based graph neural network (GNN) models have been proposed for modeling complex high-dimensional physical systems. Remarkable achievements have been made in significantly reducing the solving time compared to traditional numerical solvers. These methods are typically designed to i) reduce the computational cost in solving physical dynamics and/or ii) propose techniques to enhance the solution accuracy in fluid and rigid body dynamics. However, it remains under-explored whether they are effective in addressing the challenges of flexible body dynamics, where instantaneous collisions occur within a very short timeframe. In this paper, we present Hierarchical Contact Mesh Transformer (HCMT), which uses hierarchical mesh structures and can learn long-range dependencies (caused by collisions) among spatially distant positions of a body -- two close positions in a higher-level mesh correspond to two distant positions in a lower-level mesh. HCMT enables long-range interactions, and the hierarchical mesh structure quickly propagates collision effects to faraway positions. To this end, it consists of a contact mesh Transformer and a hierarchical mesh Transformer (CMT and HMT, respectively). Lastly, we propose a flexible body dynamics dataset, consisting of trajectories that reflect experimental settings frequently used in the display industry for product designs. We also compare the performance of several baselines using well-known benchmark datasets. Our results show that HCMT provides significant performance improvements over existing methods.

Distilling Autoregressive Models to Obtain High-Performance Non-Autoregressive Solvers for Vehicle Routing Problems with Faster Inference Speed. (arXiv:2312.12469v1 [cs.LG])

Authors: Yubin Xiao, Di Wang, Boyang Li, Mingzhao Wang, Xuan Wu, Changliang Zhou, You Zhou

Neural construction models have shown promising performance for Vehicle Routing Problems (VRPs) by adopting either the Autoregressive (AR) or Non-Autoregressive (NAR) learning approach. While AR models produce high-quality solutions, they generally have a high inference latency due to their sequential generation nature. Conversely, NAR models generate solutions in parallel with a low inference latency but generally exhibit inferior performance. In this paper, we propose a generic Guided Non-Autoregressive Knowledge Distillation (GNARKD) method to obtain high-performance NAR models having a low inference latency. GNARKD removes the constraint of sequential generation in AR models while preserving the learned pivotal components in the network architecture to obtain the corresponding NAR models through knowledge distillation. We evaluate GNARKD by applying it to three widely adopted AR models to obtain NAR VRP solvers for both synthesized and real-world instances. The experimental results demonstrate that GNARKD significantly reduces the inference time (4-5 times faster) with an acceptable performance drop (2-3%). To the best of our knowledge, this study is the first of its kind to obtain NAR VRP solvers from AR ones through knowledge distillation.

A Performance Evaluation of a Quantized Large Language Model on Various Smartphones. (arXiv:2312.12472v1 [cs.LG])

Authors: Tolga Çöplü, Marc Loedi, Arto Bendiken, Mykhailo Makohin, Joshua J. Bouw, Stephen Cobb (Haltia, Inc.)

This paper explores the feasibility and performance of on-device large language model (LLM) inference on various Apple iPhone models. Amidst the rapid evolution of generative AI, on-device LLMs offer solutions to privacy, security, and connectivity challenges inherent in cloud-based models. Leveraging existing literature on running multi-billion parameter LLMs on resource-limited devices, our study examines the thermal effects and interaction speeds of a high-performing LLM across different smartphone generations. We present real-world performance results, providing insights into on-device inference capabilities.

Principled Weight Initialisation for Input-Convex Neural Networks. (arXiv:2312.12474v1 [cs.LG])

Authors: Pieter-Jan Hoedt, Günter Klambauer

Input-Convex Neural Networks (ICNNs) are networks that guarantee convexity in their input-output mapping. These networks have been successfully applied for energy-based modelling, optimal transport problems and learning invariances. The convexity of ICNNs is achieved by using non-decreasing convex activation functions and non-negative weights. Because of these peculiarities, previous initialisation strategies, which implicitly assume centred weights, are not effective for ICNNs. By studying signal propagation through layers with non-negative weights, we are able to derive a principled weight initialisation for ICNNs. Concretely, we generalise signal propagation theory by removing the assumption that weights are sampled from a centred distribution. In a set of experiments, we demonstrate that our principled initialisation effectively accelerates learning in ICNNs and leads to better generalisation. Moreover, we find that, in contrast to common belief, ICNNs can be trained without skip-connections when initialised correctly. Finally, we apply ICNNs to a real-world drug discovery task and show that they allow for more effective molecular latent space exploration.
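
The convexity mechanism described above (non-decreasing convex activations combined with non-negative weights) can be checked numerically on a tiny network. The sketch below builds a two-layer ICNN in plain Python; it illustrates only the weight constraint and does not implement the initialisation proposed in the paper. All names are chosen for this example.

```python
import random


def relu(v):
    """ReLU: a convex, non-decreasing activation."""
    return [max(0.0, a) for a in v]


def matvec(W, x):
    return [sum(w * a for w, a in zip(row, x)) for row in W]


def make_icnn(n_in, n_hidden, rng):
    """Random two-layer ICNN parameters (illustrative sampling only)."""
    # The first layer acts directly on the input, so it may be unconstrained.
    W0 = [[rng.gauss(0, 1) for _ in range(n_in)] for _ in range(n_hidden)]
    b0 = [rng.gauss(0, 1) for _ in range(n_hidden)]
    # Weights applied to hidden activations must be non-negative.
    Wz = [abs(rng.gauss(0, 1)) for _ in range(n_hidden)]
    return W0, b0, Wz, 0.0


def icnn_forward(x, W0, b0, Wz, b1):
    """f(x) = Wz . relu(W0 x + b0) + b1, convex in x because Wz >= 0."""
    h = relu([a + b for a, b in zip(matvec(W0, x), b0)])
    return sum(w * a for w, a in zip(Wz, h)) + b1
```

Because every hidden unit is a convex function of x and the output is a non-negative combination of them, the midpoint inequality f((x + y) / 2) <= (f(x) + f(y)) / 2 holds for any pair of inputs. Taking `abs` of Gaussian samples gives weights with a non-zero mean, which is the situation in which, according to the abstract, initialisation schemes that assume centred weights stop being effective.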

Learning to Reweight for Graph Neural Network. (arXiv:2312.12475v1 [cs.LG])

Authors: Zhengyu Chen, Teng Xiao, Kun Kuang, Zheqi Lv, Min Zhang, Jinluan Yang, Chengqiang Lu, Hongxia Yang, Fei Wu

Graph Neural Networks (GNNs) show promising results for graph tasks. However, the generalization ability of existing GNNs degrades when there are distribution shifts between testing and training graph data. The main cause of this severe degradation is that GNNs are designed under the I.I.D. assumption. In such a setting, GNNs tend to exploit subtle statistical correlations in the training set for prediction, even though these correlations are spurious. In this paper, we study the problem of the generalization ability of GNNs in Out-Of-Distribution (OOD) settings. To solve this problem, we propose the Learning to Reweight for Generalizable Graph Neural Network (L2R-GNN) to enhance the generalization ability for achieving satisfactory performance on unseen testing graphs whose distributions differ from those of the training graphs. We propose a novel nonlinear graph decorrelation method, which can substantially improve the out-of-distribution generalization ability and compares favorably to previous methods in restraining the over-reduced sample size. The variables of the graph representation are clustered based on the stability of the correlation, and the graph decorrelation method learns weights to remove correlations between the variables of different clusters rather than any two variables. In addition, we introduce an effective stochastic algorithm based on bi-level optimization for the L2R-GNN framework, which facilitates simultaneously learning the optimal weights and GNN parameters, and avoids the overfitting problem. Experimental results show that L2R-GNN greatly outperforms baselines on various graph prediction benchmarks under distribution shifts.

DSAF: A Dual-Stage Adaptive Framework for Numerical Weather Prediction Downscaling. (arXiv:2312.12476v1 [physics.ao-ph])

Authors: Pengwei Liu, Wenwei Wang, Bingqing Peng, Binqing Wu, Liang Sun

While widely recognized as one of the most substantial weather forecasting methodologies, Numerical Weather Prediction (NWP) usually suffers from relatively coarse resolution and inevitable bias due to tempo-spatial discretization, physical parametrization process, and computation limitation. With the roaring growth of deep learning-based techniques, we propose the Dual-Stage Adaptive Framework (DSAF), a novel framework to address regional NWP downscaling and bias correction tasks. DSAF uniquely incorporates adaptive elements in its design to ensure a flexible response to evolving weather conditions. Specifically, NWP downscaling and correction are well-decoupled in the framework and can be applied independently, which strategically guides the optimization trajectory of the model. Utilizing a multi-task learning mechanism and an uncertainty-weighted loss function, DSAF facilitates balanced training across various weather factors. Additionally, our specifically designed attention-centric learnable module effectively integrates geographic information, proficiently managing complex interrelationships. Experimental validation on the ECMWF operational forecast (HRES) and reanalysis (ERA5) archive demonstrates DSAF's superior performance over existing state-of-the-art models and shows substantial improvements when existing models are augmented using our proposed modules. Code is publicly available at https://github.com/pengwei07/DSAF.

Survey on Trustworthy Graph Neural Networks: From A Causal Perspective. (arXiv:2312.12477v1 [cs.LG])

Authors: Wenzhao Jiang, Hao Liu, Hui Xiong

Graph Neural Networks (GNNs) have emerged as powerful representation learning tools for capturing complex dependencies within diverse graph-structured data. Despite their success in a wide range of graph mining tasks, GNNs have raised serious concerns regarding their trustworthiness, including susceptibility to distribution shift, biases towards certain populations, and lack of explainability. Recently, integrating causal learning techniques into GNNs has sparked numerous ground-breaking studies since most of the trustworthiness issues can be alleviated by capturing the underlying data causality rather than superficial correlations. In this survey, we provide a comprehensive review of recent research efforts on causality-inspired GNNs. Specifically, we first present the key trustworthiness risks of existing GNN models through the lens of causality. Moreover, we introduce a taxonomy of Causality-Inspired GNNs (CIGNNs) based on the type of causal learning capability they are equipped with, i.e., causal reasoning and causal representation learning. Besides, we systematically discuss typical methods within each category and demonstrate how they mitigate trustworthiness risks. Finally, we summarize useful resources and discuss several future directions, hoping to shed light on new research opportunities in this emerging field. The representative papers, along with open-source data and code, are available at https://github.com/usail-hkust/Causality-Inspired-GNNs.

New Horizons: Pioneering Pharmaceutical R&D with Generative AI from lab to the clinic -- an industry perspective. (arXiv:2312.12482v1 [q-bio.QM])

Authors: Guy Doron, Sam Genway, Mark Roberts, Sai Jasti

The rapid advance of generative AI is reshaping the strategic vision for R&D across industries. The unique challenges of pharmaceutical R&D will see applications of generative AI deliver value along the entire value chain from early discovery to regulatory approval. This perspective reviews these challenges and takes a three-horizon approach to explore the generative AI applications already delivering impact, the disruptive opportunities which are just around the corner, and the longer-term transformation which will shape the future of the industry. Selected applications are reviewed for their potential to increase productivity, accelerate timelines, improve the quality of research, data and decision making, and support a sustainable future for the industry. Recommendations are given for Pharma R&D leaders developing a generative AI strategy today which will lay the groundwork for getting real value from the technology and safeguarding future growth. Generative AI is today providing new, efficient routes to accessing and combining organisational data to drive productivity. Next, this impact will reach clinical development, enhancing the patient experience, driving operational efficiency, and unlocking digital innovation to better tackle the future burden of disease. Looking to the furthest horizon, rapid acquisition of rich multi-omics data, which capture the 'language of life', in combination with next generation AI technologies will allow organisations to close the loop around phases of the pipeline through rapid, automated generation and testing of hypotheses from bench to bedside. This provides a vision for the future of R&D with sustainability at the core, with reduced timescales and reduced dependency on resources, while offering new hope to patients to treat the untreatable and ultimately cure diseases.

SCoTTi: Save Computation at Training Time with an adaptive framework. (arXiv:2312.12483v1 [cs.LG])

Authors: Ziyu Lin, Enzo Tartaglione, Van-Tam Nguyen

On-device training is an emerging approach in machine learning where models are trained on edge devices, aiming to enhance privacy protection and real-time performance. However, edge devices typically possess restricted computational power and resources, making it challenging to perform computationally intensive model training tasks. Consequently, reducing resource consumption during training has become a pressing concern in this field. To this end, we propose SCoTTi (Save Computation at Training Time), an adaptive framework that addresses the aforementioned challenge. It leverages an optimizable threshold parameter to effectively reduce the number of neuron updates during training which corresponds to a decrease in memory and computation footprint. Our proposed approach demonstrates superior performance compared to the state-of-the-art methods regarding computational resource savings on various commonly employed benchmarks and popular architectures, including ResNets, MobileNet, and Swin-T.

SkyMask: Attack-agnostic Robust Federated Learning with Fine-grained Learnable Masks. (arXiv:2312.12484v1 [cs.CR])

Authors: Peishen Yan, Hao Wang, Tao Song, Yang Hua, Ruhui Ma, Ningxin Hu, Mohammad R. Haghighat, Haibing Guan

Federated Learning (FL) is becoming a popular paradigm for leveraging distributed data and preserving data privacy. However, due to their distributed nature, FL systems are vulnerable to Byzantine attacks, in which compromised clients attack the global model by uploading malicious model updates. Most existing Byzantine-robust FL systems statistically analyze the weights of whole individual model updates uploaded by clients to defend against Byzantine attacks. With the development of layer-level and parameter-level fine-grained attacks, the attacks' stealthiness and effectiveness have been significantly improved. Due to unawareness or overreaction, the existing model-level defense methods degrade the training efficiency and model performance. To address this problem, we propose SkyMask, a new attack-agnostic robust FL system that leverages fine-grained learnable masks to identify malicious model updates at the parameter level. Specifically, the FL server applies parameter-level masks to model updates uploaded by clients and trains the masks over a small clean dataset (i.e., root dataset) to learn the subtle difference between benign and malicious model updates in a high-dimensional space. Our extensive experiments involve different models on three public datasets under state-of-the-art (SOTA) attacks, where the results show that SkyMask achieves up to 10% higher testing accuracy compared with SOTA defense strategies and successfully defends against attacks with a fraction of malicious clients as high as 80%. In the meantime, the experimental results demonstrate the scalability of our approach and the weak dependence on the data distribution of the root dataset.

Learning Deterministic Surrogates for Robust Convex QCQPs. (arXiv:2312.12485v1 [math.OC])

Authors: Egon Peršak, Miguel F. Anjos

Decision-focused learning is a promising development for contextual optimisation. It enables us to train prediction models that reflect the contextual sensitivity structure of the problem. However, there have been limited attempts to extend this paradigm to robust optimisation. We propose a double implicit layer model for training prediction models with respect to robust decision loss in uncertain convex quadratically constrained quadratic programs (QCQP). The first layer solves a deterministic version of the problem; the second layer evaluates the worst-case realisation for an uncertainty set centred on the observation, given the decisions obtained from the first layer. This enables us to learn model parameterisations that lead to robust decisions while only solving a simpler deterministic problem at test time. Additionally, instead of having to solve a robust counterpart, we solve two smaller and potentially easier problems in training. The second layer (worst-case problem) can be seen as a regularisation approach for predict-and-optimise by fitting to a neighbourhood of problems instead of just a point observation. We motivate relaxations of the worst-case problem in cases of uncertainty sets that would otherwise lead to trust region problems, and leverage various relaxations to deal with uncertain constraints. Both layers are typically strictly convex in this problem setting and thus have meaningful gradients almost everywhere. We demonstrate an application of this model on simulated experiments. The method is an effective regularisation tool for decision-focused learning for uncertain convex QCQPs.

Adaptive Guidance: Training-free Acceleration of Conditional Diffusion Models. (arXiv:2312.12487v1 [cs.LG])

Authors: Angela Castillo, Jonas Kohler, Juan C. Pérez, Juan Pablo Pérez, Albert Pumarola, Bernard Ghanem, Pablo Arbeláez, Ali Thabet

This paper presents a comprehensive study on the role of Classifier-Free Guidance (CFG) in text-conditioned diffusion models from the perspective of inference efficiency. In particular, we relax the default choice of applying CFG in all diffusion steps and instead search for efficient guidance policies. We formulate the discovery of such policies in the differentiable Neural Architecture Search framework. Our findings suggest that the denoising steps proposed by CFG become increasingly aligned with simple conditional steps, which renders the extra neural network evaluation of CFG redundant, especially in the second half of the denoising process. Building upon this insight, we propose "Adaptive Guidance" (AG), an efficient variant of CFG, that adaptively omits network evaluations when the denoising process displays convergence. Our experiments demonstrate that AG preserves CFG's image quality while reducing computation by 25%. Thus, AG constitutes a plug-and-play alternative to Guidance Distillation, achieving 50% of the speed-ups of the latter while being training-free and retaining the capacity to handle negative prompts. Finally, we uncover further redundancies of CFG in the first half of the diffusion process, showing that entire neural function evaluations can be replaced by simple affine transformations of past score estimates. This method, termed LinearAG, offers even cheaper inference at the cost of deviating from the baseline model. Our findings provide insights into the efficiency of the conditional denoising process that contribute to more practical and swift deployment of text-conditioned diffusion models.
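The guidance rule discussed in this abstract can be illustrated with a minimal Python sketch. It shows the standard CFG combination of a conditional and an unconditional noise prediction, plus a hypothetical skipping rule that stops calling the unconditional branch once the two predictions are nearly aligned. The cosine-similarity test, the `threshold` value, and all function names are assumptions made for illustration; the abstract does not specify the authors' exact criterion.

```python
import math


def cfg_combine(eps_cond, eps_uncond, scale):
    """Standard CFG: extrapolate from the unconditional toward the conditional prediction."""
    return [u + scale * (c - u) for c, u in zip(eps_cond, eps_uncond)]


def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def adaptive_guidance_step(eps_cond, uncond_fn, scale, state, threshold=0.99):
    """One denoising step with a hypothetical adaptive-guidance rule.

    uncond_fn is a zero-argument callable standing in for the extra network
    evaluation. Once the two predictions are aligned (assumed criterion:
    cosine similarity >= threshold), later steps skip that evaluation and
    use the plain conditional prediction.
    """
    if state["converged"]:
        return eps_cond  # conditional step only: one network evaluation
    eps_uncond = uncond_fn()
    if cosine_similarity(eps_cond, eps_uncond) >= threshold:
        state["converged"] = True
    return cfg_combine(eps_cond, eps_uncond, scale)
```

In this sketch the saving comes entirely from the skipped `uncond_fn` calls after the flag flips, which mirrors the abstract's observation that the extra evaluation becomes redundant late in the denoising process.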

Foreseeing Reconstruction Quality of Gradient Inversion: An Optimization Perspective. (arXiv:2312.12488v1 [cs.LG])

Authors: HyeongGwon Hong, Yooshin Cho, Hanbyel Cho, Jaesung Ahn, Junmo Kim

Gradient inversion attacks can leak data privacy when clients share weight updates with the server in federated learning (FL). Existing studies mainly use L2 or cosine distance as the loss function for gradient matching in the attack. Our empirical investigation shows that the vulnerability ranking varies with the loss function used. Gradient norm, which is commonly used as a vulnerability proxy for gradient inversion attack, cannot explain this as it remains constant regardless of the loss function for gradient matching. In this paper, we propose a loss-aware vulnerability proxy (LAVP) for the first time. LAVP refers to either the maximum or minimum eigenvalue of the Hessian with respect to gradient matching loss at ground truth. This suggestion is based on our theoretical findings regarding the local optimization of the gradient inversion in proximity to the ground truth, which corresponds to the worst case attack scenario. We demonstrate the effectiveness of LAVP on various architectures and datasets, showing its consistent superiority over the gradient norm in capturing sample vulnerabilities. The performance of each proxy is measured in terms of Spearman's rank correlation with respect to several similarity scores. This work will contribute to enhancing FL security against any potential loss functions beyond L2 or cosine distance in the future.

H-ensemble: An Information Theoretic Approach to Reliable Few-Shot Multi-Source-Free Transfer. (arXiv:2312.12489v1 [cs.LG])

Authors: Yanru Wu, Jianning Wang, Weida Wang, Yang Li

Multi-source transfer learning is an effective solution to data scarcity by utilizing multiple source tasks for the learning of the target task. However, access to source data and model details is limited in the era of commercial models, giving rise to the setting of multi-source-free (MSF) transfer learning that aims to leverage source domain knowledge without such access. As a newly defined problem paradigm, MSF transfer learning remains largely underexplored and not clearly formulated. In this work, we adopt an information theoretic perspective on it and propose a framework named H-ensemble, which dynamically learns the optimal linear combination, or ensemble, of source models for the target task, using a generalization of maximal correlation regression. The ensemble weights are optimized by maximizing an information theoretic metric for transferability. Compared to previous works, H-ensemble is characterized by: 1) its adaptability to a novel and realistic MSF setting for few-shot target tasks, 2) theoretical reliability, 3) a lightweight structure that is easy to interpret and adapt. Our method is empirically validated by ablation studies, along with extensive comparative analysis with other task ensemble and transfer learning methods. We show that the H-ensemble can successfully learn the optimal task ensemble, as well as outperform prior art.

InstructVideo: Instructing Video Diffusion Models with Human Feedback. (arXiv:2312.12490v1 [cs.CV])

Authors: Hangjie Yuan, Shiwei Zhang, Xiang Wang, Yujie Wei, Tao Feng, Yining Pan, Yingya Zhang, Ziwei Liu, Samuel Albanie, Dong Ni

Diffusion models have emerged as the de facto paradigm for video generation. However, their reliance on web-scale data of varied quality often yields results that are visually unappealing and misaligned with the textual prompts. To tackle this problem, we propose InstructVideo to instruct text-to-video diffusion models with human feedback by reward fine-tuning. InstructVideo has two key ingredients: 1) To ameliorate the cost of reward fine-tuning induced by generating through the full DDIM sampling chain, we recast reward fine-tuning as editing. By leveraging the diffusion process to corrupt a sampled video, InstructVideo requires only partial inference of the DDIM sampling chain, reducing fine-tuning cost while improving fine-tuning efficiency. 2) To mitigate the absence of a dedicated video reward model for human preferences, we repurpose established image reward models, e.g., HPSv2. To this end, we propose Segmental Video Reward, a mechanism to provide reward signals based on segmental sparse sampling, and Temporally Attenuated Reward, a method that mitigates temporal modeling degradation during fine-tuning. Extensive experiments, both qualitative and quantitative, validate the practicality and efficacy of using image reward models in InstructVideo, significantly enhancing the visual quality of generated videos without compromising generalization capabilities. Code and models will be made publicly available.

StreamDiffusion: A Pipeline-level Solution for Real-time Interactive Generation. (arXiv:2312.12491v1 [cs.CV])

Authors: Akio Kodaira, Chenfeng Xu, Toshiki Hazama, Takanori Yoshimoto, Kohei Ohno, Shogo Mitsuhori, Soichi Sugano, Hanying Cho, Zhijian Liu, Kurt Keutzer

We introduce StreamDiffusion, a real-time diffusion pipeline designed for interactive image generation. Existing diffusion models are adept at creating images from text or image prompts, yet they often fall short in real-time interaction. This limitation becomes particularly evident in scenarios involving continuous input, such as Metaverse, live video streaming, and broadcasting, where high throughput is imperative. To address this, we present a novel approach that transforms the original sequential denoising into a batched denoising process. Stream Batch eliminates the conventional wait-and-interact approach and enables fluid and high-throughput streams. To handle the frequency disparity between data input and model throughput, we design a novel input-output queue for parallelizing the streaming process. Moreover, the existing diffusion pipeline uses classifier-free guidance (CFG), which requires additional U-Net computation. To mitigate the redundant computations, we propose a novel residual classifier-free guidance (RCFG) algorithm that reduces the number of negative conditional denoising steps to only one or even zero. Besides, we introduce a stochastic similarity filter (SSF) to optimize power consumption. Our Stream Batch achieves around 1.5x speedup compared to the sequential denoising method at different denoising levels. The proposed RCFG leads to speeds up to 2.05x higher than the conventional CFG. Combining the proposed strategies with existing mature acceleration tools lets image-to-image generation reach up to 91.07 fps on one RTX4090, improving on the throughput of AutoPipeline developed by Diffusers by over 59.56x. Furthermore, our proposed StreamDiffusion also significantly reduces the energy consumption by 2.39x on one RTX3060 and 1.99x on one RTX4090, respectively.

CodeLL: A Lifelong Learning Dataset to Support the Co-Evolution of Data and Language Models of Code. (arXiv:2312.12492v1 [cs.SE])

Authors: Martin Weyssow, Claudio Di Sipio, Davide Di Ruscio, Houari Sahraoui

Motivated by recent work on lifelong learning applications for language models (LMs) of code, we introduce CodeLL, a lifelong learning dataset focused on code changes. Our contribution addresses a notable research gap marked by the absence of a long-term temporal dimension in existing code change datasets, limiting their suitability in lifelong learning scenarios. In contrast, our dataset aims to comprehensively capture code changes across the entire release history of open-source software repositories. In this work, we introduce an initial version of CodeLL, comprising 71 machine-learning-based projects mined from Software Heritage. This dataset enables the extraction and in-depth analysis of code changes spanning 2,483 releases at both the method and API levels. CodeLL enables researchers to study the behaviour of LMs in lifelong fine-tuning settings for learning code changes. Additionally, the dataset can help in studying data distribution shifts within software repositories and the evolution of API usages over time.

Blood Glucose Level Prediction: A Graph-based Explainable Method with Federated Learning. (arXiv:2312.12541v1 [cs.LG])

Authors: Chengzhe Piao, Ken Li

In the UK, approximately 400,000 people with type 1 diabetes (T1D) rely on insulin delivery due to insufficient pancreatic insulin production. Managing blood glucose (BG) levels is crucial, with continuous glucose monitoring (CGM) playing a key role. CGM, tracking BG every 5 minutes, enables effective blood glucose level prediction (BGLP) by considering factors like carbohydrate intake and insulin delivery.

Recent research has focused on developing sequential models for BGLP using historical BG data, incorporating additional attributes such as carbohydrate intake, insulin delivery, and time. These methods have shown notable success in BGLP, with some providing temporal explanations. However, they often lack clear correlations between attributes and their impact on BGLP. Additionally, some methods raise privacy concerns by aggregating participant data to learn population patterns.

Addressing these limitations, we introduced a graph attentive memory (GAM) model, combining a graph attention network (GAT) with a gated recurrent unit (GRU). GAT applies graph attention to model attribute correlations, offering transparent, dynamic attribute relationships. Attention weights dynamically gauge attribute significance over time. To ensure privacy, we employed federated learning (FL), facilitating secure population pattern analysis.

Our method was validated using the OhioT1DM'18 and OhioT1DM'20 datasets from 12 participants, focusing on 6 key attributes. We demonstrated our model's stability and effectiveness through hyperparameter impact analysis.

Sample Efficient Reinforcement Learning with Partial Dynamics Knowledge. (arXiv:2312.12558v1 [cs.LG])

Authors: Meshal Alharbi, Mardavij Roozbehani, Munther Dahleh

The problem of sample complexity of online reinforcement learning is often studied in the literature without taking into account any partial knowledge about the system dynamics that could potentially accelerate the learning process. In this paper, we study the sample complexity of online Q-learning methods when some prior knowledge about the dynamics is available or can be learned efficiently. We focus on systems that evolve according to an additive disturbance model of the form $S_{h+1} = f(S_h, A_h) + W_h$, where $f$ represents the underlying system dynamics, and $W_h$ are unknown disturbances independent of states and actions. In the setting of finite episodic Markov decision processes with $S$ states, $A$ actions, and episode length $H$, we present an optimistic Q-learning algorithm that achieves $\tilde{\mathcal{O}}(\text{Poly}(H)\sqrt{T})$ regret under perfect knowledge of $f$, where $T$ is the total number of interactions with the system. This is in contrast to the typical $\tilde{\mathcal{O}}(\text{Poly}(H)\sqrt{SAT})$ regret for existing Q-learning methods. Further, if only a noisy estimate $\hat{f}$ of $f$ is available, our method can learn an approximately optimal policy in a number of samples that is independent of the cardinalities of state and action spaces. The sub-optimality gap depends on the approximation error $\hat{f}-f$, as well as the Lipschitz constant of the corresponding optimal value function. Our approach does not require modeling of the transition probabilities and enjoys the same memory complexity as model-free methods.
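The role of the additive disturbance model $S_{h+1} = f(S_h, A_h) + W_h$ can be illustrated with a small Python sketch. It is not the paper's optimistic Q-learning algorithm. It only shows why knowing $f$ helps: each observed transition reveals $W_h$ directly, so samples from every state-action pair can be pooled into one estimate of the disturbance distribution. The ring-shaped state space, the disturbance support {-1, 0, 1}, and all names are hypothetical choices for illustration.

```python
import random
from collections import Counter

N_STATES = 10  # hypothetical finite state space: integers on a ring


def f(state, action):
    """Known nominal dynamics (illustrative assumption)."""
    return (state + action) % N_STATES


def step(state, action, rng):
    """Environment step: nominal dynamics plus a disturbance independent of (state, action)."""
    w = rng.choice([-1, 0, 1])
    return (f(state, action) + w) % N_STATES, w


def recover_disturbance(state, action, next_state):
    """With f known, the disturbance is the residual; undo the ring wrap-around."""
    return (next_state - f(state, action) + 1) % N_STATES - 1


def estimate_disturbance_distribution(transitions):
    """Pool residuals from all (state, action) pairs into one empirical distribution."""
    counts = Counter(recover_disturbance(s, a, s2) for s, a, s2 in transitions)
    total = sum(counts.values())
    return {w: c / total for w, c in counts.items()}
```

Because the pooled estimate does not depend on which state-action pair produced each sample, its accuracy is driven by the total number of transitions rather than by the number of visits to each pair, which is the intuition behind removing the $S$ and $A$ factors from the regret.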

Comprehensive Validation on Reweighting Samples for Bias Mitigation via AIF360. (arXiv:2312.12560v1 [cs.LG])

Authors: Christina Hastings Blow, Lijun Qian, Camille Gibson, Pamela Obiomon, Xishuang Dong

Fairness AI aims to detect and alleviate bias across the entire AI development life cycle, encompassing data curation, modeling, evaluation, and deployment, a pivotal aspect of ethical AI implementation. For addressing data bias, particularly concerning sensitive attributes like gender and race, reweighting samples proves efficient for fairness AI. This paper contributes a systematic examination of reweighting samples for traditional machine learning (ML) models, employing five models for binary classification on the Adult Income and COMPAS datasets with various protected attributes. The study evaluates prediction results using five fairness metrics, uncovering the nuanced and model-specific nature of reweighting sample effectiveness in achieving fairness in traditional ML models, as well as revealing the complexity of bias dynamics.
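Sample reweighting of the kind examined here can be sketched in a few lines of Python. The sketch below follows the classic reweighing scheme of Kamiran and Calders, in which each sample receives the weight P(group) * P(label) / P(group, label), so that group and label become statistically independent under the weighted distribution. It is a stdlib-only illustration and does not call the AIF360 library; the function name and the toy data in the usage note are assumptions.

```python
from collections import Counter


def reweighing_weights(groups, labels):
    """Per-sample weights w = P(group) * P(label) / P(group, label).

    groups: protected-attribute value for each sample
    labels: class label for each sample
    """
    n = len(labels)
    group_counts = Counter(groups)
    label_counts = Counter(labels)
    joint_counts = Counter(zip(groups, labels))
    return [
        (group_counts[g] * label_counts[y]) / (n * joint_counts[(g, y)])
        for g, y in zip(groups, labels)
    ]
```

For example, with groups `["a", "a", "a", "b", "b"]` and labels `[1, 1, 0, 1, 0]`, the over-represented cell (group "a", label 1) is down-weighted and the weighted positive rate becomes the same in both groups. The weights can then be passed to any learner that accepts per-sample weights.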

Leading the Pack: N-player Opponent Shaping. (arXiv:2312.12564v1 [cs.LG])

Authors: Alexandra Souly, Timon Willi, Akbir Khan, Robert Kirk, Chris Lu, Edward Grefenstette, Tim Rocktäschel

Reinforcement learning solutions have had great success in the 2-player general sum setting. In this setting, the paradigm of Opponent Shaping (OS), in which agents account for the learning of their co-players, has led to agents which are able to avoid collectively bad outcomes, whilst also maximizing their reward. These methods have so far been limited to 2-player games. However, the real world involves interactions with many more agents, with interactions on both local and global scales. In this paper, we extend Opponent Shaping (OS) methods to environments involving multiple co-players and multiple shaping agents. We evaluate on over 4 different environments, varying the number of players from 3 to 5, and demonstrate that model-based OS methods converge to equilibrium with better global welfare than naive learning. However, we find that when playing with a large number of co-players, OS methods' relative performance reduces, suggesting that in the limit OS methods may not perform well. Finally, we explore scenarios where more than one OS method is present, noticing that within games requiring a majority of cooperating agents, OS methods converge to outcomes with poor global welfare.

Generator Assisted Mixture of Experts For Feature Acquisition in Batch. (arXiv:2312.12574v1 [cs.LG])

Authors: Vedang Asgaonkar, Aditya Jain, Abir De

Given a set of observations, feature acquisition is about finding the subset of unobserved features which would enhance accuracy. Such problems have been explored in a sequential setting in prior work. Here, the model receives feedback from every new feature acquired and chooses to explore more features or to predict. However, sequential acquisition is not feasible in some settings where time is of the essence. We consider the problem of feature acquisition in batch, where the subset of features to be queried in batch is chosen based on the currently observed features, and then acquired as a batch, followed by prediction. We solve this problem using several technical innovations. First, we use a feature generator to draw a subset of the synthetic features for some examples, which reduces the cost of oracle queries. Second, to make the feature acquisition problem tractable for the large heterogeneous observed features, we partition the data into buckets, by borrowing tools from locality sensitive hashing and then train a mixture of experts model. Third, we design a tractable lower bound of the original objective. We use a greedy algorithm combined with model training to solve the underlying problem. Experiments with four datasets show that our approach outperforms these methods in terms of trade-off between accuracy and feature acquisition cost.

Improving the Expressive Power of Deep Neural Networks through Integral Activation Transform. (arXiv:2312.12578v1 [cs.LG])

Authors: Zezhong Zhang, Feng Bao, Guannan Zhang

The impressive expressive power of deep neural networks (DNNs) underlies their widespread applicability. However, while the theoretical capacity of deep architectures is high, the practical expressive power achieved through successful training often falls short. Building on the insights gained from Neural ODEs, which explore the depth of DNNs as a continuous variable, in this work, we generalize the traditional fully connected DNN through the concept of continuous width. In the Generalized Deep Neural Network (GDNN), the traditional notion of neurons in each layer is replaced by a continuous state function. Using the finite rank parameterization of the weight integral kernel, we establish that GDNN can be obtained by employing the Integral Activation Transform (IAT) as activation layers within the traditional DNN framework. The IAT maps the input vector to a function space using some basis functions, followed by nonlinear activation in the function space, and then extracts information through the integration with another collection of basis functions. A specific variant, IAT-ReLU, featuring the ReLU nonlinearity, serves as a smooth generalization of the scalar ReLU activation. Notably, IAT-ReLU exhibits a continuous activation pattern when continuous basis functions are employed, making it smooth and enhancing the trainability of the DNN. Our numerical experiments demonstrate that IAT-ReLU outperforms regular ReLU in terms of trainability and smoothness.

Observation-Augmented Contextual Multi-Armed Bandits for Robotic Exploration with Uncertain Semantic Data. (arXiv:2312.12583v1 [cs.RO])

Authors: Shohei Wakayama, Nisar Ahmed

For robotic decision-making under uncertainty, the balance between exploitation and exploration of available options must be carefully taken into account. In this study, we introduce a new variant of contextual multi-armed bandits called observation-augmented CMABs (OA-CMABs) wherein a decision-making agent can utilize extra outcome observations from an external information source. CMABs model the expected option outcomes as a function of context features and hidden parameters, which are inferred from previous option outcomes. In OA-CMABs, external observations are also a function of context features and thus provide additional evidence about the hidden parameters. Yet, if an external information source is error-prone, the resulting posterior updates can harm decision-making performance unless the presence of errors is considered. To this end, we propose a robust Bayesian inference process for OA-CMABs that is based on the concept of probabilistic data validation. Our approach handles complex mixture model parameter priors and hybrid observation likelihoods for semantic data sources, allowing us to develop validation algorithms based on recently developed probabilistic semantic data association techniques. Furthermore, to more effectively cope with the combined sources of uncertainty in OA-CMABs, we derive a new active inference algorithm for option selection based on expected free energy minimization. This generalizes previous work on active inference for bandit-based robotic decision-making by accounting for faulty observations and non-Gaussian inference. Our approaches are demonstrated on a simulated asynchronous search site selection problem for space exploration. The results show that even if incorrect observations are provided by external information sources, efficient decision-making and robust parameter inference are still achieved in a wide variety of experimental conditions.

BadRL: Sparse Targeted Backdoor Attack Against Reinforcement Learning. (arXiv:2312.12585v1 [cs.LG])

Authors: Jing Cui, Yufei Han, Yuzhe Ma, Jianbin Jiao, Junge Zhang

Backdoor attacks in reinforcement learning (RL) have previously employed intense attack strategies to ensure attack success. However, these methods suffer from high attack costs and increased detectability. In this work, we propose a novel approach, BadRL, which focuses on conducting highly sparse backdoor poisoning efforts during training and testing while maintaining successful attacks. Our algorithm, BadRL, strategically chooses state observations with high attack values to inject triggers during training and testing, thereby reducing the chances of detection. In contrast to the previous methods that utilize sample-agnostic trigger patterns, BadRL dynamically generates distinct trigger patterns based on targeted state observations, thereby enhancing its effectiveness. Theoretical analysis shows that the targeted backdoor attack is always viable and remains stealthy under specific assumptions. Empirical results on various classic RL tasks illustrate that BadRL can substantially degrade the performance of a victim agent with minimal poisoning efforts (0.003% of total training steps) during training and infrequent attacks during testing.

Robust Machine Learning by Transforming and Augmenting Imperfect Training Data. (arXiv:2312.12597v1 [cs.LG])

Authors: Elliot Creager

Machine Learning (ML) is an expressive framework for turning data into computer programs. Across many problem domains -- both in industry and policy settings -- the types of computer programs needed for accurate prediction or optimal control are difficult to write by hand. On the other hand, collecting instances of desired system behavior may be relatively more feasible. This makes ML broadly appealing, but also induces data sensitivities that often manifest as unexpected failure modes during deployment. In this sense, the training data available tend to be imperfect for the task at hand. This thesis explores several data sensitivities of modern machine learning and how to address them. We begin by discussing how to prevent ML from codifying prior human discrimination measured in the training data, where we take a fair representation learning approach. We then discuss the problem of learning from data containing spurious features, which provide predictive fidelity during training but are unreliable upon deployment. Here we observe that insofar as standard training methods tend to learn such features, this propensity can be leveraged to search for partitions of training data that expose this inconsistency, ultimately promoting learning algorithms invariant to spurious features. Finally, we turn our attention to reinforcement learning from data with insufficient coverage over all possible states and actions. To address the coverage issue, we discuss how causal priors can be used to model the single-step dynamics of the setting where data are collected. This enables a new type of data augmentation where observed trajectories are stitched together to produce new but plausible counterfactual trajectories.

Unsupervised Segmentation of Colonoscopy Images. (arXiv:2312.12599v1 [eess.IV])

Authors: Heming Yao, Jérôme Lüscher, Benjamin Gutierrez Becker, Josep Arús-Pous, Tommaso Biancalani, Amelie Bigorgne, David Richmond

Colonoscopy plays a crucial role in the diagnosis and prognosis of various gastrointestinal diseases. Due to the challenges of collecting large-scale high-quality ground truth annotations for colonoscopy images, and more generally medical images, we explore using self-supervised features from vision transformers in three challenging tasks for colonoscopy images. Our results indicate that image-level features learned from DINO models achieve image classification performance comparable to fully supervised models, and patch-level features contain rich semantic information for object detection. Furthermore, we demonstrate that self-supervised features combined with unsupervised segmentation can be used to discover multiple clinically relevant structures in a fully unsupervised manner, demonstrating the tremendous potential of applying these methods in medical image analysis.

Studying the Practices of Testing Machine Learning Software in the Wild. (arXiv:2312.12604v1 [cs.SE])

Authors: Moses Openja, Foutse Khomh, Armstrong Foundjem, Zhen Ming (Jack) Jiang, Mouna Abidi, Ahmed E. Hassan

Background: We are witnessing an increasing adoption of machine learning (ML), especially deep learning (DL) algorithms, in many software systems, including safety-critical systems such as health care systems or autonomous driving vehicles. Ensuring the software quality of these systems is still an open challenge for the research community, mainly due to the inductive nature of ML software systems. Traditionally, software systems were constructed deductively, by writing down the rules that govern the behavior of the system as program code. However, for ML software, these rules are inferred from training data. A few recent research advances in the quality assurance of ML systems have adapted different concepts from traditional software testing, such as mutation testing, to help improve the reliability of ML software systems. However, it is unclear if any of these proposed testing techniques from research are adopted in practice. There is little empirical evidence about the testing strategies of ML engineers. Aims: To fill this gap, we perform the first fine-grained empirical study on ML testing practices in the wild, to identify the ML properties being tested, the testing strategies followed, and their implementation throughout the ML workflow. Method: First, we systematically summarized the different testing strategies (e.g., Oracle Approximation), the tested ML properties (e.g., Correctness, Bias, and Fairness), and the testing methods (e.g., Unit test) from the literature. Then, we conducted a study to understand the practices of testing ML software. Results: In our findings: 1) we identified four (4) major categories of testing strategy including Grey-box, White-box, Black-box, and Heuristic-based techniques that are used by the ML engineers to find software bugs. 2) We identified 16 ML properties that are tested in the ML workflow.

Optimizing Neural Networks with Gradient Lexicase Selection. (arXiv:2312.12606v1 [cs.LG])

Authors: Li Ding, Lee Spector

One potential drawback of using aggregated performance measurement in machine learning is that models may learn to accept higher errors on some training cases as compromises for lower errors on others, with the lower errors actually being instances of overfitting. This can lead to both stagnation at local optima and poor generalization. Lexicase selection is an uncompromising method developed in evolutionary computation, which selects models on the basis of sequences of individual training case errors instead of using aggregated metrics such as loss and accuracy. In this paper, we investigate how lexicase selection, in its general form, can be integrated into the context of deep learning to enhance generalization. We propose Gradient Lexicase Selection, an optimization framework that combines gradient descent and lexicase selection in an evolutionary fashion. Our experimental results demonstrate that the proposed method improves the generalization performance of various widely-used deep neural network architectures across three image classification benchmarks. Additionally, qualitative analysis suggests that our method assists networks in learning more diverse representations. Our source code is available on GitHub: https://github.com/ld-ing/gradient-lexicase.
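The selection mechanism described above can be sketched in Python. This is the generic lexicase selection procedure from evolutionary computation, not the authors' Gradient Lexicase Selection framework (the gradient-descent part is omitted). The error-matrix layout and function name are illustrative assumptions.

```python
import random


def lexicase_select(errors, rng=random):
    """Select one candidate by lexicase selection.

    errors[i][j] is the error of candidate i on training case j.
    Cases are visited in random order; at each case only the candidates
    with the best (lowest) error on that case survive.
    """
    candidates = list(range(len(errors)))
    cases = list(range(len(errors[0])))
    rng.shuffle(cases)
    for case in cases:
        best = min(errors[i][case] for i in candidates)
        candidates = [i for i in candidates if errors[i][case] == best]
        if len(candidates) == 1:
            break
    # Any remaining ties are broken at random.
    return rng.choice(candidates)
```

With `errors = [[0, 5], [5, 0], [1, 1]]`, the third candidate has the lowest total error but is never selected, because it is not the best on any individual case. This is the "uncompromising" behaviour that distinguishes lexicase selection from selection on aggregated loss.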

Trust, But Verify: A Survey of Randomized Smoothing Techniques. (arXiv:2312.12608v1 [cs.LG])

Authors: Anupriya Kumari, Devansh Bhardwaj, Sukrit Jindal, Sarthak Gupta

Machine learning models have demonstrated remarkable success across diverse domains but remain vulnerable to adversarial attacks. Empirical defence mechanisms often fall short, as new attacks constantly emerge, rendering existing defences obsolete. A paradigm shift from empirical defences to certification-based defences has been observed in response. Randomized smoothing has emerged as a promising technique among notable advancements. This study reviews the theoretical foundations, empirical effectiveness, and applications of randomized smoothing in verifying machine learning classifiers. We provide an in-depth exploration of the fundamental concepts underlying randomized smoothing, highlighting its theoretical guarantees in certifying robustness against adversarial perturbations. Additionally, we discuss the challenges of existing methodologies and offer insightful perspectives on potential solutions. This paper is novel in its attempt to systemise the existing knowledge in the context of randomized smoothing.
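
The core idea of randomized smoothing can be sketched as follows: the smoothed classifier returns the class that a base classifier predicts most often when Gaussian noise is added to the input. This is a minimal Monte Carlo sketch of the prediction step only; it does not compute a certified radius. The names `smoothed_predict` and `toy_classifier` are hypothetical and are not taken from the survey.

```python
import random
from collections import Counter


def smoothed_predict(base_classifier, x, sigma, n_samples=1000, rng=None):
    """Majority vote of base_classifier over Gaussian perturbations of x."""
    if rng is None:
        rng = random.Random(0)  # fixed seed so the sketch is reproducible
    votes = Counter()
    for _ in range(n_samples):
        # Perturb every coordinate with independent N(0, sigma^2) noise.
        noisy = [xi + rng.gauss(0.0, sigma) for xi in x]
        votes[base_classifier(noisy)] += 1
    return votes.most_common(1)[0][0]


def toy_classifier(x):
    """Hypothetical base classifier: class 1 if the first coordinate is positive."""
    return 1 if x[0] > 0 else 0
```

For an input far from the decision boundary relative to `sigma`, such as `[2.0, 0.0]` with `sigma=0.5`, almost all noisy copies receive the same label, which is the situation in which certification procedures can provide a robustness guarantee.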

Enhancing predictive capabilities in fusion burning plasmas through surrogate-based optimization in core transport solvers. (arXiv:2312.12610v1 [physics.plasm-ph])

Authors: P. Rodriguez-Fernandez, N.T. Howard, A. Saltzman, S. Kantamneni, J. Candy, C. Holland, M. Balandat, S. Ament, A.E. White

This work presents the PORTALS framework, which leverages surrogate modeling and optimization techniques to enable the prediction of core plasma profiles and performance with nonlinear gyrokinetic simulations at significantly reduced cost, with no loss of accuracy. The efficiency of PORTALS is benchmarked against standard methods, and its full potential is demonstrated on a unique, simultaneous 5-channel (electron temperature, ion temperature, electron density, impurity density and angular rotation) prediction of steady-state profiles in a DIII-D ITER Similar Shape plasma with GPU-accelerated, nonlinear CGYRO. This paper also provides general guidelines for accurate performance predictions in burning plasmas and the impact of transport modeling in fusion pilot plants studies.

Online Variational Sequential Monte Carlo. (arXiv:2312.12616v1 [stat.ML])

Authors: Alessandro Mastrototaro, Jimmy Olsson

Being the most classical generative model for serial data, state-space models (SSM) are fundamental in AI and statistical machine learning. In SSM, any form of parameter learning or latent state inference typically involves the computation of complex latent-state posteriors. In this work, we build upon the variational sequential Monte Carlo (VSMC) method, which provides computationally efficient and accurate model parameter estimation and Bayesian latent-state inference by combining particle methods and variational inference. While standard VSMC operates in the offline mode, by re-processing repeatedly a given batch of data, we distribute the approximation of the gradient of the VSMC surrogate ELBO in time using stochastic approximation, allowing for online learning in the presence of streams of data. This results in an algorithm, online VSMC, that is capable of performing efficiently, entirely on-the-fly, both parameter estimation and particle proposal adaptation. In addition, we provide rigorous theoretical results describing the algorithm's convergence properties as the number of data tends to infinity as well as numerical illustrations of its excellent convergence properties and usefulness also in batch-processing settings.

Calibrating Wireless Ray Tracing for Digital Twinning using Local Phase Error Estimates. (arXiv:2312.12625v1 [eess.SP])

Authors: Clement Ruah, Osvaldo Simeone, Jakob Hoydis, Bashir Al-Hashimi

Embodying the principle of simulation intelligence, digital twin (DT) systems construct and maintain a high-fidelity virtual model of a physical system. This paper focuses on ray tracing (RT), which is widely seen as an enabling technology for DTs of the radio access network (RAN) segment of next-generation disaggregated wireless systems. RT makes it possible to simulate channel conditions, enabling data augmentation and prediction-based transmission. However, the effectiveness of RT hinges on the adaptation of the electromagnetic properties assumed by the RT to actual channel conditions, a process known as calibration. The main challenge of RT calibration is the fact that small discrepancies in the geometric model fed to the RT software hinder the accuracy of the predicted phases of the simulated propagation paths. Existing solutions to this problem either rely on the channel power profile, hence disregarding phase information, or they operate on the channel responses by assuming the simulated phases to be sufficiently accurate for calibration. This paper proposes a novel channel response-based scheme that, unlike the state of the art, estimates and compensates for the phase errors in the RT-generated channel responses. The proposed approach builds on the variational expectation maximization algorithm with a flexible choice of the prior phase-error distribution that bridges between a deterministic model with no phase errors and a stochastic model with uniform phase errors. The algorithm is computationally efficient, and is demonstrated, by leveraging the open-source differentiable RT software available within the Sionna library, to outperform existing methods in terms of the accuracy of RT predictions.

Data-driven discovery with Limited Data Acquisition for fluid flow across cylinder. (arXiv:2312.12630v1 [math.DS])

Authors: Dr. Himanshu Singh

One of the central challenges in extracting the governing principles of a dynamical system via Dynamic Mode Decomposition (DMD) is limited data availability, formally called Limited Data Acquisition in the present paper. In the interest of discovering the governing principles of a dynamical system with limited data acquisition, we provide a variant of Kernelized Extended DMD (KeDMD) based on the Koopman operator, which employs the notion of a Gaussian random matrix to recover the dominant Koopman modes for the standard fluid flow across cylinder experiment. It turns out that the traditional kernel function, the Gaussian Radial Basis Function Kernel, is unfortunately not able to generate the desired Koopman modes when KeDMD is executed with limited data acquisition. However, the Laplacian Kernel Function successfully generates the desired Koopman modes when limited data is provided in terms of data-set snapshots for the aforementioned experiment, and this manuscript serves the purpose of reporting these exciting experimental insights. This paper also explores the functionality of the Koopman operator when it interacts with the reproducing kernel Hilbert space (RKHS) that arises from the normalized probability Lebesgue measure $d\mu_{\sigma,1,\mathbb{C}^n}(z)=(2\pi\sigma^2)^{-n}\exp\left(-\frac{\|z\|_2}{\sigma}\right)dV(z)$ when it is embedded in the $L^2-$sense for the holomorphic functions over $\mathbb{C}^n$, with the aim of determining the Koopman modes for the fluid flow across cylinder experiment. We explore the operator-theoretic characterizations of the Koopman operator on the RKHS generated by the normalized Laplacian measure $d\mu_{\sigma,1,\mathbb{C}^n}(z)$ in the $L^2-$sense. In doing so, we provide the compactification & closable characterization of the Koopman operator over the RKHS generated by the normalized Laplacian measure in the $L^2-$sense.

Long-run Behaviour of Multi-fidelity Bayesian Optimisation. (arXiv:2312.12633v1 [cs.LG])

Authors: Gbetondji J-S Dovonon, Jakob Zeitler

Multi-fidelity Bayesian Optimisation (MFBO) has been shown to generally converge faster than single-fidelity Bayesian Optimisation (SFBO) (Poloczek et al. (2017)). Inspired by recent benchmark papers, we investigate the long-run behaviour of MFBO, based on observations in the literature that it might under-perform in certain scenarios (Mikkola et al. (2023), Eggensperger et al. (2021)). An under-performance of MFBO in the long run could significantly undermine its application to many research tasks, especially when we are not able to identify when the under-performance begins. We create a simple benchmark study, showcase empirical results and discuss scenarios and possible reasons for under-performance.

Matching via Distance Profiles. (arXiv:2312.12641v1 [stat.ME])

Authors: YoonHaeng Hur, Yuehaw Khoo

In this paper, we introduce and study matching methods based on distance profiles. For the matching of point clouds, the proposed method is easily implementable by solving a linear program, circumventing the computational obstacles of quadratic matching. Also, we propose and analyze a flexible way to execute location-to-location matching using distance profiles. Moreover, we provide a statistical estimation error analysis in the context of location-to-location matching using empirical process theory. Furthermore, we apply our method to a certain model and show its noise stability by characterizing conditions on the noise level for the matching to be successful. Lastly, we demonstrate the performance of the proposed method and compare it with some existing methods using synthetic and real data.

IS-DARTS: Stabilizing DARTS through Precise Measurement on Candidate Importance. (arXiv:2312.12648v1 [cs.LG])

Authors: Hongyi He, Longjun Liu, Haonan Zhang, Nanning Zheng

Among existing Neural Architecture Search methods, DARTS is known for its efficiency and simplicity. This approach applies continuous relaxation of network representation to construct a weight-sharing supernet and enables the identification of excellent subnets in just a few GPU days. However, performance collapse in DARTS results in deteriorating architectures filled with parameter-free operations and remains a great challenge to the robustness. To resolve this problem, we reveal that the fundamental reason is the biased estimation of the candidate importance in the search space through theoretical and experimental analysis, and more precisely select operations via information-based measurements. Furthermore, we demonstrate that the excessive concern over the supernet and inefficient utilization of data in bi-level optimization also account for suboptimal results. We adopt a more realistic objective focusing on the performance of subnets and simplify it with the help of the information-based measurements. Finally, we explain theoretically why progressively shrinking the width of the supernet is necessary and reduce the approximation error of optimal weights in DARTS. Our proposed method, named IS-DARTS, comprehensively improves DARTS and resolves the aforementioned problems. Extensive experiments on NAS-Bench-201 and DARTS-based search space demonstrate the effectiveness of IS-DARTS.

Can Transformers Learn Sequential Function Classes In Context?. (arXiv:2312.12655v1 [cs.LG])

Authors: Ryan Campbell, Emma Guo, Evan Hu, Reya Vir, Ethan Hsiao

In-context learning (ICL) has revolutionized the capabilities of transformer models in NLP. In our project, we extend the understanding of the mechanisms underpinning ICL by exploring whether transformers can learn from sequential, non-textual function class data distributions. We introduce a novel sliding window sequential function class and employ toy-sized transformers with a GPT-2 architecture to conduct our experiments. Our analysis indicates that these models can indeed leverage ICL when trained on non-textual sequential function classes. Additionally, our experiments with randomized y-label sequences highlight that transformers retain some ICL capabilities even when the label associations are obfuscated. We provide evidence that transformers can reason with and understand sequentiality encoded within function classes, as reflected by the effective learning of our proposed tasks. Our results also show that performance deteriorates with increasing randomness in the labels, though not to the extent one might expect, implying a potential robustness of learned sequentiality against label noise. Future research may want to look into how previous explanations of transformers, such as induction heads and task vectors, relate to sequentiality in ICL in these toy examples. Our investigation lays the groundwork for further research into how transformers process and perceive sequential data.

The Convex Landscape of Neural Networks: Characterizing Global Optima and Stationary Points via Lasso Models. (arXiv:2312.12657v1 [cs.LG])

Authors: Tolga Ergen, Mert Pilanci

Due to the non-convex nature of training Deep Neural Network (DNN) models, their effectiveness relies on the use of non-convex optimization heuristics. Traditional methods for training DNNs often require costly empirical methods to produce successful models and do not have a clear theoretical foundation. In this study, we examine the use of convex optimization theory and sparse recovery models to refine the training process of neural networks and provide a better interpretation of their optimal weights. We focus on training two-layer neural networks with piecewise linear activations and demonstrate that they can be formulated as a finite-dimensional convex program. These programs include a regularization term that promotes sparsity, which constitutes a variant of group Lasso. We first utilize semi-infinite programming theory to prove strong duality for finite-width neural networks and then we express these architectures equivalently as high-dimensional convex sparse recovery models. Remarkably, the worst-case complexity to solve the convex program is polynomial in the number of samples and number of neurons when the rank of the data matrix is bounded, which is the case in convolutional networks. To extend our method to training data of arbitrary rank, we develop a novel polynomial-time approximation scheme based on zonotope subsampling that comes with a guaranteed approximation ratio. We also show that all the stationary points of the nonconvex training objective can be characterized as the global optimum of a subsampled convex program. Our convex models can be trained using standard convex solvers without resorting to heuristics or extensive hyper-parameter tuning, unlike non-convex methods. Through extensive numerical experiments, we show that convex models can outperform traditional non-convex methods and are not sensitive to optimizer hyperparameters.

Incremental Semi-supervised Federated Learning for Health Inference via Mobile Sensing. (arXiv:2312.12666v1 [cs.LG])

Authors: Guimin Dong, Lihua Cai, Mingyue Tang, Laura E. Barnes, Mehdi Boukhechba

Mobile sensing appears as a promising solution for health inference problems (e.g., influenza-like symptom recognition) by leveraging diverse smart sensors to capture fine-grained information about human behaviors and ambient contexts. Centralized training of machine learning models can place mobile users' sensitive information at privacy risk due to data breaches and misexploitation. Federated Learning (FL) enables mobile devices to collaboratively learn global models without the exposure of local private data. However, there are challenges in on-device FL deployment using mobile sensing: 1) long-term and continuously collected mobile sensing data may exhibit domain shifts as sensing objects (e.g., humans) have varying behaviors as a result of internal and/or external stimuli; 2) model retraining using all available data may increase computation and memory burden; and 3) the sparsity of annotated crowd-sourced data causes supervised FL to lack robustness. In this work, we propose FedMobile, an incremental semi-supervised federated learning algorithm, to train models in a semi-supervised and incremental manner in a decentralized online fashion. We evaluate FedMobile using a real-world mobile sensing dataset for influenza-like symptom recognition. Our empirical results show that FedMobile-trained models achieve the best results in comparison to the selected baseline methods.

Discovering Malicious Signatures in Software from Structural Interactions. (arXiv:2312.12667v1 [cs.CR])

Authors: Chenzhong Yin, Hantang Zhang, Mingxi Cheng, Xiongye Xiao, Xinghe Chen, Xin Ren, Paul Bogdan

Malware represents a significant security concern in today's digital landscape, as it can destroy or disable operating systems, steal sensitive user information, and occupy valuable disk space. However, current malware detection methods, such as static-based and dynamic-based approaches, struggle to identify newly developed ("zero-day") malware and are limited by customized virtual machine (VM) environments. To overcome these limitations, we propose a novel malware detection approach that leverages deep learning, mathematical techniques, and network science. Our approach focuses on static and dynamic analysis and utilizes the Low-Level Virtual Machine (LLVM) to profile applications within a complex network. The generated network topologies are input into the GraphSAGE architecture to efficiently distinguish between benign and malicious software applications, with the operation names denoted as node features. Importantly, the GraphSAGE models analyze the network's topological geometry to make predictions, enabling them to detect state-of-the-art malware and prevent potential damage during execution in a VM. To evaluate our approach, we conduct a study on a dataset comprising source code from 24,376 applications, specifically written in C/C++, sourced directly from widely-recognized malware and various types of benign software. The results show a high detection performance with an Area Under the Receiver Operating Characteristic Curve (AUROC) of 99.85%. Our approach marks a substantial improvement in malware detection, providing a notably more accurate and efficient solution when compared to current state-of-the-art malware detection methods.

Convolutional Channel-wise Competitive Learning for the Forward-Forward Algorithm. (arXiv:2312.12668v1 [cs.LG])

Authors: Andreas Papachristodoulou, Christos Kyrkou, Stelios Timotheou, Theocharis Theocharides

The Forward-Forward (FF) Algorithm has been recently proposed to alleviate the issues of backpropagation (BP) commonly used to train deep neural networks. However, its current formulation exhibits limitations such as the generation of negative data, slower convergence, and inadequate performance on complex tasks. In this paper, we take the main ideas of FF and improve them by leveraging channel-wise competitive learning in the context of convolutional neural networks for image classification tasks. A layer-wise loss function is introduced that promotes competitive learning and eliminates the need for negative data construction. To enhance both the learning of compositional features and feature space partitioning, a channel-wise feature separator and extractor block is proposed that complements the competitive learning process. Our method outperforms recent FF-based models on image classification tasks, achieving testing errors of 0.58%, 7.69%, 21.89%, and 48.77% on MNIST, Fashion-MNIST, CIFAR-10 and CIFAR-100 respectively. Our approach bridges the performance gap between FF learning and BP methods, indicating the potential of our proposed approach to learn useful representations in a layer-wise modular fashion, enabling more efficient and flexible learning.

On the Role of Server Momentum in Federated Learning. (arXiv:2312.12670v1 [cs.LG])

Authors: Jianhui Sun, Xidong Wu, Heng Huang, Aidong Zhang

Federated Averaging (FedAvg) is known to experience convergence issues when encountering significant client system heterogeneity and data heterogeneity. Server momentum has been proposed as an effective mitigation. However, existing server momentum works are restrictive in the momentum formulation, do not properly schedule hyperparameters, and focus only on system-homogeneous settings, which leaves the role of server momentum an under-explored problem. In this paper, we propose a general framework for server momentum that (a) covers a large class of momentum schemes that are unexplored in federated learning (FL), (b) enables a popular stagewise hyperparameter scheduler, and (c) allows heterogeneous and asynchronous local computing. We provide rigorous convergence analysis for the proposed framework. To the best of our knowledge, this is the first work that thoroughly analyzes the performance of server momentum with a hyperparameter scheduler and system heterogeneity. Extensive experiments validate the effectiveness of our proposed framework.
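
For readers unfamiliar with server momentum, the following is a minimal sketch of one aggregation round using the basic heavy-ball form on the server. It is not the general framework proposed in the paper: equal client weighting, synchronous clients, and fixed `lr` and `beta` are simplifying assumptions, and the function name `server_momentum_round` is hypothetical.

```python
def server_momentum_round(global_w, client_ws, velocity, lr=1.0, beta=0.9):
    """One server update: average the client models, then apply momentum.

    global_w:  current global weights (list of floats)
    client_ws: list of client weight vectors after local training
    velocity:  server momentum buffer from the previous round
    """
    n = len(client_ws)
    # Pseudo-gradient: current global weights minus the average client weights.
    delta = [g - sum(c[k] for c in client_ws) / n for k, g in enumerate(global_w)]
    # Heavy-ball momentum accumulated on the server across rounds.
    new_velocity = [beta * v + d for v, d in zip(velocity, delta)]
    new_global = [g - lr * v for g, v in zip(global_w, new_velocity)]
    return new_global, new_velocity
```

With `beta=0.0` and `lr=1.0` the update reduces to the equal-weight average of the client models, so the momentum term is the only difference from plain averaging in this sketch.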

Combinatorial Gaussian Process Bandits in Bayesian Settings: Theory and Application for Energy-Efficient Navigation. (arXiv:2312.12676v1 [cs.LG])

Authors: Jack Sandberg, Niklas Åkerblom, Morteza Haghir Chehreghani

We consider a combinatorial Gaussian process semi-bandit problem with time-varying arm availability. Each round, an agent is provided a set of available base arms and must select a subset of them to maximize the long-term cumulative reward. Assuming the expected rewards are sampled from a Gaussian process (GP) over the arm space, the agent can efficiently learn. We study the Bayesian setting and provide novel Bayesian regret bounds for three GP-based algorithms: GP-UCB, Bayes-GP-UCB and GP-TS. Our bounds extend previous results for GP-UCB and GP-TS to a combinatorial setting with varying arm availability and to the best of our knowledge, we provide the first Bayesian regret bound for Bayes-GP-UCB. Time-varying arm availability encompasses other widely considered bandit problems such as contextual bandits. We formulate the online energy-efficient navigation problem as a combinatorial and contextual bandit and provide a comprehensive experimental study on synthetic and real-world road networks with detailed simulations. The contextual GP model obtains lower regret and is less dependent on the informativeness of the prior compared to the non-contextual Bayesian inference model. In addition, Thompson sampling obtains lower regret than Bayes-UCB for both the contextual and non-contextual model.

Causal Discovery for fMRI data: Challenges, Solutions, and a Case Study. (arXiv:2312.12678v1 [q-bio.QM])

Authors: Eric Rawls, Bryan Andrews, Kelvin Lim, Erich Kummerfeld

Designing studies that apply causal discovery requires navigating many researcher degrees of freedom. This complexity is exacerbated when the study involves fMRI data. In this paper we (i) describe nine challenges that occur when applying causal discovery to fMRI data, (ii) discuss the space of decisions that need to be made, (iii) review how a recent case study made those decisions, and (iv) identify existing gaps that could potentially be solved by the development of new methods. Overall, causal discovery is a promising approach for analyzing fMRI data, and multiple successful applications have indicated that it is superior to traditional fMRI functional connectivity methods, but current causal discovery methods for fMRI leave room for improvement.

Towards Efficient Verification of Quantized Neural Networks. (arXiv:2312.12679v1 [cs.LG])

Authors: Pei Huang, Haoze Wu, Yuting Yang, Ieva Daukantas, Min Wu, Yedi Zhang, Clark Barrett

Quantization replaces floating point arithmetic with integer arithmetic in deep neural network models, providing more efficient on-device inference with less power and memory. In this work, we propose a framework for formally verifying properties of quantized neural networks. Our baseline technique is based on integer linear programming which guarantees both soundness and completeness. We then show how efficiency can be improved by utilizing gradient-based heuristic search methods and also bound-propagation techniques. We evaluate our approach on perception networks quantized with PyTorch. Our results show that we can verify quantized networks with better scalability and efficiency than the previous state of the art.
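
To make the first sentence concrete, here is a minimal sketch of generic affine quantization to 8-bit integers and of a dot product accumulated in integer arithmetic. It does not reproduce PyTorch's quantization scheme or the verification framework from the paper; the function names, the int8 range, and the symmetric (zero-point 0) dot product are assumptions for this example.

```python
def quantize(x, scale, zero_point=0, qmin=-128, qmax=127):
    """Map a real value to an integer code: round(x / scale) + zero_point, clamped."""
    q = round(x / scale) + zero_point
    return max(qmin, min(qmax, q))


def dequantize(q, scale, zero_point=0):
    """Map an integer code back to the real value it represents."""
    return (q - zero_point) * scale


def int_dot(qa, qb, scale_a, scale_b):
    """Dot product of two symmetrically quantized vectors (zero_point = 0).

    The accumulation uses integers only; the scales are applied once at the end.
    """
    acc = sum(a * b for a, b in zip(qa, qb))
    return acc * scale_a * scale_b
```

Values outside the representable range saturate at `qmin` or `qmax`, and rounding introduces an error of at most half a scale step per value; both effects are part of what a verifier for quantized networks has to model.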

How Good Are Deep Generative Models for Solving Inverse Problems?. (arXiv:2312.12691v1 [cs.LG])

Authors: Shichong Peng, Alireza Moazeni, Ke Li

Deep generative models, such as diffusion models, GANs, and IMLE, have shown impressive capability in tackling inverse problems. However, the validity of model-generated solutions w.r.t. the forward problem and the reliability of associated uncertainty estimates remain understudied. This study evaluates recent diffusion-based, GAN-based, and IMLE-based methods on three inverse problems, i.e., $16\times$ super-resolution, colourization, and image decompression. We assess the validity of these models' outputs as solutions to the inverse problems and conduct a thorough analysis of the reliability of the models' estimates of uncertainty over the solution. Overall, we find that the IMLE-based CHIMLE method outperforms other methods in terms of producing valid solutions and reliable uncertainty estimates.

DGCLUSTER: A Neural Framework for Attributed Graph Clustering via Modularity Maximization. (arXiv:2312.12697v1 [cs.LG])

Authors: Aritra Bhowmick, Mert Kosan, Zexi Huang, Ambuj Singh, Sourav Medya

Graph clustering is a fundamental and challenging task in the field of graph mining where the objective is to group the nodes into clusters taking into consideration the topology of the graph. It has several applications in diverse domains spanning social network analysis, recommender systems, computer vision, and bioinformatics. In this work, we propose a novel method, DGCluster, which primarily optimizes the modularity objective using graph neural networks and scales linearly with the graph size. Our method does not require the number of clusters to be specified as a part of the input and can also leverage the availability of auxiliary node level information. We extensively test DGCluster on several real-world datasets of varying sizes, across multiple popular cluster quality metrics. Our approach consistently outperforms the state-of-the-art methods, demonstrating significant performance gains in almost all settings.
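
The modularity objective mentioned above can be illustrated with a small sketch that evaluates the standard Newman-Girvan modularity of a given hard partition, Q = sum over communities c of [L_c / m - (d_c / (2m))^2], where m is the number of edges, L_c the number of edges inside c, and d_c the total degree of c. This only scores a partition; it is not the DGCluster method. An undirected, unweighted graph without self-loops is assumed, and the function name `modularity` is chosen for this example.

```python
from collections import defaultdict


def modularity(edges, community):
    """Modularity of a partition of an undirected, unweighted graph.

    edges:     list of (u, v) pairs, each undirected edge listed once
    community: dict mapping each node to its community label
    """
    m = len(edges)
    if m == 0:
        raise ValueError("modularity is undefined for a graph with no edges")
    intra = defaultdict(int)   # L_c: edges with both endpoints in community c
    degree = defaultdict(int)  # d_c: sum of node degrees in community c
    for u, v in edges:
        degree[community[u]] += 1
        degree[community[v]] += 1
        if community[u] == community[v]:
            intra[community[u]] += 1
    return sum(intra[c] / m - (degree[c] / (2 * m)) ** 2 for c in degree)
```

Placing all nodes in a single community always yields a modularity of 0, so a positive score indicates more intra-community edges than expected under the degree-preserving random baseline.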

Federated Learning with Extremely Noisy Clients via Negative Distillation. (arXiv:2312.12703v1 [cs.LG])

Authors: Yang Lu, Lin Chen, Yonggang Zhang, Yiliang Zhang, Bo Han, Yiu-ming Cheung, Hanzi Wang

Federated learning (FL) has shown remarkable success in cooperatively training deep models, while typically struggling with noisy labels. Advanced works propose to tackle label noise by a re-weighting strategy with a strong assumption, i.e., mild label noise. However, it may be violated in many real-world FL scenarios because of highly contaminated clients, resulting in extreme noise ratios, e.g., $>$90%. To tackle extremely noisy clients, we study the robustness of the re-weighting strategy, showing a pessimistic conclusion: minimizing the weight of clients trained over noisy data outperforms re-weighting strategies. To leverage models trained on noisy clients, we propose a novel approach, called negative distillation (FedNed). FedNed first identifies noisy clients and employs rather than discards the noisy clients in a knowledge distillation manner. In particular, clients identified as noisy ones are required to train models using noisy labels and pseudo-labels obtained by global models. The model trained on noisy labels serves as a `bad teacher' in knowledge distillation, aiming to decrease the risk of providing incorrect information. Meanwhile, the model trained on pseudo-labels is involved in model aggregation if not identified as a noisy client. Consequently, through pseudo-labeling, FedNed gradually increases the trustworthiness of models trained on noisy clients, while leveraging all clients for model aggregation through negative distillation. To verify the efficacy of FedNed, we conduct extensive experiments under various settings, demonstrating that FedNed can consistently outperform baselines and achieve state-of-the-art performance. Our code is available at https://github.com/linChen99/FedNed.

Learning Performance Maximizing Ensembles with Explainability Guarantees. (arXiv:2312.12715v1 [stat.ML])

Authors: Vincent Pisztora, Jia Li

In this paper we propose a method for the optimal allocation of observations between an intrinsically explainable glass box model and a black box model. An optimal allocation is defined as one which, for any given explainability level (i.e., the proportion of observations for which the explainable model is the prediction function), maximizes the performance of the ensemble on the underlying task, and maximizes performance of the explainable model on the observations allocated to it, subject to the maximal ensemble performance condition. The proposed method is shown to produce such explainability optimal allocations on a benchmark suite of tabular datasets across a variety of explainable and black box model types. These learned allocations are found to consistently maintain ensemble performance at very high explainability levels (explaining $74\%$ of observations on average), and in some cases even outperform both the component explainable and black box models while improving explainability.

BloomVQA: Assessing Hierarchical Multi-modal Comprehension. (arXiv:2312.12716v1 [cs.CV])

Authors: Yunye Gong, Robik Shrestha, Jared Claypoole, Michael Cogswell, Arijit Ray, Christopher Kanan, Ajay Divakaran

We propose a novel VQA dataset, based on picture stories designed for educating young children, that aims to facilitate comprehensive evaluation and characterization of vision-language models on comprehension tasks. Unlike current VQA datasets that often focus on fact-based memorization and simple reasoning tasks without principled scientific grounding, we collect data containing tasks reflecting different levels of comprehension and underlying cognitive processes, as laid out in Bloom's Taxonomy, a classic framework widely adopted in education research. The proposed BloomVQA dataset can be mapped to a hierarchical graph-based representation of visual stories, enabling automatic data augmentation and novel measures characterizing model consistency across the underlying taxonomy. We demonstrate graded evaluation and reliability analysis based on our proposed consistency metrics on state-of-the-art vision-language models. Our results suggest that, while current models achieve the most gain on low-level comprehension tasks, they generally fall short on high-level tasks requiring more advanced comprehension and cognitive skills, as a 38.0% drop in VQA accuracy is observed when comparing the lowest- and highest-level tasks. Furthermore, current models show consistency patterns misaligned with human comprehension in various scenarios, suggesting emergent structures of model behaviors.

DoDo-Code: a Deep Levenshtein Distance Embedding-based Code for IDS Channel and DNA Storage. (arXiv:2312.12717v1 [cs.IT])

Authors: Alan J.X. Guo, Sihan Sun, Xiang Wei, Mengyi Wei, Xin Chen

Recently, DNA storage has emerged as a promising data storage solution, offering significant advantages in storage density, maintenance cost efficiency, and parallel replication capability. Mathematically, the DNA storage pipeline can be viewed as an insertion, deletion, and substitution (IDS) channel. Because of the mathematical terra incognita of the Levenshtein distance, designing an IDS-correcting code is still a challenge. In this paper, we propose an innovative approach that utilizes deep Levenshtein distance embedding to bypass these mathematical challenges. By representing the Levenshtein distance between two sequences as a conventional distance between their corresponding embedding vectors, the inherent structural property of Levenshtein distance is revealed in the friendly embedding space. Leveraging this embedding space, we introduce the DoDo-Code, an IDS-correcting code that incorporates deep embedding of Levenshtein distance, deep embedding-based codeword search, and deep embedding-based segment correcting. To address the requirements of DNA storage, we also present a preliminary algorithm for long sequence decoding. As far as we know, the DoDo-Code is the first IDS-correcting code designed using plausible deep learning methodologies, potentially paving the way for a new direction in error-correcting code research. It is also the first IDS code that exhibits characteristics of being `optimal' in terms of redundancy, significantly outperforming the mainstream IDS-correcting codes of the Varshamov-Tenengolts code family in code rate.
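
For reference, the Levenshtein distance that the embedding is trained to approximate counts the minimum number of insertions, deletions, and substitutions needed to turn one sequence into another. Below is a minimal sketch of the classical dynamic-programming computation; it is not part of DoDo-Code itself, and the function name `levenshtein` is chosen for this example.

```python
def levenshtein(a, b):
    """Minimum number of insertions, deletions, and substitutions turning a into b."""
    # prev[j] holds the distance between the processed prefix of a and b[:j].
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to the empty prefix of b
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute, or match at no cost
            ))
        prev = curr
    return prev[-1]
```

For two sequences of lengths n and m the computation takes O(nm) time, which is one reason a learned embedding with a cheap conventional distance is attractive for searching over many candidate codewords.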

Progressive Poisoned Data Isolation for Training-time Backdoor Defense. (arXiv:2312.12724v1 [cs.CR])

Authors: Yiming Chen, Haiwei Wu, Jiantao Zhou

Deep Neural Networks (DNN) are susceptible to backdoor attacks where malicious attackers manipulate the model's predictions via data poisoning. It is hence imperative to develop a strategy for training a clean model using a potentially poisoned dataset. Previous training-time defense mechanisms typically employ a one-time isolation process, often leading to suboptimal isolation outcomes. In this study, we present a novel and efficacious defense method, termed Progressive Isolation of Poisoned Data (PIPD), that progressively isolates poisoned data to enhance the isolation accuracy and mitigate the risk of benign samples being misclassified as poisoned ones. Once the poisoned portion of the dataset has been identified, we introduce a selective training process to train a clean model. Through the implementation of these techniques, we ensure that the trained model manifests a significantly diminished attack success rate against the poisoned data. Extensive experiments on multiple benchmark datasets and DNN models, assessed against nine state-of-the-art backdoor attacks, demonstrate the superior performance of our PIPD method for backdoor defense. For instance, our PIPD achieves an average True Positive Rate (TPR) of 99.95% and an average False Positive Rate (FPR) of 0.06% for diverse attacks on the CIFAR-10 dataset, markedly surpassing the performance of state-of-the-art methods.

Lookahead: An Inference Acceleration Framework for Large Language Model with Lossless Generation Accuracy. (arXiv:2312.12728v1 [cs.IR])

Authors: Yao Zhao, Zhitian Xie, Chenyi Zhuang, Jinjie Gu

As Large Language Models (LLMs) have made significant advancements across various tasks, such as question answering, translation, text summarization, and dialogue systems, the need for accuracy in information becomes crucial, especially for serious financial products serving billions of users like Alipay. To address this, Alipay has developed a Retrieval-Augmented Generation (RAG) system that grounds LLMs on the most accurate and up-to-date information. However, for a real-world product serving millions of users, the inference speed of LLMs becomes a critical factor compared to a mere experimental model.

Hence, this paper presents a generic framework for accelerating the inference process, resulting in a substantial increase in speed and cost reduction for our RAG system, with lossless generation accuracy. In the traditional inference process, each token is generated sequentially by the LLM, leading to a time consumption proportional to the number of generated tokens. To enhance this process, our framework, named lookahead, introduces a multi-branch strategy. Instead of generating a single token at a time, we propose a Trie-based Retrieval (TR) process that enables the generation of multiple branches simultaneously, each of which is a sequence of tokens. Subsequently, for each branch, a Verification and Accept (VA) process is performed to identify the longest correct sub-sequence as the final output. Our strategy offers two distinct advantages: (1) it guarantees absolute correctness of the output, avoiding any approximation algorithms, and (2) the worst-case performance of our approach is equivalent to the conventional process. We conduct extensive experiments to demonstrate the significant improvements achieved by applying our inference acceleration framework.
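
The retrieve-then-verify idea can be illustrated with a toy sketch. Everything here is an assumption made for illustration: the names `Trie`, `greedy_decode`, and `lookahead_decode`, the use of the last context token as the retrieval key, and the stand-in `next_token` callable that plays the role of the LLM's greedy next-token choice. A real system would presumably score all branches in a batched forward pass; this sketch calls the model once per token, so it demonstrates the lossless-output property rather than any speed-up.

```python
from typing import Callable, Dict, List, Sequence


class Trie:
    """Stores previously seen token sequences (hypothetical draft source)."""

    def __init__(self) -> None:
        self.children: Dict[int, "Trie"] = {}

    def insert(self, tokens: Sequence[int]) -> None:
        node = self
        for t in tokens:
            node = node.children.setdefault(t, Trie())

    def branches(self, prefix: Sequence[int], max_len: int) -> List[List[int]]:
        """Return stored continuations of `prefix`, each at most max_len tokens."""
        node = self
        for t in prefix:
            if t not in node.children:
                return []
            node = node.children[t]
        out: List[List[int]] = []

        def walk(n: "Trie", path: List[int]) -> None:
            if path and (not n.children or len(path) == max_len):
                out.append(path)
                return
            for t, child in n.children.items():
                walk(child, path + [t])

        walk(node, [])
        return out


def greedy_decode(next_token: Callable[[List[int]], int],
                  prompt: Sequence[int], n_new: int) -> List[int]:
    """Conventional decoding: one token per model call."""
    out = list(prompt)
    for _ in range(n_new):
        out.append(next_token(out))
    return out


def lookahead_decode(next_token: Callable[[List[int]], int], trie: Trie,
                     prompt: Sequence[int], n_new: int, max_len: int = 4) -> List[int]:
    """Retrieve draft branches, keep the longest prefix the model agrees with."""
    out = list(prompt)
    target = len(prompt) + n_new
    while len(out) < target:
        best: List[int] = []
        for branch in trie.branches(out[-1:], max_len):
            accepted: List[int] = []
            for tok in branch:
                # Verification: accept a draft token only if it equals the
                # token the model itself would have produced at this position.
                if next_token(out + accepted) != tok:
                    break
                accepted.append(tok)
            if len(accepted) > len(best):
                best = accepted
        if not best:
            # Worst case: no draft token accepted, fall back to one ordinary step.
            best = [next_token(out)]
        out.extend(best)
    return out[:target]
```

Because a draft token is accepted only when it matches the model's own choice, the output is identical to `greedy_decode` for any trie content, which mirrors the two advantages listed in the abstract.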

Robustly Improving Bandit Algorithms with Confounded and Selection Biased Offline Data: A Causal Approach. (arXiv:2312.12731v1 [cs.LG])

Authors: Wen Huang, Xintao Wu

This paper studies bandit problems where an agent has access to offline data that might be utilized to potentially improve the estimation of each arm's reward distribution. A major obstacle in this setting is the existence of compound biases from the observational data. Ignoring these biases and blindly fitting a model with the biased data could even negatively affect the online learning phase. In this work, we formulate this problem from a causal perspective. First, we categorize the biases into confounding bias and selection bias based on the causal structure they imply. Next, we extract the causal bound for each arm that is robust towards compound biases from biased observational data. The derived bounds contain the ground truth mean reward and can effectively guide the bandit agent to learn a nearly-optimal decision policy. We also conduct regret analysis in both contextual and non-contextual bandit settings and show that prior causal bounds could help consistently reduce the asymptotic regret.

Learning and Forgetting Unsafe Examples in Large Language Models. (arXiv:2312.12736v1 [cs.CL])

Authors: Jiachen Zhao, Zhun Deng, David Madras, James Zou, Mengye Ren

As the number of large language models (LLMs) released to the public grows, there is a pressing need to understand the safety implications associated with these models learning from third-party custom finetuning data. We explore the behavior of LLMs finetuned on noisy custom data containing unsafe content, represented by datasets that contain biases, toxicity, and harmfulness, finding that while aligned LLMs can readily learn this unsafe content, they also tend to forget it more significantly than other examples when subsequently finetuned on safer content. Drawing inspiration from the discrepancies in forgetting, we introduce the "ForgetFilter" algorithm, which filters unsafe data based on how strong the model's forgetting signal is for that data. We demonstrate that the ForgetFilter algorithm ensures safety in customized finetuning without compromising downstream task performance, unlike sequential safety finetuning. ForgetFilter outperforms alternative strategies like replay and moral self-correction in curbing LLMs' ability to assimilate unsafe content during custom finetuning, e.g. 75% lower than not applying any safety measures and 62% lower than using self-correction in toxicity score.

FSscore: A Machine Learning-based Synthetic Feasibility Score Leveraging Human Expertise. (arXiv:2312.12737v1 [cs.LG])

Authors: Rebecca M. Neeser, Bruno Correia, Philippe Schwaller

Determining whether a molecule can be synthesized is crucial for many aspects of chemistry and drug discovery, allowing prioritization of experimental work and ranking molecules in de novo design tasks. Existing scoring approaches to assess synthetic feasibility struggle to extrapolate to out-of-distribution chemical spaces or fail to discriminate based on minor differences such as chirality that might be obvious to trained chemists. This work aims to address these limitations by introducing the Focused Synthesizability score (FSscore), which learns to rank structures based on binary preferences using a graph attention network. First, a baseline trained on an extensive set of reactant-product pairs is established that subsequently is fine-tuned with expert human feedback on a chemical space of interest. Fine-tuning on focused datasets improves performance on these chemical scopes over the pre-trained model exhibiting moderate performance and generalizability. This enables distinguishing hard- from easy-to-synthesize molecules and improving the synthetic accessibility of generative model outputs. On very complex scopes with limited labels achieving satisfactory gains remains challenging. The FSscore showcases how human expert feedback can be utilized to optimize the assessment of synthetic feasibility for a variety of applications.

Locally Optimal Fixed-Budget Best Arm Identification in Two-Armed Gaussian Bandits with Unknown Variances. (arXiv:2312.12741v1 [cs.LG])

Authors: Masahiro Kato

We address the problem of best arm identification (BAI) with a fixed budget for two-armed Gaussian bandits. In BAI, given multiple arms, we aim to find the best arm, an arm with the highest expected reward, through an adaptive experiment. Kaufmann et al. (2016) develop a lower bound for the probability of misidentifying the best arm. They also propose a strategy, assuming that the variances of rewards are known, and show that it is asymptotically optimal in the sense that its probability of misidentification matches the lower bound as the budget approaches infinity. However, an asymptotically optimal strategy is unknown when the variances are unknown. For this open issue, we propose a strategy that estimates variances during an adaptive experiment and draws arms with a ratio of the estimated standard deviations. We refer to this strategy as the Neyman Allocation (NA)-Augmented Inverse Probability Weighting (AIPW) strategy. We then demonstrate that this strategy is asymptotically optimal by showing that its probability of misidentification matches the lower bound when the budget approaches infinity, and the gap between the expected rewards of two arms approaches zero (small-gap regime). Our results suggest that under the worst-case scenario characterized by the small-gap regime, our strategy, which employs estimated variance, is asymptotically optimal even when the variances are unknown.
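
The sampling rule described above (drawing arms in proportion to the estimated standard deviations) can be sketched as follows. This is only the allocation step under stated assumptions: the function name `neyman_probability` and the `floor` guard against zero variance are illustrative choices, and the AIPW estimator used for the final recommendation is not implemented here.

```python
import statistics
from typing import Sequence


def neyman_probability(samples_1: Sequence[float], samples_2: Sequence[float],
                       floor: float = 1e-8) -> float:
    """Probability of drawing arm 1 next: sd_1 / (sd_1 + sd_2).

    Standard deviations are estimated from the rewards observed so far;
    each arm needs at least two observations for statistics.stdev.
    """
    sd_1 = max(statistics.stdev(samples_1), floor)  # floor avoids division by zero
    sd_2 = max(statistics.stdev(samples_2), floor)
    return sd_1 / (sd_1 + sd_2)
```

In an adaptive experiment this probability would be recomputed after each round, so the noisier arm receives proportionally more draws as the variance estimates stabilize.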

3D-CLMI: A Motor Imagery EEG Classification Model via Fusion of 3D-CNN and LSTM with Attention. (arXiv:2312.12744v1 [cs.HC])

Authors: Shiwei Cheng, Yuejiang Hao

Due to the limitations in the accuracy and robustness of current electroencephalogram (EEG) classification algorithms, applying motor imagery (MI) for practical Brain-Computer Interface (BCI) applications remains challenging. This paper proposed a model that combined a three-dimensional convolutional neural network (CNN) with a long short-term memory (LSTM) network with attention to classify MI-EEG signals. This model combined MI-EEG signals from different channels into three-dimensional features and extracted spatial features through convolution operations with multiple three-dimensional convolutional kernels of different scales. At the same time, to ensure the integrity of the extracted MI-EEG signal temporal features, the LSTM network was directly trained on the preprocessed raw signal. Finally, the features obtained from these two networks were combined and used for classification. Experimental results showed that this model achieved a classification accuracy of 92.7% and an F1-score of 0.91 on the public dataset BCI Competition IV dataset 2a, which were both higher than the state-of-the-art models in the field of MI tasks. Additionally, 12 participants were invited to complete a four-class MI task in our lab, and experiments on the collected dataset showed that the 3D-CLMI model also maintained the highest classification accuracy and F1-score. The model greatly improved the classification accuracy of users' motor imagery intentions, giving brain-computer interfaces better application prospects in emerging fields such as autonomous vehicles and medical rehabilitation.

ALMANACS: A Simulatability Benchmark for Language Model Explainability. (arXiv:2312.12747v1 [cs.LG])

Authors: Edmund Mills, Shiye Su, Stuart Russell, Scott Emmons

How do we measure the efficacy of language model explainability methods? While many explainability methods have been developed, they are typically evaluated on bespoke tasks, preventing an apples-to-apples comparison. To help fill this gap, we present ALMANACS, a language model explainability benchmark. ALMANACS scores explainability methods on simulatability, i.e., how well the explanations improve behavior prediction on new inputs. The ALMANACS scenarios span twelve safety-relevant topics such as ethical reasoning and advanced AI behaviors; they have idiosyncratic premises to invoke model-specific behavior; and they have a train-test distributional shift to encourage faithful explanations. By using another language model to predict behavior based on the explanations, ALMANACS is a fully automated benchmark. We use ALMANACS to evaluate counterfactuals, rationalizations, attention, and Integrated Gradients explanations. Our results are sobering: when averaged across all topics, no explanation method outperforms the explanation-free control. We conclude that despite modest successes in prior work, developing an explanation method that aids simulatability in ALMANACS remains an open challenge.

Segmenting Messy Text: Detecting Boundaries in Text Derived from Historical Newspaper Images. (arXiv:2312.12773v1 [cs.CV])

Authors: Carol Anderson, Phil Crone (Ancestry.com)

Text segmentation, the task of dividing a document into sections, is often a prerequisite for performing additional natural language processing tasks. Existing text segmentation methods have typically been developed and tested using clean, narrative-style text with segments containing distinct topics. Here we consider a challenging text segmentation task: dividing newspaper marriage announcement lists into units of one announcement each. In many cases the information is not structured into sentences, and adjacent segments are not topically distinct from each other. In addition, the text of the announcements, which is derived from images of historical newspapers via optical character recognition, contains many typographical errors. As a result, these announcements are not amenable to segmentation with existing techniques. We present a novel deep learning-based model for segmenting such text and show that it significantly outperforms an existing state-of-the-art method on our task.

DynaLay: An Introspective Approach to Dynamic Layer Selection for Deep Networks. (arXiv:2312.12781v1 [cs.LG])

Authors: Mrinal Mathur, Sergey Plis

Deep learning models have become increasingly computationally intensive, requiring extensive computational resources and time for both training and inference. A significant contributing factor to this challenge is the uniform computational effort expended on each input example, regardless of its complexity. We introduce DynaLay, an alternative architecture that features a decision-making agent to adaptively select the most suitable layers for processing each input, thereby endowing the model with a remarkable level of introspection. DynaLay reevaluates more complex inputs during inference, adjusting the computational effort to optimize both performance and efficiency. The core of the system is a main model equipped with Fixed-Point Iterative (FPI) layers, capable of accurately approximating complex functions, paired with an agent that chooses these layers or a direct action based on the introspection of the model's inner state. The model invests more time in processing harder examples, while minimal computation is required for easier ones. This introspective approach is a step toward developing deep learning models that "think" and "ponder", rather than "ballistically" produce answers. Our experiments demonstrate that DynaLay achieves accuracy comparable to conventional deep models while significantly reducing computational demands.

Fast Cell Library Characterization for Design Technology Co-Optimization Based on Graph Neural Networks. (arXiv:2312.12784v1 [cs.LG])

Authors: Tianliang Ma, Zhihui Deng, Xuguang Sun, Leilai Shao

Design technology co-optimization (DTCO) plays a critical role in achieving optimal power, performance, and area (PPA) for advanced semiconductor process development. Cell library characterization is essential in the DTCO flow, but traditional methods are time-consuming and costly. To overcome these challenges, we propose a graph neural network (GNN)-based machine learning model for rapid and accurate cell library characterization. Our model incorporates cell structures and demonstrates high prediction accuracy across various process-voltage-temperature (PVT) corners and technology parameters. Validation with 512 unseen technology corners and over one million test data points shows accurate predictions of delay, power, and input pin capacitance for 33 types of cells, with a mean absolute percentage error (MAPE) $\le$ 0.95% and a speed-up of 100X compared with SPICE simulations. Additionally, we investigate system-level metrics such as worst negative slack (WNS), leakage power, and dynamic power using predictions obtained from the GNN-based model on unseen corners. Our model achieves precise predictions, with absolute error $\le$ 3.0 ps for WNS, percentage errors $\le$ 0.60% for leakage power, and $\le$ 0.99% for dynamic power, when compared to the golden reference. With the developed model, we further propose a fine-grained drive strength interpolation methodology to enhance PPA for small-to-medium-scale designs, resulting in an approximate 1-3% improvement.
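
As a reminder of how the headline metric is computed, here is a minimal sketch of the mean absolute percentage error, assuming the usual definition with the reference (for example, SPICE) value in the denominator. The function name `mape` and the sample numbers in the usage note are illustrative, not taken from the paper.

```python
from typing import Sequence


def mape(reference: Sequence[float], predicted: Sequence[float]) -> float:
    """Mean absolute percentage error, in percent, relative to the reference values."""
    if not reference or len(reference) != len(predicted):
        raise ValueError("inputs must be non-empty and of equal length")
    # Each term is |reference - predicted| / |reference|; reference values must be non-zero.
    total = sum(abs((r - p) / r) for r, p in zip(reference, predicted))
    return 100.0 * total / len(reference)
```

For instance, predictions of 101 and 198 against reference delays of 100 and 200 each miss by 1%, so the MAPE is about 1%.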

SLP-Net: An efficient lightweight network for segmentation of skin lesions. (arXiv:2312.12789v1 [eess.IV])

Authors: Bo Yang, Hong Peng, Chenggang Guo, Xiaohui Luo, Jun Wang, Xianzhong Long

Prompt treatment for melanoma is crucial. To assist physicians in identifying lesion areas quickly and precisely, we propose a novel skin lesion segmentation technique, namely SLP-Net, an ultra-lightweight segmentation network based on a spiking neural P (SNP) systems-type mechanism. Most existing convolutional neural networks achieve high segmentation accuracy while neglecting the high hardware cost. SLP-Net, on the contrary, has a very small number of parameters and a high computation speed. We design a lightweight multi-scale feature extractor without the usual encoder-decoder structure. In place of a decoder, a feature adaptation module is designed to implement multi-scale information decoding. Experiments on the ISIC2018 challenge demonstrate that the proposed model has the highest Acc and DSC among the state-of-the-art methods, while experiments on the PH2 dataset also demonstrate a favorable generalization ability. Finally, we compare the computational complexity as well as the computational speed of the models in experiments, where SLP-Net shows the best overall performance.

Model-Based Control with Sparse Neural Dynamics. (arXiv:2312.12791v1 [cs.RO])

Authors: Ziang Liu, Genggeng Zhou, Jeff He, Tobia Marcucci, Li Fei-Fei, Jiajun Wu, Yunzhu Li

Learning predictive models from observations using deep neural networks (DNNs) is a promising new approach to many real-world planning and control problems. However, common DNNs are too unstructured for effective planning, and current control methods typically rely on extensive sampling or local gradient descent. In this paper, we propose a new framework for integrated model learning and predictive control that is amenable to efficient optimization algorithms. Specifically, we start with a ReLU neural model of the system dynamics and, with minimal losses in prediction accuracy, we gradually sparsify it by removing redundant neurons. This discrete sparsification process is approximated as a continuous problem, enabling an end-to-end optimization of both the model architecture and the weight parameters. The sparsified model is subsequently used by a mixed-integer predictive controller, which represents the neuron activations as binary variables and employs efficient branch-and-bound algorithms. Our framework is applicable to a wide variety of DNNs, from simple multilayer perceptrons to complex graph neural dynamics. It can efficiently handle tasks involving complicated contact dynamics, such as object pushing, compositional object sorting, and manipulation of deformable objects. Numerical and hardware experiments show that, despite the aggressive sparsification, our framework can deliver better closed-loop performance than existing state-of-the-art methods.

Bandit Sequential Posted Pricing via Half-Concavity. (arXiv:2312.12794v1 [cs.LG])

Authors: Sahil Singla, Yifan Wang

Sequential posted pricing auctions are popular because of their simplicity in practice and their tractability in theory. A usual assumption in their study is that the Bayesian prior distributions of the buyers are known to the seller, while in reality these priors can only be accessed from historical data. To overcome this assumption, we study sequential posted pricing in the bandit learning model, where the seller interacts with $n$ buyers over $T$ rounds: In each round the seller posts $n$ prices for the $n$ buyers and the first buyer with a valuation higher than the price takes the item. The only feedback that the seller receives in each round is the revenue.

Our main results obtain nearly-optimal regret bounds for single-item sequential posted pricing in the bandit learning model. In particular, we achieve an $\tilde{O}(\mathsf{poly}(n)\sqrt{T})$ regret for buyers with (Myerson's) regular distributions and an $\tilde{O}(\mathsf{poly}(n)T^{{2}/{3}})$ regret for buyers with general distributions, both of which are tight in the number of rounds $T$. Our result for regular distributions was previously not known even for the single-buyer setting and relies on a new half-concavity property of the revenue function in the value space. For $n$ sequential buyers, our technique is to run a generalized single-buyer algorithm for all the buyers and to carefully bound the regret from the sub-optimal pricing of the suffix buyers.

Near-Optimal Resilient Aggregation Rules for Distributed Learning Using 1-Center and 1-Mean Clustering with Outliers. (arXiv:2312.12835v1 [cs.LG])

Authors: Yuhao Yi, Ronghui You, Hong Liu, Changxin Liu, Yuan Wang, Jiancheng Lv

Byzantine machine learning has garnered considerable attention in light of the unpredictable faults that can occur in large-scale distributed learning systems. The key to secure resilience against Byzantine machines in distributed learning is resilient aggregation mechanisms. Although abundant resilient aggregation rules have been proposed, they are designed in an ad hoc manner, imposing extra barriers on comparing, analyzing, and improving the rules across performance criteria. This paper studies near-optimal aggregation rules using clustering in the presence of outliers. Our outlier-robust clustering approach utilizes geometric properties of the update vectors provided by workers. Our analysis shows that constant approximations to the 1-center and 1-mean clustering problems with outliers provide near-optimal resilient aggregators for metric-based criteria, which have been proven to be crucial in the homogeneous and heterogeneous cases respectively. In addition, we discuss two contradicting types of attacks under which no single aggregation rule is guaranteed to improve upon the naive average. Based on the discussion, we propose a two-phase resilient aggregation framework. We run experiments for image classification using a non-convex loss function. The proposed algorithms outperform previously known aggregation rules by a large margin with both homogeneous and heterogeneous data distributions among non-faulty workers. Code and appendix are available at https://github.com/jerry907/AAAI24-RASHB.

FedA3I: Annotation Quality-Aware Aggregation for Federated Medical Image Segmentation Against Heterogeneous Annotation Noise. (arXiv:2312.12838v1 [cs.LG])

Authors: Nannan Wu, Zhaobin Sun, Zengqiang Yan, Li Yu

Federated learning (FL) has emerged as a promising paradigm for training segmentation models on decentralized medical data, owing to its privacy-preserving property. However, existing research overlooks the prevalent annotation noise encountered in real-world medical datasets, which limits the performance ceilings of FL. In this paper, we, for the first time, identify and tackle this problem. For problem formulation, we propose a contour evolution for modeling non-independent and identically distributed (Non-IID) noise across pixels within each client and then extend it to the case of multi-source data to form a heterogeneous noise model (i.e., Non-IID annotation noise across clients). For robust learning from annotations with such two-level Non-IID noise, we emphasize the importance of data quality in model aggregation, allowing high-quality clients to have a greater impact on FL. To achieve this, we propose Federated learning with Annotation quAlity-aware AggregatIon, named FedA$^3$I, by introducing a quality factor based on client-wise noise estimation. Specifically, noise estimation at each client is accomplished through the Gaussian mixture model and then incorporated into model aggregation in a layer-wise manner to up-weight high-quality clients. Extensive experiments on two real-world medical image segmentation datasets demonstrate the superior performance of FedA$^3$I against the state-of-the-art approaches in dealing with cross-client annotation noise. The code is available at https://github.com/wnn2000/FedAAAI.

Comparing Machine Learning Algorithms by Union-Free Generic Depth. (arXiv:2312.12839v1 [cs.LG])

Authors: Hannah Blocher, Georg Schollmeyer, Malte Nalenz, Christoph Jansen

We propose a framework for descriptively analyzing sets of partial orders based on the concept of depth functions. Despite intensive studies in linear and metric spaces, there is very little discussion on depth functions for non-standard data types such as partial orders. We introduce an adaptation of the well-known simplicial depth to the set of all partial orders, the union-free generic (ufg) depth. Moreover, we utilize our ufg depth for a comparison of machine learning algorithms based on multidimensional performance measures. Concretely, we provide two examples of classifier comparisons on samples of standard benchmark data sets. Our results demonstrate promisingly the wide variety of different analysis approaches based on ufg methods. Furthermore, the examples outline that our approach differs substantially from existing benchmarking approaches, and thus adds a new perspective to the vivid debate on classifier comparison.

Causal Discovery under Identifiable Heteroscedastic Noise Model. (arXiv:2312.12844v1 [cs.LG])

Authors: Naiyu Yin, Tian Gao, Yue Yu, Qiang Ji

Capturing the underlying structural causal relations represented by Directed Acyclic Graphs (DAGs) has been a fundamental task in various AI disciplines. Causal DAG learning via the continuous optimization framework has recently achieved promising performance in terms of both accuracy and efficiency. However, most methods make strong assumptions of homoscedastic noise, i.e., exogenous noises have equal variances across variables, observations, or even both. The noises in real data usually violate both assumptions due to the biases introduced by different data collection processes. To address the issue of heteroscedastic noise, we introduce relaxed and implementable sufficient conditions, proving the identifiability of a general class of SEM subject to these conditions. Based on the identifiable general SEM, we propose a novel formulation for DAG learning that accounts for the variation in noise variance across variables and observations. We then propose an effective two-phase iterative DAG learning algorithm to address the increasing optimization difficulties and to learn a causal DAG from data with heteroscedastic variable noise under varying variance. We show significant empirical gains of the proposed approaches over state-of-the-art methods on both synthetic data and real data.

Divergences induced by dual subtractive and divisive normalizations of exponential families and their convex deformations. (arXiv:2312.12849v1 [cs.IT])

Authors: Frank Nielsen

Exponential families are statistical models which are the workhorses in statistics, information theory, and machine learning. An exponential family can either be normalized subtractively by its cumulant function or equivalently normalized divisively by its partition function. Both subtractive and divisive normalizers are strictly convex and smooth functions inducing pairs of Bregman and Jensen divergences. It is well-known that skewed Bhattacharyya distances between probability densities of an exponential family amount to skewed Jensen divergences induced by the cumulant function between their corresponding natural parameters, and in limit cases that the sided Kullback-Leibler divergences amount to reverse-sided Bregman divergences. In this note, we first show that the $\alpha$-divergences between unnormalized densities of an exponential family amount to scaled $\alpha$-skewed Jensen divergences induced by the partition function. We then show how comparative convexity with respect to a pair of quasi-arithmetic means allows one to deform convex functions and define dually flat spaces with corresponding divergences when ordinary convexity is preserved.
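
The well-known identities mentioned above can be written out explicitly. The notation is an assumption made for this sketch (natural parameter $\theta$, sufficient statistic $t(x)$, cumulant function $F$, partition function $Z = e^{F}$, base measure $\mu$) and may differ from the paper's; the paper's new result for unnormalized densities and its scaling factor are not reproduced here.

```latex
% Subtractive vs. divisive normalization of an exponential family
p_\theta(x) = \exp\bigl(\langle \theta, t(x)\rangle - F(\theta)\bigr)
            = \frac{\exp\langle \theta, t(x)\rangle}{Z(\theta)},
\qquad F(\theta) = \log Z(\theta).

% Bregman divergence and alpha-skewed Jensen divergence induced by F
B_F(\theta_1 : \theta_2) = F(\theta_1) - F(\theta_2)
  - \langle \theta_1 - \theta_2, \nabla F(\theta_2) \rangle,
\qquad
J_{F,\alpha}(\theta_1 : \theta_2) = \alpha F(\theta_1) + (1-\alpha) F(\theta_2)
  - F\bigl(\alpha\theta_1 + (1-\alpha)\theta_2\bigr), \quad \alpha \in (0,1).

% Skewed Bhattacharyya distance equals the skewed Jensen divergence
D_{B,\alpha}(p_{\theta_1} : p_{\theta_2})
  = -\log \int p_{\theta_1}^{\alpha}\, p_{\theta_2}^{1-\alpha}\, \mathrm{d}\mu
  = J_{F,\alpha}(\theta_1 : \theta_2).

% Kullback-Leibler divergence equals the reverse-sided Bregman divergence
D_{\mathrm{KL}}(p_{\theta_1} : p_{\theta_2}) = B_F(\theta_2 : \theta_1).
```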

SkyScript: A Large and Semantically Diverse Vision-Language Dataset for Remote Sensing. (arXiv:2312.12856v1 [cs.CV])

Authors: Zhecheng Wang, Rajanie Prabha, Tianyuan Huang, Jiajun Wu, Ram Rajagopal

Remote sensing imagery, despite its broad applications in helping achieve Sustainable Development Goals and tackle climate change, has not yet benefited from the recent advancements of versatile, task-agnostic vision language models (VLMs). A key reason is that the large-scale, semantically diverse image-text dataset required for developing VLMs is still absent for remote sensing images. Unlike natural images, remote sensing images and their associated text descriptions cannot be efficiently collected from the public Internet at scale. In this work, we bridge this gap by using geo-coordinates to automatically connect open, unlabeled remote sensing images with rich semantics covered in OpenStreetMap, and thus construct SkyScript, a comprehensive vision-language dataset for remote sensing images, comprising 2.6 million image-text pairs covering 29K distinct semantic tags. With continual pre-training on this dataset, we obtain a VLM that surpasses baseline models with a 6.2% average accuracy gain in zero-shot scene classification across seven benchmark datasets. It also demonstrates the ability of zero-shot transfer for fine-grained object attribute classification and cross-modal retrieval. We hope this dataset can support the advancement of VLMs for various multi-modal tasks in remote sensing, such as open-vocabulary classification, retrieval, captioning, and text-to-image synthesis.

Federated Learning While Providing Model as a Service: Joint Training and Inference Optimization. (arXiv:2312.12863v1 [cs.DC])

Authors: Pengchao Han, Shiqiang Wang, Yang Jiao, Jianwei Huang

While providing a machine learning model as a service to process users' inference requests, online applications can periodically upgrade the model utilizing newly collected data. Federated learning (FL) is beneficial for enabling the training of models across distributed clients while keeping the data local. However, existing work has overlooked the coexistence of model training and inference under clients' limited resources. This paper focuses on the joint optimization of model training and inference to maximize inference performance at clients. Such an optimization faces several challenges. The first challenge is to characterize the clients' inference performance when clients may partially participate in FL. To resolve this challenge, we introduce a new notion of age of model (AoM) to quantify client-side model freshness, based on which we use FL's global model convergence error as an approximate measure of inference performance. The second challenge is the tight coupling among clients' decisions, including participation probability in FL, model download probability, and service rates. To address these challenges, we propose an online problem approximation to reduce the problem complexity and optimize the resources to balance the needs of model training and inference. Experimental results demonstrate that the proposed algorithm improves the average inference accuracy by up to 12%.

Parameterized Projected Bellman Operator. (arXiv:2312.12869v1 [cs.LG])

Authors: Théo Vincent, Alberto Maria Metelli, Boris Belousov, Jan Peters, Marcello Restelli, Carlo D'Eramo

Approximate value iteration (AVI) is a family of algorithms for reinforcement learning (RL) that aims to obtain an approximation of the optimal value function. Generally, AVI algorithms implement an iterated procedure where each step consists of (i) an application of the Bellman operator and (ii) a projection step into a considered function space. Notoriously, the Bellman operator leverages transition samples, which strongly determine its behavior, as uninformative samples can result in negligible updates or long detours, whose detrimental effects are further exacerbated by the computationally intensive projection step. To address these issues, we propose a novel alternative approach based on learning an approximate version of the Bellman operator rather than estimating it through samples as in AVI approaches. This way, we are able to (i) generalize across transition samples and (ii) avoid the computationally intensive projection step. For this reason, we call our novel operator projected Bellman operator (PBO). We formulate an optimization problem to learn PBO for generic sequential decision-making problems, and we theoretically analyze its properties in two representative classes of RL problems. Furthermore, we theoretically study our approach under the lens of AVI and devise algorithmic implementations to learn PBO in offline and online settings by leveraging neural network parameterizations. Finally, we empirically showcase the benefits of PBO w.r.t. the regular Bellman operator on several RL problems.
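
For readers new to AVI, the two-step iteration described in this abstract (Bellman backup, then projection) can be sketched on a toy problem. This is only a minimal illustration of plain AVI, not of the PBO method. The two-state MDP, the function names `step`, `bellman_backup`, `project` and `approximate_value_iteration`, and the choice of a tabular function space (for which the projection is the identity) are all assumptions made for the example.

```python
# Toy approximate value iteration (AVI): Bellman backup followed by projection.
# Hypothetical 2-state, 2-action deterministic MDP, chosen only for illustration.
GAMMA = 0.9
STATES = (0, 1)
ACTIONS = (0, 1)  # 0 = stay, 1 = move to the other state


def step(state, action):
    """Return (reward, next_state) for the toy MDP."""
    next_state = state if action == 0 else 1 - state
    reward = 1.0 if (state == 1 and action == 0) else 0.0
    return reward, next_state


def bellman_backup(q):
    """Apply the optimal Bellman operator to a Q-table stored as a dict."""
    targets = {}
    for s in STATES:
        for a in ACTIONS:
            reward, s_next = step(s, a)
            targets[(s, a)] = reward + GAMMA * max(q[(s_next, b)] for b in ACTIONS)
    return targets


def project(targets):
    """Project the targets onto the function space.

    For a tabular space this is the identity. With a parametric function
    class it would be a regression fit, i.e. the costly step of AVI.
    """
    return dict(targets)


def approximate_value_iteration(num_iterations=200):
    """Alternate backup and projection, starting from an all-zero Q-table."""
    q = {(s, a): 0.0 for s in STATES for a in ACTIONS}
    for _ in range(num_iterations):
        q = project(bellman_backup(q))
    return q


q_star = approximate_value_iteration()
```

In this toy problem the iteration converges to Q(1, stay) = 10, Q(0, move) = 9 and Q(0, stay) = Q(1, move) = 8.1, which follows from solving the Bellman equations by hand with a discount of 0.9.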

Effect Size Estimation for Duration Recommendation in Online Experiments: Leveraging Hierarchical Models and Objective Utility Approaches. (arXiv:2312.12871v1 [cs.LG])

Authors: Yu Liu, Runzhe Wan, James McQueen, Doug Hains, Jinxiang Gu, Rui Song

The selection of the assumed effect size (AES) critically determines the duration of an experiment, and hence its accuracy and efficiency. Traditionally, experimenters determine AES based on domain knowledge. However, this method becomes impractical for online experimentation services managing numerous experiments, and a more automated approach is hence in great demand. We initiate the study of data-driven AES selection for online experimentation services by introducing two solutions. The first employs a three-layer Gaussian Mixture Model considering the heteroskedasticity across experiments, and it seeks to estimate the true expected effect size among positive experiments. The second method, grounded in utility theory, aims to determine the optimal effect size by striking a balance between the experiment's cost and the precision of decision-making. Through comparisons with baseline methods using both simulated and real data, we showcase the superior performance of the proposed approaches.
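
To see why the assumed effect size drives experiment duration, a standard power calculation can serve as a worked example. The abstract does not say which test the authors use, so the sketch below assumes a two-sided, two-sample z-test with equal variances. The function names `samples_per_arm` and `duration_days` and the traffic figure are hypothetical.

```python
import math
from statistics import NormalDist


def samples_per_arm(effect_size, sigma=1.0, alpha=0.05, power=0.8):
    """Per-arm sample size for a two-sided, two-sample z-test (equal variances).

    Uses n = 2 * ((z_{1 - alpha/2} + z_{power}) * sigma / effect_size) ** 2.
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    return math.ceil(2 * ((z_alpha + z_beta) * sigma / effect_size) ** 2)


def duration_days(effect_size, daily_users_per_arm, **kwargs):
    """Days needed to collect the required sample at a fixed traffic rate."""
    return math.ceil(samples_per_arm(effect_size, **kwargs) / daily_users_per_arm)


n_large_effect = samples_per_arm(0.10)   # assumed effect of 0.10 standard deviations
n_small_effect = samples_per_arm(0.05)   # halving the assumed effect
days_needed = duration_days(0.10, daily_users_per_arm=500)
```

Because the required sample size scales with the inverse square of the effect size, halving the AES roughly quadruples the number of users, and therefore the duration, that the experiment needs.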

Rule-Extraction Methods From Feedforward Neural Networks: A Systematic Literature Review. (arXiv:2312.12878v1 [cs.LG])

Authors: Sara El Mekkaoui, Loubna Benabbou, Abdelaziz Berrado

Motivated by the interpretability question in ML models as a crucial element for the successful deployment of AI systems, this paper focuses on rule extraction as a means for neural networks interpretability. Through a systematic literature review, different approaches for extracting rules from feedforward neural networks, an important block in deep learning models, are identified and explored. The findings reveal a range of methods developed for over two decades, mostly suitable for shallow neural networks, with recent developments to meet deep learning models' challenges. Rules offer a transparent and intuitive means of explaining neural networks, making this study a comprehensive introduction for researchers interested in the field. While the study specifically addresses feedforward networks with supervised learning and crisp rules, future work can extend to other network types, machine learning methods, and fuzzy rule extraction.

Testing the Segment Anything Model on radiology data. (arXiv:2312.12880v1 [eess.IV])

Authors: José Guilherme de Almeida, Nuno M. Rodrigues, Sara Silva, Nickolas Papanikolaou

Deep learning models trained with large amounts of data have become a recent and effective approach to predictive problem solving -- these have become known as "foundation models" as they can be used as fundamental tools for other applications. While the paramount examples of image classification (earlier) and large language models (more recently) led the way, the Segment Anything Model (SAM) was recently proposed and stands as the first foundation model for image segmentation, trained on over 10 million images and with recourse to over 1 billion masks. However, the question remains -- what are the limits of this foundation? Given that magnetic resonance imaging (MRI) stands as an important method of diagnosis, we sought to understand whether SAM could be used for a few tasks of zero-shot segmentation using MRI data. Particularly, we wanted to know if selecting masks from the pool of SAM predictions could lead to good segmentations.

Here, we provide a critical assessment of the performance of SAM on magnetic resonance imaging data. We show that, while acceptable in a very limited set of cases, the overall trend implies that these models are insufficient for MRI segmentation across the whole volume, but can provide good segmentations in a few, specific slices. More importantly, we note that while foundation models trained on natural images are set to become key aspects of predictive modelling, they may prove ineffective when used on other imaging modalities.

BSL: Understanding and Improving Softmax Loss for Recommendation. (arXiv:2312.12882v1 [cs.LG])

Authors: Junkang Wu, Jiawei Chen, Jiancan Wu, Wentao Shi, Jizhi Zhang, Xiang Wang

Loss functions steer the optimization direction of recommendation models and are critical to model performance, but have received relatively little attention in recent recommendation research. Among various losses, we find Softmax loss (SL) stands out for not only achieving remarkable accuracy but also better robustness and fairness. Nevertheless, the current literature lacks a comprehensive explanation for the efficacy of SL. Toward addressing this research gap, we conduct theoretical analyses on SL and uncover three insights: 1) Optimizing SL is equivalent to performing Distributionally Robust Optimization (DRO) on the negative data, thereby learning against perturbations on the negative distribution and yielding robustness to noisy negatives. 2) Compared with other loss functions, SL implicitly penalizes the prediction variance, resulting in a smaller gap between predicted values and thus producing fairer results. Building on these insights, we further propose a novel loss function Bilateral SoftMax Loss (BSL) that extends the advantage of SL to both positive and negative sides. BSL augments SL by applying the same Log-Expectation-Exp structure to positive examples as is used for negatives, making the model robust to the noisy positives as well. Remarkably, BSL is simple and easy-to-implement -- requiring just one additional line of code compared to SL. Experiments on four real-world datasets and three representative backbones demonstrate the effectiveness of our proposal. The code is available at https://github.com/junkangwu/BSL
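
The Log-Expectation-Exp structure mentioned in this abstract can be illustrated with a small sketch. The abstract gives no formulas, so everything below is an assumption: the temperature-scaled form of the sampled softmax loss, the use of a mean rather than a sum over negatives, and in particular the bilateral variant, which is one plausible reading of "applying the same structure to positive examples" and is not taken from the authors' code. The function names are hypothetical.

```python
import math


def log_mean_exp(scores, tau):
    """Log-Expectation-Exp operator: tau * log(mean(exp(s / tau))).

    Computed with the usual max-shift for numerical stability.
    """
    m = max(scores)
    mean_exp = sum(math.exp((s - m) / tau) for s in scores) / len(scores)
    return m + tau * math.log(mean_exp)


def softmax_loss(pos_scores, neg_scores, tau=0.5):
    """Sampled softmax loss for one user (assumed form).

    Positives enter through a plain average; negatives enter through the
    Log-Expectation-Exp operator.
    """
    pos_term = sum(pos_scores) / len(pos_scores)
    return -pos_term + log_mean_exp(neg_scores, tau)


def bilateral_softmax_loss(pos_scores, neg_scores, tau_pos=0.5, tau_neg=0.5):
    """Assumed bilateral variant: the same operator is also applied to positives."""
    return -log_mean_exp(pos_scores, tau_pos) + log_mean_exp(neg_scores, tau_neg)


# Hypothetical scores: the second positive has a low score, as a noisy positive might.
positives = [2.0, -1.0]
negatives = [0.0, 0.5]
sl_value = softmax_loss(positives, negatives)
bsl_value = bilateral_softmax_loss(positives, negatives)
```

Under these assumptions the two losses coincide when a user has a single positive, and with several positives the low-scoring one contributes less to the bilateral positive term than it does to the plain average.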

A Minimal Control Family of Dynamical System for Universal Approximation. (arXiv:2312.12903v1 [eess.SY])

Authors: Yifei Duan, Yongqiang Cai

The universal approximation property (UAP) of neural networks is a fundamental characteristic of deep learning. It is widely recognized that a composition of linear functions and non-linear functions, such as the rectified linear unit (ReLU) activation function, can approximate continuous functions on compact domains. In this paper, we extend this efficacy to the scenario of dynamical systems with controls. We prove that the control family $\mathcal{F}_1 = \mathcal{F}_0 \cup \{ \text{ReLU}(\cdot)\} $ is enough to generate flow maps that can uniformly approximate diffeomorphisms of $\mathbb{R}^d$ on any compact domain, where $\mathcal{F}_0 = \{x \mapsto Ax+b: A\in \mathbb{R}^{d\times d}, b \in \mathbb{R}^d\}$ is the set of linear maps and the dimension $d\ge2$. Since $\mathcal{F}_1$ contains only one nonlinear function and $\mathcal{F}_0$ does not hold the UAP, we call $\mathcal{F}_1$ a minimal control family for UAP. Based on this, some sufficient conditions, such as the affine invariance, on the control family are established and discussed. Our result reveals an underlying connection between the approximation power of neural networks and control systems.

PGN: A perturbation generation network against deep reinforcement learning. (arXiv:2312.12904v1 [cs.LG])

Authors: Xiangjuan Li, Feifan Li, Yang Li, Quan Pan

Deep reinforcement learning has advanced greatly and has been applied in many areas. In this paper, we explore the vulnerability of deep reinforcement learning by proposing a novel generative model for creating effective adversarial examples to attack the agent. Our proposed model can achieve both targeted attacks and untargeted attacks. Considering the specificity of deep reinforcement learning, we propose the action consistency ratio as a measure of stealthiness, and a new measurement index of effectiveness and stealthiness. Experimental results show that our method can ensure the effectiveness and stealthiness of attacks compared with other algorithms. Moreover, our methods are considerably faster and thus can achieve rapid and efficient verification of the vulnerability of deep reinforcement learning.

Energy-efficient Spiking Neural Network Equalization for IM/DD Systems with Optimized Neural Encoding. (arXiv:2312.12909v1 [eess.SP])

Authors: Alexander von Bank, Eike-Manuel Edelmann, Laurent Schmalen

We propose an energy-efficient equalizer for IM/DD systems based on spiking neural networks. We optimize a neural spike encoding that boosts the equalizer's performance while decreasing energy consumption.

Stability of Graph Convolutional Neural Networks through the lens of small perturbation analysis. (arXiv:2312.12934v1 [cs.LG])

Authors: Lucia Testa, Claudio Battiloro, Stefania Sardellitti, Sergio Barbarossa

In this work, we study the problem of stability of Graph Convolutional Neural Networks (GCNs) under random small perturbations in the underlying graph topology, i.e. under a limited number of insertions or deletions of edges. We derive a novel bound on the expected difference between the outputs of unperturbed and perturbed GCNs. The proposed bound explicitly depends on the magnitude of the perturbation of the eigenpairs of the Laplacian matrix, and the perturbation explicitly depends on which edges are inserted or deleted. Then, we provide a quantitative characterization of the effect of perturbing specific edges on the stability of the network. We leverage tools from small perturbation analysis to express the bounds in closed, albeit approximate, form, in order to enhance interpretability of the results, without the need to compute any perturbed shift operator. Finally, we numerically evaluate the effectiveness of the proposed bound.

Robust Loss Functions for Training Decision Trees with Noisy Labels. (arXiv:2312.12937v1 [cs.LG])

Authors: Jonathan Wilton, Nan Ye

We consider training decision trees using noisily labeled data, focusing on loss functions that can lead to robust learning algorithms. Our contributions are threefold. First, we offer novel theoretical insights on the robustness of many existing loss functions in the context of decision tree learning. We show that some of the losses belong to a class of what we call conservative losses, and the conservative losses lead to an early stopping behavior during training and noise-tolerant predictions during testing. Second, we introduce a framework for constructing robust loss functions, called distribution losses. These losses apply percentile-based penalties based on an assumed margin distribution, and they naturally allow adapting to different noise rates via a robustness parameter. In particular, we introduce a new loss called the negative exponential loss, which leads to an efficient greedy impurity-reduction learning algorithm. Lastly, our experiments on multiple datasets and noise settings validate our theoretical insight and the effectiveness of our adaptive negative exponential loss.

Misclassification excess risk bounds for 1-bit matrix completion. (arXiv:2312.12945v1 [cs.LG])

Authors: The Tien Mai

This study investigates the misclassification excess risk bound in the context of 1-bit matrix completion, a significant problem in machine learning involving the recovery of an unknown matrix from a limited subset of its entries. Matrix completion has garnered considerable attention in the last two decades due to its diverse applications across various fields. Unlike conventional approaches that deal with real-valued samples, 1-bit matrix completion is concerned with binary observations. While prior research has predominantly focused on the estimation error of proposed estimators, our study shifts attention to the prediction error. This paper offers theoretical analysis regarding the prediction errors of two previous works utilizing the logistic regression model: one employing a max-norm constrained minimization and the other employing nuclear-norm penalization. Significantly, our findings demonstrate that the latter achieves the minimax-optimal rate without the need for an additional logarithmic term. These novel results contribute to a deeper understanding of 1-bit matrix completion by shedding light on the predictive performance of specific methodologies.

Class Conditional Time Series Generation with Structured Noise Space GAN. (arXiv:2312.12946v1 [cs.LG])

Authors: Hamidreza Gholamrezaei, Alireza Koochali, Andreas Dengel, Sheraz Ahmed

This paper introduces Structured Noise Space GAN (SNS-GAN), a novel approach in the field of generative modeling specifically tailored for class-conditional generation in both image and time series data. It addresses the challenge of effectively integrating class labels into generative models without requiring structural modifications to the network. The SNS-GAN method embeds class conditions within the generator's noise space, simplifying the training process and enhancing model versatility. The model's efficacy is demonstrated through qualitative validations in the image domain and superior performance in time series generation compared to baseline models. This research opens new avenues for the application of GANs in various domains, including but not limited to time series and image data generation.

From Past to Future: Rethinking Eligibility Traces. (arXiv:2312.12972v1 [cs.LG])

Authors: Dhawal Gupta, Scott M. Jordan, Shreyas Chaudhari, Bo Liu, Philip S. Thomas, Bruno Castro da Silva

In this paper, we introduce a fresh perspective on the challenges of credit assignment and policy evaluation. First, we delve into the nuances of eligibility traces and explore instances where their updates may result in unexpected credit assignment to preceding states. From this investigation emerges the concept of a novel value function, which we refer to as the bidirectional value function. Unlike traditional state value functions, bidirectional value functions account for both future expected returns (rewards anticipated from the current state onward) and past expected returns (cumulative rewards from the episode's start to the present). We derive principled update equations to learn this value function and, through experimentation, demonstrate its efficacy in enhancing the process of policy evaluation. In particular, our results indicate that the proposed learning approach can, in certain challenging contexts, perform policy evaluation more rapidly than TD($\lambda$) -- a method that learns forward value functions, $v^\pi$, directly. Overall, our findings present a new perspective on eligibility traces and potential advantages associated with the novel value function it inspires, especially for policy evaluation.
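
As background for the baseline named in this abstract, the sketch below shows textbook tabular TD($\lambda$) with accumulating eligibility traces and how a trace passes credit back to preceding states. It does not implement the bidirectional value function. The function name `td_lambda_episode`, the two-state episode, and the step size are assumptions chosen for illustration.

```python
def td_lambda_episode(values, episode, alpha=0.1, gamma=1.0, lam=0.9):
    """Run tabular TD(lambda) with accumulating traces over one episode.

    values:  dict mapping state -> current value estimate (updated in place).
    episode: list of (state, reward, next_state); next_state is None at termination.
    """
    traces = {s: 0.0 for s in values}
    for state, reward, next_state in episode:
        v_next = 0.0 if next_state is None else values[next_state]
        delta = reward + gamma * v_next - values[state]  # TD error
        for s in traces:
            traces[s] *= gamma * lam                     # decay all traces
        traces[state] += 1.0                             # mark the visited state
        for s in values:
            values[s] += alpha * delta * traces[s]       # credit every traced state
    return values


# Hypothetical episode: A -> B -> terminal, with reward 1 on the final transition.
episode = [("A", 0.0, "B"), ("B", 1.0, None)]
with_traces = td_lambda_episode({"A": 0.0, "B": 0.0}, episode, alpha=0.5, lam=1.0)
without_traces = td_lambda_episode({"A": 0.0, "B": 0.0}, episode, alpha=0.5, lam=0.0)
```

With `lam=1.0` the final reward updates both A and B to 0.5 in a single pass, whereas with `lam=0.0` only B is updated and A stays at 0.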

Sparse Mean Field Load Balancing in Large Localized Queueing Systems. (arXiv:2312.12973v1 [cs.DC])

Authors: Anam Tahir, Kai Cui, Heinz Koeppl

Scalable load balancing algorithms are of great interest in cloud networks and data centers, necessitating the use of tractable techniques to compute optimal load balancing policies for good performance. However, most existing scalable techniques, especially asymptotically scaling methods based on mean field theory, have not been able to model large queueing networks with strong locality. Meanwhile, general multi-agent reinforcement learning techniques can be hard to scale and usually lack a theoretical foundation. In this work, we address this challenge by leveraging recent advances in sparse mean field theory to learn a near-optimal load balancing policy in sparsely connected queueing networks in a tractable manner, which may be preferable to global approaches in terms of communication overhead. Importantly, we obtain a general load balancing framework for a large class of sparse bounded-degree topologies. By formulating a novel mean field control problem in the context of graphs with bounded degree, we reduce the otherwise difficult multi-agent problem to a single-agent problem. Theoretically, the approach is justified by approximation guarantees. Empirically, the proposed methodology performs well on several realistic and scalable network topologies. Moreover, we compare it with a number of well-known load balancing heuristics and with existing scalable multi-agent reinforcement learning methods. Overall, we obtain a tractable approach for load balancing in highly localized networks.

Collaborative Optimization of the Age of Information under Partial Observability. (arXiv:2312.12977v1 [cs.MA])

Authors: Anam Tahir, Kai Cui, Bastian Alt, Amr Rizk, Heinz Koeppl

The significance of the freshness of sensor and control data at the receiver side, often referred to as Age of Information (AoI), is fundamentally constrained by contention for limited network resources. Evidently, network congestion is detrimental for AoI, where this congestion is partly self-induced by the sensor transmission process in addition to the contention from other transmitting sensors. In this work, we devise a decentralized AoI-minimizing transmission policy for a number of sensor agents sharing capacity-limited, non-FIFO duplex channels that introduce random delays in communication with a common receiver. By implementing the same policy, however with no explicit inter-agent communication, the agents minimize the expected AoI in this partially observable system. We cater to the partial observability due to random channel delays by designing a bootstrap particle filter that independently maintains a belief over the AoI of each agent. We also leverage mean-field control approximations and reinforcement learning to derive scalable and optimal solutions for minimizing the expected AoI collaboratively.
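
The AoI metric discussed in this abstract has a simple standard definition: at time t it is t minus the generation time of the freshest update received so far. The sketch below computes it for a channel that can deliver updates out of order, as in the non-FIFO setting mentioned above. The function name `age_of_information` and the timestamps are made up for illustration, and the sketch says nothing about the paper's transmission policy.

```python
def age_of_information(updates, query_times):
    """Return the AoI at each query time.

    updates:     list of (generated_at, delivered_at) pairs, in any order.
    query_times: times at which the receiver's AoI is evaluated.
    The age is None before the first delivery, when it is undefined.
    """
    ages = []
    for t in query_times:
        # Generation times of all updates that have arrived by time t.
        received = [generated for generated, delivered in updates if delivered <= t]
        ages.append(t - max(received) if received else None)
    return ages


# The update generated at t=2 is overtaken by the one generated at t=3.
updates = [(0.0, 1.0), (2.0, 6.0), (3.0, 4.0)]
ages = age_of_information(updates, [0.5, 1.0, 3.5, 4.0, 6.0, 7.0])
# ages == [None, 1.0, 3.5, 1.0, 3.0, 4.0]
```

The late arrival at t=6 does not reduce the age, because a fresher update (generated at t=3) has already been received. Stale deliveries of this kind consume channel capacity without improving AoI.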

Benchmarking and Analyzing In-context Learning, Fine-tuning and Supervised Learning for Biomedical Knowledge Curation: a focused study on chemical entities of biological interest. (arXiv:2312.12989v1 [cs.LG])

Authors: Emily Groves, Minhong Wang, Yusuf Abdulle, Holger Kunz, Jason Hoelscher-Obermaier, Ronin Wu, Honghan Wu

Automated knowledge curation for biomedical ontologies is key to ensure that they remain comprehensive, high-quality and up-to-date. In the era of foundational language models, this study compares and analyzes three NLP paradigms for curation tasks: in-context learning (ICL), fine-tuning (FT), and supervised learning (ML). Using the Chemical Entities of Biological Interest (ChEBI) database as a model ontology, three curation tasks were devised. For ICL, three prompting strategies were employed with GPT-4, GPT-3.5, and BioGPT. PubmedBERT was chosen for the FT paradigm. For ML, six embedding models were utilized for training Random Forest and Long-Short Term Memory models. Five setups were designed to assess ML and FT model performance across different data availability scenarios. Datasets for curation tasks included: task 1 (620,386), task 2 (611,430), and task 3 (617,381), maintaining a 50:50 positive versus negative ratio. For ICL models, GPT-4 achieved best accuracy scores of 0.916, 0.766 and 0.874 for tasks 1-3 respectively. In a direct comparison, ML (trained on ~260,000 triples) outperformed ICL in accuracy across all tasks (accuracy differences: +.11, +.22 and +.17). Fine-tuned PubmedBERT performed similarly to leading ML models in tasks 1 & 2 (F1 differences: -.014 and +.002), but worse in task 3 (-.048). Simulations revealed performance declines in both ML and FT models with smaller and more highly imbalanced training data, where ICL (particularly GPT-4) excelled in tasks 1 & 3. GPT-4 excelled in tasks 1 and 3 with fewer than 6,000 triples, surpassing ML/FT. ICL underperformed ML/FT in task 2. ICL-augmented foundation models can be good assistants for knowledge curation with correct prompting; however, they do not make the ML and FT paradigms obsolete. The latter two require task-specific data to beat ICL. In such cases, ML relies on small pretrained embeddings, minimizing computational demands.

No More Shortcuts: Realizing the Potential of Temporal Self-Supervision. (arXiv:2312.13008v1 [cs.CV])

Authors: Ishan Rajendrakumar Dave, Simon Jenni, Mubarak Shah

Self-supervised approaches for video have shown impressive results in video understanding tasks. However, unlike early works that leverage temporal self-supervision, current state-of-the-art methods primarily rely on tasks from the image domain (e.g., contrastive learning) that do not explicitly promote the learning of temporal features. We identify two factors that limit existing temporal self-supervision: 1) tasks are too simple, resulting in saturated training performance, and 2) we uncover shortcuts based on local appearance statistics that hinder the learning of high-level features. To address these issues, we propose 1) a more challenging reformulation of temporal self-supervision as frame-level (rather than clip-level) recognition tasks and 2) an effective augmentation strategy to mitigate shortcuts. Our model extends a representation of single video frames, pre-trained through contrastive learning, with a transformer that we train through temporal self-supervision. We demonstrate experimentally that our more challenging frame-level task formulations and the removal of shortcuts drastically improve the quality of features learned through temporal self-supervision. The generalization capability of our self-supervised video method is evidenced by its state-of-the-art performance in a wide range of high-level semantic tasks, including video retrieval, action classification, and video attribute recognition (such as object and scene identification), as well as low-level temporal correspondence tasks like video object segmentation and pose tracking. Additionally, we show that the video representations learned through our method exhibit increased robustness to the input perturbations.

Doubly Perturbed Task-Free Continual Learning. (arXiv:2312.13027v1 [cs.LG])

Authors: Byung Hyun Lee, Min-hwan Oh, Se Young Chun

Task-free online continual learning (TF-CL) is a challenging problem where the model incrementally learns tasks without explicit task information. Although training with the entire data from the past, present, and future is considered the gold standard, naive approaches in TF-CL that use only the current samples may conflict with learning from future samples, leading to catastrophic forgetting and poor plasticity. Thus, a proactive consideration of an unseen future sample in TF-CL becomes imperative. Motivated by this intuition, we propose a novel TF-CL framework considering future samples and show that injecting adversarial perturbations on both input data and decision-making is effective. Then, we propose a novel method named Doubly Perturbed Continual Learning (DPCL) to efficiently implement these input and decision-making perturbations. Specifically, for input perturbation, we propose an approximate perturbation method that injects noise into the input data as well as the feature vector and then interpolates the two perturbed samples. For decision-making process perturbation, we devise multiple stochastic classifiers. We also investigate a memory management scheme and learning rate scheduling reflecting our proposed double perturbations. We demonstrate that our proposed method outperforms the state-of-the-art baseline methods by large margins on various TF-CL benchmarks.

A self-attention-based differentially private tabular GAN with high data utility. (arXiv:2312.13031v1 [cs.LG])

Authors: Zijian Li, Zhihui Wang

Generative Adversarial Networks (GANs) have become a ubiquitous technology for data generation, with their prowess in image generation being well-established. However, their application in generating tabular data has been less than ideal. Furthermore, attempting to incorporate differential privacy technology into these frameworks has often resulted in a degradation of data utility. To tackle these challenges, this paper introduces DP-SACTGAN, a novel Conditional Generative Adversarial Network (CGAN) framework for differentially private tabular data generation, aiming to surmount these obstacles. Experimental findings demonstrate that DP-SACTGAN not only accurately models the distribution of the original data but also effectively satisfies the requirements of differential privacy.

NodeMixup: Tackling Under-Reaching for Graph Neural Networks. (arXiv:2312.13032v1 [cs.LG])

Authors: Weigang Lu, Ziyu Guan, Wei Zhao, Long Jin

Graph Neural Networks (GNNs) have become mainstream methods for solving the semi-supervised node classification problem. However, due to the uneven location distribution of labeled nodes in the graph, labeled nodes are only accessible to a small portion of unlabeled nodes, leading to the under-reaching issue. In this study, we first reveal under-reaching by conducting an empirical investigation on various well-known graphs. Then, we demonstrate that under-reaching results in unsatisfactory distribution alignment between labeled and unlabeled nodes through systematic experimental analysis, significantly degrading GNNs' performance. To tackle under-reaching for GNNs, we propose an architecture-agnostic method dubbed NodeMixup. The fundamental idea is to (1) increase the reachability of labeled nodes by labeled-unlabeled pairs mixup, (2) leverage graph structures via fusing the neighbor connections of intra-class node pairs to improve performance gains of mixup, and (3) use neighbor label distribution similarity incorporating node degrees to determine sampling weights for node mixup. Extensive experiments demonstrate the efficacy of NodeMixup in assisting GNNs in handling under-reaching. The source code is available at https://github.com/WeigangLu/NodeMixup.
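
The basic mixup operation that NodeMixup builds on is a convex combination of two samples. The sketch below shows generic mixup on one labeled-unlabeled node pair and is not the authors' implementation: the function name `mixup`, the feature vectors, and the use of a pseudo-label distribution for the unlabeled node are assumptions, and the neighbor-fusion and sampling-weight components of steps (2) and (3) are not covered.

```python
def mixup(x_a, y_a, x_b, y_b, lam):
    """Return the convex combination of two (features, label distribution) pairs."""
    if not 0.0 <= lam <= 1.0:
        raise ValueError("lam must lie in [0, 1]")
    x_mix = [lam * a + (1.0 - lam) * b for a, b in zip(x_a, x_b)]
    y_mix = [lam * a + (1.0 - lam) * b for a, b in zip(y_a, y_b)]
    return x_mix, y_mix


# Hypothetical labeled node: features and a one-hot label.
x_labeled, y_labeled = [1.0, 0.0], [1.0, 0.0]
# Hypothetical unlabeled node: features and an assumed pseudo-label distribution.
x_unlabeled, y_pseudo = [0.0, 2.0], [0.25, 0.75]

x_mix, y_mix = mixup(x_labeled, y_labeled, x_unlabeled, y_pseudo, lam=0.5)
# x_mix == [0.5, 1.0], y_mix == [0.625, 0.375]
```

In mixup-style training the coefficient `lam` is commonly drawn from a Beta distribution, for example with `random.betavariate(alpha, alpha)`, rather than fixed as it is here.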

Explainable artificial intelligence approaches for brain-computer interfaces: a review and design space. (arXiv:2312.13033v1 [cs.HC])

Authors: Param Rajpura, Hubert Cecotti, Yogesh Kumar Meena

This review paper provides an integrated perspective of Explainable Artificial Intelligence (XAI) techniques applied to Brain-Computer Interfaces (BCIs). BCIs use predictive models to interpret brain signals for various high-stakes applications. However, achieving explainability in these complex models is challenging as it compromises accuracy. The field of XAI has emerged to address the need for explainability across various stakeholders, but there is a lack of an integrated perspective in XAI for BCI (XAI4BCI) literature. It is necessary to differentiate key concepts like explainability, interpretability, and understanding in this context and formulate a comprehensive framework. To understand the need for XAI in BCI, we pose six key research questions for a systematic review and meta-analysis, encompassing its purposes, applications, usability, and technical feasibility. We employ the PRISMA methodology (preferred reporting items for systematic reviews and meta-analyses) to review (n=1246) and analyze (n=84) studies published in 2015 and onwards for key insights. The results highlight that current research primarily focuses on interpretability for developers and researchers, aiming to justify outcomes and enhance model performance. We discuss the unique approaches, advantages, and limitations of XAI4BCI from the literature. We draw insights from philosophy, psychology, and social sciences. We propose a design space for XAI4BCI, considering the evolving need to visualize and investigate predictive model outcomes customised for various stakeholders in the BCI development and deployment lifecycle. This paper is the first to focus solely on reviewing XAI4BCI research articles. The findings of this systematic review and meta-analysis, together with the proposed design space, prompt important discussions on establishing standards for BCI explanations, highlighting current limitations, and guiding the future of XAI in BCI.

1D-CNN Optimization for Non-contact Respiration Pattern Classification. (arXiv:2312.13035v1 [eess.SP])

Authors: Md Zobaer Islam, Gary Yen

In this study, we present a deep learning-based approach for time-series respiration data classification. The dataset contains regular breathing patterns as well as various forms of abnormal breathing, obtained through non-contact incoherent light-wave sensing (LWS) technology. Given the one-dimensional (1D) nature of the data, we employed a 1D convolutional neural network (1D-CNN) for classification purposes. A genetic algorithm was employed to optimize the 1D-CNN architecture to maximize classification accuracy. To address the computational complexity associated with training the 1D-CNN across multiple generations, we implemented transfer learning from a pre-trained model. This approach significantly reduced the computational time required for training, thereby enhancing the efficiency of the optimization process. This study contributes valuable insights into the potential applications of deep learning methodologies for enhancing respiratory anomaly detection through precise and efficient respiration classification.

AutoXPCR: Automated Multi-Objective Model Selection for Time Series Forecasting. (arXiv:2312.13038v1 [cs.LG])

Authors: Raphael Fischer, Amal Saadallah

Automated machine learning (AutoML) streamlines the creation of ML models. While most methods select the "best" model based on predictive quality, it's crucial to acknowledge other aspects, such as interpretability and resource consumption. This holds particular importance in the context of deep neural networks (DNNs), as these models are often perceived as computationally intensive black boxes. In the challenging domain of time series forecasting, DNNs achieve stunning results, but specialized approaches for automatically selecting models are scarce. In this paper, we propose AutoXPCR - a novel method for automated and explainable multi-objective model selection. Our approach leverages meta-learning to estimate any model's performance along PCR criteria, which encompass (P)redictive error, (C)omplexity, and (R)esource demand. Explainability is addressed on multiple levels, as our interactive framework can prioritize less complex models and provide by-product explanations of recommendations. We demonstrate practical feasibility by deploying AutoXPCR on over 1000 configurations across 114 data sets from various domains. Our method clearly outperforms other model selection approaches - on average, it only requires 20% of computation costs for recommending models with 90% of the best-possible quality.

Continuous-time Graph Representation with Sequential Survival Process. (arXiv:2312.13068v1 [cs.LG])

Authors: Abdulkadir Celikkanat, Nikolaos Nakis, Morten Mørup

Over the past two decades, there has been tremendous growth in representation learning methods for graphs, with numerous applications across various fields, including bioinformatics, chemistry, and the social sciences. However, current dynamic network approaches focus on discrete-time networks or treat links in continuous-time networks as instantaneous events. Therefore, these approaches have limitations in capturing the persistence or absence of links that continuously emerge and disappear over time for particular durations. To address this, we propose a novel stochastic process relying on survival functions to model the durations of links and their absences over time. This forms a generic new likelihood specification explicitly accounting for intermittent edge-persistent networks, namely GraSSP: Graph Representation with Sequential Survival Process. We apply the developed framework to a recent continuous time dynamic latent distance model characterizing network dynamics in terms of a sequence of piecewise linear movements of nodes in latent space. We quantitatively assess the developed framework in various downstream tasks, such as link prediction and network completion, demonstrating that the developed modeling framework, which accounts for link persistence and absence, tracks the intrinsic trajectories of nodes in a latent space well and captures the underlying characteristics of evolving network structure.

Pyreal: A Framework for Interpretable ML Explanations. (arXiv:2312.13084v1 [cs.LG])

Authors: Alexandra Zytek, Wei-En Wang, Dongyu Liu, Laure Berti-Equille, Kalyan Veeramachaneni

Users in many domains use machine learning (ML) predictions to help them make decisions. Effective ML-based decision-making often requires explanations of ML models and their predictions. While there are many algorithms that explain models, generating explanations in a format that is comprehensible and useful to decision-makers is a nontrivial task that can require extensive development overhead. We developed Pyreal, a highly extensible system with a corresponding Python implementation for generating a variety of interpretable ML explanations. Pyreal converts data and explanations between the feature spaces expected by the model, relevant explanation algorithms, and human users, allowing users to generate interpretable explanations in a low-code manner. Our studies demonstrate that Pyreal generates more useful explanations than existing systems while remaining both easy-to-use and efficient.

MoSAR: Monocular Semi-Supervised Model for Avatar Reconstruction using Differentiable Shading. (arXiv:2312.13091v1 [cs.CV])

Authors: Abdallah Dib, Luiz Gustavo Hafemann, Emeline Got, Trevor Anderson, Amin Fadaeinejad, Rafael M. O. Cruz, Marc-Andre Carbonneau

Reconstructing an avatar from a portrait image has many applications in multimedia, but remains a challenging research problem. Extracting reflectance maps and geometry from one image is ill-posed: recovering geometry is a one-to-many mapping problem and reflectance and light are difficult to disentangle. Accurate geometry and reflectance can be captured under the controlled conditions of a light stage, but it is costly to acquire large datasets in this fashion. Moreover, training solely with this type of data leads to poor generalization with in-the-wild images. This motivates the introduction of MoSAR, a method for 3D avatar generation from monocular images. We propose a semi-supervised training scheme that improves generalization by learning from both light stage and in-the-wild datasets. This is achieved using a novel differentiable shading formulation. We show that our approach effectively disentangles the intrinsic face parameters, producing relightable avatars. As a result, MoSAR estimates a richer set of skin reflectance maps, and generates more realistic avatars than existing state-of-the-art methods. We also introduce a new dataset, named FFHQ-UV-Intrinsics, the first public dataset providing intrinsic face attributes at scale (diffuse, specular, ambient occlusion and translucency maps) for a total of 10k subjects. The project website and the dataset are available at the following link: https://ubisoftlaforge.github.io/character/mosar

Pre-training of Molecular GNNs as Conditional Boltzmann Generator. (arXiv:2312.13110v1 [cs.LG])

Authors: Daiki Koge, Naoaki Ono, Shigehiko Kanaya

Learning representations of molecular structures using deep learning is a fundamental problem in molecular property prediction tasks. Molecules inherently exist in the real world as three-dimensional structures; furthermore, they are not static but in continuous motion in the 3D Euclidean space, forming a potential energy surface. Therefore, it is desirable to generate multiple conformations in advance and extract molecular representations using a 4D-QSAR model that incorporates multiple conformations. However, this approach is impractical for drug and material discovery tasks because of the computational cost of obtaining multiple conformations. To address this issue, we propose a pre-training method for molecular GNNs using an existing dataset of molecular conformations to generate a latent vector universal to multiple conformations from a 2D molecular graph. Our method, called Boltzmann GNN, is formulated by maximizing the conditional marginal likelihood of a conditional generative model for conformation generation. We show that our model has better prediction performance for molecular properties than existing pre-training methods using molecular graphs and three-dimensional molecular structures.

LRS: Enhancing Adversarial Transferability through Lipschitz Regularized Surrogate. (arXiv:2312.13118v1 [cs.LG])

Authors: Tao Wu, Tie Luo, Donald C. Wunsch

The transferability of adversarial examples is of central importance to transfer-based black-box adversarial attacks. Previous works for generating transferable adversarial examples focus on attacking \emph{given} pretrained surrogate models while the connections between surrogate models and adversarial transferability have been overlooked. In this paper, we propose {\em Lipschitz Regularized Surrogate} (LRS) for transfer-based black-box attacks, a novel approach that transforms surrogate models towards favorable adversarial transferability. Using such transformed surrogate models, any existing transfer-based black-box attack can run without any change, yet achieve much better performance. Specifically, we impose Lipschitz regularization on the loss landscape of surrogate models to enable a smoother and more controlled optimization process for generating more transferable adversarial examples. In addition, this paper also sheds light on the connection between the inner properties of surrogate models and adversarial transferability, where three factors are identified: smaller local Lipschitz constant, smoother loss landscape, and stronger adversarial robustness. We evaluate our proposed LRS approach by attacking state-of-the-art standard deep neural networks and defense models. The results demonstrate significant improvement in the attack success rates and transferability. Our code is available at https://github.com/TrustAIoT/LRS.

Prometheus: Infrastructure Security Posture Analysis with AI-generated Attack Graphs. (arXiv:2312.13119v1 [cs.CR])

Authors: Xin Jin, Charalampos Katsis, Fan Sang, Jiahao Sun, Elisa Bertino, Ramana Rao Kompella, Ashish Kundu

The rampant occurrence of cybersecurity breaches imposes substantial limitations on the progress of network infrastructures, leading to compromised data, financial losses, potential harm to individuals, and disruptions in essential services. The current security landscape demands the urgent development of a holistic security assessment solution that encompasses vulnerability analysis and investigates the potential exploitation of these vulnerabilities as attack paths. In this paper, we propose Prometheus, an advanced system designed to provide a detailed analysis of the security posture of computing infrastructures. Using user-provided information, such as device details and software versions, Prometheus performs a comprehensive security assessment. This assessment includes identifying associated vulnerabilities and constructing potential attack graphs that adversaries can exploit. Furthermore, Prometheus evaluates the exploitability of these attack paths and quantifies the overall security posture through a scoring mechanism. The system takes a holistic approach by analyzing security layers encompassing hardware, system, network, and cryptography. Furthermore, Prometheus delves into the interconnections between these layers, exploring how vulnerabilities in one layer can be leveraged to exploit vulnerabilities in others. In this paper, we present the end-to-end pipeline implemented in Prometheus, showcasing the systematic approach adopted for conducting this thorough security analysis.

Distribution-Dependent Rates for Multi-Distribution Learning. (arXiv:2312.13130v1 [stat.ML])

Authors: Rafael Hanashiro, Patrick Jaillet

To address the needs of modeling uncertainty in sensitive machine learning applications, the setup of distributionally robust optimization (DRO) seeks good performance uniformly across a variety of tasks. The recent multi-distribution learning (MDL) framework tackles this objective in a dynamic interaction with the environment, where the learner has sampling access to each target distribution. Drawing inspiration from the field of pure-exploration multi-armed bandits, we provide distribution-dependent guarantees in the MDL regime that scale with suboptimality gaps and result in superior dependence on the sample size when compared to the existing distribution-independent analyses. We investigate two non-adaptive strategies, uniform and non-uniform exploration, and present non-asymptotic regret bounds using novel tools from empirical process theory. Furthermore, we devise an adaptive optimistic algorithm, LCB-DR, that showcases enhanced dependence on the gaps, mirroring the contrast between uniform and optimistic allocation in the multi-armed bandit literature.

Scaling Compute Is Not All You Need for Adversarial Robustness. (arXiv:2312.13131v1 [cs.LG])

Authors: Edoardo Debenedetti, Zishen Wan, Maksym Andriushchenko, Vikash Sehwag, Kshitij Bhardwaj, Bhavya Kailkhura

The last six years have witnessed significant progress in adversarially robust deep learning. As evidenced by the CIFAR-10 dataset category in the RobustBench benchmark, the accuracy under $\ell_\infty$ adversarial perturbations improved from 44\% in \citet{Madry2018Towards} to 71\% in \citet{peng2023robust}. Although impressive, the existing state of the art is still far from satisfactory. It is further observed that the best-performing models are often very large models adversarially trained by industrial labs with significant computational budgets. In this paper, we aim to understand: "how much longer can computing power drive adversarial robustness advances?" To answer this question, we derive \emph{scaling laws for adversarial robustness} which can be extrapolated into the future to provide an estimate of how much cost we would need to pay to reach a desired level of robustness. We show that increasing the FLOPs needed for adversarial training does not bring as much advantage as it does for standard training in terms of performance improvements. Moreover, we find that some of the top-performing techniques are difficult to exactly reproduce, suggesting that they are not robust to minor changes in the training setup. Our analysis also uncovers potentially worthwhile directions to pursue in future research. Finally, we make our benchmarking framework (built on top of \texttt{timm}~\citep{rw2019timm}) publicly available to facilitate future analysis in efficient robust deep learning.

Molecular Hypergraph Neural Networks. (arXiv:2312.13136v1 [physics.chem-ph])

Authors: Junwu Chen, Philippe Schwaller

Graph neural networks (GNNs) have demonstrated promising performance across various chemistry-related tasks. However, conventional graphs only model the pairwise connectivity in molecules, failing to adequately represent higher-order connections like multi-center bonds and conjugated structures. To tackle this challenge, we introduce molecular hypergraphs and propose Molecular Hypergraph Neural Networks (MHNN) to predict the optoelectronic properties of organic semiconductors, where hyperedges represent conjugated structures. A general algorithm is designed for irregular high-order connections, which can efficiently operate on molecular hypergraphs with hyperedges of various orders. The results show that MHNN outperforms all baseline models on most tasks of OPV, OCELOTv1 and PCQM4Mv2 datasets. Notably, MHNN achieves this without any 3D geometric information, surpassing the baseline model that utilizes atom positions. Moreover, MHNN achieves better performance than pretrained GNNs under limited training data, underscoring its excellent data efficiency. This work provides a new strategy for more general molecular representations and property prediction tasks related to high-order connections.

Augment on Manifold: Mixup Regularization with UMAP. (arXiv:2312.13141v1 [cs.LG])

Authors: Yousef El-Laham, Elizabeth Fons, Dillon Daudert, Svitlana Vyetrenko

Data augmentation techniques play an important role in enhancing the performance of deep learning models. Despite their proven benefits in computer vision tasks, their application in other domains remains limited. This paper proposes a Mixup regularization scheme, referred to as UMAP Mixup, designed for "on-manifold" automated data augmentation for deep learning predictive models. The proposed approach ensures that the Mixup operations result in synthesized samples that lie on the data manifold of the features and labels by utilizing a dimensionality reduction technique known as uniform manifold approximation and projection. Evaluations across diverse regression tasks show that UMAP Mixup is competitive with or outperforms other Mixup variants, showing promise as an effective tool for enhancing the generalization performance of deep learning models.
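For readers unfamiliar with the baseline operation, the following is a minimal sketch of standard Mixup for a regression sample, not the UMAP Mixup variant proposed in the paper. The function name, the default alpha value, and the use of plain Python lists are assumptions made for illustration.

```python
import random


def mixup(x1, y1, x2, y2, alpha=0.4, rng=random):
    """Standard Mixup: blend two (features, target) pairs.

    The mixing weight lam is drawn from Beta(alpha, alpha), and the same
    weight is applied to both the features and the regression target.
    """
    lam = rng.betavariate(alpha, alpha)
    x = [lam * a + (1.0 - lam) * b for a, b in zip(x1, x2)]
    y = lam * y1 + (1.0 - lam) * y2
    return x, y, lam


# Seeded generator so the example is repeatable.
rng = random.Random(0)
x_mix, y_mix, lam = mixup([0.0, 0.0], 0.0, [1.0, 2.0], 1.0, rng=rng)
```

Because the blended point is a straight-line interpolation in input space, it can fall off the data manifold, which is the limitation that an on-manifold scheme aims to address.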

Underwater Acoustic Signal Recognition Based on Salient Features. (arXiv:2312.13143v1 [cs.SD])

Authors: Minghao Chen

With the rapid advancement of technology, the recognition of underwater acoustic signals in complex environments has become increasingly crucial. Currently, mainstream underwater acoustic signal recognition relies primarily on time-frequency analysis to extract spectral features, finding widespread applications in the field. However, existing recognition methods heavily depend on expert systems, facing limitations such as restricted knowledge bases and challenges in handling complex relationships. These limitations stem from the complexity and maintenance difficulties associated with rules or inference engines. Recognizing the potential advantages of deep learning in handling intricate relationships, this paper proposes a method utilizing neural networks for underwater acoustic signal recognition. The proposed approach involves continual learning of features extracted from spectra for the classification of underwater acoustic signals. Deep learning models can automatically learn abstract features from data and continually adjust weights during training to enhance classification performance.

Neural Stochastic Differential Equations with Change Points: A Generative Adversarial Approach. (arXiv:2312.13152v1 [cs.LG])

Authors: Zhongchang Sun, Yousef El-Laham, Svitlana Vyetrenko

Stochastic differential equations (SDEs) have been widely used to model real world random phenomena. Existing works mainly focus on the case where the time series is modeled by a single SDE, which might be restrictive for modeling time series with distributional shift. In this work, we propose a change point detection algorithm for time series modeled as neural SDEs. Given a time series dataset, the proposed method jointly learns the unknown change points and the parameters of distinct neural SDE models corresponding to each change point. Specifically, the SDEs are learned under the framework of generative adversarial networks (GANs) and the change points are detected based on the output of the GAN discriminator in a forward pass. At each step of the proposed algorithm, the change points and the SDE model parameters are updated in an alternating fashion. Numerical results on both synthetic and real datasets are provided to validate the performance of our algorithm in comparison to classical change point detection benchmarks, standard GAN-based neural SDEs, and other state-of-the-art deep generative models for time series data.

Gappy local conformal auto-encoders for heterogeneous data fusion: in praise of rigidity. (arXiv:2312.13155v1 [cs.LG])

Authors: Erez Peterfreund, Iryna Burak, Ofir Lindenbaum, Jim Gimlett, Felix Dietrich, Ronald R. Coifman, Ioannis G. Kevrekidis

Fusing measurements from multiple, heterogeneous, partial sources, observing a common object or process, poses challenges due to the increasing availability of numbers and types of sensors. In this work we propose, implement and validate an end-to-end computational pipeline in the form of a multiple-auto-encoder neural network architecture for this task. The inputs to the pipeline are several sets of partial observations, and the result is a globally consistent latent space, harmonizing (rigidifying, fusing) all measurements. The key enabler is the availability of multiple slightly perturbed measurements of each instance: local measurement "bursts" that allow us to estimate the local distortion induced by each instrument. We demonstrate the approach in a sequence of examples, starting with simple two-dimensional data sets and proceeding to a Wi-Fi localization problem and to the solution of a "dynamical puzzle" arising in spatio-temporal observations of the solutions of Partial Differential Equations.

Learning Fair Policies for Multi-stage Selection Problems from Observational Data. (arXiv:2312.13173v1 [cs.LG])

Authors: Zhuangzhuang Jia, Grani A. Hanasusanto, Phebe Vayanos, Weijun Xie

We consider the problem of learning fair policies for multi-stage selection problems from observational data. This problem arises in several high-stakes domains such as company hiring, loan approval, or bail decisions where outcomes (e.g., career success, loan repayment, recidivism) are only observed for those selected. We propose a multi-stage framework that can be augmented with various fairness constraints, such as demographic parity or equal opportunity. This problem is a highly intractable infinite chance-constrained program involving the unknown joint distribution of covariates and outcomes. Motivated by the potential impact of selection decisions on people's lives and livelihoods, we propose to focus on interpretable linear selection rules. Leveraging tools from causal inference and sample average approximation, we obtain an asymptotically consistent solution to this selection problem by solving a mixed binary conic optimization problem, which can be solved using standard off-the-shelf solvers. We conduct extensive computational experiments on a variety of datasets adapted from the UCI repository on which we show that our proposed approaches can achieve an 11.6% improvement in precision and a 38% reduction in the measure of unfairness compared to the existing selection policy.

Measurement-based quantum computation from Clifford quantum cellular automata. (arXiv:2312.13185v1 [quant-ph])

Authors: Hendrik Poulsen Nautrup, Hans J. Briegel

Measurement-based quantum computation (MBQC) is a paradigm for quantum computation where computation is driven by local measurements on a suitably entangled resource state. In this work we show that MBQC is related to a model of quantum computation based on Clifford quantum cellular automata (CQCA). Specifically, we show that certain MBQCs can be directly constructed from CQCAs, which yields a simple and intuitive circuit model representation of MBQC in terms of quantum computation based on CQCA. We apply this description to construct various MBQC-based Ansätze for parameterized quantum circuits, demonstrating that the different Ansätze may lead to significantly different performances on different learning tasks. In this way, MBQC yields a family of hardware-efficient Ansätze that may be adapted to specific problem settings and is particularly well suited for architectures with translationally invariant gates such as neutral atoms.

A 3D super-resolution of wind fields via physics-informed pixel-wise self-attention generative adversarial network. (arXiv:2312.13212v1 [physics.ao-ph])

Authors: Takuya Kurihana, Kyongmin Yeo, Daniela Szwarcman, Bruce Elmegreen, Karthik Mukkavilli, Johannes Schmude, Levente Klein

To mitigate global warming, greenhouse gas sources need to be resolved at a high spatial resolution and monitored in time to ensure the reduction and ultimately elimination of the pollution source. However, the computational complexity of resolving high-resolution wind fields makes it impractical to run simulations that test different time lengths and model configurations. This study presents a preliminary development of a physics-informed super-resolution (SR) generative adversarial network (GAN) that super-resolves three-dimensional (3D) low-resolution wind fields, upscaling them by a factor of 9. We develop a pixel-wise self-attention (PWA) module that learns 3D weather dynamics via a self-attention computation followed by a 2D convolution. We also employ a loss term that regularizes the self-attention map during pretraining, capturing the vertical convection process from input wind data. The new PWA SR-GAN produces high-fidelity super-resolved 3D wind data, learns wind structure in the high-frequency domain, and reduces the computational cost of a high-resolution wind simulation by a factor of 89.7.

FiFAR: A Fraud Detection Dataset for Learning to Defer. (arXiv:2312.13218v1 [cs.LG])

Authors: Jean V. Alves, Diogo Leitão, Sérgio Jesus, Marco O. P. Sampaio, Pedro Saleiro, Mário A. T. Figueiredo, Pedro Bizarro

Public dataset limitations have significantly hindered the development and benchmarking of learning to defer (L2D) algorithms, which aim to optimally combine human and AI capabilities in hybrid decision-making systems. In such systems, human availability and domain-specific concerns introduce difficulties, while obtaining human predictions for training and evaluation is costly. Financial fraud detection is a high-stakes setting where algorithms and human experts often work in tandem; however, there are no publicly available datasets for L2D concerning this important application of human-AI teaming. To fill this gap in L2D research, we introduce the Financial Fraud Alert Review Dataset (FiFAR), a synthetic bank account fraud detection dataset, containing the predictions of a team of 50 highly complex and varied synthetic fraud analysts, with varied bias and feature dependence. We also provide a realistic definition of human work capacity constraints, an aspect of L2D systems that is often overlooked, allowing for extensive testing of assignment systems under real-world conditions. We use our dataset to develop a capacity-aware L2D method and rejection learning approach under realistic data availability conditions, and benchmark these baselines under an array of 300 distinct testing scenarios. We believe that this dataset will serve as a pivotal instrument in facilitating a systematic, rigorous, reproducible, and transparent evaluation and comparison of L2D methods, thereby fostering the development of more synergistic human-AI collaboration in decision-making systems. The public dataset and detailed synthetic expert information are available at: https://github.com/feedzai/fifar-dataset

Finding Subgroups with Significant Treatment Effects. (arXiv:2103.07066v2 [econ.EM] UPDATED)

Authors: Jann Spiess, Vasilis Syrgkanis, Victor Yaneng Wang

Researchers often run resource-intensive randomized controlled trials (RCTs) to estimate the causal effects of interventions on outcomes of interest. Yet these outcomes are often noisy, and estimated overall effects can be small or imprecise. Nevertheless, we may still be able to produce reliable evidence of the efficacy of an intervention by finding subgroups with significant effects. In this paper, we propose a machine-learning method that is specifically optimized for finding such subgroups in noisy data. Unlike available methods for personalized treatment assignment, our tool is fundamentally designed to take significance testing into account: it produces a subgroup that is chosen to maximize the probability of obtaining a statistically significant positive treatment effect. We provide a computationally efficient implementation using decision trees and demonstrate its gain over selecting subgroups based on positive (estimated) treatment effects. Compared to standard tree-based regression and classification tools, this approach tends to yield higher power in detecting subgroups affected by the treatment.
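The distinction between a large estimated effect and a statistically significant one can be illustrated with a two-sample test statistic. The sketch below computes a Welch-style t-statistic for the difference in means between treated and control outcomes within a subgroup. It is not the authors' method: the function name and the choice of this particular statistic are assumptions made for illustration.

```python
import math
from statistics import mean, variance


def welch_t(treated, control):
    """Welch-style t-statistic for the difference in mean outcomes.

    The numerator is the estimated treatment effect; the denominator is
    its standard error, which shrinks as the subgroup grows.
    """
    standard_error = math.sqrt(
        variance(treated) / len(treated) + variance(control) / len(control)
    )
    return (mean(treated) - mean(control)) / standard_error


# Hypothetical outcomes: estimated effect is 1.0 in both cases.
t_small = welch_t([2.0, 3.0, 4.0], [1.0, 2.0, 3.0])
t_large = welch_t([2.0, 3.0, 4.0] * 4, [1.0, 2.0, 3.0] * 4)
```

Both calls have the same estimated effect, but the larger sample gives a larger statistic. A procedure that targets significance therefore weighs subgroup size and noise together with effect size, rather than ranking subgroups by estimated effect alone.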

The Power of Contrast for Feature Learning: A Theoretical Analysis. (arXiv:2110.02473v4 [cs.LG] UPDATED)

Authors: Wenlong Ji, Zhun Deng, Ryumei Nakada, James Zou, Linjun Zhang

Contrastive learning has achieved state-of-the-art performance in various self-supervised learning tasks and even outperforms its supervised counterpart. Despite its empirical success, theoretical understanding of the superiority of contrastive learning is still limited. In this paper, under linear representation settings, (i) we provably show that contrastive learning outperforms the standard autoencoders and generative adversarial networks, two classical generative unsupervised learning methods, for both feature recovery and in-domain downstream tasks; (ii) we also illustrate the impact of labeled data in supervised contrastive learning. This provides theoretical support for recent findings that contrastive learning with labels improves the performance of learned representations in the in-domain downstream task, but it can harm the performance in transfer learning. We verify our theory with numerical experiments.

Functional Mixtures-of-Experts. (arXiv:2202.02249v2 [stat.ME] UPDATED)

Authors: Faïcel Chamroukhi, Nhat Thien Pham, Van Hà Hoang, Geoffrey J. McLachlan

We consider the statistical analysis of heterogeneous data for prediction in situations where the observations include functions, typically time series. We extend the modeling with Mixtures-of-Experts (ME), as a framework of choice in modeling heterogeneity in data for prediction with vectorial observations, to this functional data analysis context. We first present a new family of ME models, named functional ME (FME), in which the predictors are potentially noisy observations from entire functions. Furthermore, the data generating process of the predictor and the real response is governed by a hidden discrete variable representing an unknown partition. Second, by imposing sparsity on derivatives of the underlying functional parameters via Lasso-like regularizations, we provide sparse and interpretable functional representations of the FME models called iFME. We develop dedicated expectation--maximization algorithms for Lasso-like (EM-Lasso) regularized maximum-likelihood parameter estimation strategies to fit the models. The proposed models and algorithms are studied in simulated scenarios and in applications to two real data sets, and the obtained results demonstrate their performance in accurately capturing complex nonlinear relationships and in clustering the heterogeneous regression data.

Analysis of Dual-Based PID Controllers through Convolutional Mirror Descent. (arXiv:2202.06152v4 [math.OC] UPDATED)

Authors: Santiago R. Balseiro, Haihao Lu, Vahab Mirrokni, Balasubramanian Sivan

Dual-based proportional-integral-derivative (PID) controllers are often employed in practice to solve online allocation problems with global constraints, such as budget pacing in online advertising. However, controllers are used in a heuristic fashion and come with no provable guarantees on their performance. This paper provides the first regret bounds on the performance of dual-based PID controllers for online allocation problems. We do so by first establishing a fundamental connection between dual-based PID controllers and a new first-order algorithm for online convex optimization called \emph{Convolutional Mirror Descent} (CMD), which updates iterates based on a weighted moving average of past gradients. CMD recovers, in a special case, online mirror descent with momentum and optimistic mirror descent. We establish sufficient conditions under which CMD attains low regret for general online convex optimization problems with adversarial inputs. We leverage this new result to give the first regret bound for dual-based PID controllers for online allocation problems. As a byproduct of our proofs, we provide the first regret bound for CMD for non-smooth convex optimization, which might be of independent interest.
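The phrase "updates iterates based on a weighted moving average of past gradients" can be illustrated with a one-dimensional projected step. This is only a Euclidean sketch under stated assumptions, not the CMD algorithm as defined in the paper: the function name, the kernel weights, the step size, and the projection onto a lower bound are all hypothetical.

```python
def convolutional_gradient_step(x, past_grads, kernel, eta, lower=0.0):
    """One projected step driven by a weighted sum of recent gradients.

    past_grads[-1] is the newest gradient and kernel[0] is its weight;
    kernel[1] weights the previous gradient, and so on.
    """
    recent = past_grads[::-1][:len(kernel)]
    direction = sum(k * g for k, g in zip(kernel, recent))
    # Project back onto the feasible set [lower, infinity), as one would
    # for a nonnegative dual variable.
    return max(lower, x - eta * direction)


# Averaging the last two gradients (1.0 and 2.0) gives direction 1.5.
averaged = convolutional_gradient_step(1.0, [1.0, 2.0], [0.5, 0.5], eta=0.1)

# A kernel of [1.0] uses only the newest gradient: plain gradient descent.
plain = convolutional_gradient_step(1.0, [1.0, 2.0], [1.0], eta=0.1)
```

Different kernels give different controllers: putting all the weight on the newest gradient is an ordinary gradient step, while spreading weight over older gradients adds a momentum-like memory.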

Attribution-based Explanations that Provide Recourse Cannot be Robust. (arXiv:2205.15834v3 [stat.ML] UPDATED)

Authors: Hidde Fokkema, Rianne de Heide, Tim van Erven

Different users of machine learning methods require different explanations, depending on their goals. To make machine learning accountable to society, one important goal is to get actionable options for recourse, which allow an affected user to change the decision $f(x)$ of a machine learning system by making limited changes to its input $x$. We formalize this by providing a general definition of recourse sensitivity, which needs to be instantiated with a utility function that describes which changes to the decisions are relevant to the user. This definition applies to local attribution methods, which attribute an importance weight to each input feature. It is often argued that such local attributions should be robust, in the sense that a small change in the input $x$ that is being explained should not cause a large change in the feature weights. However, we prove formally that it is in general impossible for any single attribution method to be both recourse sensitive and robust at the same time. It follows that there must always exist counterexamples to at least one of these properties. We provide such counterexamples for several popular attribution methods, including LIME, SHAP, Integrated Gradients and SmoothGrad. Our results also cover counterfactual explanations, which may be viewed as attributions that describe a perturbation of $x$. We further discuss possible ways to work around our impossibility result, for instance by allowing the output to consist of sets with multiple attributions, and we provide sufficient conditions for specific classes of continuous functions to be recourse sensitive. Finally, we strengthen our impossibility result for the restricted case where users are only able to change a single attribute of $x$, by providing an exact characterization of the functions $f$ to which impossibility applies.

On the Number of Regions of Piecewise Linear Neural Networks. (arXiv:2206.08615v2 [cs.LG] UPDATED)

Authors: Alexis Goujon, Arian Etemadi, Michael Unser

Many feedforward neural networks (NNs) generate continuous and piecewise-linear (CPWL) mappings. Specifically, they partition the input domain into regions on which the mapping is affine. The number of these so-called linear regions offers a natural metric to characterize the expressiveness of CPWL NNs. The precise determination of this quantity is often out of reach in practice, and bounds have been proposed for specific architectures, including for ReLU and Maxout NNs. In this work, we generalize these bounds to NNs with arbitrary and possibly multivariate CPWL activation functions. We first provide upper and lower bounds on the maximal number of linear regions of a CPWL NN given its depth, width, and the number of linear regions of its activation functions. Our results rely on the combinatorial structure of convex partitions and confirm the distinctive role of depth which, on its own, is able to exponentially increase the number of regions. We then introduce a complementary stochastic framework to estimate the average number of linear regions produced by a CPWL NN. Under reasonable assumptions, the expected density of linear regions along any 1D path is bounded by the product of depth, width, and a measure of activation complexity (up to a scaling factor). This yields an identical role to the three sources of expressiveness: no exponential growth with depth is observed anymore.
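As a small illustration of counting regions along a 1D path, the sketch below handles the simplest case: a single ReLU layer and a straight segment between two input points. Each neuron's pre-activation is affine along the segment, so it can change sign at most once, and each sign change starts a new activation region. The function name and the restriction to one layer are assumptions made for illustration; this is not the estimator developed in the paper.

```python
def regions_along_segment(weights, biases, a, b):
    """Count activation regions of one ReLU layer along the segment a -> b.

    weights is a list of per-neuron weight vectors, biases the matching
    list of offsets. Points on the segment are a + t * (b - a), 0 <= t <= 1.
    """
    breakpoints = set()
    for w, c in zip(weights, biases):
        z_start = sum(wi * ai for wi, ai in zip(w, a)) + c
        z_end = sum(wi * bi for wi, bi in zip(w, b)) + c
        if z_start != z_end:
            # Pre-activation is z_start + t * (z_end - z_start); solve for 0.
            t = z_start / (z_start - z_end)
            if 0.0 < t < 1.0:
                # Rounding merges breakpoints that coincide numerically.
                breakpoints.add(round(t, 12))
    return len(breakpoints) + 1


# Three neurons switching on at x = 0.25, 0.5 and 0.75 along [0, 1].
count = regions_along_segment(
    [[1.0], [1.0], [1.0]], [-0.25, -0.5, -0.75], [0.0], [1.0]
)
```

With a width of 3 and one layer there are at most 3 breakpoints, hence at most 4 regions on the segment, which matches the intuition that the count along a path grows with the number of units rather than exponentially.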

Learning Lattice Quantum Field Theories with Equivariant Continuous Flows. (arXiv:2207.00283v3 [hep-lat] UPDATED)

Authors: Mathis Gerdes, Pim de Haan, Corrado Rainone, Roberto Bondesan, Miranda C. N. Cheng

We propose a novel machine learning method for sampling from the high-dimensional probability distributions of Lattice Field Theories, which is based on a single neural ODE layer and incorporates the full symmetries of the problem. We test our model on the $\phi^4$ theory, showing that it systematically outperforms previously proposed flow-based methods in sampling efficiency, and the improvement is especially pronounced for larger lattices. Furthermore, we demonstrate that our model can learn a continuous family of theories at once, and the results of learning can be transferred to larger lattices. Such generalizations further accentuate the advantages of machine learning methods.

Multipoint-BAX: A New Approach for Efficiently Tuning Particle Accelerator Emittance via Virtual Objectives. (arXiv:2209.04587v5 [physics.acc-ph] UPDATED)

Authors: Sara A. Miskovich, Willie Neiswanger, William Colocho, Claudio Emma, Jacqueline Garrahan, Timothy Maxwell, Christopher Mayes, Stefano Ermon, Auralee Edelen, Daniel Ratner

Although beam emittance is critical for the performance of high-brightness accelerators, optimization is often time limited as emittance calculations, commonly done via quadrupole scans, are typically slow. Such calculations are a type of $\textit{multipoint query}$, i.e. each query requires multiple secondary measurements. Traditional black-box optimizers such as Bayesian optimization are slow and inefficient when dealing with such objectives as they must acquire the full series of measurements, but return only the emittance, with each query. We propose a new information-theoretic algorithm, Multipoint-BAX, for black-box optimization on multipoint queries, which queries and models individual beam-size measurements using techniques from Bayesian Algorithm Execution (BAX). Our method avoids the slow multipoint query on the accelerator by acquiring points through a $\textit{virtual objective}$, i.e. calculating the emittance objective from a fast learned model rather than directly from the accelerator. We use Multipoint-BAX to minimize emittance at the Linac Coherent Light Source (LCLS) and the Facility for Advanced Accelerator Experimental Tests II (FACET-II). In simulation, our method is 20$\times$ faster and more robust to noise compared to existing methods. In live tests, it matched the hand-tuned emittance at FACET-II and achieved a 24% lower emittance than hand-tuning at LCLS. Our method represents a conceptual shift for optimizing multipoint queries, and we anticipate that it can be readily adapted to similar problems in particle accelerators and other scientific instruments.

Automatic and effective discovery of quantum kernels. (arXiv:2209.11144v2 [quant-ph] UPDATED)

Authors: Massimiliano Incudini, Daniele Lizzio Bosco, Francesco Martini, Michele Grossi, Giuseppe Serra, Alessandra Di Pierro

Quantum computing can empower machine learning models by enabling kernel machines to leverage quantum kernels for representing similarity measures between data. Quantum kernels are able to capture relationships in the data that are not efficiently computable on classical devices. However, there is no straightforward method to engineer the optimal quantum kernel for each specific use case. While recent literature has focused on exploiting the potential offered by the presence of symmetries in the data to guide the construction of quantum kernels, we adopt here a different approach, which employs optimization techniques, similar to those used in neural architecture search and AutoML, to automatically find an optimal kernel in a heuristic manner. The algorithm we present constructs a quantum circuit implementing the similarity measure as a combinatorial object, which is evaluated based on a cost function and is then iteratively modified using a meta-heuristic optimization technique. The cost function can encode many criteria ensuring favorable statistical properties of the candidate solution, such as the rank of the Dynamical Lie Algebra. Importantly, our approach is independent of the optimization technique employed. The results obtained by testing our approach on a high-energy physics problem demonstrate that, in the best-case scenario, we can either match or improve testing accuracy with respect to the manual design approach, showing the potential of our technique to deliver superior results with reduced effort.

Detecting fake accounts through Generative Adversarial Network in online social media. (arXiv:2210.15657v4 [cs.SI] UPDATED)

Authors: Jinus Bordbar, Mohammadreza Mohammadrezaie, Saman Ardalan, Mohammad Ebrahim Shiri

Online social media is integral to human life, facilitating messaging, information sharing, and confidential communication while preserving privacy. Platforms like Twitter, Instagram, and Facebook exemplify this phenomenon. However, users face challenges due to network anomalies, often stemming from malicious activities such as identity theft for financial gain or harm. This paper proposes a novel method using user similarity measures and the Generative Adversarial Network (GAN) algorithm to identify fake user accounts in the Twitter dataset. Despite the problem's complexity, the method achieves an AUC rate of 80% in classifying and detecting fake accounts. Notably, the study builds on previous research, highlighting advancements and insights into the evolving landscape of anomaly detection in online social networks.

Differentiable Uncalibrated Imaging. (arXiv:2211.10525v3 [eess.IV] UPDATED)

Authors: Sidharth Gupta, Konik Kothari, Valentin Debarnot, Ivan Dokmanić

We propose a differentiable imaging framework to address uncertainty in measurement coordinates such as sensor locations and projection angles. We formulate the problem as measurement interpolation at unknown nodes supervised through the forward operator. To solve it we apply implicit neural networks, also known as neural fields, which are naturally differentiable with respect to the input coordinates. We also develop differentiable spline interpolators which perform as well as neural networks, require less time to optimize and have well-understood properties. Differentiability is key as it allows us to jointly fit a measurement representation, optimize over the uncertain measurement coordinates, and perform image reconstruction which in turn ensures consistent calibration. We apply our approach to 2D and 3D computed tomography, and show that it produces improved reconstructions compared to baselines that do not account for the lack of calibration. The flexibility of the proposed framework makes it easy to extend to almost arbitrary imaging problems.

SoftCorrect: Error Correction with Soft Detection for Automatic Speech Recognition. (arXiv:2212.01039v2 [cs.CL] UPDATED)

Authors: Yichong Leng, Xu Tan, Wenjie Liu, Kaitao Song, Rui Wang, Xiang-Yang Li, Tao Qin, Edward Lin, Tie-Yan Liu

Error correction in automatic speech recognition (ASR) aims to correct incorrect words in sentences generated by ASR models. Since recent ASR models usually have a low word error rate (WER), to avoid affecting originally correct tokens, error correction models should only modify incorrect words, and therefore detecting incorrect words is important for error correction. Previous works on error correction either implicitly detect error words through target-source attention or CTC (connectionist temporal classification) loss, or explicitly locate specific deletion/substitution/insertion errors. However, implicit error detection does not provide a clear signal about which tokens are incorrect, and explicit error detection suffers from low detection accuracy. In this paper, we propose SoftCorrect with a soft error detection mechanism to avoid the limitations of both explicit and implicit error detection. Specifically, we first detect whether a token is correct or not through a probability produced by a specially designed language model, and then design a constrained CTC loss that only duplicates the detected incorrect tokens to let the decoder focus on the correction of error tokens. Compared with implicit error detection with CTC loss, SoftCorrect provides an explicit signal about which words are incorrect and thus does not need to duplicate every token but only incorrect tokens; compared with explicit error detection, SoftCorrect does not detect specific deletion/substitution/insertion errors but just leaves it to CTC loss. Experiments on AISHELL-1 and Aidatatang datasets show that SoftCorrect achieves 26.1% and 9.4% CER reduction respectively, outperforming previous works by a large margin, while still enjoying fast speed of parallel generation.

Fake detection in imbalance dataset by Semi-supervised learning with GAN. (arXiv:2212.01071v5 [cs.LG] UPDATED)

Authors: Jinus Bordbar, Saman Ardalan, Mohammadreza Mohammadrezaie, Zahra Ghasemi

As social media continues to grow rapidly, the prevalence of harassment on these platforms has also increased. This has piqued the interest of researchers in the field of fake detection. Social media data often forms complex graphs with numerous nodes, posing several challenges. These challenges and limitations include dealing with a significant amount of irrelevant features in matrices and addressing issues such as high data dispersion and an imbalanced class distribution within the dataset. To overcome these challenges and limitations, researchers have employed auto-encoders and a combination of semi-supervised learning with a GAN algorithm, referred to as SGAN. Our proposed method utilizes auto-encoders for feature extraction and incorporates SGAN. By leveraging an unlabeled dataset, the unsupervised layer of SGAN compensates for the limited availability of labeled data, making efficient use of the limited number of labeled instances. Multiple evaluation metrics were employed, including the Confusion Matrix and the ROC curve. The dataset was divided into training and testing sets, with 100 labeled samples for training and 1,000 samples for testing. The novelty of our research lies in applying SGAN to address the issue of imbalanced datasets in fake account detection. By optimizing the use of a smaller number of labeled instances and reducing the need for extensive computational power, our method offers a more efficient solution. Additionally, our study contributes to the field by achieving an 81% accuracy in detecting fake accounts using only 100 labeled samples. This demonstrates the potential of SGAN as a powerful tool for handling minority classes and addressing big data challenges in fake account detection.

Instance-Conditional Timescales of Decay for Non-Stationary Learning. (arXiv:2212.05908v2 [cs.LG] UPDATED)

Authors: Nishant Jain, Pradeep Shenoy

Slow concept drift is a ubiquitous, yet under-studied problem in practical machine learning systems. In such settings, although recent data is more indicative of future data, naively prioritizing recent instances runs the risk of losing valuable information from the past. We propose an optimization-driven approach towards balancing instance importance over large training windows. First, we model instance relevance using a mixture of multiple timescales of decay, allowing us to capture rich temporal trends. Second, we learn an auxiliary scorer model that recovers the appropriate mixture of timescales as a function of the instance itself. Finally, we propose a nested optimization objective for learning the scorer, by which it maximizes forward transfer for the learned model. Experiments on a large real-world dataset of 39M photos over a 9-year period show up to 15% relative gains in accuracy compared to other robust learning baselines. We replicate our gains on two collections of real-world datasets for non-stationary learning, and extend our work to continual learning settings where, too, we beat SOTA methods by large margins.
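
The idea of weighting instances by a mixture of decay timescales can be sketched as follows. This is an illustrative guess at the general form, not the paper's formulation: the exponential decay shape, the softmax mixture, and the names `instance_weights`, `mixture_logits`, and `timescales` are all assumptions. In the abstract, the mixture would come from a learned scorer that looks at the instance itself; here the logits are simply passed in.

```python
import numpy as np

def softmax(z):
    """Numerically stable softmax over the last axis."""
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def instance_weights(ages, mixture_logits, timescales):
    """Weight each instance by a per-instance mixture of exponential decays.

    ages:           (n,) age of each training instance (e.g. in days)
    mixture_logits: (n, k) unnormalized mixture scores, one row per instance
                    (assumed to be produced by some scorer model)
    timescales:     (k,) decay timescales, in the same unit as ages
    """
    ages = np.asarray(ages, dtype=float)[:, None]            # (n, 1)
    timescales = np.asarray(timescales, dtype=float)[None]   # (1, k)
    decay = np.exp(-ages / timescales)                       # (n, k)
    mix = softmax(np.asarray(mixture_logits, dtype=float))   # (n, k)
    return (mix * decay).sum(axis=1)                         # (n,)

# Two instances, a fast (1) and a slow (100) timescale, uniform mixtures.
w = instance_weights(ages=[0.0, 10.0],
                     mixture_logits=np.zeros((2, 2)),
                     timescales=[1.0, 100.0])
```

With a uniform mixture the old instance keeps a sizeable weight through the slow component, whereas a single fast timescale would discard it almost entirely.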

Dual Accuracy-Quality-Driven Neural Network for Prediction Interval Generation. (arXiv:2212.06370v3 [cs.LG] UPDATED)

Authors: Giorgio Morales, John W. Sheppard

Accurate uncertainty quantification is necessary to enhance the reliability of deep learning models in real-world applications. In the case of regression tasks, prediction intervals (PIs) should be provided along with the deterministic predictions of deep learning models. Such PIs are useful or "high-quality" as long as they are sufficiently narrow and capture most of the probability density. In this paper, we present a method to learn prediction intervals for regression-based neural networks automatically in addition to the conventional target predictions. In particular, we train two companion neural networks: one that uses one output, the target estimate, and another that uses two outputs, the upper and lower bounds of the corresponding PI. Our main contribution is the design of a novel loss function for the PI-generation network that takes into account the output of the target-estimation network and has two optimization objectives: minimizing the mean prediction interval width and ensuring the PI integrity using constraints that maximize the prediction interval probability coverage implicitly. Furthermore, we introduce a self-adaptive coefficient that balances both objectives within the loss function, which alleviates the task of fine-tuning. Experiments using a synthetic dataset, eight benchmark datasets, and a real-world crop yield prediction dataset showed that our method was able to maintain a nominal probability coverage and produce significantly narrower PIs without detriment to its target estimation accuracy when compared to those PIs generated by three state-of-the-art neural-network-based methods. In other words, our method was shown to produce higher-quality PIs.
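
The two quality criteria named in this abstract, interval coverage and mean interval width, can be computed for any set of predicted intervals as in the sketch below. The function names `picp` and `mpiw` are conventional abbreviations chosen for this example; the code shows only the evaluation quantities, not the paper's loss function or its self-adaptive coefficient.

```python
import numpy as np

def picp(y_true, lower, upper):
    """Prediction interval coverage probability: fraction of targets inside [lower, upper]."""
    y_true, lower, upper = (np.asarray(a, dtype=float) for a in (y_true, lower, upper))
    return float(np.mean((y_true >= lower) & (y_true <= upper)))

def mpiw(lower, upper):
    """Mean prediction interval width."""
    lower, upper = np.asarray(lower, dtype=float), np.asarray(upper, dtype=float)
    return float(np.mean(upper - lower))

# Toy example: the second target falls below its interval.
y = [1.0, 2.0, 3.0, 4.0]
lo = [0.0, 2.5, 2.0, 3.0]
hi = [2.0, 3.0, 4.0, 5.0]
coverage = picp(y, lo, hi)
width = mpiw(lo, hi)
```

A method produces higher-quality intervals in the abstract's sense when it lowers the width while keeping coverage at or above the nominal level.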

Non-contact Respiratory Anomaly Detection using Infrared Light-wave Sensing. (arXiv:2301.03713v3 [eess.SP] UPDATED)

Authors: Md Zobaer Islam, Brenden Martin, Carly Gotcher, Tyler Martinez, John F. O'Hara, Sabit Ekin

Human respiratory rate and its pattern convey essential information about the physical and psychological states of the subject. Abnormal breathing can indicate fatal health issues leading to further diagnosis and treatment. Wireless light-wave sensing (LWS) using incoherent infrared light shows promise in safe, discreet, efficient, and non-invasive human breathing monitoring without raising privacy concerns. The respiration monitoring system needs to be trained on different types of breathing patterns to identify breathing anomalies. The system must also validate the collected data as a breathing waveform, discarding any faulty data caused by external interruption, user movement, or system malfunction. To address these needs, this study simulated normal and different types of abnormal respiration using a robot that mimics human breathing patterns. Then, time-series respiration data were collected using infrared light-wave sensing technology. Three machine learning algorithms, decision tree, random forest and XGBoost, were applied to detect breathing anomalies and faulty data. Model performances were evaluated through cross-validation, assessing classification accuracy, precision and recall scores. The random forest model achieved the highest classification accuracy of 96.75% with data collected at a 0.5m distance. In general, ensemble models like random forest and XGBoost performed better than a single model in classifying the data collected at multiple distances from the light-wave sensing setup.

Deep Learning for Time Series Classification and Extrinsic Regression: A Current Survey. (arXiv:2302.02515v2 [cs.LG] UPDATED)

Authors: Navid Mohammadi Foumani, Lynn Miller, Chang Wei Tan, Geoffrey I. Webb, Germain Forestier, Mahsa Salehi

Time Series Classification and Extrinsic Regression are important and challenging machine learning tasks. Deep learning has revolutionized natural language processing and computer vision and holds great promise in other fields such as time series analysis where the relevant features must often be abstracted from the raw data but are not known a priori. This paper surveys the current state of the art in the fast-moving field of deep learning for time series classification and extrinsic regression. We review different network architectures and training methods used for these tasks and discuss the challenges and opportunities when applying deep learning to time series data. We also summarize two critical applications of time series classification and extrinsic regression, human activity recognition and satellite earth observation.

Transformed Low-Rank Parameterization Can Help Robust Generalization for Tensor Neural Networks. (arXiv:2303.00196v3 [cs.LG] UPDATED)

Authors: Andong Wang, Chao Li, Mingyuan Bai, Zhong Jin, Guoxu Zhou, Qibin Zhao

Achieving efficient and robust multi-channel data learning is a challenging task in data science. By exploiting low-rankness in the transformed domain, i.e., transformed low-rankness, tensor Singular Value Decomposition (t-SVD) has achieved extensive success in multi-channel data representation and has recently been extended to function representation such as Neural Networks with t-product layers (t-NNs). However, it still remains unclear how t-SVD theoretically affects the learning behavior of t-NNs. This paper is the first to answer this question by deriving the upper bounds of the generalization error of both standard and adversarially trained t-NNs. It reveals that the t-NNs compressed by exact transformed low-rank parameterization can achieve a sharper adversarial generalization bound. In practice, although t-NNs rarely have exactly transformed low-rank weights, our analysis further shows that by adversarial training with gradient flow (GF), the over-parameterized t-NNs with ReLU activations are trained with implicit regularization towards transformed low-rank parameterization under certain conditions. We also establish adversarial generalization bounds for t-NNs with approximately transformed low-rank weights. Our analysis indicates that the transformed low-rank parameterization can promisingly enhance robust generalization for t-NNs.

Distilling Multi-Level X-vector Knowledge for Small-footprint Speaker Verification. (arXiv:2303.01125v3 [cs.SD] UPDATED)

Authors: Xuechen Liu, Md Sahidullah, Tomi Kinnunen

Even though deep speaker models have demonstrated impressive accuracy in speaker verification tasks, this often comes at the expense of increased model size and computation time, presenting challenges for deployment in resource-constrained environments. Our research focuses on addressing this limitation through the development of small footprint deep speaker embedding extraction using knowledge distillation. While previous work in this domain has concentrated on speaker embedding extraction at the utterance level, our approach involves amalgamating embeddings from different levels of the x-vector model (teacher network) to train a compact student network. The results highlight the significance of frame-level information, with the student models exhibiting a remarkable size reduction of 85%-91% compared to their teacher counterparts, depending on the size of the teacher embeddings. Notably, by concatenating teacher embeddings, we achieve student networks that maintain comparable performance to the teacher while enjoying a substantial 75% reduction in model size. These findings and insights extend to other x-vector variants, underscoring the broad applicability of our approach.

Robust Contrastive Language-Image Pre-training against Data Poisoning and Backdoor Attacks. (arXiv:2303.06854v2 [cs.CV] UPDATED)

Authors: Wenhan Yang, Jingdong Gao, Baharan Mirzasoleiman

Contrastive vision-language representation learning has achieved state-of-the-art performance for zero-shot classification, by learning from millions of image-caption pairs crawled from the internet. However, the massive data that powers large multimodal models such as CLIP makes them extremely vulnerable to various types of targeted data poisoning and backdoor attacks. Despite this vulnerability, robust contrastive vision-language pre-training against such attacks has remained unaddressed. In this work, we propose ROCLIP, the first effective method for robust pre-training multimodal vision-language models against targeted data poisoning and backdoor attacks. ROCLIP effectively breaks the association between poisoned image-caption pairs by considering a relatively large and varying pool of random captions, and matching every image with the text that is most similar to it in the pool instead of its own caption, every few epochs. It also leverages image and text augmentations to further strengthen the defense and improve the performance of the model. Our extensive experiments show that ROCLIP renders state-of-the-art targeted data poisoning and backdoor attacks ineffective during pre-training CLIP models. In particular, ROCLIP decreases the success rate for targeted data poisoning attacks from 93.75% to 12.5% and that of backdoor attacks down to 0%, while improving the model's linear probe performance by 10% and maintaining a similar zero-shot performance compared to CLIP. By increasing the frequency of matching, ROCLIP is able to defend against strong attacks, which add up to 1% poisoned examples to the data, and successfully maintain a low attack success rate of 12.5%, while trading off the performance on some tasks.
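
The matching step described in this abstract, pairing each image with the most similar caption from a pool rather than with its own caption, might look like the sketch below. The use of cosine similarity, the function name `match_images_to_pool`, and the toy embeddings are assumptions for illustration; the abstract does not specify the similarity measure or how the pool is maintained.

```python
import numpy as np

def match_images_to_pool(image_emb, pool_text_emb):
    """For each image embedding, return the index of the most similar pool caption.

    image_emb:     (n_images, d) image embeddings
    pool_text_emb: (n_pool, d) caption embeddings from a random pool
    Similarity is cosine similarity (an assumption made for this sketch).
    """
    img = image_emb / np.linalg.norm(image_emb, axis=1, keepdims=True)
    txt = pool_text_emb / np.linalg.norm(pool_text_emb, axis=1, keepdims=True)
    sim = img @ txt.T                 # (n_images, n_pool) cosine similarities
    return sim.argmax(axis=1)

# Toy 2-D embeddings: image 0 points along x, image 1 along y.
images = np.array([[1.0, 0.0], [0.0, 2.0]])
pool = np.array([[0.0, 1.0], [3.0, 0.0], [1.0, 1.0]])
matches = match_images_to_pool(images, pool)
```

Training on the pairs `(images[i], pool[matches[i]])` for that epoch, instead of the original pairs, is what would break the link between a poisoned image and its adversarial caption.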

Debiasing Scores and Prompts of 2D Diffusion for View-consistent Text-to-3D Generation. (arXiv:2303.15413v5 [cs.CV] UPDATED)

Authors: Susung Hong, Donghoon Ahn, Seungryong Kim

Existing score-distilling text-to-3D generation techniques, despite their considerable promise, often encounter the view inconsistency problem. One of the most notable issues is the Janus problem, where the most canonical view of an object (e.g., face or head) appears in other views. In this work, we explore existing frameworks for score-distilling text-to-3D generation and identify the main causes of the view inconsistency problem -- the embedded bias of 2D diffusion models. Based on these findings, we propose two approaches to debias the score-distillation frameworks for view-consistent text-to-3D generation. Our first approach, called score debiasing, involves cutting off the score estimated by 2D diffusion models and gradually increasing the truncation value throughout the optimization process. Our second approach, called prompt debiasing, identifies conflicting words between user prompts and view prompts using a language model, and adjusts the discrepancy between view prompts and the viewing direction of an object. Our experimental results show that our methods improve the realism of the generated 3D objects by significantly reducing artifacts and achieve a good trade-off between faithfulness to the 2D diffusion models and 3D consistency with little overhead. Our project page is available at https://susunghong.github.io/Debiased-Score-Distillation-Sampling/.
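
The score debiasing step described above, clipping the estimated score and relaxing the clipping threshold as optimization proceeds, can be sketched as follows. The linear schedule, the default bounds `c_min` and `c_max`, and both function names are hypothetical choices for this example; the abstract states only that the truncation value is gradually increased.

```python
import numpy as np

def truncation_value(step, total_steps, c_min, c_max):
    """Linearly increase the truncation value from c_min to c_max (assumed schedule)."""
    frac = step / max(total_steps - 1, 1)
    frac = min(max(frac, 0.0), 1.0)
    return c_min + frac * (c_max - c_min)

def debias_score(score, step, total_steps, c_min=0.5, c_max=2.0):
    """Clip the score estimate elementwise to [-c, c] for the current step."""
    c = truncation_value(step, total_steps, c_min, c_max)
    return np.clip(score, -c, c)

score = np.array([-3.0, 0.2, 3.0])
early = debias_score(score, step=0, total_steps=11)    # tight clipping
late = debias_score(score, step=10, total_steps=11)    # looser clipping
```

Early in optimization large score values are cut off strongly, and the constraint is relaxed later so that fine details can still be recovered.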

Hard Regularization to Prevent Deep Online Clustering Collapse without Data Augmentation. (arXiv:2303.16521v2 [cs.LG] UPDATED)

Authors: Louis Mahon, Thomas Lukasiewicz

Online deep clustering refers to the joint use of a feature extraction network and a clustering model to assign cluster labels to each new data point or batch as it is processed. While faster and more versatile than offline methods, online clustering can easily reach the collapsed solution where the encoder maps all inputs to the same point and all are put into a single cluster. Successful existing models have employed various techniques to avoid this problem, most of which require data augmentation or which aim to make the average soft assignment across the dataset the same for each cluster. We propose a method that does not require data augmentation, and that, differently from existing methods, regularizes the hard assignments. Using a Bayesian framework, we derive an intuitive optimization objective that can be straightforwardly included in the training of the encoder network. Tested on four image datasets and one human-activity recognition dataset, it consistently avoids collapse more robustly than other methods and leads to more accurate clustering. We also conduct further experiments and analyses justifying our choice to regularize the hard cluster assignments. Code is available at https://github.com/Lou1sM/online_hard_clustering.

RED-PSM: Regularization by Denoising of Partially Separable Models for Dynamic Imaging. (arXiv:2304.03483v3 [eess.IV] UPDATED)

Authors: Berk Iskender, Marc L. Klasky, Yoram Bresler

Dynamic imaging addresses the recovery of a time-varying 2D or 3D object at each time instant using its undersampled measurements. In particular, in the case of dynamic tomography, only a single projection at a single view angle may be available at a time, making the problem severely ill-posed. In this work, we propose an approach, RED-PSM, which combines for the first time two powerful techniques to address this challenging imaging problem. The first is partially separable models, which have been used to efficiently introduce a low-rank prior for the spatio-temporal object. The second is the recent Regularization by Denoising (RED), which provides a flexible framework to exploit the impressive performance of state-of-the-art image denoising algorithms for various inverse problems. We propose a partially separable objective with RED and a computationally efficient and scalable optimization scheme with variable splitting and ADMM. Theoretical analysis proves the convergence of our objective to a value corresponding to a stationary point satisfying the first-order optimality conditions. Convergence is accelerated by a particular projection-domain-based initialization. We demonstrate the performance and computational improvements of our proposed RED-PSM with a learned image denoiser by comparing it to a recent deep-prior-based method known as TD-DIP. Although the main focus is on dynamic tomography, we also show performance advantages of RED-PSM in a cardiac dynamic MRI setting.

Exponentially Improved Efficient and Accurate Machine Learning for Quantum Many-body States with Provable Guarantees. (arXiv:2304.04353v2 [quant-ph] UPDATED)

Authors: Yanming Che, Clemens Gneiting, Franco Nori

Solving the ground state and the ground-state properties of quantum many-body systems is generically a hard task for classical algorithms. For a family of Hamiltonians defined on an $m$-dimensional space of physical parameters, the ground state and its properties at an arbitrary parameter configuration can be predicted via a machine learning protocol up to a prescribed prediction error $\varepsilon$, provided that a sample set (of size $N$) of the states can be efficiently prepared and measured. In a recent work [Huang et al., Science 377, eabk3333 (2022)], a rigorous guarantee for such a generalization was proved. Unfortunately, an exponential scaling for the provable sample complexity, $N=m^{{\cal{O}}\left(\frac{1}{\varepsilon}\right)}$, was found to be universal for generic gapped Hamiltonians. This result applies to the situation where the dimension of the parameter space is large while the scaling with the accuracy is not an urgent factor. In this work, we consider an alternative scenario where $m$ is a finite, not necessarily large constant while the scaling with the prediction error becomes the central concern. By jointly preserving the fundamental properties of density matrices in the learning protocol and utilizing the continuity of quantum states in the parameter range of interest, we rigorously obtain a polynomial sample complexity for predicting quantum many-body states and their properties, with respect to the uniform prediction error $\varepsilon$ and the number of qubits $n$. Moreover, if restricted to learning local quantum-state properties, the number of samples with respect to $n$ can be further reduced exponentially. Our results provide theoretical guarantees for efficient and accurate learning of quantum many-body states and their properties, with model-independent applications not restricted to ground states of gapped Hamiltonians.

Data-driven Piecewise Affine Decision Rules for Stochastic Programming with Covariate Information. (arXiv:2304.13646v3 [math.OC] UPDATED)

Authors: Yiyang Zhang, Junyi Liu, Xiaobo Zhao

Focusing on stochastic programming (SP) with covariate information, this paper proposes an empirical risk minimization (ERM) method embedded within a nonconvex piecewise affine decision rule (PADR), which aims to learn the direct mapping from features to optimal decisions. We establish the nonasymptotic consistency result of our PADR-based ERM model for unconstrained problems and asymptotic consistency result for constrained ones. To solve the nonconvex and nondifferentiable ERM problem, we develop an enhanced stochastic majorization-minimization algorithm and establish the asymptotic convergence to (composite strong) directional stationarity along with complexity analysis. We show that the proposed PADR-based ERM method applies to a broad class of nonconvex SP problems with theoretical consistency guarantees and computational tractability. Our numerical study demonstrates the superior performance of PADR-based ERM methods compared to state-of-the-art approaches under various settings, with significantly lower costs, less computation time, and robustness to feature dimensions and nonlinearity of the underlying dependency.

LAVA: Data Valuation without Pre-Specified Learning Algorithms. (arXiv:2305.00054v3 [cs.LG] UPDATED)

Authors: Hoang Anh Just, Feiyang Kang, Jiachen T. Wang, Yi Zeng, Myeongseob Ko, Ming Jin, Ruoxi Jia

Traditionally, data valuation (DV) is posed as a problem of equitably splitting the validation performance of a learning algorithm among the training data. As a result, the calculated data values depend on many design choices of the underlying learning algorithm. However, this dependence is undesirable for many DV use cases, such as setting priorities over different data sources in a data acquisition process and informing pricing mechanisms in a data marketplace. In these scenarios, data needs to be valued before the actual analysis and the choice of the learning algorithm is still undetermined then. Another side-effect of the dependence is that to assess the value of individual points, one needs to re-run the learning algorithm with and without a point, which incurs a large computation burden. This work leapfrogs over the current limits of data valuation methods by introducing a new framework that can value training data in a way that is oblivious to the downstream learning algorithm. Our main results are as follows. (1) We develop a proxy for the validation performance associated with a training set based on a non-conventional class-wise Wasserstein distance between training and validation sets. We show that the distance characterizes the upper bound of the validation performance for any given model under certain Lipschitz conditions. (2) We develop a novel method to value individual data based on the sensitivity analysis of the class-wise Wasserstein distance. Importantly, these values can be directly obtained for free from the output of off-the-shelf optimization solvers when computing the distance. (3) We evaluate our new data valuation framework over various use cases related to detecting low-quality data and show that, surprisingly, the learning-agnostic feature of our framework enables a significant improvement over SOTA performance while being orders of magnitude faster.

PiML Toolbox for Interpretable Machine Learning Model Development and Diagnostics. (arXiv:2305.04214v3 [cs.LG] UPDATED)

Authors: Agus Sudjianto, Aijun Zhang, Zebin Yang, Yu Su, Ningzhou Zeng

PiML (read $\pi$-ML, /`pai`em`el/) is an integrated and open-access Python toolbox for interpretable machine learning model development and model diagnostics. It is designed with machine learning workflows in both low-code and high-code modes, including data pipeline, model training and tuning, model interpretation and explanation, and model diagnostics and comparison. The toolbox supports a growing list of interpretable models (e.g. GAM, GAMI-Net, XGB1/XGB2) with inherent local and/or global interpretability. It also supports model-agnostic explainability tools (e.g. PFI, PDP, LIME, SHAP) and a powerful suite of model-agnostic diagnostics (e.g. weakness, reliability, robustness, resilience, fairness). Integration of PiML models and tests into existing MLOps platforms for quality assurance is enabled by flexible high-code APIs. Furthermore, the PiML toolbox comes with a comprehensive user guide and hands-on examples, including applications for model development and validation in banking. The project is available at https://github.com/SelfExplainML/PiML-Toolbox.

Towards Consistent Stochastic Human Motion Prediction via Motion Diffusion. (arXiv:2305.12554v2 [cs.CV] UPDATED)

Authors: Jiarui Sun, Girish Chowdhary

Stochastic Human Motion Prediction (HMP) aims to predict multiple possible upcoming pose sequences based on past human motion trajectories. Although previous approaches have shown impressive performance, they face several issues, including complex training processes and a tendency to generate predictions that are often inconsistent with the provided history and sometimes even entirely unreasonable. To overcome these issues, we propose DiffMotion, an end-to-end diffusion-based stochastic HMP framework. DiffMotion's motion predictor is composed of two modules: (1) a Transformer-based network for initial motion reconstruction from corrupted motion, and (2) a Graph Convolutional Network (GCN) to refine the generated motion considering past observations. Our method, facilitated by this novel Transformer-GCN module design and a proposed variance scheduler, excels in predicting accurate, realistic, and consistent motions, while maintaining an appropriate level of diversity. Our results on benchmark datasets show that DiffMotion significantly outperforms previous methods in terms of both accuracy and fidelity, while demonstrating superior robustness.

MultiFusion: Fusing Pre-Trained Models for Multi-Lingual, Multi-Modal Image Generation. (arXiv:2305.15296v3 [cs.CV] UPDATED)

Authors: Marco Bellagente, Manuel Brack, Hannah Teufel, Felix Friedrich, Björn Deiseroth, Constantin Eichenberg, Andrew Dai, Robert Baldock, Souradeep Nanda, Koen Oostermeijer, Andres Felipe Cruz-Salinas, Patrick Schramowski, Kristian Kersting, Samuel Weinbach

The recent popularity of text-to-image diffusion models (DM) can largely be attributed to the intuitive interface they provide to users. The intended generation can be expressed in natural language, with the model producing faithful interpretations of text prompts. However, expressing complex or nuanced ideas in text alone can be difficult. To ease image generation, we propose MultiFusion, which allows one to express complex and nuanced concepts with arbitrarily interleaved inputs of multiple modalities and languages. MultiFusion leverages pre-trained models and aligns them for integration into a cohesive system, thereby avoiding the need for extensive training from scratch. Our experimental results demonstrate the efficient transfer of capabilities from individual modules to the downstream model. Specifically, the fusion of all independent components allows the image generation module to utilize multilingual, interleaved multimodal inputs despite being trained solely on monomodal data in a single language.

MADiff: Offline Multi-agent Learning with Diffusion Models. (arXiv:2305.17330v3 [cs.AI] UPDATED)

Authors: Zhengbang Zhu, Minghuan Liu, Liyuan Mao, Bingyi Kang, Minkai Xu, Yong Yu, Stefano Ermon, Weinan Zhang

The diffusion model (DM), a powerful generative model, has recently achieved huge success in various scenarios including offline reinforcement learning, where the policy learns to conduct planning by generating trajectories in the online evaluation. However, despite the effectiveness shown for single-agent learning, it remains unclear how DMs can operate in multi-agent problems, where agents can hardly complete teamwork without good coordination by independently modeling each agent's trajectories. In this paper, we propose MADiff, a novel generative multi-agent learning framework to tackle this problem. MADiff is realized with an attention-based diffusion model to model the complex coordination among behaviors of multiple diffusion agents. To the best of our knowledge, MADiff is the first diffusion-based multi-agent offline RL framework, which behaves as both a decentralized policy and a centralized controller. During decentralized executions, MADiff simultaneously performs teammate modeling, and the centralized controller can also be applied in multi-agent trajectory predictions. Our experiments show the superior performance of MADiff compared to baseline algorithms in a wide range of multi-agent learning tasks, which emphasizes the effectiveness of MADiff in modeling complex multi-agent interactions. Our code is available at https://github.com/zbzhu99/madiff.

Self Contrastive Learning for Session-based Recommendation. (arXiv:2306.01266v2 [cs.IR] UPDATED)

Authors: Zhengxiang Shi, Xi Wang, Aldo Lipani

Session-based recommendation, which aims to predict the next item of users' interest as per an existing sequence interaction of items, has attracted growing applications of Contrastive Learning (CL) with improved user and item representations. However, these contrastive objectives: (1) serve a similar role as the cross-entropy loss while ignoring the item representation space optimisation; and (2) commonly require complicated modelling, including complex positive/negative sample constructions and extra data augmentation. In this work, we introduce Self-Contrastive Learning (SCL), which simplifies the application of CL and enhances the performance of state-of-the-art CL-based recommendation techniques. Specifically, SCL is formulated as an objective function that directly promotes a uniform distribution among item representations and efficiently replaces all the existing contrastive objective components of state-of-the-art models. Unlike previous works, SCL eliminates the need for any positive/negative sample construction or data augmentation, leading to enhanced interpretability of the item representation space and facilitating its extensibility to existing recommender systems. Through experiments on three benchmark datasets, we demonstrate that SCL consistently improves the performance of state-of-the-art models with statistical significance. Notably, our experiments show that SCL improves the performance of two best-performing models by 8.2% and 9.5% in P@10 (Precision) and 9.9% and 11.2% in MRR@10 (Mean Reciprocal Rank) on average across different benchmarks. Additionally, our analysis elucidates the improvement in terms of alignment and uniformity of representations, as well as the effectiveness of SCL with a low computational cost.
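For readers unfamiliar with the two reported metrics, the following is a minimal sketch of how P@K and MRR@K are often computed in session-based recommendation. It assumes the common convention that each test session has exactly one ground-truth next item, so P@K reduces to the fraction of sessions whose target appears in the top K; the abstract does not spell out its exact definition, and the function name `evaluate_at_k` is hypothetical.

```python
from typing import Hashable, List, Sequence, Tuple


def evaluate_at_k(
    rankings: Sequence[List[Hashable]],
    targets: Sequence[Hashable],
    k: int = 10,
) -> Tuple[float, float]:
    """Return (P@K, MRR@K) averaged over sessions.

    Assumption: one target item per session, so P@K is the share of
    sessions whose target is ranked within the top K, and MRR@K is the
    mean of 1/rank (0 when the target is outside the top K).
    """
    hits = 0
    reciprocal_rank_sum = 0.0
    for ranked, target in zip(rankings, targets):
        top_k = ranked[:k]
        if target in top_k:
            hits += 1
            # Ranks are 1-based: first position contributes 1/1.
            reciprocal_rank_sum += 1.0 / (top_k.index(target) + 1)
    n = len(targets)
    return hits / n, reciprocal_rank_sum / n


if __name__ == "__main__":
    # Toy data: three sessions with ranked candidate lists.
    rankings = [["a", "b", "c"], ["x", "y", "z"], ["p", "q", "r"]]
    targets = ["b", "z", "missing"]
    print(evaluate_at_k(rankings, targets, k=3))
```

A relative improvement such as those quoted above would then be computed as (new - old) / old on these averaged scores.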

Covariance Adaptive Best Arm Identification. (arXiv:2306.02630v2 [stat.ML] UPDATED)

Authors: El Mehdi Saad (CentraleSupéléc), Gilles Blanchard (LMO, DATASHAPE), Nicolas Verzelen (MISTEA)

We consider the problem of best arm identification in the multi-armed bandit model, under fixed confidence. Given a confidence input $\delta$, the goal is to identify the arm with the highest mean reward with a probability of at least $1 - \delta$, while minimizing the number of arm pulls. While the literature provides solutions to this problem under the assumption of independent arm distributions, we propose a more flexible scenario where arms can be dependent and rewards can be sampled simultaneously. This framework allows the learner to estimate the covariance among the arm distributions, enabling a more efficient identification of the best arm. The relaxed setting we propose is relevant in various applications, such as clinical trials, where similarities between patients or drugs suggest underlying correlations in the outcomes. We introduce new algorithms that adapt to the unknown covariance of the arms and demonstrate through theoretical guarantees that substantial improvement can be achieved over the standard setting. Additionally, we provide new lower bounds for the relaxed setting and present numerical simulations that support our theoretical findings.
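A standard identity gives some intuition for why covariance information can help when two arms are sampled simultaneously. This is an illustration only, not the paper's analysis; the symbols $X_i$, $X_j$ (rewards of arms $i$ and $j$ in the same round) and $\sigma_i^2$, $\sigma_j^2$ (their variances) are notation assumed here.

```latex
\operatorname{Var}(X_i - X_j)
  = \sigma_i^2 + \sigma_j^2 - 2\,\operatorname{Cov}(X_i, X_j)
```

When the two arms are positively correlated, the variance of the observed difference is smaller than $\sigma_i^2 + \sigma_j^2$, the value it would have under independence, so fewer joint samples are needed to estimate the gap between the means to a given precision.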

Learning to Simulate Tree-Branch Dynamics for Manipulation. (arXiv:2306.03410v3 [cs.RO] UPDATED)

Authors: Jayadeep Jacob, Tirthankar Bandyopadhyay, Jason Williams, Paulo Borges, Fabio Ramos

We propose to use a simulation-driven inverse inference approach to model the dynamics of tree branches under manipulation. Learning branch dynamics and gaining the ability to manipulate deformable vegetation can help with occlusion-prone tasks, such as fruit picking in dense foliage, as well as moving overhanging vines and branches for navigation in dense vegetation. The underlying deformable tree geometry is encapsulated as coarse spring abstractions executed on parallel, non-differentiable simulators. The implicit statistical model defined by the simulator, reference trajectories obtained by actively probing the ground truth, and the Bayesian formalism, together guide the spring parameter posterior density estimation. Our non-parametric inference algorithm, based on Stein Variational Gradient Descent, incorporates biologically motivated assumptions into the inference process as neural network driven learnt joint priors; moreover, it leverages the finite difference scheme for gradient approximations. Real and simulated experiments confirm that our model can predict deformation trajectories, quantify the estimation uncertainty, and perform better when baselined against other inference algorithms, particularly from the Monte Carlo family. The model displays strong robustness properties in the presence of heteroscedastic sensor noise; furthermore, it can generalise to unseen grasp locations.

Fair and Robust Estimation of Heterogeneous Treatment Effects for Policy Learning. (arXiv:2306.03625v2 [stat.ME] UPDATED)

Authors: Kwangho Kim, José R. Zubizarreta

We propose a simple and general framework for nonparametric estimation of heterogeneous treatment effects under fairness constraints. Under standard regularity conditions, we show that the resulting estimators possess the double robustness property. We use this framework to characterize the trade-off between fairness and the maximum welfare achievable by the optimal policy. We evaluate the methods in a simulation study and illustrate them in a real-world case study.

Multi-task Bioassay Pre-training for Protein-ligand Binding Affinity Prediction. (arXiv:2306.04886v2 [q-bio.BM] UPDATED)

Authors: Jiaxian Yan, Zhaofeng Ye, Ziyi Yang, Chengqiang Lu, Shengyu Zhang, Qi Liu, Jiezhong Qiu

Protein-ligand binding affinity (PLBA) prediction is a fundamental task in drug discovery. Recently, various deep learning-based models have predicted binding affinity by taking the three-dimensional structure of protein-ligand complexes as input, achieving astounding progress. However, due to the scarcity of high-quality training data, the generalization ability of current models is still limited. In addition, different bioassays use varying affinity measurement labels (i.e., IC50, Ki, Kd), and different experimental conditions inevitably introduce systematic noise, which poses a significant challenge to constructing high-precision affinity prediction models. To address these issues, we (1) propose Multi-task Bioassay Pre-training (MBP), a pre-training framework for structure-based PLBA prediction; and (2) construct a pre-training dataset called ChEMBL-Dock with more than 300k experimentally measured affinity labels and about 2.8M docked three-dimensional structures. By introducing multi-task pre-training to treat the prediction of different affinity labels as different tasks and classifying relative rankings between samples from the same bioassay, MBP learns robust and transferable structural knowledge from our new ChEMBL-Dock dataset with varied and noisy labels. Experiments substantiate the capability of MBP as a general framework that can improve and be tailored to mainstream structure-based PLBA prediction tasks. To the best of our knowledge, MBP is the first affinity pre-training model and shows great potential for future development.

A Graph Dynamics Prior for Relational Inference. (arXiv:2306.06041v2 [cs.LG] UPDATED)

Authors: Liming Pan, Cheng Shi, Ivan Dokmanić

Relational inference aims to identify interactions between parts of a dynamical system from the observed dynamics. Current state-of-the-art methods fit the dynamics with a graph neural network (GNN) on a learnable graph. They use one-step message-passing GNNs -- intuitively the right choice since non-locality of multi-step or spectral GNNs may confuse direct and indirect interactions. But the effective interaction graph depends on the sampling rate and it is rarely localized to direct neighbors, leading to poor local optima for the one-step model. In this work, we propose a graph dynamics prior (GDP) for relational inference. GDP constructively uses error amplification in non-local polynomial filters to steer the solution to the ground-truth graph. To deal with non-uniqueness, GDP simultaneously fits a "shallow" one-step model and a polynomial multi-step model with shared graph topology. Experiments show that GDP reconstructs graphs far more accurately than earlier methods, with remarkable robustness to under-sampling. Since appropriate sampling rates for unknown dynamical systems are not known a priori, this robustness makes GDP suitable for real applications in scientific machine learning. Reproducible code is available at https://github.com/DaDaCheng/GDP.

Differentially Private Over-the-Air Federated Learning Over MIMO Fading Channels. (arXiv:2306.10982v2 [cs.IT] UPDATED)

Authors: Hang Liu, Jia Yan, Ying-Jun Angela Zhang

Federated learning (FL) enables edge devices to collaboratively train machine learning models, with model communication replacing direct data uploading. While over-the-air model aggregation improves communication efficiency, uploading models to an edge server over wireless networks can pose privacy risks. Differential privacy (DP) is a widely used quantitative technique to measure statistical data privacy in FL. Previous research has focused on over-the-air FL with a single-antenna server, leveraging communication noise to enhance user-level DP. This approach achieves the so-called "free DP" by controlling transmit power rather than introducing additional DP-preserving mechanisms at devices, such as adding artificial noise. In this paper, we study differentially private over-the-air FL over a multiple-input multiple-output (MIMO) fading channel. We show that FL model communication with a multiple-antenna server amplifies privacy leakage as the multiple-antenna server employs separate receive combining for model aggregation and information inference. Consequently, relying solely on communication noise, as done in the multiple-input single-output system, cannot meet high privacy requirements, and a device-side privacy-preserving mechanism is necessary for optimal DP design. We analyze the learning convergence and privacy loss of the studied FL system and propose a transceiver design algorithm based on alternating optimization. Numerical results demonstrate that the proposed method achieves a better privacy-learning trade-off compared to prior work.

Temporal Conditioning Spiking Latent Variable Models of the Neural Response to Natural Visual Scenes. (arXiv:2306.12045v6 [q-bio.NC] UPDATED)

Authors: Gehua Ma, Runhao Jiang, Rui Yan, Huajin Tang

Developing computational models of neural response is crucial for understanding sensory processing and neural computations. Current state-of-the-art neural network methods use temporal filters to handle temporal dependencies, resulting in an unrealistic and inflexible processing paradigm. Meanwhile, these methods target trial-averaged firing rates and fail to capture important features in spike trains. This work presents the temporal conditioning spiking latent variable models (TeCoS-LVM) to simulate the neural response to natural visual stimuli. We use spiking neurons to produce spike outputs that directly match the recorded trains. This approach helps to avoid losing information embedded in the original spike trains. We exclude the temporal dimension from the model parameter space and introduce a temporal conditioning operation to allow the model to adaptively explore and exploit temporal dependencies in stimuli sequences in a natural paradigm. We show that TeCoS-LVM models can produce more realistic spike activities and fit spike statistics more accurately than powerful alternatives. Additionally, learned TeCoS-LVM models can generalize well to longer time scales. Overall, while remaining computationally tractable, our model effectively captures key features of neural coding systems. It thus provides a useful tool for building accurate predictive computational accounts for various sensory perception circuits.

GloptiNets: Scalable Non-Convex Optimization with Certificates. (arXiv:2306.14932v3 [math.OC] UPDATED)

Authors: Gaspard Beugnot (PSL, DI-ENS), Julien Mairal, Alessandro Rudi (PSL, DI-ENS)

We present a novel approach to non-convex optimization with certificates, which handles smooth functions on the hypercube or on the torus. Unlike traditional methods that rely on algebraic properties, our algorithm exploits the regularity of the target function intrinsic in the decay of its Fourier spectrum. By defining a tractable family of models, we can simultaneously obtain precise certificates and leverage the advanced and powerful computational techniques developed to optimize neural networks. In this way, the scalability of our approach is naturally enhanced by parallel computing with GPUs. Our approach, when applied to the case of polynomials of moderate dimensions but with thousands of coefficients, outperforms the state-of-the-art optimization methods with certificates, such as those based on Lasserre's hierarchy, addressing problems intractable for the competitors.

Contextual Pre-Planning on Reward Machine Abstractions for Enhanced Transfer in Deep Reinforcement Learning. (arXiv:2307.05209v2 [cs.AI] UPDATED)

Authors: Guy Azran, Mohamad H. Danesh, Stefano V. Albrecht, Sarah Keren

Recent studies show that deep reinforcement learning (DRL) agents tend to overfit to the task on which they were trained and fail to adapt to minor environment changes. To expedite learning when transferring to unseen tasks, we propose a novel approach to representing the current task using reward machines (RMs), state machine abstractions that induce subtasks based on the current task's rewards and dynamics. Our method provides agents with symbolic representations of optimal transitions from their current abstract state and rewards them for achieving these transitions. These representations are shared across tasks, allowing agents to exploit knowledge of previously encountered symbols and transitions, thus enhancing transfer. Empirical results show that our representations improve sample efficiency and few-shot transfer in a variety of domains.

Bootstrapping Vision-Language Learning with Decoupled Language Pre-training. (arXiv:2307.07063v4 [cs.CV] UPDATED)

Authors: Yiren Jian, Chongyang Gao, Soroush Vosoughi

We present a novel methodology aimed at optimizing the application of frozen large language models (LLMs) for resource-intensive vision-language (VL) pre-training. The current paradigm uses visual features as prompts to guide language models, with a focus on determining the most relevant visual features for corresponding text. Our approach diverges by concentrating on the language component, specifically identifying the optimal prompts to align with visual features. We introduce the Prompt-Transformer (P-Former), a model that predicts these ideal prompts, which is trained exclusively on linguistic data, bypassing the need for image-text pairings. This strategy subtly bifurcates the end-to-end VL training process into an additional, separate stage. Our experiments reveal that our framework significantly enhances the performance of a robust image-to-text baseline (BLIP-2), and effectively narrows the performance gap between models trained with either 4M or 129M image-text pairs. Importantly, our framework is modality-agnostic and flexible in terms of architectural design, as validated by its successful application in a video learning task using varied base modules. The code will be made available at https://github.com/yiren-jian/BLIText.

Feature Transportation Improves Graph Neural Networks. (arXiv:2307.16092v2 [cs.LG] UPDATED)

Authors: Moshe Eliasof, Eldad Haber, Eran Treister

Graph neural networks (GNNs) have shown remarkable success in learning representations for graph-structured data. However, GNNs still face challenges in modeling complex phenomena that involve feature transportation. In this paper, we propose a novel GNN architecture inspired by Advection-Diffusion-Reaction systems, called ADR-GNN. Advection models feature transportation, while diffusion captures the local smoothing of features, and reaction represents the non-linear transformation between feature channels. We provide an analysis of the qualitative behavior of ADR-GNN, which shows the benefit of combining advection, diffusion, and reaction. To demonstrate its efficacy, we evaluate ADR-GNN on real-world node classification and spatio-temporal datasets, and show that it improves or offers competitive performance compared to state-of-the-art networks.

DeSCo: Towards Generalizable and Scalable Deep Subgraph Counting. (arXiv:2308.08198v2 [cs.LG] UPDATED)

Authors: Tianyu Fu, Chiyue Wei, Yu Wang, Rex Ying

We introduce DeSCo, a scalable neural deep subgraph counting pipeline, designed to accurately predict both the count and occurrence position of queries on target graphs after a single training run. Firstly, DeSCo uses a novel canonical partition and divides the large target graph into small neighborhood graphs, greatly reducing the count variation while guaranteeing no missing or double-counting. Secondly, neighborhood counting uses an expressive subgraph-based heterogeneous graph neural network to accurately count in each neighborhood. Finally, gossip propagation propagates neighborhood counts with learnable gates to harness the inductive biases of motif counts. DeSCo is evaluated on eight real-world datasets from various domains. It outperforms state-of-the-art neural methods with 137x improvement in the mean squared error of count prediction, while maintaining polynomial runtime complexity. Our open source project is at https://github.com/fuvty/DeSCo.

Two-and-a-half Order Score-based Model for Solving 3D Ill-posed Inverse Problems. (arXiv:2308.08511v3 [eess.IV] UPDATED)

Authors: Zirong Li, Yanyang Wang, Jianjia Zhang, Weiwen Wu, Hengyong Yu

Computed Tomography (CT) and Magnetic Resonance Imaging (MRI) are crucial technologies in the field of medical imaging. Score-based models have proven to be effective in addressing different inverse problems encountered in CT and MRI, such as sparse-view CT and fast MRI reconstruction. However, these models face challenges in achieving accurate three dimensional (3D) volumetric reconstruction. The existing score-based models primarily focus on reconstructing two dimensional (2D) data distribution, leading to inconsistencies between adjacent slices in the reconstructed 3D volumetric images. To overcome this limitation, we propose a novel two-and-a-half order score-based model (TOSM). During the training phase, our TOSM learns data distributions in 2D space, which reduces the complexity of training compared to directly working on 3D volumes. However, in the reconstruction phase, the TOSM updates the data distribution in 3D space, utilizing complementary scores along three directions (sagittal, coronal, and transaxial) to achieve a more precise reconstruction. The development of TOSM is built on robust theoretical principles, ensuring its reliability and efficacy. Through extensive experimentation on large-scale sparse-view CT and fast MRI datasets, our method demonstrates remarkable advancements and attains state-of-the-art results in solving 3D ill-posed inverse problems. Notably, the proposed TOSM effectively addresses the inter-slice inconsistency issue, resulting in high-quality 3D volumetric reconstruction.

PMET: Precise Model Editing in a Transformer. (arXiv:2308.08742v4 [cs.CL] UPDATED)

Authors: Xiaopeng Li, Shasha Li, Shezheng Song, Jing Yang, Jun Ma, Jie Yu

Model editing techniques modify a minor proportion of knowledge in Large Language Models (LLMs) at a relatively low cost and have demonstrated notable success. Existing methods assume Transformer Layer (TL) hidden states are values of key-value memories of the Feed-Forward Network (FFN). They usually optimize the TL hidden states to memorize target knowledge and use them to update the weights of the FFN in LLMs. However, the information flow of TL hidden states comes from three parts: Multi-Head Self-Attention (MHSA), FFN, and residual connections. Existing methods neglect the fact that the TL hidden states contain information not specifically required for FFN. Consequently, the performance of model editing decreases. To achieve more precise model editing, we analyze hidden states of MHSA and FFN, finding that MHSA encodes certain general knowledge extraction patterns. This implies that MHSA weights do not require updating when new knowledge is introduced. Based on these findings, we introduce PMET, which simultaneously optimizes Transformer Component (TC, namely MHSA and FFN) hidden states, while only using the optimized TC hidden states of FFN to precisely update FFN weights. Our experiments demonstrate that PMET exhibits state-of-the-art performance on both the COUNTERFACT and zsRE datasets. Our ablation experiments substantiate the effectiveness of our enhancements, further reinforcing the finding that the MHSA encodes certain general knowledge extraction patterns and indicating its storage of a small amount of factual knowledge. Our code is available at https://github.com/xpq-tech/PMET.

Learning Weakly Convex Regularizers for Convergent Image-Reconstruction Algorithms. (arXiv:2308.10542v2 [eess.IV] UPDATED)

Authors: Alexis Goujon, Sebastian Neumayer, Michael Unser

We propose to learn non-convex regularizers with a prescribed upper bound on their weak-convexity modulus. Such regularizers give rise to variational denoisers that minimize a convex energy. They rely on few parameters (fewer than 15,000) and offer a signal-processing interpretation as they mimic handcrafted sparsity-promoting regularizers. Through numerical experiments, we show that such denoisers outperform convex-regularization methods as well as the popular BM3D denoiser. Additionally, the learned regularizer can be deployed to solve inverse problems with iterative schemes that provably converge. For both CT and MRI reconstruction, the regularizer generalizes well and offers an excellent tradeoff between performance, number of parameters, guarantees, and interpretability when compared to other data-driven approaches.
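The link between a bounded weak-convexity modulus and a convex denoising energy can be made explicit with a short derivation. This is a sketch under assumed notation, not necessarily the paper's exact formulation: $R$ is the regularizer, $\rho$ its weak-convexity modulus, $y$ the noisy input, and $E$ the denoising energy with a quadratic data term.

```latex
\begin{align*}
&R \text{ is } \rho\text{-weakly convex}
  \iff x \mapsto R(x) + \tfrac{\rho}{2}\|x\|^2 \text{ is convex}, \\
&E(x) = \tfrac{1}{2}\|x - y\|^2 + R(x)
  = \underbrace{\tfrac{1-\rho}{2}\|x\|^2 - \langle x, y\rangle + \tfrac{1}{2}\|y\|^2}_{\text{convex if } \rho \le 1}
  + \underbrace{R(x) + \tfrac{\rho}{2}\|x\|^2}_{\text{convex by assumption}}.
\end{align*}
```

Under these assumptions, $E$ is a sum of convex functions whenever $\rho \le 1$, even though $R$ itself may be non-convex.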

From system models to class models: An in-context learning paradigm. (arXiv:2308.13380v2 [eess.SY] UPDATED)

Authors: Marco Forgione, Filippo Pura, Dario Piga

Is it possible to understand the intricacies of a dynamical system not solely from its input/output pattern, but also by observing the behavior of other systems within the same class? This central question drives the study presented in this paper.

In response to this query, we introduce a novel paradigm for system identification, addressing two primary tasks: one-step-ahead prediction and multi-step simulation. Unlike conventional methods, we do not directly estimate a model for the specific system. Instead, we learn a meta model that represents a class of dynamical systems. This meta model is trained on a potentially infinite stream of synthetic data, generated by simulators whose settings are randomly extracted from a probability distribution. When provided with a context from a new system (specifically, an input/output sequence), the meta model implicitly discerns its dynamics, enabling predictions of its behavior.

The proposed approach harnesses the power of Transformers, renowned for their in-context learning capabilities. For one-step prediction, a GPT-like decoder-only architecture is utilized, whereas the simulation problem employs an encoder-decoder structure. Initial experimental results affirmatively answer our foundational question, opening doors to fresh research avenues in system identification.

On the Tradeoff between Privacy Preservation and Byzantine-Robustness in Decentralized Learning. (arXiv:2308.14606v3 [cs.LG] UPDATED)

Authors: Haoxiang Ye, Heng Zhu, Qing Ling

This paper jointly considers privacy preservation and Byzantine-robustness in decentralized learning. In a decentralized network, honest-but-curious agents faithfully follow the prescribed algorithm, but expect to infer their neighbors' private data from messages received during the learning process, while dishonest-and-Byzantine agents disobey the prescribed algorithm, and deliberately disseminate wrong messages to their neighbors so as to bias the learning process. For this novel setting, we investigate a generic privacy-preserving and Byzantine-robust decentralized stochastic gradient descent (SGD) framework, in which Gaussian noise is injected to preserve privacy and robust aggregation rules are adopted to counteract Byzantine attacks. We analyze its learning error and privacy guarantee, discovering an essential tradeoff between privacy preservation and Byzantine-robustness in decentralized learning -- the learning error caused by defending against Byzantine attacks is exacerbated by the Gaussian noise added to preserve privacy. For a class of state-of-the-art robust aggregation rules, we give a unified analysis of the "mixing abilities". Building upon this analysis, we reveal how the "mixing abilities" affect the tradeoff between privacy preservation and Byzantine-robustness. The theoretical results provide guidelines for achieving a favorable tradeoff with proper design of robust aggregation rules. Numerical experiments are conducted and corroborate our theoretical findings.
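The two ingredients named in the abstract, Gaussian noise injection and robust aggregation, can be sketched in a few lines. This is a minimal illustration, not the paper's algorithm: the coordinate-wise trimmed mean is one well-known robust aggregation rule chosen here as an assumption (the abstract does not name specific rules), and the function names `add_gaussian_noise`, `trimmed_mean`, and `robust_private_step` are hypothetical.

```python
import random
from typing import List, Sequence

Vector = List[float]


def add_gaussian_noise(vec: Sequence[float], sigma: float, rng: random.Random) -> Vector:
    """Perturb a message with i.i.d. Gaussian noise before sharing it."""
    return [v + rng.gauss(0.0, sigma) for v in vec]


def trimmed_mean(messages: Sequence[Sequence[float]], trim: int) -> Vector:
    """Coordinate-wise trimmed mean.

    For each coordinate, drop the `trim` smallest and `trim` largest
    values, then average the rest (an assumed choice of robust rule).
    """
    if 2 * trim >= len(messages):
        raise ValueError("trim is too large for the number of messages")
    dim = len(messages[0])
    result = []
    for j in range(dim):
        column = sorted(m[j] for m in messages)
        kept = column[trim:len(column) - trim]
        result.append(sum(kept) / len(kept))
    return result


def robust_private_step(
    x: Sequence[float],
    own_grad: Sequence[float],
    neighbor_msgs: Sequence[Sequence[float]],
    lr: float,
    trim: int,
) -> Vector:
    """One local update: robustly aggregate models, then take an SGD step."""
    aggregated = trimmed_mean([list(x)] + [list(m) for m in neighbor_msgs], trim)
    return [a - lr * g for a, g in zip(aggregated, own_grad)]


if __name__ == "__main__":
    rng = random.Random(0)
    honest = [[1.0, 1.0], [1.2, 0.8]]
    # Honest neighbors share noisy models; one Byzantine neighbor sends garbage.
    msgs = [add_gaussian_noise(m, sigma=0.1, rng=rng) for m in honest] + [[1e6, -1e6]]
    print(robust_private_step([0.9, 1.1], [0.1, -0.1], msgs, lr=0.5, trim=1))
```

The sketch also hints at the tradeoff described above: the noise that protects honest messages widens their spread, which makes it harder for the aggregation rule to separate honest values from Byzantine ones.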

Data-Juicer: A One-Stop Data Processing System for Large Language Models. (arXiv:2309.02033v3 [cs.LG] UPDATED)

Authors: Daoyuan Chen, Yilun Huang, Zhijian Ma, Hesen Chen, Xuchen Pan, Ce Ge, Dawei Gao, Yuexiang Xie, Zhaoyang Liu, Jinyang Gao, Yaliang Li, Bolin Ding, Jingren Zhou

The immense evolution in Large Language Models (LLMs) has underscored the importance of massive, heterogeneous, and high-quality data. A data recipe is a mixture of data from different sources for training LLMs, which plays a vital role in LLMs' performance. Existing open-source tools for LLM data processing are mostly tailored for specific data recipes. To continuously uncover the potential of LLMs, incorporate data from new sources, and improve LLMs' performance, we build a new system named Data-Juicer, with which we can efficiently generate diverse data recipes, explore different possibilities in forming data mixtures, and evaluate their effects on model performance. Different from traditional data-analytics pipelines, Data-Juicer faces some unique challenges. Firstly, the possible data sources for forming data recipes are truly heterogeneous and massive with various qualities. Secondly, it is extremely expensive to precisely evaluate data recipes' impact on LLMs' performance. Thirdly, the end users of Data-Juicer, model developers, need sufficient flexibility to configure and evaluate different data recipes.

Data-Juicer features a fine-grained abstraction of pipelines for constructing data recipes, with over 50 built-in operators for easy composition and extension. By incorporating visualization and auto-evaluation capabilities, Data-Juicer enables a timely feedback loop for both LLM pre-training and fine-tuning. Further, Data-Juicer is optimized and integrated with ecosystems for LLM training, evaluation, and distributed computing. The data recipes derived with Data-Juicer gain notable improvements on state-of-the-art LLMs, by up to 7.45% increase in averaged score across 16 LLM benchmarks and 17.5% higher win rate in pair-wise GPT-4 evaluations. Our system, data recipes, and tutorials are released, calling for broader data-centric research on training and understanding LLMs.

USM-SCD: Multilingual Speaker Change Detection Based on Large Pretrained Foundation Models. (arXiv:2309.08023v2 [eess.AS] UPDATED)

Authors: Guanlong Zhao, Yongqiang Wang, Jason Pelecanos, Yu Zhang, Hank Liao, Yiling Huang, Han Lu, Quan Wang

We introduce a multilingual speaker change detection model (USM-SCD) that can simultaneously detect speaker turns and perform ASR for 96 languages. This model is adapted from a speech foundation model trained on a large quantity of supervised and unsupervised data, demonstrating the utility of fine-tuning from a large generic foundation model for a downstream task. We analyze the performance of this multilingual speaker change detection model through a series of ablation studies. We show that the USM-SCD model can achieve more than 75% average speaker change detection F1 score across a test set that consists of data from 96 languages. On American English, the USM-SCD model can achieve an 85.8% speaker change detection F1 score across various public and internal test sets, beating the previous monolingual baseline model by 21% relative. We also show that we only need to fine-tune one-quarter of the trainable model parameters to achieve the best model performance. The USM-SCD model exhibits state-of-the-art ASR quality compared with a strong public ASR baseline, making it suitable to handle both tasks with negligible additional computational cost.

MAPTree: Beating "Optimal" Decision Trees with Bayesian Decision Trees. (arXiv:2309.15312v3 [cs.LG] UPDATED)

Authors: Colin Sullivan, Mo Tiwari, Sebastian Thrun

Decision trees remain one of the most popular machine learning models today, largely due to their out-of-the-box performance and interpretability. In this work, we present a Bayesian approach to decision tree induction via maximum a posteriori inference of a posterior distribution over trees. We first demonstrate a connection between maximum a posteriori inference of decision trees and AND/OR search. Using this connection, we propose an AND/OR search algorithm, dubbed MAPTree, which is able to recover the maximum a posteriori tree. Lastly, we demonstrate the empirical performance of the maximum a posteriori tree both on synthetic data and in real world settings. On 16 real world datasets, MAPTree either outperforms baselines or demonstrates comparable performance but with much smaller trees. On a synthetic dataset, MAPTree also demonstrates greater robustness to noise and better generalization than existing approaches. Finally, MAPTree recovers the maximum a posteriori tree faster than existing sampling approaches and, in contrast with those algorithms, is able to provide a certificate of optimality. The code for our experiments is available at https://github.com/ThrunGroup/maptree.

A Framework for Interpretability in Machine Learning for Medical Imaging. (arXiv:2310.01685v2 [cs.LG] UPDATED)

Authors: Alan Q. Wang, Batuhan K. Karaman, Heejong Kim, Jacob Rosenthal, Rachit Saluja, Sean I. Young, Mert R. Sabuncu

Interpretability for machine learning models in medical imaging (MLMI) is an important direction of research. However, there is a general sense of murkiness in what interpretability means. Why does the need for interpretability in MLMI arise? What goals does one actually seek to address when interpretability is needed? To answer these questions, we identify a need to formalize the goals and elements of interpretability in MLMI. By reasoning about real-world tasks and goals common in both medical image analysis and its intersection with machine learning, we identify five core elements of interpretability: localization, visual recognizability, physical attribution, model transparency, and actionability. From this, we arrive at a framework for interpretability in MLMI, which serves as a step-by-step guide to approaching interpretability in this context. Overall, this paper formalizes interpretability needs in the context of medical imaging, and our applied perspective clarifies concrete MLMI-specific goals and considerations in order to guide method design and improve real-world usage. Our goal is to provide practical and didactic information for model designers and practitioners, inspire developers of models in the medical imaging field to reason more deeply about what interpretability is achieving, and suggest future directions of interpretability research.

Graph Neural Network-based EEG Classification: A Survey. (arXiv:2310.02152v2 [q-bio.NC] UPDATED)

Authors: Dominik Klepl, Min Wu, Fei He

Graph neural networks (GNN) are increasingly used to classify EEG for tasks such as emotion recognition, motor imagery and neurological diseases and disorders. A wide range of methods have been proposed to design GNN-based classifiers. Therefore, there is a need for a systematic review and categorisation of these approaches. We exhaustively search the published literature on this topic and derive several categories for comparison. These categories highlight the similarities and differences among the methods. The results suggest a prevalence of spectral graph convolutional layers over spatial. Additionally, we identify standard forms of node features, with the most popular being the raw EEG signal and differential entropy. Our results summarise the emerging trends in GNN-based approaches for EEG classification. Finally, we discuss several promising research directions, such as exploring the potential of transfer learning methods and appropriate modelling of cross-frequency interactions.

Taming Binarized Neural Networks and Mixed-Integer Programs. (arXiv:2310.04469v3 [cs.LG] UPDATED)

Authors: Johannes Aspman, Georgios Korpas, Jakub Marecek

There has been a great deal of recent interest in binarized neural networks, especially because of their explainability. At the same time, automatic differentiation algorithms such as backpropagation fail for binarized neural networks, which limits their applicability. By reformulating the problem of training binarized neural networks as a subadditive dual of a mixed-integer program, we show that binarized neural networks admit a tame representation. This, in turn, makes it possible to use the framework of Bolte et al. for implicit differentiation, which offers the possibility for practical implementation of backpropagation in the context of binarized neural networks.

This approach could also be used for a broader class of mixed-integer programs, beyond the training of binarized neural networks, as encountered in symbolic approaches to AI and beyond.

Comparing the robustness of modern no-reference image- and video-quality metrics to adversarial attacks. (arXiv:2310.06958v2 [cs.CV] UPDATED)

Authors: Anastasia Antsiferova, Khaled Abud, Aleksandr Gushchin, Ekaterina Shumitskaya, Sergey Lavrushkin, Dmitriy Vatolin

Nowadays neural-network-based image- and video-quality metrics show better performance compared to traditional methods. However, they also became more vulnerable to adversarial attacks that increase metrics' scores without improving visual quality. The existing benchmarks of quality metrics compare their performance in terms of correlation with subjective quality and calculation time. However, the adversarial robustness of image-quality metrics is also an area worth researching. In this paper, we analyse modern metrics' robustness to different adversarial attacks. We adopted adversarial attacks from computer vision tasks and compared attacks' efficiency against 15 no-reference image/video-quality metrics. Some metrics showed high resistance to adversarial attacks which makes their usage in benchmarks safer than vulnerable metrics. The benchmark accepts new metrics submissions for researchers who want to make their metrics more robust to attacks or to find such metrics for their needs. Try our benchmark using pip install robustness-benchmark.

Online RL in Linearly $q^\pi$-Realizable MDPs Is as Easy as in Linear MDPs If You Learn What to Ignore. (arXiv:2310.07811v2 [cs.LG] UPDATED)

Authors: Gellért Weisz, András György, Csaba Szepesvári

We consider online reinforcement learning (RL) in episodic Markov decision processes (MDPs) under the linear $q^\pi$-realizability assumption, where it is assumed that the action-values of all policies can be expressed as linear functions of state-action features. This class is known to be more general than linear MDPs, where the transition kernel and the reward function are assumed to be linear functions of the feature vectors. As our first contribution, we show that the difference between the two classes is the presence of states in linearly $q^\pi$-realizable MDPs where for any policy, all the actions have approximately equal values, and skipping over these states by following an arbitrarily fixed policy in those states transforms the problem to a linear MDP. Based on this observation, we derive a novel (computationally inefficient) learning algorithm for linearly $q^\pi$-realizable MDPs that simultaneously learns what states should be skipped over and runs another learning algorithm on the linear MDP hidden in the problem. The method returns an $\epsilon$-optimal policy after $\text{polylog}(H, d)/\epsilon^2$ interactions with the MDP, where $H$ is the time horizon and $d$ is the dimension of the feature vectors, giving the first polynomial-sample-complexity online RL algorithm for this setting. The results are proved for the misspecified case, where the sample complexity is shown to degrade gracefully with the misspecification error.

Trust, but Verify: Robust Image Segmentation using Deep Learning. (arXiv:2310.16999v3 [cs.CV] UPDATED)

Authors: Fahim Ahmed Zaman, Xiaodong Wu, Weiyu Xu, Milan Sonka, Raghuraman Mudumbai

We describe a method for verifying the output of a deep neural network for medical image segmentation that is robust to several classes of random as well as worst-case perturbations i.e. adversarial attacks. This method is based on a general approach recently developed by the authors called "Trust, but Verify" wherein an auxiliary verification network produces predictions about certain masked features in the input image using the segmentation as an input. A well-designed auxiliary network will produce high-quality predictions when the input segmentations are accurate, but will produce low-quality predictions when the segmentations are incorrect. Checking the predictions of such a network with the original image allows us to detect bad segmentations. However, to ensure the verification method is truly robust, we need a method for checking the quality of the predictions that does not itself rely on a black-box neural network. Indeed, we show that previous methods for segmentation evaluation that do use deep neural regression networks are vulnerable to false negatives i.e. can inaccurately label bad segmentations as good. We describe the design of a verification network that avoids such vulnerability and present results to demonstrate its robustness compared to previous methods.

Uni-O4: Unifying Online and Offline Deep Reinforcement Learning with Multi-Step On-Policy Optimization. (arXiv:2311.03351v2 [cs.LG] UPDATED)

Authors: Kun Lei, Zhengmao He, Chenhao Lu, Kaizhe Hu, Yang Gao, Huazhe Xu

Combining offline and online reinforcement learning (RL) is crucial for efficient and safe learning. However, previous approaches treat offline and online learning as separate procedures, resulting in redundant designs and limited performance. We ask: Can we achieve straightforward yet effective offline and online learning without introducing extra conservatism or regularization? In this study, we propose Uni-o4, which utilizes an on-policy objective for both offline and online learning. Owing to the alignment of objectives in the two phases, the RL agent can transfer between offline and online learning seamlessly. This property enhances the flexibility of the learning paradigm, allowing for arbitrary combinations of pretraining, fine-tuning, offline, and online learning. In the offline phase, specifically, Uni-o4 leverages diverse ensemble policies to address the mismatch issues between the estimated behavior policy and the offline dataset. Through a simple offline policy evaluation (OPE) approach, Uni-o4 can achieve multi-step policy improvement safely. We demonstrate that by employing the method above, the fusion of these two paradigms can yield superior offline initialization as well as stable and rapid online fine-tuning capabilities. Through real-world robot tasks, we highlight the benefits of this paradigm for rapid deployment in challenging, previously unseen real-world environments. Additionally, through comprehensive evaluations using numerous simulated benchmarks, we substantiate that our method achieves state-of-the-art performance in both offline and offline-to-online fine-tuning learning. Our website: https://lei-kun.github.io/uni-o4/ .

FusionFrames: Efficient Architectural Aspects for Text-to-Video Generation Pipeline. (arXiv:2311.13073v2 [cs.CV] UPDATED)

Authors: Vladimir Arkhipkin, Zein Shaheen, Viacheslav Vasilev, Elizaveta Dakhova, Andrey Kuznetsov, Denis Dimitrov

Multimedia generation approaches occupy a prominent place in artificial intelligence research. Text-to-image models achieved high-quality results over the last few years. However, video synthesis methods recently started to develop. This paper presents a new two-stage latent diffusion text-to-video generation architecture based on the text-to-image diffusion model. The first stage concerns keyframes synthesis to figure the storyline of a video, while the second one is devoted to interpolation frames generation to make movements of the scene and objects smooth. We compare several temporal conditioning approaches for keyframes generation. The results show the advantage of using separate temporal blocks over temporal layers in terms of metrics reflecting video generation quality aspects and human preference. The design of our interpolation model significantly reduces computational costs compared to other masked frame interpolation approaches. Furthermore, we evaluate different configurations of MoVQ-based video decoding scheme to improve consistency and achieve higher PSNR, SSIM, MSE, and LPIPS scores. Finally, we compare our pipeline with existing solutions and achieve top-2 scores overall and top-1 among open-source solutions: CLIPSIM = 0.2976 and FVD = 433.054. Project page: https://ai-forever.github.io/kandinsky-video/

Use of Deep Neural Networks for Uncertain Stress Functions with Extensions to Impact Mechanics. (arXiv:2311.16135v2 [cond-mat.mtrl-sci] UPDATED)

Authors: Garrett Blum, Ryan Doris, Diego Klabjan, Horacio Espinosa, Ron Szalkowski

Stress-strain curves, or more generally, stress functions, are an extremely important characterization of a material's mechanical properties. However, stress functions are often difficult to derive and are narrowly tailored to a specific material. Further, large deformations, high strain-rates, temperature sensitivity, and effect of material parameters compound modeling challenges. We propose a generalized deep neural network approach to model stress as a state function with quantile regression to capture uncertainty. We extend these models to uniaxial impact mechanics using stochastic differential equations to demonstrate a use case and provide a framework for implementing this uncertainty-aware stress function. We provide experiments benchmarking our approach against leading constitutive, machine learning, and transfer learning approaches to stress and impact mechanics modeling on publicly available and newly presented data sets. We also provide a framework to optimize material parameters given multiple competing impact scenarios.

FedECA: A Federated External Control Arm Method for Causal Inference with Time-To-Event Data in Distributed Settings. (arXiv:2311.16984v2 [stat.ME] UPDATED)

Authors: Jean Ogier du Terrail, Quentin Klopfenstein, Honghao Li, Imke Mayer, Nicolas Loiseau, Mohammad Hallal, Félix Balazard, Mathieu Andreux

External control arms (ECA) can inform the early clinical development of experimental drugs and provide efficacy evidence for regulatory approval in non-randomized settings. However, the main challenge of implementing ECA lies in accessing real-world data or historical clinical trials. Indeed, data sharing is often not feasible due to privacy considerations related to data leaving the original collection centers, along with pharmaceutical companies' competitive motives. In this paper, we leverage a privacy-enhancing technology called federated learning (FL) to remove some of the barriers to data sharing. We introduce a federated learning inverse probability of treatment weighted (IPTW) method for time-to-event outcomes called FedECA which eases the implementation of ECA by limiting patients' data exposure. We show with extensive experiments that FedECA outperforms its closest competitor, matching-adjusted indirect comparison (MAIC), in terms of statistical power and ability to balance the treatment and control groups. To encourage the use of such methods, we publicly release our code which relies on Substra, an open-source FL software with proven experience in privacy-sensitive contexts.
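
As a rough illustration of the inverse probability of treatment weighting (IPTW) step mentioned above, the standard weights are 1/e for treated patients and 1/(1-e) for controls, where e is the estimated propensity score. The sketch below shows only this weighting on hypothetical numbers; the federated estimation of the propensity model and the weighted time-to-event analysis in FedECA are not reproduced, and the function name is an assumption.

```python
def iptw_weights(treated, propensity):
    """IPTW weights: 1/e for treated units, 1/(1 - e) for control units."""
    return [
        1.0 / e if t == 1 else 1.0 / (1.0 - e)
        for t, e in zip(treated, propensity)
    ]

treated = [1, 1, 0, 0]              # 1 = experimental arm, 0 = external control
propensity = [0.8, 0.5, 0.5, 0.2]   # hypothetical estimated propensity scores
weights = iptw_weights(treated, propensity)
```

Patients whose treatment assignment was unlikely given their covariates receive larger weights, which is how the weighting aims to balance the treated and control groups.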

Forecasting Trends in Food Security: a Reservoir Computing Approach. (arXiv:2312.00626v2 [cs.LG] UPDATED)

Authors: Joschka Herteux, Christoph Räth, Amine Baha, Giulia Martini, Duccio Piovani

Early warning systems are an essential tool for effective humanitarian action. Advance warnings on impending disasters facilitate timely and targeted responses, which help save lives, livelihoods, and scarce financial resources. In this work we present a new quantitative methodology to forecast levels of food consumption for 60 consecutive days, at the sub-national level, in four countries: Mali, Nigeria, Syria, and Yemen. The methodology is built on publicly available data from the World Food Programme's integrated global hunger monitoring system which collects, processes, and displays daily updates on key food security metrics, conflict, weather events, and other drivers of food insecurity across 90 countries (https://hungermap.wfp.org/). In this study, we assessed the performance of various models including ARIMA, XGBoost, LSTMs, CNNs, and Reservoir Computing (RC), by comparing their Root Mean Squared Error (RMSE) metrics. This comprehensive analysis spanned classical statistical, machine learning, and deep learning approaches. Our findings highlight Reservoir Computing as a particularly well-suited model in the field of food security given both its notable resistance to over-fitting on limited data samples and its efficient training capabilities. The methodology we introduce establishes the groundwork for a global, data-driven early warning system designed to anticipate and detect food insecurity.

MIND: Multi-Task Incremental Network Distillation. (arXiv:2312.02916v2 [cs.CV] UPDATED)

Authors: Jacopo Bonato, Francesco Pelosin, Luigi Sabetta, Alessandro Nicolosi

The recent surge of pervasive devices that generate dynamic data streams has underscored the necessity for learning systems to adapt continually to data distributional shifts. To tackle this challenge, the research community has put forth a spectrum of methodologies, including the demanding pursuit of class-incremental learning without replay data. In this study, we present MIND, a parameter isolation method that aims to significantly enhance the performance of replay-free solutions and achieve state-of-the-art results on several widely studied datasets. Our approach introduces two main contributions: two alternative distillation procedures that significantly improve the efficiency of MIND, increasing the accumulated knowledge of each sub-network, and the optimization of the BatchNorm layers across tasks inside the sub-networks. Overall, MIND outperforms all the state-of-the-art methods for rehearsal-free Class-Incremental learning (with an increment in classification accuracy of approx. +6% on CIFAR-100/10 and +10% on TinyImageNet/10) reaching up to approx. +40% accuracy in Domain-Incremental scenarios. Moreover, we ablated each contribution to demonstrate its impact on performance improvement. Our results showcase the superior performance of MIND, indicating its potential for addressing the challenges posed by Class-Incremental and Domain-Incremental learning in resource-constrained environments.

Achieving ${O}(\epsilon^{-1.5})$ Complexity in Hessian/Jacobian-free Stochastic Bilevel Optimization. (arXiv:2312.03807v2 [math.OC] UPDATED)

Authors: Yifan Yang, Peiyao Xiao, Kaiyi Ji

In this paper, we revisit the bilevel optimization problem, in which the upper-level objective function is generally nonconvex and the lower-level objective function is strongly convex. Although this type of problem has been studied extensively, it still remains an open question how to achieve an ${O}(\epsilon^{-1.5})$ sample complexity in Hessian/Jacobian-free stochastic bilevel optimization without any second-order derivative computation. To fill this gap, we propose a novel Hessian/Jacobian-free bilevel optimizer named FdeHBO, which features a simple fully single-loop structure, a projection-aided finite-difference Hessian/Jacobian-vector approximation, and momentum-based updates. Theoretically, we show that FdeHBO requires ${O}(\epsilon^{-1.5})$ iterations (each using ${O}(1)$ samples and only first-order gradient information) to find an $\epsilon$-accurate stationary point. As far as we know, this is the first Hessian/Jacobian-free method with an ${O}(\epsilon^{-1.5})$ sample complexity for nonconvex-strongly-convex stochastic bilevel optimization.
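
The idea of replacing a Hessian-vector product with a finite difference of gradients can be sketched generically. The code below is a plain central-difference approximation on a toy quadratic, shown only to make the "first-order information only" point concrete; it is not FdeHBO itself, and the projection step, the momentum-based updates, the toy function, and the step size `delta` are not taken from the paper.

```python
def grad(x):
    """Gradient of the toy function f(x) = x0^2 + x0*x1 + 2*x1^2 (assumed example)."""
    return [2.0 * x[0] + x[1], x[0] + 4.0 * x[1]]

def hvp_finite_difference(grad_fn, x, v, delta=1e-4):
    """Approximate H(x) v using two gradient calls and no second derivatives."""
    plus = grad_fn([xi + delta * vi for xi, vi in zip(x, v)])
    minus = grad_fn([xi - delta * vi for xi, vi in zip(x, v)])
    return [(p - m) / (2.0 * delta) for p, m in zip(plus, minus)]

# The exact Hessian of the toy function is [[2, 1], [1, 4]], so H v = [4, 9].
approx = hvp_finite_difference(grad, x=[0.5, -1.0], v=[1.0, 2.0])
```

Each approximation costs two gradient evaluations, which is why such schemes avoid explicit Hessian or Jacobian computation.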

Invariant Random Forest: Tree-Based Model Solution for OOD Generalization. (arXiv:2312.04273v2 [cs.LG] UPDATED)

Authors: Yufan Liao, Qi Wu, Xing Yan

Out-Of-Distribution (OOD) generalization is an essential topic in machine learning. However, recent research is only focusing on the corresponding methods for neural networks. This paper introduces a novel and effective solution for OOD generalization of decision tree models, named Invariant Decision Tree (IDT). IDT enforces a penalty term with regard to the unstable/varying behavior of a split across different environments during the growth of the tree. Its ensemble version, the Invariant Random Forest (IRF), is constructed. Our proposed method is motivated by a theoretical result under mild conditions, and validated by numerical tests with both synthetic and real datasets. The superior performance compared to non-OOD tree models implies that considering OOD generalization for tree models is absolutely necessary and should be given more attention.

Transformer as Linear Expansion of Learngene. (arXiv:2312.05614v2 [cs.AI] UPDATED)

Authors: Shiyu Xia, Miaosen Zhang, Xu Yang, Ruiming Chen, Haokun Chen, Xin Geng

We propose expanding the shared Transformer module to produce and initialize Transformers of varying depths, enabling adaptation to diverse resource constraints. Drawing an analogy to genetic expansibility, we term such module as learngene. To identify the expansion mechanism, we delve into the relationship between the layer's position and its corresponding weight value, and find that linear function appropriately approximates this relationship. Building on this insight, we present Transformer as Linear Expansion of learnGene (TLEG), a novel approach for flexibly producing and initializing Transformers of diverse depths. Specifically, to learn learngene, we firstly construct an auxiliary Transformer linearly expanded from learngene, after which we train it through employing soft distillation. Subsequently, we can produce and initialize Transformers of varying depths via linearly expanding the well-trained learngene, thereby supporting diverse downstream scenarios. Extensive experiments on ImageNet-1K demonstrate that TLEG achieves comparable or better performance in contrast to many individual models trained from scratch, while reducing around 2x training cost. When transferring to several downstream classification datasets, TLEG surpasses existing initialization methods by a large margin (e.g., +6.87% on iNat 2019 and +7.66% on CIFAR-100). Under the situation where we need to produce models of varying depths adapting for different resource constraints, TLEG achieves comparable results while reducing around 19x parameters stored to initialize these models and around 5x pre-training costs, in contrast to the pre-training and fine-tuning approach. When transferring a fixed set of parameters to initialize different models, TLEG presents better flexibility and competitive performance while reducing around 2.9x parameters stored to initialize, compared to the pre-training approach.
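
To make the linear-expansion idea concrete, here is a minimal sketch in which each layer's weights are a linear function of the layer's position, so the same two shared parameter sets can initialize models of any depth. The specific form `theta_a + (l / depth) * theta_b`, the function name, and the toy two-element "weights" are assumptions for illustration; the abstract does not give the exact parameterization.

```python
def expand_learngene(theta_a, theta_b, depth):
    """Produce per-layer weights as a linear function of layer position.

    Layer l (0-based) receives theta_a + (l / depth) * theta_b (assumed form).
    """
    layers = []
    for l in range(depth):
        scale = l / depth
        layers.append([a + scale * b for a, b in zip(theta_a, theta_b)])
    return layers

theta_a = [1.0, 0.0]  # hypothetical shared parameters
theta_b = [2.0, 4.0]
shallow = expand_learngene(theta_a, theta_b, depth=2)
deep = expand_learngene(theta_a, theta_b, depth=4)
```

Only `theta_a` and `theta_b` need to be stored, regardless of how many depths are produced, which is consistent with the reduction in stored parameters reported above.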

AUGCAL: Improving Sim2Real Adaptation by Uncertainty Calibration on Augmented Synthetic Images. (arXiv:2312.06106v2 [cs.CV] UPDATED)

Authors: Prithvijit Chattopadhyay, Bharat Goyal, Boglarka Ecsedi, Viraj Prabhu, Judy Hoffman

Synthetic data (SIM) drawn from simulators have emerged as a popular alternative for training models where acquiring annotated real-world images is difficult. However, transferring models trained on synthetic images to real-world applications can be challenging due to appearance disparities. A commonly employed solution to counter this SIM2REAL gap is unsupervised domain adaptation, where models are trained using labeled SIM data and unlabeled REAL data. Mispredictions made by such SIM2REAL adapted models are often associated with miscalibration - stemming from overconfident predictions on real data. In this paper, we introduce AUGCAL, a simple training-time patch for unsupervised adaptation that improves SIM2REAL adapted models by - (1) reducing overall miscalibration, (2) reducing overconfidence in incorrect predictions and (3) improving confidence score reliability by better guiding misclassification detection - all while retaining or improving SIM2REAL performance. Given a base SIM2REAL adaptation algorithm, at training time, AUGCAL involves replacing vanilla SIM images with strongly augmented views (AUG intervention) and additionally optimizing for a training time calibration loss on augmented SIM predictions (CAL intervention). We motivate AUGCAL using a brief analytical justification of how to reduce miscalibration on unlabeled REAL data. Through our experiments, we empirically show the efficacy of AUGCAL across multiple adaptation methods, backbones, tasks and shifts.

Exploring Novel Object Recognition and Spontaneous Location Recognition Machine Learning Analysis Techniques in Alzheimer's Mice. (arXiv:2312.06914v2 [cs.LG] UPDATED)

Authors: Soham Bafana

Understanding object recognition patterns in mice is crucial for advancing behavioral neuroscience and has significant implications for human health, particularly in the realm of Alzheimer's research. This study is centered on the development, application, and evaluation of a state-of-the-art computational pipeline designed to analyze such behaviors, specifically focusing on Novel Object Recognition (NOR) and Spontaneous Location Recognition (SLR) tasks. The pipeline integrates three advanced computational models: Any-Maze for initial data collection, DeepLabCut for detailed pose estimation, and Convolutional Neural Networks (CNNs) for nuanced behavioral classification. Employed across four distinct mouse groups, this pipeline demonstrated high levels of accuracy and robustness. Despite certain challenges like video quality limitations and the need for manual calculations, the results affirm the pipeline's efficacy and potential for scalability. The study serves as a proof of concept for a multidimensional computational approach to behavioral neuroscience, emphasizing the pipeline's versatility and readiness for future, more complex analyses.

ReRoGCRL: Representation-based Robustness in Goal-Conditioned Reinforcement Learning. (arXiv:2312.07392v3 [cs.LG] UPDATED)

Authors: Xiangyu Yin, Sihao Wu, Jiaxu Liu, Meng Fang, Xingyu Zhao, Xiaowei Huang, Wenjie Ruan

While Goal-Conditioned Reinforcement Learning (GCRL) has gained attention, its algorithmic robustness against adversarial perturbations remains unexplored. The attacks and robust representation training methods that are designed for traditional RL become less effective when applied to GCRL. To address this challenge, we first propose the Semi-Contrastive Representation attack, a novel approach inspired by the adversarial contrastive attack. Unlike existing attacks in RL, it only necessitates information from the policy function and can be seamlessly implemented during deployment. Then, to mitigate the vulnerability of existing GCRL algorithms, we introduce Adversarial Representation Tactics, which combines Semi-Contrastive Adversarial Augmentation with Sensitivity-Aware Regularizer to improve the adversarial robustness of the underlying RL agent against various types of perturbations. Extensive experiments validate the superior performance of our attack and defence methods across multiple state-of-the-art GCRL algorithms. Our tool ReRoGCRL is available at https://github.com/TrustAI/ReRoGCRL.

Double Machine Learning for Static Panel Models with Fixed Effects. (arXiv:2312.08174v2 [econ.EM] UPDATED)

Authors: Paul Clarke, Annalivia Polselli

Machine Learning (ML) algorithms are powerful data-driven tools for approximating high-dimensional or non-linear nuisance functions which are useful in practice because the true functional form of the predictors is ex-ante unknown. In this paper, we develop estimators of policy interventions from panel data which allow for non-linear effects of the confounding regressors, and investigate the performance of these estimators using three well-known ML algorithms, specifically, LASSO, classification and regression trees, and random forests. We use Double Machine Learning (DML) (Chernozhukov et al., 2018) for the estimation of causal effects of homogeneous treatments with unobserved individual heterogeneity (fixed effects) and no unobserved confounding by extending Robinson (1988)'s partially linear regression model. We develop three alternative approaches for handling unobserved individual heterogeneity based on extending the within-group estimator, first-difference estimator, and correlated random effect estimator (Mundlak, 1978) for non-linear models. Using Monte Carlo simulations, we find that conventional least squares estimators can perform well even if the data generating process is non-linear, but there are substantial performance gains in terms of bias reduction under a process where the true effect of the regressors is non-linear and discontinuous. However, for the same scenarios, we also find - despite extensive hyperparameter tuning - inference to be problematic for both tree-based learners because these lead to highly non-normal estimator distributions and the estimator variance being severely under-estimated. This contradicts the performance of trees in other circumstances and requires further investigation. Finally, we provide an illustrative example of DML for observational panel data showing the impact of the introduction of the national minimum wage in the UK.

Hybrid Sample Synthesis-based Debiasing of Classifier in Limited Data Setting. (arXiv:2312.08288v2 [cs.CV] UPDATED)

Authors: Piyush Arora, Pratik Mazumder

Deep learning models are known to suffer from the problem of bias, and researchers have been exploring methods to address this issue. However, most of these methods require prior knowledge of the bias and are not always practical. In this paper, we focus on a more practical setting with no prior information about the bias. Generally, in this setting, there are a large number of bias-aligned samples that cause the model to produce biased predictions and a few bias-conflicting samples that do not conform to the bias. If the training data is limited, the influence of the bias-aligned samples may become even stronger on the model predictions, and we experimentally demonstrate that existing debiasing techniques suffer severely in such cases. In this paper, we examine the effects of unknown bias in small dataset regimes and present a novel approach to mitigate this issue. The proposed approach directly addresses the issue of the extremely low occurrence of bias-conflicting samples in limited data settings through the synthesis of hybrid samples that can be used to reduce the effect of bias. We perform extensive experiments on several benchmark datasets and experimentally demonstrate the effectiveness of our proposed approach in addressing any unknown bias in the presence of limited data. Specifically, our approach outperforms the vanilla, LfF, LDD, and DebiAN debiasing methods by absolute margins of 10.39%, 9.08%, 8.07%, and 9.67% when only 10% of the Corrupted CIFAR-10 Type 1 dataset is available with a bias-conflicting sample ratio of 0.05.

Universal Approximation Property of Random Neural Networks. (arXiv:2312.08410v2 [cs.LG] UPDATED)

Authors: Ariel Neufeld, Philipp Schmocker

In this paper, we study random neural networks which are single-hidden-layer feedforward neural networks whose weights and biases are randomly initialized. After this random initialization, only the linear readout needs to be trained, which can be performed efficiently, e.g., by the least squares method. By viewing random neural networks as Banach space-valued random variables, we prove a universal approximation theorem within a large class of Bochner spaces. Hereby, the corresponding Banach space can be significantly more general than the space of continuous functions over a compact subset of a Euclidean space, namely, e.g., an $L^p$-space or a Sobolev space, where the latter includes the approximation of the derivatives. Moreover, we derive approximation rates and an explicit algorithm to learn a deterministic function by a random neural network. In addition, we provide a full error analysis and study when random neural networks overcome the curse of dimensionality in the sense that the training costs scale at most polynomially in the input and output dimension. Furthermore, we show in two numerical examples the empirical advantages of random neural networks compared to fully trained deterministic neural networks.
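
The training scheme described above (freeze randomly drawn hidden weights, fit only the linear readout by least squares) can be sketched in a few lines. This is a minimal illustration under assumptions that are not from the paper: a tanh activation, Gaussian and uniform sampling distributions, a small ridge term for numerical stability, and a toy target sin(x). The helper names are hypothetical.

```python
import math
import random


def solve(a, b):
    """Solve the square system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in reversed(range(n)):
        tail = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x


def hidden_features(x, weights, biases):
    """Random hidden layer tanh(w * x + b), plus a constant feature for the intercept."""
    return [math.tanh(w * x + b) for w, b in zip(weights, biases)] + [1.0]


def fit_readout(xs, ys, weights, biases, ridge=1e-6):
    """Fit only the linear readout via lightly regularised normal equations."""
    phi = [hidden_features(x, weights, biases) for x in xs]
    k = len(phi[0])
    gram = [[sum(p[i] * p[j] for p in phi) for j in range(k)] for i in range(k)]
    for i in range(k):
        gram[i][i] += ridge  # assumption: small ridge term keeps the system well-posed
    rhs = [sum(p[i] * y for p, y in zip(phi, ys)) for i in range(k)]
    return solve(gram, rhs)


def predict(x, weights, biases, readout):
    return sum(r * f for r, f in zip(readout, hidden_features(x, weights, biases)))


rng = random.Random(0)
n_hidden = 20
weights = [rng.gauss(0.0, 2.0) for _ in range(n_hidden)]    # sampled once, never trained
biases = [rng.uniform(-3.0, 3.0) for _ in range(n_hidden)]  # sampled once, never trained

xs = [-3.0 + 6.0 * i / 99 for i in range(100)]
ys = [math.sin(x) for x in xs]
readout = fit_readout(xs, ys, weights, biases)

mse = sum((predict(x, weights, biases, readout) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)
print(f"training MSE: {mse:.2e}")
```

Because the hidden layer is fixed, the fit reduces to a single linear solve, which is the source of the training efficiency the abstract refers to.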

Physics-informed Neural Network Estimation of Material Properties in Soft Tissue Nonlinear Biomechanical Models. (arXiv:2312.09787v2 [cs.LG] UPDATED)

Authors: Federica Caforio, Francesco Regazzoni, Stefano Pagani, Elias Karabelas, Christoph Augustin, Gundolf Haase, Gernot Plank, Alfio Quarteroni

The development of biophysical models for clinical applications is rapidly advancing in the research community, thanks to their predictive nature and their ability to assist the interpretation of clinical data. However, high-resolution and accurate multi-physics computational models are computationally expensive and their personalisation involves fine calibration of a large number of parameters, which may be space-dependent, challenging their clinical translation. In this work, we propose a new approach which relies on the combination of physics-informed neural networks (PINNs) with three-dimensional soft tissue nonlinear biomechanical models, capable of reconstructing displacement fields and estimating heterogeneous patient-specific biophysical properties. The proposed learning algorithm encodes information from a limited amount of displacement and, in some cases, strain data, that can be routinely acquired in the clinical setting, and combines it with the physics of the problem, represented by a mathematical model based on partial differential equations, to regularise the problem and improve its convergence properties. Several benchmarks are presented to show the accuracy and robustness of the proposed method and its great potential to enable the robust and effective identification of patient-specific, heterogeneous physical properties, such as tissue stiffness properties. In particular, we demonstrate the capability of the PINN to detect the presence, location and severity of scar tissue, which is beneficial to develop personalised simulation models for disease diagnosis, especially for cardiac applications.

No prejudice! Fair Federated Graph Neural Networks for Personalized Recommendation. (arXiv:2312.10080v2 [cs.IR] UPDATED)

Authors: Nimesh Agrawal, Anuj Kumar Sirohi, Jayadeva, Sandeep Kumar

Ensuring fairness in Recommendation Systems (RSs) across demographic groups is critical due to the increased integration of RSs in applications such as personalized healthcare, finance, and e-commerce. Graph-based RSs play a crucial role in capturing intricate higher-order interactions among entities. However, integrating these graph models into the Federated Learning (FL) paradigm with fairness constraints poses formidable challenges as this requires access to the entire interaction graph and sensitive user information (such as gender, age, etc.) at the central server. This paper addresses the pervasive issue of inherent bias within RSs for different demographic groups without compromising the privacy of sensitive user attributes in FL environment with the graph-based model. To address the group bias, we propose F2PGNN (Fair Federated Personalized Graph Neural Network), a novel framework that leverages the power of Personalized Graph Neural Network (GNN) coupled with fairness considerations. Additionally, we use differential privacy techniques to fortify privacy protection. Experimental evaluation on three publicly available datasets showcases the efficacy of F2PGNN in mitigating group unfairness by 47% - 99% compared to the state-of-the-art while preserving privacy and maintaining the utility. The results validate the significance of our framework in achieving equitable and personalized recommendations using GNN within the FL landscape.

One step closer to unbiased aleatoric uncertainty estimation. (arXiv:2312.10469v2 [cs.LG] UPDATED)

Authors: Wang Zhang, Ziwen Ma, Subhro Das, Tsui-Wei Weng, Alexandre Megretski, Luca Daniel, Lam M. Nguyen

Neural networks are powerful tools in various applications, and quantifying their uncertainty is crucial for reliable decision-making. In the deep learning field, the uncertainties are usually categorized into aleatoric (data) and epistemic (model) uncertainty. In this paper, we point out that the existing popular variance attenuation method highly overestimates aleatoric uncertainty. To address this issue, we propose a new estimation method by actively de-noising the observed data. By conducting a broad range of experiments, we demonstrate that our proposed approach provides a much closer approximation to the actual data uncertainty than the standard method.

NN-Steiner: A Mixed Neural-algorithmic Approach for the Rectilinear Steiner Minimum Tree Problem. (arXiv:2312.10589v2 [cs.AI] UPDATED)

Authors: Andrew B. Kahng, Robert R. Nerem, Yusu Wang, Chien-Yi Yang

Recent years have witnessed rapid advances in the use of neural networks to solve combinatorial optimization problems. Nevertheless, designing the "right" neural model that can effectively handle a given optimization problem can be challenging, and often there is no theoretical understanding or justification of the resulting neural model. In this paper, we focus on the rectilinear Steiner minimum tree (RSMT) problem, which is of critical importance in IC layout design and as a result has attracted numerous heuristic approaches in the VLSI literature. Our contributions are two-fold. On the methodology front, we propose NN-Steiner, which is a novel mixed neural-algorithmic framework for computing RSMTs that leverages the celebrated PTAS algorithmic framework of Arora to solve this problem (and other geometric optimization problems). Our NN-Steiner replaces key algorithmic components within Arora's PTAS by suitable neural components. In particular, NN-Steiner only needs four neural network (NN) components that are called repeatedly within an algorithmic framework. Crucially, each of the four NN components is only of bounded size independent of input size, and thus easy to train. Furthermore, as the NN component is learning a generic algorithmic step, once learned, the resulting mixed neural-algorithmic framework generalizes to much larger instances not seen in training. Our NN-Steiner, to our best knowledge, is the first neural architecture of bounded size that has capacity to approximately solve RSMT (and variants). On the empirical front, we show how NN-Steiner can be implemented and demonstrate the effectiveness of our resulting approach, especially in terms of generalization, by comparing with state-of-the-art methods (both neural and non-neural based).

Agglomerative Federated Learning: Empowering Larger Model Training via End-Edge-Cloud Collaboration. (arXiv:2312.11489v2 [cs.DC] UPDATED)

Authors: Zhiyuan Wu, Sheng Sun, Yuwei Wang, Min Liu, Bo Gao, Quyang Pan, Tianliu He, Xuefeng Jiang

Federated Learning (FL) enables training Artificial Intelligence (AI) models over end devices without compromising their privacy. As computing tasks are increasingly performed by a combination of cloud, edge, and end devices, FL can benefit from this End-Edge-Cloud Collaboration (EECC) paradigm to achieve collaborative device-scale expansion with real-time access. Although Hierarchical Federated Learning (HFL) supports multi-tier model aggregation suitable for EECC, prior works assume the same model structure on all computing nodes, constraining the model scale by the weakest end devices. To address this issue, we propose Agglomerative Federated Learning (FedAgg), which is a novel EECC-empowered FL framework that allows the trained models from end, edge, to cloud to grow larger in size and stronger in generalization ability. FedAgg recursively organizes computing nodes among all tiers based on Bridge Sample Based Online Distillation Protocol (BSBODP), which enables every pair of parent-child computing nodes to mutually transfer and distill knowledge extracted from generated bridge samples. This design enhances the performance by exploiting the potential of larger models, with privacy constraints of FL and flexibility requirements of EECC both satisfied. Experiments under various settings demonstrate that FedAgg outperforms state-of-the-art methods by an average accuracy gain of 4.53% and remarkable improvements in convergence rate.

Unlocking Musculoskeletal Disorder Risk Factors: NLP-Based Classification and Mode-Based Ranking. (arXiv:2312.11517v2 [cs.CL] UPDATED)

Authors: Md Abrar Jahin, Subrata Talapatra

This research delves into the intricate landscape of Musculoskeletal Disorder (MSD) risk factors, employing a novel fusion of Natural Language Processing (NLP) techniques and mode-based ranking methodologies. The primary objective is to advance the comprehension of MSD risk factors, their classification, and their relative severity, facilitating more targeted preventive and management interventions. The study utilizes eight diverse models, integrating pre-trained transformers, cosine similarity, and various distance metrics to classify risk factors into personal, biomechanical, workplace, psychological, and organizational classes. Key findings reveal that the BERT model with cosine similarity attains an overall accuracy of 28%, while the sentence transformer, coupled with Euclidean, Bray-Curtis, and Minkowski distances, achieves a flawless accuracy score of 100%. In tandem with the classification efforts, the research employs a mode-based ranking approach on survey data to discern the severity hierarchy of MSD risk factors. Intriguingly, the rankings align precisely with the previous literature, reaffirming the consistency and reliability of the approach. "Working posture" emerges as the most severe risk factor, emphasizing the critical role of proper posture in preventing MSDs. The collective perceptions of survey participants underscore the significance of factors like "Job insecurity," "Effort reward imbalance," and "Poor employee facility" in contributing to MSD risks. The convergence of rankings provides actionable insights for organizations aiming to reduce the prevalence of MSDs. The study concludes with implications for targeted interventions, recommendations for improving workplace conditions, and avenues for future research.
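
The similarity-based classification step mentioned above can be illustrated with a nearest-class rule: embed the risk factor and each class, then pick the class with the highest cosine similarity. The sketch below assumes toy 3-dimensional vectors in place of real transformer embeddings; the vectors, the class subset, and the function names are hypothetical and not taken from the study.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two non-zero vectors of equal length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def classify(embedding, class_embeddings):
    """Return the class whose embedding is most similar to the input embedding."""
    return max(
        class_embeddings,
        key=lambda name: cosine_similarity(embedding, class_embeddings[name]),
    )


# Toy vectors standing in for real sentence embeddings (hypothetical values).
class_embeddings = {
    "personal": [0.9, 0.1, 0.0],
    "biomechanical": [0.1, 0.9, 0.1],
    "organizational": [0.0, 0.2, 0.9],
}
risk_factor = [0.2, 0.8, 0.2]  # hypothetical embedding of a risk-factor phrase

label = classify(risk_factor, class_embeddings)
print(label)
```

Swapping `cosine_similarity` for a distance function (and `max` for `min`) gives the distance-metric variants the abstract compares.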

Improved Differentially Private and Lazy Online Convex Optimization. (arXiv:2312.11534v2 [cs.CR] UPDATED)

Authors: Naman Agarwal, Satyen Kale, Karan Singh, Abhradeep Guha Thakurta

We study the task of $(\epsilon, \delta)$-differentially private online convex optimization (OCO). In the online setting, the release of each distinct decision or iterate carries with it the potential for privacy loss. This problem has a long history of research starting with Jain et al. [2012] and the best known results for the regime of $\epsilon$ not being very small are presented in Agarwal et al. [2023]. In this paper we improve upon the results of Agarwal et al. [2023] in terms of the dimension factors as well as removing the requirement of smoothness. Our results are now the best known rates for DP-OCO in this regime.

Our algorithm builds upon the work of [Asi et al., 2023], which introduced the idea of explicitly limiting the number of switches via rejection sampling. The main innovation in our algorithm is the use of sampling from a strongly log-concave density, which allows us to trade off the dimension factors better, leading to improved results.

A Survey of Reasoning with Foundation Models: Concepts, Methodologies, and Outlook. (arXiv:2312.11562v2 [cs.AI] UPDATED)

Authors: Jiankai Sun, Chuanyang Zheng, Enze Xie, Zhengying Liu, Ruihang Chu, Jianing Qiu, Jiaqi Xu, Mingyu Ding, Hongyang Li, Mengzhe Geng, Yue Wu, Wenhai Wang, Junsong Chen, Zhangyue Yin, Xiaozhe Ren, Jie Fu, Junxian He, Wu Yuan, Qi Liu, Xihui Liu, Yu Li, Hao Dong, Yu Cheng, Ming Zhang, Pheng Ann Heng, Jifeng Dai, Ping Luo, Jingdong Wang, Ji-Rong Wen, Xipeng Qiu, Yike Guo, Hui Xiong, Qun Liu, Zhenguo Li

Reasoning, a crucial ability for complex problem-solving, plays a pivotal role in various real-world settings such as negotiation, medical diagnosis, and criminal investigation. It serves as a fundamental methodology in the field of Artificial General Intelligence (AGI). With the ongoing development of foundation models, there is a growing interest in exploring their abilities in reasoning tasks. In this paper, we introduce seminal foundation models proposed or adaptable for reasoning, highlighting the latest advancements in various reasoning tasks, methods, and benchmarks. We then delve into the potential future directions behind the emergence of reasoning abilities within foundation models. We also discuss the relevance of multimodal learning, autonomous agents, and super alignment in the context of reasoning. By discussing these future research directions, we hope to inspire researchers in their exploration of this field, stimulate further advancements in reasoning with foundation models, and contribute to the development of AGI.

Locally-Minimal Probabilistic Explanations. (arXiv:2312.11831v2 [cs.LG] UPDATED)

Authors: Yacine Izza, Kuldeep S. Meel, Joao Marques-Silva

Formal abductive explanations offer crucial guarantees of rigor and so are of interest in high-stakes uses of machine learning (ML). One drawback of abductive explanations is explanation size, justified by the cognitive limits of human decision-makers. Probabilistic abductive explanations (PAXps) address this limitation, but their theoretical and practical complexity makes their exact computation most often unrealistic. This paper proposes novel efficient algorithms for the computation of locally-minimal PAXps, which offer high-quality approximations of PAXps in practice. The experimental results demonstrate the practical efficiency of the proposed algorithms.

OVD-Explorer: Optimism Should Not Be the Sole Pursuit of Exploration in Noisy Environments. (arXiv:2312.12145v2 [cs.LG] UPDATED)

Authors: Jinyi Liu, Zhi Wang, Yan Zheng, Jianye Hao, Chenjia Bai, Junjie Ye, Zhen Wang, Haiyin Piao, Yang Sun

In reinforcement learning, the optimism in the face of uncertainty (OFU) is a mainstream principle for directing exploration towards less explored areas, characterized by higher uncertainty. However, in the presence of environmental stochasticity (noise), purely optimistic exploration may lead to excessive probing of high-noise areas, consequently impeding exploration efficiency. Hence, in exploring noisy environments, while optimism-driven exploration serves as a foundation, prudent attention to alleviating unnecessary over-exploration in high-noise areas becomes beneficial. In this work, we propose Optimistic Value Distribution Explorer (OVD-Explorer) to achieve a noise-aware optimistic exploration for continuous control. OVD-Explorer proposes a new measurement of the policy's exploration ability considering noise in optimistic perspectives, and leverages gradient ascent to drive exploration. Practically, OVD-Explorer can be easily integrated with continuous control RL algorithms. Extensive evaluations on the MuJoCo and GridChaos tasks demonstrate the superiority of OVD-Explorer in achieving noise-aware optimistic exploration.

Poincar\'e Differential Privacy for Hierarchy-Aware Graph Embedding. (arXiv:2312.12183v2 [cs.LG] UPDATED)

Authors: Yuecen Wei, Haonan Yuan, Xingcheng Fu, Qingyun Sun, Hao Peng, Xianxian Li, Chunming Hu

Hierarchy is an important and commonly observed topological property in real-world graphs that indicates the relationships between supervisors and subordinates or the organizational behavior of human groups. As hierarchy is introduced as a new inductive bias into the Graph Neural Networks (GNNs) in various tasks, it implies latent topological relations for attackers to improve their inference attack performance, leading to serious privacy leakage issues. In addition, existing privacy-preserving frameworks suffer from reduced protection ability in hierarchical propagation due to the deficiency of adaptive upper-bound estimation of the hierarchical perturbation boundary. It is of great urgency to effectively leverage the hierarchical property of data while satisfying privacy guarantees. To solve the problem, we propose the Poincar\'e Differential Privacy framework, named PoinDP, to protect the hierarchy-aware graph embedding based on hyperbolic geometry. Specifically, PoinDP first learns the hierarchy weights for each entity based on the Poincar\'e model in hyperbolic space. Then, the Personalized Hierarchy-aware Sensitivity is designed to measure the sensitivity of the hierarchical structure and adaptively allocate the privacy protection strength. Besides, the Hyperbolic Gaussian Mechanism (HGM) is proposed to extend the Gaussian mechanism in Euclidean space to hyperbolic space to realize random perturbations that satisfy differential privacy under the hyperbolic space metric. Extensive experimental results on five real-world datasets demonstrate the proposed PoinDP's advantages of effective privacy protection while maintaining good performance on the node classification task.
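
As background for the mechanism being extended, the classical Gaussian mechanism in Euclidean space adds independent Gaussian noise with standard deviation sqrt(2 ln(1.25/delta)) * sensitivity / epsilon, a calibration that holds for 0 < epsilon < 1. The sketch below shows only this Euclidean baseline; it is not the Hyperbolic Gaussian Mechanism from the paper, and the embedding values and function names are hypothetical.

```python
import math
import random


def gaussian_sigma(sensitivity, epsilon, delta):
    """Noise scale of the classical Gaussian mechanism (calibration valid for 0 < epsilon < 1)."""
    if not 0.0 < epsilon < 1.0:
        raise ValueError("the classical calibration assumes 0 < epsilon < 1")
    if not 0.0 < delta < 1.0:
        raise ValueError("delta must lie in (0, 1)")
    return math.sqrt(2.0 * math.log(1.25 / delta)) * sensitivity / epsilon


def gaussian_mechanism(vector, sensitivity, epsilon, delta, rng):
    """Perturb each coordinate with independent N(0, sigma^2) noise."""
    sigma = gaussian_sigma(sensitivity, epsilon, delta)
    return [v + rng.gauss(0.0, sigma) for v in vector]


rng = random.Random(42)
embedding = [0.3, -0.1, 0.7]  # hypothetical node embedding with L2 sensitivity 1.0
noisy = gaussian_mechanism(embedding, sensitivity=1.0, epsilon=0.5, delta=1e-5, rng=rng)
print(noisy)
```

The noise scale grows linearly with the sensitivity, which is why a per-entity sensitivity, as the abstract describes, lets the protection strength vary across the hierarchy.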

Efficient Title Reranker for Fast and Improved Knowledge-Intense NLP. (arXiv:2312.12430v2 [cs.IR] UPDATED)

Authors: Ziyi Chen, Heyi Tao, Daqian Zuo, Jize Jiang, Jun Yang, Yuxiang Wei

We introduce Efficient Title Reranker via Broadcasting Query Encoder, a novel title reranking technique to achieve efficient title reranking 20x-40x faster than a vanilla passage reranker. However, one of the challenges with the training of Efficient Title Reranker is instability. Analyzing the issue, we found that some very difficult ground truths might act as noisy labels, causing accuracy to drop, and that some extreme values in the model probability output cause NaN. To address these issues, we introduce the Sigmoid Trick, a novel technique that reduces the gradient update in both cases, resulting in better retrieval efficacy. Experiments showed the effectiveness of ETR and the Sigmoid Trick as we achieved four state-of-the-art positions on the KILT knowledge benchmark.

Software Effort Estimation with Ridge Regression and Evolutionary Attribute Selection. (arXiv:1012.5754v1 [cs.SE] CROSS LISTED)

Authors: Efi Papatheocharous, Harris Papadopoulos, Andreas S. Andreou

Software cost estimation is one of the prerequisite managerial activities carried out at the software development initiation stages and also repeated throughout the whole software life-cycle so that amendments to the total cost are made. Typically, in software cost estimation, a selection of project attributes is employed to produce effort estimations of the expected human resources to deliver a software product. However, choosing the appropriate project cost drivers in each case requires a lot of experience and knowledge on behalf of the project manager which can only be obtained through years of software engineering practice. A number of studies indicate that popular methods applied in the literature for software cost estimation, such as linear regression, are not robust enough and do not yield accurate predictions. Recently the dual variables Ridge Regression (RR) technique has been used for effort estimation yielding promising results. In this work we show that results may be further improved if an AI method is used to automatically select appropriate project cost drivers (inputs) for the technique. We propose a hybrid approach combining RR with a Genetic Algorithm, the latter evolving the subset of attributes for approximating effort more accurately. The proposed hybrid cost model has been applied on a widely known high-dimensional dataset of software project samples and the results obtained show that accuracy may be increased if redundant attributes are eliminated.

Feature Subset Selection for Software Cost Modelling and Estimation. (arXiv:1210.1161v1 [cs.SE] CROSS LISTED)

Authors: Efi Papatheocharous, Harris Papadopoulos, Andreas S. Andreou

Feature selection has been recently used in the area of software engineering for improving the accuracy and robustness of software cost models. The idea behind selecting the most informative subset of features from a pool of available cost drivers stems from the hypothesis that reducing the dimensionality of datasets will significantly minimise the complexity and time required to reach an estimate using a particular modelling technique. This work investigates the appropriateness of attributes obtained from empirical project databases and aims to reduce the cost drivers used while preserving performance. Finding suitable subset selections that may cater for improved predictions may be considered as a pre-processing step of a particular technique employed for cost estimation (filter or wrapper) or an internal (embedded) step to minimise the fitting error. This paper compares nine relatively popular feature selection methods and uses the empirical values of selected attributes recorded in the ISBSG and Desharnais datasets to estimate software development effort.