Transport Equation based Physics Informed Neural Network to predict the Yield Strength of Architected Materials. (arXiv:2312.00003v1 [cs.LG])

Authors: Akshansh Mishra

In this research, the application of the Physics-Informed Neural Network (PINN) model is explored to solve transport equation-based Partial Differential Equations (PDEs). The primary objective is to analyze the impact of different activation functions incorporated within the PINN model on its predictive performance, specifically assessing the Mean Squared Error (MSE) and Mean Absolute Error (MAE). The dataset used in the study consists of a varied set of input parameters related to strut diameter, unit cell size, and the corresponding yield stress values. Through this investigation, the aim is to understand the effectiveness of the PINN model and the significance of choosing appropriate activation functions for solving complex PDEs in real-world applications. The outcomes suggest that the choice of activation function may have minimal influence on the model's predictive accuracy for this particular problem. The PINN model showcases exceptional generalization capabilities, indicating its capacity to avoid overfitting with the provided dataset. The research underscores the importance of striking a balance between performance and computational efficiency when selecting an activation function for specific real-world applications. These findings contribute to advancing the understanding and potential adoption of PINN as an effective tool for solving challenging PDEs in diverse scientific and engineering domains.
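
To make the setting concrete, here is a minimal PINN sketch for a 1D linear transport equation $u_t + c\,u_x = 0$, with the activation function exposed as a swappable choice as in the study's comparison. The paper's exact PDE form, architecture, and handling of the strut-diameter and unit-cell-size inputs are not given here, so every name and constant below is illustrative.

```python
import torch

# Minimal PINN sketch for the 1D linear transport equation u_t + c*u_x = 0.
# Architecture, constants, and the activation choice are illustrative only.

class PINN(torch.nn.Module):
    def __init__(self, hidden=32, activation=torch.nn.Tanh):
        super().__init__()
        self.net = torch.nn.Sequential(
            torch.nn.Linear(2, hidden), activation(),
            torch.nn.Linear(hidden, hidden), activation(),
            torch.nn.Linear(hidden, 1),
        )

    def forward(self, x, t):
        return self.net(torch.cat([x, t], dim=1))

def pde_residual(model, x, t, c=1.0):
    x.requires_grad_(True)
    t.requires_grad_(True)
    u = model(x, t)
    u_x = torch.autograd.grad(u, x, torch.ones_like(u), create_graph=True)[0]
    u_t = torch.autograd.grad(u, t, torch.ones_like(u), create_graph=True)[0]
    return u_t + c * u_x  # zero wherever the PDE is satisfied

model = PINN(activation=torch.nn.SiLU)  # swap activations to compare MSE/MAE
x, t = torch.rand(256, 1), torch.rand(256, 1)
loss = (pde_residual(model, x, t) ** 2).mean()  # plus a data-fit MSE term
loss.backward()
```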

Can LLMs Patch Security Issues?. (arXiv:2312.00024v1 [cs.CR])

Authors: Kamel Alrashedy, Abdullah Aljasser

Large Language Models (LLMs) have shown impressive proficiency in code generation. Nonetheless, similar to human developers, these models might generate code that contains security vulnerabilities and flaws. Writing secure code remains a substantial challenge, as vulnerabilities often arise during interactions between programs and external systems or services, such as databases and operating systems. In this paper, we propose a novel approach, Feedback-Driven Solution Synthesis (FDSS), in which LLMs receive feedback from Bandit, a static code analysis tool, and then generate potential solutions to resolve the security vulnerabilities. Each solution, along with the vulnerable code, is then sent back to the LLM for code refinement. Our approach shows a significant improvement over the baseline and outperforms existing approaches. Furthermore, we introduce a new dataset, PythonSecurityEval, collected from real-world scenarios on Stack Overflow to evaluate LLMs' ability to generate secure code. Code and data are available at \url{https://github.com/Kamel773/LLM-code-refine}
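
The following is a hedged sketch of the kind of feedback loop FDSS describes: run Bandit, hand its JSON findings plus the vulnerable code to an LLM, and write the candidate fix back for re-analysis. The `ask_llm` callable is a placeholder for any chat-completion client; the prompt text and stopping rule are assumptions, not the paper's exact procedure.

```python
import json
import subprocess

# Hedged sketch of a Bandit -> LLM repair loop in the spirit of FDSS.

def bandit_findings(path: str) -> list:
    """Run Bandit on one file and return its JSON-formatted findings."""
    out = subprocess.run(["bandit", "-f", "json", path],
                         capture_output=True, text=True)
    return json.loads(out.stdout).get("results", [])

def refine(path: str, ask_llm, max_rounds: int = 3) -> None:
    for _ in range(max_rounds):
        findings = bandit_findings(path)
        if not findings:                      # no remaining issues
            return
        code = open(path).read()
        prompt = ("Revise the code to fix these security issues.\n"
                  f"Issues: {json.dumps(findings)}\nCode:\n{code}")
        open(path, "w").write(ask_llm(prompt))  # candidate fix, re-checked
```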

Secure Transformer Inference. (arXiv:2312.00025v1 [cs.CR])

Authors: Mu Yuan, Lan Zhang, Xiang-Yang Li

We present a three-party protocol that can protect both Transformer parameters and user data during the inference phase. For each feedforward inference process, our protocol only introduces permutation computation of input and output data on the user side. Our protocol, Secure Transformer Inference Protocol (STIP), can be applied to real-world services like ChatGPT.
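
As a toy illustration of the underlying algebra, the identity $(PWP^\top)(Px) = P(Wx)$ lets a user hide a linear layer's input and output behind a secret permutation $P$, while the serving party only ever sees permuted weights and permuted data. STIP's actual three-party protocol covers full Transformer blocks, so this numpy sketch is a simplification, not the protocol itself.

```python
import numpy as np

# Toy illustration: a linear layer commutes with a consistent permutation,
# (P W P^T)(P x) = P (W x), so the user can hide x behind a secret P.

rng = np.random.default_rng(0)
d = 8
W = rng.normal(size=(d, d))        # model weights (held in permuted form)
x = rng.normal(size=d)             # user's private input

P = np.eye(d)[rng.permutation(d)]  # user's secret permutation matrix

W_perm = P @ W @ P.T               # what the serving party actually stores
y_perm = W_perm @ (P @ x)          # inference runs entirely on permuted data
y = P.T @ y_perm                   # user un-permutes the output locally

assert np.allclose(y, W @ x)       # identical to the plaintext computation
```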

A Quality-of-Service Compliance System using Federated Learning and Optimistic Rollups. (arXiv:2312.00026v1 [cs.CR])

Authors: Joao Paulo de Brito Goncalves, Guilherme Emerick Sathler, Rodolfo da Silva Villaca

Edge computing brings a new paradigm in which computing, storage, and bandwidth resources are shared as close as possible to the mobile devices or sensors generating large amounts of data. A parallel trend is the rise of phones and tablets as primary computing devices for many people. The powerful sensors on these devices, combined with their mobility, mean they have access to data of an unprecedentedly diverse and private nature. Models learned on such data hold the promise of greatly improving usability by powering more intelligent applications, but the sensitive nature of the data means there are risks and responsibilities to storing it in a centralized location. To address the privacy requirements of some of the data on these devices, we propose the use of Federated Learning (FL) so that specific data about services performed by clients does not leave the source machines. Instead of sharing data, users collaboratively train a model by only sending weight updates to a server. However, the naive use of FL in these scenarios exposes it to a risk of corruption, whether intentional or not, during the training phase. To improve the security of the FL structure, we propose a decentralized blockchain-based FL scheme for an edge computing scenario. We also apply blockchain to create a reward mechanism in FL that enables an incentive strategy for trainers.
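
For readers unfamiliar with the FL exchange described above, here is a minimal FedAvg-style sketch in which clients send only weights, never data. The blockchain verification and rollup-based compliance layers the paper adds sit on top of an exchange like this; the least-squares objective is purely illustrative.

```python
import numpy as np

# Minimal FedAvg-style round: clients compute local gradient steps on
# private data and share only weights with the server.

def local_update(w, X, y, lr=0.1):
    """One gradient step of least squares on a client's private data."""
    grad = X.T @ (X @ w - y) / len(y)
    return w - lr * grad

def fedavg_round(w, clients):
    return np.mean([local_update(w, X, y) for X, y in clients], axis=0)

rng = np.random.default_rng(1)
clients = [(rng.normal(size=(50, 3)), rng.normal(size=50)) for _ in range(4)]
w = np.zeros(3)
for _ in range(100):                # server only ever sees weight vectors
    w = fedavg_round(w, clients)
```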

Revolutionizing Forensic Toolmark Analysis: An Objective and Transparent Comparison Algorithm. (arXiv:2312.00032v1 [cs.CR])

Authors: Maria Cuellar, Sheng Gao, Heike Hofmann

Forensic toolmark comparisons are currently performed subjectively by humans, which leads to a lack of consistency and accuracy. There is little evidence that examiners can determine whether pairs of marks were made by the same tool or different tools. There is also little evidence that they can make this classification when marks are made under different conditions, such as different angles of attack or directions of mark generation. We generate original toolmark data in 3D, extract the signal from each toolmark, and train an algorithm to compare toolmark signals objectively. We find that toolmark signals cluster by tool, and not by angle or direction. That is, the variability within a tool, regardless of angle/direction, is smaller than the variability between tools. The known-match and known-non-match densities of the similarities of pairs of marks have a small overlap, even when accounting for dependencies in the data, making them a useful instrument for determining whether a new pair of marks was made by the same tool. We provide a likelihood ratio approach as a formal method for comparing toolmark signals with a measure of uncertainty. This empirically trained, open-source method can be used by forensic examiners to compare toolmarks objectively and thus improve the reliability of toolmark comparisons. This can, in turn, reduce miscarriages of justice in the criminal justice system.
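
A hedged sketch of the likelihood-ratio idea follows: fit densities to known-match and known-non-match similarity scores, then evaluate the ratio at a new pair's score. The Gaussian score distributions and KDE estimator below are placeholders; the paper's actual estimator and its uncertainty measure may differ.

```python
import numpy as np
from scipy.stats import gaussian_kde

# Sketch of the likelihood-ratio approach: estimate known-match (KM) and
# known-non-match (KNM) densities of similarity scores, then evaluate a
# new pair's score. Synthetic scores stand in for real toolmark data.

rng = np.random.default_rng(2)
km_scores = rng.normal(0.8, 0.10, 500)   # same-tool pair similarities
knm_scores = rng.normal(0.3, 0.15, 500)  # different-tool pair similarities

km_density = gaussian_kde(km_scores)
knm_density = gaussian_kde(knm_scores)

def likelihood_ratio(score):
    """LR > 1 favours 'same tool'; LR < 1 favours 'different tools'."""
    return float(km_density(score) / knm_density(score))

print(likelihood_ratio(0.75))  # large LR: evidence for a same-tool origin
```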

FBChain: A Blockchain-based Federated Learning Model with Efficiency and Secure Communication. (arXiv:2312.00035v1 [cs.CR])

Authors: Yang Li, Chunhe Xia, Wei Liu, Weidong Zhou, Chen Chen, Tianbo Wang

Privacy and security in the parameter transmission process of federated learning are currently among the most prominent concerns. However, two thorny problems are caused by unprotected communication methods: "parameter-leakage" and "inefficient-communication". This article proposes a Blockchain-based Federated Learning (FBChain) model for federated learning parameter communication that overcomes both problems. First, we utilize the immutability of the blockchain to store the global model and the hash values of local model parameters to guard against tampering during communication, protect data privacy by encrypting parameters, and verify data consistency by comparing the hash values of local parameters, thus addressing the "parameter-leakage" problem. Second, the Proof of Weighted Link Speed (PoWLS) consensus algorithm selects nodes with higher weighted link speeds to aggregate the global model and package blocks, thereby solving the "inefficient-communication" problem. Experimental results demonstrate the effectiveness of our proposed FBChain model and its ability to improve model communication efficiency in federated learning.
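
A minimal sketch of the hash-based consistency check is given below: record a digest of the local parameters (stored on-chain in the real system) and compare it against whatever arrives over the network. Encryption, block packaging, and the PoWLS consensus are abstracted away.

```python
import hashlib
import numpy as np

# Sketch of FBChain-style consistency checking: store a digest of local
# model parameters and verify received parameters against it.

def param_hash(params):
    h = hashlib.sha256()
    for p in params:
        h.update(p.tobytes())
    return h.hexdigest()

local = [np.ones((4, 4)), np.zeros(4)]
on_chain = param_hash(local)               # digest recorded in a block

received = [p.copy() for p in local]
assert param_hash(received) == on_chain    # untampered transmission

received[0][0, 0] += 1e-3                  # simulate in-transit tampering
assert param_hash(received) != on_chain    # mismatch is detected
```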

Privacy-Preserving Load Forecasting via Personalized Model Obfuscation. (arXiv:2312.00036v1 [cs.CR])

Authors: Shourya Bose, Yu Zhang, Kibaek Kim

The widespread adoption of smart meters provides access to detailed and localized load consumption data, suitable for training building-level load forecasting models. To mitigate privacy concerns stemming from model-induced data leakage, federated learning (FL) has been proposed. This paper addresses the performance challenges of short-term load forecasting models trained with FL on heterogeneous data, emphasizing privacy preservation through model obfuscation. Our proposed algorithm, Privacy Preserving Federated Learning (PPFL), incorporates personalization layers for localized training at each smart meter. Additionally, we employ a differentially private mechanism to safeguard against data leakage from shared layers. Simulations on the NREL ComStock dataset corroborate the effectiveness of our approach.
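
Below is a hedged sketch of the obfuscation step: personalization layers stay local, while a shared-layer update is clipped and perturbed with Gaussian noise before upload. The clipping norm and noise scale are assumptions, not the paper's calibrated values.

```python
import numpy as np

# Hedged sketch of shared-layer obfuscation: clip the update to bound its
# sensitivity, then add Gaussian noise before upload. Personalization-layer
# weights never leave the smart meter.

def privatize(update, clip=1.0, sigma=0.5, rng=np.random.default_rng(3)):
    norm = np.linalg.norm(update)
    clipped = update * min(1.0, clip / (norm + 1e-12))
    return clipped + rng.normal(0.0, sigma * clip, size=update.shape)

shared_update = np.random.default_rng(4).normal(size=128)
safe_update = privatize(shared_update)   # the only thing that is uploaded
```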

A Physics-Constrained NeuralODE Approach for Robust Learning of Stiff Chemical Kinetics. (arXiv:2312.00038v1 [physics.comp-ph])

Authors: Tadbhagya Kumar, Anuj Kumar, Pinaki Pal

The high computational cost associated with solving for detailed chemistry poses a significant challenge for predictive computational fluid dynamics (CFD) simulations of turbulent reacting flows. These models often require solving a system of coupled stiff ordinary differential equations (ODEs). While deep learning techniques have been explored to develop faster surrogate models, they often fail to integrate reliably with CFD solvers. This instability arises because deep learning methods optimize for training error without ensuring compatibility with ODE solvers, leading to an accumulation of errors over time. Recently, NeuralODE-based techniques have offered a promising solution by effectively modeling chemical kinetics. In this study, we extend the NeuralODE framework for stiff chemical kinetics by incorporating mass conservation constraints directly into the loss function during training. This ensures that the total mass and the elemental mass are conserved, a critical requirement for reliable downstream integration with CFD solvers. Our results demonstrate that this enhancement not only improves physical consistency with respect to mass conservation criteria but also improves robustness and makes the training process more computationally efficient.
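
A sketch of what such a constrained loss can look like: alongside the usual prediction error, penalize predicted species mass fractions that fail to sum to one. Elemental-mass constraints would add analogous linear penalties; the weighting `lam` and the tensor shapes are assumptions, not the paper's configuration.

```python
import torch

# Sketch of a mass-conservation-constrained loss: besides the data misfit,
# penalize species mass fractions that do not sum to one at each time step.

def constrained_loss(y_pred, y_true, lam=1.0):
    data_loss = torch.mean((y_pred - y_true) ** 2)
    mass_violation = torch.mean((y_pred.sum(dim=-1) - 1.0) ** 2)
    return data_loss + lam * mass_violation

# batch of 32 trajectories, 10 time steps, 9 species mass fractions
y_true = torch.softmax(torch.randn(32, 10, 9), dim=-1)
y_pred = y_true + 0.01 * torch.randn_like(y_true)
loss = constrained_loss(y_pred, y_true)
```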

Acoustic Cybersecurity: Exploiting Voice-Activated Systems. (arXiv:2312.00039v1 [cs.CR])

Authors: Forrest McKee, David Noever

In this study, we investigate the emerging threat of inaudible acoustic attacks targeting digital voice assistants, a critical concern given that such assistants are projected to outnumber the global population by 2024. Our research extends the feasibility of these attacks across various platforms like Amazon's Alexa, Android, iOS, and Cortana, revealing significant vulnerabilities in smart devices. The twelve attack vectors identified include successful manipulation of smart home devices and automotive systems, potential breaches in military communication, and challenges in critical infrastructure security. We quantitatively show that attack success rates hover around 60%, with the ability to activate devices remotely from over 100 feet away. These attacks also threaten critical infrastructure, emphasizing the need for multifaceted defensive strategies combining acoustic shielding, advanced signal processing, machine learning, and robust user authentication to mitigate these risks.

Presentation Attack detection using Wavelet Transform and Deep Residual Neural Net. (arXiv:2312.00040v1 [cs.CR])

Authors: Prosenjit Chatterjee, Alex Yalchin, Joseph Shelton, Kaushik Roy, Xiaohong Yuan, Kossi D. Edoh

Biometric authentication is becoming more prevalent in secured authentication systems. However, biometric systems can be deceived by impostors in several ways. Among other impostor attacks, print attacks, mask attacks, and replay attacks fall under the presentation attack category. Biometric images, especially of the iris and face, are vulnerable to different presentation attacks. This research applies deep learning approaches to mitigate presentation attacks in a biometric access control system. Our contribution in this paper is two-fold: First, we applied the wavelet transform to extract features from the biometric images. Second, we modified a deep residual neural net and applied it to spoof datasets to detect presentation attacks. We applied the proposed approach to biometric spoof datasets, namely ATVS, CASIA two class, and CASIA cropped image sets. The datasets used in this research contain images captured in both controlled and uncontrolled environments, at different resolutions and sizes. We obtained the best accuracy of 93% on the ATVS Iris datasets. For the CASIA two class and CASIA cropped datasets, we achieved test accuracies of 91% and 82%, respectively.
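
The first stage of such a pipeline can be sketched with a 2D discrete wavelet transform that splits an image into approximation and detail sub-bands, stacked here as input channels for the residual network. The 'haar' wavelet and single-level decomposition are illustrative choices, not necessarily the paper's.

```python
import numpy as np
import pywt

# Sketch of the feature-extraction stage: a single-level 2D DWT splits an
# image into approximation (cA) and horizontal/vertical/diagonal detail
# bands, stacked as CNN input channels.

image = np.random.rand(128, 128)        # stand-in for an iris/face image
cA, (cH, cV, cD) = pywt.dwt2(image, "haar")

features = np.stack([cA, cH, cV, cD])   # shape (4, 64, 64)
```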

Presentation Attack Detection using Convolutional Neural Networks and Local Binary Patterns. (arXiv:2312.00041v1 [cs.CR])

Authors: Justin Spencer, Deborah Lawrence, Prosenjit Chatterjee, Kaushik Roy, Albert Esterline, Jung-Hee Kim

The use of biometrics to authenticate users and control access to secure areas has become extremely popular in recent years, and biometric access control systems are frequently used by both governments and private corporations. However, these systems may represent risks to security when deployed without considering the possibility of biometric presentation attacks (also known as spoofing). Presentation attacks are a serious threat because they do not require significant time, expense, or skill to carry out while remaining effective against many biometric systems in use today. This research compares three different software-based methods for facial and iris presentation attack detection in images. The first method uses Inception-v3, a pre-trained deep Convolutional Neural Network (CNN) made by Google for the ImageNet challenge, which is retrained for this problem. The second uses a shallow CNN based on a modified Spoofnet architecture, which is trained normally. The third is a texture-based method using Local Binary Patterns (LBP). The datasets used are the ATVS-FIr dataset, which contains real and fake iris images, and the CASIA Face Anti-Spoofing Dataset, which contains real images as well as warped photos, cut photos, and video replay presentation attacks. We also present a third set of results, based on cropped versions of the CASIA images.
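
The LBP baseline can be sketched in a few lines: compute uniform LBP codes over the image and use their normalized histogram as the feature vector for a spoof/real classifier. The neighbourhood size (P, R) and the downstream classifier are assumptions.

```python
import numpy as np
from skimage.feature import local_binary_pattern

# Sketch of the texture baseline: uniform LBP codes, histogrammed into a
# per-image feature vector for a spoof/real classifier.

P, R = 8, 1                                   # 8 neighbours at radius 1
image = (np.random.rand(128, 128) * 255).astype(np.uint8)
codes = local_binary_pattern(image, P, R, method="uniform")

# uniform LBP yields P + 2 distinct code values
hist, _ = np.histogram(codes, bins=P + 2, range=(0, P + 2), density=True)
# `hist` is the feature vector; train e.g. an SVM on these histograms.
```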

DeepTreeGANv2: Iterative Pooling of Point Clouds. (arXiv:2312.00042v1 [physics.data-an])

Authors: Moritz Alfons Wilhelm Scham, Dirk Krücker, Kerstin Borras

In High Energy Physics, detailed and time-consuming simulations are used for particle interactions with detectors. To bypass these simulations with a generative model, the generation of large point clouds in a short time is required, while the complex dependencies between the particles must be correctly modelled. Particle showers are inherently tree-based processes, as each particle is produced by the decay or detector interaction of a particle of the previous generation. In this work, we present a significant extension to DeepTreeGAN, featuring a critic that is able to aggregate such point clouds iteratively in a tree-based manner. We show that this model can reproduce complex distributions, and we evaluate its performance on the public JetNet 150 dataset.

Tokenized Model: A Blockchain-Empowered Decentralized Model Ownership Verification Platform. (arXiv:2312.00048v1 [cs.CR])

Authors: Yihao Li, Yanyi Lai, Tianchi Liao, Chuan Chen, Zibin Zheng

With the development of practical deep learning models such as generative AI, their excellent performance has brought huge economic value. For instance, ChatGPT attracted more than 100 million users within three months. Since model training requires a lot of data and computing power, a well-performing deep learning model represents a huge effort and cost. Facing various model attacks, unauthorized use, and abuse from the network that threaten the interests of model owners, and in addition to legal and other administrative measures, it is equally important to protect a model's copyright through technical means. Using model watermarking technology, we point out the possibility of building a unified platform for model ownership verification. Given the application history of blockchain in copyright verification and the drawbacks of a centralized third party, this paper combines model watermarking technology and blockchain to build a unified model copyright protection platform. Our solution, which we call the Tokenized Model, protects a model's copyright through a reliable ownership record and verification mechanism. It also promotes the financial value of a model by constructing the model's transaction process and the contribution shares of a model. In a typical case study, we also evaluate performance under common scenarios to verify the effectiveness of this platform.

Elijah: Eliminating Backdoors Injected in Diffusion Models via Distribution Shift. (arXiv:2312.00050v1 [cs.CR])

Authors: Shengwei An, Sheng-Yen Chou, Kaiyuan Zhang, Qiuling Xu, Guanhong Tao, Guangyu Shen, Siyuan Cheng, Shiqing Ma, Pin-Yu Chen, Tsung-Yi Ho, Xiangyu Zhang

Diffusion models (DM) have become state-of-the-art generative models because of their capability to generate high-quality images from noise without adversarial training. However, they are vulnerable to backdoor attacks, as reported by recent studies. When a data input (e.g., some Gaussian noise) is stamped with a trigger (e.g., a white patch), the backdoored model always generates the target image (e.g., an improper photo). However, effective defense strategies to mitigate backdoors in DMs are underexplored. To bridge this gap, we propose the first backdoor detection and removal framework for DMs. We evaluate our framework, Elijah, on hundreds of DMs of 3 types, including DDPM, NCSN, and LDM, with 13 samplers against 3 existing backdoor attacks. Extensive experiments show that our approach achieves close to 100% detection accuracy and reduces the backdoor effects to close to zero without significantly sacrificing model utility.

MIA-BAD: An Approach for Enhancing Membership Inference Attack and its Mitigation with Federated Learning. (arXiv:2312.00051v1 [cs.CR])

Authors: Soumya Banerjee, Sandip Roy, Sayyed Farid Ahamed, Devin Quinn, Marc Vucovich, Dhruv Nandakumar, Kevin Choi, Abdul Rahman, Edward Bowen, Sachin Shetty

The membership inference attack (MIA) is a popular paradigm for compromising the privacy of a machine learning (ML) model. MIA exploits the natural inclination of ML models to overfit the training data. MIAs are trained to distinguish between training and testing prediction confidences to infer membership information. Federated Learning (FL) is a privacy-preserving ML paradigm that enables multiple clients to train a unified model without disclosing their private data. In this paper, we propose an enhanced Membership Inference Attack with the Batch-wise generated Attack Dataset (MIA-BAD), a modification of the MIA approach. We find that the MIA is more accurate when the attack dataset is generated batch-wise; this decreases the size of the attack dataset while improving its quality. We show how training an ML model through FL has some distinct advantages and investigate how the threat introduced by the proposed MIA-BAD approach can be mitigated with FL approaches. Finally, we demonstrate the qualitative effects of the proposed MIA-BAD methodology by conducting extensive experiments with various target datasets, variable numbers of federated clients, and training batch sizes.
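
As background, a toy confidence-thresholding membership inference looks like the sketch below: members tend to receive higher prediction confidence than non-members, and a threshold turns that gap into membership guesses. MIA-BAD's batch-wise attack-dataset construction is not reproduced here; the Beta-distributed confidences are synthetic stand-ins.

```python
import numpy as np

# Toy membership inference by confidence thresholding: overfitted models
# assign higher confidence to training members than to held-out samples.

rng = np.random.default_rng(6)
conf_members = rng.beta(8, 2, 1000)      # training samples: high confidence
conf_nonmembers = rng.beta(5, 3, 1000)   # held-out samples: lower confidence

threshold = 0.8
guesses = np.concatenate([conf_members, conf_nonmembers]) > threshold
truth = np.concatenate([np.ones(1000), np.zeros(1000)])
print("attack accuracy:", (guesses == truth).mean())
```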

Anti-Sexism Alert System: Identification of Sexist Comments on Social Media Using AI Techniques. (arXiv:2312.00053v1 [cs.CY])

Authors: Rebeca P. Díaz Redondo, Ana Fernández Vilas, Mateo Ramos Merino, Sonia Valladares, Soledad Torres Guijarro, Manar Mohamed Hafez

Social relationships in the digital sphere are becoming more common and frequent, and they constitute a very important aspect of life for all of us. Violent interactions in this sphere are very frequent and have serious effects on the victims. Within this global scenario, there is one kind of digital violence that is becoming particularly worrying: sexism against women. Sexist comments publicly posted on social media (newspaper comments, social networks, etc.) usually attract a lot of attention and become viral, with consequent damage to the people involved. In this paper, we introduce an anti-sexism alert system, based on natural language processing (NLP) and artificial intelligence (AI), that analyzes any public post and decides whether it could be considered a sexist comment or not. Additionally, this system also analyzes all the public comments linked to any multimedia content (piece of news, video, tweet, etc.) and decides, using a color-based system similar to traffic lights, whether there is sexism in the global set of posts. Since the majority of studies focus on English, we created a labeled dataset in Spanish to train our system, which achieved very good performance in validation experiments.

Is Inverse Reinforcement Learning Harder than Standard Reinforcement Learning?. (arXiv:2312.00054v1 [stat.ML])

Authors: Lei Zhao, Mengdi Wang, Yu Bai

Inverse Reinforcement Learning (IRL) -- the problem of learning reward functions from demonstrations of an \emph{expert policy} -- plays a critical role in developing intelligent systems, such as those that understand and imitate human behavior. While widely used in applications, the theoretical understanding of IRL poses unique challenges and remains less developed than standard RL theory. For example, it remains open how to perform IRL efficiently in standard \emph{offline} settings with pre-collected data, where states are obtained from a \emph{behavior policy} (which could be the expert policy itself) and actions are sampled from the expert policy.

This paper provides the first line of results for efficient IRL in vanilla offline and online settings using polynomial samples and runtime. We first design a new IRL algorithm for the offline setting, Reward Learning with Pessimism (RLP), and show that it achieves polynomial sample complexity in terms of the size of the MDP, a concentrability coefficient between the behavior policy and the expert policy, and the desired accuracy. Building on RLP, we further design an algorithm Reward Learning with Exploration (RLE), which operates in a natural online setting where the learner can both actively explore the environment and query the expert policy, and obtain a stronger notion of IRL guarantee from polynomial samples. We establish sample complexity lower bounds for both settings showing that RLP and RLE are nearly optimal. Finally, as an application, we show that the learned reward functions can \emph{transfer} to another target MDP with suitable guarantees when the target MDP satisfies certain similarity assumptions with the original (source) MDP.

LEAP: LLM-Generation of Egocentric Action Programs. (arXiv:2312.00055v1 [cs.CV])

Authors: Eadom Dessalene, Michael Maynord, Cornelia Fermüller, Yiannis Aloimonos

We introduce LEAP (illustrated in Figure 1), a novel method for generating video-grounded action programs through the use of a Large Language Model (LLM). These action programs represent the motoric, perceptual, and structural aspects of action, and consist of sub-actions, pre- and post-conditions, and control flows. LEAP's action programs are centered on egocentric video and employ recent developments in LLMs both as a source of program knowledge and as an aggregator and assessor of multimodal video information. We apply LEAP to a majority (87\%) of the training set of the EPIC Kitchens dataset, and release the resulting action programs as a publicly available dataset here (https://drive.google.com/drive/folders/1Cpkw_TI1IIxXdzor0pOXG3rWJWuKU5Ex?usp=drive_link). We employ LEAP as a secondary source of supervision, using its action programs in a loss term applied to action recognition and anticipation networks. We demonstrate sizable performance improvements in both tasks from training with the LEAP dataset. Our method achieves 1st place on the EPIC Kitchens Action Recognition leaderboard as of November 17 among networks restricted to RGB input (see Supplementary Materials).

Exploring Factors Affecting Pedestrian Crash Severity Using TabNet: A Deep Learning Approach. (arXiv:2312.00066v1 [cs.LG])

Authors: Amir Rafe, Patrick A. Singleton

This study presents the first investigation of pedestrian crash severity using the TabNet model, a novel tabular deep learning method exceptionally suited for analyzing the tabular data inherent in transportation safety research. Through the application of TabNet to a comprehensive dataset from Utah covering the years 2010 to 2022, we uncover intricate factors contributing to pedestrian crash severity. The TabNet model, capitalizing on its compatibility with structured data, demonstrates remarkable predictive accuracy, eclipsing that of traditional models. It identifies critical variables, such as pedestrian age, involvement in left or right turns, lighting conditions, and alcohol consumption, which significantly influence crash outcomes. The utilization of SHapley Additive exPlanations (SHAP) enhances our ability to interpret the TabNet model's predictions, ensuring transparency and understandability in our deep learning approach. The insights derived from our analysis provide a valuable compass for transportation safety engineers and policymakers, enabling the identification of pivotal factors that affect pedestrian crash severity. Such knowledge is instrumental in formulating precise, data-driven interventions aimed at bolstering pedestrian safety across diverse urban and rural settings.
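
A hedged sketch of this modelling pipeline, using the open-source pytorch-tabnet and shap packages: fit TabNet on tabular crash records, then attribute predictions to features with SHAP. The synthetic features, class count, and explainer configuration are assumptions, not the paper's setup.

```python
import numpy as np
import shap
from pytorch_tabnet.tab_model import TabNetClassifier

# Hedged sketch: TabNet on synthetic tabular crash records, with SHAP
# attributions on top. All data and settings here are placeholders.

X = np.random.rand(1000, 12).astype(np.float32)  # e.g. age, lighting, ...
y = np.random.randint(0, 3, 1000)                # three severity classes

clf = TabNetClassifier(verbose=0)
clf.fit(X[:800], y[:800], eval_set=[(X[800:], y[800:])])

background = shap.sample(X, 50)                  # small reference set
explainer = shap.KernelExplainer(clf.predict_proba, background)
shap_values = explainer.shap_values(X[:5])       # per-feature attributions
```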

Predicting breast cancer with AI for individual risk-adjusted MRI screening and early detection. (arXiv:2312.00067v1 [physics.med-ph])

Authors: Lukas Hirsch, Yu Huang, Hernan A. Makse, Danny F. Martinez, Mary Hughes, Sarah Eskreis-Winkler, Katja Pinker, Elizabeth Morris, Lucas C. Parra, Elizabeth J. Sutton

Women with an increased life-time risk of breast cancer undergo supplemental annual screening MRI. We propose to predict the risk of developing breast cancer within one year based on the current MRI, with the objective of reducing screening burden and facilitating early detection. An AI algorithm was developed on 53,858 breasts from 12,694 patients who underwent screening or diagnostic MRI, accrued over 12 years, with 2,331 confirmed cancers. A first U-Net was trained to segment lesions and identify regions of concern. A second convolutional network was trained to detect malignant cancer using features extracted by the U-Net. This network was then fine-tuned to estimate the risk of developing cancer within a year in cases that radiologists considered normal or likely benign. Risk predictions from this AI were evaluated with a retrospective analysis of 9,183 breasts from a high-risk screening cohort, which were not used for training. Statistical analysis focused on the tradeoff between the number of omitted exams versus negative predictive value, and the number of potential early detections versus positive predictive value. The AI algorithm identified regions of concern that coincided with future tumors in 52% of screen-detected cancers. Upon directed review, a radiologist found that 71.3% of cancers had a visible correlate on the MRI prior to diagnosis, and 65% of these correlates were identified by the AI model. Reevaluating these regions in the 10% of all cases with the highest AI-predicted risk could have resulted in up to 33% early detections by a radiologist. Additionally, screening burden could have been reduced in 16% of lower-risk cases by recommending a later follow-up, without compromising the current interval cancer rate. With increasing datasets and improving image quality, we expect this new AI-aided, adaptive screening to meaningfully reduce screening burden and improve early detection.

Towards A Foundation Model For Trajectory Intelligence. (arXiv:2312.00076v1 [cs.LG])

Authors: Alameen Najjar

We present the results of training a large trajectory model using real-world user check-in data. Our approach follows a pre-train and fine-tune paradigm, where a base model is pre-trained via masked trajectory modeling and then adapted through fine-tuning for various downstream tasks. To address challenges posed by noisy data and large spatial vocabularies, we propose a novel spatial tokenization block. Our empirical analysis utilizes a comprehensive dataset of over 2 billion check-ins generated by more than 6 million users. Through fine-tuning on 3 downstream tasks we demonstrate that our base model has effectively learned valuable underlying patterns in raw data, enabling its application in meaningful trajectory intelligence tasks. Despite some limitations, we believe this work represents an important step forward in the realization of a foundation model for trajectory intelligence.

HiFi Tuner: High-Fidelity Subject-Driven Fine-Tuning for Diffusion Models. (arXiv:2312.00079v1 [cs.CV])

Authors: Zhonghao Wang, Wei Wei, Yang Zhao, Zhisheng Xiao, Mark Hasegawa-Johnson, Humphrey Shi, Tingbo Hou

This paper explores advancements in high-fidelity personalized image generation through the utilization of pre-trained text-to-image diffusion models. While previous approaches have made significant strides in generating versatile scenes based on text descriptions and a few input images, challenges persist in maintaining the subject fidelity within the generated images. In this work, we introduce an innovative algorithm named HiFi Tuner to enhance the appearance preservation of objects during personalized image generation. Our proposed method employs a parameter-efficient fine-tuning framework, comprising a denoising process and a pivotal inversion process. Key enhancements include the utilization of mask guidance, a novel parameter regularization technique, and the incorporation of step-wise subject representations to elevate the sample fidelity. Additionally, we propose a reference-guided generation approach that leverages the pivotal inversion of a reference image to mitigate unwanted subject variations and artifacts. We further extend our method to a novel image editing task: substituting the subject in an image through textual manipulations. Experimental evaluations conducted on the DreamBooth dataset using the Stable Diffusion model showcase promising results. Fine-tuning solely on textual embeddings improves CLIP-T score by 3.6 points and improves DINO score by 9.6 points over Textual Inversion. When fine-tuning all parameters, HiFi Tuner improves CLIP-T score by 1.2 points and improves DINO score by 1.2 points over DreamBooth, establishing a new state of the art.

PDB-Struct: A Comprehensive Benchmark for Structure-based Protein Design. (arXiv:2312.00080v1 [q-bio.QM])

Authors: Chuanrui Wang, Bozitao Zhong, Zuobai Zhang, Narendra Chaudhary, Sanchit Misra, Jian Tang

Structure-based protein design has attracted increasing interest, with numerous methods being introduced in recent years. However, a universally accepted method for evaluation has not been established, since wet-lab validation can be overly time-consuming for the development of new algorithms, and the $\textit{in silico}$ validation with recovery and perplexity metrics is efficient but may not precisely reflect true foldability. To address this gap, we introduce two novel metrics: a refoldability-based metric, which leverages high-accuracy protein structure prediction models as a proxy for wet-lab experiments, and a stability-based metric, which assesses whether models can assign high likelihoods to experimentally stable proteins. We curate datasets from high-quality CATH protein data, high-throughput $\textit{de novo}$ designed proteins, and mega-scale mutagenesis experiments, and in doing so, present the $\textbf{PDB-Struct}$ benchmark that evaluates both recent and previously uncompared protein design methods. Experimental results indicate that ByProt, ProteinMPNN, and ESM-IF perform exceptionally well on our benchmark, while ESM-Design and AF-Design fall short on the refoldability metric. We also show that while some methods exhibit high sequence recovery, they do not perform as well on our new benchmark. Our proposed benchmark paves the way for a fair and comprehensive evaluation of protein design methods in the future. Code is available at https://github.com/WANG-CR/PDB-Struct.

BAM-DETR: Boundary-Aligned Moment Detection Transformer for Temporal Sentence Grounding in Videos. (arXiv:2312.00083v1 [cs.CV])

Authors: Pilhyeon Lee, Hyeran Byun

Temporal sentence grounding aims to localize moments relevant to a language description. Recently, DETR-like approaches have shown notable progress by decoding the center and length of a target moment from learnable queries. However, they suffer from the issue of center misalignment raised by the inherent ambiguity of moment centers, leading to inaccurate predictions. To remedy this problem, we introduce a novel boundary-oriented moment formulation. In our paradigm, the model no longer needs to find the precise center but instead suffices to predict any anchor point within the interval, from which the onset and offset are directly estimated. Based on this idea, we design a Boundary-Aligned Moment Detection Transformer (BAM-DETR), equipped with a dual-pathway decoding process. Specifically, it refines the anchor and boundaries within parallel pathways using global and boundary-focused attention, respectively. This separate design allows the model to focus on desirable regions, enabling precise refinement of moment predictions. Further, we propose a quality-based ranking method, ensuring that proposals with high localization qualities are prioritized over incomplete ones. Extensive experiments verify the advantages of our methods, where our model records new state-of-the-art results on three benchmarks. Code is at https://github.com/Pilhyeon/BAM-DETR.

Anomaly Detection via Learning-Based Sequential Controlled Sensing. (arXiv:2312.00088v1 [cs.LG])

Authors: Geethu Joseph, Chen Zhong, M. Cenk Gursoy, Senem Velipasalar, Pramod K. Varshney

In this paper, we address the problem of detecting anomalies among a given set of binary processes via learning-based controlled sensing. Each process is parameterized by a binary random variable indicating whether the process is anomalous. To identify the anomalies, the decision-making agent is allowed to observe a subset of the processes at each time instant. Also, probing each process has an associated cost. Our objective is to design a sequential selection policy that dynamically determines which processes to observe at each time, with the goal of minimizing the decision delay and the total sensing cost. We cast this problem as a sequential hypothesis testing problem within the framework of Markov decision processes. This formulation utilizes both a Bayesian log-likelihood ratio-based reward and an entropy-based reward. The problem is then solved using two approaches: 1) a deep reinforcement learning-based approach, in which we design both deep Q-learning and policy gradient actor-critic algorithms; and 2) a deep active inference-based approach. Using numerical experiments, we demonstrate the efficacy of our algorithms and show that they adapt to any unknown statistical dependence pattern of the processes.
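
The log-likelihood-ratio bookkeeping behind the Bayesian reward can be sketched as follows for independent Bernoulli observations; the probe-the-most-uncertain rule stands in for the learned policy, and the observation probabilities are illustrative assumptions.

```python
import numpy as np

# Sketch of log-likelihood-ratio (LLR) bookkeeping: each probe of a process
# updates the evidence that it is anomalous.

def update_llr(llr, obs, p_anom=0.8, p_norm=0.2):
    """Observations are 1 w.p. p_anom if anomalous, w.p. p_norm otherwise."""
    like_anom = p_anom if obs else 1 - p_anom
    like_norm = p_norm if obs else 1 - p_norm
    return llr + np.log(like_anom / like_norm)

rng = np.random.default_rng(5)
truth = np.array([1, 0, 0, 1])            # which processes are anomalous
llr = np.zeros(4)                         # flat priors
for _ in range(60):
    k = int(np.argmin(np.abs(llr)))       # probe the most uncertain process
    obs = rng.random() < (0.8 if truth[k] else 0.2)
    llr[k] = update_llr(llr[k], obs)

print("declared anomalous:", llr > 0)
```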

Tree-based Forecasting of Day-ahead Solar Power Generation from Granular Meteorological Features. (arXiv:2312.00090v1 [cs.LG])

Authors: Nick Berlanger, Noah van Ophoven, Tim Verdonck, Ines Wilms

Accurate forecasts of day-ahead photovoltaic (PV) power generation are crucial to support a high PV penetration rate in the local electricity grid and to ensure grid stability. We use state-of-the-art tree-based machine learning methods to produce such forecasts and, unlike previous studies, we account for (i) the effects that various meteorological and astronomical features have on PV power production, and do so (ii) at both coarse and granular spatial locations. To this end, we use data from Belgium and forecast day-ahead PV power production at an hourly resolution. The insights from our study can assist utilities, decision-makers, and other stakeholders in optimizing grid operations, economic dispatch, and in facilitating the integration of distributed PV power into the electricity grid.

GraphDreamer: Compositional 3D Scene Synthesis from Scene Graphs. (arXiv:2312.00093v1 [cs.CV])

Authors: Gege Gao, Weiyang Liu, Anpei Chen, Andreas Geiger, Bernhard Schölkopf

As pretrained text-to-image diffusion models become increasingly powerful, recent efforts have been made to distill knowledge from these text-to-image pretrained models for optimizing a text-guided 3D model. Most of the existing methods generate a holistic 3D model from a plain text input. This can be problematic when the text describes a complex scene with multiple objects, because the vectorized text embeddings are inherently unable to capture a complex description with multiple entities and relationships. Holistic 3D modeling of the entire scene further prevents accurate grounding of text entities and concepts. To address this limitation, we propose GraphDreamer, a novel framework to generate compositional 3D scenes from scene graphs, where objects are represented as nodes and their interactions as edges. By exploiting node and edge information in scene graphs, our method makes better use of the pretrained text-to-image diffusion model and is able to fully disentangle different objects without image-level supervision. To facilitate modeling of object-wise relationships, we use signed distance fields as representation and impose a constraint to avoid inter-penetration of objects. To avoid manual scene graph creation, we design a text prompt for ChatGPT to generate scene graphs based on text inputs. We conduct both qualitative and quantitative experiments to validate the effectiveness of GraphDreamer in generating high-fidelity compositional 3D scenes with disentangled object entities.
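
The non-penetration constraint can be sketched with signed distances: two objects overlap exactly where both SDFs are negative, so penalizing the product of the negative parts discourages inter-penetration. This is one plausible formulation under stated assumptions; GraphDreamer's actual constraint may differ.

```python
import torch

# One plausible SDF-based non-penetration penalty: points lying inside both
# objects have two negative signed distances, so the product of the negative
# parts is positive exactly in the overlap region.

def interpenetration_penalty(sdf_a, sdf_b):
    """sdf_*: signed distances of each object at shared sample points."""
    overlap = torch.relu(-sdf_a) * torch.relu(-sdf_b)
    return overlap.mean()

sdf_a = torch.randn(1000) * 0.1   # toy signed-distance samples
sdf_b = torch.randn(1000) * 0.1
penalty = interpenetration_penalty(sdf_a, sdf_b)
```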

Textual-Knowledge-Guided Numerical Feature Discovery Method for Power Demand Forecasting. (arXiv:2312.00095v1 [cs.LG])

Authors: Zifan Ning, Min Jin

Power demand forecasting is a crucial and challenging task for new power systems and integrated energy systems. However, as public feature databases and a theoretical mechanism of power demand change are unavailable, the known features of power demand fluctuation are very limited. Recently, multimodal learning approaches have shown great vitality in machine learning and AIGC. In this paper, we combine two data modalities and propose a textual-knowledge-guided numerical feature discovery (TKNFD) method for short-term power demand forecasting. TKNFD extensively accumulates qualitative textual knowledge, expands it into a candidate feature-type set, collects numerical data for these features, and eventually builds four-dimensional multivariate source-tracking databases (4DM-STDs). Next, TKNFD presents a two-level quantitative feature identification strategy independent of forecasting models, finds 43-48 features, and systematically analyses feature contribution and dependency correlation. Benchmark experiments in two different regions around the world demonstrate that the forecasting accuracy with TKNFD-discovered features reliably outperforms that of SoTA feature schemes by 16.84% to 36.36% in MAPE. In particular, TKNFD reveals many unknown features, especially several dominant features in the unknown energy and astronomical dimensions, which extend the knowledge on the origin of strong randomness and non-linearity in power demand fluctuation. Besides, 4DM-STDs can serve as public baseline databases.

Online Influence Maximization: Concept and Algorithm. (arXiv:2312.00099v1 [cs.SI])

Authors: Jianxiong Guo

In this survey, we offer an extensive overview of the Online Influence Maximization (IM) problem, covering both theoretical aspects and practical applications. For the integrity of the article, and because the online algorithm takes an offline oracle as a subroutine, we first give a clear definition of the Offline IM problem and summarize commonly used Offline IM algorithms, including traditional approximation or heuristic algorithms and ML-based algorithms. Then, we give a standard definition of the Online IM problem and a basic Combinatorial Multi-Armed Bandit (CMAB) framework, CMAB-T. Here, we summarize three types of feedback in the CMAB model and discuss in detail how to study the Online IM problem based on the CMAB-T model. This paves the way for solving the Online IM problem with online learning methods. Furthermore, we cover almost all Online IM algorithms to date, focusing on the characteristics and theoretical guarantees of online algorithms for different feedback types, and explain in detail their working principles and how regret bounds are obtained. We also collect plenty of innovative ideas about problem definitions and algorithm designs, as well as pioneering works for variants of the Online IM problem and their corresponding algorithms. Finally, we encapsulate current challenges and outline prospective research directions from four distinct perspectives.

Towards Unsupervised Representation Learning: Learning, Evaluating and Transferring Visual Representations. (arXiv:2312.00101v1 [cs.CV])

Authors: Bonifaz Stuhr

Unsupervised representation learning aims at finding methods that learn representations from data without annotation-based signals. Abstaining from annotations not only leads to economic benefits but may - and to some extent already does - result in advantages regarding the representation's structure, robustness, and generalizability to different tasks. In the long run, unsupervised methods are expected to surpass their supervised counterparts due to the reduction of human intervention and the inherently more general setup that does not bias the optimization towards an objective originating from specific annotation-based signals. While major advantages of unsupervised representation learning have been recently observed in natural language processing, supervised methods still dominate in vision domains for most tasks. In this dissertation, we contribute to the field of unsupervised (visual) representation learning from three perspectives: (i) Learning representations: We design unsupervised, backpropagation-free Convolutional Self-Organizing Neural Networks (CSNNs) that utilize self-organization- and Hebbian-based learning rules to learn convolutional kernels and masks to achieve deeper backpropagation-free models. (ii) Evaluating representations: We build upon the widely used (non-)linear evaluation protocol to define pretext- and target-objective-independent metrics for measuring and investigating the objective function mismatch between various unsupervised pretext tasks and target tasks. (iii) Transferring representations: We contribute CARLANE, the first 3-way sim-to-real domain adaptation benchmark for 2D lane detection, and a method based on prototypical self-supervised learning. Finally, we contribute a content-consistent unpaired image-to-image translation method that utilizes masks, global and local discriminators, and similarity sampling to mitigate content inconsistencies.

FedEmb: A Vertical and Hybrid Federated Learning Algorithm using Network And Feature Embedding Aggregation. (arXiv:2312.00102v1 [cs.LG])

Authors: Fanfei Meng, Lele Zhang, Yu Chen, Yuxin Wang

Federated learning (FL) is an emerging paradigm for decentralized training of machine learning models on distributed clients, without revealing the data to the central server. The learning scheme may be horizontal, vertical, or hybrid (both vertical and horizontal). Most existing research with deep neural network (DNN) modelling focuses on horizontal data distributions, while vertical and hybrid schemes are much less studied. In this paper, we propose a generalized algorithm, FedEmb, for modelling vertical and hybrid DNN-based learning. Our algorithm is characterised by higher inference accuracy, stronger privacy-preserving properties, and lower client-server communication bandwidth demands compared with existing work. The experimental results show that FedEmb is an effective method for tackling decentralized problems with split feature and subject spaces: it yields a 0.3% to 4.2% inference accuracy improvement with limited privacy leakage for datasets stored on local clients, and reduces time complexity by 88.9% relative to the vertical baseline method.

DeepEn2023: Energy Datasets for Edge Artificial Intelligence. (arXiv:2312.00103v1 [cs.LG])

Authors: Xiaolong Tu, Anik Mallik, Haoxin Wang, Jiang Xie

Climate change poses one of the most significant challenges to humanity. As a result of these climatic changes, the frequency of weather, climate, and water-related disasters has multiplied fivefold over the past 50 years, resulting in over 2 million deaths and losses exceeding $3.64 trillion USD. Leveraging AI-powered technologies for sustainable development and combating climate change is a promising avenue. Numerous significant publications are dedicated to using AI to improve renewable energy forecasting, enhance waste management, and monitor environmental changes in real time. However, very few research studies focus on making AI itself environmentally sustainable. This oversight regarding the sustainability of AI within the field might be attributed to a mindset gap and the absence of comprehensive energy datasets. In addition, with the ubiquity of edge AI systems and applications, especially on-device learning, there is a pressing need to measure, analyze, and optimize their environmental sustainability, such as energy efficiency. To this end, in this paper, we propose large-scale energy datasets for edge AI, named DeepEn2023, covering a wide range of kernels, state-of-the-art deep neural network models, and popular edge AI applications. We anticipate that DeepEn2023 will improve transparency in sustainability in on-device deep learning across a range of edge AI systems and applications. For more information, including access to the dataset and code, please visit https://amai-gsu.github.io/DeepEn2023.

Improving the Robustness of Quantized Deep Neural Networks to White-Box Attacks using Stochastic Quantization and Information-Theoretic Ensemble Training. (arXiv:2312.00105v1 [cs.CV])

Authors: Saurabh Farkya, Aswin Raghavan, Avi Ziskind

Most real-world applications that employ deep neural networks (DNNs) quantize them to low precision to reduce the compute needs. We present a method to improve the robustness of quantized DNNs to white-box adversarial attacks. We first tackle the limitation of deterministic quantization to fixed ``bins'' by introducing a differentiable Stochastic Quantizer (SQ). We explore the hypothesis that different quantizations may collectively be more robust than each quantized DNN. We formulate a training objective to encourage different quantized DNNs to learn different representations of the input image. The training objective captures diversity and accuracy via mutual information between ensemble members. Through experimentation, we demonstrate substantial improvement in robustness against $L_\infty$ attacks even if the attacker is allowed to backpropagate through SQ (e.g., > 50\% accuracy to PGD(5/255) on CIFAR10 without adversarial training), compared to vanilla DNNs as well as existing ensembles of quantized DNNs. We extend the method to detect attacks and generate robustness profiles in the adversarial information plane (AIP), towards a unified analysis of different threat models by correlating the MI and accuracy.
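
A minimal sketch of a differentiable Stochastic Quantizer: round up or down at random with probability given by the fractional part, and pass gradients through with the straight-through estimator. The bit-width, the [0, 1] weight range, and the STE choice are assumptions about one reasonable SQ construction, not the paper's exact parameterization.

```python
import torch

# Sketch of a differentiable Stochastic Quantizer with a straight-through
# estimator: round up with probability equal to the fractional part, and
# let gradients pass through the rounding as identity.

def stochastic_quantize(w, bits=4):
    levels = 2 ** bits - 1
    scaled = w.clamp(0, 1) * levels
    floor = scaled.floor()
    prob_up = scaled - floor                     # fractional part
    q = (floor + (torch.rand_like(w) < prob_up).float()) / levels
    return w + (q - w).detach()                  # straight-through gradient

w = torch.rand(8, requires_grad=True)
q = stochastic_quantize(w)
q.sum().backward()                               # w.grad is all ones
```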

Multimodal Learning for Crystalline Materials. (arXiv:2312.00111v1 [cs.LG])

Authors: Viggo Moro, Charlotte Loh, Rumen Dangovski, Ali Ghorashi, Andrew Ma, Zhuo Chen, Peter Y. Lu, Thomas Christensen, Marin Soljačić

Artificial intelligence (AI) has revolutionized the field of materials science by improving the prediction of properties and accelerating the discovery of novel materials. In recent years, publicly available material data repositories containing data for various material properties have grown rapidly. In this work, we introduce Multimodal Learning for Crystalline Materials (MLCM), a new method for training a foundation model for crystalline materials via multimodal alignment, where high-dimensional material properties (i.e. modalities) are connected in a shared latent space to produce highly useful material representations. We show the utility of MLCM on multiple axes: (i) MLCM achieves state-of-the-art performance for material property prediction on the challenging Materials Project database; (ii) MLCM enables a novel, highly accurate method for inverse design, allowing one to screen for stable material with desired properties; and (iii) MLCM allows the extraction of interpretable emergent features that may provide insight to material scientists. Further, we explore several novel methods for aligning an arbitrary number of modalities, improving upon prior art in multimodal learning that focuses on bimodal alignment. Our work brings innovations from the ongoing AI revolution into the domain of materials science and identifies materials as a testbed for the next generation of AI.
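
For intuition, bimodal alignment in a shared latent space can be sketched with a CLIP-style InfoNCE loss; MLCM's contribution includes generalizing beyond this bimodal case to an arbitrary number of modalities, which the toy sketch below does not attempt. The encoder outputs and dimensions are placeholders.

```python
import torch
import torch.nn.functional as F

# CLIP-style InfoNCE alignment of two modalities in a shared latent space.

def info_nce(z_a, z_b, tau=0.07):
    z_a, z_b = F.normalize(z_a, dim=1), F.normalize(z_b, dim=1)
    logits = z_a @ z_b.T / tau                 # pairwise similarities
    labels = torch.arange(len(z_a))            # matched pairs on the diagonal
    return (F.cross_entropy(logits, labels)
            + F.cross_entropy(logits.T, labels)) / 2

z_struct = torch.randn(32, 128)   # e.g. crystal-structure encoder output
z_prop = torch.randn(32, 128)     # e.g. property encoder output
loss = info_nce(z_struct, z_prop)
```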

S2ST: Image-to-Image Translation in the Seed Space of Latent Diffusion. (arXiv:2312.00116v1 [cs.CV])

Authors: Or Greenberg, Eran Kishon, Dani Lischinski

Image-to-image translation (I2IT) refers to the process of transforming images from a source domain to a target domain while maintaining a fundamental connection in terms of image content. In the past few years, remarkable advancements in I2IT were achieved by Generative Adversarial Networks (GANs), which nevertheless struggle with translations requiring high precision. Recently, Diffusion Models have established themselves as the engine of choice for image generation. In this paper we introduce S2ST, a novel framework designed to accomplish global I2IT in complex photorealistic images, such as day-to-night or clear-to-rain translations of automotive scenes. S2ST operates within the seed space of a Latent Diffusion Model, thereby leveraging the powerful image priors learned by the latter. We show that S2ST surpasses state-of-the-art GAN-based I2IT methods, as well as diffusion-based approaches, for complex automotive scenes, improving fidelity while respecting the target domain's appearance across a variety of domains. Notably, S2ST obviates the necessity for training domain-specific translation networks.

Flow Matching Beyond Kinematics: Generating Jets with Particle-ID and Trajectory Displacement Information. (arXiv:2312.00123v1 [hep-ph])

Authors: Joschka Birk, Erik Buhmann, Cedric Ewen, Gregor Kasieczka, David Shih

We introduce the first generative model trained on the JetClass dataset. Our model generates jets at the constituent level, and it is a permutation-equivariant continuous normalizing flow (CNF) trained with the flow matching technique. It is conditioned on the jet type, so that a single model can be used to generate the ten different jet types of JetClass. For the first time, we also introduce a generative model that goes beyond the kinematic features of jet constituents. The JetClass dataset includes more features, such as particle-ID and track impact parameter, and we demonstrate that our CNF can accurately model all of these additional features as well. Our generative model for JetClass expands on the versatility of existing jet generation techniques, enhancing their potential utility in high-energy physics research, and offering a more comprehensive understanding of the generated jets.
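
The flow-matching objective itself is compact enough to sketch: regress a velocity field $v_\theta(x_t, t)$ onto the constant velocity $x_1 - x_0$ of linear interpolation paths between noise and data. Jet-type conditioning and the permutation-equivariant architecture are omitted; the three toy features stand in for constituent kinematics and IDs.

```python
import torch

# Conditional flow matching on linear paths: x_t = (1-t)*x0 + t*x1 has
# constant velocity x1 - x0, which the network v_theta regresses.

v_theta = torch.nn.Sequential(
    torch.nn.Linear(4, 64), torch.nn.SiLU(), torch.nn.Linear(64, 3))

x0 = torch.randn(256, 3)                    # noise endpoint
x1 = torch.randn(256, 3)                    # "data" endpoint
t = torch.rand(256, 1)                      # path times

x_t = (1 - t) * x0 + t * x1
pred = v_theta(torch.cat([x_t, t], dim=1))
loss = torch.mean((pred - (x1 - x0)) ** 2)  # flow-matching loss
loss.backward()
```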

Scalable Bayesian uncertainty quantification with data-driven priors for radio interferometric imaging. (arXiv:2312.00125v1 [astro-ph.IM])

Authors: Tobías I. Liaudat, Matthijs Mars, Matthew A. Price, Marcelo Pereyra, Marta M. Betcke, Jason D. McEwen

Next-generation radio interferometers like the Square Kilometer Array have the potential to unlock scientific discoveries thanks to their unprecedented angular resolution and sensitivity. One key to unlocking their potential resides in handling the deluge and complexity of incoming data. This challenge requires building radio interferometric imaging methods that can cope with the massive data sizes and provide high-quality image reconstructions with uncertainty quantification (UQ). This work proposes a method coined QuantifAI to address UQ in radio-interferometric imaging with data-driven (learned) priors for high-dimensional settings. Our model, rooted in the Bayesian framework, uses a physically motivated model for the likelihood. The model exploits a data-driven convex prior, which can encode complex information learned implicitly from simulations and guarantee the log-concavity of the posterior. We leverage probability concentration phenomena of high-dimensional log-concave posteriors that let us obtain information about the posterior, avoiding MCMC sampling techniques. We rely on convex optimisation methods to compute the MAP estimation, which is known to be faster and better scale with dimension than MCMC sampling strategies. Our method allows us to compute local credible intervals, i.e., Bayesian error bars, and perform hypothesis testing of structure on the reconstructed image. In addition, we propose a novel blazing-fast method to compute pixel-wise uncertainties at different scales. We demonstrate our method by reconstructing radio-interferometric images in a simulated setting and carrying out fast and scalable UQ, which we validate with MCMC sampling. Our method shows an improved image quality and more meaningful uncertainties than the benchmark method based on a sparsity-promoting prior. QuantifAI's source code: https://github.com/astro-informatics/QuantifAI.

Low latency optical-based mode tracking with machine learning deployed on FPGAs on a tokamak. (arXiv:2312.00128v1 [physics.plasm-ph])

Authors: Yumou Wei, Ryan F. Forelli, Chris Hansen, Jeffrey P. Levesque, Nhan Tran, Joshua C. Agar, Giuseppe Di Guglielmo, Michael E. Mauel, Gerald A. Navratil

Active feedback control in magnetic confinement fusion devices is desirable to mitigate plasma instabilities and enable robust operation. Optical high-speed cameras provide a powerful, non-invasive diagnostic and can be suitable for these applications. In this study, we process fast camera data, at rates exceeding 100kfps, on $\textit{in situ}$ Field Programmable Gate Array (FPGA) hardware to track magnetohydrodynamic (MHD) mode evolution and generate control signals in real-time. Our system utilizes a convolutional neural network (CNN) model which predicts the $n$=1 MHD mode amplitude and phase using camera images with better accuracy than other tested non-deep-learning-based methods. By implementing this model directly within the standard FPGA readout hardware of the high-speed camera diagnostic, our mode tracking system achieves a total trigger-to-output latency of 17.6$\mu$s and a throughput of up to 120kfps. This study at the High Beta Tokamak-Extended Pulse (HBT-EP) experiment demonstrates an FPGA-based high-speed camera data acquisition and processing system, enabling application in real-time machine-learning-based tokamak diagnostic and control as well as potential applications in other scientific domains.

The Multiverse of Dynamic Mode Decomposition Algorithms. (arXiv:2312.00137v1 [math.DS])

Authors: Matthew J. Colbrook

Dynamic Mode Decomposition (DMD) is a popular data-driven analysis technique used to decompose complex, nonlinear systems into a set of modes, revealing underlying patterns and dynamics through spectral analysis. This review presents a comprehensive and pedagogical examination of DMD, emphasizing the role of Koopman operators in transforming complex nonlinear dynamics into a linear framework. A distinctive feature of this review is its focus on the relationship between DMD and the spectral properties of Koopman operators, with particular emphasis on the theory and practice of DMD algorithms for spectral computations. We explore the diverse "multiverse" of DMD methods, categorized into three main areas: linear regression-based methods, Galerkin approximations, and structure-preserving techniques. Each category is studied for its unique contributions and challenges, providing a detailed overview of significant algorithms and their applications as outlined in Table 1. We include a MATLAB package with examples and applications to enhance the practical understanding of these methods. This review serves as both a practical guide and a theoretical reference for various DMD methods, accessible to both experts and newcomers, and enabling readers to delve into their areas of interest in the expansive field of DMD.
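
As one concrete point in this multiverse, the classic "exact DMD" algorithm fits a best linear map between snapshot pairs via a truncated SVD; the sketch below recovers the eigenvalues of a known linear system. This is the textbook construction, not any particular algorithm from the review's Table 1.

```python
import numpy as np

# "Exact DMD" in a few lines: given snapshots X and their one-step
# successors Y, project onto a rank-r SVD basis, eigendecompose the
# projected operator, and lift the eigenvectors back to DMD modes.

def exact_dmd(X, Y, r):
    U, s, Vh = np.linalg.svd(X, full_matrices=False)
    U, s, Vh = U[:, :r], s[:r], Vh[:r]
    A_tilde = U.conj().T @ Y @ Vh.conj().T / s     # projected operator
    eigvals, W = np.linalg.eig(A_tilde)
    modes = Y @ Vh.conj().T / s @ W / eigvals      # exact DMD modes
    return eigvals, modes

# snapshots of a toy linear system x_{k+1} = A x_k
A = np.array([[0.9, -0.2], [0.2, 0.9]])
X = np.zeros((2, 50))
X[:, 0] = [1.0, 0.0]
for k in range(49):
    X[:, k + 1] = A @ X[:, k]

eigvals, _ = exact_dmd(X[:, :-1], X[:, 1:], r=2)
assert np.allclose(np.sort(eigvals), np.sort(np.linalg.eigvals(A)))
```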

Universal Backdoor Attacks. (arXiv:2312.00157v1 [cs.LG])

Authors: Benjamin Schneider, Nils Lukas, Florian Kerschbaum

Web-scraped datasets are vulnerable to data poisoning, which can be used for backdooring deep image classifiers during training. Since training on large datasets is expensive, a model is trained once and re-used many times. Unlike adversarial examples, backdoor attacks often target specific classes rather than any class learned by the model. One might expect that targeting many classes through a naive composition of attacks vastly increases the number of poison samples. We show this is not necessarily true: more efficient, universal data poisoning attacks exist that allow controlling misclassifications from any source class into any target class with only a small increase in poison samples. Our idea is to generate triggers with salient characteristics that the model can learn. The triggers we craft exploit a phenomenon we call inter-class poison transferability, where learning a trigger from one class makes the model more vulnerable to learning triggers for other classes. We demonstrate the effectiveness and robustness of our universal backdoor attacks by controlling models with up to 6,000 classes while poisoning only 0.15% of the training dataset.

Non-uniform Online Learning: Towards Understanding Induction. (arXiv:2312.00170v1 [cs.LG])

Authors: Zhou Lu

Can a physicist make only finite errors in the endless pursuit of the law of nature? This millennium-old question of inductive inference is a fundamental, yet mysterious problem in philosophy, lacking rigorous justifications. While classic online learning theory and inductive inference share a similar sequential decision-making spirit, the former's reliance on an adaptive adversary and worst-case error bounds limits its applicability to the latter. In this work, we introduce the concept of non-uniform online learning, which we argue aligns more closely with the principles of inductive reasoning. This setting assumes a predetermined ground-truth hypothesis and considers non-uniform, hypothesis-wise error bounds. In the realizable setting, we provide a complete characterization of learnability with finite error: a hypothesis class is non-uniformly learnable if and only if it is a countable union of Littlestone classes, regardless of whether the observations are adaptively chosen or i.i.d. sampled. Additionally, we propose a necessary condition for the weaker criterion of consistency, which we conjecture to be tight. To further develop our theory, we extend our result to the more realistic agnostic setting, showing that any countable union of Littlestone classes can be learnt with regret $\tilde{O}(\sqrt{T})$. We hope this work offers a new perspective on interpreting the power of induction from an online learning viewpoint.

HeTriNet: Heterogeneous Graph Triplet Attention Network for Drug-Target-Disease Interaction. (arXiv:2312.00189v1 [cs.LG])

Authors: Farhan Tanvir, Khaled Mohammed Saifuddin, Tanvir Hossain, Arunkumar Bagavathi, Esra Akbas

Modeling the interactions between drugs, targets, and diseases is paramount in drug discovery and has significant implications for precision medicine and personalized treatments. Current approaches frequently consider drug-target or drug-disease interactions individually, ignoring the interdependencies among all three entities. Within human metabolic systems, drugs interact with protein targets in cells, influencing target activities and subsequently impacting biological pathways to promote healthy functions and treat diseases. Moving beyond binary relationships and exploring tighter triple relationships is essential to understanding drugs' mechanisms of action (MoAs). Moreover, identifying the heterogeneity of drugs, targets, and diseases, along with their distinct characteristics, is critical to model these complex interactions appropriately. To address these challenges, we effectively model the interconnectedness of all entities in a heterogeneous graph and develop a novel Heterogeneous Graph Triplet Attention Network (\texttt{HeTriNet}). \texttt{HeTriNet} introduces a novel triplet attention mechanism within this heterogeneous graph structure. Going beyond pairwise attention, which captures the importance of one entity for another, we define triplet attention to model the importance of entity pairs for the third entity in the drug-target-disease triplet prediction problem. Experimental results on real-world datasets show that \texttt{HeTriNet} outperforms several baselines, demonstrating its remarkable proficiency in uncovering novel drug-target-disease relationships.

Enhancing Ligand Pose Sampling for Molecular Docking. (arXiv:2312.00191v1 [q-bio.BM])

Authors: Patricia Suriana, Ron O. Dror

Deep learning promises to dramatically improve scoring functions for molecular docking, leading to substantial advances in binding pose prediction and virtual screening. To train scoring functions, and to perform molecular docking, one must generate a set of candidate ligand binding poses. Unfortunately, the sampling protocols currently used to generate candidate poses frequently fail to produce any poses close to the correct, experimentally determined pose, unless information about the correct pose is provided. This limits the accuracy of learned scoring functions and molecular docking. Here, we describe two improved protocols for pose sampling: GLOW (auGmented sampLing with sOftened vdW potential) and a novel technique named IVES (IteratiVe Ensemble Sampling). Our benchmarking results demonstrate the effectiveness of our methods in improving the likelihood of sampling accurate poses, especially for binding pockets whose shape changes substantially when different ligands bind. This improvement is observed across both experimentally determined and AlphaFold-generated protein structures. Additionally, we present datasets of candidate ligand poses generated using our methods for each of around 5,000 protein-ligand cross-docking pairs, for training and testing scoring functions. To benefit the research community, we provide these cross-docking datasets and an open-source Python implementation of GLOW and IVES at https://github.com/drorlab/GLOW_IVES.

Benchmarking and Enhancing Disentanglement in Concept-Residual Models. (arXiv:2312.00192v1 [cs.LG])

Authors: Renos Zabounidis, Ini Oguntola, Konghao Zhao, Joseph Campbell, Simon Stepputtis, Katia Sycara

Concept bottleneck models (CBMs) are interpretable models that first predict a set of semantically meaningful features, i.e., concepts, from observations that are subsequently used to condition a downstream task. However, the model's performance strongly depends on the engineered features and can severely suffer from incomplete sets of concepts. Prior works have proposed a side channel -- a residual -- that allows for unconstrained information flow to the downstream task, thus improving model performance but simultaneously introducing information leakage, which is undesirable for interpretability. This work proposes three novel approaches to mitigate information leakage by disentangling concepts and residuals, investigating the critical balance between model performance and interpretability. Through extensive empirical analysis on the CUB, OAI, and CIFAR 100 datasets, we assess the performance of each disentanglement method and provide insights into when they work best. Further, we show how each method impacts the ability to intervene over the concepts and their subsequent impact on task performance.

Robust Concept Erasure via Kernelized Rate-Distortion Maximization. (arXiv:2312.00194v1 [cs.LG])

Authors: Somnath Basu Roy Chowdhury, Nicholas Monath, Avinava Dubey, Amr Ahmed, Snigdha Chaturvedi

Distributed representations provide a vector space that captures meaningful relationships between data instances. The distributed nature of these representations, however, entangles together multiple attributes or concepts of data instances (e.g., the topic or sentiment of a text, or characteristics of the author such as age and gender). Recent work has proposed the task of concept erasure, in which rather than making a concept predictable, the goal is to remove an attribute from distributed representations while retaining as much of the other information from the original representation space as possible. In this paper, we propose a new distance metric learning-based objective, the Kernelized Rate-Distortion Maximizer (KRaM), for performing concept erasure. KRaM fits a transformation of representations to match a specified distance measure (defined by a labeled concept to erase) using a modified rate-distortion function. Specifically, KRaM's objective function aims to make instances with similar concept labels dissimilar in the learned representation space while retaining other information. We find that optimizing KRaM effectively erases various types of concepts: categorical, continuous, and vector-valued variables from data representations across diverse domains. We also provide a theoretical analysis of several properties of KRaM's objective. To assess the quality of the learned representations, we propose an alignment score to evaluate their similarity with the original representation space. Additionally, we conduct experiments to showcase KRaM's efficacy in various settings, from erasing binary gender variables in word embeddings to vector-valued variables in GPT-3 representations.

Optimal Attack and Defense for Reinforcement Learning. (arXiv:2312.00198v1 [cs.LG])

Authors: Jeremy McMahan, Young Wu, Xiaojin Zhu, Qiaomin Xie

To ensure the usefulness of Reinforcement Learning (RL) in real systems, it is crucial that RL agents are robust to noise and adversarial attacks. In adversarial RL, an external attacker has the power to manipulate the victim agent's interaction with the environment. We study the full class of online manipulation attacks, which include (i) state attacks, (ii) observation attacks (a generalization of perceived-state attacks), (iii) action attacks, and (iv) reward attacks. We show that the attacker's problem of designing a stealthy attack that maximizes its own expected reward, which often corresponds to minimizing the victim's value, is captured by a Markov Decision Process (MDP) that we call a meta-MDP, since it is not the true environment but a higher-level environment induced by the attacked interaction. We show that the attacker can derive optimal attacks by planning in polynomial time or learning with polynomial sample complexity using standard RL techniques. We argue that the optimal defense policy for the victim can be computed as the solution to a stochastic Stackelberg game, which can be further simplified into a partially-observable turn-based stochastic game (POTBSG). Neither the attacker nor the victim would benefit from deviating from their respective optimal policies; thus, such solutions are truly robust. Although the defense problem is NP-hard, we show that optimal Markovian defenses can be computed (learned) in polynomial time (sample complexity) in many scenarios.

SparseGS: Real-Time 360{\deg} Sparse View Synthesis using Gaussian Splatting. (arXiv:2312.00206v1 [cs.CV])

Authors: Haolin Xiong, Sairisheek Muttukuru, Rishi Upadhyay, Pradyumna Chari, Achuta Kadambi

The problem of novel view synthesis has grown significantly in popularity recently with the introduction of Neural Radiance Fields (NeRFs) and other implicit scene representation methods. A recent advance, 3D Gaussian Splatting (3DGS), leverages an explicit representation to achieve real-time rendering with high-quality results. However, 3DGS still requires an abundance of training views to generate a coherent scene representation. In few-shot settings, similar to NeRF, 3DGS tends to overfit to training views, causing background collapse and excessive floaters, especially as the number of training views is reduced. We propose a method to enable training coherent 3DGS-based radiance fields of 360° scenes from sparse training views. We find that using naive depth priors is not sufficient and integrate depth priors with generative and explicit constraints to reduce background collapse, remove floaters, and enhance consistency from unseen viewpoints. Experiments show that our method outperforms base 3DGS by up to 30.5% and NeRF-based methods by up to 15.6% in LPIPS on the MipNeRF-360 dataset with substantially less training and inference cost.

EpiTESTER: Testing Autonomous Vehicles with Epigenetic Algorithm and Attention Mechanism. (arXiv:2312.00207v1 [cs.SE])

Authors: Chengjie Lu, Shaukat Ali, Tao Yue

Testing autonomous vehicles (AVs) under various environmental scenarios that lead the vehicles to unsafe situations is known to be challenging. Given the infinite possible environmental scenarios, it is essential to find critical scenarios efficiently. To this end, we propose a novel testing method, named EpiTESTER, that takes inspiration from epigenetics, which enables species to adapt to sudden environmental changes. In particular, EpiTESTER adopts gene silencing as its epigenetic mechanism, which regulates gene expression to prevent the expression of a certain gene, where the probability of gene expression is dynamically computed as the environment changes. Given different data modalities (e.g., images, lidar point clouds) in the context of AVs, EpiTESTER benefits from a multi-modal fusion transformer to extract high-level feature representations from environmental factors and then calculates gene-expression probabilities from these features with an attention mechanism. To assess the cost-effectiveness of EpiTESTER, we compare it with a classical genetic algorithm (GA) (i.e., without any epigenetic mechanism implemented) and a variant of EpiTESTER with equal probability for each gene. We evaluate EpiTESTER with four initial environments from CARLA, an open-source simulator for autonomous driving research, and an end-to-end AV controller, Interfuser. Our results show that EpiTESTER achieves promising performance in identifying critical scenarios compared to the baselines, indicating that applying epigenetic mechanisms is a good option for solving practical problems.

On the Interplay Between Stepsize Tuning and Progressive Sharpening. (arXiv:2312.00209v1 [cs.LG])

Authors: Vincent Roulet, Atish Agarwala, Fabian Pedregosa

Recent empirical work has revealed an intriguing property of deep learning models by which the sharpness (largest eigenvalue of the Hessian) increases throughout optimization until it stabilizes around a critical value at which the optimizer operates at the edge of stability, given a fixed stepsize (Cohen et al., 2022). We investigate empirically how the sharpness evolves when using stepsize tuners, the Armijo linesearch and Polyak stepsizes, that adapt the stepsize along the iterations to local quantities such as, implicitly, the sharpness itself. We find that the surprisingly poor performance of a classical Armijo linesearch may be well explained by its tendency to continually increase the sharpness of the objective in the full or large batch regimes. On the other hand, we observe that Polyak stepsizes generally operate at the edge of stability or even slightly beyond, while outperforming their Armijo and constant-stepsize counterparts. We conclude with an analysis suggesting that unlocking stepsize tuners requires an understanding of the joint dynamics of the stepsize and the sharpness.
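
For reference, minimal sketches of the two stepsize tuners studied (illustrative implementations, not the paper's code): the Polyak stepsize $\eta_t = (f(x_t) - f^*)/\|\nabla f(x_t)\|^2$, which assumes the optimal value $f^*$ is known, and Armijo backtracking, which shrinks the stepsize until a sufficient-decrease condition holds.

import numpy as np

def polyak_step(f_x, grad, f_star=0.0):
    # eta_t = (f(x_t) - f*) / ||grad f(x_t)||^2, with f* assumed known
    return (f_x - f_star) / (np.dot(grad, grad) + 1e-12)

def armijo_step(f, x, grad, eta0=1.0, c=1e-4, shrink=0.5):
    # Backtrack until f(x - eta*grad) <= f(x) - c*eta*||grad||^2
    eta = eta0
    while f(x - eta * grad) > f(x) - c * eta * np.dot(grad, grad):
        eta *= shrink
    return eta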

Uncertainty in Graph Contrastive Learning with Bayesian Neural Networks. (arXiv:2312.00232v1 [cs.LG])

Authors: Alexander Möllers, Alexander Immer, Elvin Isufi, Vincent Fortuin

Graph contrastive learning has shown great promise when labeled data is scarce but large unlabeled datasets are available. However, it often does not take uncertainty estimation into account. We show that a variational Bayesian neural network approach can be used to improve not only the uncertainty estimates but also the downstream performance on semi-supervised node-classification tasks. Moreover, we propose a new measure of uncertainty for contrastive learning that is based on the disagreement in likelihood due to different positive samples.

Deep Equilibrium Based Neural Operators for Steady-State PDEs. (arXiv:2312.00234v1 [cs.LG])

Authors: Tanya Marwah, Ashwini Pokle, J. Zico Kolter, Zachary C. Lipton, Jianfeng Lu, Andrej Risteski

Data-driven machine learning approaches are being increasingly used to solve partial differential equations (PDEs). They have shown particularly striking successes when training an operator, which takes as input a PDE in some family, and outputs its solution. However, the architectural design space, especially given structural knowledge of the PDE family of interest, is still poorly understood. We seek to remedy this gap by studying the benefits of weight-tied neural network architectures for steady-state PDEs. To achieve this, we first demonstrate that the solution of most steady-state PDEs can be expressed as a fixed point of a non-linear operator. Motivated by this observation, we propose FNO-DEQ, a deep equilibrium variant of the FNO architecture that directly solves for the solution of a steady-state PDE as the infinite-depth fixed point of an implicit operator layer, using a black-box root solver and differentiating analytically through this fixed point, resulting in $\mathcal{O}(1)$ training memory. Our experiments indicate that FNO-DEQ-based architectures outperform FNO-based baselines with $4\times$ the number of parameters in predicting the solution to steady-state PDEs such as Darcy Flow and steady-state incompressible Navier-Stokes. We also show that FNO-DEQ is more robust than the FNO-based baselines when trained on datasets with noisier observations, demonstrating the benefits of using appropriate inductive biases in architectural design for different neural-network-based PDE solvers. Further, we show a universal approximation result demonstrating that FNO-DEQ can approximate the solution to any steady-state PDE that can be written as a fixed-point equation.
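
A minimal sketch of the deep-equilibrium mechanism described above (assumptions on our part: a generic layer f stands in for the FNO block, and the common one-step gradient re-attachment replaces full implicit differentiation, which would instead solve a linear system in the backward pass):

import torch

def deq_forward(f, x, z0, n_iter=50, tol=1e-4):
    # Solve z* = f(z*, x) by fixed-point iteration without building a graph,
    # then re-attach one differentiable step at the solution.
    z = z0
    with torch.no_grad():
        for _ in range(n_iter):
            z_next = f(z, x)
            if (z_next - z).norm() < tol * z.norm().clamp(min=1e-8):
                z = z_next
                break
            z = z_next
    return f(z, x)   # one step with gradients enabled at the fixed point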

Negotiated Representations to Prevent Forgetting in Machine Learning Applications. (arXiv:2312.00237v1 [cs.LG])

Authors: Nuri Korhan, Ceren Öner

Catastrophic forgetting is a significant challenge in the field of machine learning, particularly in neural networks. When a neural network learns to perform well on a new task, it often forgets its previously acquired knowledge or experiences. This phenomenon occurs because the network adjusts its weights and connections to minimize the loss on the new task, which can inadvertently overwrite or disrupt the representations that were crucial for the previous tasks. As a result, the performance of the network on earlier tasks deteriorates, limiting its ability to learn and adapt to a sequence of tasks. In this paper, we propose a novel method for preventing catastrophic forgetting in machine learning applications, specifically focusing on neural networks. Our approach aims to preserve the knowledge of the network across multiple tasks while still allowing it to learn new information effectively. We demonstrate the effectiveness of our method by conducting experiments on various benchmark datasets, including Split MNIST, Split CIFAR10, Split Fashion MNIST, and Split CIFAR100. These datasets are created by dividing the original datasets into separate, non-overlapping tasks, simulating a continual learning scenario where the model needs to learn multiple tasks sequentially without forgetting the previous ones. Our proposed method tackles the catastrophic forgetting problem by incorporating negotiated representations into the learning process, which allows the model to maintain a balance between retaining past experiences and adapting to new tasks. By evaluating our method on these challenging datasets, we aim to showcase its potential for addressing catastrophic forgetting and improving the performance of neural networks in continual learning settings.
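
For concreteness, a sketch of how a Split MNIST-style benchmark is typically constructed (the standard recipe, not the authors' exact code): the 10 classes are partitioned into 5 disjoint 2-class tasks that the model sees sequentially.

import torch
from torchvision import datasets, transforms

mnist = datasets.MNIST(root="data", train=True, download=True,
                       transform=transforms.ToTensor())
labels = mnist.targets
tasks = []
for k in range(5):   # task k holds classes {2k, 2k+1}
    idx = ((labels == 2 * k) | (labels == 2 * k + 1)).nonzero(as_tuple=True)[0]
    tasks.append(torch.utils.data.Subset(mnist, idx))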

Self-similarity of Communities of the ABCD Model. (arXiv:2312.00238v1 [cs.SI])

Authors: Jordan Barrett, Bogumil Kaminski, Pawel Pralat, Francois Theberge

The Artificial Benchmark for Community Detection (ABCD) graph is a random graph model with community structure and power-law distributions for both degrees and community sizes. The model generates graphs similar to those produced by the well-known LFR model, but it is faster and can be investigated analytically.

In this paper, we show that the ABCD model exhibits some interesting self-similar behaviour, namely, the degree distribution of ground-truth communities is asymptotically the same as the degree distribution of the whole graph (appropriately normalized based on their sizes). As a result, we can not only estimate the number of edges induced by each community but also the number of self-loops and multi-edges generated during the process. Understanding these quantities is important as (a) rewiring self-loops and multi-edges to keep the graph simple is an expensive part of the algorithm, and (b) every rewiring causes the underlying configuration models to deviate slightly from uniform simple graphs on their corresponding degree sequences.

Curvature Explains Loss of Plasticity. (arXiv:2312.00246v1 [cs.LG])

Authors: Alex Lewandowski, Haruto Tanaka, Dale Schuurmans, Marlos C. Machado

Loss of plasticity is a phenomenon in which neural networks lose their ability to learn from new experience. Despite being empirically observed in several problem settings, little is understood about the mechanisms that lead to loss of plasticity. In this paper, we offer a consistent explanation for plasticity loss, based on an assertion that neural networks lose directions of curvature during training and that plasticity loss can be attributed to this reduction in curvature. To support such a claim, we provide a systematic empirical investigation of plasticity loss across several continual supervised learning problems. Our findings illustrate that curvature loss coincides with and sometimes precedes plasticity loss, while also showing that previous explanations are insufficient to explain loss of plasticity in all settings. Lastly, we show that regularizers which mitigate loss of plasticity also preserve curvature, motivating a simple distributional regularizer that proves to be effective across the problem settings considered.

PyNeRF: Pyramidal Neural Radiance Fields. (arXiv:2312.00252v1 [cs.CV])

Authors: Haithem Turki, Michael Zollhöfer, Christian Richardt, Deva Ramanan

Neural Radiance Fields (NeRFs) can be dramatically accelerated by spatial grid representations. However, they do not explicitly reason about scale and so introduce aliasing artifacts when reconstructing scenes captured at different camera distances. Mip-NeRF and its extensions propose scale-aware renderers that project volumetric frustums rather than point samples, but such approaches rely on positional encodings that are not readily compatible with grid methods. We propose a simple modification to grid-based models by training model heads at different spatial grid resolutions. At render time, we simply use coarser grids to render samples that cover larger volumes. Our method can be easily applied to existing accelerated NeRF methods and significantly improves rendering quality (reducing error rates by 20-90% across synthetic and unbounded real-world scenes) while incurring minimal performance overhead (as each model head is quick to evaluate). Compared to Mip-NeRF, we reduce error rates by 20% while training over 60x faster.
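
A hedged sketch of the render-time rule described above (the head count and cell sizes below are assumptions, not the paper's values): each sample is routed to the head whose training grid resolution matches the volume the sample covers, with larger footprints mapped to coarser heads.

import math

def select_head(sample_radius, base_cell=1.0 / 1024, n_heads=8):
    # level 0 = finest grid; each level doubles the cell size
    level = max(0, math.ceil(math.log2(sample_radius / base_cell)))
    return min(level, n_heads - 1)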

Precipitation Nowcasting With Spatial And Temporal Transfer Learning Using Swin-UNETR. (arXiv:2312.00258v1 [physics.ao-ph])

Authors: Ajitabh Kumar

Climate change has led to an increase in the frequency of extreme weather events. Early warning systems can prevent disasters and loss of life, yet managing such events remains a challenge for both public and private institutions. Precipitation nowcasting can help relevant institutions better prepare for such events. Numerical weather prediction (NWP) has traditionally been used to make physics-based forecasts, and recently deep-learning-based approaches have been used to reduce the turn-around time for nowcasting. In this work, the recently proposed Swin-UNETR (Swin UNEt TRansformer) is used for precipitation nowcasting in ten different regions of Europe. Swin-UNETR utilizes a U-shaped network within which a Swin-transformer-based encoder extracts multi-scale features from multiple input channels of satellite images, while a CNN-based decoder makes the prediction. The trained model is capable of nowcasting not only in the regions for which data is available, but also in new regions for which data is not available.

Sample Efficient Reinforcement Learning from Human Feedback via Active Exploration. (arXiv:2312.00267v1 [cs.LG])

Authors: Viraj Mehta, Vikramjeet Das, Ojash Neopane, Yijia Dai, Ilija Bogunovic, Jeff Schneider, Willie Neiswanger

Preference-based feedback is important for many applications in reinforcement learning where direct evaluation of a reward function is not feasible. A notable recent example arises in reinforcement learning from human feedback (RLHF) on large language models. For many applications of RLHF, the cost of acquiring the human feedback can be substantial. In this work, we take advantage of the fact that one can often choose contexts at which to obtain human feedback in order to most efficiently identify a good policy, and formalize this as an offline contextual dueling bandit problem. We give an upper-confidence-bound style algorithm for this problem and prove a polynomial worst-case regret bound. We then provide empirical confirmation in a synthetic setting that our approach outperforms existing methods. Finally, we extend the setting and methodology for practical use in RLHF training of large language models. Here, our method is able to reach better performance with fewer samples of human preferences than multiple baselines on three real-world datasets.

Academic competitions. (arXiv:2312.00268v1 [cs.LG])

Authors: Hugo Jair Escalante, Aleksandra Kruchinina

Academic challenges are an effective means for (i) advancing the state of the art, (ii) putting specific topics and problems in the spotlight of a scientific community, and (iii) closing the gap for underrepresented communities in terms of access to and participation in the shaping of research fields. Competitions can be traced back for centuries and their achievements have had a great influence on our modern world. Recently, they have (re)gained popularity, driven by the overwhelming amounts of data being generated in different domains, the need to push the limits of existing methods, and the tools available to handle such data. This chapter provides a survey of academic challenges in the context of machine learning and related fields. We review the most influential competitions in the last few years and analyze challenges per area of knowledge. The aims of scientific challenges, their goals, major achievements, and expectations for the next few years are reviewed.

Towards Clinical Prediction with Transparency: An Explainable AI Approach to Survival Modelling in Residential Aged Care. (arXiv:2312.00271v1 [cs.LG])

Authors: Teo Susnjak, Elise Griffin, Mitchell McCutcheon, Kathleen Potter

Background: Accurate survival time estimates aid end-of-life medical decision-making. Objectives: Develop an interpretable survival model for elderly residential aged care residents using advanced machine learning. Setting: A major Australasian residential aged care provider. Participants: Residents aged 65+ admitted for long-term care from July 2017 to August 2023. Sample size: 11,944 residents across 40 facilities. Predictors: Factors include age, gender, health status, co-morbidities, cognitive function, mood, nutrition, mobility, smoking, sleep, skin integrity, and continence. Outcome: Probability of survival post-admission, specifically calibrated for 6-month survival estimates. Statistical Analysis: Tested CoxPH, EN, RR, Lasso, GB, XGB, and RF models in 20 experiments with a 90/10 train/test split. Evaluated accuracy using C-index, Harrell's C-index, dynamic AUROC, IBS, and calibrated ROC. Chose XGB for its performance and calibrated it for 1, 3, 6, and 12-month predictions using Platt scaling. Employed SHAP values to analyze predictor impacts. Results: GB, XGB, and RF models showed the highest C-Index values (0.714, 0.712, 0.712). The optimal XGB model demonstrated a 6-month survival prediction AUROC of 0.746 (95% CI 0.744-0.749). Key mortality predictors include age, male gender, mobility, health status, pressure ulcer risk, and appetite. Conclusions: The study successfully applies machine learning to create a survival model for aged care, aligning with clinical insights on mortality risk factors and enhancing model interpretability and clinical utility through explainable AI.

Automating Continual Learning. (arXiv:2312.00276v1 [cs.LG])

Authors: Kazuki Irie, Róbert Csordás, Jürgen Schmidhuber

General-purpose learning systems should improve themselves in an open-ended fashion in ever-changing environments. Conventional learning algorithms for neural networks, however, suffer from catastrophic forgetting (CF) -- previously acquired skills are forgotten when a new task is learned. Instead of hand-crafting new algorithms for avoiding CF, we propose Automated Continual Learning (ACL) to train self-referential neural networks to meta-learn their own in-context continual (meta-)learning algorithms. ACL encodes all desiderata -- good performance on both old and new tasks -- into its meta-learning objectives. Our experiments demonstrate that ACL effectively solves "in-context catastrophic forgetting"; our ACL-learned algorithms outperform hand-crafted ones, e.g., on the Split-MNIST benchmark in the replay-free setting, and enable continual learning of diverse tasks consisting of multiple few-shot and standard image classification datasets.

Text Attribute Control via Closed-Loop Disentanglement. (arXiv:2312.00277v1 [cs.LG])

Authors: Lei Sha, Thomas Lukasiewicz

Changing an attribute of a text without changing its content usually requires first disentangling the text into mutually independent attribute and content representations. After that, in the inference phase, the representation of one attribute is tuned to a different value, with the expectation that the corresponding attribute of the text changes accordingly. The usual way to achieve disentanglement is to add constraints on the latent space of an encoder-decoder architecture, including adversarial-based constraints and mutual-information-based constraints. However, previous semi-supervised approaches to attribute change are usually not sufficient to guarantee both successful attribute change and content preservation. In this paper, we propose a novel approach to achieve robust control of attributes while enhancing content preservation. In this approach, we use a semi-supervised contrastive learning method to encourage the disentanglement of attributes in latent spaces. Unlike previous works, we re-disentangle the reconstructed sentence and compare the re-disentangled latent space with the original latent space, which forms a closed-loop disentanglement process. This also helps with content preservation. In addition, the contrastive learning method is able to replace the roles of mutual-information minimization and adversarial training in the disentanglement process, which reduces the computational cost. We conducted experiments on three text datasets, including the Yelp Service review dataset, the Amazon Product review dataset, and the GoEmotions dataset. The experimental results show the effectiveness of our model.

Age-Based Scheduling for Mobile Edge Computing: A Deep Reinforcement Learning Approach. (arXiv:2312.00279v1 [cs.LG])

Authors: Xingqiu He, Chaoqun You, Tony Q. S. Quek

With the rapid development of Mobile Edge Computing (MEC), various real-time applications have been deployed to benefit people's daily lives. The performance of these applications relies heavily on the freshness of collected environmental information, which can be quantified by its Age of Information (AoI). In the traditional definition of AoI, it is assumed that the status information can be actively sampled and directly used. However, for many MEC-enabled applications, the desired status information is updated in an event-driven manner and necessitates data processing. To better serve these applications, we propose a new definition of AoI and, based on the redefined AoI, we formulate an online AoI minimization problem for MEC systems. Notably, the problem can be interpreted as a Markov Decision Process (MDP), thus enabling its solution through Reinforcement Learning (RL) algorithms. Nevertheless, traditional RL algorithms are designed for MDPs with completely unknown system dynamics and hence usually suffer from long convergence times. To accelerate the learning process, we introduce Post-Decision States (PDSs) to exploit partial knowledge of the system's dynamics. We also combine PDSs with deep RL to further improve the algorithm's applicability, scalability, and robustness. Numerical results demonstrate that our algorithm outperforms the benchmarks under various scenarios.

Learning to forecast diagnostic parameters using pre-trained weather embedding. (arXiv:2312.00290v1 [cs.LG])

Authors: Peetak P. Mitra, Vivek Ramavajjala

Data-driven weather prediction (DDWP) models are increasingly becoming popular for weather forecasting. However, while operational weather forecasts predict a wide variety of weather variables, DDWPs currently forecast a specific set of key prognostic variables. Non-prognostic ("diagnostic") variables are sometimes modeled separately as dependent variables of the prognostic variables (cf. FourCastNet), or by including the diagnostic variable as a target in the DDWP. However, the cost of training and deploying bespoke models for each diagnostic variable can increase dramatically with more diagnostic variables, limiting the operational use of such models. Likewise, retraining an entire DDWP each time a new diagnostic variable is added is also cost-prohibitive. We present a two-stage approach that allows new diagnostic variables to be added to an end-to-end DDWP model without expensive retraining. In the first stage, we train an autoencoder that learns to embed prognostic variables into a latent space. In the second stage, the autoencoder is frozen and "downstream" models are trained to predict diagnostic variables using only the latent representations of prognostic variables as input. Our experiments indicate that models trained using the two-stage approach offer accuracy comparable to training bespoke models, while leading to a significant reduction in resource utilization during training and inference. This approach allows new "downstream" models to be developed as needed, without affecting existing models, thus reducing the friction in operationalizing new models.
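
A minimal sketch of the two-stage recipe (shapes, sizes, and module names are illustrative assumptions, not the paper's architecture):

import torch
import torch.nn as nn

# Stage 1: train an autoencoder to embed prognostic fields (here 4
# channels at 64x64) into a latent space; loss = reconstruction MSE.
encoder = nn.Sequential(nn.Flatten(), nn.Linear(4 * 64 * 64, 512), nn.GELU())
decoder = nn.Linear(512, 4 * 64 * 64)

# Stage 2: freeze the embedding and fit a cheap head per new diagnostic
# variable; only the head is trained, so adding variables is inexpensive.
for p in encoder.parameters():
    p.requires_grad = False
diag_head = nn.Linear(512, 64 * 64)   # one new diagnostic field

x = torch.randn(8, 4, 64, 64)         # dummy batch of prognostic fields
y_hat = diag_head(encoder(x))         # predicted diagnostic field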

Towards Aligned Canonical Correlation Analysis: Preliminary Formulation and Proof-of-Concept Results. (arXiv:2312.00296v1 [cs.LG])

Authors: Biqian Cheng, Evangelos E. Papalexakis, Jia Chen

Canonical Correlation Analysis (CCA) has been widely applied to jointly embed multiple views of data in a maximally correlated latent space. However, the alignment between various data perspectives, which is required by traditional approaches, is unclear in many practical cases. In this work we propose a new framework Aligned Canonical Correlation Analysis (ACCA), to address this challenge by iteratively solving the alignment and multi-view embedding.

Developmental Pretraining (DPT) for Image Classification Networks. (arXiv:2312.00304v1 [cs.LG])

Authors: Niranjan Rajesh, Debayan Gupta

Against the backdrop of the increasing data requirements of Deep Neural Networks for object recognition, which grow more untenable by the day, we present Developmental PreTraining (DPT) as a possible solution. DPT is a curriculum-based pre-training approach designed to rival traditional pre-training techniques that are data-hungry. These traditional approaches also introduce unnecessary features that can be misleading when the network is employed in a downstream classification task where the data is sufficiently different from the pre-training data and is scarce. We design the curriculum for DPT by drawing inspiration from human infant visual development. DPT employs a phased approach in which carefully selected primitive and universal features, such as edges and shapes, are taught to the network participating in our pre-training regime. A model that underwent the DPT regime is tested against models with randomised weights to evaluate the viability of DPT.

Multiple Testing of Linear Forms for Noisy Matrix Completion. (arXiv:2312.00305v1 [stat.ME])

Authors: Wanteng Ma, Lilun Du, Dong Xia, Ming Yuan

Many important tasks of large-scale recommender systems can be naturally cast as testing multiple linear forms for noisy matrix completion. These problems, however, present unique challenges because of the subtle bias-variance tradeoff and an intricate dependence among the estimated entries induced by the low-rank structure. In this paper, we develop a general approach to overcome these difficulties by introducing new statistics for individual tests with sharp asymptotics both marginally and jointly, and utilizing them to control the false discovery rate (FDR) via a data splitting and symmetric aggregation scheme. We show that valid FDR control can be achieved with guaranteed power under nearly optimal sample size requirements using the proposed methodology. Extensive numerical simulations and real data examples are also presented to further illustrate its practical merits.

Improving Normalization with the James-Stein Estimator. (arXiv:2312.00313v1 [cs.CV])

Authors: Seyedalireza Khoshsirat, Chandra Kambhamettu

Stein's paradox holds considerable sway in high-dimensional statistics, highlighting that the sample mean, traditionally considered the de facto estimator, might not be the most efficacious in higher dimensions. To address this, the James-Stein estimator proposes an enhancement by steering the sample means toward a more centralized mean vector. In this paper, first, we establish that normalization layers in deep learning use inadmissible estimators for mean and variance. Next, we introduce a novel method to employ the James-Stein estimator to improve the estimation of mean and variance within normalization layers. We evaluate our method on different computer vision tasks: image classification, semantic segmentation, and 3D object classification. Through these evaluations, it is evident that our improved normalization layers consistently yield superior accuracy across all tasks without extra computational burden. Moreover, recognizing that a plethora of shrinkage estimators surpass the traditional estimator in performance, we study two other prominent shrinkage estimators: Ridge and LASSO. Additionally, we provide visual representations to intuitively demonstrate the impact of shrinkage on the estimated layer statistics. Finally, we study the effect of regularization and batch size on our modified batch normalization. The studies show that our method is less sensitive to batch size and regularization, improving accuracy under various setups.
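
To make the mechanism concrete, here is a sketch of positive-part James-Stein shrinkage applied to the per-channel batch means that a normalization layer would otherwise use directly (a plausible rendering on our part, not the paper's exact formulation):

import torch

def js_shrunk_mean(x, eps=1e-5):
    # x: (N, C) activations; returns a shrunk estimate of the C channel means.
    mu = x.mean(dim=0)                          # naive per-channel sample means
    nu = mu.mean()                              # grand mean to shrink toward
    d = mu.numel()
    sigma2 = x.var(dim=0).mean() / x.shape[0]   # estimated variance of the mean
    shrink = 1.0 - (d - 2) * sigma2 / ((mu - nu).pow(2).sum() + eps)
    return nu + shrink.clamp(min=0.0) * (mu - nu)   # positive-part James-Stein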

Hypergraph Node Representation Learning with One-Stage Message Passing. (arXiv:2312.00336v1 [cs.LG])

Authors: Shilin Qu, Weiqing Wang, Yuan-Fang Li, Xin Zhou, Fajie Yuan

Hypergraphs as an expressive and general structure have attracted considerable attention from various research domains. Most existing hypergraph node representation learning techniques are based on graph neural networks, and thus adopt the two-stage message passing paradigm (i.e. node -> hyperedge -> node). This paradigm only focuses on local information propagation and does not effectively take into account global information, resulting in suboptimal representations. Our theoretical analysis of representative two-stage message passing methods shows that, mathematically, they model different ways of local message passing through hyperedges, and can be unified into one-stage message passing (i.e. node -> node). However, they still only model local information. Motivated by this theoretical analysis, we propose a novel one-stage message passing paradigm to model both global and local information propagation for hypergraphs. We integrate this paradigm into HGraphormer, a Transformer-based framework for hypergraph node representation learning. HGraphormer injects the hypergraph structure information (local information) into Transformers (global information) by combining the attention matrix and the hypergraph Laplacian. Extensive experiments demonstrate that HGraphormer outperforms recent hypergraph learning methods on five representative benchmark datasets on the semi-supervised hypernode classification task, setting new state-of-the-art performance, with accuracy improvements between 2.52% and 6.70%. Our code and datasets are available.
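
A hedged sketch of the combination described above (the exact scheme is specified in the paper; the additive blend and the parameter alpha below are assumptions): a Transformer attention matrix carries global information, while a normalized matrix derived from the hypergraph Laplacian injects local structure.

import torch

def hgraph_attention(Q, K, V, S_hg, alpha=0.5):
    # Q, K, V: (n, d) node projections; S_hg: (n, n) normalized structure
    # matrix derived from the hypergraph Laplacian (e.g., via the
    # incidence matrix H). The additive blend is illustrative only.
    attn = torch.softmax(Q @ K.T / Q.shape[-1] ** 0.5, dim=-1)  # global info
    mixed = alpha * attn + (1 - alpha) * S_hg                   # inject local structure
    return mixed @ V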

Efficient Off-Policy Safe Reinforcement Learning Using Trust Region Conditional Value at Risk. (arXiv:2312.00342v1 [cs.LG])

Authors: Dohyeong Kim, Songhwai Oh

This paper aims to solve a safe reinforcement learning (RL) problem with risk measure-based constraints. As risk measures, such as conditional value at risk (CVaR), focus on the tail of the cost distribution, constraining risk measures can effectively prevent failures in the worst case. An on-policy safe RL method, called TRC, deals with a CVaR-constrained RL problem using a trust region method and can generate policies with almost zero constraint violations and high returns. However, to achieve outstanding performance in complex environments and satisfy safety constraints quickly, RL methods need to be sample efficient. To this end, we propose an off-policy safe RL method with CVaR constraints, called off-policy TRC. If off-policy data from replay buffers is used directly to train TRC, the estimation error caused by the distributional shift results in performance degradation. To resolve this issue, we propose novel surrogate functions in which the effect of the distributional shift can be reduced, and introduce an adaptive trust-region constraint that keeps the policy from deviating far from the replay buffer distribution. The proposed method has been evaluated in simulation and real-world environments and satisfied safety constraints within a few steps while achieving high returns even in complex robotic tasks.

TRC: Trust Region Conditional Value at Risk for Safe Reinforcement Learning. (arXiv:2312.00344v1 [cs.RO])

Authors: Dohyeong Kim, Songhwai Oh

As safety is of paramount importance in robotics, reinforcement learning that reflects safety, called safe RL, has been studied extensively. In safe RL, we aim to find a policy which maximizes the desired return while satisfying the defined safety constraints. There are various types of constraints, among which constraints on conditional value at risk (CVaR) effectively lower the probability of failures caused by high costs since CVaR is a conditional expectation obtained above a certain percentile. In this paper, we propose a trust region-based safe RL method with CVaR constraints, called TRC. We first derive the upper bound on CVaR and then approximate the upper bound in a differentiable form in a trust region. Using this approximation, a subproblem to get policy gradients is formulated, and policies are trained by iteratively solving the subproblem. TRC is evaluated through safe navigation tasks in simulations with various robots and a sim-to-real environment with a Jackal robot from Clearpath. Compared to other safe RL methods, the performance is improved by 1.93 times while the constraints are satisfied in all experiments.
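
As a quick illustration of the risk measure underlying TRC: $\mathrm{CVaR}_\alpha$ of a cost distribution is the conditional expectation of costs above the $(1-\alpha)$ quantile, i.e. the mean of the worst $\alpha$ fraction of outcomes. A minimal empirical version:

import numpy as np

def cvar(costs, alpha=0.1):
    var = np.quantile(costs, 1.0 - alpha)   # value at risk (the percentile)
    return costs[costs >= var].mean()       # expected cost in the tail

costs = np.random.exponential(size=10_000)
print(cvar(costs, alpha=0.05))              # mean of the worst 5% of costs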

Quantum Kernel t-Distributed Stochastic Neighbor Embedding. (arXiv:2312.00352v1 [quant-ph])

Authors: Yoshiaki Kawase, Kosuke Mitarai, Keisuke Fujii

Data visualization is important in understanding the characteristics of data that are difficult to see directly. It is used to visualize loss landscapes and optimization trajectories to analyze optimization performance. A popular optimization analysis visualizes the loss landscape around the reached local or global minimum using principal component analysis. However, this visualization depends on the variational parameters of a quantum circuit rather than on quantum states, which makes it difficult to understand the mechanism of the optimization process through the properties of quantum states. Here, we propose a quantum data visualization method using quantum kernels, which enables us to offer fast and highly accurate visualization of quantum states. In our numerical experiments, we visualize a hand-written digits dataset and apply the $k$-nearest neighbor algorithm to the low-dimensional data to quantitatively evaluate our proposed method against a classical kernel method. As a result, our proposed method achieves comparable accuracy to the state-of-the-art classical kernel method, meaning that the proposed visualization method based on quantum machine learning does not degrade the separability of the input higher-dimensional data. Furthermore, we visualize the optimization trajectories of finding the ground states of the transverse-field Ising model and successfully find the trajectory characteristics. Since quantum states are high-dimensional objects that can only be seen via observables, our visualization method, which inherits the similarity of quantum data, would be useful in understanding the behavior of quantum circuits and algorithms.

Transfer learning for predicting source terms of principal component transport in chemically reactive flow. (arXiv:2312.00356v1 [physics.chem-ph])

Authors: Ki Sung Jung, Tarek Echekki, Jacqueline H. Chen, Mohammad Khalil

The objective of this study is to evaluate whether the number of requisite training samples can be reduced with the use of various transfer learning models for predicting, for example, the chemical source terms of the data-driven reduced-order model that represents the homogeneous ignition process of a hydrogen/air mixture. Principal component analysis is applied to reduce the dimensionality of the hydrogen/air mixture in composition space. Artificial neural networks (ANNs) are used to tabulate the reaction rates of principal components, and subsequently, a system of ordinary differential equations is solved. As the number of training samples decreases at the target task (i.e., for $T_0 > 1000$ K and various $\phi$), the reduced-order model fails to predict the ignition evolution of a hydrogen/air mixture. Three transfer learning strategies are then applied to the training of the ANN model with a sparse dataset. The performance of the reduced-order model with a sparse dataset is found to be remarkably enhanced if the training of the ANN model is restricted by a regularization term that controls the degree of knowledge transfer from source to target tasks. To this end, a novel transfer learning method is introduced, parameter control via partial initialization and regularization (PaPIR), whereby the amount of knowledge transferred is systematically adjusted in the initialization and regularization of the ANN model in the target task. It is found that an additional performance gain can be achieved by changing the initialization scheme of the ANN model in the target task when the task similarity between source and target tasks is relatively low.

A Generalizable Deep Learning System for Cardiac MRI. (arXiv:2312.00357v1 [eess.IV])

Authors: Rohan Shad, Cyril Zakka, Dhamanpreet Kaur, Robyn Fong, Ross Warren Filice, John Mongan, Kimberly Kalianos, Nishith Khandwala, David Eng, Matthew Leipzig, Walter Witschey, Alejandro de Feria, Victor Ferrari, Euan Ashley, Michael A. Acker, Curtis Langlotz, William Hiesinger

Cardiac MRI allows for a comprehensive assessment of myocardial structure, function, and tissue characteristics. Here we describe a foundational vision system for cardiac MRI, capable of representing the breadth of human cardiovascular disease and health. Our deep learning model is trained via self-supervised contrastive learning, by which visual concepts in cine-sequence cardiac MRI scans are learned from the raw text of the accompanying radiology reports. We train and evaluate our model on data from four large academic clinical institutions in the United States. We additionally showcase the performance of our models on the UK BioBank, and two additional publicly available external datasets. We explore emergent zero-shot capabilities of our system, and demonstrate remarkable performance across a range of tasks; including the problem of left ventricular ejection fraction regression, and the diagnosis of 35 different conditions such as cardiac amyloidosis and hypertrophic cardiomyopathy. We show that our deep learning system is capable of not only understanding the staggering complexity of human cardiovascular disease, but can be directed towards clinical problems of interest yielding impressive, clinical grade diagnostic accuracy with a fraction of the training data typically required for such tasks.

Impact of Data Augmentation on QCNNs. (arXiv:2312.00358v1 [quant-ph])

Authors: Leting Zhouli, Peiyong Wang, Udaya Parampalli

In recent years, Classical Convolutional Neural Networks (CNNs) have been applied successfully to image recognition. Quantum Convolutional Neural Networks (QCNNs) have been proposed as a novel generalization of CNNs using quantum mechanisms. The quantum mechanisms lead to an efficient training process in QCNNs by reducing the size of the input from $N$ to $\log_2 N$. This paper implements and compares both CNNs and QCNNs by testing losses and prediction accuracy on three commonly used datasets. The datasets include the MNIST hand-written digits, Fashion MNIST, and cat/dog face images. Additionally, data augmentation (DA), a technique commonly used in CNNs to improve classification performance by generating similar images based on the original inputs, is also implemented in QCNNs. Surprisingly, the results show that data augmentation did not improve QCNN performance. We discuss the reasons and logic behind this result, hoping to expand our understanding of quantum machine learning theory.

Temperature Balancing, Layer-wise Weight Analysis, and Neural Network Training. (arXiv:2312.00359v1 [cs.LG])

Authors: Yefan Zhou, Tianyu Pang, Keqin Liu, Charles H. Martin, Michael W. Mahoney, Yaoqing Yang

Regularization in modern machine learning is crucial, and it can take various forms in algorithmic design: training set, model family, error function, regularization terms, and optimizations. In particular, the learning rate, which can be interpreted as a temperature-like parameter within the statistical mechanics of learning, plays a crucial role in neural network training. Indeed, many widely adopted training strategies basically just define the decay of the learning rate over time. This process can be interpreted as decreasing a temperature, using either a global learning rate (for the entire model) or a learning rate that varies for each parameter. This paper proposes TempBalance, a straightforward yet effective layer-wise learning rate method. TempBalance is based on Heavy-Tailed Self-Regularization (HT-SR) Theory, an approach which characterizes the implicit self-regularization of different layers in trained models. We demonstrate the efficacy of using HT-SR-motivated metrics to guide the scheduling and balancing of temperature across all network layers during model training, resulting in improved performance during testing. We implement TempBalance on CIFAR10, CIFAR100, SVHN, and TinyImageNet datasets using ResNets, VGGs, and WideResNets with various depths and widths. Our results show that TempBalance significantly outperforms ordinary SGD and carefully-tuned spectral norm regularization. We also show that TempBalance outperforms a number of state-of-the-art optimizers and learning rate schedulers.
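
A hedged sketch of the idea behind TempBalance (the Hill estimator and the linear rescaling below are illustrative choices on our part; the paper specifies its own HT-SR metric and mapping): estimate a heavy-tail exponent for each layer's weight spectrum and rebalance the per-layer learning rates accordingly, assuming 2-D weight matrices for simplicity.

import torch

def hill_alpha(W, k=10):
    # Crude heavy-tail exponent of the eigenvalue spectrum of W^T W,
    # via the Hill estimator on the top-k eigenvalues.
    lam = (torch.linalg.svdvals(W) ** 2).sort(descending=True).values[:k]
    return 1.0 + k / torch.log(lam / lam[-1]).sum().clamp(min=1e-8)

def layerwise_lrs(layers, base_lr=0.1):
    # Illustrative mapping: larger alpha -> larger LR, balancing the
    # effective "temperature" across layers.
    alphas = torch.tensor([float(hill_alpha(l.weight.detach())) for l in layers])
    return [base_lr * float(s) for s in alphas / alphas.mean()]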

Dancing with Images: Video Distillation via Static-Dynamic Disentanglement. (arXiv:2312.00362v1 [cs.CV])

Authors: Ziyu Wang, Yue Xu, Cewu Lu, Yong-Lu Li

Recently, dataset distillation has paved the way towards efficient machine learning, especially for image datasets. However, the distillation for videos, characterized by an exclusive temporal dimension, remains an underexplored domain. In this work, we provide the first systematic study of video distillation and introduce a taxonomy to categorize temporal compression. Our investigation reveals that the temporal information is usually not well learned during distillation, and the temporal dimension of synthetic data contributes little. These observations motivate our unified framework for disentangling the dynamic and static information in videos. It first distills the videos into still images as static memory and then compensates for the dynamic and motion information with a learnable dynamic memory block. Our method achieves state-of-the-art results on video datasets at different scales, with notably smaller storage expenditure. Our code will be publicly available.

Benchmarking Multi-Domain Active Learning on Image Classification. (arXiv:2312.00364v1 [cs.LG])

Authors: Jiayi Li, Rohan Taori, Tatsunori B. Hashimoto

Active learning aims to enhance model performance by strategically labeling informative data points. While extensively studied, its effectiveness on large-scale, real-world datasets remains underexplored. Existing research primarily focuses on single-source data, ignoring the multi-domain nature of real-world data. We introduce a multi-domain active learning benchmark to bridge this gap. Our benchmark demonstrates that traditional single-domain active learning strategies are often less effective than random selection in multi-domain scenarios. We also introduce CLIP-GeoYFCC, a novel large-scale image dataset built around geographical domains, in contrast to existing genre-based domain datasets. Analysis on our benchmark shows that all multi-domain strategies exhibit significant tradeoffs, with no strategy outperforming across all datasets or all metrics, emphasizing the need for future research.

Streaming Bayesian Modeling for predicting Fat-Tailed Customer Lifetime Value. (arXiv:2312.00373v1 [cs.LG])

Authors: Alexey V. Calabourdin, Konstantin A. Aksenov

We develop an online learning MCMC approach applicable to hierarchical Bayesian models and GLMs. We also develop a fat-tailed LTV model that generalizes over several kinds of fat and thin tails. We demonstrate both developments on commercial LTV data from a large mobile app.

Optimal Sample Complexity of Contrastive Learning. (arXiv:2312.00379v1 [cs.LG])

Authors: Noga Alon, Dmitrii Avdiukhin, Dor Elboim, Orr Fischer, Grigory Yaroslavtsev

Contrastive learning is a highly successful technique for learning representations of data from labeled tuples, specifying the distance relations within the tuple. We study the sample complexity of contrastive learning, i.e. the minimum number of labeled tuples sufficient to achieve high generalization accuracy. We give tight bounds on the sample complexity in a variety of settings, focusing on arbitrary distance functions, general $\ell_p$-distances, and tree metrics. Our main result is an (almost) optimal bound on the sample complexity of learning $\ell_p$-distances for integer $p$. For any $p \ge 1$ we show that $\tilde \Theta(\min(nd,n^2))$ labeled tuples are necessary and sufficient for learning $d$-dimensional representations of $n$-point datasets. Our results hold for an arbitrary distribution of the input samples and are based on giving the corresponding bounds on the Vapnik-Chervonenkis/Natarajan dimension of the associated problems. We further show that the theoretical bounds on sample complexity obtained via VC/Natarajan dimension can have strong predictive power for experimental results, in contrast with the folklore belief about a substantial gap between statistical learning theory and the practice of deep learning.

Local monotone operator learning using non-monotone operators: MnM-MOL. (arXiv:2312.00386v1 [eess.IV])

Authors: Maneesh John, Jyothi Rikhab Chand, Mathews Jacob

The recovery of magnetic resonance (MR) images from undersampled measurements is a key problem that has seen extensive research in recent years. Unrolled approaches, which rely on end-to-end training of convolutional neural network (CNN) blocks within iterative reconstruction algorithms, offer state-of-the-art performance. These algorithms require a large amount of memory during training, making them difficult to employ in high-dimensional applications. Deep equilibrium (DEQ) models and the recent monotone operator learning (MOL) approach were introduced to eliminate the need for unrolling, thus reducing the memory demand during training. Both approaches require a Lipschitz constraint on the network to ensure that the forward and backpropagation iterations converge. Unfortunately, the constraint often results in reduced performance compared to unrolled methods. The main focus of this work is to relax the constraint on the CNN block in two different ways. Inspired by convex-non-convex regularization strategies, we now impose the monotone constraint on the sum of the gradient of the data term and the CNN block, rather than constrain the CNN itself to be a monotone operator. This approach enables the CNN to learn possibly non-monotone score functions, which can translate to improved performance. In addition, we only restrict the operator to be monotone in a local neighborhood around the image manifold. Our theoretical results show that the proposed algorithm is guaranteed to converge to the fixed point and that the solution is robust to input perturbations, provided that it is initialized close to the true solution. Our empirical results show that the relaxed constraints translate to improved performance and that the approach enjoys robustness to input perturbations similar to MOL.

LinguaLinked: A Distributed Large Language Model Inference System for Mobile Devices. (arXiv:2312.00388v1 [cs.LG])

Authors: Junchen Zhao, Yurun Song, Simeng Liu, Ian G. Harris, Sangeetha Abdu Jyothi

Deploying Large Language Models (LLMs) locally on mobile devices presents a significant challenge due to their extensive memory requirements. In this paper, we introduce LinguaLinked, a system for decentralized, distributed LLM inference on mobile devices. LinguaLinked enables collaborative execution of the inference task across multiple trusted devices and ensures data privacy by processing information locally. LinguaLinked uses three key strategies. First, an optimized model assignment technique segments LLMs and uses linear optimization to align segments with each device's capabilities. Second, an optimized data transmission mechanism ensures efficient and structured data flow between model segments while also maintaining the integrity of the original model structure. Finally, LinguaLinked incorporates a runtime load balancer that actively monitors and redistributes tasks among mobile devices to prevent bottlenecks, enhancing the system's overall efficiency and responsiveness. We demonstrate that LinguaLinked facilitates efficient LLM inference while maintaining consistent throughput and minimal latency through extensive testing across various mobile devices, from high-end to low-end Android devices. In our evaluations, compared to the baseline, LinguaLinked achieves an inference performance acceleration of $1.11\times$ to $1.61\times$ in single-threaded settings, and $1.73\times$ to $2.65\times$ with multi-threading. Additionally, runtime load balancing yields an overall inference acceleration of $1.29\times$ to $1.32\times$.

GFN-SR: Symbolic Regression with Generative Flow Networks. (arXiv:2312.00396v1 [cs.LG])

Authors: Sida Li, Ioana Marinescu, Sebastian Musslick

Symbolic regression (SR) is an area of interpretable machine learning that aims to identify mathematical expressions, often composed of simple functions, that best fit a given set of covariates $X$ and response $y$. In recent years, deep symbolic regression (DSR) has emerged as a popular method in the field by leveraging deep reinforcement learning to solve the complicated combinatorial search problem. In this work, we propose an alternative framework (GFN-SR) to approach SR with deep learning. We model the construction of an expression tree as traversing through a directed acyclic graph (DAG), so that a GFlowNet can learn a stochastic policy to generate such trees sequentially. Enhanced with an adaptive reward baseline, our method is capable of generating a diverse set of best-fitting expressions. Notably, we observe that GFN-SR outperforms other SR algorithms in noisy data regimes, owing to its ability to learn a distribution of rewards over a space of candidate solutions.
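
As a rough illustration of the tree-construction view (not the authors' code: the vocabulary, the uniform policy, and the length budget below are invented for the sketch, and the GFlowNet training objective is omitted), a prefix-notation expression tree can be generated sequentially by tracking how many operand slots remain open:

```python
# Sketch: sampling a prefix expression token by token with a stochastic policy,
# the kind of sequential generator GFN-SR trains with a GFlowNet objective.
import numpy as np

TOKENS = [("+", 2), ("*", 2), ("sin", 1), ("x", 0), ("c", 0)]  # (symbol, arity)

def sample_expression(rng, logits, max_len=15):
    """Sample a complete prefix expression; `logits` is a dummy stand-in for a
    learned policy that would condition on the partial tree (the DAG state)."""
    tokens, open_slots = [], 1                 # one operand slot for the root
    while open_slots > 0:
        probs = np.exp(logits) / np.exp(logits).sum()
        if len(tokens) + open_slots >= max_len:  # near budget: force terminals
            mask = np.array([arity == 0 for _, arity in TOKENS], dtype=float)
            probs = probs * mask / (probs * mask).sum()
        i = rng.choice(len(TOKENS), p=probs)
        symbol, arity = TOKENS[i]
        tokens.append(symbol)
        open_slots += arity - 1                # consume one slot, open `arity` new ones
    return tokens

rng = np.random.default_rng(0)
print(sample_expression(rng, logits=np.zeros(len(TOKENS))))
```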

A Causality-Aware Pattern Mining Scheme for Group Activity Recognition in a Pervasive Sensor Space. (arXiv:2312.00404v1 [cs.LG])

Authors: Hyunju Kim, Heesuk Son, Dongman Lee

Human activity recognition (HAR) is a key challenge in pervasive computing, and solutions have been proposed based on various disciplines. Specifically, for HAR in a smart space without privacy and accessibility issues, data streams generated by deployed pervasive sensors are leveraged. In this paper, we focus on group activities, in which a group of users performs a collaborative task without user identification, and propose an efficient group activity recognition scheme that extracts causality patterns from the pervasive sensor event sequences generated by the group, supporting recognition accuracy as good as that of the state-of-the-art graphical model. To filter out irrelevant noise events from a given data stream, a set of rules is leveraged to highlight causally related events. Then, a pattern-tree algorithm extracts frequent causal patterns by means of a growing tree structure. Based on the extracted patterns, a weighted sum-based pattern matching algorithm computes the likelihoods of the stored group activities against the given test event sequence, using matched event pattern counts, for group activity recognition. We evaluate the proposed scheme using data collected from our testbed and the CASAS datasets, where users perform their tasks on a daily basis, and validate its effectiveness in a real environment. Experimental results show that the proposed scheme achieves higher recognition accuracy than existing schemes, with only a small amount of runtime overhead.

A framework for mining lifestyle profiles through multi-dimensional and high-order mobility feature clustering. (arXiv:2312.00411v1 [cs.LG])

Authors: Yeshuo Shu, Gangcheng Zhang, Keyi Liu, Jintong Tang, Liyan Xu

Human mobility demonstrates a high degree of regularity, which facilitates the discovery of lifestyle profiles. Existing research has yet to fully utilize the regularities embedded in high-order features extracted from human mobility records in such profiling. This study proposes a progressive feature extraction strategy that mines high-order mobility features from users' moving trajectory records along the spatial, temporal, and semantic dimensions. Specific features are extracted for each dimension: travel motifs (spatial), rhythms decomposed by a discrete Fourier transform (DFT) of mobility time series (temporal), and vectorized place semantics by word2vec (semantic); these features are further clustered to reveal the users' lifestyle characteristics. An experiment using a trajectory dataset of over 500k users in Shenzhen, China yields seven user clusters with different lifestyle profiles that can be well interpreted by common sense. The results suggest the possibility of fine-grained user profiling through cross-order trajectory feature engineering and clustering.
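
To make the temporal branch concrete, here is a minimal sketch of extracting dominant rhythms from a mobility time series with a DFT; the hourly series is a synthetic stand-in for real trajectory-derived counts:

```python
# Sketch: decompose a user's mobility time series into its strongest rhythms.
import numpy as np

hours = np.arange(24 * 28)                       # four weeks of hourly counts
rng = np.random.default_rng(1)
series = 5 + 3 * np.sin(2 * np.pi * hours / 24) + rng.normal(0, 0.5, hours.size)

spectrum = np.fft.rfft(series - series.mean())   # drop the DC component
freqs = np.fft.rfftfreq(hours.size, d=1.0)       # cycles per hour
top = np.argsort(np.abs(spectrum))[::-1][:3]     # three strongest rhythms
for k in top:
    if freqs[k] > 0:
        period = 1 / freqs[k]
        amplitude = 2 * np.abs(spectrum[k]) / hours.size
        print(f"period ~ {period:.1f} h, amplitude {amplitude:.2f}")
```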

From Mutual Information to Expected Dynamics: New Generalization Bounds for Heavy-Tailed SGD. (arXiv:2312.00427v1 [stat.ML])

Authors: Benjamin Dupuis, Paul Viallard

Understanding the generalization abilities of modern machine learning algorithms has been a major research topic over the past decades. In recent years, the learning dynamics of Stochastic Gradient Descent (SGD) have been related to heavy-tailed dynamics. This has been successfully applied to generalization theory by exploiting the fractal properties of those dynamics. However, the derived bounds depend on mutual information (decoupling) terms that are beyond the reach of computability. In this work, we prove generalization bounds over the trajectory of a class of heavy-tailed dynamics, without those mutual information terms. Instead, we introduce a geometric decoupling term by comparing the learning dynamics (depending on the empirical risk) with an expected one (depending on the population risk). We further upper-bound this geometric term, by using techniques from the heavy-tailed and the fractal literature, making it fully computable. Moreover, as an attempt to tighten the bounds, we propose a PAC-Bayesian setting based on perturbed dynamics, in which the same geometric term plays a crucial role and can still be bounded using the techniques described above.

PEFTDebias : Capturing debiasing information using PEFTs. (arXiv:2312.00434v1 [cs.LG])

Authors: Sumit Agarwal, Aditya Srikanth Veerubhotla, Srijan Bansal

The increasing use of foundation models highlights the urgent need to address and eliminate implicit biases present in them that arise during pretraining. In this paper, we introduce PEFTDebias, a novel approach that employs parameter-efficient fine-tuning (PEFT) to mitigate the biases within foundation models. PEFTDebias consists of two main phases: an upstream phase for acquiring debiasing parameters along a specific bias axis, and a downstream phase where these parameters are incorporated into the model and frozen during the fine-tuning process. By evaluating on four datasets across two bias axes, namely gender and race, we find that downstream biases can be effectively reduced with PEFTs. In addition, we show that these parameters possess axis-specific debiasing characteristics, enabling their effective transferability in mitigating biases in various downstream tasks. To ensure reproducibility, we release the code for our experiments.

An Encoding Framework for Binarized Images using HyperDimensional Computing. (arXiv:2312.00454v1 [cs.CV])

Authors: Laura Smets, Werner Van Leekwijck, Ing Jyh Tsang, Steven Latré

Hyperdimensional Computing (HDC) is a brain-inspired and lightweight machine learning method. It has received significant attention in the literature as a candidate to be applied in the wearable internet of things, near-sensor artificial intelligence applications, and on-device processing. HDC is computationally less complex than traditional deep learning algorithms and typically achieves moderate to good classification performance. A key aspect that determines the performance of HDC is the encoding of the input data to the hyperdimensional (HD) space. This article proposes a novel lightweight approach, relying only on native HD arithmetic vector operations, to encode binarized images; it preserves the similarity of patterns at nearby locations by using point-of-interest selection and local linear mapping. The method reaches an accuracy of 97.35% on the test set for the MNIST data set and 84.12% for the Fashion-MNIST data set. These results outperform other studies using baseline HDC with different encoding approaches and are on par with more complex hybrid HDC models. The proposed encoding approach also demonstrates higher robustness to noise and blur compared to the baseline encoding.
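
For readers unfamiliar with HD encoding, the sketch below shows the baseline binding-and-bundling style of image encoding that such work builds on; the dimensionality and the position/value hypervectors are illustrative, and the paper's point-of-interest selection and local linear mapping are not reproduced here:

```python
# Sketch of baseline HDC image encoding: bind a random position hypervector
# with a value hypervector per active pixel, bundle by majority vote, and
# compare encodings by cosine similarity.
import numpy as np

D = 10_000                                     # hyperdimensional space
rng = np.random.default_rng(0)

def rand_hv():
    return rng.choice([-1, 1], size=D)         # bipolar hypervector

def encode(image, pos_hvs, on_hv):
    """Bundle position*value bindings over all 'on' pixels of a binary image."""
    acc = np.zeros(D)
    for idx in np.flatnonzero(image.ravel()):
        acc += pos_hvs[idx] * on_hv            # elementwise product = binding
    return np.sign(acc + rng.normal(0, 1e-6, D))  # majority vote with tie-break

side = 8
pos_hvs = np.array([rand_hv() for _ in range(side * side)])
on_hv = rand_hv()
img_a = (rng.random((side, side)) > 0.5).astype(int)
img_b = img_a.copy(); img_b[0, 0] ^= 1         # one-pixel perturbation
hv_a, hv_b = encode(img_a, pos_hvs, on_hv), encode(img_b, pos_hvs, on_hv)
print("cosine similarity:", hv_a @ hv_b / D)   # close to 1 for similar images
```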

Meta-Diversity Search in Complex Systems, A Recipe for Artificial Open-Endedness ?. (arXiv:2312.00455v1 [cs.AI])

Authors: Mayalen Etcheverry (Flowers), Bert Wang-Chak Chan, Clément Moulin-Frier (Flowers), Pierre-Yves Oudeyer (Flowers)

Can we build an artificial system that would be able to generate endless surprises if run "forever" in Minecraft? While there is not a single path toward solving that grand challenge, this article presents what we believe to be some working ingredients for the endless generation of novel, increasingly complex artifacts in Minecraft. Our framework for an open-ended system includes two components: a complex system used to recursively grow and complexify artifacts over time, and a discovery algorithm that leverages the concept of meta-diversity search. Since complex systems have been shown to enable the emergence of considerable complexity from a set of simple rules, we believe them to be great candidates for generating all sorts of artifacts in Minecraft. Yet, the space of possible artifacts that can be generated by these systems is often unknown, and challenging to characterize and explore. Therefore, automating the long-term discovery of novel and increasingly complex artifacts in these systems is an exciting research field. To approach these challenges, we formulate the problem of meta-diversity search, where an artificial "discovery assistant" incrementally learns a diverse set of representations to characterize behaviors and searches to discover diverse patterns within each of them. A successful discovery assistant should continuously seek novel sources of diversity while being able to quickly specialize the search toward a new, unknown type of diversity. To implement those ideas in the Minecraft environment, we simulate an artificial "chemistry" system based on the Lenia continuous cellular automaton for generating artifacts, as well as an artificial "discovery assistant" (called Holmes) for the artifact-discovery process. Holmes incrementally learns a hierarchy of modular representations to characterize divergent sources of diversity and uses goal-based, intrinsically-motivated exploration as the diversity search strategy.

Auto-encoding GPS data to reveal individual and collective behaviour. (arXiv:2312.00456v1 [stat.AP])

Authors: Saint-Clair Chabert-Liddell, Nicolas Bez, Pierre Gloaguen, Sophie Donnet, Stéphanie Mahévas

We propose an innovative and generic methodology to analyse individual and collective behaviour through individual trajectory data. The work is motivated by the analysis of GPS trajectories of fishing vessels, collected from regulatory tracking data in the context of marine biodiversity conservation and ecosystem-based fisheries management. We build a low-dimensional latent representation of trajectories using convolutional neural networks as a non-linear mapping. This is done by training a conditional variational auto-encoder that takes covariates into account. The posterior distributions of the latent representations can be linked to the characteristics of the actual trajectories. The latent distributions of the trajectories are compared with the Bhattacharyya coefficient, which is well-suited for comparing distributions. Using this coefficient, we analyse the variation of the individual behaviour of each vessel over time. For collective behaviour analysis, we build proximity graphs and use an extension of the stochastic block model for multiple networks. This model results in a clustering of the individuals based on their sets of trajectories. The application to French fishing vessels enables us to obtain groups of vessels whose individual and collective behaviours exhibit spatio-temporal patterns over the period 2014-2018.
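
The comparison step can be made concrete: for Gaussian latent posteriors, the Bhattacharyya coefficient has a closed form. A minimal sketch (means and covariances below are toy values, not fitted posteriors):

```python
# Sketch: Bhattacharyya coefficient between two Gaussians N(m1, S1), N(m2, S2).
import numpy as np

def bhattacharyya_coefficient(m1, S1, m2, S2):
    S = 0.5 * (S1 + S2)
    diff = m1 - m2
    # Bhattacharyya distance: 1/8 d^T S^-1 d + 1/2 ln(det S / sqrt(det S1 det S2))
    dist = (0.125 * diff @ np.linalg.solve(S, diff)
            + 0.5 * np.log(np.linalg.det(S)
                           / np.sqrt(np.linalg.det(S1) * np.linalg.det(S2))))
    return np.exp(-dist)               # 1 = identical distributions, 0 = disjoint

m1, m2 = np.zeros(2), np.array([1.0, 0.0])
S1 = S2 = np.eye(2)
print(bhattacharyya_coefficient(m1, S1, m2, S2))   # exp(-1/8) ~ 0.88
```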

A Bayesian approach for prompt optimization in pre-trained language models. (arXiv:2312.00471v1 [cs.LG])

Authors: Antonio Sabbatella, Andrea Ponti, Antonio Candelieri, Ilaria Giordani, Francesco Archetti

A prompt is a sequence of symbols or tokens, selected from a vocabulary according to some rule, which is prepended or concatenated to a textual query. A key problem is how to select the sequence of tokens: in this paper we formulate it as a combinatorial optimization problem. The high dimensionality of the token space, compounded by the length of the prompt sequence, requires a very efficient solution. We focus on hard prompt tuning (HPT), which directly searches for discrete tokens to be added to the text input without requiring access to the large language model (LLM), and which can be used even when the LLM is available only as a black box. This is critically important if LLMs are made available in the Model-as-a-Service (MaaS) manner, as in GPT-4. The present work focuses on the optimization of discrete prompts for classification tasks. Discrete prompts give rise to a difficult combinatorial optimization problem, which easily becomes intractable given the dimension of the token space in realistic applications. The optimization method considered in this paper is Bayesian optimization (BO), executed in a continuous embedding of the combinatorial space; BO has become the dominant approach in black-box optimization for its sample efficiency, along with its modular structure and versatility. We use BoTorch, a library for Bayesian optimization research built on top of PyTorch. Albeit preliminary and obtained using a 'vanilla' version of BO, the experiments on RoBERTa on six benchmarks show good performance across a variety of tasks and enable an analysis of the trade-off between the size of the search space, accuracy, and wall-clock time.
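
A 'vanilla' BO loop of this kind is a few lines in BoTorch. The sketch below (using the API of recent BoTorch releases) optimizes a toy function over a continuous 4-d box standing in for the learned embedding of the token space; the projection back to discrete tokens and the RoBERTa evaluation are omitted:

```python
# Sketch: a minimal BoTorch Bayesian optimization loop on a toy objective.
import torch
from botorch.models import SingleTaskGP
from botorch.fit import fit_gpytorch_mll
from botorch.acquisition import ExpectedImprovement
from botorch.optim import optimize_acqf
from gpytorch.mlls import ExactMarginalLogLikelihood

def objective(x):                        # stand-in for task accuracy of a prompt
    return -(x - 0.3).pow(2).sum(dim=-1, keepdim=True)

bounds = torch.stack([torch.zeros(4), torch.ones(4)]).double()
X = torch.rand(5, 4, dtype=torch.double)  # initial design
Y = objective(X)

for _ in range(10):
    gp = SingleTaskGP(X, Y)               # GP surrogate of the black box
    fit_gpytorch_mll(ExactMarginalLogLikelihood(gp.likelihood, gp))
    acq = ExpectedImprovement(gp, best_f=Y.max())
    cand, _ = optimize_acqf(acq, bounds=bounds, q=1,
                            num_restarts=5, raw_samples=32)
    X, Y = torch.cat([X, cand]), torch.cat([Y, objective(cand)])

print("best value found:", Y.max().item())
```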

Interpretable Meta-Learning of Physical Systems. (arXiv:2312.00477v1 [cs.LG])

Authors: Matthieu Blanke, Marc Lelarge

Machine learning methods can be a valuable aid in the scientific process, but they need to face challenging settings where data come from inhomogeneous experimental conditions. Recent meta-learning methods have made significant progress in multi-task learning, but they rely on black-box neural networks, resulting in high computational costs and limited interpretability. Leveraging the structure of the learning problem, we argue that multi-environment generalization can be achieved using a simpler learning model, with an affine structure with respect to the learning task. Crucially, we prove that this architecture can identify the physical parameters of the system, enabling interpretable learning. We demonstrate the competitive generalization performance and the low computational cost of our method by comparing it to state-of-the-art algorithms on physical systems, ranging from toy models to complex, non-analytical systems. The interpretability of our method is illustrated with original applications to physical-parameter-induced adaptation and to adaptive control.

MultiView Independent Component Analysis with Delays. (arXiv:2312.00484v1 [cs.LG])

Authors: Ambroise Heurtebise, Pierre Ablin, Alexandre Gramfort

Linear Independent Component Analysis (ICA) is a blind source separation technique that has been used in various domains to identify independent latent sources from observed signals. To obtain a higher signal-to-noise ratio, multiple views of the same sources can be leveraged. In this work, we present MultiView Independent Component Analysis with Delays (MVICAD). This algorithm builds on the MultiView ICA model by allowing sources to be delayed versions of some shared sources: sources are shared across views up to some unknown latencies that are view- and source-specific. Using simulations, we demonstrate that MVICAD leads to better unmixing of the sources. Moreover, as ICA is often used in neuroscience, we apply MVICAD to Cam-CAN, a large-scale magnetoencephalography (MEG) dataset, and show that the estimated latencies are age-related. These results demonstrate that the MVICAD model can reveal rich effects on neural signals without human supervision.

Backbone-based Dynamic Graph Spatio-Temporal Network for Epidemic Forecasting. (arXiv:2312.00485v1 [cs.LG])

Authors: Junkai Mao, Yuexing Han, Gouhei Tanaka, Bing Wang

Accurate epidemic forecasting is a critical task in controlling disease transmission. Many deep learning-based models focus only on static or dynamic graphs when constructing spatial information, ignoring their relationship. Additionally, these models often rely on recurrent structures, which can lead to error accumulation and long computation times. To address the aforementioned problems, we propose a novel model called Backbone-based Dynamic Graph Spatio-Temporal Network (BDGSTN). Intuitively, the continuous and smooth changes in graph structure make adjacent graph structures share a basic pattern. To capture this property, we use adaptive methods to generate static backbone graphs containing the primary information and temporal models to generate dynamic temporal graphs of epidemic data, fusing them to generate a backbone-based dynamic graph. To overcome potential limitations associated with recurrent structures, we introduce the linear model DLinear to handle temporal dependencies and combine it with dynamic graph convolution for epidemic forecasting. Extensive experiments on two datasets demonstrate that BDGSTN outperforms baseline models, and an ablation comparison further verifies the effectiveness of the model components. Furthermore, we analyze and measure the significance of the backbone and temporal graphs by using information metrics from different aspects. Finally, we compare model parameter volume and training time to confirm the lower complexity and greater efficiency of BDGSTN.
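
As a point of reference for the temporal module, a DLinear-style block is small enough to sketch in full; the shapes and moving-average kernel below are illustrative, and the graph fusion and dynamic graph convolution are omitted:

```python
# Sketch of a DLinear-style block: decompose each series into a moving-average
# trend plus remainder, and apply a separate linear map to each part.
import torch
import torch.nn as nn

class DLinear(nn.Module):
    def __init__(self, in_len, out_len, kernel=5):
        super().__init__()
        self.avg = nn.AvgPool1d(kernel, stride=1, padding=kernel // 2,
                                count_include_pad=False)
        self.lin_trend = nn.Linear(in_len, out_len)
        self.lin_resid = nn.Linear(in_len, out_len)

    def forward(self, x):              # x: (batch, nodes, in_len)
        trend = self.avg(x)            # smooth trend per node
        resid = x - trend              # seasonal/residual part
        return self.lin_trend(trend) + self.lin_resid(resid)

model = DLinear(in_len=12, out_len=3)
print(model(torch.randn(8, 207, 12)).shape)   # -> torch.Size([8, 207, 3])
```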

REDUCR: Robust Data Downsampling Using Class Priority Reweighting. (arXiv:2312.00486v1 [cs.LG])

Authors: William Bankes, George Hughes, Ilija Bogunovic, Zi Wang

Modern machine learning models are becoming increasingly expensive to train for real-world image and text classification tasks, where massive web-scale data is collected in a streaming fashion. To reduce the training cost, online batch selection techniques have been developed to choose the most informative datapoints. However, these techniques can suffer from poor worst-class generalization performance due to class imbalance and distributional shifts. This work introduces REDUCR, a robust and efficient data downsampling method that uses class priority reweighting. REDUCR reduces the training data while preserving worst-class generalization performance. REDUCR assigns priority weights to datapoints in a class-aware manner using an online learning algorithm. We demonstrate the data efficiency and robust performance of REDUCR on vision and text classification tasks. On web-scraped datasets with imbalanced class distributions, REDUCR significantly improves worst-class test accuracy (and average accuracy), surpassing state-of-the-art methods by around 15%.
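
The class-priority idea can be pictured with multiplicative weights: classes with high recent loss get higher priority, and candidate points are scored by priority-weighted loss. The sketch below illustrates that mechanism only, not REDUCR's exact algorithm:

```python
# Sketch: class-priority reweighting for online batch selection.
import numpy as np

def select_batch(losses, labels, class_w, k, eta=0.1, n_classes=10):
    # Update class weights: classes doing badly get higher priority.
    per_class = np.array([losses[labels == c].mean() if (labels == c).any()
                          else 0.0 for c in range(n_classes)])
    class_w = class_w * np.exp(eta * per_class)
    class_w = class_w / class_w.sum()
    # Score each candidate point by its class priority times its own loss.
    scores = class_w[labels] * losses
    keep = np.argsort(scores)[-k:]                # retain the top-k points
    return keep, class_w

rng = np.random.default_rng(0)
losses = rng.exponential(1.0, 256)
labels = rng.integers(0, 10, 256)
keep, w = select_batch(losses, labels, np.full(10, 0.1), k=64)
print(len(keep), w.round(3))
```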

On the Out-Of-Distribution Robustness of Self-Supervised Representation Learning for Phonocardiogram Signals. (arXiv:2312.00502v1 [cs.LG])

Authors: Aristotelis Ballas, Vasileios Papapanagiotou, Christos Diou

Objective: Despite the recent increase in research activity, deep-learning models have not yet been widely accepted in medicine. The shortage of high-quality annotated data often hinders the development of robust and generalizable models, which do not suffer from degraded effectiveness when presented with newly-collected, out-of-distribution (OOD) datasets. Methods: Contrastive Self-Supervised Learning (SSL) offers a potential solution to the scarcity of labeled data as it takes advantage of unlabeled data to increase model effectiveness and robustness. In this research, we propose applying contrastive SSL for detecting abnormalities in phonocardiogram (PCG) samples by learning a generalized representation of the signal. Specifically, we perform an extensive comparative evaluation of a wide range of audio-based augmentations and evaluate trained classifiers on multiple datasets across different downstream tasks. Results: We experimentally demonstrate that, depending on its training distribution, the effectiveness of a fully-supervised model can degrade up to 32% when evaluated on unseen data, while SSL models only lose up to 10% or even improve in some cases. Conclusions: Contrastive SSL pretraining can assist in providing robust classifiers which can generalize to unseen, OOD data, without relying on time- and labor-intensive annotation processes by medical experts. Furthermore, the proposed extensive evaluation protocol sheds light on the most promising and appropriate augmentations for robust PCG signal processing. Significance: We provide researchers and practitioners with a roadmap towards producing robust models for PCG classification, in addition to an open-source codebase for developing novel approaches.

VEXIR2Vec: An Architecture-Neutral Embedding Framework for Binary Similarity. (arXiv:2312.00507v1 [cs.PL])

Authors: S. VenkataKeerthy, Yashas Andaluri, Sayan Dey, Soumya Banerjee, Ramakrishna Upadrasta

We propose VEXIR2Vec, a code embedding framework for finding similar functions in binaries. Our representations rely on VEX IR, the intermediate representation used by binary analysis tools like Valgrind and angr. Our proposed embeddings encode both syntactic and semantic information to represent a function, and are both application- and architecture-independent. We also propose POV, a custom Peephole Optimization engine that normalizes the VEX IR for effective similarity analysis. We design several optimizations in POV, such as copy/constant propagation, constant folding, common subexpression elimination, and load-store elimination.

We evaluate our framework on two experiments -- diffing and searching -- involving binaries targeting different architectures, compiled using different compilers and versions, optimization sequences, and obfuscations. We show results on several standard projects and on real-world vulnerabilities. Our results show that VEXIR2Vec achieves superior precision and recall values compared to state-of-the-art works. Our framework is highly scalable and is built as a multi-threaded, parallel library using only open-source tools. VEXIR2Vec achieves about a $3.2 \times$ speedup over the closest competitor, and orders-of-magnitude speedups over other tools.

Bayesian causal discovery from unknown general interventions. (arXiv:2312.00509v1 [stat.ME])

Authors: Alessandro Mascaro, Federico Castelletti

We consider the problem of learning causal Directed Acyclic Graphs (DAGs) using combinations of observational and interventional experimental data. Current methods tailored to this setting assume that interventions either destroy parent-child relations of the intervened (target) nodes or only alter such relations without modifying the parent sets, even when the intervention targets are unknown. We relax this assumption by proposing a Bayesian method for causal discovery from general interventions, which allow for modifications of the parent sets of the unknown targets. Even in this framework, DAGs and general interventions may be identifiable only up to some equivalence classes. We provide graphical characterizations of such interventional Markov equivalence and devise compatible priors for Bayesian inference that guarantee score equivalence of indistinguishable structures. We then develop a Markov Chain Monte Carlo (MCMC) scheme to approximate the posterior distribution over DAGs, intervention targets and induced parent sets. Finally, we evaluate the proposed methodology on both simulated and real protein expression data.

Spatio-Temporal-Decoupled Masked Pre-training for Traffic Forecasting. (arXiv:2312.00516v1 [cs.LG])

Authors: Haotian Gao, Renhe Jiang, Zheng Dong, Jinliang Deng, Xuan Song

Accurate forecasting of multivariate traffic flow time series remains challenging due to substantial spatio-temporal heterogeneity and complex long-range correlative patterns. To address this, we propose Spatio-Temporal-Decoupled Masked Pre-training (STD-MAE), a novel framework that employs masked autoencoders to learn and encode complex spatio-temporal dependencies via pre-training. Specifically, we use two decoupled masked autoencoders to reconstruct the traffic data along spatial and temporal axes using a self-supervised pre-training approach. These mask reconstruction mechanisms capture the long-range correlations in space and time separately. The learned hidden representations are then used to augment the downstream spatio-temporal traffic predictor. A series of quantitative and qualitative evaluations on four widely-used traffic benchmarks (PEMS03, PEMS04, PEMS07, and PEMS08) are conducted to verify the state-of-the-art performance, with STD-MAE explicitly enhancing the downstream spatio-temporal models' ability to capture long-range intricate spatial and temporal patterns. Codes are available at https://github.com/Jimmy-7664/STD_MAE.
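
The decoupling itself is simple to picture: the same traffic tensor is masked along the time axis for one autoencoder and along the sensor axis for the other. A toy sketch (shapes and mask ratios are illustrative, not the paper's settings):

```python
# Sketch: decoupled spatial vs. temporal masking of a traffic tensor.
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=(64, 207, 288))            # (batch, sensors, time steps)

def mask_axis(x, axis, ratio, rng):
    """Zero out a random `ratio` of slices along `axis`; return data and mask."""
    n = x.shape[axis]
    hide = rng.choice(n, size=int(ratio * n), replace=False)
    mask = np.ones(n, dtype=bool); mask[hide] = False
    shape = [1, 1, 1]; shape[axis] = n
    return x * mask.reshape(shape), mask

x_t, m_t = mask_axis(x, axis=2, ratio=0.75, rng=rng)   # temporal masking
x_s, m_s = mask_axis(x, axis=1, ratio=0.25, rng=rng)   # spatial masking
print(m_t.sum(), "time steps visible;", m_s.sum(), "sensors visible")
```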

RIS-Based On-the-Air Semantic Communications -- a Diffractional Deep Neural Network Approach. (arXiv:2312.00535v1 [eess.SP])

Authors: Shuyi Chen, Yingzhe Hui, Yifan Qin, Yueyi Yuan, Weixiao Meng, Xuewen Luo, Hsiao-Hwa Chen

Semantic communication has gained significant attention recently due to its advantages in achieving higher transmission efficiency by focusing on semantic information instead of bit-level information. However, current AI-based semantic communication methods require digital hardware for implementation. With the rapid advancement on reconfigurable intelligence surfaces (RISs), a new approach called on-the-air diffractional deep neural networks (D$^2$NN) can be utilized to enable semantic communications on the wave domain. This paper proposes a new paradigm of RIS-based on-the-air semantic communications, where the computational process occurs inherently as wireless signals pass through RISs. We present the system model and discuss the data and control flows of this scheme, followed by a performance analysis using image transmission as an example. In comparison to traditional hardware-based approaches, RIS-based semantic communications offer appealing features, such as light-speed computation, low computational power requirements, and the ability to handle multiple tasks simultaneously.

A Preconditioned Interior Point Method for Support Vector Machines Using an ANOVA-Decomposition and NFFT-Based Matrix-Vector Products. (arXiv:2312.00538v1 [math.NA])

Authors: Theresa Wagner, John W. Pearson, Martin Stoll

In this paper we consider the numerical solution to the soft-margin support vector machine optimization problem. This problem is typically solved using the SMO algorithm, given the high computational complexity of traditional optimization algorithms when dealing with large-scale kernel matrices. In this work, we propose employing an NFFT-accelerated matrix-vector product using an ANOVA decomposition for the feature space that is used within an interior point method for the overall optimization problem. As this method requires the solution of a linear system of saddle point form we suggest a preconditioning approach that is based on low-rank approximations of the kernel matrix together with a Krylov subspace solver. We compare the accuracy of the ANOVA-based kernel with the default LIBSVM implementation. We investigate the performance of the different preconditioners as well as the accuracy of the ANOVA kernel on several large-scale datasets.
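
The flavor of the preconditioning can be sketched on a small dense problem: a rank-r eigendecomposition of the kernel yields an inexpensive approximate inverse via the Woodbury identity, used to precondition a Krylov solver. The NFFT-accelerated matvecs and the saddle-point structure of the actual interior point system are omitted here, so this is only a sketch of the idea:

```python
# Sketch: solve (K + lam*I)x = b with CG, preconditioned by a rank-r
# approximation of the kernel matrix.
import numpy as np
from scipy.linalg import eigh
from scipy.sparse.linalg import cg, LinearOperator

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 3))
K = np.exp(-0.5 * np.sum((X[:, None] - X[None]) ** 2, axis=-1))  # RBF kernel
lam, n, r = 1e-2, K.shape[0], 20

vals, vecs = eigh(K)                       # eigh returns ascending eigenvalues
U, s = vecs[:, -r:], vals[-r:]             # keep the top-r eigenpairs

def precond(v):
    # Woodbury identity: (U diag(s) U^T + lam I)^{-1} v at O(n r) cost.
    return v / lam - U @ ((s / (lam * (s + lam))) * (U.T @ v))

M = LinearOperator((n, n), matvec=precond)
b = rng.normal(size=n)
x, info = cg(K + lam * np.eye(n), b, M=M)
print("converged:", info == 0)
```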

Target-agnostic Source-free Domain Adaptation for Regression Tasks. (arXiv:2312.00540v1 [cs.LG])

Authors: Tianlang He, Zhiqiu Xia, Jierun Chen, Haoliang Li, S.-H. Gary Chan

Unsupervised domain adaptation (UDA) seeks to bridge the domain gap between the target and source using unlabeled target data. Source-free UDA removes the requirement for labeled source data at the target to preserve data privacy and storage. However, existing work on source-free UDA assumes knowledge of the domain gap distribution, and hence is limited to either target-aware settings or classification tasks. To overcome this, we propose TASFAR, a novel target-agnostic source-free domain adaptation approach for regression tasks. Using prediction confidence, TASFAR estimates a label density map as the target label distribution, which is then used to calibrate the source model on the target domain. We have conducted extensive experiments on four regression tasks with various domain gaps, namely, pedestrian dead reckoning for different users, image-based people counting in different scenes, housing-price prediction in different districts, and taxi-trip duration prediction from different departure points. TASFAR is shown to substantially outperform state-of-the-art source-free UDA approaches, reducing errors by 22% on average across the four tasks, and to achieve accuracy notably comparable to source-based UDA without using source data.

Domain Adaptive Imitation Learning with Visual Observation. (arXiv:2312.00548v1 [cs.LG])

Authors: Sungho Choi, Seungyul Han, Woojun Kim, Jongseong Chae, Whiyoung Jung, Youngchul Sung

In this paper, we consider domain-adaptive imitation learning with visual observation, where an agent in a target domain learns to perform a task by observing expert demonstrations in a source domain. Domain adaptive imitation learning arises in practical scenarios where a robot, receiving visual sensory data, needs to mimic movements by visually observing other robots from different angles or observing robots of different shapes. To overcome the domain shift in cross-domain imitation learning with visual observation, we propose a novel framework for extracting domain-independent behavioral features from input observations that can be used to train the learner, based on dual feature extraction and image reconstruction. Empirical results demonstrate that our approach outperforms previous algorithms for imitation learning from visual observation with domain shift.

Interior Point Constrained Reinforcement Learning with Global Convergence Guarantees. (arXiv:2312.00561v1 [cs.LG])

Authors: Tingting Ni, Maryam Kamgarpour

We consider discounted infinite horizon constrained Markov decision processes (CMDPs) where the goal is to find an optimal policy that maximizes the expected cumulative reward subject to expected cumulative constraints. Motivated by the application of CMDPs in online learning of safety-critical systems, we focus on developing an algorithm that ensures constraint satisfaction during learning. To this end, we develop a zeroth-order interior point approach based on the log barrier function of the CMDP. Under the commonly assumed conditions of Fisher non-degeneracy and bounded transfer error of the policy parameterization, we establish the theoretical properties of the algorithm. In particular, in contrast to existing CMDP approaches that ensure policy feasibility only upon convergence, our algorithm guarantees feasibility of the policies during the learning process and converges to the optimal policy with a sample complexity of $O(\varepsilon^{-6})$. In comparison to the state-of-the-art policy gradient-based algorithm, C-NPG-PDA, our algorithm requires an additional $O(\varepsilon^{-2})$ samples to ensure policy feasibility during learning with the same Fisher-non-degenerate parameterization.
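
The log-barrier construction at the core of such approaches is easy to state: the constrained objective is replaced by the reward value plus $(1/\eta)\log(b - V_c(\theta))$, which is finite only while the cost constraint holds, so any iterate that improves the surrogate stays feasible. A toy 1-d sketch (the value functions and constants below are invented; the paper's method estimates these quantities with zeroth-order samples):

```python
# Sketch: a log-barrier surrogate for a constrained policy objective.
import numpy as np

def barrier_objective(theta, v_r, v_c, b=1.0, eta=10.0):
    slack = b - v_c(theta)
    if slack <= 0:
        return -np.inf                 # infeasible policies are never accepted
    return v_r(theta) + np.log(slack) / eta

# Illustrative 1-d "policy": reward grows with theta, but so does the cost.
v_r = lambda t: 2.0 * t
v_c = lambda t: t ** 2

grid = np.linspace(0.0, 1.2, 200)
vals = [barrier_objective(t, v_r, v_c) for t in grid]
best = grid[int(np.argmax(vals))]
print(f"barrier optimum ~ {best:.3f} (stays inside the feasible region t < 1)")
```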

Pathway to a fully data-driven geotechnics: lessons from materials informatics. (arXiv:2312.00581v1 [cs.LG])

Authors: Stephen Wu, Yu Otake, Yosuke Higo, Ikumasa Yoshida

This paper elucidates the challenges and opportunities inherent in integrating data-driven methodologies into geotechnics, drawing inspiration from the success of materials informatics. Highlighting the intricacies of soil complexity, heterogeneity, and the lack of comprehensive data, the discussion underscores the pressing need for community-driven database initiatives and open science movements. By leveraging the transformative power of deep learning, particularly in feature extraction from high-dimensional data and the potential of transfer learning, we envision a paradigm shift towards a more collaborative and innovative geotechnics field. The paper concludes with a forward-looking stance, emphasizing the revolutionary potential brought about by advanced computational tools like large language models in reshaping geotechnics informatics.

Adaptive Parameter-Free Robust Learning using Latent Bernoulli Variables. (arXiv:2312.00585v1 [stat.ML])

Authors: Aleksandr Karakulev (1), Dave Zachariah (2), Prashant Singh (1 and 3) ((1) Division of Scientific Computing, (2) Division of Systems and Control, (3) Science for Life Laboratory, Department of Information Technology, Uppsala University)

We present an efficient parameter-free approach for statistical learning from corrupted training sets. We identify corrupted and non-corrupted samples using latent Bernoulli variables, and therefore formulate the robust learning problem as maximization of the likelihood where the latent variables are marginalized out. The resulting optimization problem is solved via variational inference using an efficient Expectation-Maximization based method. The proposed approach improves over the state-of-the-art by automatically inferring the corruption level and identifying outliers, while adding minimal computational overhead. We demonstrate our robust learning method on a wide variety of machine learning tasks, including online learning and deep learning, where it exhibits the ability to adapt to different levels of noise and attains high prediction accuracy.
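
A compact EM sketch conveys the mechanism (weighted linear regression stands in for the general learner, and the clean/corrupt noise scales below are invented): the E-step computes each sample's posterior probability of being clean given its residual, and the M-step refits with those probabilities as weights while updating the inferred corruption level:

```python
# Sketch: EM with latent Bernoulli "clean" indicators on corrupted regression data.
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
w_true = np.array([1.0, -2.0, 0.5])
y = X @ w_true + rng.normal(0, 0.1, 200)
y[:30] += rng.normal(0, 5.0, 30)              # corrupted subset

w, pi, s_clean, s_bad = np.zeros(3), 0.5, 1.0, 5.0
for _ in range(20):
    r2 = (y - X @ w) ** 2
    lik_c = pi * np.exp(-r2 / (2 * s_clean**2)) / s_clean
    lik_b = (1 - pi) * np.exp(-r2 / (2 * s_bad**2)) / s_bad
    q = lik_c / (lik_c + lik_b)               # E-step: P(sample is clean)
    sw = np.sqrt(q)                           # M-step: weighted least squares
    w = np.linalg.lstsq(X * sw[:, None], y * sw, rcond=None)[0]
    pi = q.mean()                             # corruption level inferred too

print("estimated weights:", w.round(2), "| inferred clean fraction:", round(pi, 2))
```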

Explainable Fraud Detection with Deep Symbolic Classification. (arXiv:2312.00586v1 [cs.LG])

Authors: Samantha Visbeek, Erman Acar, Floris den Hengst

There is a growing demand for explainable, transparent, and data-driven models within the domain of fraud detection. Decisions made by fraud detection models need to be explainable in the event of a customer dispute. Additionally, the decision-making process in the model must be transparent to win the trust of regulators and business stakeholders. At the same time, fraud detection solutions can benefit from data due to the noisy, dynamic nature of fraud and the availability of large historical data sets. Finally, fraud detection is notorious for its class imbalance: there are typically several orders of magnitude more legitimate transactions than fraudulent ones. In this paper, we present Deep Symbolic Classification (DSC), an extension of the Deep Symbolic Regression framework to classification problems. DSC casts classification as a search problem in the space of all analytic functions composed of a vocabulary of variables, constants, and operations, and optimizes for an arbitrary evaluation metric directly. The search is guided by a deep neural network trained with reinforcement learning. Because the functions are mathematical expressions that are closed-form and concise, the model is inherently explainable both at the level of a single classification decision and at the level of the model's decision process. Furthermore, the class imbalance problem is successfully addressed by optimizing for metrics that are robust to class imbalance, such as the F1 score. This eliminates the need for the oversampling and undersampling techniques that plague traditional approaches. Finally, the model allows one to explicitly balance prediction accuracy and explainability. An evaluation on the PaySim data set demonstrates competitive predictive performance with state-of-the-art models, while surpassing them in terms of explainability. This establishes DSC as a promising model for fraud detection systems.

Tracking Object Positions in Reinforcement Learning: A Metric for Keypoint Detection (extended version). (arXiv:2312.00592v1 [cs.LG])

Authors: Emma Cramer, Jonas Reiher, Sebastian Trimpe

Reinforcement learning (RL) for robot control typically requires a detailed representation of the environment state, including information about task-relevant objects not directly measurable. Keypoint detectors, such as spatial autoencoders (SAEs), are a common approach to extracting a low-dimensional representation from high-dimensional image data. SAEs aim at spatial features such as object positions, which are often useful representations in robotic RL. However, whether an SAE is actually able to track objects in the scene and thus yields a spatial state representation well suited for RL tasks has rarely been examined due to a lack of established metrics. In this paper, we propose to assess the performance of an SAE instance by measuring how well keypoints track ground truth objects in images. We present a computationally lightweight metric and use it to evaluate common baseline SAE architectures on image data from a simulated robot task. We find that common SAEs differ substantially in their spatial extraction capability. Furthermore, we validate that SAEs that perform well in our metric achieve superior performance when used in downstream RL. Thus, our metric is an effective and lightweight indicator of RL performance before executing expensive RL training. Building on these insights, we identify three key modifications of SAE architectures to improve tracking performance. We make our code available at anonymous.4open.science/r/sae-rl.
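
In spirit, the metric matches each ground-truth object to its nearest keypoint per frame and averages the distances, so lower scores mean the SAE's keypoints follow the objects more faithfully. The sketch below (synthetic trajectories, illustrative shapes) shows such a computation, not the paper's exact definition:

```python
# Sketch: a nearest-keypoint tracking error for evaluating keypoint detectors.
import numpy as np

def tracking_error(keypoints, gt_positions):
    """keypoints: (frames, K, 2); gt_positions: (frames, objects, 2)."""
    errs = []
    for kp, gt in zip(keypoints, gt_positions):
        d = np.linalg.norm(gt[:, None, :] - kp[None, :, :], axis=-1)
        errs.append(d.min(axis=1).mean())     # nearest keypoint per object
    return float(np.mean(errs))

rng = np.random.default_rng(0)
gt = np.cumsum(rng.normal(0, 0.01, (50, 2, 2)), axis=0) + 0.5  # two moving objects
good = gt + rng.normal(0, 0.01, gt.shape)     # keypoints that track the objects
bad = rng.random((50, 8, 2))                  # keypoints placed at random
print("tracking SAE:", tracking_error(good, gt),
      "| random SAE:", tracking_error(bad, gt))
```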

Improving Plasticity in Online Continual Learning via Collaborative Learning. (arXiv:2312.00600v1 [cs.LG])

Authors: Maorong Wang, Nicolas Michel, Ling Xiao, Toshihiko Yamasaki

Online Continual Learning (CL) addresses the problem of learning ever-emerging new classification tasks from a continuous data stream. Unlike its offline counterpart, in online CL the training data can only be seen once. Most existing online CL research regards catastrophic forgetting (i.e., model stability) as almost the only challenge. In this paper, we argue that the model's capability to acquire new knowledge (i.e., model plasticity) is another challenge in online CL. While replay-based strategies have been shown to be effective in alleviating catastrophic forgetting, there is a notable gap in research attention toward improving model plasticity. To this end, we propose Collaborative Continual Learning (CCL), a collaborative learning based strategy to improve the model's capability in acquiring new concepts. Additionally, we introduce Distillation Chain (DC), a novel collaborative learning scheme to boost the training of the models. We adapt CCL-DC to existing representative online CL works. Extensive experiments demonstrate that even if the learners are well-trained with state-of-the-art online CL methods, our strategy can still improve model plasticity dramatically, and thereby improve the overall performance by a large margin.

Investigating a domain adaptation approach for integrating different measurement instruments in a longitudinal clinical registry. (arXiv:2312.00616v1 [cs.LG])

Authors: Maren Hackenberg, Michelle Pfaffenlehner, Max Behrens, Astrid Pechmann, Janbernd Kirschner, Harald Binder

In a longitudinal clinical registry, different measurement instruments might have been used for assessing individuals at different time points. To combine them, we investigate deep learning techniques for obtaining a joint latent representation, to which the items of different measurement instruments are mapped. This corresponds to domain adaptation, an established concept in computer science for image data. Using the proposed approach as an example, we evaluate the potential of domain adaptation in a longitudinal cohort setting with a rather small number of time points, motivated by an application with different motor function measurement instruments in a registry of spinal muscular atrophy (SMA) patients. There, we model trajectories in the latent representation by ordinary differential equations (ODEs), where person-specific ODE parameters are inferred from baseline characteristics. The goodness of fit and complexity of the ODE solutions then allow us to judge the measurement instrument mappings. We subsequently explore how alignment can be improved by incorporating corresponding penalty terms into model fitting. To systematically investigate the effect of differences between measurement instruments, we consider several scenarios based on modified SMA data, including scenarios where a mapping should be feasible in principle and scenarios where no perfect mapping is available. While misalignment increases in more complex scenarios, some structure is still recovered, even if the availability of measurement instruments depends on patient state. A reasonable mapping is feasible also in the more complex real SMA dataset. These results indicate that domain adaptation might be more generally useful in statistical modeling for longitudinal registry data.

Practical Path-based Bayesian Optimization. (arXiv:2312.00622v1 [cs.LG])

Authors: Jose Pablo Folch, James Odgers, Shiqiang Zhang, Robert M Lee, Behrang Shafei, David Walz, Calvin Tsay, Mark van der Wilk, Ruth Misener

There has been a surge in interest in data-driven experimental design with applications to chemical engineering and drug manufacturing. Bayesian optimization (BO) has proven to be adaptable to such cases, since we can model the reactions of interest as expensive black-box functions. Sometimes, the cost of these black-box functions can be separated into two parts: (a) the cost of the experiment itself, and (b) the cost of changing the input parameters. In this short paper, we extend the SnAKe algorithm to deal with both types of costs simultaneously. We further propose extensions to the case of a maximum allowable input change, as well as to the multi-objective setting.

Forecasting Trends in Food Security: a Reservoir Computing Approach. (arXiv:2312.00626v1 [cs.LG])

Authors: Joschka Herteux, Christoph Räth, Amine Baha, Giulia Martini, Duccio Piovani

Early warning systems are an essential tool for effective humanitarian action. Advance warnings on impending disasters facilitate timely and targeted response which help save lives, livelihoods, and scarce financial resources. In this work we present a new quantitative methodology to forecast levels of food consumption for 60 consecutive days, at the sub-national level, in four countries: Mali, Nigeria, Syria, and Yemen. The methodology is built on publicly available data from the World Food Programme's integrated global hunger monitoring system which collects, processes, and displays daily updates on key food security metrics, conflict, weather events, and other drivers of food insecurity across 90 countries (https://hungermap.wfp.org/). In this study, we assessed the performance of various models including ARIMA, XGBoost, LSTMs, CNNs, and Reservoir Computing (RC), by comparing their Root Mean Squared Error (RMSE) metrics. This comprehensive analysis spanned classical statistical, machine learning, and deep learning approaches. Our findings highlight Reservoir Computing as a particularly well-suited model in the field of food security given both its notable resistance to over-fitting on limited data samples and its efficient training capabilities. The methodology we introduce establishes the groundwork for a global, data-driven early warning system designed to anticipate and detect food insecurity.
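
For readers new to Reservoir Computing, a minimal echo state network illustrates why training is cheap and resistant to overfitting: only the linear readout is fit, here by ridge regression on a toy series (all sizes and scales below are illustrative, not the paper's configuration):

```python
# Sketch: a minimal echo state network with a ridge-regression readout.
import numpy as np

rng = np.random.default_rng(0)
T, n_res = 1000, 200
u = np.sin(np.arange(T + 1) * 0.1) + 0.1 * rng.normal(size=T + 1)  # toy series

W_in = rng.uniform(-0.5, 0.5, n_res)               # fixed input weights
W = rng.normal(0, 1, (n_res, n_res))               # fixed recurrent weights
W *= 0.9 / np.max(np.abs(np.linalg.eigvals(W)))    # spectral radius < 1

states = np.zeros((T, n_res))
x = np.zeros(n_res)
for t in range(T):
    x = np.tanh(W @ x + W_in * u[t])               # reservoir update
    states[t] = x

ridge = 1e-6                                       # readout: one-step-ahead
W_out = np.linalg.solve(states.T @ states + ridge * np.eye(n_res),
                        states.T @ u[1:])
pred = states @ W_out
print("train RMSE:", np.sqrt(np.mean((pred - u[1:]) ** 2)))
```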

EvE: Exploiting Generative Priors for Radiance Field Enrichment. (arXiv:2312.00639v1 [cs.CV])

Authors: Karim Kassab, Antoine Schnepf, Jean-Yves Franceschi, Laurent Caraffa, Jeremie Mary, Valérie Gouet-Brunet

Modeling large-scale scenes from unconstrained image collections in-the-wild has proven to be a major challenge in computer vision. Existing methods tackling in-the-wild neural rendering operate in a closed-world setting, where knowledge is limited to a scene's captured images within a training set. We propose EvE, which is, to the best of our knowledge, the first method leveraging generative priors to improve in-the-wild scene modeling. We employ pre-trained generative networks to enrich K-Planes representations with extrinsic knowledge. To this end, we define an alternating training procedure to conduct optimization guidance of K-Planes trained on the training set. We carry out extensive experiments and verify the merit of our method on synthetic data as well as real tourism photo collections. EvE enhances rendered scenes with richer details and outperforms the state of the art on the task of novel view synthesis in-the-wild. Our project page can be found at https://eve-nvs.github.io .

One to beat them all: ``RYU'' -- a unifying framework for the construction of safe balls. (arXiv:2312.00640v1 [math.OC])

Authors: Thu-Le Tran, Clément Elvira, Hong-Phuong Dang, Cédric Herzet

In this paper, we put forth a novel framework (named ``RYU'') for the construction of ``safe'' balls, i.e. regions that provably contain the dual solution of a target optimization problem. We concentrate on the standard setup where the cost function is the sum of two terms: a closed, proper, convex Lipschitz-smooth function and a closed, proper, convex function. The RYU framework is shown to generalize or improve upon all the results proposed in the last decade for the considered family of optimization problems.

Hashmarks: Privacy-Preserving Benchmarks for High-Stakes AI Evaluation. (arXiv:2312.00645v1 [cs.LG])

Authors: Paul Bricman

There is a growing need to gain insight into language model capabilities that relate to sensitive topics, such as bioterrorism or cyberwarfare. However, traditional open source benchmarks are not fit for the task, due to the associated practice of publishing the correct answers in human-readable form. At the same time, enforcing mandatory closed-quarters evaluations might stifle development and erode trust. In this context, we propose hashmarking, a protocol for evaluating language models in the open without having to disclose the correct answers. In its simplest form, a hashmark is a benchmark whose reference solutions have been cryptographically hashed prior to publication. Following an overview of the proposed evaluation protocol, we go on to assess its resilience against traditional attack vectors (e.g. rainbow table attacks), as well as against failure modes unique to increasingly capable generative models.
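
In its simplest form, the protocol is a few lines. The sketch below publishes a salted SHA-256 digest of a canonicalized answer and verifies candidates against it; salting is what blunts rainbow-table attacks, and a production hashmark would likely use a deliberately slow key-derivation function rather than plain SHA-256:

```python
# Sketch: publishing and checking a hashed benchmark reference answer.
import hashlib, os

def make_hashmark(answer, salt=None):
    salt = salt or os.urandom(16)
    digest = hashlib.sha256(salt + answer.strip().lower().encode()).hexdigest()
    return salt.hex(), digest          # this pair is what gets published

def check(candidate, salt_hex, digest):
    h = hashlib.sha256(bytes.fromhex(salt_hex) + candidate.strip().lower().encode())
    return h.hexdigest() == digest

salt, digest = make_hashmark("ribosome")
print(check("Ribosome ", salt, digest), check("mitochondria", salt, digest))
```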

Machine Learning for Health symposium 2023 -- Findings track. (arXiv:2312.00655v1 [cs.LG])

Authors: Stefan Hegselmann, Antonio Parziale, Divya Shanmugam, Shengpu Tang, Mercy Nyamewaa Asiedu, Serina Chang, Thomas Hartvigsen, Harvineet Singh

A collection of the accepted Findings papers that were presented at the 3rd Machine Learning for Health symposium (ML4H 2023), which was held on December 10, 2023, in New Orleans, Louisiana, USA. ML4H 2023 invited high-quality submissions on relevant problems in a variety of health-related disciplines including healthcare, biomedicine, and public health. Two submission tracks were offered: the archival Proceedings track, and the non-archival Findings track. Proceedings were targeted at mature work with strong technical sophistication and a high impact to health. The Findings track looked for new ideas that could spark insightful discussion, serve as valuable resources for the community, or could enable new collaborations. Submissions to the Proceedings track, if not accepted, were automatically considered for the Findings track. All the manuscripts submitted to ML4H Symposium underwent a double-blind peer-review process.

Simple Transferability Estimation for Regression Tasks. (arXiv:2312.00656v1 [cs.LG])

Authors: Cuong N. Nguyen, Phong Tran, Lam Si Tung Ho, Vu Dinh, Anh T. Tran, Tal Hassner, Cuong V. Nguyen

We consider transferability estimation, the problem of estimating how well deep learning models transfer from a source to a target task. We focus on regression tasks, which received little previous attention, and propose two simple and computationally efficient approaches that estimate transferability based on the negative regularized mean squared error of a linear regression model. We prove novel theoretical results connecting our approaches to the actual transferability of the optimal target models obtained from the transfer learning process. Despite their simplicity, our approaches significantly outperform existing state-of-the-art regression transferability estimators in both accuracy and efficiency. On two large-scale keypoint regression benchmarks, our approaches yield 12% to 36% better results on average while being at least 27% faster than previous state-of-the-art methods.
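
The linear-regression flavor of the estimator can be sketched directly: embed the target data with the frozen source model, fit a regularized linear head to the target labels, and score transferability by the negative regularized MSE. The exact regularization bookkeeping below is an assumption of the sketch, not the paper's formula:

```python
# Sketch: a negative-regularized-MSE transferability score for regression.
import numpy as np

def transferability_score(features, labels, lam=1.0):
    """features: (n, d) source-model embeddings; labels: (n,) target values."""
    n, d = features.shape
    w = np.linalg.solve(features.T @ features + lam * np.eye(d),
                        features.T @ labels)        # ridge-regression head
    mse = np.mean((features @ w - labels) ** 2)
    return -(mse + lam * (w @ w) / n)               # higher = better transfer

rng = np.random.default_rng(0)
feats = rng.normal(size=(300, 16))
aligned = feats @ rng.normal(size=16) + 0.1 * rng.normal(size=300)
random_y = rng.normal(size=300)
print(transferability_score(feats, aligned) > transferability_score(feats, random_y))
```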

Resource-constrained knowledge diffusion processes inspired by human peer learning. (arXiv:2312.00660v1 [cs.LG])

Authors: Ehsan Beikihassan, Amy K.Hoover, Ioannis Koutis, Ali Parviz, Niloofar Aghaieabiane

We consider a setting where a population of artificial learners is given, and the objective is to optimize aggregate measures of performance, under constraints on training resources. The problem is motivated by the study of peer learning in human educational systems. In this context, we study natural knowledge diffusion processes in networks of interacting artificial learners. By `natural', we mean processes that reflect human peer learning, where the students' internal state and learning process is mostly opaque, and the main degree of freedom lies in the formation of peer learning groups by a coordinator who can potentially evaluate the learners before assigning them to peer groups. Among other things, we empirically show that such processes indeed make effective use of the training resources, and enable the design of modular neural models that have the capacity to generalize without being prone to overfitting noisy labels.

Nonparametric Variational Regularisation of Pretrained Transformers. (arXiv:2312.00662v1 [cs.LG])

Authors: Fabio Fehr, James Henderson

The current paradigm of large-scale pre-training and fine-tuning Transformer large language models has led to significant improvements across the board in natural language processing. However, such large models are susceptible to overfitting their training data, and as a result the models perform poorly when the domain changes. Also, due to the models' scale, the cost of fine-tuning them to a new domain is large. Nonparametric Variational Information Bottleneck (NVIB) has been proposed as a regulariser for training cross-attention in Transformers, potentially addressing the overfitting problem. We extend the NVIB framework to replace all types of attention functions in Transformers, and show that existing pretrained Transformers can be reinterpreted as Nonparametric Variational (NV) models using a proposed identity initialisation. We then show that changing the initialisation introduces a novel, information-theoretic post-training regularisation in the attention mechanism, which improves out-of-domain generalisation without any training. This success supports the hypothesis that pretrained Transformers are implicitly NV Bayesian models.

CellMixer: Annotation-free Semantic Cell Segmentation of Heterogeneous Cell Populations. (arXiv:2312.00671v1 [cs.CV])

Authors: Mehdi Naouar, Gabriel Kalweit, Anusha Klett, Yannick Vogt, Paula Silvestrini, Diana Laura Infante Ramirez, Roland Mertelsmann, Joschka Boedecker, Maria Kalweit

In recent years, several unsupervised cell segmentation methods have been presented, trying to omit the requirement of laborious pixel-level annotations for the training of a cell segmentation model. Most if not all of these methods handle the instance segmentation task by focusing on the detection of different cell instances ignoring their type. While such models prove adequate for certain tasks, like cell counting, other applications require the identification of each cell's type. In this paper, we present CellMixer, an innovative annotation-free approach for the semantic segmentation of heterogeneous cell populations. Our augmentation-based method enables the training of a segmentation model from image-level labels of homogeneous cell populations. Our results show that CellMixer can achieve competitive segmentation performance across multiple cell types and imaging modalities, demonstrating the method's scalability and potential for broader applications in medical imaging, cellular biology, and diagnostics.

Towards Transparency in Coreference Resolution: A Quantum-Inspired Approach. (arXiv:2312.00688v1 [cs.CL])

Authors: Hadi Wazni, Mehrnoosh Sadrzadeh

Guided by grammatical structure, words compose to form sentences, and guided by discourse structure, sentences compose to form dialogues and documents. The compositional aspect of sentence and discourse units is often overlooked by machine learning algorithms. A recent initiative called Quantum Natural Language Processing (QNLP) learns word meanings as points in a Hilbert space and acts on them via a translation of grammatical structure into Parametrised Quantum Circuits (PQCs). Previous work extended the QNLP translation to discourse structure using points in a closure of Hilbert spaces. In this paper, we evaluate this translation on a Winograd-style pronoun resolution task. We train a Variational Quantum Classifier (VQC) for binary classification and implement an end-to-end pronoun resolution system. The simulations executed on IBMQ software converged with an F1 score of 87.20%. The model outperformed two out of three classical coreference resolution systems and approached the state-of-the-art SpanBERT. A mixed quantum-classical model further improved these results, with an F1 score increase of around 6%.

Object Detector Differences when using Synthetic and Real Training Data. (arXiv:2312.00694v1 [cs.CV])

Authors: Martin Georg Ljungqvist, Otto Nordander, Markus Skans, Arvid Mildner, Tony Liu, Pierre Nugues

To train well-performing generalizing neural networks, sufficiently large and diverse datasets are needed. Collecting data while adhering to privacy legislation becomes increasingly difficult, and annotating these large datasets is both a resource-heavy and time-consuming task. An approach to overcome these difficulties is to use synthetic data, since it is inherently scalable and can be automatically annotated. However, how training on synthetic data affects the layers of a neural network is still unclear. In this paper, we train the YOLOv3 object detector on real and synthetic images from city environments. We perform a similarity analysis using Centered Kernel Alignment (CKA) to explore the effects of training on synthetic data on a layer-wise basis. The analysis captures the architecture of the detector while showing both different and similar patterns between different models. With this similarity analysis we want to give insights on how training on synthetic data affects each layer and to give a better understanding of the inner workings of complex neural networks. The results show that the largest similarity between a detector trained on real data and a detector trained on synthetic data was in the early layers, and the largest difference was in the head part. The results also show that no major difference in performance or similarity could be seen between frozen and unfrozen backbone.
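
The similarity measure used here, linear CKA, is compact enough to state in code; the activation matrices below are synthetic stand-ins for the outputs of two detector layers on the same inputs:

```python
# Sketch: linear Centered Kernel Alignment between two activation matrices.
import numpy as np

def linear_cka(X, Y):
    """X: (n, d1) and Y: (n, d2) activations for the same n inputs."""
    X = X - X.mean(axis=0)                     # center features
    Y = Y - Y.mean(axis=0)
    hsic = np.linalg.norm(X.T @ Y, "fro") ** 2
    return hsic / (np.linalg.norm(X.T @ X, "fro") * np.linalg.norm(Y.T @ Y, "fro"))

rng = np.random.default_rng(0)
A = rng.normal(size=(128, 64))                 # e.g., a backbone layer
B = A @ rng.normal(size=(64, 32))              # a linearly related layer
C = rng.normal(size=(128, 32))                 # an unrelated layer
print(round(linear_cka(A, B), 3), round(linear_cka(A, C), 3))  # high vs. low
```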

GIFT: Generative Interpretable Fine-Tuning Transformers. (arXiv:2312.00700v1 [cs.CV])

Authors: Chinmay Savadikar, Xi Song, Tianfu Wu

We present GIFT (Generative Interpretable Fine-tuning Transformers) for fine-tuning pretrained (often large) Transformer models on downstream tasks in a parameter-efficient way with built-in interpretability. Our GIFT is a deep parameter-residual learning method, which addresses two problems in fine-tuning a pretrained Transformer model: where to apply parameter-efficient fine-tuning (PEFT) so as to be extremely lightweight yet sufficiently expressive, and how to learn the PEFT to better exploit the knowledge of the pretrained model in a direct way. For the former, we select the final projection (linear) layer in the multi-head self-attention of a Transformer model and verify its effectiveness. For the latter, in contrast to prior art that directly introduces new model parameters (often in low-rank approximation form) to be learned in fine-tuning with downstream data, we propose a method for learning to generate the fine-tuning parameters. Our GIFT is a hyper-Transformer which takes as input the pretrained parameters of the projection layer and generates its fine-tuning parameters using a proposed Parameter-to-Cluster Attention (PaCa). The PaCa results in a simple clustering-based forward explainer that plays the role of semantic segmentation in testing. In experiments, our proposed GIFT is tested on the VTAB benchmark and the fine-grained visual classification (FGVC) benchmark. It obtains significantly better performance than the prior art. Our code is available at https://github.com/savadikarc/gift

SpaCE: The Spatial Confounding Environment. (arXiv:2312.00710v1 [cs.LG])

Authors: Mauricio Tec, Ana Trisovic, Michelle Audirac, Sophie Woodward, Naeem Khoshnevis, Francesca Dominici

Spatial confounding poses a significant challenge in scientific studies involving spatial data, where unobserved spatial variables can influence both treatment and outcome, possibly leading to spurious associations. To address this problem, we introduce SpaCE: The Spatial Confounding Environment, the first toolkit to provide realistic benchmark datasets and tools for systematically evaluating causal inference methods designed to alleviate spatial confounding. Each dataset includes training data, true counterfactuals, a spatial graph with coordinates, and smoothness and confounding scores characterizing the effect of a missing spatial confounder. It also includes realistic semi-synthetic outcomes and counterfactuals, generated using state-of-the-art machine learning ensembles, following best practices for causal inference benchmarks. The datasets cover real treatment and covariates from diverse domains, including climate, health and social sciences. SpaCE facilitates an automated end-to-end pipeline, simplifying data loading, experimental setup, and the evaluation of machine learning and causal inference models. The SpaCE project provides several dozen datasets of diverse sizes and spatial complexity. It is publicly available as a Python package, encouraging community feedback and contributions.

Removing Biases from Molecular Representations via Information Maximization. (arXiv:2312.00718v1 [cs.LG])

Authors: Chenyu Wang, Sharut Gupta, Caroline Uhler, Tommi Jaakkola

High-throughput drug screening -- using cell imaging or gene expression measurements as readouts of drug effect -- is a critical tool in biotechnology to assess and understand the relationship between the chemical structure and biological activity of a drug. Since large-scale screens have to be divided into multiple experiments, a key difficulty is dealing with batch effects, which can introduce systematic errors and non-biological associations into the data. We propose InfoCORE, an Information maximization approach for COnfounder REmoval, to effectively deal with batch effects and obtain refined molecular representations. InfoCORE establishes a variational lower bound on the conditional mutual information of the latent representations given a batch identifier. It adaptively reweighs samples to equalize their implied batch distribution. Extensive experiments on drug screening data reveal InfoCORE's superior performance on a multitude of tasks, including molecular property prediction and molecule-phenotype retrieval. Additionally, we show that InfoCORE offers a versatile framework that resolves general distribution shifts and issues of data fairness by minimizing correlation with spurious features or removing sensitive attributes. The code is available at https://github.com/uhlerlab/InfoCORE.

Safe Reinforcement Learning in Tensor Reproducing Kernel Hilbert Space. (arXiv:2312.00727v1 [cs.LG])

Authors: Xiaoyuan Cheng, Boli Chen, Liz Varga, Yukun Hu

This paper delves into the problem of safe reinforcement learning (RL) in a partially observable environment with the aim of achieving safe-reachability objectives. In traditional partially observable Markov decision processes (POMDP), ensuring safety typically involves estimating the belief in latent states. However, accurately estimating an optimal Bayesian filter in a POMDP to infer latent states from observations in a continuous state space poses a significant challenge, largely due to the intractable likelihood. To tackle this issue, we propose a stochastic model-based approach that guarantees RL safety almost surely in the face of unknown system dynamics and partial observation environments. We leverage Predictive State Representations (PSR) and Reproducing Kernel Hilbert Spaces (RKHS) to represent future multi-step observations analytically, with provable results in this context. Furthermore, we derive essential operators from the kernel Bayes' rule, enabling the recursive estimation of future observations using various operators. Under the assumption of \textit{undercompleteness}, a polynomial sample complexity is established for the RL algorithm for infinite observation and action spaces, ensuring an $\epsilon$-suboptimal safe policy guarantee.

Scalable Meta-Learning with Gaussian Processes. (arXiv:2312.00742v1 [stat.ML])

Authors: Petru Tighineanu, Lukas Grossberger, Paul Baireuther, Kathrin Skubch, Stefan Falkner, Julia Vinogradska, Felix Berkenkamp

Meta-learning is a powerful approach that exploits historical data to quickly solve new tasks from the same distribution. In the low-data regime, methods based on the closed-form posterior of Gaussian processes (GP) together with Bayesian optimization have achieved high performance. However, these methods are either computationally expensive or introduce assumptions that hinder a principled propagation of uncertainty between task models. This may disrupt the balance between exploration and exploitation during optimization. In this paper, we develop ScaML-GP, a modular GP model for meta-learning that is scalable in the number of tasks. Our core contribution is a carefully designed multi-task kernel that enables hierarchical training and task scalability. Conditioning ScaML-GP on the meta-data exposes its modular nature, yielding a test-task prior that combines the posteriors of meta-task GPs. In synthetic and real-world meta-learning experiments, we demonstrate that ScaML-GP learns efficiently with both few and many meta-tasks.

Domain Adaptation: Learning Bounds and Algorithms. (arXiv:0902.3430v3 [cs.LG] UPDATED)

Authors: Yishay Mansour, Mehryar Mohri, Afshin Rostamizadeh

This paper addresses the general problem of domain adaptation which arises in a variety of applications where the distribution of the labeled sample available somewhat differs from that of the test data. Building on previous work by Ben-David et al. (2007), we introduce a novel distance between distributions, discrepancy distance, that is tailored to adaptation problems with arbitrary loss functions. We give Rademacher complexity bounds for estimating the discrepancy distance from finite samples for different loss functions. Using this distance, we derive novel generalization bounds for domain adaptation for a wide family of loss functions. We also present a series of novel adaptation bounds for large classes of regularization-based algorithms, including support vector machines and kernel ridge regression based on the empirical discrepancy. This motivates our analysis of the problem of minimizing the empirical discrepancy for various loss functions for which we also give novel algorithms. We report the results of preliminary experiments that demonstrate the benefits of our discrepancy minimization algorithms for domain adaptation.
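
For reference, the discrepancy distance can be written as follows (a standard formulation of this definition; $H$ is the hypothesis set and $L$ the loss function):

```latex
\mathrm{disc}_L(P, Q) \;=\; \max_{h,\, h' \in H}
\left| \, \mathbb{E}_{x \sim P}\!\left[ L\!\left(h'(x),\, h(x)\right) \right]
 - \mathbb{E}_{x \sim Q}\!\left[ L\!\left(h'(x),\, h(x)\right) \right] \right|
```

Because $H$ and $L$ are arbitrary, the same quantity covers regression losses such as the squared loss used in kernel ridge regression, which is what allows the bounds to go beyond 0-1 classification loss.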

ChebNet: Efficient and Stable Constructions of Deep Neural Networks with Rectified Power Units via Chebyshev Approximations. (arXiv:1911.05467v3 [cs.LG] UPDATED)

Authors: Shanshan Tang, Bo Li, Haijun Yu

In a previous study [B. Li, S. Tang and H. Yu, Commun. Comput. Phy. 27(2):379-411, 2020], it is shown that deep neural networks built with rectified power units (RePU) as activation functions can give better approximations of sufficiently smooth functions than those built with rectified linear units, by converting polynomial approximations given by power series into deep neural networks with optimal complexity and no approximation error. However, in practice, power series approximations are not easy to obtain due to the associated stability issues. In this paper, we propose a new and more stable way to construct RePU deep neural networks based on Chebyshev polynomial approximations. By using a hierarchical structure of Chebyshev polynomial approximation in the frequency domain, we obtain an efficient and stable deep neural network construction, which we call ChebNet. The approximation of smooth functions by ChebNets is no worse than the approximation by deep RePU nets using power series. At the same time, ChebNets are much more stable. Numerical results show that the constructed ChebNets can be further fine-tuned to obtain much better results than those obtained by tuning deep RePU nets constructed by the power series approach. As spectral accuracy is hard to obtain by direct training of deep neural networks, ChebNets provide a practical way to obtain spectral accuracy and are expected to be useful in real applications that require efficient approximations of smooth functions.
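
The ChebNet construction itself is paper-specific, but the numerical stability of Chebyshev interpolation for smooth functions, which the construction relies on, is easy to observe directly. A minimal sketch with numpy (the target function and degree are arbitrary illustrative choices):

```python
import numpy as np
from numpy.polynomial import chebyshev as C

# Approximate a smooth function on [-1, 1] with a Chebyshev expansion.
f = lambda x: np.exp(x) * np.sin(3 * x)

# Chebyshev interpolation at Chebyshev points: numerically stable and
# near-optimal for smooth functions, unlike raw power-series fitting.
p = C.Chebyshev.interpolate(f, deg=20)

x = np.linspace(-1, 1, 1000)
err = np.max(np.abs(f(x) - p(x)))
print(f"max abs error with degree 20: {err:.2e}")  # errors decay spectrally
```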

Fast Deep Mixtures of Gaussian Process Experts. (arXiv:2006.13309v4 [cs.LG] UPDATED)

Authors: Clement Etienam, Kody Law, Sara Wade, Vitaly Zankin

Mixtures of experts have become an indispensable tool for flexible modelling in a supervised learning context, allowing not only the mean function but the entire density of the output to change with the inputs. Sparse Gaussian processes (GP) have shown promise as a leading candidate for the experts in such models, and in this article, we propose to design the gating network for selecting the experts from such mixtures of sparse GPs using a deep neural network (DNN). Furthermore, a fast one-pass algorithm called Cluster-Classify-Regress (CCR) is leveraged to approximate the maximum a posteriori (MAP) estimator extremely quickly. This powerful combination of model and algorithm together delivers a novel method which is flexible, robust, and extremely efficient. In particular, the method is able to outperform competing methods in terms of accuracy and uncertainty quantification. The cost is competitive on low-dimensional and small datasets, but is significantly lower for higher-dimensional and big datasets. Iteratively maximizing the distribution of experts given allocations and allocations given experts does not provide significant improvement, which indicates that the algorithm achieves a good approximation to the local MAP estimator very fast. This insight can also be useful in the context of other mixture-of-experts models.
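
A minimal sketch of the cluster-classify-regress pattern, using off-the-shelf sklearn components as stand-ins (the paper uses a DNN gate and sparse GP experts; the cluster count, gate, and experts below are illustrative assumptions):

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.neural_network import MLPClassifier
from sklearn.gaussian_process import GaussianProcessRegressor

rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(500, 1))
y = np.where(X[:, 0] < 0, np.sin(3 * X[:, 0]), 0.3 * X[:, 0] ** 2)
y += 0.05 * rng.standard_normal(500)

# 1) Cluster: joint (x, y) clustering assigns each point to an expert.
K = 2
labels = KMeans(n_clusters=K, n_init=10, random_state=0).fit_predict(
    np.hstack([X, y[:, None]]))

# 2) Classify: a gating network learns to predict the expert from x alone.
gate = MLPClassifier(hidden_layer_sizes=(32,), max_iter=2000,
                     random_state=0).fit(X, labels)

# 3) Regress: fit one GP expert per cluster.
experts = [GaussianProcessRegressor().fit(X[labels == k], y[labels == k])
           for k in range(K)]

# Predict: route each test point to the expert chosen by the gate.
X_test = np.linspace(-3, 3, 7)[:, None]
assign = gate.predict(X_test)
y_hat = np.array([experts[k].predict(x[None, :])[0]
                  for k, x in zip(assign, X_test)])
print(y_hat)
```

Each stage is a single pass, which is what makes the MAP approximation so fast compared to iterating between expert fitting and allocation.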

Interpreting and Disentangling Feature Components of Various Complexity from DNNs. (arXiv:2006.15920v2 [cs.LG] UPDATED)

Authors: Jie Ren, Mingjie Li, Zexu Liu, Quanshi Zhang

This paper aims to define, quantify, and analyze the feature complexity that is learned by a DNN. We propose a generic definition for the feature complexity. Given the feature of a certain layer in the DNN, our method disentangles feature components of different complexity orders from the feature. We further design a set of metrics to evaluate the reliability, the effectiveness, and the significance of over-fitting of these feature components. Furthermore, we successfully discover a close relationship between the feature complexity and the performance of DNNs. As a generic mathematical tool, the feature complexity and the proposed metrics can also be used to analyze the success of network compression and knowledge distillation.

Decentralized policy learning with partial observation and mechanical constraints for multiperson modeling. (arXiv:2007.03155v2 [cs.LG] UPDATED)

Authors: Keisuke Fujii, Naoya Takeishi, Yoshinobu Kawahara, Kazuya Takeda

Extracting the rules of real-world multi-agent behaviors is a current challenge in various scientific and engineering fields. Biological agents independently have limited observation and mechanical constraints; however, most conventional data-driven models ignore such assumptions, resulting in a lack of biological plausibility and of model interpretability for behavioral analyses. Here we propose sequential generative models with partial observation and mechanical constraints in a decentralized manner, which can model agents' cognition and body dynamics and predict biologically plausible behaviors. We formulate this as a decentralized multi-agent imitation-learning problem, leveraging binary partial observation and decentralized policy models based on hierarchical variational recurrent neural networks with physical and biomechanical penalties. Using real-world basketball and soccer datasets, we show the effectiveness of our method in terms of constraint violations, long-term trajectory prediction, and partial observation. Our approach can be used as a multi-agent simulator to generate realistic trajectories using real-world data.

A Unified Approach to Interpreting and Boosting Adversarial Transferability. (arXiv:2010.04055v2 [cs.LG] UPDATED)

Authors: Xin Wang, Jie Ren, Shuyun Lin, Xiangming Zhu, Yisen Wang, Quanshi Zhang

In this paper, we use the interactions inside adversarial perturbations to explain and boost adversarial transferability. We discover and prove a negative correlation between adversarial transferability and the interactions inside adversarial perturbations. The negative correlation is further verified through different DNNs with various inputs. Moreover, this negative correlation can be regarded as a unified perspective to understand current transferability-boosting methods. To this end, we prove that some classic methods of enhancing transferability essentially decrease interactions inside adversarial perturbations. Based on this, we propose to directly penalize interactions during the attacking process, which significantly improves adversarial transferability.

Topological properties of basins of attraction and expressiveness of width bounded neural networks. (arXiv:2011.04923v6 [cs.LG] UPDATED)

Authors: Hans-Peter Beise, Steve Dias Da Cruz

In Radhakrishnan et al. [2020], the authors empirically show that autoencoders trained with usual SGD methods shape out basins of attraction around their training data. We consider network functions of width not exceeding the input dimension and prove that in this situation basins of attraction are bounded and their complement cannot have bounded components. Our conditions in these results are met in several experiments of the latter work and we thus address a question posed therein. We also show that under some more restrictive conditions the basins of attraction are path-connected. The tightness of the conditions in our results is demonstrated by means of several examples. Finally, the arguments used to prove the above results allow us to derive a root cause why scalar-valued neural network functions that fulfill our bounded width condition are not dense in spaces of continuous functions.

Optimal Stopping via Randomized Neural Networks. (arXiv:2104.13669v4 [stat.ML] UPDATED)

Authors: Calypso Herrera, Florian Krach, Pierre Ruyssen, Josef Teichmann

This paper presents the benefits of using randomized neural networks instead of standard basis functions or deep neural networks to approximate the solutions of optimal stopping problems. The key idea is to use neural networks, where the parameters of the hidden layers are generated randomly and only the last layer is trained, in order to approximate the continuation value. Our approaches are applicable to high dimensional problems where the existing approaches become increasingly impractical. In addition, since our approaches can be optimized using simple linear regression, they are easy to implement and theoretical guarantees can be provided. We test our approaches for American option pricing on Black--Scholes, Heston and rough Heston models and for optimally stopping a fractional Brownian motion. In all cases, our algorithms outperform the state-of-the-art and other relevant machine learning approaches in terms of computation time while achieving comparable results. Moreover, we show that they can also be used to efficiently compute Greeks of American options.
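
The core trick, random untrained hidden weights with only a linear output layer fit by (ridge) regression, can be sketched on a generic regression task; this is not the full optimal-stopping algorithm, just the function approximator it relies on:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy regression data standing in for (state -> continuation value) pairs.
X = rng.standard_normal((1000, 5))
y = np.sin(X @ rng.standard_normal(5)) + 0.1 * rng.standard_normal(1000)

# Random hidden layer: weights are sampled once and never trained.
n_hidden = 200
W = rng.standard_normal((5, n_hidden))
b = rng.standard_normal(n_hidden)
H = np.tanh(X @ W + b)  # random features

# Only the output layer is fit, via ridge-regularized linear regression.
lam = 1e-3
beta = np.linalg.solve(H.T @ H + lam * np.eye(n_hidden), H.T @ y)

y_hat = H @ beta
print("train RMSE:", np.sqrt(np.mean((y - y_hat) ** 2)))
```

Because training reduces to a linear solve, fitting one such network per exercise date stays cheap even in high dimensions.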

PyTorch Geometric Signed Directed: A Software Package on Graph Neural Networks for Signed and Directed Graphs. (arXiv:2202.10793v6 [cs.LG] UPDATED)

Authors: Yixuan He, Xitong Zhang, Junjie Huang, Benedek Rozemberczki, Mihai Cucuringu, Gesine Reinert

Networks are ubiquitous in many real-world applications (e.g., social networks encoding trust/distrust relationships, correlation networks arising from time series data). While many networks are signed or directed, or both, there is a lack of unified software packages on graph neural networks (GNNs) specially designed for signed and directed networks. In this paper, we present PyTorch Geometric Signed Directed (PyGSD), a software package which fills this gap. Along the way, we evaluate the implemented methods with experiments with a view to providing insights into which method to choose for a given task. The deep learning framework consists of easy-to-use GNN models, synthetic and real-world data, as well as task-specific evaluation metrics and loss functions for signed and directed networks. As an extension library for PyG, our proposed software is maintained with open-source releases, detailed documentation, continuous integration, unit tests and code coverage checks. The GitHub repository of the library is https://github.com/SherylHYX/pytorch_geometric_signed_directed.

Bandwidth Selection for Gaussian Kernel Ridge Regression via Jacobian Control. (arXiv:2205.11956v4 [stat.ML] UPDATED)

Authors: Oskar Allerbo, Rebecka Jörnsten

Most machine learning methods require tuning of hyper-parameters. For kernel ridge regression with the Gaussian kernel, the hyper-parameter is the bandwidth. The bandwidth specifies the length scale of the kernel and has to be carefully selected to obtain a model with good generalization. The default methods for bandwidth selection, cross-validation and marginal likelihood maximization, often yield good results, albeit at high computational costs. Inspired by Jacobian regularization, we formulate an approximate expression for how the derivatives of the functions inferred by kernel ridge regression with the Gaussian kernel depend on the kernel bandwidth. We use this expression to propose a closed-form, computationally feather-light bandwidth selection heuristic based on controlling the Jacobian. In addition, the Jacobian expression illuminates how the bandwidth selection is a trade-off between the smoothness of the inferred function and the conditioning of the training data kernel matrix. We show on real and synthetic data that, compared to cross-validation and marginal likelihood maximization, our method is on par in terms of model performance, but up to six orders of magnitude faster.
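
The paper's closed-form Jacobian-based heuristic is not reproduced here, but the cross-validation baseline it is compared against can be sketched with sklearn; note that sklearn's RBF kernel is parameterized as exp(-gamma * ||x - x'||^2), so a bandwidth ell corresponds to gamma = 1/(2 * ell**2):

```python
import numpy as np
from sklearn.kernel_ridge import KernelRidge
from sklearn.model_selection import GridSearchCV

rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(200, 1))
y = np.sin(2 * X[:, 0]) + 0.1 * rng.standard_normal(200)

# Grid of candidate bandwidths ell, translated into sklearn's gamma.
bandwidths = np.logspace(-2, 1, 20)
params = {"gamma": 1.0 / (2.0 * bandwidths**2)}
search = GridSearchCV(KernelRidge(kernel="rbf", alpha=1e-2), params, cv=5)
search.fit(X, y)

best_gamma = search.best_params_["gamma"]
print("selected bandwidth:", np.sqrt(1.0 / (2.0 * best_gamma)))
```

The cost of this baseline scales with the grid size times the number of folds, each involving a kernel solve, which is exactly the expense a closed-form heuristic avoids.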

Classification Utility, Fairness, and Compactness via Tunable Information Bottleneck and R\'enyi Measures. (arXiv:2206.10043v3 [cs.LG] UPDATED)

Authors: Adam Gronowski, William Paul, Fady Alajaji, Bahman Gharesifard, Philippe Burlina

Designing machine learning algorithms that are accurate yet fair, not discriminating based on any sensitive attribute, is of paramount importance for society to accept AI for critical applications. In this article, we propose a novel fair representation learning method termed the R\'enyi Fair Information Bottleneck Method (RFIB) which incorporates constraints for utility, fairness, and compactness (compression) of representation, and apply it to image and tabular data classification. A key attribute of our approach is that we consider - in contrast to most prior work - both demographic parity and equalized odds as fairness constraints, allowing for a more nuanced satisfaction of both criteria. Leveraging a variational approach, we show that our objectives yield a loss function involving classical Information Bottleneck (IB) measures and establish an upper bound in terms of two R\'enyi measures of order $\alpha$ on the mutual information IB term measuring compactness between the input and its encoded embedding. We study the influence of the $\alpha$ parameter as well as two other tunable IB parameters on achieving utility/fairness trade-off goals, and show that the $\alpha$ parameter gives an additional degree of freedom that can be used to control the compactness of the representation. Experimenting on three different image datasets (EyePACS, CelebA, and FairFace) and two tabular datasets (Adult and COMPAS), using both binary and categorical sensitive attributes, we show that on various utility, fairness, and compound utility/fairness metrics RFIB outperforms current state-of-the-art approaches.

Identify ambiguous tasks combining crowdsourced labels by weighting Areas Under the Margin. (arXiv:2209.15380v3 [cs.LG] UPDATED)

Authors: Tanguy Lefort, Benjamin Charlier, Alexis Joly, Joseph Salmon

In supervised learning - for instance in image classification - modern massive datasets are commonly labeled by a crowd of workers. The obtained labels in this crowdsourcing setting are then aggregated for training, generally leveraging a per-worker trust score. Yet, such worker-oriented approaches discard the tasks' ambiguity. Ambiguous tasks might fool expert workers, which is often harmful for the learning step. In standard supervised learning settings - with one label per task - the Area Under the Margin (AUM) was tailored to identify mislabeled data. We adapt the AUM to identify ambiguous tasks in crowdsourced learning scenarios, introducing the Weighted Areas Under the Margin (WAUM). The WAUM is an average of AUMs weighted according to task-dependent scores. We show that the WAUM can help discard ambiguous tasks from the training set, leading to better generalization performance. We report improvements over existing strategies for learning with a crowd, both on simulated settings and on real datasets such as CIFAR-10H (a crowdsourced dataset with a high number of answered labels), LabelMe, and Music (two datasets with few answered votes).
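
For reference, the per-sample AUM (which the WAUM then averages with task-dependent weights; the weighting itself is not reproduced here) can be computed from logits recorded during training. A minimal sketch:

```python
import numpy as np

def aum(logits_over_epochs: np.ndarray, assigned: np.ndarray) -> np.ndarray:
    """Area Under the Margin.

    logits_over_epochs: (n_epochs, n_samples, n_classes) logits recorded
        at each training epoch.
    assigned: (n_samples,) assigned (possibly noisy/aggregated) labels.
    Returns the per-sample AUM: the mean over epochs of
    (logit of assigned label) - (largest other logit).
    """
    n_epochs, n_samples, _ = logits_over_epochs.shape
    idx = np.arange(n_samples)
    margins = np.empty((n_epochs, n_samples))
    for t in range(n_epochs):
        z = logits_over_epochs[t]
        z_y = z[idx, assigned]
        z_other = z.copy()
        z_other[idx, assigned] = -np.inf  # exclude the assigned label
        margins[t] = z_y - z_other.max(axis=1)
    # Low or negative AUM flags ambiguous or mislabeled samples.
    return margins.mean(axis=0)
```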

Defects of Convolutional Decoder Networks in Frequency Representation. (arXiv:2210.09020v2 [cs.LG] UPDATED)

Authors: Ling Tang, Wen Shen, Zhanpeng Zhou, Yuefeng Chen, Quanshi Zhang

In this paper, we prove the representation defects of a cascaded convolutional decoder network, considering the capacity of representing different frequency components of an input sample. We conduct the discrete Fourier transform on each channel of the feature map in an intermediate layer of the decoder network. Then, we extend the 2D circular convolution theorem to represent the forward and backward propagations through convolutional layers in the frequency domain. Based on this, we prove three defects in representing feature spectrums. First, we prove that the convolution operation, the zero-padding operation, and a set of other settings all make a convolutional decoder network more likely to weaken high-frequency components. Second, we prove that the upsampling operation generates a feature spectrum, in which strong signals repetitively appear at certain frequencies. Third, we prove that if the frequency components in the input sample and frequency components in the target output for regression have a small shift, then the decoder usually cannot be effectively learned.
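
The 2D circular convolution theorem that underpins this frequency-domain analysis is easy to verify numerically, independently of the paper's constructions:

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.standard_normal((8, 8))   # "feature map"
k = rng.standard_normal((8, 8))   # "kernel", same size (zero-padded)

# Circular convolution computed directly...
direct = np.zeros((8, 8))
for u in range(8):
    for v in range(8):
        for i in range(8):
            for j in range(8):
                direct[u, v] += x[i, j] * k[(u - i) % 8, (v - j) % 8]

# ...equals the inverse FFT of the elementwise product of the FFTs.
via_fft = np.real(np.fft.ifft2(np.fft.fft2(x) * np.fft.fft2(k)))
print(np.allclose(direct, via_fft))  # True
```

With this identity, propagation through a circularly padded convolutional layer becomes an elementwise product per frequency, which is the representation the proofs reason about.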

Automatic Diagnosis of Myocarditis Disease in Cardiac MRI Modality using Deep Transformers and Explainable Artificial Intelligence. (arXiv:2210.14611v2 [cs.CV] UPDATED)

Authors: Mahboobeh Jafari, Afshin Shoeibi, Navid Ghassemi, Jonathan Heras, Sai Ho Ling, Amin Beheshti, Yu-Dong Zhang, Shui-Hua Wang, Roohallah Alizadehsani, Juan M. Gorriz, U. Rajendra Acharya, Hamid Alinejad Rokny

Myocarditis is a significant cardiovascular disease (CVD) that poses a threat to the health of many individuals by causing damage to the myocardium. The occurrence of microbes and viruses, including the likes of HIV, plays a crucial role in the development of myocarditis disease (MCD). The images produced during cardiac magnetic resonance imaging (CMRI) scans have low contrast, which can make it challenging to diagnose cardiovascular diseases. On the other hand, checking numerous CMRI slices for each CVD patient can be a challenging task for medical doctors. To overcome these existing challenges, researchers have suggested the use of artificial intelligence (AI)-based computer-aided diagnosis systems (CADS). The presented paper outlines a CADS for the detection of MCD from CMR images, utilizing deep learning (DL) methods. The proposed CADS consists of several steps, including dataset selection, preprocessing, feature extraction, classification, and post-processing. First, the Z-Alizadeh dataset was selected for the experiments. Subsequently, the CMR images underwent various preprocessing steps, including denoising, resizing, and data augmentation (DA) via CutMix and MixUp techniques. Next, the most recent deep pre-trained and transformer models were used for feature extraction and classification of the CMR images. The findings of our study reveal that transformer models exhibit superior performance in detecting MCD compared to pre-trained architectures. In terms of DL architectures, the Turbulence Neural Transformer (TNT) model exhibited impressive accuracy, reaching 99.73% using a 10-fold cross-validation approach. Additionally, to pinpoint areas of suspicion for MCD in CMRI images, the Grad-CAM explainability method was employed.

RaLiBEV: Radar and LiDAR BEV Fusion Learning for Anchor Box Free Object Detection Systems. (arXiv:2211.06108v4 [cs.CV] UPDATED)

Authors: Yanlong Yang, Jianan Liu, Tao Huang, Qing-Long Han, Gang Ma, Bing Zhu

In autonomous driving, LiDAR and radar play important roles in the perception of the surrounding environment. LiDAR provides accurate 3D spatial sensing information but cannot work in adverse weather like fog. On the other hand, the radar signal can be diffracted when encountering raindrops or mist particles owing to its wavelength, but it suffers from large noise. Recent state-of-the-art works reveal that fusion of radar and LiDAR can lead to robust detection in adverse weather. The existing works adopt convolutional neural network architectures to extract features from each sensor's data, then align and aggregate the two branch features to predict object detection results. However, these methods estimate bounding boxes with low accuracy due to the simple design of their label assignment and fusion strategies. In this paper, we propose a bird's-eye view fusion learning-based anchor box-free object detection system, which fuses the features derived from the radar range-azimuth heatmap and the LiDAR point cloud to estimate possible objects. Different label assignment strategies have been designed to facilitate the consistency between the classification of foreground or background anchor points and the corresponding bounding box regressions. Furthermore, the performance of the proposed object detector is further enhanced by employing a novel interactive transformer module. The superior performance of the methods proposed in this paper has been demonstrated using the recently published Oxford Radar RobotCar dataset. Our system's average precision significantly outperforms the state-of-the-art method by 13.1% and 19.0% at an IoU of 0.8 under 'Clear+Foggy' training conditions for 'Clear' and 'Foggy' testing, respectively.

Bayesian Learning with Information Gain Provably Bounds Risk for a Robust Adversarial Defense. (arXiv:2212.02003v2 [cs.LG] UPDATED)

Authors: Bao Gia Doan, Ehsan Abbasnejad, Javen Qinfeng Shi, Damith C. Ranasinghe

We present a new algorithm to learn a deep neural network model robust against adversarial attacks. Previous algorithms demonstrate that an adversarially trained Bayesian Neural Network (BNN) provides improved robustness. We recognize that the adversarial learning approach for approximating the multi-modal posterior distribution of a Bayesian model can lead to mode collapse; consequently, the model's achievements in robustness and performance are sub-optimal. Instead, we first propose preventing mode collapse to better approximate the multi-modal posterior distribution. Second, based on the intuition that a robust model should ignore perturbations and only consider the informative content of the input, we conceptualize and formulate an information gain objective to measure and force the information learned from both benign and adversarial training instances to be similar. Importantly, we prove and demonstrate that minimizing the information gain objective allows the adversarial risk to approach the conventional empirical risk. We believe our efforts provide a step toward a basis for a principled method of adversarially training BNNs. Our model demonstrates significantly improved robustness--up to 20%--compared with adversarial training and Adv-BNN under PGD attacks with 0.035 distortion on both CIFAR-10 and STL-10 datasets.

Adaptive Deep Neural Network Inference Optimization with EENet. (arXiv:2301.07099v2 [cs.LG] UPDATED)

Authors: Fatih Ilhan, Ka-Ho Chow, Sihao Hu, Tiansheng Huang, Selim Tekin, Wenqi Wei, Yanzhao Wu, Myungjin Lee, Ramana Kompella, Hugo Latapie, Gaowen Liu, Ling Liu

Well-trained deep neural networks (DNNs) treat all test samples equally during prediction. Adaptive DNN inference with early exiting leverages the observation that some test examples can be easier to predict than others. This paper presents EENet, a novel early-exiting scheduling framework for multi-exit DNN models. Instead of having every sample go through all DNN layers during prediction, EENet learns an early-exit scheduler, which can intelligently terminate inference early for predictions in which the model has high confidence. As opposed to previous early-exiting solutions based on heuristics, our EENet framework optimizes an early-exiting policy to maximize model accuracy while satisfying the given per-sample average inference budget. Extensive experiments are conducted on four computer vision datasets (CIFAR-10, CIFAR-100, ImageNet, Cityscapes) and two NLP datasets (SST-2, AgNews). The results demonstrate that adaptive inference by EENet can outperform representative existing early-exit techniques. We also perform a detailed visualization analysis of the comparison results to interpret the benefits of EENet.

Improving Open-Set Semi-Supervised Learning with Self-Supervision. (arXiv:2301.10127v3 [cs.LG] UPDATED)

Authors: Erik Wallin, Lennart Svensson, Fredrik Kahl, Lars Hammarstrand

Open-set semi-supervised learning (OSSL) embodies a practical scenario within semi-supervised learning, wherein the unlabeled training set encompasses classes absent from the labeled set. Many existing OSSL methods assume that these out-of-distribution data are harmful and put effort into excluding data belonging to unknown classes from the training objective. In contrast, we propose an OSSL framework that facilitates learning from all unlabeled data through self-supervision. Additionally, we utilize an energy-based score to accurately recognize data belonging to the known classes, making our method well-suited for handling uncurated data in deployment. We show through extensive experimental evaluations that our method yields state-of-the-art results on many of the evaluated benchmark problems in terms of closed-set accuracy and open-set recognition when compared with existing methods for OSSL. Our code is available at https://github.com/walline/ssl-tf2-sefoss.

CHeart: A Conditional Spatio-Temporal Generative Model for Cardiac Anatomy. (arXiv:2301.13098v3 [eess.IV] UPDATED)

Authors: Mengyun Qiao, Shuo Wang, Huaqi Qiu, Antonio de Marvao, Declan P. O'Regan, Daniel Rueckert, Wenjia Bai

Two key questions in cardiac image analysis are to assess the anatomy and motion of the heart from images; and to understand how they are associated with non-imaging clinical factors such as gender, age and diseases. While the first question can often be addressed by image segmentation and motion tracking algorithms, our capability to model and to answer the second question is still limited. In this work, we propose a novel conditional generative model to describe the 4D spatio-temporal anatomy of the heart and its interaction with non-imaging clinical factors. The clinical factors are integrated as the conditions of the generative modelling, which allows us to investigate how these factors influence the cardiac anatomy. We evaluate the model performance mainly on two tasks, anatomical sequence completion and sequence generation. The model achieves a high performance in anatomical sequence completion, comparable to or outperforming other state-of-the-art generative models. In terms of sequence generation, given clinical conditions, the model can generate realistic synthetic 4D sequential anatomies that share similar distributions with the real data.

Timewarp: Transferable Acceleration of Molecular Dynamics by Learning Time-Coarsened Dynamics. (arXiv:2302.01170v2 [stat.ML] UPDATED)

Authors: Leon Klein, Andrew Y. K. Foong, Tor Erlend Fjelde, Bruno Mlodozeniec, Marc Brockschmidt, Sebastian Nowozin, Frank Noé, Ryota Tomioka

Molecular dynamics (MD) simulation is a widely used technique to simulate molecular systems, most commonly at the all-atom resolution where equations of motion are integrated with timesteps on the order of femtoseconds ($1\textrm{fs}=10^{-15}\textrm{s}$). MD is often used to compute equilibrium properties, which requires sampling from an equilibrium distribution such as the Boltzmann distribution. However, many important processes, such as binding and folding, occur over timescales of milliseconds or beyond, and cannot be efficiently sampled with conventional MD. Furthermore, new MD simulations need to be performed for each molecular system studied. We present Timewarp, an enhanced sampling method which uses a normalising flow as a proposal distribution in a Markov chain Monte Carlo method targeting the Boltzmann distribution. The flow is trained offline on MD trajectories and learns to make large steps in time, simulating the molecular dynamics of $10^{5} - 10^{6}\:\textrm{fs}$. Crucially, Timewarp is transferable between molecular systems: once trained, we show that it generalises to unseen small peptides (2-4 amino acids) at all-atom resolution, exploring their metastable states and providing wall-clock acceleration of sampling compared to standard MD. Our method constitutes an important step towards general, transferable algorithms for accelerating MD.
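
For reference, using a learned flow $q$ as the proposal in Metropolis-Hastings while targeting the Boltzmann distribution $\mu(x) \propto e^{-U(x)/k_B T}$ gives the standard acceptance probability (the flow's exact-likelihood property is what makes $q(x' \mid x)$ computable):

```latex
\alpha(x \to x') \;=\; \min\!\left( 1,\;
\frac{\mu(x')\, q(x \mid x')}{\mu(x)\, q(x' \mid x)} \right),
\qquad \mu(x) \propto e^{-U(x)/k_B T}
```

This guarantees the chain targets the Boltzmann distribution regardless of how accurate the flow is; the quality of the learned proposal only affects sampling efficiency, not correctness.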

Transfer Learning Enhanced Full Waveform Inversion. (arXiv:2302.11259v2 [cs.LG] UPDATED)

Authors: Stefan Kollmannsberger, Divya Singh, Leon Herrmann

We propose a way to favorably employ neural networks in the field of non-destructive testing using Full Waveform Inversion (FWI). The presented methodology discretizes the unknown material distribution in the domain with a neural network within an adjoint optimization. To further increase efficiency of the FWI, pretrained neural networks are used to provide a good starting point for the inversion. This reduces the number of iterations in the Full Waveform Inversion for specific, yet generalizable settings.

Does a Neural Network Really Encode Symbolic Concepts?. (arXiv:2302.13080v2 [cs.LG] UPDATED)

Authors: Mingjie Li, Quanshi Zhang

Recently, a series of studies have tried to extract interactions between input variables modeled by a DNN and define such interactions as concepts encoded by the DNN. However, strictly speaking, there is still no solid guarantee that such interactions indeed represent meaningful concepts. Therefore, in this paper, we examine the trustworthiness of interaction concepts from four perspectives. Extensive empirical studies have verified that a well-trained DNN usually encodes sparse, transferable, and discriminative concepts, which is partially aligned with human intuition.

Bayesian Neural Networks Avoid Encoding Complex and Perturbation-Sensitive Concepts. (arXiv:2302.13095v2 [cs.LG] UPDATED)

Authors: Qihan Ren, Huiqi Deng, Yunuo Chen, Siyu Lou, Quanshi Zhang

In this paper, we focus on mean-field variational Bayesian Neural Networks (BNNs) and explore the representation capacity of such BNNs by investigating which types of concepts are less likely to be encoded by the BNN. It has been observed and studied that a relatively small set of interactive concepts usually emerge in the knowledge representation of a sufficiently-trained neural network, and such concepts can faithfully explain the network output. Based on this, our study proves that compared to standard deep neural networks (DNNs), it is less likely for BNNs to encode complex concepts. Experiments verify our theoretical proofs. Note that the tendency to encode less complex concepts does not necessarily imply weak representation power, considering that complex concepts exhibit low generalization power and high adversarial vulnerability. The code is available at https://github.com/sjtu-xai-lab/BNN-concepts.

Active Learning and Bayesian Optimization: a Unified Perspective to Learn with a Goal. (arXiv:2303.01560v3 [cs.LG] UPDATED)

Authors: Francesco Di Fiore, Michela Nardelli, Laura Mainini

Science and engineering applications are typically associated with expensive optimization problems to identify optimal design solutions and states of the system of interest. Bayesian optimization and active learning compute surrogate models through efficient adaptive sampling schemes to assist and accelerate this search task toward a given optimization goal. Both methodologies are driven by specific infill/learning criteria which quantify the utility, with respect to the set goal, of evaluating the objective function for unknown combinations of optimization variables. While the two fields have seen an exponential growth in popularity in the past decades, their dualism and synergy have received relatively little attention to date. This paper discusses and formalizes the synergy between Bayesian optimization and active learning as symbiotic adaptive sampling methodologies driven by common principles. In particular, we demonstrate this unified perspective through the formalization of the analogy between the Bayesian infill criteria and active learning criteria as driving principles of both goal-driven procedures. To support our original perspective, we propose a general classification of adaptive sampling techniques to highlight similarities and differences between the vast families of adaptive sampling, active learning, and Bayesian optimization. Accordingly, the synergy is demonstrated by mapping the Bayesian infill criteria to the active learning criteria, and is formalized for searches informed by both a single information source and multiple levels of fidelity. In addition, we provide guidelines for applying those learning criteria, investigating the performance of different Bayesian schemes on a variety of benchmark problems to highlight benefits and limitations with respect to the mathematical properties that characterize real-world applications.
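
As a concrete instance of a Bayesian infill criterion, expected improvement under a Gaussian posterior has a closed form. A minimal sketch (minimization convention, with an exploration parameter xi):

```python
import numpy as np
from scipy.stats import norm

def expected_improvement(mu, sigma, f_best, xi=0.01):
    """EI for minimization: E[max(f_best - f(x) - xi, 0)]
    under a Gaussian posterior N(mu, sigma^2) at each candidate."""
    sigma = np.maximum(sigma, 1e-12)  # guard against zero variance
    z = (f_best - mu - xi) / sigma
    return (f_best - mu - xi) * norm.cdf(z) + sigma * norm.pdf(z)

# Example: pick the next sample where EI is largest.
mu = np.array([0.2, 0.0, -0.1])      # posterior means at candidates
sigma = np.array([0.05, 0.3, 0.1])   # posterior standard deviations
print(expected_improvement(mu, sigma, f_best=0.1).argmax())
```

The same structure, a utility computed from the surrogate's mean and variance and maximized over candidates, is what the paper maps onto active learning criteria.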

Bayesian CART models for insurance claims frequency. (arXiv:2303.01923v3 [stat.ML] UPDATED)

Authors: Yaojun Zhang, Lanpeng Ji, Georgios Aivaliotis, Charles Taylor

Accuracy and interpretability of a (non-life) insurance pricing model are essential qualities to ensure fair and transparent premiums for policy-holders that reflect their risk. In recent years, classification and regression trees (CARTs) and their ensembles have gained popularity in the actuarial literature, since they offer good prediction performance and are relatively easy to interpret. In this paper, we introduce Bayesian CART models for insurance pricing, with a particular focus on claims frequency modelling. In addition to the common Poisson and negative binomial (NB) distributions used for claims frequency, we implement Bayesian CART for the zero-inflated Poisson (ZIP) distribution to address the difficulty arising from imbalanced insurance claims data. To this end, we introduce a general MCMC algorithm using data augmentation methods for posterior tree exploration. We also introduce the deviance information criterion (DIC) for tree model selection. The proposed models are able to identify trees which can better classify policy-holders into risk groups. Simulations and real insurance data are discussed to illustrate the applicability of these models.
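
For reference, the zero-inflated Poisson places extra mass at zero via a mixing probability $\pi$, which is what makes it suited to claims data dominated by zero counts:

```latex
P(N = 0) = \pi + (1 - \pi)\, e^{-\lambda}, \qquad
P(N = k) = (1 - \pi)\, \frac{e^{-\lambda} \lambda^{k}}{k!}, \quad k = 1, 2, \dots
```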

Use Perturbations when Learning from Explanations. (arXiv:2303.06419v3 [cs.LG] UPDATED)

Authors: Juyeon Heo, Vihari Piratla, Matthew Wicker, Adrian Weller

Machine learning from explanations (MLX) is an approach to learning that uses human-provided explanations of relevant or irrelevant features for each input to ensure that model predictions are right for the right reasons. Existing MLX approaches rely on local model interpretation methods and require strong model smoothing to align model and human explanations, leading to sub-optimal performance. We recast MLX as a robustness problem, where human explanations specify a lower dimensional manifold from which perturbations can be drawn, and show both theoretically and empirically how this approach alleviates the need for strong model smoothing. We consider various approaches to achieving robustness, leading to improved performance over prior MLX methods. Finally, we show how to combine robustness with an earlier MLX method, yielding state-of-the-art results on both synthetic and real-world benchmarks.

A machine-learning approach to thunderstorm forecasting through post-processing of simulation data. (arXiv:2303.08736v2 [physics.ao-ph] UPDATED)

Authors: Kianusch Vahid Yousefnia, Tobias Bölle, Isabella Zöbisch, Thomas Gerz

Thunderstorms pose a major hazard to society and economy, which calls for reliable thunderstorm forecasts. In this work, we introduce SALAMA, a feedforward neural network model for identifying thunderstorm occurrence in numerical weather prediction (NWP) data. The model is trained on convection-resolving ensemble forecasts over Central Europe and lightning observations. Given only a set of pixel-wise input parameters that are extracted from NWP data and related to thunderstorm development, SALAMA infers the probability of thunderstorm occurrence in a reliably calibrated manner. For lead times up to eleven hours, we find a forecast skill superior to classification based only on NWP reflectivity. Varying the spatiotemporal criteria by which we associate lightning observations with NWP data, we show that the time scale for skillful thunderstorm predictions increases linearly with the spatial scale of the forecast.

HarsanyiNet: Computing Accurate Shapley Values in a Single Forward Propagation. (arXiv:2304.01811v2 [cs.LG] UPDATED)

Authors: Lu Chen, Siyu Lou, Keyan Zhang, Jin Huang, Quanshi Zhang

The Shapley value is widely regarded as a trustworthy attribution metric. However, when people use Shapley values to explain the attribution of input variables of a deep neural network (DNN), it usually requires a very high computational cost to approximate relatively accurate Shapley values in real-world applications. Therefore, we propose a novel network architecture, the HarsanyiNet, which makes inferences on the input sample and simultaneously computes the exact Shapley values of the input variables in a single forward propagation. The HarsanyiNet is designed on the theoretical foundation that the Shapley value can be reformulated as the redistribution of Harsanyi interactions encoded by the network.
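
HarsanyiNet's single-forward-pass computation is architecture-specific, but the quantity it targets, the Shapley value, can be computed exactly by enumeration when the number of input variables is small. A minimal sketch:

```python
from itertools import combinations
from math import factorial

def shapley(n, v):
    """Exact Shapley values for n players and value function v(frozenset)."""
    players = range(n)
    phi = [0.0] * n
    for i in players:
        others = [p for p in players if p != i]
        for r in range(len(others) + 1):
            for S in combinations(others, r):
                # Weight |S|! (n - |S| - 1)! / n! for each coalition S.
                w = factorial(len(S)) * factorial(n - len(S) - 1) / factorial(n)
                phi[i] += w * (v(frozenset(S) | {i}) - v(frozenset(S)))
    return phi

# Example: a 3-player game in which player 2 contributes the most.
v = lambda S: float(len(S)) + (2.0 if 2 in S else 0.0)
print(shapley(3, v))  # [1.0, 1.0, 3.0]; values sum to v({0,1,2}) - v({})
```

The enumeration is exponential in n, which is precisely the cost that motivates architectures computing these values in a single pass.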

RAFT: Reward rAnked FineTuning for Generative Foundation Model Alignment. (arXiv:2304.06767v4 [cs.LG] UPDATED)

Authors: Hanze Dong, Wei Xiong, Deepanshu Goyal, Yihan Zhang, Winnie Chow, Rui Pan, Shizhe Diao, Jipeng Zhang, Kashun Shum, Tong Zhang

Generative foundation models are susceptible to implicit biases that can arise from extensive unsupervised training data. Such biases can produce suboptimal samples, skewed outcomes, and unfairness, with potentially serious consequences. Consequently, aligning these models with human ethics and preferences is an essential step toward ensuring their responsible and effective deployment in real-world applications. Prior research has primarily employed Reinforcement Learning from Human Feedback (RLHF) to address this problem, where generative models are fine-tuned with RL algorithms guided by a human-feedback-informed reward model. However, the inefficiencies and instabilities associated with RL algorithms frequently present substantial obstacles to the successful alignment, necessitating the development of a more robust and streamlined approach. To this end, we introduce a new framework, Reward rAnked FineTuning (RAFT), designed to align generative models effectively. Utilizing a reward model and a sufficient number of samples, our approach selects the high-quality samples, discarding those that exhibit undesired behavior, and subsequently enhancing the model by fine-tuning on these filtered samples. Our studies show that RAFT can effectively improve the model performance in both reward learning and other automated metrics in both large language models and diffusion models.
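
The core loop is simple enough to sketch. In the sketch below, `generate`, `reward_model`, and `finetune` are hypothetical toy stand-ins for the sampling, scoring, and supervised fine-tuning steps, not the paper's implementation:

```python
import random

def generate(model, prompt):            # toy sampler (hypothetical stand-in)
    return prompt + " -> " + str(random.random())

def reward_model(sample):               # toy reward (hypothetical stand-in)
    return float(sample.split("-> ")[1])

def finetune(model, data):              # toy "fine-tune": just records the data
    model["seen"] = model.get("seen", []) + data
    return model

def raft_step(model, prompts, k=8, keep_ratio=0.25):
    """One RAFT-style iteration: sample k candidates per prompt, rank them
    by reward, keep the top fraction, and fine-tune on the filtered set."""
    kept = []
    for prompt in prompts:
        samples = [generate(model, prompt) for _ in range(k)]
        samples.sort(key=reward_model, reverse=True)
        kept.extend(samples[: max(1, int(k * keep_ratio))])
    return finetune(model, kept)

model = raft_step({}, ["a", "b"])
print(len(model["seen"]))  # 2 prompts x top-2 of 8 samples = 4
```

Because the update is plain supervised fine-tuning on filtered samples, the loop sidesteps the instabilities of RL-based policy optimization.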

ReCEval: Evaluating Reasoning Chains via Correctness and Informativeness. (arXiv:2304.10703v2 [cs.CL] UPDATED)

Authors: Archiki Prasad, Swarnadeep Saha, Xiang Zhou, Mohit Bansal

Multi-step reasoning ability is fundamental to many natural language tasks, yet it is unclear what constitutes a good reasoning chain and how to evaluate them. Most existing methods focus solely on whether the reasoning chain leads to the correct conclusion, but this answer-oriented view may confound reasoning quality with other spurious shortcuts to predict the answer. To bridge this gap, we evaluate reasoning chains by viewing them as informal proofs that derive the final answer. Specifically, we propose ReCEval (Reasoning Chain Evaluation), a framework that evaluates reasoning chains via two key properties: (1) correctness, i.e., each step makes a valid inference based on information contained within the step, preceding steps, and input context, and (2) informativeness, i.e., each step provides new information that is helpful towards deriving the generated answer. We evaluate these properties by developing metrics using natural language inference models and V-Information. On multiple datasets, we show that ReCEval effectively identifies various error types and yields notable improvements compared to prior methods. We analyze the impact of step boundaries, and previous steps on evaluating correctness and demonstrate that our informativeness metric captures the expected flow of information in high-quality reasoning chains. Finally, we show that scoring reasoning chains based on ReCEval improves downstream task performance. Our code is publicly available at: https://github.com/archiki/ReCEval

Phylo2Vec: a vector representation for binary trees. (arXiv:2304.12693v2 [q-bio.PE] UPDATED)

Authors: Matthew J Penn, Neil Scheidwasser, Mark P Khurana, David A Duchêne, Christl A Donnelly, Samir Bhatt

Binary phylogenetic trees inferred from biological data are central to understanding the shared evolutionary history of organisms. Inferring the placement of latent nodes in a tree by any optimality criterion (e.g., maximum likelihood) is an NP-hard problem, propelling the development of myriad heuristic approaches. Yet, these heuristics often lack a systematic means of uniformly sampling random trees or effectively exploring a tree space that grows factorially, which are crucial to optimisation problems such as machine learning. Accordingly, we present Phylo2Vec, a new parsimonious representation of a phylogenetic tree. Phylo2Vec maps any binary tree with $n$ leaves to an integer vector of length $n$. We prove that Phylo2Vec is both well-defined and bijective to the space of phylogenetic trees. The advantages of Phylo2Vec are twofold: i) easy uniform sampling of binary trees and ii) systematic ability to traverse tree space in very large or small jumps. As a proof of concept, we use Phylo2Vec for maximum likelihood inference on five real-world datasets and show that a simple hill climbing-based optimisation efficiently traverses the vastness of tree space from a random to an optimal tree.

SmartChoices: Augmenting Software with Learned Implementations. (arXiv:2304.13033v2 [cs.SE] UPDATED)

Authors: Daniel Golovin, Gabor Bartok, Eric Chen, Emily Donahue, Tzu-Kuo Huang, Efi Kokiopoulou, Ruoyan Qin, Nikhil Sarda, Justin Sybrandt, Vincent Tjeng

We are living in a golden age of machine learning. Powerful models perform many tasks far better than is possible using traditional software engineering approaches alone. However, developing and deploying these models in existing software systems remains challenging. In this paper, we present SmartChoices, a novel approach to incorporating machine learning into mature software stacks easily, safely, and effectively. We highlight key design decisions and present case studies applying SmartChoices within a range of large-scale industrial systems.

Contrastive losses as generalized models of global epistasis. (arXiv:2305.03136v3 [q-bio.PE] UPDATED)

Authors: David H. Brookes, Jakub Otwinowski, Sam Sinai

Fitness functions map large combinatorial spaces of biological sequences to properties of interest. Inferring these multimodal functions from experimental data is a central task in modern protein engineering. Global epistasis models are an effective and physically-grounded class of models for estimating fitness functions from observed data. These models assume that a sparse latent function is transformed by a monotonic nonlinearity to emit measurable fitness. Here we demonstrate that minimizing contrastive loss functions, such as the Bradley-Terry loss, is a simple and flexible technique for extracting the sparse latent function implied by global epistasis. We argue by way of a fitness-epistasis uncertainty principle that the nonlinearities in global epistasis models can produce observed fitness functions that do not admit sparse representations, and thus may be inefficient to learn from observations when using a Mean Squared Error (MSE) loss (a common practice). We show that contrastive losses are able to accurately estimate a ranking function from limited data even in regimes where MSE is ineffective. We validate the practical utility of this insight by showing contrastive loss functions result in consistently improved performance on benchmark tasks.
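
A minimal sketch of the Bradley-Terry loss on preference pairs (PyTorch; pairing sequences into a "winner" with higher measured fitness and a "loser" is an assumption about how the training pairs are built):

```python
import torch
import torch.nn.functional as F

def bradley_terry_loss(f_winner: torch.Tensor, f_loser: torch.Tensor) -> torch.Tensor:
    """Bradley-Terry loss: -log sigmoid(f(x_w) - f(x_l)) averaged over pairs,
    where x_w has the higher measured fitness. Only the latent ranking is
    penalized, so the monotonic nonlinearity of global epistasis drops out."""
    return -F.logsigmoid(f_winner - f_loser).mean()

# Example with scores from some latent-fitness model.
f_w = torch.tensor([1.2, 0.3, 2.0])
f_l = torch.tensor([0.5, -0.1, 1.9])
print(bradley_terry_loss(f_w, f_l))
```

Because the loss depends only on score differences within pairs, the model learns the sparse latent ranking rather than the warped fitness values an MSE loss would have to fit directly.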

Generating high-quality 3DMPCs by adaptive data acquisition and NeREF-based radiometric calibration with UGV plant phenotyping system. (arXiv:2305.06777v2 [eess.IV] UPDATED)

Authors: Pengyao Xie, Zhihong Ma, Ruiming Du, Xin Yang, Haiyan Cen

Fusion of 3D and multispectral (MS) imaging data has great potential for high-throughput plant phenotyping of structural, biochemical, and physiological traits simultaneously, which is important for decision support in agriculture and for crop breeders in selecting the best genotypes. However, incomplete 3D data for various plant canopy structures and low-quality MS images caused by complex illumination effects pose a great challenge, especially at the proximal imaging scale. Therefore, this study proposed a novel approach for adaptive data acquisition and radiometric calibration to generate high-quality 3D multispectral point clouds (3DMPCs) of plants. An efficient next-best-view (NBV) planning method based on a UGV plant phenotyping system with a multi-sensor-equipped robotic arm was proposed to achieve adaptive data acquisition. The NeREF was employed to predict the DN values of the hemispherical reference for radiometric calibration. For NBV planning, the average total time for a single plant at a joint speed of 1.55 rad/s was about 62.8 s, an average reduction of 18.0% compared to the unplanned acquisition. The integrity of the whole-plant data was improved by an average of 23.6% compared to fixed viewpoints alone. Compared with the ASD measurements, the RMSE of the reflectance spectra obtained from 3DMPCs at different regions of interest was 0.08, an average decrease of 58.93% compared to the results obtained from single frames of MS images without 3D radiometric calibration. The 3D-calibrated plant 3DMPCs improved the predictive accuracy of partial least squares regression (PLSR) for chlorophyll content, with an average increase of 0.07 in R2 and an average decrease of 21.25% in RMSE. Our approach introduces a fresh perspective on generating high-quality 3DMPCs of plants under natural light conditions, enabling more precise analysis of plant morphological and physiological parameters.

Uncertainty Estimation and Out-of-Distribution Detection for Deep Learning-Based Image Reconstruction using the Local Lipschitz. (arXiv:2305.07618v3 [cs.CV] UPDATED)

Authors: Danyal F. Bhutto, Bo Zhu, Jeremiah Z. Liu, Neha Koonjoo, Hongwei B. Li, Bruce R. Rosen, Matthew S. Rosen

Accurate image reconstruction is at the heart of diagnostics in medical imaging. Supervised deep learning-based approaches have been investigated for solving inverse problems, including image reconstruction. However, these trained models encounter unseen data distributions that are widely shifted from the training data during deployment. Therefore, it is essential to assess whether a given input falls within the training data distribution for diagnostic purposes. Uncertainty estimation approaches exist but focus on providing an uncertainty map to radiologists, rather than assessing the training distribution fit. In this work, we propose a method based on a local Lipschitz metric to distinguish out-of-distribution images from in-distribution ones, with an area under the curve of 99.94%. Empirically, we demonstrate a very strong relationship between the local Lipschitz value and the mean absolute error (MAE), supported by a high Spearman's rank correlation coefficient of 0.8475, which determines the uncertainty estimation threshold for optimal model performance. Through the identification of false positives, the local Lipschitz and MAE relationship was used to guide data augmentation and reduce model uncertainty. Our study was validated using the AUTOMAP architecture for sensor-to-image Magnetic Resonance Imaging (MRI) reconstruction. We compare our proposed approach with baseline methods, Monte-Carlo dropout and deep ensembles, and further analyses include MRI denoising and Computed Tomography (CT) sparse-to-full view reconstruction using UNET architectures. We show that our approach is applicable to various architectures and learned functions, especially in the realm of medical image reconstruction, where preserving the diagnostic accuracy of reconstructed images remains paramount.

On the Trade-off of Intra-/Inter-class Diversity for Supervised Pre-training. (arXiv:2305.12224v2 [cs.LG] UPDATED)

Authors: Jieyu Zhang, Bohan Wang, Zhengyu Hu, Pang Wei Koh, Alexander Ratner

Pre-training datasets are critical for building state-of-the-art machine learning models, motivating rigorous study on their impact on downstream tasks. In this work, we study the impact of the trade-off between the intra-class diversity (the number of samples per class) and the inter-class diversity (the number of classes) of a supervised pre-training dataset. Empirically, we found that with the size of the pre-training dataset fixed, the best downstream performance comes with a balance on the intra-/inter-class diversity. To understand the underlying mechanism, we show theoretically that the downstream performance depends monotonically on both types of diversity. Notably, our theory reveals that the optimal class-to-sample ratio (#classes / #samples per class) is invariant to the size of the pre-training dataset, which motivates an application of predicting the optimal number of pre-training classes. We demonstrate the effectiveness of this application by an improvement of around 2 points on the downstream tasks when using ImageNet as the pre-training dataset.

Mixture of Experts with Uncertainty Voting for Imbalanced Deep Regression Problems. (arXiv:2305.15178v2 [cs.LG] UPDATED)

Authors: Yuchang Jiang, Vivien Sainte Fare Garnot, Konrad Schindler, Jan Dirk Wegner

Data imbalance is ubiquitous when applying machine learning to real-world problems, particularly regression problems. If training data are imbalanced, the learning is dominated by the densely covered regions of the target distribution; consequently, the learned regressor tends to exhibit poor performance in sparsely covered regions. Beyond standard measures like over-sampling or re-weighting, there are two main directions to handle learning from imbalanced data. For regression, recent work relies on the continuity of the distribution, whereas for classification there has been a trend to employ mixture-of-expert models and let some ensemble members specialize in predictions for the sparser regions. In our method, dubbed MOUV, we propose to leverage recent work on probabilistic deep learning and integrate it into a mixture-of-experts approach for imbalanced regression. We replace traditional regression losses with a negative log-likelihood loss that also predicts sample-wise aleatoric uncertainty, and show experimentally that such a loss handles the imbalance better. Secondly, we use the readily available aleatoric uncertainty values to fuse the predictions of a mixture-of-experts model, thus obviating the need for a separate aggregation module. We compare our method with existing alternatives on multiple public benchmarks and show that MOUV consistently outperforms the prior art, while at the same time producing better calibrated uncertainty estimates. Our code is available at link-upon-publication.
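
A minimal sketch of both ingredients, assuming a Gaussian likelihood with a predicted log-variance as the aleatoric uncertainty; MOUV's exact heads and fusion rule may differ:

```python
import torch

def gaussian_nll(mean, log_var, target):
    # Negative log-likelihood of a Gaussian (constants dropped); predicting
    # the log-variance gives each sample its own aleatoric uncertainty.
    return 0.5 * (log_var + (target - mean) ** 2 / log_var.exp()).mean()

def fuse_experts(means, log_vars):
    # Inverse-variance weighting: softmax(-log_var) is proportional to 1/var,
    # so low-uncertainty experts dominate, replacing a learned aggregator.
    w = torch.softmax(-log_vars, dim=0)
    return (w * means).sum(dim=0)

# means, log_vars: tensors of shape (n_experts, batch) from the ensemble.
```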

On the Identifiability of Switching Dynamical Systems. (arXiv:2305.15925v3 [stat.ML] UPDATED)

Authors: Carles Balsells-Rodas, Yixin Wang, Yingzhen Li

In the realm of interpretability and out-of-distribution generalisation, the identifiability of latent variable models has emerged as a captivating field of inquiry. In this work, we delve into the identifiability of Switching Dynamical Systems, taking an initial stride toward extending identifiability analysis to sequential latent variable models. We first prove the identifiability of Markov Switching Models, which commonly serve as the prior distribution for the continuous latent variables in Switching Dynamical Systems. We present identification conditions for first-order Markov dependency structures, whose transition distribution is parametrised via non-linear Gaussians. We then establish the identifiability of the latent variables and non-linear mappings in Switching Dynamical Systems up to affine transformations, by leveraging identifiability analysis techniques from identifiable deep latent variable models. We finally develop estimation algorithms for identifiable Switching Dynamical Systems. Through empirical studies, we demonstrate the practicality of identifiable Switching Dynamical Systems for segmenting high-dimensional time series such as videos, and showcase the use of identifiable Markov Switching Models for regime-dependent causal discovery in climate data.

A Policy Gradient Method for Confounded POMDPs. (arXiv:2305.17083v2 [stat.ML] UPDATED)

Authors: Mao Hong, Zhengling Qi, Yanxun Xu

In this paper, we propose a policy gradient method for confounded partially observable Markov decision processes (POMDPs) with continuous state and observation spaces in the offline setting. We first establish a novel identification result to non-parametrically estimate any history-dependent policy gradient under POMDPs using the offline data. The identification enables us to solve a sequence of conditional moment restrictions and adopt the min-max learning procedure with general function approximation for estimating the policy gradient. We then provide a finite-sample non-asymptotic bound for estimating the gradient uniformly over a pre-specified policy class in terms of the sample size, length of horizon, concentrability coefficient and the measure of ill-posedness in solving the conditional moment restrictions. Lastly, by deploying the proposed gradient estimation in the gradient ascent algorithm, we show the global convergence of the proposed algorithm in finding the history-dependent optimal policy under some technical conditions. To the best of our knowledge, this is the first work studying the policy gradient method for POMDPs under the offline setting.

Action valuation of on- and off-ball soccer players based on multi-agent deep reinforcement learning. (arXiv:2305.17886v2 [cs.AI] UPDATED)

Authors: Hiroshi Nakahara, Kazushi Tsutsui, Kazuya Takeda, Keisuke Fujii

Analysis of invasive sports such as soccer is challenging because the game situation changes continuously in time and space and multiple agents individually recognize the game situation and make decisions. Previous studies using deep reinforcement learning have often considered teams as a single agent and valued the teams and players who hold the ball at each discrete event. It has therefore been challenging to value the actions of multiple players, including players far from the ball, in a spatiotemporally continuous state space. In this paper, we propose a method for valuing possible actions of on- and off-ball soccer players in a single holistic framework based on multi-agent deep reinforcement learning. We consider a discrete action space in a continuous state space that mimics that of Google Research Football, and leverage supervised learning for actions in reinforcement learning. In the experiment, we analyzed the relationships with conventional indicators, season goals, and game ratings by experts, and showed the effectiveness of the proposed method. Our approach can assess how multiple players move continuously throughout the game, which is difficult to discretize or label but vital for teamwork, scouting, and fan engagement.

A rule-general abductive learning by rough sets. (arXiv:2305.19718v3 [cs.LG] UPDATED)

Authors: Xu-chang Guo, Hou-biao Li

In real-world tasks there are usually large amounts of both unlabeled and labeled data; the task of learning from the two in combination is known as semi-supervised learning. Experts can use logical rules to label unlabeled data, but this operation is costly. The combination of perception and reasoning works well for processing such semi-supervised tasks with domain knowledge. However, acquiring domain knowledge, and the correction, reduction, and generation of rules, remain complex problems to be solved. Rough set theory is an important method for knowledge processing in information systems. In this paper, we propose rule-general abductive learning by rough sets (RS-ABL). By transforming the target concept and the sub-concepts of rules into information tables, rough set theory is used to acquire domain knowledge and to correct, reduce, and generate rules at a lower cost. The framework can also generate more extensive negative rules to enhance the breadth of the knowledge base. Compared with traditional semi-supervised learning methods, RS-ABL achieves higher accuracy on semi-supervised tasks.

Learning Causally Disentangled Representations via the Principle of Independent Causal Mechanisms. (arXiv:2306.01213v2 [cs.LG] UPDATED)

Authors: Aneesh Komanduri, Yongkai Wu, Feng Chen, Xintao Wu

Learning disentangled causal representations is a challenging problem that has gained significant attention recently due to its implications for extracting meaningful information for downstream tasks. In this work, we define a new notion of causal disentanglement from the perspective of independent causal mechanisms. We propose ICM-VAE, a framework for learning causally disentangled representations supervised by causally related observed labels. We model causal mechanisms using learnable flow-based diffeomorphic functions to map noise variables to latent causal variables. Further, to promote the disentanglement of causal factors, we propose a causal disentanglement prior that utilizes the known causal structure to encourage learning a causally factorized distribution in the latent space. Under relatively mild conditions, we provide theoretical results showing the identifiability of causal factors and mechanisms up to permutation and elementwise reparameterization. We empirically demonstrate that our framework induces highly disentangled causal factors, improves interventional robustness, and is compatible with counterfactual generation.

Large Language Models of Code Fail at Completing Code with Potential Bugs. (arXiv:2306.03438v2 [cs.LG] UPDATED)

Authors: Tuan Dinh, Jinman Zhao, Samson Tan, Renato Negrinho, Leonard Lausen, Sheng Zha, George Karypis

Large language models of code (Code-LLMs) have recently brought tremendous advances to code completion, a fundamental feature of programming assistance and code intelligence. However, most existing works ignore the possible presence of bugs in the code context for generation, which are inevitable in software development. Therefore, we introduce and study the buggy-code completion problem, inspired by the realistic scenario of real-time code suggestion where the code context contains potential bugs -- anti-patterns that can become bugs in the completed program. To systematically study the task, we introduce two datasets: one with synthetic bugs derived from semantics-altering operator changes (buggy-HumanEval) and one with realistic bugs derived from user submissions to coding problems (buggy-FixEval). We find that the presence of potential bugs significantly degrades the generation performance of the high-performing Code-LLMs. For instance, the passing rates of CODEGEN-2B-MONO on test cases of buggy-HumanEval drop more than 50% given a single potential bug in the context. Finally, we investigate several post-hoc methods for mitigating the adverse effect of potential bugs and find that there remains a significant gap in post-mitigation performance.

Matched Pair Calibration for Ranking Fairness. (arXiv:2306.03775v3 [cs.LG] UPDATED)

Authors: Hannah Korevaar, Chris McConnell, Edmund Tong, Erik Brinkman, Alana Shine, Misam Abbas, Blossom Metevier, Sam Corbett-Davies, Khalid El-Arini

We propose a test of fairness in score-based ranking systems called matched pair calibration. Our approach constructs a set of matched item pairs with minimal confounding differences between subgroups before computing an appropriate measure of ranking error over the set. The matching step ensures that we compare subgroup outcomes between identically scored items so that measured performance differences directly imply unfairness in subgroup-level exposures. We show how our approach generalizes the fairness intuitions of calibration from a binary classification setting to ranking and connect our approach to other proposals for ranking fairness measures. Moreover, our strategy shows how the logic of marginal outcome tests extends to cases where the analyst has access to model scores. Lastly, we provide an example of applying matched pair calibration to a real-world ranking dataset to demonstrate its efficacy in detecting ranking bias.
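
A simplified sketch of the matching logic, pairing items across two subgroups by near-identical model scores and averaging the within-pair outcome gap; the paper's matching and error measures are more general:

```python
import numpy as np

def matched_pair_gap(scores_a, outcomes_a, scores_b, outcomes_b, tol=1e-3):
    """Pair each subgroup-A item with the closest-scored subgroup-B item;
    a nonzero mean outcome gap at (near-)equal scores suggests subgroup
    miscalibration. All inputs are 1D NumPy arrays."""
    gaps = []
    for s, y in zip(scores_a, outcomes_a):
        j = np.argmin(np.abs(scores_b - s))       # closest-scored item in B
        if abs(scores_b[j] - s) <= tol:           # keep only tight matches
            gaps.append(y - outcomes_b[j])
    return float(np.mean(gaps)) if gaps else float("nan")
```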

G-NM: A Group of Numerical Time Series Prediction Models. (arXiv:2306.11667v5 [cs.LG] UPDATED)

Authors: Juyoung Yun

In this study, we focus on the development and implementation of a comprehensive ensemble of numerical time series forecasting models, collectively referred to as the Group of Numerical Time Series Prediction Model (G-NM). This inclusive set comprises traditional models such as Autoregressive Integrated Moving Average (ARIMA), Holt-Winters' method, and Support Vector Regression (SVR), in addition to modern neural network models including Recurrent Neural Network (RNN) and Long Short-Term Memory (LSTM). G-NM is explicitly constructed to augment our predictive capabilities related to patterns and trends inherent in complex natural phenomena. By utilizing time series data relevant to these events, G-NM facilitates the prediction of such phenomena over extended periods. The primary objective of this research is both to advance our understanding of such occurrences and to significantly enhance the accuracy of our forecasts. G-NM encapsulates the linear and non-linear dependencies, seasonalities, and trends present in time series data. Each of these models contributes distinct strengths, from ARIMA's resilience in handling linear trends and seasonality and SVR's proficiency in capturing non-linear patterns to LSTM's adaptability in modeling various components of time series data. By exploiting the potential of G-NM, we strive to advance the state-of-the-art in large-scale time series forecasting models. We anticipate that this research will represent a significant stepping stone in our ongoing endeavor to comprehend and forecast the complex events that constitute the natural world.

Learning Delays in Spiking Neural Networks using Dilated Convolutions with Learnable Spacings. (arXiv:2306.17670v3 [cs.NE] UPDATED)

Authors: Ilyass Hammouamri, Ismail Khalfaoui-Hassani, Timothée Masquelier

Spiking Neural Networks (SNNs) are a promising research direction for building power-efficient information processing systems, especially for temporal tasks such as speech recognition. In SNNs, delays refer to the time needed for one spike to travel from one neuron to another. These delays matter because they influence the spike arrival times, and it is well-known that spiking neurons respond more strongly to coincident input spikes. More formally, it has been shown theoretically that plastic delays greatly increase the expressivity in SNNs. Yet, efficient algorithms to learn these delays have been lacking. Here, we propose a new discrete-time algorithm that addresses this issue in deep feedforward SNNs using backpropagation, in an offline manner. To simulate delays between consecutive layers, we use 1D convolutions across time. The kernels contain only a few non-zero weights - one per synapse - whose positions correspond to the delays. These positions are learned together with the weights using the recently proposed Dilated Convolution with Learnable Spacings (DCLS). We evaluated our method on three benchmarks that require detecting temporal patterns: the Spiking Heidelberg Dataset (SHD), Spiking Speech Commands (SSC), and SSC's non-spiking version, Google Speech Commands v0.02 (GSC). We used feedforward SNNs with two or three hidden fully connected layers, and vanilla leaky integrate-and-fire neurons. We showed that fixed random delays help and that learning them helps even more. Furthermore, our method outperformed the state-of-the-art on all three datasets without using recurrent connections and with substantially fewer parameters. Our work demonstrates the potential of delay learning in developing accurate and precise models for temporal data processing. Our code is based on PyTorch / SpikingJelly and available at: https://github.com/Thvnvtos/SNN-delays
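
The core mechanism can be sketched compactly: a synaptic delay of $d$ steps is a causal 1D convolution whose kernel has a single non-zero tap, and DCLS makes that tap's position continuous and learnable. A toy fixed-delay version (discrete delays assumed here, not the paper's interpolated kernels):

```python
import torch
import torch.nn.functional as F

def delayed_synapse(spikes, weight, delay, max_delay):
    # spikes: (batch, time). A synaptic delay d shifts the spike train by d
    # steps; here it is a single non-zero tap in a causal kernel.
    kernel = torch.zeros(1, 1, max_delay + 1)
    kernel[0, 0, max_delay - delay] = weight      # conv1d is cross-correlation
    x = F.pad(spikes.unsqueeze(1), (max_delay, 0))   # causal left-padding
    return F.conv1d(x, kernel).squeeze(1)         # output[t] = w * spikes[t-d]

# DCLS replaces the hard index `delay` with a differentiable, interpolated
# position, learned jointly with `weight` by backpropagation.
```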

Adversarial Attacks and Defenses on 3D Point Cloud Classification: A Survey. (arXiv:2307.00309v2 [cs.CV] UPDATED)

Authors: Hanieh Naderi, Ivan V. Bajić

Deep learning, as a dominant AI technique, has successfully solved a wide range of tasks in 2D vision. Recently, deep learning on 3D point clouds has become increasingly popular for addressing various tasks in this field. Despite remarkable achievements, deep learning algorithms are vulnerable to adversarial attacks, which are imperceptible to the human eye but can easily fool deep neural networks at the testing and deployment stage. To encourage future research, this survey summarizes current progress on adversarial attack and defense techniques for point cloud classification. It first introduces the principles and characteristics of adversarial attacks and summarizes and analyzes adversarial example generation methods of recent years. Additionally, it provides an overview of defense strategies, organized into data-focused and model-focused methods. Finally, it presents several current challenges and potential future research directions in this domain.

GeoPhy: Differentiable Phylogenetic Inference via Geometric Gradients of Tree Topologies. (arXiv:2307.03675v2 [cs.LG] UPDATED)

Authors: Takahiro Mimori, Michiaki Hamada

Phylogenetic inference, grounded in molecular evolution models, is essential for understanding the evolutionary relationships in biological data. Accounting for the uncertainty of phylogenetic tree variables, which include tree topologies and evolutionary distances on branches, is crucial for accurately inferring species relationships from molecular data and tasks requiring variable marginalization. Variational Bayesian methods are key to developing scalable, practical models; however, it remains challenging to conduct phylogenetic inference without restricting the combinatorially vast number of possible tree topologies. In this work, we introduce a novel, fully differentiable formulation of phylogenetic inference that leverages a unique representation of topological distributions in continuous geometric spaces. Through practical considerations on design spaces and control variates for gradient estimations, our approach, GeoPhy, enables variational inference without limiting the topological candidates. In experiments using real benchmark datasets, GeoPhy significantly outperformed other approximate Bayesian methods that considered whole topologies.

Intrinsically motivated graph exploration using network theories of human curiosity. (arXiv:2307.04962v4 [cs.LG] UPDATED)

Authors: Shubhankar P. Patankar, Mathieu Ouellet, Juan Cervino, Alejandro Ribeiro, Kieran A. Murphy, Dani S. Bassett

Intrinsically motivated exploration has proven useful for reinforcement learning, even without additional extrinsic rewards. When the environment is naturally represented as a graph, how to guide exploration best remains an open question. In this work, we propose a novel approach for exploring graph-structured data motivated by two theories of human curiosity: the information gap theory and the compression progress theory. The theories view curiosity as an intrinsic motivation to optimize for topological features of subgraphs induced by nodes visited in the environment. We use these proposed features as rewards for graph neural-network-based reinforcement learning. On multiple classes of synthetically generated graphs, we find that trained agents generalize to longer exploratory walks and larger environments than are seen during training. Our method computes more efficiently than the greedy evaluation of the relevant topological properties. The proposed intrinsic motivations bear particular relevance for recommender systems. We demonstrate that next-node recommendations considering curiosity are more predictive of human choices than PageRank centrality in several real-world graph environments.

A Definition of Continual Reinforcement Learning. (arXiv:2307.11046v2 [cs.LG] UPDATED)

Authors: David Abel, André Barreto, Benjamin Van Roy, Doina Precup, Hado van Hasselt, Satinder Singh

In a standard view of the reinforcement learning problem, an agent's goal is to efficiently identify a policy that maximizes long-term reward. However, this perspective is based on a restricted view of learning as finding a solution, rather than treating learning as endless adaptation. In contrast, continual reinforcement learning refers to the setting in which the best agents never stop learning. Despite the importance of continual reinforcement learning, the community lacks a simple definition of the problem that highlights its commitments and makes its primary concepts precise and clear. To this end, this paper is dedicated to carefully defining the continual reinforcement learning problem. We formalize the notion of agents that "never stop learning" through a new mathematical language for analyzing and cataloging agents. Using this new language, we define a continual learning agent as one that can be understood as carrying out an implicit search process indefinitely, and continual reinforcement learning as the setting in which the best agents are all continual learning agents. We provide two motivating examples, illustrating that traditional views of multi-task reinforcement learning and continual supervised learning are special cases of our definition. Collectively, these definitions and perspectives formalize many intuitive concepts at the heart of learning, and open new research pathways surrounding continual learning agents.

Duet: efficient and scalable hybriD neUral rElation undersTanding. (arXiv:2307.13494v5 [cs.DB] UPDATED)

Authors: Kaixin Zhang, Hongzhi Wang, Yabin Lu, Ziqi Li, Chang Shu, Yu Yan, Donghua Yang

Learned cardinality estimation methods have achieved high precision compared to traditional methods. Among learned methods, query-driven approaches have faced the workload drift problem for a long time. Although both data-driven and hybrid methods are proposed to avoid this problem, most of them suffer from high training and estimation costs, limited scalability, instability, and long-tail distribution problems on high-dimensional tables, which seriously affects the practical application of learned cardinality estimators. In this paper, we prove that most of these problems are directly caused by the widely used progressive sampling. We solve this problem by introducing predicate information into the autoregressive model and propose Duet, a stable, efficient, and scalable hybrid method to estimate cardinality directly without sampling or any non-differentiable process, which can not only reduce the inference complexity from $O(n)$ to $O(1)$ compared to Naru and UAE but also achieve higher accuracy on high cardinality and high-dimensional tables. Experimental results show that Duet can achieve all the design goals above and be much more practical. Besides, Duet even has a lower inference cost on CPU than that of most learned methods on GPU.

Understanding Forward Process of Convolutional Neural Network. (arXiv:2307.15090v2 [cs.LG] UPDATED)

Authors: Peixin Tian

This paper reveals the selective rotation in CNNs' forward processing. It elucidates the activation function as a discerning mechanism that unifies and quantizes the rotational aspects of the input data. Experiments show how this methodology reflects the process by which the network distinguishes inputs based on statistical indicators, which can be comprehended and analyzed by applying structured mathematical tools. Our findings also unveil the consistency between artificial neural networks and the human brain in their data processing pattern.

Stable Adam Optimization for 16-bit Neural Networks Training. (arXiv:2307.16189v7 [cs.LG] UPDATED)

Authors: Juyoung Yun

In this research, we address critical concerns related to the numerical instability observed in 16-bit computations of machine learning models. Such instability, particularly when employing popular optimization algorithms like Adam, often leads to unstable training of deep neural networks. This not only disrupts the learning process but also poses significant challenges in deploying dependable models in real-world applications. Our investigation identifies the epsilon hyperparameter as the primary source of this instability. A nuanced exploration reveals that subtle adjustments to epsilon within 16-bit computations can enhance the numerical stability of Adam, enabling more stable training of 16-bit neural networks. We propose a novel, dependable approach that leverages updates from the Adam optimizer to bolster the stability of the learning process. Our contributions provide deeper insights into optimization challenges in low-precision computations and offer solutions to ensure the stability of deep neural network training, paving the way for their dependable use in various applications.
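
A concrete illustration of the failure mode: Adam's default epsilon of 1e-8 lies below the smallest float16 subnormal (about 6e-8), so it underflows to zero and the update denominator $\sqrt{v} + \epsilon$ can vanish early in training. The larger value below is an illustrative assumption, not the paper's prescription:

```python
import torch

# The default Adam epsilon cannot be represented in float16:
eps16 = torch.tensor(1e-8, dtype=torch.float16)
print(eps16)                          # tensor(0., dtype=torch.float16)

# With eps = 0, the step lr * m / (sqrt(v) + eps) divides by ~0 whenever
# the second-moment estimate v is tiny. A larger epsilon survives the cast:
safe_eps = torch.tensor(1e-4, dtype=torch.float16)
v = torch.zeros(3, dtype=torch.float16)          # early-training moments
print(1.0 / (v.sqrt() + safe_eps))               # finite update scale
```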

Hessian-Aware Bayesian Optimization for Decision Making Systems. (arXiv:2308.00629v4 [cs.LG] UPDATED)

Authors: Mohit Rajpal, Lac Gia Tran, Yehong Zhang, Bryan Kian Hsiang Low

Many approaches for optimizing decision making systems rely on gradient based methods requiring informative feedback from the environment. However, in the case where such feedback is sparse or uninformative, such approaches may result in poor performance. Derivative-free approaches such as Bayesian Optimization mitigate the dependency on the quality of gradient feedback, but are known to scale poorly in the high-dimension setting of complex decision making systems. This problem is exacerbated if the system requires interactions between several actors cooperating to accomplish a shared goal. To address the dimensionality challenge, we propose a compact multi-layered architecture modeling the dynamics of actor interactions through the concept of role. We introduce Hessian-aware Bayesian Optimization to efficiently optimize the multi-layered architecture parameterized by a large number of parameters, and give the first improved regret bound in additive high-dimensional Bayesian Optimization since Mutny & Krause (2018). Our approach shows strong empirical results under malformed or sparse reward.

Constructing Custom Thermodynamics Using Deep Learning. (arXiv:2308.04119v2 [cond-mat.soft] UPDATED)

Authors: Xiaoli Chen, Beatrice W. Soh, Zi-En Ooi, Eleonore Vissol-Gaudin, Haijun Yu, Kostya S. Novoselov, Kedar Hippalgaonkar, Qianxiao Li

One of the most exciting applications of artificial intelligence (AI) is automated scientific discovery based on previously amassed data, coupled with restrictions provided by known physical principles, including symmetries and conservation laws. Such automated hypothesis creation and verification can assist scientists in studying complex phenomena, where traditional physical intuition may fail. Here we develop a platform based on a generalized Onsager principle to learn macroscopic dynamical descriptions of arbitrary stochastic dissipative systems directly from observations of their microscopic trajectories. Our method simultaneously constructs reduced thermodynamic coordinates and interprets the dynamics on these coordinates. We demonstrate its effectiveness by studying theoretically and validating experimentally the stretching of long polymer chains in an externally applied field. Specifically, we learn three interpretable thermodynamic coordinates and build a dynamical landscape of polymer stretching, including the identification of stable and transition states and the control of the stretching rate. Our general methodology can be used to address a wide range of scientific and technological applications.

A Comparative Study of Text Embedding Models for Semantic Text Similarity in Bug Reports. (arXiv:2308.09193v2 [cs.SE] UPDATED)

Authors: Avinash Patil, Kihwan Han, Aryan Jadon

Bug reports are an essential aspect of software development, and it is crucial to identify and resolve them quickly to ensure the consistent functioning of software systems. Retrieving similar bug reports from an existing database can help reduce the time and effort required to resolve bugs. In this paper, we compared the effectiveness of semantic textual similarity methods for retrieving similar bug reports based on a similarity score. We explored several embedding models such as TF-IDF (baseline), FastText, Gensim, BERT, and ADA. We used the Software Defects Data containing bug reports for various software projects to evaluate the performance of these models. Our experimental results showed that BERT generally outperformed the rest of the models in terms of recall, followed by ADA, Gensim, FastText, and TF-IDF. Our study provides insights into the effectiveness of different embedding methods for retrieving similar bug reports and highlights the impact of selecting the appropriate one for this task. Our code is available on GitHub.
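
For reference, the TF-IDF baseline for similar-report retrieval reduces to a few lines with scikit-learn; the corpus below is hypothetical:

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Hypothetical resolved bug reports and a new incoming report.
corpus = ["app crashes on login", "null pointer when saving file",
          "UI freezes after login timeout"]
query = ["crash right after logging in"]

vec = TfidfVectorizer()
doc_mat = vec.fit_transform(corpus)                      # index the corpus
sims = cosine_similarity(vec.transform(query), doc_mat)[0]
ranked = sims.argsort()[::-1]                            # most similar first
print([(corpus[i], round(float(sims[i]), 3)) for i in ranked])
```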

Neurological Prognostication of Post-Cardiac-Arrest Coma Patients Using EEG Data: A Dynamic Survival Analysis Framework with Competing Risks. (arXiv:2308.11645v2 [eess.SP] UPDATED)

Authors: Xiaobin Shen, Jonathan Elmer, George H. Chen

Patients resuscitated from cardiac arrest who enter a coma are at high risk of death. Forecasting neurological outcomes of these patients (the task of neurological prognostication) could help with treatment decisions. In this paper, we propose, to the best of our knowledge, the first dynamic framework for neurological prognostication of post-cardiac-arrest comatose patients using EEG data: our framework makes predictions for a patient over time as more EEG data become available, and different training patients' available EEG time series could vary in length. Predictions are phrased in terms of either time-to-event outcomes (time-to-awakening or time-to-death) or as the patient's probability of awakening or of dying across multiple time horizons. Our framework uses any dynamic survival analysis model that supports competing risks in the form of estimating patient-level cumulative incidence functions. We consider three competing risks as to what happens first to a patient: awakening, being withdrawn from life-sustaining therapies (and thus deterministically dying), or dying (by other causes). We demonstrate our framework by benchmarking three existing dynamic survival analysis models that support competing risks on a real dataset of 922 patients. Our main experimental findings are that: (1) the classical Fine and Gray model which only uses a patient's static features and summary statistics from the patient's latest hour's worth of EEG data is highly competitive, achieving accuracy scores as high as the recently developed Dynamic-DeepHit model that uses substantially more of the patient's EEG data; and (2) in an ablation study, we show that our choice of modeling three competing risks results in a model that is at least as accurate while learning more information than simpler models (using two competing risks or a standard survival analysis setup with no competing risks).

TpuGraphs: A Performance Prediction Dataset on Large Tensor Computational Graphs. (arXiv:2308.13490v2 [cs.LG] UPDATED)

Authors: Phitchaya Mangpo Phothilimthana, Sami Abu-El-Haija, Kaidi Cao, Bahare Fatemi, Charith Mendis, Bryan Perozzi

Precise hardware performance models play a crucial role in code optimizations. They can assist compilers in making heuristic decisions or aid autotuners in identifying the optimal configuration for a given program. For example, the autotuner for XLA, a machine learning compiler, discovered 10-20% speedup on state-of-the-art models serving substantial production traffic at Google. Although there exist a few datasets for program performance prediction, they target small sub-programs such as basic blocks or kernels. This paper introduces TpuGraphs, a performance prediction dataset on full tensor programs, represented as computational graphs, running on Tensor Processing Units (TPUs). Each graph in the dataset represents the main computation of a machine learning workload, e.g., a training epoch or an inference step. Each data sample contains a computational graph, a compilation configuration, and the execution time of the graph when compiled with the configuration. The graphs in the dataset are collected from open-source machine learning programs, featuring popular model architectures, e.g., ResNet, EfficientNet, Mask R-CNN, and Transformer. TpuGraphs provides 25x more graphs than the largest graph property prediction dataset (with comparable graph sizes), and 770x larger graphs on average compared to existing performance prediction datasets on machine learning programs. This graph-level prediction task on large graphs introduces new challenges in learning, ranging from scalability and training efficiency to model quality.

Linear Oscillation: A Novel Activation Function for Vision Transformer. (arXiv:2308.13670v4 [cs.LG] UPDATED)

Authors: Juyoung Yun

Activation functions are the linchpins of deep learning, profoundly influencing both the representational capacity and training dynamics of neural networks. They shape not only the nature of representations but also optimize convergence rates and enhance generalization potential. Appreciating this critical role, we present the Linear Oscillation (LoC) activation function, defined as $f(x) = x \times \sin(\alpha x + \beta)$. Distinct from conventional activation functions which primarily introduce non-linearity, LoC seamlessly blends linear trajectories with oscillatory deviations. The nomenclature "Linear Oscillation" is a nod to its unique attribute of infusing linear activations with harmonious oscillations, capturing the essence of the "Importance of Confusion". This concept of "controlled confusion" within network activations is posited to foster more robust learning, particularly in contexts that necessitate discerning subtle patterns. Our empirical studies reveal that, when integrated into diverse neural architectures, the LoC activation function consistently outperforms established counterparts like ReLU and Sigmoid. The stellar performance exhibited by the avant-garde Vision Transformer model using LoC further validates its efficacy. This study illuminates the remarkable benefits of the LoC over other prominent activation functions. It champions the notion that intermittently introducing deliberate complexity or "confusion" during training can spur more profound and nuanced learning. This accentuates the pivotal role of judiciously selected activation functions in shaping the future of neural network training.
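
From the stated formula, a drop-in module is straightforward to sketch; whether $\alpha$ and $\beta$ are learnable, shared, or per-channel in the paper is an assumption here:

```python
import torch
import torch.nn as nn

class LinearOscillation(nn.Module):
    """f(x) = x * sin(alpha * x + beta): a linear trajectory modulated by
    an oscillation. Learnable scalar alpha/beta and their initial values
    are assumptions, not the paper's stated configuration."""
    def __init__(self, alpha=1.0, beta=0.0):
        super().__init__()
        self.alpha = nn.Parameter(torch.tensor(alpha))
        self.beta = nn.Parameter(torch.tensor(beta))

    def forward(self, x):
        return x * torch.sin(self.alpha * x + self.beta)
```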

RLAIF: Scaling Reinforcement Learning from Human Feedback with AI Feedback. (arXiv:2309.00267v2 [cs.CL] UPDATED)

Authors: Harrison Lee, Samrat Phatale, Hassan Mansoor, Thomas Mesnard, Johan Ferret, Kellie Lu, Colton Bishop, Ethan Hall, Victor Carbune, Abhinav Rastogi, Sushant Prakash

Reinforcement learning from human feedback (RLHF) has proven effective in aligning large language models (LLMs) with human preferences. However, gathering high-quality human preference labels can be a time-consuming and expensive endeavor. RL from AI Feedback (RLAIF), introduced by Bai et al., offers a promising alternative that leverages a powerful off-the-shelf LLM to generate preferences in lieu of human annotators. Across the tasks of summarization, helpful dialogue generation, and harmless dialogue generation, RLAIF achieves comparable or superior performance to RLHF, as rated by human evaluators. Furthermore, RLAIF demonstrates the ability to outperform a supervised fine-tuned baseline even when the LLM preference labeler is the same size as the policy. In another experiment, directly prompting the LLM for reward scores achieves superior performance to the canonical RLAIF setup, where LLM preference labels are first distilled into a reward model. Finally, we conduct extensive studies on techniques for generating aligned AI preferences. Our results suggest that RLAIF can achieve human-level performance, offering a potential solution to the scalability limitations of RLHF.

PathLDM: Text conditioned Latent Diffusion Model for Histopathology. (arXiv:2309.00748v2 [cs.CV] UPDATED)

Authors: Srikar Yellapragada, Alexandros Graikos, Prateek Prasanna, Tahsin Kurc, Joel Saltz, Dimitris Samaras

To achieve high-quality results, diffusion models must be trained on large datasets. This can be notably prohibitive for models in specialized domains, such as computational pathology. Conditioning on labeled data is known to help in data-efficient model training. Therefore, histopathology reports, which are rich in valuable clinical information, are an ideal choice as guidance for a histopathology generative model. In this paper, we introduce PathLDM, the first text-conditioned Latent Diffusion Model tailored for generating high-quality histopathology images. Leveraging the rich contextual information provided by pathology text reports, our approach fuses image and textual data to enhance the generation process. By utilizing GPT's capabilities to distill and summarize complex text reports, we establish an effective conditioning mechanism. Through strategic conditioning and necessary architectural enhancements, we achieved a SoTA FID score of 7.64 for text-to-image generation on the TCGA-BRCA dataset, significantly outperforming the closest text-conditioned competitor with FID 30.1.

QuantEase: Optimization-based Quantization for Language Models. (arXiv:2309.01885v2 [stat.ML] UPDATED)

Authors: Kayhan Behdin, Ayan Acharya, Aman Gupta, Qingquan Song, Siyu Zhu, Sathiya Keerthi, Rahul Mazumder

With the rising popularity of Large Language Models (LLMs), there has been an increasing interest in compression techniques that enable their efficient deployment. This study focuses on the Post-Training Quantization (PTQ) of LLMs. Drawing from recent advances, our work introduces QuantEase, a layer-wise quantization framework where individual layers undergo separate quantization. The problem is framed as a discrete-structured non-convex optimization, prompting the development of algorithms rooted in Coordinate Descent (CD) techniques. These CD-based methods provide high-quality solutions to the complex non-convex layer-wise quantization problems. Notably, our CD-based approach features straightforward updates, relying solely on matrix and vector operations, circumventing the need for matrix inversion or decomposition. We also explore an outlier-aware variant of our approach, allowing for retaining significant weights (outliers) with complete precision. Our proposal attains state-of-the-art performance in terms of perplexity and zero-shot accuracy in empirical evaluations across various LLMs and datasets, with relative improvements up to 15% over methods such as GPTQ. Leveraging careful linear algebra optimizations, QuantEase can quantize models like Falcon-180B on a single NVIDIA A100 GPU in $\sim$3 hours. Particularly noteworthy is our outlier-aware algorithm's capability to achieve near or sub-3-bit quantization of LLMs with an acceptable drop in accuracy, obviating the need for non-uniform quantization or grouping techniques, improving upon methods such as SpQR by up to two times in terms of perplexity.

Symplectic Structure-Aware Hamiltonian (Graph) Embeddings. (arXiv:2309.04885v3 [cs.LG] UPDATED)

Authors: Jiaxu Liu, Xinping Yi, Tianle Zhang, Xiaowei Huang

In traditional Graph Neural Networks (GNNs), the assumption of a fixed embedding manifold often limits their adaptability to diverse graph geometries. Recently, Hamiltonian system-inspired GNNs have been proposed to address the dynamic nature of such embeddings by incorporating physical laws into node feature updates. We present Symplectic Structure-Aware Hamiltonian GNN (SAH-GNN), a novel approach that generalizes Hamiltonian dynamics for more flexible node feature updates. Unlike existing Hamiltonian approaches, SAH-GNN employs Riemannian optimization on the symplectic Stiefel manifold to adaptively learn the underlying symplectic structure, circumventing the limitations of existing Hamiltonian GNNs that rely on a pre-defined form of standard symplectic structure. This innovation allows SAH-GNN to automatically adapt to various graph datasets without extensive hyperparameter tuning. Moreover, it conserves energy during training, meaning the implicit Hamiltonian system is physically meaningful. Finally, we empirically validate SAH-GNN's superiority and adaptability in node classification tasks across multiple types of graph datasets.

Beta Diffusion. (arXiv:2309.07867v3 [cs.LG] UPDATED)

Authors: Mingyuan Zhou, Tianqi Chen, Zhendong Wang, Huangjie Zheng

We introduce beta diffusion, a novel generative modeling method that integrates demasking and denoising to generate data within bounded ranges. Using scaled and shifted beta distributions, beta diffusion utilizes multiplicative transitions over time to create both forward and reverse diffusion processes, maintaining beta distributions in both the forward marginals and the reverse conditionals, given the data at any point in time. Unlike traditional diffusion-based generative models relying on additive Gaussian noise and reweighted evidence lower bounds (ELBOs), beta diffusion is multiplicative and optimized with KL-divergence upper bounds (KLUBs) derived from the convexity of the KL divergence. We demonstrate that the proposed KLUBs are more effective for optimizing beta diffusion compared to negative ELBOs, which can also be derived as the KLUBs of the same KL divergence with its two arguments swapped. The loss function of beta diffusion, expressed in terms of Bregman divergence, further supports the efficacy of KLUBs for optimization. Experimental results on both synthetic data and natural images demonstrate the unique capabilities of beta diffusion in generative modeling of range-bounded data and validate the effectiveness of KLUBs in optimizing diffusion models, thereby making them valuable additions to the family of diffusion-based generative models and the optimization techniques used to train them.
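
The template behind such upper bounds is the joint convexity of the KL divergence; in generic form (a standard fact, not the paper's exact objective):

```latex
\[
\mathrm{KL}\!\Big(\sum_i \lambda_i\, p_i \,\Big\|\, \sum_i \lambda_i\, q_i\Big)
\;\le\; \sum_i \lambda_i\, \mathrm{KL}\big(p_i \,\|\, q_i\big),
\qquad \lambda_i \ge 0,\quad \sum_i \lambda_i = 1.
\]
```

A KL divergence between mixtures is thus bounded above by the mixture of KL divergences, and, as the abstract notes, swapping the two arguments of the same KL divergence yields the bound corresponding to the negative ELBO.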

Efficient Pauli channel estimation with logarithmic quantum memory. (arXiv:2309.14326v2 [quant-ph] UPDATED)

Authors: Sitan Chen, Weiyuan Gong

Here we revisit one of the prototypical tasks for characterizing the structure of noise in quantum devices: estimating every eigenvalue of an $n$-qubit Pauli noise channel to error $\epsilon$. Prior work (Chen et al., 2022) proved no-go theorems for this task in the practical regime where one has a limited amount of quantum memory, e.g. any protocol with $\le 0.99n$ ancilla qubits of quantum memory must make exponentially many measurements, provided it is non-concatenating. Such protocols can only interact with the channel by repeatedly preparing a state, passing it through the channel, and measuring immediately afterward.

This left open a natural question: does the lower bound hold even for general protocols, i.e. ones which chain together many queries to the channel, interleaved with arbitrary data-processing channels, before measuring? Surprisingly, in this work we show the opposite: there is a protocol that can estimate the eigenvalues of a Pauli channel to error $\epsilon$ using only $O(\log n/\epsilon^2)$ ancilla qubits and $\tilde{O}(n^2/\epsilon^2)$ measurements. In contrast, we show that any protocol with zero ancilla, even a concatenating one, must make $\Omega(2^n/\epsilon^2)$ measurements, which is tight.

Our results imply, to our knowledge, the first quantum learning task where logarithmically many qubits of quantum memory suffice for an exponential statistical advantage.

UAV-assisted Semantic Communication with Hybrid Action Reinforcement Learning. (arXiv:2309.16713v2 [cs.NI] UPDATED)

Authors: Peiyuan Si, Jun Zhao, Kwok-Yan Lam, Qing Yang

In this paper, we aim to explore the use of uplink semantic communications with the assistance of a UAV in order to improve data collection efficiency for metaverse users in remote areas. To reduce the time for uplink data collection while balancing the trade-off between reconstruction quality and computational energy cost, we propose a hybrid action reinforcement learning (RL) framework to make decisions on semantic model scale, channel allocation, transmission power, and UAV trajectory. The variables are classified into discrete and continuous types, which are optimized by two different RL agents to generate the combined action. Simulation results indicate that the proposed hybrid action reinforcement learning framework can effectively improve the efficiency of uplink semantic data collection under different parameter settings and outperforms the benchmark scenarios.

Persona-Coded Poly-Encoder: Persona-Guided Multi-Stream Conversational Sentence Scoring. (arXiv:2309.16770v2 [cs.CL] UPDATED)

Authors: Junfeng Liu, Christopher Symons, Ranga Raju Vatsavai

Recent advances in machine learning and deep learning have led to the widespread use of Conversational AI in many practical applications. However, it is still very challenging to leverage auxiliary information that can provide conversational context or personalized tuning to improve the quality of conversations. For example, there has only been limited research on using an individual's persona information to improve conversation quality, and even state-of-the-art conversational AI techniques are unable to effectively leverage signals from heterogeneous sources of auxiliary data, such as multi-modal interaction data, demographics, SDOH data, etc. In this paper, we present a novel Persona-Coded Poly-Encoder method that leverages persona information in a multi-stream encoding scheme to improve the quality of response generation for conversations. To show the efficacy of the proposed method, we evaluate it on two different persona-based conversational datasets and compare it against two state-of-the-art methods. Our experimental results and analysis demonstrate that our method can improve conversation quality over the baseline method Poly-Encoder by 3.32% and 2.94% in terms of BLEU score and HR@1, respectively. More significantly, our method offers a path to better utilization of multi-modal data in conversational tasks. Lastly, our study outlines several challenges and future research directions for advancing personalized conversational AI technology.

Distributional PAC-Learning from Nisan's Natural Proofs. (arXiv:2310.03641v2 [cs.CC] UPDATED)

Authors: Ari Karchmer

Carmosino et al. (2016) demonstrated that natural proofs of circuit lower bounds for $\Lambda$ imply efficient algorithms for learning $\Lambda$-circuits, but only over \textit{the uniform distribution}, with \textit{membership queries}, and provided $\mathsf{AC}^0[p] \subseteq \Lambda$. We consider whether this implication can be generalized to $\Lambda \not\supseteq \mathsf{AC}^0[p]$, and to learning algorithms which use only random examples and learn over arbitrary example distributions (Valiant's PAC-learning model).

We first observe that, if, for any circuit class $\Lambda$, there is an implication from natural proofs for $\Lambda$ to PAC-learning for $\Lambda$, then standard assumptions from lattice-based cryptography do not hold. In particular, we observe that depth-2 majority circuits are a (conditional) counterexample to the implication, since Nisan (1993) gave a natural proof, but Klivans and Sherstov (2009) showed hardness of PAC-learning under lattice-based assumptions. We thus ask: what learning algorithms can we reasonably expect to follow from Nisan's natural proofs?

Our main result is that all natural proofs arising from a type of communication complexity argument, including Nisan's, imply PAC-learning algorithms in a new \textit{distributional} variant (i.e., an ``average-case'' relaxation) of Valiant's PAC model. Our distributional PAC model is stronger than the average-case prediction model of Blum et al. (1993) and the heuristic PAC model of Nanashima (2021), and has several important properties which make it of independent interest, such as being \textit{boosting-friendly}. The main applications of our result are new distributional PAC-learning algorithms for depth-2 majority circuits, polytopes and DNFs over natural target distributions, as well as the nonexistence of encoded-input weak PRFs that can be evaluated by depth-2 majority circuits.

HyperAttention: Long-context Attention in Near-Linear Time. (arXiv:2310.05869v3 [cs.LG] UPDATED)

Authors: Insu Han, Rajesh Jayaram, Amin Karbasi, Vahab Mirrokni, David P. Woodruff, Amir Zandieh

We present an approximate attention mechanism named HyperAttention to address the computational challenges posed by the growing complexity of long contexts used in Large Language Models (LLMs). Recent work suggests that in the worst-case scenario, quadratic time is necessary unless the entries of the attention matrix are bounded or the matrix has low stable rank. We introduce two parameters which measure: (1) the max column norm in the normalized attention matrix, and (2) the ratio of row norms in the unnormalized attention matrix after detecting and removing large entries. We use these fine-grained parameters to capture the hardness of the problem. Despite previous lower bounds, we are able to achieve a linear time sampling algorithm even when the matrix has unbounded entries or a large stable rank, provided the above parameters are small. HyperAttention features a modular design that easily accommodates integration of other fast low-level implementations, particularly FlashAttention. Empirically, employing Locality Sensitive Hashing (LSH) to identify large entries, HyperAttention outperforms existing methods, giving significant speed improvements compared to state-of-the-art solutions like FlashAttention. We validate the empirical performance of HyperAttention on a variety of different long-context length datasets. For example, HyperAttention makes the inference time of ChatGLM2 50\% faster on 32k context length while perplexity increases from 5.6 to 6.3. On larger context length, e.g., 131k, with causal masking, HyperAttention offers 5-fold speedup on a single attention layer.

MuseChat: A Conversational Music Recommendation System for Videos. (arXiv:2310.06282v3 [cs.LG] UPDATED)

Authors: Zhikang Dong, Bin Chen, Xiulong Liu, Pawel Polak, Peng Zhang

Music recommendation for videos attracts growing interest in multi-modal research. However, existing systems focus primarily on content compatibility, often ignoring users' preferences. Their inability to interact with users for further refinements or to provide explanations leads to a less satisfying experience. We address these issues with MuseChat, a first-of-its-kind dialogue-based recommendation system that personalizes music suggestions for videos. Our system consists of two key functionalities with associated modules: recommendation and reasoning. The recommendation module takes a video along with optional information, including previously suggested music and the user's preferences, as inputs and retrieves appropriate music matching the context. The reasoning module, equipped with the power of a Large Language Model (Vicuna-7B) and extended to multi-modal inputs, is able to provide a reasonable explanation for the recommended music. To evaluate the effectiveness of MuseChat, we build a large-scale dataset, conversational music recommendation for videos, that simulates a two-turn interaction between a user and a recommender based on accurate music track information. Experiment results show that MuseChat achieves significant improvements over existing video-based music retrieval methods as well as offering strong interpretability and interactivity.

iTransformer: Inverted Transformers Are Effective for Time Series Forecasting. (arXiv:2310.06625v2 [cs.LG] UPDATED)

Authors: Yong Liu, Tengge Hu, Haoran Zhang, Haixu Wu, Shiyu Wang, Lintao Ma, Mingsheng Long

The recent boom of linear forecasting models questions the ongoing passion for architectural modifications of Transformer-based forecasters. These forecasters leverage Transformers to model the global dependencies over temporal tokens of time series, with each token formed by multiple variates of the same timestamp. However, Transformers are challenged in forecasting series with larger lookback windows due to performance degradation and computation explosion. Besides, the embedding for each temporal token fuses multiple variates that represent potential delayed events and distinct physical measurements, which may fail in learning variate-centric representations and result in meaningless attention maps. In this work, we reflect on the competent duties of Transformer components and repurpose the Transformer architecture without any modification to the basic components. We propose iTransformer that simply applies the attention and feed-forward network on the inverted dimensions. Specifically, the time points of individual series are embedded into variate tokens which are utilized by the attention mechanism to capture multivariate correlations; meanwhile, the feed-forward network is applied for each variate token to learn nonlinear representations. The iTransformer model achieves state-of-the-art on challenging real-world datasets, which further empowers the Transformer family with promoted performance, generalization ability across different variates, and better utilization of arbitrary lookback windows, making it a nice alternative as the fundamental backbone of time series forecasting.
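
The inversion itself is essentially a change of tokenisation: embed each variate's entire lookback window as one token and let attention run across variates rather than time. A minimal sketch, with shapes and the linear embedding as assumptions:

```python
import torch
import torch.nn as nn

class InvertedEmbedding(nn.Module):
    """Each variate's whole lookback window becomes one token, so attention
    operates over variates, not time steps. Dimensions are illustrative."""
    def __init__(self, lookback, d_model):
        super().__init__()
        self.proj = nn.Linear(lookback, d_model)  # embed along the time axis

    def forward(self, x):              # x: (batch, time, n_variates)
        x = x.transpose(1, 2)          # -> (batch, n_variates, time)
        return self.proj(x)            # -> (batch, n_variates, d_model)

tokens = InvertedEmbedding(96, 512)(torch.randn(8, 96, 7))
print(tokens.shape)                    # torch.Size([8, 7, 512])
```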

Diffusion Models for Wireless Communications. (arXiv:2310.07312v3 [cs.IT] UPDATED)

Authors: Mehdi Letafati, Samad Ali, Matti Latva-aho

Innovative foundation models, such as GPT-4 and stable diffusion models, have made a paradigm shift in the realm of artificial intelligence (AI) towards generative AI-based systems. AI and machine learning (AI/ML) algorithms are envisioned to be pervasively incorporated into future wireless communications systems. In this article, we outline the applications of diffusion models, a new family of probabilistic generative models that has showcased state-of-the-art performance, in wireless communication systems. The key idea is to decompose the data generation process over "denoising" steps, gradually generating samples out of noise. Based on two case studies presented, we show how diffusion models can be employed for the development of resilient AI-native communication systems. Specifically, we propose denoising diffusion probabilistic models (DDPM) for a wireless communication scheme with non-ideal transceivers, where a 30% improvement is achieved in terms of bit error rate. In the other example, DDPM is employed at the transmitter to shape the constellation symbols, highlighting robust out-of-distribution performance.

Contextualized Policy Recovery: Modeling and Interpreting Medical Decisions with Adaptive Imitation Learning. (arXiv:2310.07918v3 [cs.LG] UPDATED)

Authors: Jannik Deuschel, Caleb N. Ellington, Benjamin J. Lengerich, Yingtao Luo, Pascal Friederich, Eric P. Xing

Interpretable policy learning seeks to estimate intelligible decision policies from observed actions; however, existing models fall short by forcing a tradeoff between accuracy and interpretability. This tradeoff limits data-driven interpretations of human decision-making processes; e.g., to audit medical decisions for biases and suboptimal practices, we require models of decision processes which provide concise descriptions of complex behaviors. Fundamentally, existing approaches are burdened by this tradeoff because they represent the underlying decision process as a universal policy, when in fact human decisions are dynamic and can change drastically with contextual information. Thus, we propose Contextualized Policy Recovery (CPR), which re-frames the problem of modeling complex decision processes as a multi-task learning problem in which complex decision policies are composed of context-specific policies. CPR models each context-specific policy as a linear observation-to-action mapping, and generates new decision models $\textit{on-demand}$ as contexts are updated with new observations. CPR is compatible with fully offline and partially observable decision environments, and can be tailored to incorporate any recurrent black-box model or interpretable decision model. We assess CPR through studies on simulated and real data, achieving state-of-the-art performance on the canonical tasks of predicting antibiotic prescription in intensive care units ($+22\%$ AUROC vs. previous SOTA) and predicting MRI prescription for Alzheimer's patients ($+7.7\%$ AUROC vs. previous SOTA). With this improvement in predictive performance, CPR closes the accuracy gap between interpretable and black-box methods for policy learning, allowing high-resolution exploration and analysis of context-specific decision models.
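
A minimal sketch of the on-demand construction, with a recurrent encoder emitting the weights of a per-context linear observation-to-action map; the encoder type and layer sizes are illustrative assumptions:

```python
import torch
import torch.nn as nn

class ContextToLinearPolicy(nn.Module):
    """A black-box context encoder generates, per context, the weights of an
    interpretable linear policy; the GRU and sizes are assumptions."""
    def __init__(self, ctx_dim, obs_dim, act_dim, hidden=64):
        super().__init__()
        self.encoder = nn.GRU(ctx_dim, hidden, batch_first=True)
        self.to_weights = nn.Linear(hidden, obs_dim * act_dim)
        self.obs_dim, self.act_dim = obs_dim, act_dim

    def forward(self, context, obs):
        # context: (B, T, ctx_dim) history; obs: (B, obs_dim) current input.
        _, h = self.encoder(context)
        W = self.to_weights(h[-1]).view(-1, self.act_dim, self.obs_dim)
        # W is the per-context linear policy -- directly inspectable.
        return torch.bmm(W, obs.unsqueeze(-1)).squeeze(-1)   # (B, act_dim)
```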

In-Context Pretraining: Language Modeling Beyond Document Boundaries. (arXiv:2310.10638v4 [cs.CL] UPDATED)

Authors: Weijia Shi, Sewon Min, Maria Lomeli, Chunting Zhou, Margaret Li, Rich James, Xi Victoria Lin, Noah A. Smith, Luke Zettlemoyer, Scott Yih, Mike Lewis

Large language models (LMs) are currently trained to predict tokens given document prefixes, enabling them to directly perform long-form generation and prompting-style tasks which can be reduced to document completion. Existing pretraining pipelines train LMs by concatenating random sets of short documents to create input contexts, but the prior documents provide no signal for predicting the next document. We instead present In-Context Pretraining, a new approach where language models are pretrained on a sequence of related documents, thereby explicitly encouraging them to read and reason across document boundaries. We can do In-Context Pretraining by simply changing the document ordering so that each context contains related documents, and directly applying existing pretraining pipelines. However, this document sorting problem is challenging: there are billions of documents, and we would like the sort to maximize contextual similarity for every document without repeating any data. To do this, we introduce approximate algorithms for finding related documents with efficient nearest neighbor search and constructing coherent input contexts with a graph traversal algorithm. Our experiments show that In-Context Pretraining offers a simple and scalable approach to significantly enhance LMs' performance: we see notable improvements in tasks that require more complex contextual reasoning, including in-context learning (+8%), reading comprehension (+15%), faithfulness to previous contexts (+16%), long-context reasoning (+5%), and retrieval augmentation (+9%).
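
In miniature, the ordering step can be pictured as a greedy nearest-neighbour traversal over document embeddings; this toy version ignores the billion-document scale and deduplication constraints the paper's approximate algorithms handle:

```python
import numpy as np

def greedy_context_order(doc_embs):
    """Repeatedly append the unvisited document most similar (by cosine)
    to the current one, so neighbouring documents in the training stream
    are semantically related. doc_embs: (n_docs, dim) NumPy array."""
    embs = doc_embs / np.linalg.norm(doc_embs, axis=1, keepdims=True)
    order, visited = [0], {0}
    while len(order) < len(embs):
        sims = embs @ embs[order[-1]]
        sims[list(visited)] = -np.inf          # never repeat a document
        nxt = int(np.argmax(sims))
        order.append(nxt)
        visited.add(nxt)
    return order
```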

Denoising Heat-inspired Diffusion with Insulators for Collision Free Motion Planning. (arXiv:2310.12609v2 [cs.RO] UPDATED)

Authors: Junwoo Chang, Hyunwoo Ryu, Jiwoo Kim, Soochul Yoo, Joohwan Seo, Nikhil Prakash, Jongeun Choi, Roberto Horowitz

Diffusion models have emerged as a powerful tool in robotics due to their flexibility and multi-modality. While some of these methods effectively address complex problems, they often depend heavily on inference-time obstacle detection and require additional equipment. Addressing these challenges, we present a method that, during inference time, simultaneously generates only reachable goals and plans motions that avoid obstacles, all from a single visual input. Central to our approach is the novel use of a collision-avoiding diffusion kernel for training. Through evaluations against behavior-cloning and classical diffusion models, our framework has proven its robustness. It is particularly effective in multi-modal environments, navigating toward goals and avoiding unreachable ones blocked by obstacles, while ensuring collision avoidance.

Universal representation by Boltzmann machines with Regularised Axons. (arXiv:2310.14395v2 [cond-mat.stat-mech] UPDATED)

Authors: Przemysław R. Grzybowski, Antoni Jankiewicz, Eloy Piñol, David Cirauqui, Dorota H. Grzybowska, Paweł M. Petrykowski, Miguel Ángel García-March, Maciej Lewenstein, Gorka Muñoz-Gil, Alejandro Pozas-Kerstjens

It is widely known that Boltzmann machines are capable of representing arbitrary probability distributions over the values of their visible neurons, given enough hidden ones. However, sampling -- and thus training -- these models can be numerically hard. Recently we proposed a regularisation of the connections of Boltzmann machines, in order to control the energy landscape of the model, paving the way for efficient sampling and training. Here we formally prove that such regularised Boltzmann machines preserve the ability to represent arbitrary distributions while controlling the number of local energy minima, thus enabling easy \emph{guided} sampling and training. Furthermore, we explicitly show that regularised Boltzmann machines can store exponentially many arbitrarily correlated visible patterns with perfect retrieval, and we connect them to the Dense Associative Memory networks.

VMAF Re-implementation on PyTorch: Some Experimental Results. (arXiv:2310.15578v3 [cs.LG] UPDATED)

Authors: Kirill Aistov, Maxim Koroteev

Based on the standard VMAF implementation, we propose an implementation of VMAF using the PyTorch framework. For this implementation, comparisons with the standard (libvmaf) show a discrepancy of $\lesssim 10^{-2}$ VMAF units. We investigate gradient computation when using VMAF as an objective function and demonstrate that training using this function does not result in ill-behaved gradients. The implementation is then used to train a preprocessing filter. It is demonstrated that its performance is superior to the unsharp masking filter. The resulting filter is also easy to implement and can be applied in video processing tasks to improve video compression. This is confirmed by the results of numerical experiments.
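
The training setup, a learnable preprocessing filter optimized through a differentiable quality metric, can be sketched as follows. The metric below is a stand-in (negative MSE), since the paper's differentiable VMAF module is what would actually be plugged in, and the real pipeline would also place the codec between the filter and the measurement; all names and values are illustrative:

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    def quality_metric(processed, reference):
        # Placeholder for a differentiable VMAF; higher = better.
        return -F.mse_loss(processed, reference)

    # A small learnable preprocessing filter, initialized near identity.
    filt = nn.Conv2d(1, 1, kernel_size=3, padding=1, bias=False)
    nn.init.dirac_(filt.weight)
    opt = torch.optim.Adam(filt.parameters(), lr=1e-3)

    frames = torch.rand(8, 1, 64, 64)  # toy stand-in for video frames
    for step in range(100):
        processed = filt(frames)
        loss = -quality_metric(processed, frames)  # maximize predicted quality
        opt.zero_grad()
        loss.backward()
        opt.step()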

Harnessing machine learning for accurate treatment of overlapping opacity species in GCMs. (arXiv:2311.00775v2 [astro-ph.EP] UPDATED)

Authors: Aaron David Schneider, Paul Mollière, Gilles Louppe, Ludmila Carone, Uffe Gråe Jørgensen, Leen Decin, Christiane Helling

To understand high precision observations of exoplanets and brown dwarfs, we need detailed and complex general circulation models (GCMs) that incorporate hydrodynamics, chemistry, and radiation. In this study, we specifically examine the coupling between chemistry and radiation in GCMs and compare different methods for mixing opacities of different chemical species in the correlated-k assumption, when equilibrium chemistry cannot be assumed. We propose a fast machine learning method based on DeepSets (DS), which effectively combines individual correlated-k opacities (k-tables). We evaluate the DS method alongside other published methods like adaptive equivalent extinction (AEE) and random overlap with rebinning and resorting (RORR). We integrate these mixing methods into our GCM (expeRT/MITgcm) and assess their accuracy and performance for the example of the hot Jupiter HD~209458 b. Our findings indicate that the DS method is both accurate and efficient for GCM usage, whereas RORR is too slow. Additionally, we observe that the accuracy of AEE depends on its specific implementation and may introduce numerical issues in achieving radiative transfer solution convergence. We then apply the DS mixing method in a simplified chemical disequilibrium situation, where we model the rainout of TiO and VO, and confirm that the rainout of TiO and VO would hinder the formation of a stratosphere. To further expedite the development of consistent disequilibrium chemistry calculations in GCMs, we provide documentation and code for coupling the DS mixing method with correlated-k radiative transfer solvers. The DS method has been extensively tested to be accurate enough for GCMs; however, other methods might be needed to accelerate atmospheric retrievals.
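
The DeepSets construction itself is standard and easy to write down: a shared network embeds each species' opacity values, the embeddings are summed (so the combiner is permutation-invariant in the species), and a decoder maps the pooled representation to the mixed k-table. The sketch below uses invented layer sizes and inputs, not the paper's architecture:

    import torch
    import torch.nn as nn

    class DeepSetsOpacityMixer(nn.Module):
        def __init__(self, n_gpoints, hidden=64):
            super().__init__()
            self.phi = nn.Sequential(nn.Linear(n_gpoints, hidden), nn.ReLU(),
                                     nn.Linear(hidden, hidden))
            self.rho = nn.Sequential(nn.Linear(hidden, hidden), nn.ReLU(),
                                     nn.Linear(hidden, n_gpoints))

        def forward(self, species_ktables):
            # species_ktables: (batch, n_species, n_gpoints), e.g. log-opacities
            embedded = self.phi(species_ktables)  # (B, S, H)
            pooled = embedded.sum(dim=1)          # sum over species: permutation-invariant
            return self.rho(pooled)               # mixed k-table: (B, n_gpoints)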

Upper and lower bounds for the Lipschitz constant of random neural networks. (arXiv:2311.01356v2 [stat.ML] UPDATED)

Authors: Paul Geuchen, Thomas Heindl, Dominik Stöger, Felix Voigtlaender

Empirical studies have widely demonstrated that neural networks are highly sensitive to small, adversarial perturbations of the input. The worst-case robustness against these so-called adversarial examples can be quantified by the Lipschitz constant of the neural network. In this paper, we study upper and lower bounds for the Lipschitz constant of random ReLU neural networks. Specifically, we assume that the weights and biases follow a generalization of the He initialization, where general symmetric distributions for the biases are permitted. For shallow neural networks, we characterize the Lipschitz constant up to an absolute numerical constant. For deep networks with fixed depth and sufficiently large width, our established bounds differ by a factor that is logarithmic in the width.
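
Both sides of such bounds are easy to probe numerically: the product of the layers' spectral norms is the classical upper bound for ReLU networks (ReLU is 1-Lipschitz), and the gradient norm at any input is a lower bound. A minimal sketch, not taken from the paper:

    import torch
    import torch.nn as nn

    def lipschitz_upper_bound(net):
        # Product of spectral norms of the weight matrices.
        bound = 1.0
        for m in net.modules():
            if isinstance(m, nn.Linear):
                bound *= torch.linalg.matrix_norm(m.weight, ord=2).item()
        return bound

    def lipschitz_lower_bound(net, dim, n_samples=1000):
        # The Lipschitz constant dominates the gradient norm everywhere,
        # so the max over random samples is a valid lower bound.
        best = 0.0
        for _ in range(n_samples):
            x = torch.randn(dim, requires_grad=True)
            (g,) = torch.autograd.grad(net(x).sum(), x)
            best = max(best, g.norm().item())
        return best

    net = nn.Sequential(nn.Linear(10, 100), nn.ReLU(), nn.Linear(100, 1))
    print(lipschitz_lower_bound(net, 10), "<= Lip <=", lipschitz_upper_bound(net))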

Dissecting the Runtime Performance of the Training, Fine-tuning, and Inference of Large Language Models. (arXiv:2311.03687v2 [cs.PF] UPDATED)

Authors: Longteng Zhang, Xiang Liu, Zeyu Li, Xinglin Pan, Peijie Dong, Ruibo Fan, Rui Guo, Xin Wang, Qiong Luo, Shaohuai Shi, Xiaowen Chu

Large Language Models (LLMs) have seen great advances in both academia and industry, and their popularity has resulted in numerous open-source frameworks and techniques for accelerating LLM pre-training, fine-tuning, and inference. Training and deploying LLMs are expensive, as they require considerable computing resources and memory, hence many efficient approaches have been developed for improving system pipelines as well as operators. However, the runtime performance can vary significantly across hardware and software stacks, which makes it difficult to choose the best configuration. In this work, we aim to benchmark the performance from both macro and micro perspectives. First, we benchmark the end-to-end performance of pre-training, fine-tuning, and serving LLMs of different sizes, i.e., 7, 13, and 70 billion parameters (7B, 13B, and 70B), on three 8-GPU platforms with and without individual optimization techniques, including ZeRO, quantization, recomputation, and FlashAttention. Then, we dive deeper to provide a detailed runtime analysis of the sub-modules, including computing and communication operators in LLMs. For end users, our benchmark and findings help them better understand different optimization techniques, training and inference frameworks, and hardware platforms when choosing configurations for deploying LLMs. For researchers, our in-depth module-wise analyses discover potential opportunities for future work to further optimize the runtime performance of LLMs.
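
Careful timing is the crux of such benchmarks. A generic harness (illustrative, not the authors' tooling) must synchronize the GPU, since CUDA kernels launch asynchronously, and discard warmup iterations:

    import time
    import torch

    def benchmark(fn, warmup=10, iters=50):
        for _ in range(warmup):        # discard JIT/allocator warmup effects
            fn()
        if torch.cuda.is_available():
            torch.cuda.synchronize()   # CUDA kernels are asynchronous
        start = time.perf_counter()
        for _ in range(iters):
            fn()
        if torch.cuda.is_available():
            torch.cuda.synchronize()
        return (time.perf_counter() - start) / iters

    # Example: time one forward+backward step of a toy model.
    model = torch.nn.Linear(4096, 4096)
    x = torch.randn(32, 4096)
    def step():
        model(x).sum().backward()
        model.zero_grad()
    print(f"{benchmark(step) * 1e3:.2f} ms / iteration")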

Token-Level Adaptation of LoRA Adapters for Downstream Task Generalization. (arXiv:2311.10847v2 [cs.CL] UPDATED)

Authors: Joshua Belofsky

This paper introduces a method for adapting LoRA adapters in smaller language models to arbitrary downstream tasks. Unlike standard mixture-of-expert architectures, our method employs a gradient-free routing function to choose a weighted combination of experts without increasing the compute requirements for training or inference. The results show that token-level adaptation of LoRA adapters outperforms the base Llama-2-7b model across mathematical (GSM8K), scientific (ARC-Challenge), reading comprehension (SQuAD), and coding (CodeAlpaca-20k) tasks. Further evaluations show that the average performance of token-level adaptation outperforms individual models fine-tuned for each of the tasks, with the best performance observed when adapting every other token during inference. The code for this study is made available through a public repository.
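
A gradient-free router of the kind described can be as simple as a similarity score between the current token's representation and a fixed embedding per expert; the mixture weights then combine the experts' low-rank updates. All tensors below are illustrative stand-ins, not the paper's design:

    import torch
    import torch.nn.functional as F

    def route_lora_experts(hidden, expert_centroids, expert_outputs, temperature=1.0):
        """hidden:           (d,)    current token representation
           expert_centroids: (E, d)  one fixed embedding per expert/task
           expert_outputs:   (E, d)  each LoRA expert's output contribution"""
        sims = F.cosine_similarity(hidden.unsqueeze(0), expert_centroids, dim=-1)
        weights = F.softmax(sims / temperature, dim=-1)          # no learned router
        return (weights.unsqueeze(-1) * expert_outputs).sum(0)   # weighted mixture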

Fast Controllable Diffusion Models for Undersampled MRI Reconstruction. (arXiv:2311.12078v2 [eess.IV] UPDATED)

Authors: Wei Jiang, Zhuang Xiong, Feng Liu, Nan Ye, Hongfu Sun

Supervised deep learning methods have shown promise in undersampled Magnetic Resonance Imaging (MRI) reconstruction, but their requirement for paired data limits their generalizability to the diverse MRI acquisition parameters. Recently, unsupervised controllable generative diffusion models have been applied to undersampled MRI reconstruction, without paired data or model retraining for different MRI acquisitions. However, diffusion models are generally slow in sampling and state-of-the-art acceleration techniques can lead to sub-optimal results when directly applied to the controllable generation process. This study introduces a new algorithm called Predictor-Projector-Noisor (PPN), which enhances and accelerates controllable generation of diffusion models for undersampled MRI reconstruction. Our results demonstrate that PPN produces high-fidelity MR images that conform to undersampled k-space measurements with significantly shorter reconstruction time than other controllable sampling methods. In addition, the unsupervised PPN accelerated diffusion models are adaptable to different MRI acquisition parameters, making them more practical for clinical use than supervised learning techniques.
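
The three stage names suggest a loop of the familiar predict/project/re-noise form, where the projection enforces consistency with the acquired k-space samples. The sketch below shows that generic pattern under assumed interfaces, not the paper's exact update rules:

    import torch

    def kspace_projection(image, measured_k, mask):
        # Replace the sampled frequencies with the measurements and keep
        # the generated ones: hard data consistency in k-space.
        k = torch.fft.fft2(image)
        k = torch.where(mask, measured_k, k)
        return torch.fft.ifft2(k).real

    def reconstruct(denoiser, measured_k, mask, sigmas):
        x = torch.randn(measured_k.shape, dtype=torch.float32)
        for sigma_hi, sigma_lo in zip(sigmas[:-1], sigmas[1:]):
            x = denoiser(x, sigma_hi)                    # predict: denoised estimate
            x = kspace_projection(x, measured_k, mask)   # project: enforce data
            x = x + sigma_lo * torch.randn_like(x)       # re-noise to next level
        return x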

A density estimation perspective on learning from pairwise human preferences. (arXiv:2311.14115v2 [cs.LG] UPDATED)

Authors: Vincent Dumoulin, Daniel D. Johnson, Pablo Samuel Castro, Hugo Larochelle, Yann Dauphin

Learning from human feedback (LHF) -- and in particular learning from pairwise preferences -- has recently become a crucial ingredient in training large language models (LLMs), and has been the subject of much research. Most recent works frame it as a reinforcement learning problem, where a reward function is learned from pairwise preference data and the LLM is treated as a policy which is adapted to maximize the rewards, often under additional regularization constraints. We propose an alternative interpretation which centers on the generative process for pairwise preferences and treats LHF as a density estimation problem. We provide theoretical and empirical results showing that for a family of generative processes defined via preference behavior distribution equations, training a reward function on pairwise preferences effectively models an annotator's implicit preference distribution. Finally, we discuss and present findings on "annotator misspecification" -- failure cases where wrong modeling assumptions are made about annotator behavior, resulting in poorly-adapted models -- suggesting that approaches that learn from pairwise human preferences could have trouble learning from a population of annotators with diverse viewpoints.
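
The connection to density estimation is visible in the standard pairwise objective: under a Bradley-Terry-style generative model for annotator choices, fitting a reward function on preference pairs is maximum-likelihood estimation of the annotator's implicit choice distribution. A minimal sketch:

    import torch.nn.functional as F

    def pairwise_preference_loss(reward_model, preferred, rejected):
        # P(preferred beats rejected) = sigmoid(r_pos - r_neg) under Bradley-Terry;
        # minimizing the negative log-likelihood fits that choice distribution.
        r_pos = reward_model(preferred)
        r_neg = reward_model(rejected)
        return -F.logsigmoid(r_pos - r_neg).mean()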

Decision Tree Psychological Risk Assessment in Currency Trading. (arXiv:2311.15222v2 [cs.LG] UPDATED)

Authors: Jai Pal

This research paper focuses on the integration of Artificial Intelligence (AI) into the currency trading landscape, positing the development of personalized AI models, essentially functioning as intelligent personal assistants tailored to the idiosyncrasies of individual traders. The paper posits that AI models are capable of identifying nuanced patterns within a trader's historical data, facilitating a more accurate and insightful assessment of psychological risk dynamics in currency trading. The Psychological Risk Index (PRI) is a dynamic metric that fluctuates in response to market conditions that foster psychological fragility among traders. A classification decision tree is crafted using sophisticated techniques, enabling clearer decision-making boundaries within the tree structure. By incorporating the user's chronological trade entries, the model becomes adept at identifying critical junctures when psychological risks are heightened. The real-time nature of the calculations enhances the model's utility as a proactive tool, offering timely alerts to traders about impending moments of psychological risk. The implications of this research extend beyond the confines of currency trading, reaching into the realms of other industries where the judicious application of personalized modeling emerges as an efficient and strategic approach. This paper positions itself at the intersection of cutting-edge technology and the intricate nuances of human psychology, offering a transformative paradigm for decision-making support in dynamic and high-pressure environments.

Stability-Informed Initialization of Neural Ordinary Differential Equations. (arXiv:2311.15890v2 [cs.LG] UPDATED)

Authors: Theodor Westny, Arman Mohammadi, Daniel Jung, Erik Frisk

This paper addresses the training of Neural Ordinary Differential Equations (neural ODEs), and in particular explores the interplay between numerical integration techniques, stability regions, step size, and initialization techniques. It is shown how the choice of integration technique implicitly regularizes the learned model, and how the solver's corresponding stability region affects training and prediction performance. From this analysis, a stability-informed parameter initialization technique is introduced. The effectiveness of the initialization method is displayed across several learning benchmarks and industrial applications.

SatCLIP: Global, General-Purpose Location Embeddings with Satellite Imagery. (arXiv:2311.17179v2 [cs.CV] UPDATED)

Authors: Konstantin Klemmer, Esther Rolf, Caleb Robinson, Lester Mackey, Marc Rußwurm

Geographic location is essential for modeling tasks in fields ranging from ecology to epidemiology to the Earth system sciences. However, extracting relevant and meaningful characteristics of a location can be challenging, often entailing expensive data fusion or data distillation from global imagery datasets. To address this challenge, we introduce Satellite Contrastive Location-Image Pretraining (SatCLIP), a global, general-purpose geographic location encoder that learns an implicit representation of locations from openly available satellite imagery. Trained location encoders provide vector embeddings summarizing the characteristics of any given location for convenient usage in diverse downstream tasks. We show that SatCLIP embeddings, pretrained on globally sampled multi-spectral Sentinel-2 satellite data, can be used in various predictive tasks that depend on location information but not necessarily satellite imagery, including temperature prediction, animal recognition in imagery, and population density estimation. Across tasks, SatCLIP embeddings consistently outperform embeddings from existing pretrained location encoders, ranging from models trained on natural images to models trained on semantic context. SatCLIP embeddings also help to improve geographic generalization. This demonstrates the potential of general-purpose location encoders and opens the door to learning meaningful representations of our planet from the vast, varied, and largely untapped modalities of geospatial data.
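
The name points to a CLIP-style objective between a location encoder and an image encoder; a symmetric InfoNCE loss over matched (location, image) pairs captures that idea. Encoders and the data pipeline are omitted, and the details below are assumptions rather than the paper's implementation:

    import torch
    import torch.nn.functional as F

    def contrastive_location_image_loss(loc_emb, img_emb, temperature=0.07):
        loc = F.normalize(loc_emb, dim=-1)
        img = F.normalize(img_emb, dim=-1)
        logits = loc @ img.t() / temperature  # (B, B) similarity matrix
        targets = torch.arange(len(loc))      # i-th location matches i-th image
        return 0.5 * (F.cross_entropy(logits, targets)
                      + F.cross_entropy(logits.t(), targets))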

Enhancing the Performance of Neural Networks Through Causal Discovery and Integration of Domain Knowledge. (arXiv:2311.17303v2 [cs.LG] UPDATED)

Authors: Xiaoge Zhang, Xiao-Lin Wang, Fenglei Fan, Yiu-Ming Cheung, Indranil Bose

In this paper, we develop a generic methodology to encode hierarchical causality structure among observed variables into a neural network in order to improve its predictive performance. The proposed methodology, called causality-informed neural network (CINN), leverages three coherent steps to systematically map the structural causal knowledge into the layer-to-layer design of neural network while strictly preserving the orientation of every causal relationship. In the first step, CINN discovers causal relationships from observational data via directed acyclic graph (DAG) learning, where causal discovery is recast as a continuous optimization problem to avoid its combinatorial nature. In the second step, the discovered hierarchical causality structure among observed variables is systematically encoded into neural network through a dedicated architecture and customized loss function. By categorizing variables in the causal DAG as root, intermediate, and leaf nodes, the hierarchical causal DAG is translated into CINN with a one-to-one correspondence between nodes in the causal DAG and units in the CINN while maintaining the relative order among these nodes. Regarding the loss function, both intermediate and leaf nodes in the DAG are treated as target outputs during CINN training so as to drive co-learning of causal relationships among different types of nodes. As multiple loss components emerge in CINN, we leverage the projection of conflicting gradients to mitigate gradient interference among the multiple learning tasks. Computational experiments across a broad spectrum of UCI data sets demonstrate substantial advantages of CINN in predictive performance over other state-of-the-art methods. In addition, an ablation study underscores the value of integrating structural and quantitative causal knowledge in enhancing the neural network's predictive performance incrementally.
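
The gradient projection mentioned at the end follows the familiar gradient-surgery recipe: when two task gradients conflict (negative inner product), remove from one the component along the other before combining. A minimal sketch over flattened per-task gradients; the paper's exact variant may differ:

    import torch

    def project_conflicting_gradients(grads):
        projected = []
        for i, g in enumerate(grads):
            g = g.clone()
            for j, other in enumerate(grads):
                if i != j:
                    dot = torch.dot(g, other)
                    if dot < 0:  # conflict: remove the opposing component
                        g = g - (dot / other.norm() ** 2) * other
            projected.append(g)
        return torch.stack(projected).mean(dim=0)  # combined update direction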

On the Adversarial Robustness of Graph Contrastive Learning Methods. (arXiv:2311.17853v2 [cs.LG] UPDATED)

Authors: Filippo Guerranti, Zinuo Yi, Anna Starovoit, Rafiq Kamel, Simon Geisler, Stephan Günnemann

Contrastive learning (CL) has emerged as a powerful framework for learning representations of images and text in a self-supervised manner while enhancing model robustness against adversarial attacks. More recently, researchers have extended the principles of contrastive learning to graph-structured data, giving birth to the field of graph contrastive learning (GCL). However, whether GCL methods can deliver the same advantages in adversarial robustness as their counterparts in the image and text domains remains an open question. In this paper, we introduce a comprehensive robustness evaluation protocol tailored to assess the robustness of GCL models. We subject these models to adaptive adversarial attacks targeting the graph structure, specifically in the evasion scenario. We evaluate node and graph classification tasks using diverse real-world datasets and attack strategies. With our work, we aim to offer insights into the robustness of GCL methods and hope to open avenues for potential future research directions.

Learning Robust Precipitation Forecaster by Temporal Frame Interpolation. (arXiv:2311.18341v2 [cs.LG] UPDATED)

Authors: Lu Han, Xu-Yang Chen, Han-Jia Ye, De-Chuan Zhan

Recent advances in deep learning have significantly elevated weather prediction models. However, these models often falter in real-world scenarios due to their sensitivity to spatial-temporal shifts. This issue is particularly acute in weather forecasting, where models are prone to overfitting local and temporal variations, especially when tasked with fine-grained predictions. In this paper, we address these challenges by developing a robust precipitation forecasting model that demonstrates resilience against such spatial-temporal discrepancies. We introduce Temporal Frame Interpolation (TFI), a novel technique that enhances the training dataset by generating synthetic samples through interpolating adjacent frames from satellite imagery and ground radar data, thus improving the model's robustness against frame noise. Moreover, we incorporate a unique Multi-Level Dice (ML-Dice) loss function, leveraging the ordinal nature of rainfall intensities to improve the model's performance. Our approach has led to significant improvements in forecasting precision, culminating in our model securing \textit{1st place} in the transfer learning leaderboard of the \textit{Weather4cast'23} competition. This achievement not only underscores the effectiveness of our methodologies but also establishes a new standard for deep learning applications in weather forecasting. Our code and weights are publicly available at \url{https://github.com/Secilia-Cxy/UNetTFI}.
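
Both ingredients admit compact sketches. Frame interpolation can be as simple as a random convex combination of adjacent frames, and a multi-level Dice term can score rainfall at increasing intensity thresholds; the thresholds, the hard masks, and the blending below are illustrative guesses rather than the paper's exact scheme (in training, the per-level predictions would need to stay soft to remain differentiable):

    import torch

    def temporal_frame_interpolation(frame_a, frame_b, alpha=None):
        # Synthesize an intermediate training frame from two adjacent ones.
        if alpha is None:
            alpha = torch.rand(1).item()
        return alpha * frame_a + (1 - alpha) * frame_b

    def multi_level_dice_loss(pred, target, thresholds=(0.2, 1.0, 5.0), eps=1e-6):
        # One Dice term per exceedance level exploits the ordinal structure:
        # heavy rain counts at every level it exceeds.
        loss = 0.0
        for t in thresholds:
            p = (pred > t).float()    # hard mask for illustration only
            g = (target > t).float()
            inter = (p * g).sum()
            loss += 1 - (2 * inter + eps) / (p.sum() + g.sum() + eps)
        return loss / len(thresholds)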

Continuous 16-bit Training: Accelerating 32-bit Pre-Trained Neural Networks. (arXiv:2311.18587v2 [cs.LG] UPDATED)

Authors: Juyoung Yun

In the field of deep learning, the prevalence of models initially trained with 32-bit precision is a testament to its robustness and accuracy. However, the continuous evolution of these models often demands further training, which can be resource-intensive. This study introduces a novel approach where we continue the training of these pre-existing 32-bit models using 16-bit precision. This technique not only caters to the need for efficiency in computational resources but also significantly improves the speed of additional training phases. By adopting 16-bit precision for ongoing training, we are able to substantially decrease memory requirements and computational burden, thereby accelerating the training process in a resource-limited setting. Our experiments show that this method maintains the high standards of accuracy set by the original 32-bit training while providing a much-needed boost in training speed. This approach is especially pertinent in today's context, where most models are initially trained in 32-bit and require periodic updates and refinements. The findings from our research suggest that this strategy of 16-bit continuation training can be a key solution for sustainable and efficient deep learning, offering a practical way to enhance pre-trained models rapidly and in a resource-conscious manner.
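
One standard way to realize such 16-bit continuation training in PyTorch is mixed-precision autocast with loss scaling; the paper's exact recipe may differ, the checkpoint path is hypothetical, and a CUDA device is assumed:

    import torch

    model = torch.nn.Linear(512, 512).cuda()
    # model.load_state_dict(torch.load("pretrained_fp32.pt"))  # hypothetical 32-bit checkpoint
    opt = torch.optim.SGD(model.parameters(), lr=1e-3)
    scaler = torch.cuda.amp.GradScaler()
    loader = [(torch.randn(32, 512), torch.randn(32, 512)) for _ in range(10)]  # toy data

    for x, y in loader:
        opt.zero_grad()
        with torch.cuda.amp.autocast(dtype=torch.float16):  # forward pass in fp16
            loss = torch.nn.functional.mse_loss(model(x.cuda()), y.cuda())
        scaler.scale(loss).backward()  # loss scaling avoids fp16 gradient underflow
        scaler.step(opt)
        scaler.update()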

MLLMs-Augmented Visual-Language Representation Learning. (arXiv:2311.18765v2 [cs.CV] UPDATED)

Authors: Yanqing Liu, Kai Wang, Wenqi Shao, Ping Luo, Yu Qiao, Mike Zheng Shou, Kaipeng Zhang, Yang You

Visual-language pre-training (VLP) has achieved remarkable success in multi-modal tasks, largely attributed to the availability of large-scale image-text datasets. In this work, we demonstrate that multi-modal large language models (MLLMs) can enhance visual-language representation learning by improving data quality. Our approach is simple, utilizing MLLMs to produce multiple extended captions for each image. To prevent the bias introduced by MLLMs' hallucinations and intrinsic caption styles, we propose "text shearing" to keep the extended captions the same length as the original captions. In image-text retrieval, our method consistently obtains improvements of 5.6~35.0% and 16.8~46.1% in R@1 under the fine-tuning and zero-shot settings, respectively. Notably, we obtain zero-shot results that are comparable to fine-tuning on target datasets, which encourages more exploration of the versatile use of MLLMs.
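
"Text shearing" as described amounts to truncating the generated caption to the original caption's length; whether the cut happens at the word or token level is not stated, so the word-level version below is a guess:

    def text_shear(original_caption: str, extended_caption: str) -> str:
        # Keep only as many words as the original caption had, limiting the
        # influence of hallucinated or stylistic tails in MLLM captions.
        budget = len(original_caption.split())
        return " ".join(extended_caption.split()[:budget])

    # Example (illustrative strings)
    print(text_shear("a dog on grass",
                     "a fluffy brown dog running happily on green grass"))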