Learning Collaborative Information Dissemination with Graph-based Multi-Agent Reinforcement Learning. (arXiv:2308.16198v1 [cs.LG])

Authors: Raffaele Galliera, Kristen Brent Venable, Matteo Bassani, Niranjan Suri

In modern communication systems, efficient and reliable information dissemination is crucial for supporting critical operations across domains like disaster response, autonomous vehicles, and sensor networks. This paper introduces a Multi-Agent Reinforcement Learning (MARL) approach as a significant step forward in achieving more decentralized, efficient, and collaborative solutions. We propose a Decentralized-POMDP formulation for information dissemination, empowering each agent to independently decide on message forwarding. This constitutes a significant paradigm shift from traditional heuristics based on Multi-Point Relay (MPR) selection. Our approach harnesses Graph Convolutional Reinforcement Learning, employing Graph Attention Networks (GAT) with dynamic attention to capture essential network features. We propose two approaches, L-DGN and HL-DGN, which differ in the information that is exchanged among agents. We evaluate the performance of our decentralized approaches by comparing them with a widely-used MPR heuristic, and we show that our trained policies are able to efficiently cover the network while bypassing the MPR set selection process. Our approach promises a first step toward bolstering the resilience of real-world broadcast communication infrastructures via learned, collaborative information dissemination.

MASA-TCN: Multi-anchor Space-aware Temporal Convolutional Neural Networks for Continuous and Discrete EEG Emotion Recognition. (arXiv:2308.16207v1 [cs.LG])

Authors: Yi Ding, Su Zhang, Chuangao Tang, Cuntai Guan

Emotion recognition using electroencephalogram (EEG) mainly has two scenarios: classification of the discrete labels and regression of the continuously tagged labels. Although many algorithms were proposed for classification tasks, there are only a few methods for regression tasks. For emotion regression, the label is continuous in time. A natural method is to learn the temporal dynamic patterns. In previous studies, long short-term memory (LSTM) and temporal convolutional neural networks (TCN) were utilized to learn the temporal contextual information from feature vectors of EEG. However, the spatial patterns of EEG were not effectively extracted. To enable the spatial learning ability of TCN towards better regression and classification performances, we propose a novel unified model, named MASA-TCN, for EEG emotion regression and classification tasks. The space-aware temporal layer enables TCN to additionally learn from spatial relations among EEG electrodes. Besides, a novel multi-anchor block with attentive fusion is proposed to learn dynamic temporal dependencies. Experiments on two publicly available datasets show MASA-TCN achieves higher results than the state-of-the-art methods for both EEG emotion regression and classification tasks. The code is available at https://github.com/yi-ding-cs/MASA-TCN.

Deep Inductive Logic Programming meets Reinforcement Learning. (arXiv:2308.16210v1 [cs.LG])

Authors: Andreas Bueff (University of Edinburgh), Vaishak Belle (University of Edinburgh)

One approach to explaining the hierarchical levels of understanding within a machine learning model is the symbolic method of inductive logic programming (ILP), which is data efficient and capable of learning first-order logic rules that can entail data behaviour. Differentiable Neural Logic (dNL) networks, a differentiable extension to ILP, are able to learn Boolean functions because their neural architecture includes symbolic reasoning. We propose an application of dNL in the field of Relational Reinforcement Learning (RRL) to address dynamic continuous environments. This represents an extension of previous work in applying dNL-based ILP in RRL settings, as our proposed model updates the architecture to enable it to solve problems in continuous RL environments. The goal of this research is to improve upon current ILP methods for use in RRL by incorporating non-linear continuous predicates, allowing RRL agents to reason and make decisions in dynamic and continuous environments.

RetroBridge: Modeling Retrosynthesis with Markov Bridges. (arXiv:2308.16212v1 [q-bio.QM])

Authors: Ilia Igashov, Arne Schneuing, Marwin Segler, Michael Bronstein, Bruno Correia

Retrosynthesis planning is a fundamental challenge in chemistry which aims at designing reaction pathways from commercially available starting materials to a target molecule. Each step in multi-step retrosynthesis planning requires accurate prediction of possible precursor molecules given the target molecule and confidence estimates to guide heuristic search algorithms. We model single-step retrosynthesis planning as a distribution learning problem in a discrete state space. First, we introduce the Markov Bridge Model, a generative framework aimed to approximate the dependency between two intractable discrete distributions accessible via a finite sample of coupled data points. Our framework is based on the concept of a Markov bridge, a Markov process pinned at its endpoints. Unlike diffusion-based methods, our Markov Bridge Model does not need a tractable noise distribution as a sampling proxy and directly operates on the input product molecules as samples from the intractable prior distribution. We then address the retrosynthesis planning problem with our novel framework and introduce RetroBridge, a template-free retrosynthesis modeling approach that achieves state-of-the-art results on standard evaluation benchmarks.

Deep Video Codec Control. (arXiv:2308.16215v1 [eess.IV])

Authors: Christoph Reich, Biplob Debnath, Deep Patel, Tim Prangemeier, Srimat Chakradhar

Lossy video compression is commonly used when transmitting and storing video data. Unified video codecs (e.g., H.264 or H.265) remain the de facto standard, despite the availability of advanced (neural) compression approaches. Transmitting videos in the face of dynamic network bandwidth conditions requires video codecs to adapt to vastly different compression strengths. Rate control modules augment the codec's compression such that bandwidth constraints are satisfied and video distortion is minimized. While both standard video codecs and their rate control modules are developed to minimize video distortion w.r.t. human quality assessment, preserving the downstream performance of deep vision models is not considered. In this paper, we present the first end-to-end learnable deep video codec control considering both bandwidth constraints and downstream vision performance, while not breaking existing standardization. We demonstrate for two common vision tasks (semantic segmentation and optical flow estimation) and on two different datasets that our deep codec control better preserves downstream performance than using 2-pass average bit rate control while meeting dynamic bandwidth constraints and adhering to standardizations.

Calibrated Explanations for Regression. (arXiv:2308.16245v1 [cs.LG])

Authors: Tuwe Löfström, Helena Löfström, Ulf Johansson, Cecilia Sönströd

Artificial Intelligence (AI) is often an integral part of modern decision support systems (DSSs). The best-performing predictive models used in AI-based DSSs lack transparency. Explainable Artificial Intelligence (XAI) aims to create AI systems that can explain their rationale to human users. Local explanations in XAI can provide information about the causes of individual predictions in terms of feature importance. However, a critical drawback of existing local explanation methods is their inability to quantify the uncertainty associated with a feature's importance. This paper introduces an extension of a feature importance explanation method, Calibrated Explanations (CE), previously only supporting classification, with support for standard regression and probabilistic regression, i.e., the probability that the target is above an arbitrary threshold. The extension for regression keeps all the benefits of CE, such as calibration of the prediction from the underlying model with confidence intervals, uncertainty quantification of feature importance, and support for both factual and counterfactual explanations. CE for standard regression provides fast, reliable, stable, and robust explanations. CE for probabilistic regression provides an entirely new way of creating probabilistic explanations from any ordinary regression model and with a dynamic selection of thresholds. The performance of CE for probabilistic regression regarding stability and speed is comparable to LIME. The method is model agnostic with easily understood conditional rules. An implementation in Python is freely available on GitHub and can be installed using pip, making the results in this paper easily replicable.

Materials Informatics Transformer: A Language Model for Interpretable Materials Properties Prediction. (arXiv:2308.16259v1 [cs.LG])

Authors: Hongshuo Huang, Rishikesh Magar, Changwen Xu, Amir Barati Farimani

Recently, the remarkable capabilities of large language models (LLMs) have been illustrated across a variety of research domains such as natural language processing, computer vision, and molecular modeling. We extend this paradigm by utilizing LLMs for material property prediction by introducing our model Materials Informatics Transformer (MatInFormer). Specifically, we introduce a novel approach that involves learning the grammar of crystallography through the tokenization of pertinent space group information. We further illustrate the adaptability of MatInFormer by incorporating task-specific data pertaining to Metal-Organic Frameworks (MOFs). Through attention visualization, we uncover the key features that the model prioritizes during property prediction. The effectiveness of our proposed model is empirically validated across 14 distinct datasets, thereby underscoring its potential for high throughput screening through accurate material property prediction.

Emergence of Segmentation with Minimalistic White-Box Transformers. (arXiv:2308.16271v1 [cs.CV])

Authors: Yaodong Yu, Tianzhe Chu, Shengbang Tong, Ziyang Wu, Druv Pai, Sam Buchanan, Yi Ma

Transformer-like models for vision tasks have recently proven effective for a wide range of downstream applications such as segmentation and detection. Previous works have shown that segmentation properties emerge in vision transformers (ViTs) trained using self-supervised methods such as DINO, but not in those trained on supervised classification tasks. In this study, we probe whether segmentation emerges in transformer-based models solely as a result of intricate self-supervised learning mechanisms, or if the same emergence can be achieved under much broader conditions through proper design of the model architecture. Through extensive experimental results, we demonstrate that when employing a white-box transformer-like architecture known as CRATE, whose design explicitly models and pursues low-dimensional structures in the data distribution, segmentation properties, at both the whole and parts levels, already emerge with a minimalistic supervised training recipe. Layer-wise finer-grained analysis reveals that the emergent properties strongly corroborate the designed mathematical functions of the white-box network. Our results suggest a path to design white-box foundation models that are simultaneously highly performant and mathematically fully interpretable. Code is at https://github.com/Ma-Lab-Berkeley/CRATE.

A numerical approach for the fractional Laplacian via deep neural networks. (arXiv:2308.16272v1 [math.AP])

Authors: Nicolás Valenzuela

We consider the fractional elliptic problem with Dirichlet boundary conditions on a bounded and convex domain $D$ of $\mathbb{R}^d$, with $d \geq 2$. In this paper, we perform a stochastic gradient descent algorithm that approximates the solution of the fractional problem via Deep Neural Networks. Additionally, we provide four numerical examples to test the efficiency of the algorithm, and each example will be studied for many values of $\alpha \in (1,2)$ and $d \geq 2$.

Learning Diverse Features in Vision Transformers for Improved Generalization. (arXiv:2308.16274v1 [cs.CV])

Authors: Armand Mihai Nicolicioiu, Andrei Liviu Nicolicioiu, Bogdan Alexe, Damien Teney

Deep learning models often rely only on a small set of features even when there is a rich set of predictive signals in the training data. This makes models brittle and sensitive to distribution shifts. In this work, we first examine vision transformers (ViTs) and find that they tend to extract robust and spurious features with distinct attention heads. As a result of this modularity, their performance under distribution shifts can be significantly improved at test time by pruning heads corresponding to spurious features, which we demonstrate using an "oracle selection" on validation data. Second, we propose a method to further enhance the diversity and complementarity of the learned features by encouraging orthogonality of the attention heads' input gradients. We observe improved out-of-distribution performance on diagnostic benchmarks (MNIST-CIFAR, Waterbirds) as a consequence of the enhanced diversity of features and the pruning of undesirable heads.

Classification of Anomalies in Telecommunication Network KPI Time Series. (arXiv:2308.16279v1 [cs.LG])

Authors: Korantin Bordeau-Aubert, Justin Whatley, Sylvain Nadeau, Tristan Glatard, Brigitte Jaumard

The increasing complexity and scale of telecommunication networks have led to a growing interest in automated anomaly detection systems. However, the classification of anomalies detected on network Key Performance Indicators (KPI) has received less attention, resulting in a lack of information about anomaly characteristics and classification processes. To address this gap, this paper proposes a modular anomaly classification framework. The framework assumes separate entities for the anomaly classifier and the detector, allowing for a distinct treatment of anomaly detection and classification tasks on time series. The objectives of this study are (1) to develop a time series simulator that generates synthetic time series resembling real-world network KPI behavior, (2) to build a detection model to identify anomalies in the time series, (3) to build classification models that accurately categorize detected anomalies into predefined classes, and (4) to evaluate the classification framework performance on simulated and real-world network KPI time series. This study has demonstrated the good performance of the anomaly classification models trained on simulated anomalies when applied to real-world network time series data.

Ten Years of Generative Adversarial Nets (GANs): A survey of the state-of-the-art. (arXiv:2308.16316v1 [cs.LG])

Authors: Tanujit Chakraborty, Ujjwal Reddy K S, Shraddha M. Naik, Madhurima Panja, Bayapureddy Manvitha

Since their inception in 2014, Generative Adversarial Networks (GANs) have rapidly emerged as powerful tools for generating realistic and diverse data across various domains, including computer vision and other applied areas. Consisting of a discriminative network and a generative network engaged in a Minimax game, GANs have revolutionized the field of generative modeling. In February 2018, GAN secured the leading spot on the "Top Ten Global Breakthrough Technologies List" issued by the Massachusetts Science and Technology Review. Over the years, numerous advancements have been proposed, leading to a rich array of GAN variants, such as conditional GAN, Wasserstein GAN, CycleGAN, and StyleGAN, among many others. This survey aims to provide a general overview of GANs, summarizing the latent architecture, validation metrics, and application areas of the most widely recognized variants. We also delve into recent theoretical developments, exploring the profound connection between the adversarial principle underlying GAN and Jensen-Shannon divergence, while discussing the optimality characteristics of the GAN framework. The efficiency of GAN variants and their model architectures will be evaluated along with training obstacles as well as training solutions. In addition, a detailed discussion will be provided, examining the integration of GANs with newly developed deep learning frameworks such as Transformers, Physics-Informed Neural Networks, Large Language Models, and Diffusion models. Finally, we reveal several issues as well as future research outlines in this field.
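
For reference, the Minimax game and its connection to the Jensen-Shannon divergence mentioned above can be written out as follows. This is the standard formulation from the original 2014 GAN paper, not a result specific to this survey; the symbols $p_{\mathrm{data}}$ (data distribution), $p_z$ (noise prior), and $p_g$ (distribution induced by the generator) are notation assumed here.

```latex
\begin{align*}
\min_G \max_D V(D, G)
  &= \mathbb{E}_{x \sim p_{\mathrm{data}}}\big[\log D(x)\big]
   + \mathbb{E}_{z \sim p_z}\big[\log\big(1 - D(G(z))\big)\big], \\
D^*_G(x)
  &= \frac{p_{\mathrm{data}}(x)}{p_{\mathrm{data}}(x) + p_g(x)}
  \qquad \text{(optimal discriminator for a fixed } G\text{)}, \\
V(D^*_G, G)
  &= -\log 4 + 2\,\mathrm{JSD}\big(p_{\mathrm{data}} \,\|\, p_g\big).
\end{align*}
```

Since the Jensen-Shannon divergence is non-negative and vanishes only when the two distributions coincide, the last line attains its minimum $-\log 4$ exactly when $p_g = p_{\mathrm{data}}$.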

Symmetry Preservation in Hamiltonian Systems: Simulation and Learning. (arXiv:2308.16331v1 [math-ph])

Authors: Miguel Vaquero, Jorge Cortés, David Martín de Diego

This work presents a general geometric framework for simulating and learning the dynamics of Hamiltonian systems that are invariant under a Lie group of transformations. This means that a group of symmetries is known to act on the system respecting its dynamics and, as a consequence of Noether's Theorem, conserved quantities are observed. We propose to simulate and learn the mappings of interest through the construction of $G$-invariant Lagrangian submanifolds, which are pivotal objects in symplectic geometry. A notable property of our constructions is that the simulated/learned dynamics also preserves the same conserved quantities as the original system, resulting in a more faithful surrogate of the original dynamics than non-symmetry aware methods, and in a more accurate predictor of non-observed trajectories. Furthermore, our setting is able to simulate/learn not only Hamiltonian flows, but any Lie group-equivariant symplectic transformation. Our designs leverage pivotal techniques and concepts in symplectic geometry and geometric mechanics: reduction theory, Noether's Theorem, Lagrangian submanifolds, momentum mappings, and coisotropic reduction among others. We also present methods to learn Poisson transformations while preserving the underlying geometry and how to endow non-geometric integrators with geometric properties. Thus, this work presents a novel attempt to harness the power of symplectic and Poisson geometry towards simulating and learning problems.

ToddlerBERTa: Exploiting BabyBERTa for Grammar Learning and Language Understanding. (arXiv:2308.16336v1 [cs.CL])

Authors: Omer Veysel Cagatan

We present ToddlerBERTa, a BabyBERTa-like language model, exploring its capabilities through five different models with varied hyperparameters. Evaluating on BLiMP, SuperGLUE, MSGS, and a Supplement benchmark from the BabyLM challenge, we find that smaller models can excel in specific tasks, while larger models perform well with substantial data. Despite training on a smaller dataset, ToddlerBERTa demonstrates commendable performance, rivalling the state-of-the-art RoBERTa-base. The model showcases robust language understanding, even with single-sentence pretraining, and competes with baselines that leverage broader contextual information. Our work provides insights into hyperparameter choices and data utilization, contributing to the advancement of language models.

Emoji Promotes Developer Participation and Issue Resolution on GitHub. (arXiv:2308.16360v1 [cs.CY])

Authors: Yuhang Zhou, Xuan Lu, Ge Gao, Qiaozhu Mei, Wei Ai

Although remote working has been increasingly adopted during the pandemic, many are concerned about its low efficiency. Missing in text-based communication are non-verbal cues such as facial expressions and body language, which hinders effective communication and negatively impacts work outcomes. Prevalent on social media platforms, emojis, as alternative non-verbal cues, are gaining popularity in virtual workspaces as well. In this paper, we study how emoji usage influences developer participation and issue resolution in virtual workspaces. To this end, we collect GitHub issues for a one-year period and apply causal inference techniques to measure the causal effect of emojis on the outcome of issues, controlling for confounders such as issue content, repository, and author information. We find that emojis can significantly reduce the resolution time of issues and attract more user participation. We also compare the heterogeneous effect on different types of issues. These findings deepen our understanding of the developer communities, and they provide design implications on how to facilitate interactions and broaden developer participation.

A Unified Analysis for the Subgradient Methods Minimizing Composite Nonconvex, Nonsmooth and Non-Lipschitz Functions. (arXiv:2308.16362v1 [math.OC])

Authors: Daoli Zhu, Lei Zhao, Shuzhong Zhang

In this paper we propose a proximal subgradient method (Prox-SubGrad) for solving nonconvex and nonsmooth optimization problems without assuming Lipschitz continuity conditions. A number of subgradient upper bounds and their relationships are presented. By means of these upper bounding conditions, we establish some uniform recursive relations for the Moreau envelopes for weakly convex optimization. This uniform scheme simplifies and unifies the proof schemes to establish rate of convergence for Prox-SubGrad without assuming Lipschitz continuity. We present a novel convergence analysis in this context. Furthermore, we propose some new stochastic subgradient upper bounding conditions and establish convergence and iteration complexity rates for the stochastic subgradient method (Sto-SubGrad) to solve non-Lipschitz and nonsmooth stochastic optimization problems. In particular, for both deterministic and stochastic subgradient methods on weakly convex optimization problems without Lipschitz continuity, under any of the subgradient upper bounding conditions to be introduced in the paper, we show that $O(1/\sqrt{T})$ convergence rate holds in terms of the square of gradient of the Moreau envelope function, which further improves to be $O(1/{T})$ if, in addition, the uniform KL condition with exponent $1/2$ holds.
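
As background for the convergence measure used above, the Moreau envelope and its gradient can be stated as follows. This is the standard definition for a $\rho$-weakly convex function, not a formula taken from the paper; the symbols $\varphi$ (objective), $\lambda$ (envelope parameter), and $\rho$ (weak-convexity modulus) are notation assumed here.

```latex
\begin{align*}
\varphi_\lambda(x)
  &= \min_{y} \Big\{ \varphi(y) + \tfrac{1}{2\lambda}\,\|y - x\|^2 \Big\},
  \qquad 0 < \lambda < 1/\rho, \\
\operatorname{prox}_{\lambda\varphi}(x)
  &= \operatorname*{arg\,min}_{y} \Big\{ \varphi(y) + \tfrac{1}{2\lambda}\,\|y - x\|^2 \Big\}, \\
\nabla \varphi_\lambda(x)
  &= \tfrac{1}{\lambda}\,\big(x - \operatorname{prox}_{\lambda\varphi}(x)\big).
\end{align*}
```

A small $\|\nabla \varphi_\lambda(x)\|$ means that $x$ is close to a point that is nearly stationary for $\varphi$, which is why rates for nonsmooth weakly convex problems are commonly stated in terms of this quantity.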

SARATHI: Efficient LLM Inference by Piggybacking Decodes with Chunked Prefills. (arXiv:2308.16369v1 [cs.LG])

Authors: Amey Agrawal, Ashish Panwar, Jayashree Mohan, Nipun Kwatra, Bhargav S. Gulavani, Ramachandran Ramjee

Large Language Model (LLM) inference consists of two distinct phases: a prefill phase, which processes the input prompt, and a decode phase, which generates output tokens autoregressively. While the prefill phase effectively saturates GPU compute at small batch sizes, the decode phase results in low compute utilization as it generates one token at a time per request. The varying prefill and decode times also lead to imbalance across micro-batches when using pipeline parallelism, resulting in further inefficiency due to bubbles.

We present SARATHI to address these challenges. SARATHI employs chunked-prefills, which splits a prefill request into equal sized chunks, and decode-maximal batching, which constructs a batch using a single prefill chunk and populates the remaining slots with decodes. During inference, the prefill chunk saturates GPU compute, while the decode requests 'piggyback' and cost up to an order of magnitude less compared to a decode-only batch. Chunked-prefills allows constructing multiple decode-maximal batches from a single prefill request, maximizing coverage of decodes that can piggyback. Furthermore, the uniform compute design of these batches ameliorates the imbalance between micro-batches, significantly reducing pipeline bubbles.

Our techniques yield significant improvements in inference performance across models and hardware. For the LLaMA-13B model on A6000 GPU, SARATHI improves decode throughput by up to 10x, and accelerates end-to-end throughput by up to 1.33x. For LLaMA-33B on A100 GPU, we achieve 1.25x higher end-to-end throughput and up to 4.25x higher decode throughput. When used with pipeline parallelism on GPT-3, SARATHI reduces bubbles by 6.29x, resulting in an end-to-end throughput improvement of 1.91x.
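
The batching scheme described in this abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the names `Batch`, `chunk_prefill`, and `decode_maximal_batches` are hypothetical, a smaller final chunk is assumed when the prompt length is not a multiple of the chunk size, and each decode request is assumed to advance by one token per batch.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Batch:
    prefill_tokens: int         # size of the single prefill chunk in this batch
    decode_requests: List[int]  # ids of decode requests piggybacking on it


def chunk_prefill(prompt_len: int, chunk_size: int) -> List[int]:
    """Split a prompt of prompt_len tokens into chunks of at most chunk_size."""
    full, rest = divmod(prompt_len, chunk_size)
    return [chunk_size] * full + ([rest] if rest else [])


def decode_maximal_batches(prompt_len: int, chunk_size: int,
                           decode_ids: List[int], batch_slots: int) -> List[Batch]:
    """Build one batch per prefill chunk.

    Each batch holds exactly one prefill chunk; the remaining
    batch_slots - 1 slots are filled with ongoing decode requests.
    """
    batches = []
    for chunk in chunk_prefill(prompt_len, chunk_size):
        batches.append(Batch(chunk, list(decode_ids[: batch_slots - 1])))
    return batches


if __name__ == "__main__":
    for b in decode_maximal_batches(1000, 256, [1, 2, 3, 4, 5], batch_slots=4):
        print(b)
```

Under these assumptions, a single long prompt yields several batches, so the same decode requests get several opportunities to piggyback on compute-saturating prefill work.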

A Survey on Privacy in Graph Neural Networks: Attacks, Preservation, and Applications. (arXiv:2308.16375v1 [cs.LG])

Authors: Yi Zhang, Yuying Zhao, Zhaoqing Li, Xueqi Cheng, Yu Wang, Olivera Kotevska, Philip S. Yu, Tyler Derr

Graph Neural Networks (GNNs) have gained significant attention owing to their ability to handle graph-structured data and the improvement in practical applications. However, many of these models prioritize high utility performance, such as accuracy, with a lack of privacy consideration, which is a major concern in modern society where privacy attacks are rampant. To address this issue, researchers have started to develop privacy-preserving GNNs. Despite this progress, there is a lack of a comprehensive overview of the attacks and the techniques for preserving privacy in the graph domain. In this survey, we aim to address this gap by summarizing the attacks on graph data according to the targeted information, categorizing the privacy preservation techniques in GNNs, and reviewing the datasets and applications that could be used for analyzing/solving privacy issues in GNNs. We also outline potential directions for future research in order to build better privacy-preserving GNNs.

Multi-Objective Decision Transformers for Offline Reinforcement Learning. (arXiv:2308.16379v1 [cs.LG])

Authors: Abdelghani Ghanem, Philippe Ciblat, Mounir Ghogho

Offline Reinforcement Learning (RL) is structured to derive policies from static trajectory data without requiring real-time environment interactions. Recent studies have shown the feasibility of framing offline RL as a sequence modeling task, where the sole aim is to predict actions based on prior context using the transformer architecture. However, the limitation of this single task learning approach is its potential to undermine the transformer model's attention mechanism, which should ideally allocate varying attention weights across different tokens in the input context for optimal prediction. To address this, we reformulate offline RL as a multi-objective optimization problem, where the prediction is extended to states and returns. We also highlight a potential flaw in the trajectory representation used for sequence modeling, which could generate inaccuracies when modeling the state and return distributions. This is due to the non-smoothness of the action distribution within the trajectory dictated by the behavioral policy. To mitigate this issue, we introduce action space regions to the trajectory representation. Our experiments on D4RL benchmark locomotion tasks reveal that our propositions allow for more effective utilization of the attention mechanism in the transformer model, resulting in performance that either matches or outperforms current state-of-the-art methods.

BenchTemp: A General Benchmark for Evaluating Temporal Graph Neural Networks. (arXiv:2308.16385v1 [cs.LG])

Authors: Qiang Huang, Jiawei Jiang, Xi Susie Rao, Ce Zhang, Zhichao Han, Zitao Zhang, Xin Wang, Yongjun He, Quanqing Xu, Yang Zhao, Chuang Hu, Shuo Shang, Bo Du

To handle graphs in which features or connectivities are evolving over time, a series of temporal graph neural networks (TGNNs) have been proposed. Despite the success of these TGNNs, the previous TGNN evaluations reveal several limitations regarding four critical issues: 1) inconsistent datasets, 2) inconsistent evaluation pipelines, 3) lacking workload diversity, and 4) lacking efficient comparison. Overall, an empirical study that puts TGNN models onto the same ground and compares them comprehensively is lacking. To this end, we propose BenchTemp, a general benchmark for evaluating TGNN models on various workloads. BenchTemp provides a set of benchmark datasets so that different TGNN models can be fairly compared. Further, BenchTemp engineers a standard pipeline that unifies the TGNN evaluation. With BenchTemp, we extensively compare the representative TGNN models on different tasks (e.g., link prediction and node classification) and settings (transductive and inductive), w.r.t. both effectiveness and efficiency metrics. We have made BenchTemp publicly available at https://github.com/qianghuangwhu/benchtemp.

Improving Robustness and Accuracy of Ponzi Scheme Detection on Ethereum Using Time-Dependent Features. (arXiv:2308.16391v1 [cs.CR])

Authors: Phuong Duy Huynh, Son Hoang Dau, Xiaodong Li, Phuc Luong, Emanuele Viterbo

The rapid development of blockchain has led to more and more funding pouring into the cryptocurrency market, which also attracted cybercriminals' interest in recent years. The Ponzi scheme, an old-fashioned fraud, is now popular on the blockchain, causing considerable financial losses to many crypto-investors. A few Ponzi detection methods have been proposed in the literature, most of which detect a Ponzi scheme based on its smart contract source code or opcode. The contract-code-based approach, while achieving very high accuracy, is not robust: first, the source codes of a majority of contracts on Ethereum are not available, and second, a Ponzi developer can fool a contract-code-based detection model by obfuscating the opcode or inventing a new profit distribution logic that cannot be detected (since these models were trained on existing Ponzi logics only). A transaction-based approach could improve the robustness of detection because transactions, unlike smart contracts, are harder to manipulate. However, the current transaction-based detection models achieve fairly low accuracy. We address this gap in the literature by developing new detection models that rely only on the transactions, hence guaranteeing robustness, and moreover, achieve considerably higher Accuracy, Precision, Recall, and F1-score than existing transaction-based models. This is made possible thanks to the introduction of novel time-dependent features that capture characteristics of Ponzi behaviour derived from our comprehensive data analyses on Ponzi and non-Ponzi data from the XBlock-ETH repository.

Balancing between the Local and Global Structures (LGS) in Graph Embedding. (arXiv:2308.16403v1 [cs.HC])

Authors: Jacob Miller, Vahan Huroyan, Stephen Kobourov

We present a method for balancing between the Local and Global Structures (LGS) in graph embedding, via a tunable parameter. Some embedding methods aim to capture global structures, while others attempt to preserve local neighborhoods. Few methods attempt to do both, and it is not always possible to capture both local and global information well in two dimensions, which is where most graph drawings live. The choice of using a local or a global embedding for visualization depends not only on the task but also on the structure of the underlying data, which may not be known in advance. For a given graph, LGS aims to find a good balance between the local and global structure to preserve. We evaluate the performance of LGS with synthetic and real-world datasets and our results indicate that it is competitive with the state-of-the-art methods, using established quality metrics such as stress and neighborhood preservation. We introduce a novel quality metric, cluster distance preservation, to assess intermediate structure capture. All source-code, datasets, experiments and analysis are available online.
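
To make the stress metric mentioned above concrete, here is a minimal Python sketch of one common normalized form, in which each squared error is divided by the squared graph distance. Whether the paper uses exactly this normalization is an assumption; the function name `stress` and the example graph are illustrative only.

```python
import numpy as np


def stress(X, D):
    """Normalized stress of an embedding.

    X: (n, dim) array of embedded node positions.
    D: (n, n) array of graph-theoretic (e.g. shortest-path) distances.
    Returns the sum over node pairs i < j of
    ((||X_i - X_j|| - D_ij) / D_ij) ** 2.
    """
    X = np.asarray(X, dtype=float)
    D = np.asarray(D, dtype=float)
    diff = X[:, None, :] - X[None, :, :]
    E = np.sqrt((diff ** 2).sum(axis=-1))  # pairwise Euclidean distances
    iu = np.triu_indices(len(X), k=1)      # each unordered pair once
    return float((((E[iu] - D[iu]) / D[iu]) ** 2).sum())


if __name__ == "__main__":
    # Path graph on three nodes: 0 - 1 - 2.
    D = [[0, 1, 2], [1, 0, 1], [2, 1, 0]]
    print(stress([[0, 0], [1, 0], [2, 0]], D))  # straight-line drawing
    print(stress([[0, 0], [1, 0], [1, 1]], D))  # bent drawing
```

A drawing that reproduces every graph distance exactly has zero stress; bending the path shortens the embedded distance between the end nodes and raises the value.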

CktGNN: Circuit Graph Neural Network for Electronic Design Automation. (arXiv:2308.16406v1 [cs.LG])

Authors: Zehao Dong, Weidong Cao, Muhan Zhang, Dacheng Tao, Yixin Chen, Xuan Zhang

The electronic design automation of analog circuits has been a longstanding challenge in the integrated circuit field due to the huge design space and complex design trade-offs among circuit specifications. In the past decades, intensive research efforts have mostly been devoted to automating transistor sizing with a given circuit topology. By recognizing the graph nature of circuits, this paper presents a Circuit Graph Neural Network (CktGNN) that simultaneously automates the circuit topology generation and device sizing based on the encoder-dependent optimization subroutines. Particularly, CktGNN encodes circuit graphs using a two-level GNN framework (of nested GNN) where circuits are represented as combinations of subgraphs in a known subgraph basis. In this way, it significantly improves design efficiency by reducing the number of subgraphs to perform message passing. Nonetheless, another critical roadblock to advancing learning-assisted circuit design automation is a lack of public benchmarks to perform canonical assessment and reproducible research. To tackle the challenge, we introduce Open Circuit Benchmark (OCB), an open-sourced dataset that contains 10K distinct operational amplifiers with carefully-extracted circuit specifications. OCB is also equipped with communicative circuit generation and evaluation capabilities such that it can help to generalize CktGNN to design various analog circuits by producing corresponding datasets. Experiments on OCB show the extraordinary advantages of CktGNN through representation-based optimization frameworks over other recent powerful GNN baselines and human experts' manual designs. Our work paves the way toward a learning-based open-sourced design automation for analog circuits. Our source code is available at https://github.com/zehao-dong/CktGNN.

DECODE: DilatEd COnvolutional neural network for Detecting Extreme-mass-ratio inspirals. (arXiv:2308.16422v1 [astro-ph.IM])

Authors: Tianyu Zhao, Yue Zhou, Ruijun Shi, Zhoujian Cao, Zhixiang Ren

The detection of Extreme Mass Ratio Inspirals (EMRIs) is intricate due to their complex waveforms, extended duration, and low signal-to-noise ratio (SNR), making them more challenging to be identified compared to compact binary coalescences. While matched filtering-based techniques are known for their computational demands, existing deep learning-based methods primarily handle time-domain data and are often constrained by data duration and SNR. In addition, most existing work ignores time-delay interferometry (TDI) and applies the long-wavelength approximation in detector response calculations, thus limiting their ability to handle laser frequency noise. In this study, we introduce DECODE, an end-to-end model focusing on EMRI signal detection by sequence modeling in the frequency domain. Centered around a dilated causal convolutional neural network, trained on synthetic data considering TDI-1.5 detector response, DECODE can efficiently process a year's worth of multichannel TDI data with an SNR of around 50. We evaluate our model on 1-year data with accumulated SNR ranging from 50 to 120 and achieve a true positive rate of 96.3% at a false positive rate of 1%, keeping an inference time of less than 0.01 seconds. With the visualization of three showcased EMRI signals for interpretability and generalization, DECODE exhibits strong potential for future space-based gravitational wave data analyses.
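
As an illustration of the building block named in the abstract above, the following is a minimal pure-Python sketch of a 1-D dilated causal convolution. It is not the DECODE architecture: the function names, the zero left-padding, and the receptive-field helper are assumptions made for illustration only. The point it shows is that each output depends only on the current and earlier samples, and that stacking layers with growing dilation widens the receptive field quickly.

```python
def dilated_causal_conv1d(signal, kernel, dilation):
    """Causal 1-D convolution with a dilation factor.

    output[t] = sum_k kernel[k] * signal[t - k * dilation],
    where samples before index 0 are treated as zero (left padding).
    """
    out = []
    for t in range(len(signal)):
        acc = 0.0
        for k, w in enumerate(kernel):
            idx = t - k * dilation
            if idx >= 0:  # indices before the start count as zeros
                acc += w * signal[idx]
        out.append(acc)
    return out


def receptive_field(kernel_size, dilations):
    """Input samples visible to one output after stacking the given layers."""
    return 1 + (kernel_size - 1) * sum(dilations)


if __name__ == "__main__":
    x = [1.0, 2.0, 3.0, 4.0, 5.0]
    print(dilated_causal_conv1d(x, kernel=[1.0, 1.0], dilation=2))
    print(receptive_field(kernel_size=2, dilations=[1, 2, 4, 8]))
```

With a kernel of size 2 and dilations doubling per layer, the receptive field grows exponentially with depth, which is the usual motivation for this layer type on long sequences.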

On the Equivalence between Implicit and Explicit Neural Networks: A High-dimensional Viewpoint. (arXiv:2308.16425v1 [cs.LG])

Authors: Zenan Ling, Zhenyu Liao, Robert C. Qiu

Implicit neural networks have demonstrated remarkable success in various tasks. However, there is a lack of theoretical analysis of the connections and differences between implicit and explicit networks. In this paper, we study high-dimensional implicit neural networks and provide the high-dimensional equivalents for the corresponding conjugate kernels and neural tangent kernels. Built upon this, we establish the equivalence between implicit and explicit networks in high dimensions.

AntM$^{2}$C: A Large Scale Dataset For Multi-Scenario Multi-Modal CTR Prediction. (arXiv:2308.16437v1 [cs.IR])

Authors: Zhaoxin Huan, Ke Ding, Ang Li, Xiaolu Zhang, Xu Min, Yong He, Liang Zhang, Jun Zhou, Linjian Mo, Jinjie Gu, Zhongyi Liu, Wenliang Zhong, Guannan Zhang

Click-through rate (CTR) prediction is a crucial issue in recommendation systems. There has been an emergence of various public CTR datasets. However, existing datasets primarily suffer from the following limitations. Firstly, users generally click different types of items from multiple scenarios, and modeling from multiple scenarios can provide a more comprehensive understanding of users. Existing datasets only include data for the same type of items from a single scenario. Secondly, multi-modal features are essential in multi-scenario prediction as they address the issue of inconsistent ID encoding between different scenarios. The existing datasets are based on ID features and lack multi-modal features. Thirdly, a large-scale dataset can provide a more reliable evaluation of models, fully reflecting the performance differences between models. The scale of existing datasets is around 100 million, which is relatively small compared to the real-world CTR prediction. To address these limitations, we propose AntM$^{2}$C, a Multi-Scenario Multi-Modal CTR dataset based on industrial data from Alipay. Specifically, AntM$^{2}$C provides the following advantages: 1) It covers CTR data of 5 different types of items, providing insights into the preferences of users for different items, including advertisements, vouchers, mini-programs, contents, and videos. 2) Apart from ID-based features, AntM$^{2}$C also provides 2 multi-modal features, raw text and image features, which can effectively establish connections between items with different IDs. 3) AntM$^{2}$C provides 1 billion CTR data with 200 features, including 200 million users and 6 million items. It is currently the largest-scale CTR dataset available. Based on AntM$^{2}$C, we construct several typical CTR tasks and provide comparisons with baseline methods. The dataset homepage is available at https://www.atecup.cn/home.

Listen to Minority: Encrypted Traffic Classification for Class Imbalance with Contrastive Pre-Training. (arXiv:2308.16453v1 [cs.CR])

Authors: Xiang Li, Juncheng Guo, Qige Song, Jiang Xie, Yafei Sang, Shuyuan Zhao, Yongzheng Zhang

Mobile Internet has profoundly reshaped modern lifestyles in various aspects. Encrypted Traffic Classification (ETC) naturally plays a crucial role in managing mobile Internet, especially with the explosive growth of mobile apps using encrypted communication. Although some existing learning-based ETC methods show promising results, three limitations remain in real-world network environments: 1) label bias caused by traffic class imbalance, 2) traffic homogeneity caused by component sharing, and 3) training with reliance on sufficient labeled traffic. None of the existing ETC methods can address all these limitations. In this paper, we propose a novel Pre-trAining Semi-Supervised ETC framework, dubbed PASS. Our key insight is to resample the original train dataset and perform contrastive pre-training without using individual app labels directly to avoid label bias issues caused by class imbalance, while obtaining a robust feature representation to differentiate overlapping homogeneous traffic by pulling positive traffic pairs closer and pushing negative pairs away. Meanwhile, PASS designs a semi-supervised optimization strategy based on pseudo-label iteration and dynamic loss weighting algorithms in order to effectively utilize massive unlabeled traffic data and alleviate manual train dataset annotation workload. PASS outperforms state-of-the-art ETC methods and generic sampling approaches on four public datasets with significant class imbalance and traffic homogeneity, improving F1 by 1.31% on Cross-Platform215 and by 9.12% on ISCX-17. Furthermore, we validate the generality of the contrastive pre-training and pseudo-label iteration components of PASS, which can adaptively benefit ETC methods with diverse feature extractors.

Adversarial Finetuning with Latent Representation Constraint to Mitigate Accuracy-Robustness Tradeoff. (arXiv:2308.16454v1 [cs.CV])

Authors: Satoshi Suzuki, Shin'ya Yamaguchi, Shoichiro Takeda, Sekitoshi Kanai, Naoki Makishima, Atsushi Ando, Ryo Masumura

This paper addresses the tradeoff between standard accuracy on clean examples and robustness against adversarial examples in deep neural networks (DNNs). Although adversarial training (AT) improves robustness, it degrades the standard accuracy, thus yielding the tradeoff. To mitigate this tradeoff, we propose a novel AT method called ARREST, which comprises three components: (i) adversarial finetuning (AFT), (ii) representation-guided knowledge distillation (RGKD), and (iii) noisy replay (NR). AFT trains a DNN on adversarial examples by initializing its parameters with a DNN that is standardly pretrained on clean examples. RGKD and NR respectively entail a regularization term and an algorithm to preserve latent representations of clean examples during AFT. RGKD penalizes the distance between the representations of the standardly pretrained and AFT DNNs. NR switches input adversarial examples to nonadversarial ones when the representation changes significantly during AFT. By combining these components, ARREST achieves both high standard accuracy and robustness. Experimental results demonstrate that ARREST mitigates the tradeoff more effectively than previous AT-based methods do.
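
The two representation-preserving components described above can be sketched in a few lines of pure Python. This is only an illustration under stated assumptions, not the authors' implementation: the squared L2 distance, the `weight` and `threshold` parameters, and all function names are hypothetical choices made here, since the abstract specifies neither the distance nor the switching rule.

```python
def squared_l2(u, v):
    """Squared Euclidean distance between two representation vectors."""
    return sum((a - b) ** 2 for a, b in zip(u, v))


def rgkd_style_loss(adversarial_loss, z_pretrained, z_finetuned, weight):
    """Adversarial training loss plus a penalty on representation drift.

    z_pretrained: latent of the clean input under the frozen, standardly
                  pretrained network.
    z_finetuned:  latent of the same input under the network being finetuned.
    weight:       hypothetical regularization strength.
    """
    return adversarial_loss + weight * squared_l2(z_pretrained, z_finetuned)


def replay_style_input(x_adv, x_nonadv, z_pretrained, z_finetuned, threshold):
    """Fall back to a non-adversarial input when the drift is too large.

    The threshold test is an assumed stand-in for 'the representation
    changes significantly'.
    """
    if squared_l2(z_pretrained, z_finetuned) > threshold:
        return x_nonadv
    return x_adv
```

The penalty term pulls the finetuned representation of clean inputs back toward the pretrained one, while the switching rule pauses adversarial inputs once the drift has grown past the chosen tolerance.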

Least Squares Maximum and Weighted Generalization-Memorization Machines. (arXiv:2308.16456v1 [stat.ML])

Authors: Shuai Wang, Zhen Wang, Yuan-Hai Shao

In this paper, we propose a new way of remembering by introducing a memory influence mechanism for the least squares support vector machine (LSSVM). Without changing the equation constraints of the original LSSVM, this mechanism allows an accurate partitioning of the training set without overfitting. The maximum memory impact model (MIMM) and the weighted impact memory model (WIMM) are then proposed. It is demonstrated that these models can be degraded to the LSSVM. Furthermore, we propose some different memory impact functions for the MIMM and WIMM. The experimental results show that our MIMM and WIMM have better generalization performance compared to the LSSVM and a significant advantage in time cost compared to other memory models.

BioCoder: A Benchmark for Bioinformatics Code Generation with Contextual Pragmatic Knowledge. (arXiv:2308.16458v1 [cs.LG])

Authors: Xiangru Tang, Bill Qian, Rick Gao, Jiakang Chen, Xinyun Chen, Mark Gerstein

Pre-trained language models like ChatGPT have significantly improved code generation. As these models scale up, there is an increasing need for the output to handle more intricate tasks. Moreover, in bioinformatics, generating functional programs poses additional notable challenges due to the amount of domain knowledge, the need for complicated data operations, and intricate functional dependencies between the operations. Here, we present BioCoder, a benchmark developed to evaluate existing pre-trained models in generating bioinformatics code. In relation to function-code generation, BioCoder covers potential package dependencies, class declarations, and global variables. It incorporates 1026 functions and 1243 methods in Python and Java from GitHub and 253 examples from the Rosalind Project. BioCoder incorporates a fuzz-testing framework for evaluation, and we have applied it to evaluate many models including InCoder, CodeGen, CodeGen2, SantaCoder, StarCoder, StarCoder+, InstructCodeT5+, and ChatGPT. Our detailed analysis of these models emphasizes the importance of domain knowledge, pragmatic code generation, and contextual understanding. Our dataset, benchmark, Docker images, and scripts required for testing are all available at https://github.com/gersteinlab/biocoder.

Computing excited states of molecules using normalizing flows. (arXiv:2308.16468v1 [physics.chem-ph])

Authors: Yahya Saleh, Álvaro Fernández Corral, Armin Iske, Jochen Küpper, Andrey Yachmenev

We present a new nonlinear variational framework for simultaneously computing ground and excited states of quantum systems. Our approach is based on approximating wavefunctions in the linear span of basis functions that are augmented and optimized via composition with normalizing flows. The accuracy and efficiency of our approach are demonstrated in the calculations of a large number of vibrational states of the triatomic H$_2$S molecule as well as ground and several excited electronic states of prototypical one-electron systems including the hydrogen atom, the molecular hydrogen ion, and a carbon atom in a single-active-electron approximation. The results demonstrate significant improvements in the accuracy of energy predictions and accelerated basis-set convergence even when using normalizing flows with a small number of parameters. The present approach can also be seen as the optimization of a set of intrinsic coordinates that best capture the underlying physics within the given basis set.

Domain-adaptive Message Passing Graph Neural Network. (arXiv:2308.16470v1 [cs.LG])

Authors: Xiao Shen, Shirui Pan, Kup-Sze Choi, Xi Zhou

Cross-network node classification (CNNC), which aims to classify nodes in a label-deficient target network by transferring the knowledge from a source network with abundant labels, has drawn increasing attention recently. To address CNNC, we propose a domain-adaptive message passing graph neural network (DM-GNN), which integrates graph neural network (GNN) with conditional adversarial domain adaptation. DM-GNN is capable of learning informative representations for node classification that are also transferable across networks. Firstly, a GNN encoder is constructed by dual feature extractors to separate ego-embedding learning from neighbor-embedding learning so as to jointly capture commonality and discrimination between connected nodes. Secondly, a label propagation node classifier is proposed to refine each node's label prediction by combining its own prediction and its neighbors' prediction. In addition, a label-aware propagation scheme is devised for the labeled source network to promote intra-class propagation while avoiding inter-class propagation, thus yielding label-discriminative source embeddings. Thirdly, conditional adversarial domain adaptation is performed to take the neighborhood-refined class-label information into account during adversarial domain adaptation, so that the class-conditional distributions across networks can be better matched. Comparisons with eleven state-of-the-art methods demonstrate the effectiveness of the proposed DM-GNN.
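
The idea of refining a node's prediction with its neighbors' predictions, and of restricting propagation to same-class neighbors when labels are known, can be sketched as follows. This is a simplified illustration, not DM-GNN itself: the mixing weight `alpha`, the plain neighbor average, and the function name are assumptions introduced here.

```python
def refine_predictions(probs, neighbors, alpha=0.5, labels=None):
    """Blend each node's class probabilities with its neighbors' average.

    probs:     dict node -> list of class probabilities.
    neighbors: dict node -> list of neighbor nodes.
    alpha:     hypothetical weight on the node's own prediction.
    labels:    optional dict node -> class label; when given, only
               neighbors sharing the node's label contribute
               (label-aware, intra-class propagation).
    """
    refined = {}
    for node, own in probs.items():
        nbrs = neighbors.get(node, [])
        if labels is not None:
            nbrs = [n for n in nbrs if labels[n] == labels[node]]
        if not nbrs:
            # No usable neighbors: keep the node's own prediction.
            refined[node] = list(own)
            continue
        mean_nbr = [
            sum(probs[n][c] for n in nbrs) / len(nbrs)
            for c in range(len(own))
        ]
        refined[node] = [
            alpha * o + (1 - alpha) * m for o, m in zip(own, mean_nbr)
        ]
    return refined
```

Calling the function without `labels` corresponds to the unlabeled target network, where every neighbor contributes; passing `labels` mimics the source-network case, where inter-class edges are ignored.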

A Policy Adaptation Method for Implicit Multitask Reinforcement Learning Problems. (arXiv:2308.16471v1 [cs.RO])

Authors: Satoshi Yamamori, Jun Morimoto

In dynamic motion generation tasks, including contact and collisions, small changes in policy parameters can lead to extremely different returns. For example, in soccer, the ball can fly in completely different directions with a similar heading motion by slightly changing the hitting position or the force applied to the ball or when the friction of the ball varies. However, it is difficult to imagine that completely different skills are needed for heading a ball in different directions. In this study, we proposed a multitask reinforcement learning algorithm for adapting a policy to implicit changes in goals or environments in a single motion category with different reward functions or physical parameters of the environment. We evaluated the proposed method on the ball heading task using a monopod robot model. The results showed that the proposed method can adapt to implicit changes in the goal positions or the coefficients of restitution of the ball, whereas the standard domain randomization approach cannot cope with different task settings.

Point-TTA: Test-Time Adaptation for Point Cloud Registration Using Multitask Meta-Auxiliary Learning. (arXiv:2308.16481v1 [cs.CV])

Authors: Ahmed Hatem, Yiming Qian, Yang Wang

We present Point-TTA, a novel test-time adaptation framework for point cloud registration (PCR) that improves the generalization and the performance of registration models. While learning-based approaches have achieved impressive progress, generalization to unknown testing environments remains a major challenge due to the variations in 3D scans. Existing methods typically train a generic model and the same trained model is applied on each instance during testing. This could be sub-optimal since it is difficult for the same model to handle all the variations during testing. In this paper, we propose a test-time adaptation approach for PCR. Our model can adapt to unseen distributions at test-time without requiring any prior knowledge of the test data. Concretely, we design three self-supervised auxiliary tasks that are optimized jointly with the primary PCR task. Given a test instance, we adapt our model using these auxiliary tasks and the updated model is used to perform the inference. During training, our model is trained using a meta-auxiliary learning approach, such that the adapted model via auxiliary tasks improves the accuracy of the primary task. Experimental results demonstrate the effectiveness of our approach in improving generalization of point cloud registration and outperforming other state-of-the-art approaches.

Echocardiographic View Classification with Integrated Out-of-Distribution Detection for Enhanced Automatic Echocardiographic Analysis. (arXiv:2308.16483v1 [eess.SP])

Authors: Jaeik Jeon, Seongmin Ha, Yeonyee E. Yoon, Jiyeon Kim, Hyunseok Jeong, Dawun Jeong, Yeonggul Jang, Youngtaek Hong, Hyuk-Jae Chang

In the rapidly evolving field of automatic echocardiographic analysis and interpretation, automatic view classification is a critical yet challenging task, owing to the inherent complexity and variability of echocardiographic data. This study presents ECHOcardiography VIew Classification with Out-of-Distribution dEtection (ECHO-VICODE), a novel deep learning-based framework that effectively addresses this challenge by training to classify 31 classes, surpassing previous studies and demonstrating its capacity to handle a wide range of echocardiographic views. Furthermore, ECHO-VICODE incorporates an integrated out-of-distribution (OOD) detection function, leveraging the relative Mahalanobis distance to effectively identify 'near-OOD' instances commonly encountered in echocardiographic data. Through extensive experimentation, we demonstrated the outstanding performance of ECHO-VICODE in terms of view classification and OOD detection, significantly reducing the potential for errors in echocardiographic analyses. This pioneering study significantly advances the domain of automated echocardiography analysis and exhibits promising prospects for substantial applications in extensive clinical research and practice.

Test-Time Adaptation for Point Cloud Upsampling Using Meta-Learning. (arXiv:2308.16484v1 [cs.CV])

Authors: Ahmed Hatem, Yiming Qian, Yang Wang

Affordable 3D scanners often produce sparse and non-uniform point clouds that negatively impact downstream applications in robotic systems. While existing point cloud upsampling architectures have demonstrated promising results on standard benchmarks, they tend to experience significant performance drops when the test data have different distributions from the training data. To address this issue, this paper proposes a test-time adaptation approach to enhance model generality of point cloud upsampling. The proposed approach leverages meta-learning to explicitly learn network parameters for test-time adaptation. Our method does not require any prior information about the test data. During meta-training, the model parameters are learned from a collection of instance-level tasks, each of which consists of a sparse-dense pair of point clouds from the training data. During meta-testing, the trained model is fine-tuned with a few gradient updates to produce a unique set of network parameters for each test instance. The updated model is then used for the final prediction. Our framework is generic and can be applied in a plug-and-play manner with existing backbone networks in point cloud upsampling. Extensive experiments demonstrate that our approach improves the performance of state-of-the-art models.

Latent Painter. (arXiv:2308.16490v1 [cs.CV])

Authors: Shih-Chieh Su

Latent diffusers revolutionized generative AI and inspired creative art. When denoising the latent, the predicted original image at each step collectively animates the formation. However, the animation is limited by the denoising nature of the diffuser, and only renders a sharpening process. This work presents Latent Painter, which uses the latent as the canvas, and the diffuser predictions as the plan, to generate painting animation. Latent Painter also transitions from one generated image to another, which can happen between images from two different sets of checkpoints.

In-class Data Analysis Replications: Teaching Students while Testing Science. (arXiv:2308.16491v1 [cs.CY])

Authors: Kristina Gligoric, Tiziano Piccardi, Jake Hofman, Robert West

Science is facing a reproducibility crisis. Previous work has proposed incorporating data analysis replications into classrooms as a potential solution. However, despite the potential benefits, it is unclear whether this approach is feasible, and if so, what the involved stakeholders (students, educators, and scientists) should expect from it. Can students perform a data analysis replication over the course of a class? What are the costs and benefits for educators? And how can this solution help benchmark and improve the state of science?

In the present study, we incorporated data analysis replications in the project component of the Applied Data Analysis course (CS-401) taught at EPFL (N=354 students). Here we report pre-registered findings based on surveys administered throughout the course. First, we demonstrate that students can replicate previously published scientific papers, most of them qualitatively and some exactly. We find discrepancies between what students expect of data analysis replications and what they experience by doing them along with changes in expectations about reproducibility, which together serve as evidence of attitude shifts to foster students' critical thinking. Second, we provide information for educators about how much overhead is needed to incorporate replications into the classroom and identify concerns that replications bring as compared to more traditional assignments. Third, we identify tangible benefits of the in-class data analysis replications for scientific communities, such as a collection of replication reports and insights about replication barriers in scientific work that should be avoided going forward.

Overall, we demonstrate that incorporating replication tasks into a large data science class can increase the reproducibility of scientific work as a by-product of data science instruction, thus benefiting both science and students.

Curvature-based Pooling within Graph Neural Networks. (arXiv:2308.16516v1 [cs.LG])

Authors: Cedric Sanders, Andreas Roth, Thomas Liebig

Over-squashing and over-smoothing are two critical issues that limit the capabilities of graph neural networks (GNNs). While over-smoothing eliminates the differences between nodes, making them indistinguishable, over-squashing refers to the inability of GNNs to propagate information over long distances, as exponentially many node states are squashed into fixed-size representations. Both phenomena share similar causes, as both are largely induced by the graph topology. To mitigate these problems in graph classification tasks, we propose CurvPool, a novel pooling method. CurvPool exploits the notion of curvature of a graph to adaptively identify structures responsible for both over-smoothing and over-squashing. By clustering nodes based on the Balanced Forman curvature, CurvPool constructs a graph with a more suitable structure, allowing deeper models and the combination of distant information. We compare it to other state-of-the-art pooling approaches and establish its competitiveness in terms of classification accuracy, computational complexity, and flexibility. CurvPool outperforms several comparable methods across all considered tasks. The most consistent results are achieved by pooling densely connected clusters using the sum aggregation, as this allows additional information about the size of each pool.

SA6D: Self-Adaptive Few-Shot 6D Pose Estimator for Novel and Occluded Objects. (arXiv:2308.16528v1 [cs.CV])

Authors: Ning Gao, Ngo Anh Vien, Hanna Ziesche, Gerhard Neumann

To enable meaningful robotic manipulation of objects in the real world, 6D pose estimation is one of the critical aspects. Most existing approaches have difficulty extending predictions to scenarios where novel object instances are continuously introduced, especially with heavy occlusions. In this work, we propose a few-shot pose estimation (FSPE) approach called SA6D, which uses a self-adaptive segmentation module to identify the novel target object and construct a point cloud model of the target object using only a small number of cluttered reference images. Unlike existing methods, SA6D does not require object-centric reference images or any additional object information, making it a more generalizable and scalable solution across categories. We evaluate SA6D on real-world tabletop object datasets and demonstrate that SA6D outperforms existing FSPE methods, particularly in cluttered scenes with occlusions, while requiring fewer reference images.

Conditioning Score-Based Generative Models by Neuro-Symbolic Constraints. (arXiv:2308.16534v1 [cs.LG])

Authors: Davide Scassola, Sebastiano Saccani, Ginevra Carbone, Luca Bortolussi

Score-based and diffusion models have emerged as effective approaches for both conditional and unconditional generation. Still, conditional generation is based on either a specific training of a conditional model or classifier guidance, which requires training a noise-dependent classifier, even when the classifier for uncorrupted data is given. We propose an approach to sample from unconditional score-based generative models enforcing arbitrary logical constraints, without any additional training. Firstly, we show how to manipulate the learned score in order to sample from an un-normalized distribution conditional on a user-defined constraint. Then, we define a flexible and numerically stable neuro-symbolic framework for encoding soft logical constraints. Combining these two ingredients, we obtain a general, but approximate, conditional sampling algorithm. We further developed effective heuristics aimed at improving the approximation. Finally, we show the effectiveness of our approach for various types of constraints and data: tabular data, images and time series.

On a Connection between Differential Games, Optimal Control, and Energy-based Models for Multi-Agent Interactions. (arXiv:2308.16539v1 [cs.RO])

Authors: Christopher Diehl, Tobias Klosek, Martin Krüger, Nils Murzyn, Torsten Bertram

Game theory offers an interpretable mathematical framework for modeling multi-agent interactions. However, its applicability in real-world robotics applications is hindered by several challenges, such as unknown agents' preferences and goals. To address these challenges, we show a connection between differential games, optimal control, and energy-based models and demonstrate how existing approaches can be unified under our proposed Energy-based Potential Game formulation. Building upon this formulation, this work introduces a new end-to-end learning application that combines neural networks for game-parameter inference with a differentiable game-theoretic optimization layer, acting as an inductive bias. The experiments using simulated mobile robot pedestrian interactions and real-world automated driving data provide empirical evidence that the game-theoretic layer improves the predictive performance of various neural network backbones.

Scalable Incomplete Multi-View Clustering with Structure Alignment. (arXiv:2308.16541v1 [cs.LG])

Authors: Yi Wen, Siwei Wang, Ke Liang, Weixuan Liang, Xinhang Wan, Xinwang Liu, Suyuan Liu, Jiyuan Liu, En Zhu

The success of existing multi-view clustering (MVC) relies on the assumption that all views are complete. However, samples are usually partially available due to data corruption or sensor malfunction, which has motivated research on incomplete multi-view clustering (IMVC). Although several anchor-based IMVC methods have been proposed to process large-scale incomplete data, they still suffer from the following drawbacks: i) Most existing approaches neglect the inter-view discrepancy and enforce cross-view representation to be consistent, which would corrupt the representation capability of the model; ii) Due to the sample disparity between different views, the learned anchor might be misaligned, which we refer to as the Anchor-Unaligned Problem for Incomplete data (AUP-ID). The AUP-ID causes inaccurate graph fusion and degrades clustering performance. To tackle these issues, we propose a novel incomplete anchor graph learning framework termed Scalable Incomplete Multi-View Clustering with Structure Alignment (SIMVC-SA). Specifically, we construct the view-specific anchor graph to capture the complementary information from different views. In order to solve the AUP-ID, we propose a novel structure alignment module to refine the cross-view anchor correspondence. Meanwhile, the anchor graph construction and alignment are jointly optimized in our unified framework to enhance clustering quality. Through anchor graph construction instead of full graphs, the time and space complexity of the proposed SIMVC-SA is proven to be linearly correlated with the number of samples. Extensive experiments on seven incomplete benchmark datasets demonstrate the effectiveness and efficiency of our proposed method. Our code is publicly available at https://github.com/wy1019/SIMVC-SA.

Forecasting Emergency Department Crowding with Advanced Machine Learning Models and Multivariable Input. (arXiv:2308.16544v1 [cs.LG])

Authors: Jalmari Tuominen, Eetu Pulkkinen, Jaakko Peltonen, Juho Kanniainen, Niku Oksala, Ari Palomäki, Antti Roine

Emergency department (ED) crowding is a significant threat to patient safety and it has been repeatedly associated with increased mortality. Forecasting future service demand has the potential to improve patient outcomes. Despite active research on the subject, several gaps remain: 1) proposed forecasting models have become outdated due to the quick influx of advanced machine learning (ML) models, 2) the amount of multivariable input data has been limited, and 3) discrete performance metrics have been rarely reported. In this study, we document the performance of a set of advanced ML models in forecasting ED occupancy 24 hours ahead. We use electronic health record data from a large, combined ED with an extensive set of explanatory variables, including the availability of beds in catchment area hospitals, traffic data from local observation stations, weather variables, etc. We show that N-BEATS and LightGBM outperform benchmarks with 11% and 9% respective improvements and that DeepAR predicts next day crowding with an AUC of 0.76 (95% CI 0.69-0.84). To the best of our knowledge, this is the first study to document the superiority of LightGBM and N-BEATS over statistical benchmarks in the context of ED forecasting.

MONDEO: Multistage Botnet Detection. (arXiv:2308.16570v1 [cs.CR])

Authors: Duarte Dias, Bruno Sousa, Nuno Antunes

Mobile devices have become widespread and are now the most used piece of technology. Due to their characteristics, they have become major targets for botnet-related malware. FluBot is one example of botnet malware that infects mobile devices. In particular, FluBot is a DNS-based botnet that uses Domain Generation Algorithms (DGA) to establish communication with the Command and Control Server (C2). MONDEO is a multistage mechanism with a flexible design to detect DNS-based botnet malware. MONDEO is lightweight and can be deployed without requiring the deployment of software, agents, or configuration in mobile devices, allowing easy integration in core networks. MONDEO comprises four detection stages: Blacklisting/Whitelisting, Query rate analysis, DGA analysis, and Machine learning evaluation. It was created with the goal of processing streams of packets to identify attacks with high efficiency in the distinct phases. MONDEO was tested against several datasets to measure its efficiency and performance, achieving high performance with RandomForest classifiers. The implementation is available on GitHub.

Document Layout Analysis on BaDLAD Dataset: A Comprehensive MViTv2 Based Approach. (arXiv:2308.16571v1 [cs.CV])

Authors: Ashrafur Rahman Khan, Asif Azad

In the rapidly evolving digital era, the analysis of document layouts plays a pivotal role in automated information extraction and interpretation. In our work, we have trained the MViTv2 transformer model architecture with cascaded mask R-CNN on the BaDLAD dataset to extract text boxes, paragraphs, images and tables from a document. After training on 20365 document images for 36 epochs in a 3-phase cycle, we achieved a training loss of 0.2125 and a mask loss of 0.19. Our work extends beyond training, delving into the exploration of potential enhancement avenues. We investigate the impact of rotation and flip augmentation, the effectiveness of slicing input images pre-inference, the implications of varying the resolution of the transformer backbone, and the potential of employing a dual-pass inference to uncover missed text-boxes. Through these explorations, we observe a spectrum of outcomes, where some modifications result in tangible performance improvements, while others offer unique insights for future endeavors.

CL-MAE: Curriculum-Learned Masked Autoencoders. (arXiv:2308.16572v1 [cs.CV])

Authors: Neelu Madan, Nicolae-Catalin Ristea, Kamal Nasrollahi, Thomas B. Moeslund, Radu Tudor Ionescu

Masked image modeling has been demonstrated as a powerful pretext task for generating robust representations that can be effectively generalized across multiple downstream tasks. Typically, this approach involves randomly masking patches (tokens) in input images, with the masking strategy remaining unchanged during training. In this paper, we propose a curriculum learning approach that updates the masking strategy to continually increase the complexity of the self-supervised reconstruction task. We conjecture that, by gradually increasing the task complexity, the model can learn more sophisticated and transferable representations. To facilitate this, we introduce a novel learnable masking module that possesses the capability to generate masks of different complexities, and integrate the proposed module into masked autoencoders (MAE). Our module is jointly trained with the MAE, while adjusting its behavior during training, transitioning from a partner to the MAE (optimizing the same reconstruction loss) to an adversary (optimizing the opposite loss), while passing through a neutral state. The transition between these behaviors is smooth, being regulated by a factor that is multiplied with the reconstruction loss of the masking module. The resulting training procedure generates an easy-to-hard curriculum. We train our Curriculum-Learned Masked Autoencoder (CL-MAE) on ImageNet and show that it exhibits superior representation learning capabilities compared to MAE. The empirical results on five downstream tasks confirm our conjecture, demonstrating that curriculum learning can be successfully used to self-supervise masked autoencoders.

Development and validation of an interpretable machine learning-based calculator for predicting 5-year weight trajectories after bariatric surgery: a multinational retrospective cohort SOPHIA study. (arXiv:2308.16585v1 [cs.LG])

Authors: Patrick Saux (Scool, CRIStAL), Pierre Bauvin, Violeta Raverdy, Julien Teigny (Scool), Hélène Verkindt, Tomy Soumphonphakdy (Scool), Maxence Debert (Scool), Anne Jacobs, Daan Jacobs, Valerie Monpellier, Phong Ching Lee, Chin Hong Lim, Johanna C Andersson-Assarsson, Lena Carlsson, Per-Arne Svensson, Florence Galtier, Guelareh Dezfoulian, Mihaela Moldovanu, Severine Andrieux, Julien Couster, Marie Lepage, Erminia Lembo, Ornella Verrastro, Maud Robert, Paulina Salminen, Geltrude Mingrone, Ralph Peterli, Ricardo V Cohen, Carlos Zerrweck, David Nocca, Carel W Le Roux, Robert Caiazzo, Philippe Preux (Scool, CRIStAL), François Pattou

Background: Weight loss trajectories after bariatric surgery vary widely between individuals, and predicting weight loss before the operation remains challenging. We aimed to develop a model using machine learning to provide individual preoperative prediction of 5-year weight loss trajectories after surgery. Methods: In this multinational retrospective observational study we enrolled adult participants (aged $\ge$18 years) from ten prospective cohorts (including ABOS [NCT01129297], BAREVAL [NCT02310178], the Swedish Obese Subjects study, and a large cohort from the Dutch Obesity Clinic [Nederlandse Obesitas Kliniek]) and two randomised trials (SleevePass [NCT00793143] and SM-BOSS [NCT00356213]) in Europe, the Americas, and Asia, with a 5-year follow-up after Roux-en-Y gastric bypass, sleeve gastrectomy, or gastric band. Patients with a previous history of bariatric surgery or large delays between scheduled and actual visits were excluded. The training cohort comprised patients from two centres in France (ABOS and BAREVAL). The primary outcome was BMI at 5 years. A model was developed using least absolute shrinkage and selection operator to select variables and the classification and regression trees algorithm to build interpretable regression trees. The performance of the model was assessed through the median absolute deviation (MAD) and root mean squared error (RMSE) of BMI. Findings: 10 231 patients from 12 centres in ten countries were included in the analysis, corresponding to 30 602 patient-years. Among participants in all 12 cohorts, 7701 (75.3%) were female and 2530 (24.7%) were male. Among 434 baseline attributes available in the training cohort, seven variables were selected: height, weight, intervention type, age, diabetes status, diabetes duration, and smoking status. At 5 years, across external testing cohorts the overall mean MAD BMI was 2.8 kg/m$^2$ (95% CI 2.6-3.0) and mean RMSE BMI was 4.7 kg/m$^2$ (4.4-5.0), and the mean difference between predicted and observed BMI was -0.3 kg/m$^2$ (SD 4.7). This model is incorporated in an easy to use and interpretable web-based prediction tool to help inform clinical decision before surgery. Interpretation: We developed a machine learning-based model, which is internationally validated, for predicting individual 5-year weight loss trajectories after three common bariatric interventions.
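The two error metrics named in this abstract can be illustrated with a minimal Python sketch. The function names and the sample BMI values are hypothetical, and MAD is assumed here to mean the median of the absolute differences between predicted and observed BMI; the study's exact definition may differ.

```python
import math
import statistics

def median_absolute_deviation(predicted, observed):
    """Median of |predicted - observed| (the definition of MAD assumed here)."""
    return statistics.median(abs(p - o) for p, o in zip(predicted, observed))

def root_mean_squared_error(predicted, observed):
    """Square root of the mean squared prediction error."""
    squared = [(p - o) ** 2 for p, o in zip(predicted, observed)]
    return math.sqrt(sum(squared) / len(squared))

# Illustrative BMI values in kg/m^2; these are not data from the study.
predicted_bmi = [30.0, 28.0, 35.0, 32.0]
observed_bmi = [31.0, 25.0, 35.0, 36.0]

mad = median_absolute_deviation(predicted_bmi, observed_bmi)
rmse = root_mean_squared_error(predicted_bmi, observed_bmi)
```

RMSE squares each error before averaging, so it weights large misses more heavily than MAD does. That is consistent with the reported RMSE being larger than the reported MAD.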

Towards Spontaneous Style Modeling with Semi-supervised Pre-training for Conversational Text-to-Speech Synthesis. (arXiv:2308.16593v1 [cs.SD])

Authors: Weiqin Li, Shun Lei, Qiaochu Huang, Yixuan Zhou, Zhiyong Wu, Shiyin Kang, Helen Meng

The spontaneous behavior that often occurs in conversations makes speech more human-like than reading-style speech. However, synthesizing spontaneous-style speech is challenging due to the lack of high-quality spontaneous datasets and the high cost of labeling spontaneous behavior. In this paper, we propose a semi-supervised pre-training method to increase the amount of spontaneous-style speech and spontaneous behavioral labels. In the process of semi-supervised learning, both text and speech information are considered for detecting spontaneous behavior labels in speech. Moreover, a linguistic-aware encoder is used to model the relationships between sentences in the conversation. Experimental results indicate that our proposed method achieves superior expressive speech synthesis performance with the ability to model spontaneous behavior in spontaneous-style speech and predict reasonable spontaneous behavior from text.

Towards Optimal Patch Size in Vision Transformers for Tumor Segmentation. (arXiv:2308.16598v1 [eess.IV])

Authors: Ramtin Mojtahedi, Mohammad Hamghalam, Richard K. G. Do, Amber L. Simpson

Detection of tumors in metastatic colorectal cancer (mCRC) plays an essential role in the early diagnosis and treatment of liver cancer. Deep learning models backboned by fully convolutional neural networks (FCNNs) have become the dominant model for segmenting 3D computerized tomography (CT) scans. However, since their convolution layers suffer from limited kernel size, they are not able to capture long-range dependencies and global context. To tackle this restriction, vision transformers have been introduced to solve FCNN's locality of receptive fields. Although transformers can capture long-range features, their segmentation performance decreases with various tumor sizes due to the model sensitivity to the input patch size. While finding an optimal patch size improves the performance of vision transformer-based models on segmentation tasks, it is a time-consuming and challenging procedure. This paper proposes a technique to select the vision transformer's optimal input multi-resolution image patch size based on the average volume size of metastasis lesions. We further validated our suggested framework using a transfer-learning technique, demonstrating that the highest Dice similarity coefficient (DSC) performance was obtained by pre-training on training data with a larger tumor volume using the suggested ideal patch size and then training with a smaller one. We experimentally evaluate this idea through pre-training our model on a multi-resolution public dataset. Our model showed consistent and improved results when applied to our private multi-resolution mCRC dataset with a smaller average tumor volume. This study lays the groundwork for optimizing semantic segmentation of small objects using vision transformers. The implementation source code is available at: https://github.com/Ramtin-Mojtahedi/OVTPS.
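For readers unfamiliar with the Dice similarity coefficient (DSC) used as the evaluation metric above, the following is a minimal Python sketch on flattened binary masks. The function name, the toy masks, and the convention for two empty masks are assumptions made for illustration; this is not the authors' evaluation code.

```python
def dice_coefficient(pred_mask, true_mask):
    """Dice similarity coefficient for two flat binary masks (0/1 values)."""
    if len(pred_mask) != len(true_mask):
        raise ValueError("masks must have the same length")
    intersection = sum(p * t for p, t in zip(pred_mask, true_mask))
    total = sum(pred_mask) + sum(true_mask)
    # Convention assumed here: two empty masks count as a perfect match.
    return 1.0 if total == 0 else 2.0 * intersection / total

# Toy masks: 3 predicted voxels, 3 true voxels, 2 of them overlapping.
pred = [0, 1, 1, 1, 0, 0]
true = [0, 1, 1, 0, 1, 0]
dsc = dice_coefficient(pred, true)  # 2 * 2 / (3 + 3)
```

Because the denominator is the sum of both mask sizes, a fixed number of mislabeled voxels lowers the DSC more for a small lesion than for a large one. This is one reason small-tumor segmentation is sensitive to design choices such as patch size.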

A Causal Discovery Approach To Learn How Urban Form Shapes Sustainable Mobility Across Continents. (arXiv:2308.16599v1 [cs.LG])

Authors: Felix Wagner, Florian Nachtigall, Lukas Franken, Nikola Milojevic-Dupont, Rafael H.M. Pereira, Nicolas Koch, Jakob Runge, Marta Gonzalez, Felix Creutzig

Global sustainability requires low-carbon urban transport systems, shaped by adequate infrastructure, deployment of low-carbon transport modes and shifts in travel behavior. To adequately implement alterations in infrastructure, it is essential to grasp the location-specific cause-and-effect mechanisms that the constructed environment has on travel. Yet, current research falls short in representing causal relationships between the 6D urban form variables and travel, generalizing across different regions, and modeling urban form effects at high spatial resolution. Here, we address all three gaps by utilizing a causal discovery and an explainable machine learning framework to detect urban form effects on intra-city travel based on high-resolution mobility data of six cities across three continents. We show that distance to city center, demographics, and density indirectly affect other urban form features. By considering the causal relationships, we find that location-specific influences align across cities, yet vary in magnitude. In addition, the spread of the city and the coverage of jobs across the city are the strongest determinants of travel-related emissions, highlighting the benefits of compact development. Differences in urban form effects across the cities call for a more holistic definition of 6D measures. Our work is a starting point for location-specific analysis of urban form effects on mobility behavior using causal discovery approaches, which is highly relevant for city planners and municipalities across continents.

Towards Long-Tailed Recognition for Graph Classification via Collaborative Experts. (arXiv:2308.16609v1 [cs.LG])

Authors: Siyu Yi, Zhengyang Mao, Wei Ju, Yongdao Zhou, Luchen Liu, Xiao Luo, Ming Zhang

Graph classification, which aims to learn graph-level representations for effective class assignment, has achieved outstanding results, but it relies heavily on high-quality datasets with balanced class distributions. In fact, most real-world graph data naturally follows a long-tailed distribution, where the head classes contain far more samples than the tail classes. It is therefore essential to study graph-level classification over long-tailed data, a problem that remains largely unexplored. However, most existing long-tailed learning methods in vision fail to jointly optimize representation learning and classifier training, and neglect the mining of hard-to-classify classes. Directly applying existing methods to graphs may lead to sub-optimal performance, since the model trained on graphs would be more sensitive to the long-tailed distribution due to the complex topological characteristics. Hence, in this paper, we propose a novel long-tailed graph-level classification framework via Collaborative Multi-expert Learning (CoMe) to tackle the problem. To equilibrate the contributions of head and tail classes, we first develop balanced contrastive learning from the view of representation learning, and then design an individual-expert classifier training based on hard class mining. In addition, we execute gated fusion and disentangled knowledge distillation among the multiple experts to promote the collaboration in a multi-expert framework. Comprehensive experiments are performed on seven widely-used benchmark datasets to demonstrate the superiority of our method CoMe over state-of-the-art baselines.

Generate Your Own Scotland: Satellite Image Generation Conditioned on Maps. (arXiv:2308.16648v1 [cs.CV])

Authors: Miguel Espinosa, Elliot J. Crowley

Despite recent advancements in image generation, diffusion models still remain largely underexplored in Earth Observation. In this paper we show that state-of-the-art pretrained diffusion models can be conditioned on cartographic data to generate realistic satellite images. We provide two large datasets of paired OpenStreetMap images and satellite views over the region of Mainland Scotland and the Central Belt. We train a ControlNet model and qualitatively evaluate the results, demonstrating that both image quality and map fidelity are possible. Finally, we provide some insights on the opportunities and challenges of applying these models for remote sensing. Our model weights and code for creating the dataset are publicly available at https://github.com/miquel-espinosa/map-sat.

Autoencoder-based Online Data Quality Monitoring for the CMS Electromagnetic Calorimeter. (arXiv:2308.16659v1 [physics.ins-det])

Authors: Abhirami Harilal, Kyungmin Park, Michael Andrews, Manfred Paulini (on behalf of the CMS Collaboration)

The online Data Quality Monitoring system (DQM) of the CMS electromagnetic calorimeter (ECAL) is a crucial operational tool that allows ECAL experts to quickly identify, localize, and diagnose a broad range of detector issues that would otherwise hinder physics-quality data taking. Although the existing ECAL DQM system has been continuously updated to respond to new problems, it remains one step behind newer and unforeseen issues. Using unsupervised deep learning, a real-time autoencoder-based anomaly detection system is developed that is able to detect ECAL anomalies unseen in past data. After accounting for spatial variations in the response of the ECAL and the temporal evolution of anomalies, the new system is able to efficiently detect anomalies while maintaining an estimated false discovery rate between $10^{-2}$ and $10^{-4}$, beating existing benchmarks by about two orders of magnitude. The real-world performance of the system is validated using anomalies found in 2018 and 2022 LHC collision data. Additionally, first results from deploying the autoencoder-based system in the CMS online DQM workflow for the ECAL barrel during Run 3 of the LHC are presented, showing its promising performance in detecting obscure issues that could have been missed in the existing DQM system.

What can we learn from quantum convolutional neural networks? (arXiv:2308.16664v1 [quant-ph])

Authors: Chukwudubem Umeano, Annie E. Paine, Vincent E. Elfving, Oleksandr Kyriienko

We can learn from analyzing quantum convolutional neural networks (QCNNs) that: 1) working with quantum data can be perceived as embedding physical system parameters through a hidden feature map; 2) their high performance for quantum phase recognition can be attributed to generation of a very suitable basis set during the ground state embedding, where quantum criticality of spin models leads to basis functions with rapidly changing features; 3) pooling layers of QCNNs are responsible for picking those basis functions that can contribute to forming a high-performing decision boundary, and the learning process corresponds to adapting the measurement such that few-qubit operators are mapped to full-register observables; 4) generalization of QCNN models strongly depends on the embedding type, and that rotation-based feature maps with the Fourier basis require careful feature engineering; 5) accuracy and generalization of QCNNs with readout based on a limited number of shots favor the ground state embeddings and associated physics-informed models. We demonstrate these points in simulation, where our results shed light on classification for physical processes, relevant for applications in sensing. Finally, we show that QCNNs with properly chosen ground state embeddings can be used for fluid dynamics problems, expressing shock wave solutions with good generalization and proven trainability.

Communication-Efficient Decentralized Federated Learning via One-Bit Compressive Sensing. (arXiv:2308.16671v1 [cs.LG])

Authors: Shenglong Zhou, Kaidi Xu, Geoffrey Ye Li

Decentralized federated learning (DFL) has gained popularity due to its practicality across various applications. Compared to the centralized version, training a shared model among a large number of nodes in DFL is more challenging, as there is no central server to coordinate the training process. Especially when distributed nodes suffer from limitations in communication or computational resources, DFL will experience extremely inefficient and unstable training. Motivated by these challenges, in this paper, we develop a novel algorithm based on the framework of the inexact alternating direction method (iADM). On one hand, our goal is to train a shared model with a sparsity constraint. This constraint enables us to leverage one-bit compressive sensing (1BCS), allowing transmission of one-bit information among neighbour nodes. On the other hand, communication between neighbour nodes occurs only at certain steps, reducing the number of communication rounds. Therefore, the algorithm exhibits notable communication efficiency. Additionally, as each node selects only a subset of neighbours to participate in the training, the algorithm is robust against stragglers. Furthermore, complex items are computed only once for several consecutive steps and subproblems are solved inexactly using closed-form solutions, resulting in high computational efficiency. Finally, numerical experiments showcase the algorithm's effectiveness in both communication and computation.
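To make the one-bit compressive sensing idea concrete, the sketch below sparsifies a parameter vector and keeps only the signs of random linear measurements. This is an illustrative assumption about what such a transmission could look like, not the authors' iADM-based algorithm. The helper names, the Gaussian measurement matrix, and the sizes are all hypothetical.

```python
import random

def hard_threshold(x, s):
    """Keep the s largest-magnitude entries of x and zero out the rest."""
    order = sorted(range(len(x)), key=lambda i: abs(x[i]), reverse=True)
    keep = set(order[:s])
    return [v if i in keep else 0.0 for i, v in enumerate(x)]

def one_bit_measure(x, matrix):
    """Return sign(A x) as a list of +1/-1 values (zero is mapped to +1)."""
    return [1 if sum(a * v for a, v in zip(row, x)) >= 0 else -1
            for row in matrix]

rng = random.Random(0)                      # fixed seed for reproducibility
x = [0.1, -2.0, 0.05, 1.5, -0.2]            # hypothetical local parameters
x_sparse = hard_threshold(x, s=2)           # enforce the sparsity constraint
A = [[rng.gauss(0.0, 1.0) for _ in x] for _ in range(8)]
bits = one_bit_measure(x_sparse, A)         # 8 one-bit values to transmit
```

Each measurement costs a single bit regardless of the precision of the parameters. Sign measurements discard magnitude, so only the direction of the sparse vector can be recovered from them.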

Dynamic nsNet2: Efficient Deep Noise Suppression with Early Exiting. (arXiv:2308.16678v1 [cs.SD])

Authors: Riccardo Miccini, Alaa Zniber, Clément Laroche, Tobias Piechowiak, Martin Schoeberl, Luca Pezzarossa, Ouassim Karrakchou, Jens Sparsø, Mounir Ghogho

Although deep learning has made strides in the field of deep noise suppression, leveraging deep architectures on resource-constrained devices still proved challenging. Therefore, we present an early-exiting model based on nsNet2 that provides several levels of accuracy and resource savings by halting computations at different stages. Moreover, we adapt the original architecture by splitting the information flow to take into account the injected dynamism. We show the trade-offs between performance and computational complexity based on established metrics.

Branches of a Tree: Taking Derivatives of Programs with Discrete and Branching Randomness in High Energy Physics. (arXiv:2308.16680v1 [stat.ML])

Authors: Michael Kagan, Lukas Heinrich

We propose to apply several gradient estimation techniques to enable the differentiation of programs with discrete randomness in High Energy Physics. Such programs are common in High Energy Physics due to the presence of branching processes and clustering-based analysis. Thus differentiating such programs can open the way for gradient based optimization in the context of detector design optimization, simulator tuning, or data analysis and reconstruction optimization. We discuss several possible gradient estimation strategies, including the recent Stochastic AD method, and compare them in simplified detector design experiments. In doing so we develop, to the best of our knowledge, the first fully differentiable branching program.

Everything, Everywhere All in One Evaluation: Using Multiverse Analysis to Evaluate the Influence of Model Design Decisions on Algorithmic Fairness. (arXiv:2308.16681v1 [stat.ML])

Authors: Jan Simson, Florian Pfisterer, Christoph Kern

A vast number of systems across the world use algorithmic decision making (ADM) to (partially) automate decisions that have previously been made by humans. When designed well, these systems promise more objective decisions while saving large amounts of resources and freeing up human time. However, when ADM systems are not designed well, they can lead to unfair decisions which discriminate against societal groups. The downstream effects of ADMs critically depend on the decisions made during the systems' design and implementation, as biases in data can be mitigated or reinforced along the modeling pipeline. Many of these design decisions are made implicitly, without knowing exactly how they will influence the final system. It is therefore important to make explicit the decisions made during the design of ADM systems and understand how these decisions affect the fairness of the resulting system.

To study this issue, we draw on insights from the field of psychology and introduce the method of multiverse analysis for algorithmic fairness. In our proposed method, we turn implicit design decisions into explicit ones and demonstrate their fairness implications. By combining decisions, we create a grid of all possible "universes" of decision combinations. For each of these universes, we compute metrics of fairness and performance. Using the resulting dataset, one can see how and which decisions impact fairness. We demonstrate how multiverse analyses can be used to better understand variability and robustness of algorithmic fairness using an exemplary case study of predicting public health coverage of vulnerable populations for potential interventions. Our results illustrate how decisions during the design of a machine learning system can have surprising effects on its fairness and how to detect these effects using multiverse analysis.
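The grid of "universes" described above can be sketched as a Cartesian product over design decisions. The decision names and options below are hypothetical placeholders rather than the decisions analyzed in the paper, and the per-universe fairness and performance metrics are left out.

```python
from itertools import product

# Hypothetical design decisions; the names and options are illustrative only.
decisions = {
    "imputation": ["mean", "drop_rows"],
    "scaling": ["none", "standardize"],
    "threshold": [0.3, 0.5, 0.7],
}

def build_multiverse(decisions):
    """Return one dict per universe, covering every combination of options."""
    names = list(decisions)
    return [dict(zip(names, combo)) for combo in product(*decisions.values())]

universes = build_multiverse(decisions)  # 2 * 2 * 3 = 12 universes
```

In a full analysis, each universe would be used to train and evaluate one model, and the resulting table of metrics would then be inspected to see which decisions drive variation in fairness.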

Everyone Can Attack: Repurpose Lossy Compression as a Natural Backdoor Attack. (arXiv:2308.16684v1 [cs.CR])

Authors: Sze Jue Yang, Quang Nguyen, Chee Seng Chan, Khoa Doan

The vulnerabilities to backdoor attacks have recently threatened the trustworthiness of machine learning models in practical applications. Conventional wisdom suggests that not everyone can be an attacker since the process of designing the trigger generation algorithm often involves significant effort and extensive experimentation to ensure the attack's stealthiness and effectiveness. Alternatively, this paper shows that there exists a more severe backdoor threat: anyone can exploit an easily-accessible algorithm for silent backdoor attacks. Specifically, this attacker can employ the widely-used lossy image compression from a plethora of compression tools to effortlessly inject a trigger pattern into an image without leaving any noticeable trace; i.e., the generated triggers are natural artifacts. One does not require extensive knowledge to click on the "convert" or "save as" button while using tools for lossy image compression. Via this attack, the adversary does not need to design a trigger generator as seen in prior works and only requires poisoning the data. Empirically, the proposed attack consistently achieves 100% attack success rate in several benchmark datasets such as MNIST, CIFAR-10, GTSRB and CelebA. More significantly, the proposed attack can still achieve almost 100% attack success rate with very small (approximately 10%) poisoning rates in the clean label setting. The generated trigger of the proposed attack using one lossy compression algorithm is also transferable across other related compression algorithms, exacerbating the severity of this backdoor threat. This work takes another crucial step toward understanding the extensive risks of backdoor attacks in practice, urging practitioners to investigate similar attacks and relevant backdoor mitigation methods.

Robust Representation Learning for Unreliable Partial Label Learning. (arXiv:2308.16718v1 [cs.LG])

Authors: Yu Shi, Dong-Dong Wu, Xin Geng, Min-Ling Zhang

Partial Label Learning (PLL) is a type of weakly supervised learning where each training instance is assigned a set of candidate labels, but only one label is the ground-truth. However, this idealistic assumption may not always hold due to potential annotation inaccuracies, meaning the ground-truth may not be present in the candidate label set. This setting is known as Unreliable Partial Label Learning (UPLL), which introduces an additional complexity due to the inherent unreliability and ambiguity of partial labels, often resulting in sub-optimal performance with existing methods. To address this challenge, we propose the Unreliability-Robust Representation Learning framework (URRL) that leverages unreliability-robust contrastive learning to help the model fortify against unreliable partial labels effectively. Concurrently, we propose a dual strategy that combines KNN-based candidate label set correction and consistency-regularization-based label disambiguation to refine label quality and enhance the ability of representation learning within the URRL framework. Extensive experiments demonstrate that the proposed method outperforms state-of-the-art PLL methods on various datasets with diverse degrees of unreliability and ambiguity. Furthermore, we provide a theoretical analysis of our approach from the perspective of the expectation maximization (EM) algorithm. Upon acceptance, we pledge to make the code publicly accessible.

Robust Networked Federated Learning for Localization. (arXiv:2308.16737v1 [cs.LG])

Authors: Reza Mirzaeifard, Naveen K. D. Venkategowda, Stefan Werner

This paper addresses the problem of localization, which is inherently non-convex and non-smooth in a federated setting where the data is distributed across a multitude of devices. Due to the decentralized nature of federated environments, distributed learning becomes essential for scalability and adaptability. Moreover, these environments are often plagued by outlier data, which presents substantial challenges to conventional methods, particularly in maintaining estimation accuracy and ensuring algorithm convergence. To mitigate these challenges, we propose a method that adopts an $L_1$-norm robust formulation within a distributed sub-gradient framework, explicitly designed to handle these obstacles. Our approach addresses the problem in its original form, without resorting to iterative simplifications or approximations, resulting in enhanced computational efficiency and improved estimation accuracy. We demonstrate that our method converges to a stationary point, highlighting its effectiveness and reliability. Through numerical simulations, we confirm the superior performance of our approach, notably in outlier-rich environments, which surpasses existing state-of-the-art localization methods.

US-SFNet: A Spatial-Frequency Domain-based Multi-branch Network for Cervical Lymph Node Lesions Diagnoses in Ultrasound Images. (arXiv:2308.16738v1 [eess.IV])

Authors: Yubiao Yue, Jun Xue, Haihua Liang, Bingchun Luo, Zhenzhang Li

Ultrasound imaging serves as a pivotal tool for diagnosing cervical lymph node lesions. However, the diagnoses of these images largely hinge on the expertise of medical practitioners, rendering the process susceptible to misdiagnoses. Although rapidly developing deep learning has substantially improved the diagnoses of diverse ultrasound images, there remains a conspicuous research gap concerning cervical lymph nodes. The objective of our work is to accurately diagnose cervical lymph node lesions by leveraging a deep learning model. To this end, we first collected 3392 images containing normal lymph nodes, benign lymph node lesions, malignant primary lymph node lesions, and malignant metastatic lymph node lesions. Given that ultrasound images are generated by the reflection and scattering of sound waves across varied bodily tissues, we proposed the Conv-FFT Block. It integrates convolutional operations with the fast Fourier transform to more astutely model the images. Building upon this foundation, we designed a novel architecture, named US-SFNet. This architecture not only discerns variances in ultrasound images from the spatial domain but also adeptly captures microstructural alterations across various lesions in the frequency domain. To ascertain the potential of US-SFNet, we benchmarked it against 12 popular architectures through five-fold cross-validation. The results show that US-SFNet achieves state-of-the-art performance, with 92.89% accuracy, 90.46% precision, 89.95% sensitivity, and 97.49% specificity.
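The four reported metrics can be illustrated from confusion-matrix counts. The sketch below covers the binary case only, with made-up counts. The paper's task has four classes, so its figures presumably come from some per-class or averaged variant that the abstract does not specify.

```python
def binary_metrics(tp, fp, tn, fn):
    """Accuracy, precision, sensitivity, and specificity from raw counts."""
    return {
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
        "precision": tp / (tp + fp),
        "sensitivity": tp / (tp + fn),   # also called recall
        "specificity": tn / (tn + fp),
    }

# Made-up counts for illustration; these are not results from the paper.
metrics = binary_metrics(tp=45, fp=5, tn=40, fn=10)
```

Sensitivity and specificity are computed over different subsets of the data (actual positives and actual negatives), so a model can score high on one and low on the other. Reporting all four metrics together guards against that.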

Moreau Envelope ADMM for Decentralized Weakly Convex Optimization. (arXiv:2308.16752v1 [math.OC])

Authors: Reza Mirzaeifard, Naveen K. D. Venkategowda, Alexander Jung, Stefan Werner

This paper proposes a proximal variant of the alternating direction method of multipliers (ADMM) for distributed optimization. Although the current versions of ADMM algorithm provide promising numerical results in producing solutions that are close to optimal for many convex and non-convex optimization problems, it remains unclear if they can converge to a stationary point for weakly convex and locally non-smooth functions. Through our analysis using the Moreau envelope function, we demonstrate that MADM can indeed converge to a stationary point under mild conditions. Our analysis also includes computing the bounds on the amount of change in the dual variable update step by relating the gradient of the Moreau envelope function to the proximal function. Furthermore, the results of our numerical experiments indicate that our method is faster and more robust than widely-used approaches.
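For reference, the standard relation between the gradient of the Moreau envelope and the proximal operator, which the abstract alludes to, can be written as follows. The symbols $f$, $\lambda$, and $\rho$ are notation assumed here rather than taken from the paper: $f$ is a $\rho$-weakly convex function and $0 < \lambda < 1/\rho$, which makes the inner minimization strongly convex with a unique minimizer.

```latex
\begin{align}
  M_{\lambda f}(x) &= \min_{y} \Big\{ f(y) + \tfrac{1}{2\lambda}\,\|y - x\|^{2} \Big\}, \\
  \operatorname{prox}_{\lambda f}(x) &= \operatorname*{arg\,min}_{y} \Big\{ f(y) + \tfrac{1}{2\lambda}\,\|y - x\|^{2} \Big\}, \\
  \nabla M_{\lambda f}(x) &= \tfrac{1}{\lambda}\,\big( x - \operatorname{prox}_{\lambda f}(x) \big).
\end{align}
```

A small value of $\|\nabla M_{\lambda f}(x)\|$ therefore means that $x$ is close to its proximal point. This is why the envelope's gradient norm serves as a stationarity measure even when $f$ itself is non-smooth.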

Training Neural Networks Using Reproducing Kernel Space Interpolation and Model Reduction. (arXiv:2308.16754v1 [math.FA])

Authors: Eric Arthur Werneburg

We introduce and study the theory of training neural networks using interpolation techniques from reproducing kernel Hilbert space theory. We generalize the method to Krein spaces, and show that widely-used neural network architectures are subsets of reproducing kernel Krein spaces (RKKS). We study the concept of "associated Hilbert spaces" of RKKS and develop techniques to improve upon the expressivity of various activation functions. Next, using concepts from the theory of functions of several complex variables, we prove a computationally applicable, multidimensional generalization of the celebrated Adamjan-Arov-Krein (AAK) theorem. The theorem yields a novel class of neural networks, called Prolongation Neural Networks (PNN). We demonstrate that, by applying the multidimensional AAK theorem to gain a PNN, one can gain performance superior to both our interpolatory methods and current state-of-the-art methods in noisy environments. We provide useful illustrations of our methods in practice.

Constructing Indoor Region-based Radio Map without Location Labels. (arXiv:2308.16759v1 [cs.LG])

Authors: Zheng Xing, Junting Chen

Radio map construction requires a large amount of radio measurement data with location labels, which imposes a high deployment cost. This paper develops a region-based radio map from received signal strength (RSS) measurements without location labels. The construction is based on a set of blindly collected RSS measurement data from a device that visits each region in an indoor area exactly once, where the footprints and timestamps are not recorded. The main challenge is to cluster the RSS data and match clusters with the physical regions. Classical clustering algorithms fail to work as the RSS data naturally appears as non-clustered due to multipaths and noise. In this paper, a signal subspace model with a sequential prior is constructed for the RSS data, and an integrated segmentation and clustering algorithm is developed, which is shown to find the globally optimal solution in a special case. Furthermore, the clustered data is matched with the physical regions using a graph-based approach. Based on real measurements from an office space, the proposed scheme reduces the region localization error by roughly 50% compared to a weighted centroid localization (WCL) baseline, and it even outperforms some supervised localization schemes, including k-nearest neighbor (KNN), support vector machine (SVM), and deep neural network (DNN), which require labeled data for training.

Efficacy of Neural Prediction-Based NAS for Zero-Shot NAS Paradigm. (arXiv:2308.16775v1 [cs.LG])

Authors: Minh Le, Nhan Nguyen, Ngoc Hoang Luong

In prediction-based Neural Architecture Search (NAS), performance indicators derived from graph convolutional networks have shown significant success. These indicators, achieved by representing feed-forward structures as component graphs through one-hot encoding, face a limitation: their inability to evaluate architecture performance across varying search spaces. In contrast, handcrafted performance indicators (zero-shot NAS), which use the same architecture with random initialization, can generalize across multiple search spaces. Addressing this limitation, we propose a novel approach for zero-shot NAS using deep learning. Our method employs Fourier sum of sines encoding for convolutional kernels, enabling the construction of a computational feed-forward graph with a structure similar to the architecture under evaluation. These encodings are learnable and offer a comprehensive view of the architecture's topological information. An accompanying multi-layer perceptron (MLP) then ranks these architectures based on their encodings. Experimental results show that our approach surpasses previous methods using graph convolutional networks in terms of correlation on the NAS-Bench-201 dataset and exhibits a higher convergence rate. Moreover, our extracted feature representation trained on each NAS-Benchmark is transferable to other NAS-Benchmarks, showing promising generalizability across multiple search spaces. The code is available at: https://github.com/minh1409/DFT-NPZS-NAS

StratMed: Relevance Stratification for Low-resource Medication Recommendation. (arXiv:2308.16781v1 [cs.AI])

Authors: Xiang Li

With the growing imbalance between limited medical resources and escalating demands, AI-based clinical tasks have become paramount. Medication recommendation, as a sub-domain, aims to amalgamate longitudinal patient history with medical knowledge, assisting physicians in prescribing safer and more accurate medication combinations. Existing methods overlook the inherent long-tail distribution in medical data, lacking balanced representation between head and tail data, which leads to sub-optimal model performance. To address this challenge, we introduce StratMed, a model that incorporates an innovative relevance stratification mechanism. It harmonizes discrepancies in data long-tail distribution and strikes a balance between the safety and accuracy of medication combinations. Specifically, we first construct a pre-training method using deep learning networks to obtain entity representation. After that, we design a pyramid-like data stratification method to obtain more generalized entity relationships by reinforcing the features of unpopular entities. Based on this relationship, we designed two graph structures to express medication precision and safety at the same level to obtain visit representations. Finally, the patient's historical clinical information is fitted to generate medication combinations for the current health condition. Experiments on the MIMIC-III dataset demonstrate that our method has outperformed current state-of-the-art methods in four evaluation metrics (including safety and accuracy).

Joint Semantic-Native Communication and Inference via Minimal Simplicial Structures. (arXiv:2308.16789v1 [eess.SP])

Authors: Qiyang Zhao, Hang Zou, Mehdi Bennis, Merouane Debbah, Ebtesam Almazrouei, Faouzi Bader

In this work, we study the problem of semantic communication and inference, in which a student agent (i.e. mobile device) queries a teacher agent (i.e. cloud server) to generate higher-order data semantics living in a simplicial complex. Specifically, the teacher first maps its data into a k-order simplicial complex and learns its high-order correlations. For effective communication and inference, the teacher seeks minimally sufficient and invariant semantic structures prior to conveying information. These minimal simplicial structures are found via judiciously removing simplices selected by the Hodge Laplacians without compromising the inference query accuracy. Subsequently, the student locally runs its own set of queries based on a masked simplicial convolutional autoencoder (SCAE) leveraging both local and remote teacher's knowledge. Numerical results corroborate the effectiveness of the proposed approach in terms of improving inference query accuracy under different channel conditions and simplicial structures. Experiments on a coauthorship dataset show that removing simplices by ranking the Laplacian values yields an 85% reduction in payload size without sacrificing accuracy. Joint semantic communication and inference by masked SCAE improves query accuracy by 25% compared to local student based query and 15% compared to remote teacher based query. Finally, incorporating channel semantics is shown to effectively improve inference accuracy, notably at low SNR values.

Rank Collapse Causes Over-Smoothing and Over-Correlation in Graph Neural Networks. (arXiv:2308.16800v1 [cs.LG])

Authors: Andreas Roth, Thomas Liebig

Our study reveals new theoretical insights into over-smoothing and feature over-correlation in deep graph neural networks. We show the prevalence of invariant subspaces, demonstrating a fixed relative behavior that is unaffected by feature transformations. Our work clarifies recent observations related to convergence to a constant state and a potential over-separation of node states, as the amplification of subspaces only depends on the spectrum of the aggregation function. In linear scenarios, this leads to node representations being dominated by a low-dimensional subspace with an asymptotic convergence rate independent of the feature transformations. This causes a rank collapse of the node representations, resulting in over-smoothing when smooth vectors span this subspace, and over-correlation even when over-smoothing is avoided. Guided by our theory, we propose a sum of Kronecker products as a beneficial property that can provably prevent over-smoothing, over-correlation, and rank collapse. We empirically extend our insights to the non-linear case, demonstrating the inability of existing models to capture linearly independent features.

Irregular Traffic Time Series Forecasting Based on Asynchronous Spatio-Temporal Graph Convolutional Network. (arXiv:2308.16818v1 [cs.LG])

Authors: Weijia Zhang, Le Zhang, Jindong Han, Hao Liu, Jingbo Zhou, Yu Mei, Hui Xiong

Accurate traffic forecasting at intersections governed by intelligent traffic signals is critical for the advancement of an effective intelligent traffic signal control system. However, due to the irregular traffic time series produced by intelligent intersections, the traffic forecasting task becomes much more intractable and imposes three major new challenges: 1) asynchronous spatial dependency, 2) irregular temporal dependency among traffic data, and 3) variable-length sequence to be predicted, which severely impede the performance of current traffic forecasting methods. To this end, we propose an Asynchronous Spatio-tEmporal graph convolutional nEtwoRk (ASeer) to predict the traffic states of the lanes entering intelligent intersections in a future time window. Specifically, by linking lanes via a traffic diffusion graph, we first propose an Asynchronous Graph Diffusion Network to model the asynchronous spatial dependency between the time-misaligned traffic state measurements of lanes. After that, to capture the temporal dependency within irregular traffic state sequences, a learnable personalized time encoding is devised to embed the continuous time for each lane. Then we propose a Transformable Time-aware Convolution Network that learns meta-filters to derive time-aware convolution filters with transformable filter sizes for efficient temporal convolution on the irregular sequence. Furthermore, a Semi-Autoregressive Prediction Network consisting of a state evolution unit and a semi-autoregressive predictor is designed to effectively and efficiently predict variable-length traffic state sequences. Extensive experiments on two real-world datasets demonstrate the effectiveness of ASeer in six metrics.

Latent Variable Multi-output Gaussian Processes for Hierarchical Datasets. (arXiv:2308.16822v1 [cs.LG])

Authors: Chunchao Ma, Arthur Leroy, Mauricio Alvarez

Multi-output Gaussian processes (MOGPs) have been introduced to deal with multiple tasks by exploiting the correlations between different outputs. Generally, MOGP models assume a flat correlation structure between the outputs. However, such a formulation does not account for more elaborate relationships, for instance, if several replicates were observed for each output (which is a typical setting in biological experiments). This paper proposes an extension of MOGPs for hierarchical datasets (i.e. datasets for which the relationships between observations can be represented within a tree structure). Our model defines a tailored kernel function accounting for hierarchical structures in the data to capture different levels of correlations while leveraging the introduction of latent variables to express the underlying dependencies between outputs through a dedicated kernel. This latter feature is expected to significantly improve scalability as the number of tasks increases. An extensive experimental study involving both synthetic and real-world data from genomics and motion capture is proposed to support our claims.

FedDD: Toward Communication-efficient Federated Learning with Differential Parameter Dropout. (arXiv:2308.16835v1 [cs.LG])

Authors: Zhiying Feng, Xu Chen, Qiong Wu, Wen Wu, Xiaoxi Zhang, Qianyi Huang

Federated Learning (FL) requires frequent exchange of model parameters, which leads to long communication delays, especially when the network environments of clients vary greatly. Moreover, the parameter server needs to wait for the slowest client (i.e., straggler, which may have the largest model size, lowest computing capability or worst network condition) to upload parameters, which may significantly degrade the communication efficiency. Commonly-used client selection methods such as partial client selection would lead to the waste of computing resources and weaken the generalization of the global model. To tackle this problem, along a different line, in this paper, we advocate the approach of model parameter dropout instead of client selection, and accordingly propose a novel framework of Federated learning scheme with Differential parameter Dropout (FedDD). FedDD consists of two key modules: dropout rate allocation and uploaded parameter selection, which will optimize the model parameter uploading ratios tailored to different clients' heterogeneous conditions and also select the proper set of important model parameters for uploading subject to clients' dropout rate constraints. Specifically, the dropout rate allocation is formulated as a convex optimization problem, taking system heterogeneity, data heterogeneity, and model heterogeneity among clients into consideration. The uploaded parameter selection strategy prioritizes important parameters for uploading to speed up convergence. Furthermore, we theoretically analyze the convergence of the proposed FedDD scheme. Extensive performance evaluations demonstrate that the proposed FedDD scheme can achieve outstanding performance in both communication efficiency and model convergence, and also possesses a strong generalization capability to data of rare classes.

Diffusion Models for Interferometric Satellite Aperture Radar. (arXiv:2308.16847v1 [cs.CV])

Authors: Alexandre Tuel, Thomas Kerdreux, Claudia Hulbert, Bertrand Rouet-Leduc

Probabilistic Diffusion Models (PDMs) have recently emerged as a very promising class of generative models, achieving high performance in natural image generation. However, their performance relative to non-natural images, like radar-based satellite data, remains largely unknown. Generating large amounts of synthetic (and especially labelled) satellite data is crucial to implement deep-learning approaches for the processing and analysis of (interferometric) satellite aperture radar data. Here, we leverage PDMs to generate several radar-based satellite image datasets. We show that PDMs succeed in generating images with complex and realistic structures, but that sampling time remains an issue. Indeed, accelerated sampling strategies, which work well on simple image datasets like MNIST, fail on our radar datasets. We provide a simple and versatile open-source codebase (https://github.com/thomaskerdreux/PDM_SAR_InSAR_generation) to train, sample and evaluate PDMs using any dataset on a single GPU.

Natural Quantum Monte Carlo Computation of Excited States. (arXiv:2308.16848v1 [physics.comp-ph])

Authors: David Pfau, Simon Axelrod, Halvard Sutterud, Ingrid von Glehn, James S. Spencer

We present a variational Monte Carlo algorithm for estimating the lowest excited states of a quantum system which is a natural generalization of the estimation of ground states. The method has no free parameters and requires no explicit orthogonalization of the different states, instead transforming the problem of finding excited states of a given system into that of finding the ground state of an expanded system. Expected values of arbitrary observables can be calculated, including off-diagonal expectations between different states such as the transition dipole moment. Although the method is entirely general, it works particularly well in conjunction with recent work on using neural networks as variational Ansatze for many-electron systems, and we show that by combining this method with the FermiNet and Psiformer Ansatze we can accurately recover vertical excitation energies and oscillator strengths on molecules as large as benzene. Beyond the examples on molecules presented here, we expect this technique will be of great interest for applications of variational quantum Monte Carlo to atomic, nuclear and condensed matter physics.

Majorization-Minimization for sparse SVMs. (arXiv:2308.16858v1 [cs.LG])

Authors: Alessandro Benfenati, Emilie Chouzenoux, Giorgia Franchini, Salla Latva-Aijo, Dominik Narnhofer, Jean-Christophe Pesquet, Sebastian J. Scott, Mahsa Yousefi

Several decades ago, Support Vector Machines (SVMs) were introduced for performing binary classification tasks, under a supervised framework. Nowadays, they often outperform other supervised methods and remain one of the most popular approaches in the machine learning arena. In this work, we investigate the training of SVMs through a smooth sparse-promoting-regularized squared hinge loss minimization. This choice paves the way to the application of quick training methods built on majorization-minimization approaches, benefiting from the Lipschitz differentiability of the loss function. Moreover, the proposed approach allows us to handle sparsity-preserving regularizers promoting the selection of the most significant features, thereby enhancing the performance. Numerical tests and comparisons conducted on three different datasets demonstrate the good performance of the proposed methodology in terms of qualitative metrics (accuracy, precision, recall, and F1 score) as well as computational cost.
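
As an illustration only (not the authors' algorithm), the idea of exploiting a Lipschitz-continuous gradient can be sketched as follows. The sketch assumes a squared hinge loss plus a hyperbolic smoothing of the l1 penalty, lam * sum_j sqrt(w_j^2 + delta^2); the penalty, the names `lam` and `delta`, and the toy usage are hypothetical choices, since the abstract does not specify them. With a gradient-Lipschitz constant L, minimizing the quadratic majorant F(w_k) + <grad F(w_k), w - w_k> + (L/2)||w - w_k||^2 gives the update w_k - grad F(w_k) / L, which never increases the objective.

```python
import math


def objective(w, X, y, lam, delta):
    """Squared hinge loss plus a smoothed-l1 penalty (assumed, illustrative)."""
    loss = sum(
        max(0.0, 1.0 - yi * sum(a * b for a, b in zip(xi, w))) ** 2
        for xi, yi in zip(X, y)
    )
    reg = lam * sum(math.sqrt(wj * wj + delta * delta) for wj in w)
    return loss + reg


def gradient(w, X, y, lam, delta):
    """Gradient of the objective above; both terms are differentiable."""
    g = [lam * wj / math.sqrt(wj * wj + delta * delta) for wj in w]
    for xi, yi in zip(X, y):
        slack = max(0.0, 1.0 - yi * sum(a * b for a, b in zip(xi, w)))
        for j, a in enumerate(xi):
            g[j] -= 2.0 * slack * yi * a
    return g


def mm_train(X, y, lam=0.1, delta=1e-2, iters=200):
    """Minimize the quadratic majorant at each step (step size 1/L)."""
    # Upper bound on the gradient's Lipschitz constant:
    # 2 * ||X||_F^2 for the squared hinge term, lam / delta for the penalty.
    L = 2.0 * sum(a * a for xi in X for a in xi) + lam / delta
    w = [0.0] * len(X[0])
    history = [objective(w, X, y, lam, delta)]
    for _ in range(iters):
        g = gradient(w, X, y, lam, delta)
        w = [wj - gj / L for wj, gj in zip(w, g)]
        history.append(objective(w, X, y, lam, delta))
    return w, history
```

The bound used for L is deliberately crude; a tighter constant (or a curvature-aware majorant) would take larger steps, which is presumably where the speed of dedicated MM schemes comes from.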

Information Theoretically Optimal Sample Complexity of Learning Dynamical Directed Acyclic Graphs. (arXiv:2308.16859v1 [stat.ML])

Authors: Mishfad Shaikh Veedu, Deepjyoti Deka, Murti V. Salapaka

In this article, the optimal sample complexity of learning the underlying interaction/dependencies of a Linear Dynamical System (LDS) over a Directed Acyclic Graph (DAG) is studied. The sample complexity of learning a DAG's structure is well-studied for static systems, where the samples of nodal states are independent and identically distributed (i.i.d.). However, such a study is less explored for DAGs with dynamical systems, where the nodal states are temporally correlated. We call such a DAG underlying an LDS a \emph{dynamical} DAG (DDAG). In particular, we consider a DDAG where the nodal dynamics are driven by unobserved exogenous noise sources that are wide-sense stationary (WSS) in time but are mutually uncorrelated, and have the same power spectral density (PSD). Inspired by the static settings, a metric and an algorithm based on the PSD matrix of the observed time series are proposed to reconstruct the DDAG. The equal noise PSD assumption can be relaxed such that identifiability conditions for DDAG reconstruction are not violated. For the LDS with WSS (sub) Gaussian exogenous noise sources, it is shown that the optimal sample complexity (or length of state trajectory) needed to learn the DDAG is $n=\Theta(q\log(p/q))$, where $p$ is the number of nodes and $q$ is the maximum number of parents per node. To prove the sample complexity upper bound, a concentration bound for the PSD estimation is derived, under two different sampling strategies. A matching min-max lower bound using generalized Fano's inequality is also provided, thus showing the order optimality of the proposed algorithm.

The Belebele Benchmark: a Parallel Reading Comprehension Dataset in 122 Language Variants. (arXiv:2308.16884v1 [cs.CL])

Authors: Lucas Bandarkar, Davis Liang, Benjamin Muller, Mikel Artetxe, Satya Narayan Shukla, Donald Husa, Naman Goyal, Abhinandan Krishnan, Luke Zettlemoyer, Madian Khabsa

We present Belebele, a multiple-choice machine reading comprehension (MRC) dataset spanning 122 language variants. Significantly expanding the language coverage of natural language understanding (NLU) benchmarks, this dataset enables the evaluation of text models in high-, medium-, and low-resource languages. Each question is based on a short passage from the Flores-200 dataset and has four multiple-choice answers. The questions were carefully curated to discriminate between models with different levels of general language comprehension. The English dataset on its own proves difficult enough to challenge state-of-the-art language models. Being fully parallel, this dataset enables direct comparison of model performance across all languages. We use this dataset to evaluate the capabilities of multilingual masked language models (MLMs) and large language models (LLMs). We present extensive results and find that despite significant cross-lingual transfer in English-centric LLMs, much smaller MLMs pretrained on balanced multilingual data still understand far more languages. We also observe that larger vocabulary size and conscious vocabulary construction correlate with better performance on low-resource languages. Overall, Belebele opens up new avenues for evaluating and analyzing the multilingual capabilities of NLP systems.

Prediction of Diblock Copolymer Morphology via Machine Learning. (arXiv:2308.16886v1 [physics.chem-ph])

Authors: Hyun Park, Boyuan Yu, Juhae Park, Ge Sun, Emad Tajkhorshid, Juan J. de Pablo, Ludwig Schneider

A machine learning approach is presented to accelerate the computation of block polymer morphology evolution for large domains over long timescales. The strategy exploits the separation of characteristic times between coarse-grained particle evolution on the monomer scale and slow morphological evolution over mesoscopic scales. In contrast to empirical continuum models, the proposed approach learns stochastically driven defect annihilation processes directly from particle-based simulations. A UNet architecture that respects different boundary conditions is adopted, thereby allowing periodic and fixed substrate boundary conditions of arbitrary shape. Physical concepts are also introduced via the loss function and symmetries are incorporated via data augmentation. The model is validated using three different use cases. Explainable artificial intelligence methods are applied to visualize the morphology evolution over time. This approach enables the generation of large system sizes and long trajectories to investigate defect densities and their evolution under different types of confinement. As an application, we demonstrate the importance of accessing late-stage morphologies for understanding particle diffusion inside a single block. This work has implications for directed self-assembly and materials design in micro-electronics, battery materials, and membranes.

Federated Learning in UAV-Enhanced Networks: Joint Coverage and Convergence Time Optimization. (arXiv:2308.16889v1 [cs.LG])

Authors: Mariam Yahya, Setareh Maghsudi, Slawomir Stanczak

Federated learning (FL) involves several devices that collaboratively train a shared model without transferring their local data. FL reduces the communication overhead, making it a promising learning method in UAV-enhanced wireless networks with scarce energy resources. Despite the potential, implementing FL in UAV-enhanced networks is challenging, as conventional UAV placement methods that maximize coverage increase the FL delay significantly. Moreover, the uncertainty and lack of a priori information about crucial variables, such as channel quality, exacerbate the problem. In this paper, we first analyze the statistical characteristics of a UAV-enhanced wireless sensor network (WSN) with energy harvesting. We then develop a model and solution based on the multi-objective multi-armed bandit theory to maximize the network coverage while minimizing the FL delay. Besides, we propose another solution that is particularly useful with large action sets and strict energy constraints at the UAVs. Our proposal uses a scalarized best-arm identification algorithm to find the optimal arms that maximize the ratio of the expected reward to the expected energy cost by sequentially eliminating one or more arms in each round. Then, we derive the upper bound on the error probability of our multi-objective and cost-aware algorithm. Numerical results show the effectiveness of our approach.

GNFactor: Multi-Task Real Robot Learning with Generalizable Neural Feature Fields. (arXiv:2308.16891v1 [cs.RO])

Authors: Yanjie Ze, Ge Yan, Yueh-Hua Wu, Annabella Macaluso, Yuying Ge, Jianglong Ye, Nicklas Hansen, Li Erran Li, Xiaolong Wang

It is a long-standing problem in robotics to develop agents capable of executing diverse manipulation tasks from visual observations in unstructured real-world environments. To achieve this goal, the robot needs to have a comprehensive understanding of the 3D structure and semantics of the scene. In this work, we present $\textbf{GNFactor}$, a visual behavior cloning agent for multi-task robotic manipulation with $\textbf{G}$eneralizable $\textbf{N}$eural feature $\textbf{F}$ields. GNFactor jointly optimizes a generalizable neural field (GNF) as a reconstruction module and a Perceiver Transformer as a decision-making module, leveraging a shared deep 3D voxel representation. To incorporate semantics in 3D, the reconstruction module utilizes a vision-language foundation model ($\textit{e.g.}$, Stable Diffusion) to distill rich semantic information into the deep 3D voxel. We evaluate GNFactor on 3 real robot tasks and perform detailed ablations on 10 RLBench tasks with a limited number of demonstrations. We observe a substantial improvement of GNFactor over current state-of-the-art methods in seen and unseen tasks, demonstrating the strong generalization ability of GNFactor. Our project website is https://yanjieze.com/GNFactor/ .

Language-Conditioned Path Planning. (arXiv:2308.16893v1 [cs.RO])

Authors: Amber Xie, Youngwoon Lee, Pieter Abbeel, Stephen James

Contact is at the core of robotic manipulation. At times, it is desired (e.g. manipulation and grasping), and at times, it is harmful (e.g. when avoiding obstacles). However, traditional path planning algorithms focus solely on collision-free paths, limiting their applicability in contact-rich tasks. To address this limitation, we propose the domain of Language-Conditioned Path Planning, where contact-awareness is incorporated into the path planning problem. As a first step in this domain, we propose Language-Conditioned Collision Functions (LACO), a novel approach that learns a collision function using only a single-view image, language prompt, and robot configuration. LACO predicts collisions between the robot and the environment, enabling flexible, conditional path planning without the need for manual object annotations, point cloud data, or ground-truth object meshes. In both simulation and the real world, we demonstrate that LACO can facilitate complex, nuanced path plans that allow for interaction with objects that are safe to collide, rather than prohibiting any collision.

PointOcc: Cylindrical Tri-Perspective View for Point-based 3D Semantic Occupancy Prediction. (arXiv:2308.16896v1 [cs.CV])

Authors: Sicheng Zuo, Wenzhao Zheng, Yuanhui Huang, Jie Zhou, Jiwen Lu

Semantic segmentation in autonomous driving has been undergoing an evolution from sparse point segmentation to dense voxel segmentation, where the objective is to predict the semantic occupancy of each voxel in the concerned 3D space. The dense nature of the prediction space has rendered existing efficient 2D-projection-based methods (e.g., bird's eye view, range view, etc.) ineffective, as they can only describe a subspace of the 3D scene. To address this, we propose a cylindrical tri-perspective view to represent point clouds effectively and comprehensively and a PointOcc model to process them efficiently. Considering the distance distribution of LiDAR point clouds, we construct the tri-perspective view in the cylindrical coordinate system for more fine-grained modeling of nearer areas. We employ spatial group pooling to maintain structural details during projection and adopt 2D backbones to efficiently process each TPV plane. Finally, we obtain the features of each point by aggregating its projected features on each of the processed TPV planes without the need for any post-processing. Extensive experiments on both 3D occupancy prediction and LiDAR segmentation benchmarks demonstrate that the proposed PointOcc achieves state-of-the-art performance with much faster speed. Specifically, despite only using LiDAR, PointOcc significantly outperforms all other methods, including multi-modal methods, with a large margin on the OpenOccupancy benchmark. Code: https://github.com/wzzheng/PointOcc.

Transformers as Support Vector Machines. (arXiv:2308.16898v1 [cs.LG])

Authors: Davoud Ataee Tarzanagh, Yingcong Li, Christos Thrampoulidis, Samet Oymak

Since its inception in "Attention Is All You Need", transformer architecture has led to revolutionary advancements in NLP. The attention layer within the transformer admits a sequence of input tokens $X$ and makes them interact through pairwise similarities computed as softmax$(XQK^\top X^\top)$, where $(K,Q)$ are the trainable key-query parameters. In this work, we establish a formal equivalence between the optimization geometry of self-attention and a hard-margin SVM problem that separates optimal input tokens from non-optimal tokens using linear constraints on the outer-products of token pairs. This formalism allows us to characterize the implicit bias of 1-layer transformers optimized with gradient descent: (1) Optimizing the attention layer with vanishing regularization, parameterized by $(K,Q)$, converges in direction to an SVM solution minimizing the nuclear norm of the combined parameter $W=KQ^\top$. Instead, directly parameterizing by $W$ minimizes a Frobenius norm objective. We characterize this convergence, highlighting that it can occur toward locally-optimal directions rather than global ones. (2) Complementing this, we prove the local/global directional convergence of gradient descent under suitable geometric conditions. Importantly, we show that over-parameterization catalyzes global convergence by ensuring the feasibility of the SVM problem and by guaranteeing a benign optimization landscape devoid of stationary points. (3) While our theory applies primarily to linear prediction heads, we propose a more general SVM equivalence that predicts the implicit bias with nonlinear heads. Our findings are applicable to arbitrary datasets and their validity is verified via experiments. We also introduce several open problems and research directions. We believe these findings inspire the interpretation of transformers as a hierarchy of SVMs that separates and selects optimal tokens.
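
To make the similarity term concrete, here is a minimal sketch of softmax$(XQK^\top X^\top)$ in plain Python. It assumes $X$ holds one token per row (shape T x d), that $Q$ and $K$ both have shape d x m, and that the softmax is applied row-wise; the helper names are illustrative and none of this is taken from the authors' code.

```python
import math


def matmul(P, Q):
    """Dense matrix product of two lists of rows."""
    return [[sum(p * q for p, q in zip(row, col)) for col in zip(*Q)] for row in P]


def transpose(M):
    return [list(col) for col in zip(*M)]


def softmax_rows(M):
    """Row-wise softmax, shifted by the row maximum for numerical stability."""
    out = []
    for row in M:
        m = max(row)
        exps = [math.exp(v - m) for v in row]
        total = sum(exps)
        out.append([e / total for e in exps])
    return out


def attention_weights(X, Q, K):
    """Return the T x T matrix softmax(X Q K^T X^T)."""
    scores = matmul(matmul(matmul(X, Q), transpose(K)), transpose(X))
    return softmax_rows(scores)
```

Row i of the result is a probability distribution over tokens, so it can be read as how strongly token i selects each other token; the combined parameter $W=KQ^\top$ discussed in the abstract is the matrix whose norm the implicit-bias results concern.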

Learning to Taste: A Multimodal Wine Dataset. (arXiv:2308.16900v1 [cs.LG])

Authors: Thoranna Bender, Simon Møe Sørensen, Alireza Kashani, K. Eldjarn Hjorleifsson, Grethe Hyldig, Søren Hauberg, Serge Belongie, Frederik Warburg

We present WineSensed, a large multimodal wine dataset for studying the relations between visual perception, language, and flavor. The dataset encompasses 897k images of wine labels and 824k reviews of wines curated from the Vivino platform. It has over 350k unique vintages, annotated with year, region, rating, alcohol percentage, price, and grape composition. We obtained fine-grained flavor annotations on a subset by conducting a wine-tasting experiment with 256 participants who were asked to rank wines based on their similarity in flavor, resulting in more than 5k pairwise flavor distances. We propose a low-dimensional concept embedding algorithm that combines human experience with automatic machine similarity kernels. We demonstrate that this shared concept embedding space improves upon separate embedding spaces for coarse flavor classification (alcohol percentage, country, grape, price, rating) and aligns with the intricate human perception of flavor.

A Note on Randomized Kaczmarz Algorithm for Solving Doubly-Noisy Linear Systems. (arXiv:2308.16904v1 [math.NA])

Authors: El Houcine Bergou, Soumia Boucherouite, Aritra Dutta, Xin Li, Anna Ma

Large-scale linear systems, $Ax=b$, frequently arise in practice and demand effective iterative solvers. Often, these systems are noisy due to operational errors or faulty data-collection processes. In the past decade, the randomized Kaczmarz (RK) algorithm has been studied extensively as an efficient iterative solver for such systems. However, the convergence study of RK in the noisy regime is limited and considers measurement noise in the right-hand side vector, $b$. Unfortunately, in practice, that is not always the case; the coefficient matrix $A$ can also be noisy. In this paper, we analyze the convergence of RK for noisy linear systems when the coefficient matrix, $A$, is corrupted with both additive and multiplicative noise, along with the noisy vector, $b$. In our analyses, the quantity $\tilde R=\| \tilde A^{\dagger} \|_2^2 \|\tilde A \|_F^2$ influences the convergence of RK, where $\tilde A$ represents a noisy version of $A$. We claim that our analysis is robust and realistically applicable, as we do not require information about the noiseless coefficient matrix, $A$, and considering different conditions on noise, we can control the convergence of RK. We substantiate our theoretical findings by performing comprehensive numerical experiments.
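
For readers unfamiliar with RK, the basic iteration can be sketched as below. This is the classical variant, which samples row $i$ with probability proportional to $\|a_i\|_2^2$ and projects the iterate onto the hyperplane $\langle a_i, x\rangle = b_i$; the abstract does not state which sampling rule the paper analyzes, so treat that choice, the function name, and the defaults as assumptions. The sketch handles a plain system only and does not model the additive or multiplicative noise studied in the paper.

```python
import random


def randomized_kaczmarz(A, b, iters=2000, seed=0):
    """Approximate a solution of A x = b by randomized row projections."""
    rng = random.Random(seed)
    # Squared row norms double as (unnormalized) sampling weights.
    row_norms_sq = [sum(a * a for a in row) for row in A]
    indices = list(range(len(A)))
    x = [0.0] * len(A[0])
    for _ in range(iters):
        i = rng.choices(indices, weights=row_norms_sq, k=1)[0]
        residual = b[i] - sum(a * xj for a, xj in zip(A[i], x))
        scale = residual / row_norms_sq[i]
        # Project x onto the hyperplane defined by row i.
        x = [xj + scale * a for xj, a in zip(x, A[i])]
    return x
```

On a consistent, noiseless system such as `A = [[2, 1], [1, 3], [1, -1]]`, `b = [4, 7, -1]`, the iterates approach the exact solution `[1, 2]`; feeding in perturbed copies of `A` and `b` is one way to experiment with the doubly-noisy regime.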

Learning Optimal Strategies for Temporal Tasks in Stochastic Games. (arXiv:2102.04307v3 [cs.AI] UPDATED)

Authors: Alper Kamil Bozkurt, Yu Wang, Michael M. Zavlanos, Miroslav Pajic

Synthesis from linear temporal logic (LTL) specifications provides assured controllers for systems operating in stochastic and potentially adversarial environments. Automatic synthesis tools, however, require a model of the environment to construct controllers. In this work, we introduce a model-free reinforcement learning (RL) approach to derive controllers from given LTL specifications even when the environment is completely unknown. We model the problem as a stochastic game (SG) between the controller and the adversarial environment; we then learn optimal control strategies that maximize the probability of satisfying the LTL specifications against the worst-case environment behavior. We first construct a product game using the deterministic parity automaton (DPA) translated from the given LTL specification. By deriving distinct rewards and discount factors from the acceptance condition of the DPA, we reduce the maximization of the worst-case probability of satisfying the LTL specification into the maximization of a discounted reward objective in the product game; this enables the use of model-free RL algorithms to learn an optimal controller strategy. To deal with the common scalability problems when the number of sets defining the acceptance condition of the DPA (usually referred to as colors) is large, we propose a lazy color generation method where distinct rewards and discount factors are utilized only when needed, and an approximate method where the controller eventually focuses on only one color. In several case studies, we show that our approach is scalable to a wide range of LTL formulas, significantly outperforming existing methods for learning controllers from LTL specifications in SGs.

Simulation-Based Optimization of User Interfaces for Quality-Assuring Machine Learning Model Predictions. (arXiv:2104.01129v2 [cs.HC] UPDATED)

Authors: Yu Zhang, Martijn Tennekes, Tim de Jong, Lyana Curier, Bob Coecke, Min Chen

Quality-sensitive applications of machine learning (ML) require quality assurance (QA) by humans before the predictions of an ML model can be deployed. QA for ML (QA4ML) interfaces require users to view a large amount of data and perform many interactions to correct errors made by the ML model. An optimized user interface (UI) can significantly reduce interaction costs. While UI optimization can be informed by user studies evaluating design options, this approach is not scalable because there are typically numerous small variations that can affect the efficiency of a QA4ML interface. Hence, we propose using simulation to evaluate and aid the optimization of QA4ML interfaces. In particular, we focus on simulating the combined effects of human intelligence in initiating appropriate interaction commands and machine intelligence in providing algorithmic assistance for accelerating QA4ML processes. As QA4ML is usually labor-intensive, we use the simulated task completion time as the metric for UI optimization under different interface and algorithm setups. We demonstrate the usage of this UI design method in several QA4ML applications.

Combining Inductive and Deductive Reasoning for Query Answering over Incomplete Knowledge Graphs. (arXiv:2106.14052v2 [cs.AI] UPDATED)

Authors: Medina Andresel, Trung-Kien Tran, Csaba Domokos, Pasquale Minervini, Daria Stepanova

Current methods for embedding-based query answering over incomplete Knowledge Graphs (KGs) only focus on inductive reasoning, i.e., predicting answers by learning patterns from the data, and lack the complementary ability to do deductive reasoning, which requires the application of domain knowledge to infer further information. To address this shortcoming, we investigate the problem of incorporating ontologies into embedding-based query answering models by defining the task of embedding-based ontology-mediated query answering. We propose various integration strategies into prominent representatives of embedding models that involve (1) different ontology-driven data augmentation techniques and (2) adaptation of the loss function to enforce the ontology axioms. We design novel benchmarks for the considered task based on the LUBM and the NELL KGs and evaluate our methods on them. The achieved improvements in the setting that requires both inductive and deductive reasoning are from 20% to 55% in HITS@3.

Leveraging Image-based Generative Adversarial Networks for Time Series Generation. (arXiv:2112.08060v2 [cs.LG] UPDATED)

Authors: Justin Hellermann, Stefan Lessmann

Generative models for images have gained significant attention in computer vision and natural language processing due to their ability to generate realistic samples from complex data distributions. To leverage the advances of image-based generative models for the time series domain, we propose a two-dimensional image representation for time series, the Extended Intertemporal Return Plot (XIRP). Our approach captures the intertemporal time series dynamics in a scale-invariant and invertible way, reducing training time and improving sample quality. We benchmark synthetic XIRPs obtained by an off-the-shelf Wasserstein GAN with gradient penalty (WGAN-GP) to other image representations and models regarding similarity and predictive ability metrics. Our novel, validated image representation for time series consistently and significantly outperforms a state-of-the-art RNN-based generative model regarding predictive ability. Further, we introduce an improved stochastic inversion to substantially improve simulation quality regardless of the representation and provide the prospect of transfer potentials in other domains.

MGNN: Graph Neural Networks Inspired by Distance Geometry Problem. (arXiv:2201.12994v4 [cs.LG] UPDATED)

Authors: Guanyu Cui, Zhewei Wei

Graph Neural Networks (GNNs) have emerged as a prominent research topic in the field of machine learning. Existing GNN models are commonly categorized into two types: spectral GNNs, which are designed based on polynomial graph filters, and spatial GNNs, which utilize a message-passing scheme as the foundation of the model. For the expressive power and universality of spectral GNNs, a natural approach is to improve the design of basis functions for better approximation ability. As for spatial GNNs, models like Graph Isomorphism Networks (GIN) analyze their expressive power based on Graph Isomorphism Tests. Recently, there have been attempts to establish connections between spatial GNNs and geometric concepts like curvature and cellular sheaves, as well as physical phenomena like oscillators. However, despite the recent progress, there is still a lack of comprehensive analysis regarding the universality of spatial GNNs from the perspectives of geometry and physics. In this paper, we propose MetricGNN (MGNN), a spatial GNN model inspired by the congruent-insensitivity property of classifiers in the classification phase of GNNs. We demonstrate that a GNN model is universal in the spatial domain if it can generate embedding matrices that are congruent to any given embedding matrix. This property is closely related to the Distance Geometry Problem (DGP). Since DGP is an NP-Hard combinatorial optimization problem, we propose optimizing an energy function derived from spring networks and the Multi-Dimensional Scaling (MDS) problem. This approach also allows our model to handle both homophilic and heterophilic graphs. Finally, we propose employing the iteration method to optimize our energy function. We extensively evaluate the effectiveness of our model through experiments conducted on both synthetic and real-world datasets. Our code is available at: https://github.com/GuanyuCui/MGNN.

Neuronal diversity can improve machine learning for physics and beyond. (arXiv:2204.04348v3 [cs.LG] UPDATED)

Authors: Anshul Choudhary, Anil Radhakrishnan, John F. Lindner, Sudeshna Sinha, William L. Ditto

Diversity conveys advantages in nature, yet homogeneous neurons typically comprise the layers of artificial neural networks. Here we construct neural networks from neurons that learn their own activation functions, quickly diversify, and subsequently outperform their homogeneous counterparts on image classification and nonlinear regression tasks. Sub-networks instantiate the neurons, which meta-learn especially efficient sets of nonlinear responses. Examples include conventional neural networks classifying digits and forecasting a van der Pol oscillator and physics-informed Hamiltonian neural networks learning Hénon-Heiles stellar orbits and the swing of a video-recorded pendulum clock. Such learned diversity provides examples of dynamical systems selecting diversity over uniformity and elucidates the role of diversity in natural and artificial systems.

Edge Inference with Fully Differentiable Quantized Mixed Precision Neural Networks. (arXiv:2206.07741v2 [cs.LG] UPDATED)

Authors: Clemens JS Schaefer, Siddharth Joshi, Shan Li, Raul Blazquez

The large computing and memory cost of deep neural networks (DNNs) often precludes their use in resource-constrained devices. Quantizing the parameters and operations to lower bit-precision offers substantial memory and energy savings for neural network inference, facilitating the use of DNNs on edge computing platforms. Recent efforts at quantizing DNNs have employed a range of techniques encompassing progressive quantization, step-size adaptation, and gradient scaling. This paper proposes a new quantization approach for mixed precision convolutional neural networks (CNNs) targeting edge computing. Our method establishes a new Pareto frontier in model accuracy and memory footprint, demonstrating a range of quantized models and delivering best-in-class accuracy below 4.3 MB of weights (wgts.) and activations (acts.). Our main contributions are: (i) hardware-aware heterogeneous differentiable quantization with tensor-sliced learned precision, (ii) targeted gradient modification for wgts. and acts. to mitigate quantization errors, and (iii) a multi-phase learning schedule to address instability in learning arising from updates to the learned quantizer and model parameters. We demonstrate the effectiveness of our techniques on the ImageNet dataset across a range of models including EfficientNet-Lite0 (e.g., 4.14 MB of wgts. and acts. at 67.66% accuracy) and MobileNetV2 (e.g., 3.51 MB of wgts. and acts. at 65.39% accuracy).

0/1 Deep Neural Networks via Block Coordinate Descent. (arXiv:2206.09379v2 [cs.LG] UPDATED)

Authors: Hui Zhang, Shenglong Zhou, Geoffrey Ye Li, Naihua Xiu

The step function is one of the simplest and most natural activation functions for deep neural networks (DNNs). As it counts 1 for positive variables and 0 for others, its intrinsic characteristics (e.g., discontinuity and no viable information of subgradients) have impeded its development for several decades. Although there is an impressive body of work on designing DNNs with continuous activation functions that can be deemed surrogates of the step function, the step function still possesses some advantageous properties, such as complete robustness to outliers and the ability to attain the best learning-theoretic guarantee of predictive accuracy. Hence, in this paper, we aim to train DNNs with the step function used as an activation function (dubbed 0/1 DNNs). We first reformulate 0/1 DNNs as an unconstrained optimization problem and then solve it by a block coordinate descent (BCD) method. Moreover, we acquire closed-form solutions for sub-problems of BCD as well as its convergence properties. Furthermore, we also integrate $\ell_{2,0}$-regularization into 0/1 DNN to accelerate the training process and compress the network scale. As a result, the proposed algorithm has a desirable performance on classifying MNIST, FashionMNIST, Cifar10, and Cifar100 datasets.

Extending regionalization algorithms to explore spatial process heterogeneity. (arXiv:2206.09429v4 [stat.ME] UPDATED)

Authors: Hao Guo, Andre Python, Yu Liu

In spatial regression models, spatial heterogeneity may be considered with either continuous or discrete specifications. The latter is related to delineation of spatially connected regions with homogeneous relationships between variables (spatial regimes). Although various regionalization algorithms have been proposed and studied in the field of spatial analytics, methods to optimize spatial regimes have been largely unexplored. In this paper, we propose two new algorithms for spatial regime delineation, two-stage K-Models and Regional-K-Models. We also extend the classic Automatic Zoning Procedure to spatial regression context. The proposed algorithms are applied to a series of synthetic datasets and two real-world datasets. Results indicate that all three algorithms achieve superior or comparable performance to existing approaches, while the two-stage K-Models algorithm largely outperforms existing approaches on model fitting, region reconstruction, and coefficient estimation. Our work enriches the spatial analytics toolbox to explore spatial heterogeneous processes.

Visual correspondence-based explanations improve AI robustness and human-AI team accuracy. (arXiv:2208.00780v5 [cs.CV] UPDATED)

Authors: Giang Nguyen, Mohammad Reza Taesiri, Anh Nguyen

Explaining artificial intelligence (AI) predictions is increasingly important and even imperative in many high-stakes applications where humans are the ultimate decision-makers. In this work, we propose two novel architectures of self-interpretable image classifiers that first explain, and then predict (as opposed to post-hoc explanations) by harnessing the visual correspondences between a query image and exemplars. Our models consistently improve (by 1 to 4 points) on out-of-distribution (OOD) datasets while performing marginally worse (by 1 to 2 points) on in-distribution tests than ResNet-50 and a $k$-nearest neighbor classifier (kNN). Via a large-scale human study on ImageNet and CUB, our correspondence-based explanations are found to be more useful to users than kNN explanations. Our explanations help users more accurately reject AI's wrong decisions than all other tested methods. Interestingly, for the first time, we show that it is possible to achieve complementary human-AI team accuracy (i.e., accuracy that is higher than either AI-alone or human-alone) in ImageNet and CUB image classification tasks.

GRASP: A Goodness-of-Fit Test for Classification Learning. (arXiv:2209.02064v2 [stat.ME] UPDATED)

Authors: Adel Javanmard, Mohammad Mehrabi

Performance of classifiers is often measured in terms of average accuracy on test data. Despite being a standard measure, average accuracy fails to characterize the fit of the model to the underlying conditional law of labels given the features vector ($Y|X$), e.g. due to model misspecification, overfitting, and high-dimensionality. In this paper, we consider the fundamental problem of assessing the goodness-of-fit for a general binary classifier. Our framework does not make any parametric assumption on the conditional law $Y|X$, and treats that as a black box oracle model which can be accessed only through queries. We formulate the goodness-of-fit assessment problem as a tolerance hypothesis testing of the form \[ H_0: \mathbb{E}\Big[D_f\Big({\sf Bern}(\eta(X))\|{\sf Bern}(\hat{\eta}(X))\Big)\Big]\leq \tau\,, \] where $D_f$ represents an $f$-divergence function, and $\eta(x)$, $\hat{\eta}(x)$ respectively denote the true and the estimated likelihood of a feature vector $x$ admitting a positive label. We propose a novel test, called GRASP, for testing $H_0$, which works in finite sample settings, no matter the features (distribution-free). We also propose model-X GRASP, designed for model-X settings where the joint distribution of the features vector is known. Model-X GRASP uses this distributional information to achieve better power. We evaluate the performance of our tests through extensive numerical experiments.
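
To make the quantity inside $H_0$ concrete, the sketch below evaluates the average divergence between ${\sf Bern}(\eta(x))$ and ${\sf Bern}(\hat{\eta}(x))$ over a sample, using the KL divergence as one example of an $f$-divergence. This is only the population quantity being tested, not the test itself: in the setting described above $\eta$ is unknown and reachable only through label queries. The function names, the choice of KL, and the clipping constant are assumptions for illustration.

```python
import numpy as np

def bernoulli_kl(p, q, eps=1e-12):
    """KL(Bern(p) || Bern(q)), an f-divergence with f(t) = t * log(t)."""
    # Clip away from 0 and 1 so the logarithms stay finite (illustrative choice).
    p = np.clip(np.asarray(p, dtype=float), eps, 1.0 - eps)
    q = np.clip(np.asarray(q, dtype=float), eps, 1.0 - eps)
    return p * np.log(p / q) + (1.0 - p) * np.log((1.0 - p) / (1.0 - q))

def mean_divergence(eta, eta_hat):
    """Sample average of the divergence, the left-hand side of H_0."""
    return float(np.mean(bernoulli_kl(eta, eta_hat)))

def null_holds(eta, eta_hat, tau):
    """True when the average divergence is within the tolerance tau."""
    return mean_divergence(eta, eta_hat) <= tau
```

With a hypothetical oracle vector `eta` and model outputs `eta_hat`, `null_holds(eta, eta_hat, tau)` states whether the model is within tolerance `tau` of the true conditional law under this particular divergence.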

Seeking Interpretability and Explainability in Binary Activated Neural Networks. (arXiv:2209.03450v2 [cs.LG] UPDATED)

Authors: Benjamin Leblanc, Pascal Germain

We study the use of binary activated neural networks as interpretable and explainable predictors in the context of regression tasks on tabular data; more specifically, we provide guarantees on their expressiveness and present an approach based on the efficient computation of SHAP values for quantifying the relative importance of the features, hidden neurons, and even weights. As the model's simplicity is instrumental in achieving interpretability, we propose a greedy algorithm for building compact binary activated networks. This approach doesn't need to fix an architecture for the network in advance: it is built one layer at a time, one neuron at a time, leading to predictors that aren't needlessly complex for a given task.

Dynamical systems' based neural networks. (arXiv:2210.02373v2 [cs.LG] UPDATED)

Authors: Elena Celledoni, Davide Murari, Brynjulf Owren, Carola-Bibiane Schönlieb, Ferdia Sherry

Neural networks have gained much interest because of their effectiveness in many applications. However, their mathematical properties are generally not well understood. If there is some underlying geometric structure inherent to the data or to the function to approximate, it is often desirable to take this into account in the design of the neural network. In this work, we start with a non-autonomous ODE and build neural networks using a suitable, structure-preserving, numerical time-discretisation. The structure of the neural network is then inferred from the properties of the ODE vector field. Besides injecting more structure into the network architectures, this modelling procedure allows a better theoretical understanding of their behaviour. We present two universal approximation results and demonstrate how to impose some particular properties on the neural networks. A particular focus is on 1-Lipschitz architectures including layers that are not 1-Lipschitz. These networks are expressive and robust against adversarial attacks, as shown for the CIFAR-10 and CIFAR-100 datasets.

Hypernetwork approach to Bayesian MAML. (arXiv:2210.02796v2 [cs.LG] UPDATED)

Authors: Piotr Borycki, Piotr Kubacki, Marcin Przewięźlikowski, Tomasz Kuśmierczyk, Jacek Tabor, Przemysław Spurek

The main goal of Few-Shot learning algorithms is to enable learning from small amounts of data. One of the most popular and elegant Few-Shot learning approaches is Model-Agnostic Meta-Learning (MAML). The main idea behind this method is to learn the shared universal weights of a meta-model, which are then adapted for specific tasks. However, the method suffers from over-fitting and poorly quantifies uncertainty due to limited data size. Bayesian approaches could, in principle, alleviate these shortcomings by learning weight distributions in place of point-wise weights. Unfortunately, previous modifications of MAML are limited due to the simplicity of Gaussian posteriors, MAML-like gradient-based weight updates, or by the same structure enforced for universal and adapted weights.

In this paper, we propose a novel framework for Bayesian MAML called BayesianHMAML, which employs Hypernetworks for weight updates. It learns the universal weights point-wise, but a probabilistic structure is added when adapted for specific tasks. In such a framework, we can use simple Gaussian distributions or more complicated posteriors induced by Continuous Normalizing Flows.

Pre-Training Representations of Binary Code Using Contrastive Learning. (arXiv:2210.05102v2 [cs.SE] UPDATED)

Authors: Yifan Zhang, Chen Huang, Yueke Zhang, Kevin Cao, Scott Thomas Andersen, Huajie Shao, Kevin Leach, Yu Huang

Compiled software is delivered as executable binary code. Developers write source code to express the software semantics, but the compiler converts it to a binary format that the CPU can directly execute. Therefore, binary code analysis is critical to applications in reverse engineering and computer security tasks where source code is not available. However, unlike source code and natural language that contain rich semantic information, binary code is typically difficult for human engineers to understand and analyze. While existing work uses AI models to assist source code analysis, few studies have considered binary code. In this paper, we propose a COntrastive learning Model for Binary cOde Analysis, or COMBO, that incorporates source code and comment information into binary code during representation learning. Specifically, we present three components in COMBO: (1) a primary contrastive learning method for cold-start pre-training, (2) a simplex interpolation method to incorporate source code, comments, and binary code, and (3) an intermediate representation learning algorithm to provide binary code embeddings. Finally, we evaluate the effectiveness of the pre-trained representations produced by COMBO using three indicative downstream tasks relating to binary code: algorithmic functionality classification, binary code similarity, and vulnerability detection. Our experimental results show that COMBO facilitates representation learning of binary code visualized by distribution analysis, and improves the performance on all three downstream tasks by 5.45% on average compared to state-of-the-art large-scale language representation models. To the best of our knowledge, COMBO is the first language representation model that incorporates source code, binary code, and comments into contrastive code representation learning and unifies multiple tasks for binary code analysis.

Principled Pruning of Bayesian Neural Networks through Variational Free Energy Minimization. (arXiv:2210.09134v2 [cs.LG] UPDATED)

Authors: Jim Beckers, Bart van Erp, Ziyue Zhao, Kirill Kondrashov, Bert de Vries

Bayesian model reduction provides an efficient approach for comparing the performance of all nested sub-models of a model, without re-evaluating any of these sub-models. Until now, Bayesian model reduction has been applied mainly in the computational neuroscience community on simple models. In this paper, we formulate and apply Bayesian model reduction to perform principled pruning of Bayesian neural networks, based on variational free energy minimization. Direct application of Bayesian model reduction, however, gives rise to approximation errors. Therefore, a novel iterative pruning algorithm is presented to alleviate the problems arising with naive Bayesian model reduction, as supported experimentally on the publicly available UCI datasets for different inference algorithms. This novel parameter pruning scheme solves the shortcomings of current state-of-the-art pruning methods that are used by the signal processing community. The proposed approach has a clear stopping criterion and minimizes the same objective that is used during training. Next to these benefits, our experiments indicate better model performance in comparison to state-of-the-art pruning schemes.

Learning Melanocytic Cell Masks from Adjacent Stained Tissue. (arXiv:2211.00646v3 [q-bio.QM] UPDATED)

Authors: Mikio Tada, Ursula E. Lang, Iwei Yeh, Maria L. Wei, Michael J. Keiser

Melanoma is one of the most aggressive forms of skin cancer, causing a large proportion of skin cancer deaths. However, melanoma diagnoses by pathologists show low interrater reliability. As melanoma is a cancer of the melanocyte, there is a clear need to develop a melanocytic cell segmentation tool that is agnostic to pathologist variability and automates pixel-level annotation. Gigapixel-level pathologist labeling, however, is impractical. Herein, we propose a means to train deep neural networks for melanocytic cell segmentation from hematoxylin and eosin (H&E) stained sections and paired immunohistochemistry (IHC) of adjacent tissue sections, achieving a mean IOU of 0.64 despite imperfect ground-truth labels.

Federated Adaptive Prompt Tuning for Multi-domain Collaborative Learning. (arXiv:2211.07864v2 [cs.LG] UPDATED)

Authors: Shangchao Su, Mingzhao Yang, Bin Li, Xiangyang Xue

Federated learning (FL) enables multiple clients to collaboratively train a global model without disclosing their data. Previous research often requires training the complete model parameters. However, the emergence of powerful pre-trained models makes it possible to achieve higher performance with fewer learnable parameters in FL. In this paper, we propose a federated adaptive prompt tuning algorithm, FedAPT, for multi-domain collaborative image classification with powerful foundation models, like CLIP. Compared with direct federated prompt tuning, our core idea is to adaptively unlock specific domain knowledge for each test sample in order to provide them with personalized prompts. To implement this idea, we design an adaptive prompt tuning module, which consists of a meta prompt, an adaptive network, and some keys. The server randomly generates a set of keys and assigns a unique key to each client. Then all clients cooperatively train the global adaptive network and meta prompt with the local datasets and the frozen keys. Ultimately, the global aggregation model can assign a personalized prompt to CLIP based on the domain features of each test sample. We perform extensive experiments on two multi-domain image classification datasets across two different settings - supervised and unsupervised. The results show that FedAPT can achieve better performance with less than 10% of the number of parameters of the fully trained model, and the global model can perform well in diverse client domains simultaneously.

Sequential Informed Federated Unlearning: Efficient and Provable Client Unlearning in Federated Optimization. (arXiv:2211.11656v4 [cs.LG] UPDATED)

Authors: Yann Fraboni, Martin Van Waerebeke, Kevin Scaman, Richard Vidal, Laetitia Kameni, Marco Lorenzi

The aim of Machine Unlearning (MU) is to provide theoretical guarantees on the removal of the contribution of a given data point from a training procedure. Federated Unlearning (FU) consists in extending MU to unlearn a given client's contribution from a federated training routine. Current FU approaches are generally not scalable, and do not come with sound theoretical quantification of the effectiveness of unlearning. In this work we present Informed Federated Unlearning (IFU), a novel efficient and quantifiable FU approach. Upon an unlearning request from a given client, IFU identifies the optimal FL iteration from which FL has to be reinitialized, with unlearning guarantees obtained through a randomized perturbation mechanism. The theory of IFU is also extended to account for sequential unlearning requests. Experimental results on different tasks and datasets show that IFU leads to more efficient unlearning procedures as compared to basic re-training and state-of-the-art FU approaches.

StyleGAN as a Utility-Preserving Face De-identification Method. (arXiv:2212.02611v2 [cs.CV] UPDATED)

Authors: Seyyed Mohammad Sadegh Moosavi Khorzooghi, Shirin Nilizadeh

Face de-identification methods have been proposed to preserve users' privacy by obscuring their faces. These methods, however, can degrade the quality of photos, and they usually do not preserve the utility of faces, i.e., their age, gender, pose, and facial expression. Recently, GANs, such as StyleGAN, have been proposed, which generate realistic, high-quality imaginary faces. In this paper, we investigate the use of StyleGAN in generating de-identified faces through style mixing. We examined this de-identification method for preserving utility and privacy by implementing several face detection, verification, and identification attacks and conducting a user study. The results from our extensive experiments, human evaluation, and comparison with two state-of-the-art methods, i.e., CIAGAN and DeepPrivacy, show that StyleGAN performs on par with or better than these methods, preserving users' privacy and images' utility. In particular, the results of the machine learning-based experiments show that StyleGAN0-4 preserves utility better than CIAGAN and DeepPrivacy while preserving privacy at the same level. StyleGAN0-3 preserves utility at the same level while providing more privacy. In this paper, for the first time, we also performed a carefully designed user study to examine both privacy and utility-preserving properties of StyleGAN0-3, 0-4, and 0-5, as well as CIAGAN and DeepPrivacy from the human observers' perspectives. Our statistical tests showed that participants tend to verify and identify StyleGAN0-5 images more easily than DeepPrivacy images. All the methods but StyleGAN0-5 had significantly lower identification rates than CIAGAN. Regarding utility, as expected, StyleGAN0-5 performed significantly better in preserving some attributes. Among all methods, on average, participants believe gender has been preserved the most while naturalness has been preserved the least.

Invertible normalizing flow neural networks by JKO scheme. (arXiv:2212.14424v2 [stat.ML] UPDATED)

Authors: Chen Xu, Xiuyuan Cheng, Yao Xie

Normalizing flow is a class of deep generative models for efficient sampling and density estimation. In practice, the flow often appears as a chain of invertible neural network blocks; to facilitate training, existing works have regularized flow trajectories and designed special network architectures. The current paper develops a neural ODE flow network inspired by the Jordan-Kinderlehrer-Otto (JKO) scheme, which allows efficient block-wise training of the residual blocks without sampling SDE trajectories or inner loops of score matching or variational learning. As the JKO scheme unfolds the dynamic of gradient flow, the proposed model naturally stacks residual network blocks one by one, reducing the memory load and difficulty in performing end-to-end deep flow network training. We also develop adaptive time reparameterization of the flow network with a progressive refinement of the trajectory in probability space, which improves the model training efficiency and accuracy in practice. Using numerical experiments with synthetic and real data, we show that the proposed JKO-iFlow model achieves similar or better performance in generating new samples compared with the existing flow and diffusion models at a significantly reduced computational and memory cost.

Point Cloud-based Proactive Link Quality Prediction for Millimeter-wave Communications. (arXiv:2301.00752v3 [cs.NI] UPDATED)

Authors: Shoki Ohta, Takayuki Nishio, Riichi Kudo, Kahoko Takahashi, Hisashi Nagata

This study demonstrates the feasibility of point cloud-based proactive link quality prediction for millimeter-wave (mmWave) communications. Previous studies have proposed machine learning-based methods to predict received signal strength for future time periods using time series of depth images to mitigate the line-of-sight (LOS) path blockage by pedestrians in mmWave communication. However, these image-based methods have limited applicability due to privacy concerns as camera images may contain sensitive information. This study proposes a point cloud-based method for mmWave link quality prediction and demonstrates its feasibility through experiments. Point clouds represent three-dimensional (3D) spaces as a set of points and are sparser and less likely to contain sensitive information than camera images. Additionally, point clouds provide 3D position and motion information, which is necessary for understanding the radio propagation environment involving pedestrians. This study designs the mmWave link quality prediction method and conducts realistic indoor experiments, where the link quality fluctuates significantly due to human blockage, using commercially available IEEE 802.11ad-based 60 GHz wireless LAN devices and Kinect v2 RGB-D camera and Velodyne VLP-16 light detection and ranging (LiDAR) for point cloud acquisition. The experimental results showed that our proposed method can predict future large attenuation of mmWave received signal strength and throughput induced by the LOS path blockage by pedestrians with comparable or superior accuracy to image-based prediction methods. Hence, our point cloud-based method can serve as a viable alternative to image-based methods.

Transformers Meet Directed Graphs. (arXiv:2302.00049v3 [cs.LG] UPDATED)

Authors: Simon Geisler, Yujia Li, Daniel Mankowitz, Ali Taylan Cemgil, Stephan Günnemann, Cosmin Paduraru

Transformers were originally proposed as a sequence-to-sequence model for text but have become vital for a wide range of modalities, including images, audio, video, and undirected graphs. However, transformers for directed graphs are a surprisingly underexplored topic, despite their applicability to ubiquitous domains, including source code and logic circuits. In this work, we propose two direction- and structure-aware positional encodings for directed graphs: (1) the eigenvectors of the Magnetic Laplacian - a direction-aware generalization of the combinatorial Laplacian; (2) directional random walk encodings. Empirically, we show that the extra directionality information is useful in various downstream tasks, including correctness testing of sorting networks and source code understanding. Together with a data-flow-centric graph construction, our model outperforms the prior state of the art on the Open Graph Benchmark Code2 by a relative 14.7%.
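
The Magnetic Laplacian mentioned above can be sketched as follows, using one common unnormalized definition: $L^{(q)} = D_s - A_s \odot \exp(i\,2\pi q\,(A - A^\top))$, with $A_s = (A + A^\top)/2$ and $D_s$ its degree matrix. The symmetrization, the lack of normalization, the default potential $q$, and the function names are assumptions for illustration; the paper's exact construction may differ.

```python
import numpy as np

def magnetic_laplacian(A, q=0.25):
    """Unnormalized Magnetic Laplacian of a directed graph with adjacency A."""
    A = np.asarray(A, dtype=float)
    A_sym = 0.5 * (A + A.T)                # symmetrized edge weights
    theta = 2.0 * np.pi * q * (A - A.T)    # antisymmetric phase encodes direction
    H = A_sym * np.exp(1j * theta)         # Hermitian "complex adjacency"
    D = np.diag(A_sym.sum(axis=1))         # degrees of the symmetrized graph
    return D - H

def magnetic_positional_encoding(A, k, q=0.25):
    """Eigenpairs for the k smallest eigenvalues, usable as node encodings."""
    L = magnetic_laplacian(A, q)
    eigvals, eigvecs = np.linalg.eigh(L)   # eigh: Hermitian input, ascending order
    return eigvals[:k], eigvecs[:, :k]
```

Setting `q=0` removes the phase and recovers the combinatorial Laplacian of the symmetrized graph, which is the sense in which the Magnetic Laplacian generalizes it. Reversing every edge conjugates the matrix, so direction is carried in the complex phase of the eigenvectors.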

System identification of neural systems: If we got it right, would we know? (arXiv:2302.06677v2 [q-bio.NC] UPDATED)

Authors: Yena Han, Tomaso Poggio, Brian Cheung

Artificial neural networks are being proposed as models of parts of the brain. The networks are compared to recordings of biological neurons, and good performance in reproducing neural responses is considered to support the model's validity. A key question is how much this system identification approach tells us about brain computation. Does it validate one model architecture over another? We evaluate the ability of the most commonly used comparison techniques, such as a linear encoding model and centered kernel alignment, to correctly identify a model by replacing brain recordings with known ground truth models. System identification performance is quite variable; it also depends significantly on factors independent of the ground truth architecture, such as the stimulus images. In addition, we show the limitations of using functional similarity scores in identifying higher-level architectural motifs.

On-Demand Communication for Asynchronous Multi-Agent Bandits. (arXiv:2302.07446v2 [cs.LG] UPDATED)

Authors: Yu-Zhen Janice Chen, Lin Yang, Xuchuang Wang, Xutong Liu, Mohammad Hajiesmaili, John C.S. Lui, Don Towsley

This paper studies a cooperative multi-agent multi-armed stochastic bandit problem where agents operate asynchronously -- agent pull times and rates are unknown, irregular, and heterogeneous -- and face the same instance of a K-armed bandit problem. Agents can share reward information to speed up the learning process at additional communication costs. We propose ODC, an on-demand communication protocol that tailors the communication of each pair of agents based on their empirical pull times. ODC is efficient when the pull times of agents are highly heterogeneous, and its communication complexity depends on the empirical pull times of agents. ODC is a generic protocol that can be integrated into most cooperative bandit algorithms without degrading their performance. We then incorporate ODC into the natural extensions of UCB and AAE algorithms and propose two communication-efficient cooperative algorithms. Our analysis shows that both algorithms are near-optimal in regret.

Metropolitan Segment Traffic Speeds from Massive Floating Car Data in 10 Cities. (arXiv:2302.08761v3 [cs.LG] UPDATED)

Authors: Moritz Neun, Christian Eichenberger, Yanan Xin, Cheng Fu, Nina Wiedemann, Henry Martin, Martin Tomko, Lukas Ambühl, Luca Hermes, Michael Kopp

Traffic analysis is crucial for urban operations and planning, while the availability of dense urban traffic data beyond loop detectors is still scarce. We present a large-scale floating vehicle dataset of per-street segment traffic information, Metropolitan Segment Traffic Speeds from Massive Floating Car Data in 10 Cities (MeTS-10), available for 10 global cities with a 15-minute resolution for collection periods ranging between 108 and 361 days in 2019-2021 and covering more than 1500 square kilometers per metropolitan area. MeTS-10 features traffic speed information at all street levels from main arterials to local streets for Antwerp, Bangkok, Barcelona, Berlin, Chicago, Istanbul, London, Madrid, Melbourne and Moscow. The dataset leverages the industrial-scale floating vehicle Traffic4cast data with speeds and vehicle counts provided in a privacy-preserving spatio-temporal aggregation. We detail the efficient matching approach mapping the data to the OpenStreetMap road graph. We evaluate the dataset by comparing it with publicly available stationary vehicle detector data (for Berlin, London, and Madrid) and the Uber traffic speed dataset (for Barcelona, Berlin, and London). The comparison highlights the differences across datasets in spatio-temporal coverage and variations in the reported traffic caused by the binning method. MeTS-10 enables novel, city-wide analysis of mobility and traffic patterns for ten major world cities, overcoming current limitations of spatially sparse vehicle detector data. The large spatial and temporal coverage offers an opportunity for joining the MeTS-10 with other datasets, such as traffic surveys in traffic planning studies or vehicle detector data in traffic control settings.

Flexible Phase Dynamics for Bio-Plausible Contrastive Learning. (arXiv:2302.12431v2 [cs.LG] UPDATED)

Authors: Ezekiel Williams, Colin Bredenberg, Guillaume Lajoie

Many learning algorithms used as normative models in neuroscience or as candidate approaches for learning on neuromorphic chips learn by contrasting one set of network states with another. These Contrastive Learning (CL) algorithms are traditionally implemented with rigid, temporally non-local, and periodic learning dynamics that could limit the range of physical systems capable of harnessing CL. In this study, we build on recent work exploring how CL might be implemented by biological or neuromorphic systems and show that this form of learning can be made temporally local, and can still function even if many of the dynamical requirements of standard training procedures are relaxed. Thanks to a set of general theorems corroborated by numerical experiments across several CL models, our results provide theoretical foundations for the study and development of CL methods for biological and neuromorphic neural networks.

Fair Attribute Completion on Graph with Missing Attributes. (arXiv:2302.12977v3 [cs.LG] UPDATED)

Authors: Dongliang Guo, Zhixuan Chu, Sheng Li

Tackling unfairness in graph learning models is a challenging task, as the unfairness issues on graphs involve both attributes and topological structures. Existing work on fair graph learning simply assumes that attributes of all nodes are available for model training and then makes fair predictions. In practice, however, the attributes of some nodes might not be accessible due to missing data or privacy concerns, which makes fair graph learning even more challenging. In this paper, we propose FairAC, a fair attribute completion method, to complement missing information and learn fair node embeddings for graphs with missing attributes. FairAC adopts an attention mechanism to deal with the missing-attribute problem while mitigating two types of unfairness, i.e., feature unfairness from attributes and topological unfairness due to attribute completion. FairAC can work on various types of homogeneous graphs and generate fair embeddings for them, and thus can be applied to most downstream tasks to improve their fairness performance. To the best of our knowledge, FairAC is the first method that jointly addresses the graph attribute completion and graph unfairness problems. Experimental results on benchmark datasets show that our method achieves better fairness performance with less sacrifice in accuracy, compared with the state-of-the-art methods of fair graph learning. Code is available at: https://github.com/donglgcn/FairAC.

Collage Diffusion. (arXiv:2303.00262v2 [cs.CV] UPDATED)

Authors: Vishnu Sarukkai, Linden Li, Arden Ma, Christopher Ré, Kayvon Fatahalian

We seek to give users precise control over diffusion-based image generation by modeling complex scenes as sequences of layers, which define the desired spatial arrangement and visual attributes of objects in the scene. Collage Diffusion harmonizes the input layers to make objects fit together -- the key challenge involves minimizing changes in the positions and key visual attributes of the input layers while allowing other attributes to change in the harmonization process. We ensure that objects are generated in the correct locations by modifying text-image cross-attention with the layers' alpha masks. We preserve key visual attributes of input layers by learning specialized text representations per layer and by extending ControlNet to operate on layers. Layer input allows users to control the extent of image harmonization on a per-object basis, and users can even iteratively edit individual objects in generated images while keeping other objects fixed. By leveraging the rich information present in layer input, Collage Diffusion generates globally harmonized images that maintain desired object characteristics better than prior approaches.

StyleDiff: Attribute Comparison Between Unlabeled Datasets in Latent Disentangled Space. (arXiv:2303.05102v2 [stat.ML] UPDATED)

Authors: Keisuke Kawano, Takuro Kutsuna, Ryoko Tokuhisa, Akihiro Nakamura, Yasushi Esaki

One major challenge in machine learning applications is coping with mismatches between the datasets used in the development and those obtained in real-world applications. These mismatches may lead to inaccurate predictions and errors, resulting in poor product quality and unreliable systems. In this study, we propose StyleDiff to inform developers of the differences between the two datasets for the steady development of machine learning systems. Using disentangled image spaces obtained from recently proposed generative models, StyleDiff compares the two datasets by focusing on attributes in the images and provides an easy-to-understand analysis of the differences between the datasets. The proposed StyleDiff runs in $O(dN\log N)$ time, where $N$ is the size of the datasets and $d$ is the number of attributes, enabling the application to large datasets. We demonstrate that StyleDiff accurately detects differences between datasets and presents them in an understandable format using, for example, driving scenes datasets.

Sensitivity-Aware Visual Parameter-Efficient Fine-Tuning. (arXiv:2303.08566v2 [cs.CV] UPDATED)

Authors: Haoyu He, Jianfei Cai, Jing Zhang, Dacheng Tao, Bohan Zhuang

Visual Parameter-Efficient Fine-Tuning (PEFT) has become a powerful alternative to full fine-tuning for adapting pre-trained vision models to downstream tasks: it tunes only a small number of parameters while freezing the vast majority to ease storage burden and optimization difficulty. However, existing PEFT methods introduce trainable parameters to the same positions across different tasks depending solely on human heuristics and neglect the domain gaps. To this end, we study where to introduce and how to allocate trainable parameters by proposing a novel Sensitivity-aware visual Parameter-efficient fine-Tuning (SPT) scheme, which adaptively allocates trainable parameters to task-specific important positions given a desired tunable parameter budget. Specifically, our SPT first quickly identifies the sensitive parameters that require tuning for a given task in a data-dependent way. Next, our SPT further boosts the representational capability for the weight matrices whose number of sensitive parameters exceeds a pre-defined threshold by utilizing existing structured tuning methods, e.g., LoRA [23] or Adapter [22], to replace directly tuning the selected sensitive parameters (unstructured tuning) under the budget. Extensive experiments on a wide range of downstream recognition tasks show that our SPT is complementary to the existing PEFT methods and largely boosts their performance, e.g., SPT improves Adapter with supervised pre-trained ViT-B/16 backbone by 4.2% and 1.4% mean Top-1 accuracy, reaching SOTA performance on FGVC and VTAB-1k benchmarks, respectively. Source code is at https://github.com/ziplab/SPT

Inferring Traffic Models in Terminal Airspace from Flight Tracks and Procedures. (arXiv:2303.09981v2 [cs.LG] UPDATED)

Authors: Soyeon Jung, Mykel J. Kochenderfer

Realistic aircraft trajectory models are useful in the design and validation of air traffic management (ATM) systems. Models of aircraft operated under instrument flight rules (IFR) require capturing the variability inherent in how aircraft follow standard flight procedures. The variability in aircraft behavior varies among flight stages. In this paper, we propose a probabilistic model that can learn the variability from the procedural data and flight tracks collected from radar surveillance data. For each segment, a Gaussian mixture model is used to learn the deviations of aircraft trajectories from their procedures. Given new procedures, we can generate synthetic trajectories by sampling a series of deviations from the trained Gaussian distributions and reconstructing the aircraft trajectory using the deviations and the procedures. We extend this method to capture pairwise correlations between aircraft and show how a pairwise model can be used to generate traffic involving an arbitrary number of aircraft. We demonstrate the proposed models on the arrival tracks and procedures of the John F. Kennedy International Airport. The distributional similarity between the original and the synthetic trajectory dataset was evaluated using the Jensen-Shannon divergence between the empirical distributions of different variables. We also provide qualitative analyses of the synthetic trajectories generated from the models.
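
The evaluation step mentioned above, comparing empirical distributions with the Jensen-Shannon divergence, can be sketched as follows. This is a minimal standard-library illustration, not the authors' evaluation code: the helper names `histogram` and `js_divergence`, the fixed-width binning, and the use of the natural logarithm are all assumptions.

```python
import math


def histogram(samples, lo, hi, bins):
    """Normalized fixed-width histogram of the samples that fall in [lo, hi]."""
    counts = [0] * bins
    width = (hi - lo) / bins
    for s in samples:
        if lo <= s <= hi:
            # Clamp so that s == hi lands in the last bin.
            idx = min(int((s - lo) / width), bins - 1)
            counts[idx] += 1
    total = sum(counts)
    if total == 0:
        raise ValueError("no samples fall inside [lo, hi]")
    return [c / total for c in counts]


def js_divergence(p, q):
    """Jensen-Shannon divergence (natural log) between two discrete distributions."""
    m = [0.5 * (pi + qi) for pi, qi in zip(p, q)]

    def kl(a, b):
        # Terms with a_i == 0 contribute zero; b_i > 0 whenever a_i > 0 because b is the mixture.
        return sum(ai * math.log(ai / bi) for ai, bi in zip(a, b) if ai > 0)

    return 0.5 * kl(p, m) + 0.5 * kl(q, m)


# Hypothetical usage: compare one variable of real vs. synthetic tracks.
real = histogram([0.1, 0.2, 0.4, 0.9], 0.0, 1.0, 4)
synthetic = histogram([0.15, 0.3, 0.5, 0.8], 0.0, 1.0, 4)
score = js_divergence(real, synthetic)
```

With the natural logarithm the divergence lies between 0 (identical histograms) and $\log 2$ (disjoint support), which makes scores comparable across variables with different units.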

DR.CPO: Diversified and Realistic 3D Augmentation via Iterative Construction, Random Placement, and HPR Occlusion. (arXiv:2303.12743v4 [cs.CV] UPDATED)

Authors: Jungwook Shin, Jaeill Kim, Kyungeun Lee, Hyunghun Cho, Wonjong Rhee

In autonomous driving, data augmentation is commonly used for improving 3D object detection. The most basic methods include insertion of copied objects and rotation and scaling of the entire training frame. Numerous variants have been developed as well. The existing methods, however, are considerably limited when compared to the variety of the real world possibilities. In this work, we develop a diversified and realistic augmentation method that can flexibly construct a whole-body object, freely locate and rotate the object, and apply self-occlusion and external-occlusion accordingly. To improve the diversity of the whole-body object construction, we develop an iterative method that stochastically combines multiple objects observed from the real world into a single object. Unlike the existing augmentation methods, the constructed objects can be randomly located and rotated in the training frame because proper occlusions can be reflected to the whole-body objects in the final step. Finally, proper self-occlusion at each local object level and external-occlusion at the global frame level are applied using the Hidden Point Removal (HPR) algorithm that is computationally efficient. HPR is also used for adaptively controlling the point density of each object according to the object's distance from the LiDAR. Experiment results show that the proposed DR.CPO algorithm is data-efficient and model-agnostic without incurring any computational overhead. Also, DR.CPO can improve mAP performance by 2.08% when compared to the best 3D detection result known for KITTI dataset. The code is available at https://github.com/SNU-DRL/DRCPO.git

Backpropagation through Back Substitution with a Backslash. (arXiv:2303.15449v2 [math.NA] UPDATED)

Authors: Alan Edelman, Ekin Akyurek, Yuyang Wang

We present a linear algebra formulation of backpropagation which allows the calculation of gradients by using a generically written ``backslash'' or Gaussian elimination on triangular systems of equations. Generally, the matrix elements are operators. This paper has three contributions: (i) it is of intellectual value to replace traditional treatments of automatic differentiation with a (left acting) operator theoretic, graph-based approach; (ii) operators can be readily placed in matrices in software in programming languages such as Julia as an implementation option; (iii) we introduce a novel notation, ``transpose dot'' operator ``$\{\}^{T_\bullet}$'' that allows for the reversal of operators.

We further demonstrate the elegance of the operators approach in a suitable programming language consisting of generic linear algebra operators such as Julia \cite{bezanson2017julia}, and that it is possible to realize this abstraction in code. Our implementation shows how generic linear algebra can allow operators as elements of matrices. In contrast to ``operator overloading,'' where backslash would normally have to be rewritten to take advantage of operators, with ``generic programming'' there is no such need.

Knowledge Enhanced Graph Neural Networks for Graph Completion. (arXiv:2303.15487v3 [cs.AI] UPDATED)

Authors: Luisa Werner (TYREX, UGA), Nabil Layaïda (TYREX), Pierre Genevès (CNRS, TYREX), Sarah Chlyah (TYREX)

Graph data is omnipresent and has a wide variety of applications, such as in natural science, social networks, or the semantic web. However, while being rich in information, graphs are often noisy and incomplete. As a result, graph completion tasks, such as node classification or link prediction, have gained attention. On one hand, neural methods, such as graph neural networks, have proven to be robust tools for learning rich representations of noisy graphs. On the other hand, symbolic methods enable exact reasoning on graphs. We propose Knowledge Enhanced Graph Neural Networks (KeGNN), a neuro-symbolic framework for graph completion that combines both paradigms as it allows for the integration of prior knowledge into a graph neural network model. Essentially, KeGNN consists of a graph neural network as a base upon which knowledge enhancement layers are stacked with the goal of refining predictions with respect to prior knowledge. We instantiate KeGNN in conjunction with two state-of-the-art graph neural networks, Graph Convolutional Networks and Graph Attention Networks, and evaluate KeGNN on multiple benchmark datasets for node classification.

Expressive Text-to-Image Generation with Rich Text. (arXiv:2304.06720v2 [cs.CV] UPDATED)

Authors: Songwei Ge, Taesung Park, Jun-Yan Zhu, Jia-Bin Huang

Plain text has become a prevalent interface for text-to-image synthesis. However, its limited customization options hinder users from accurately describing desired outputs. For example, plain text makes it hard to specify continuous quantities, such as the precise RGB color value or importance of each word. Furthermore, creating detailed text prompts for complex scenes is tedious for humans to write and challenging for text encoders to interpret. To address these challenges, we propose using a rich-text editor supporting formats such as font style, size, color, and footnote. We extract each word's attributes from rich text to enable local style control, explicit token reweighting, precise color rendering, and detailed region synthesis. We achieve these capabilities through a region-based diffusion process. We first obtain each word's region based on attention maps of a diffusion process using plain text. For each region, we enforce its text attributes by creating region-specific detailed prompts and applying region-specific guidance, and maintain its fidelity against plain-text generation through region-based injections. We present various examples of image generation from rich text and demonstrate that our method outperforms strong baselines with quantitative evaluations.

From Chaos Comes Order: Ordering Event Representations for Object Recognition and Detection. (arXiv:2304.13455v4 [cs.CV] UPDATED)

Authors: Nikola Zubić, Daniel Gehrig, Mathias Gehrig, Davide Scaramuzza

Today, state-of-the-art deep neural networks that process events first convert them into dense, grid-like input representations before using an off-the-shelf network. However, selecting the appropriate representation for the task traditionally requires training a neural network for each representation and selecting the best one based on the validation score, which is very time-consuming. This work eliminates this bottleneck by selecting representations based on the Gromov-Wasserstein Discrepancy (GWD) between raw events and their representation. It is about 200 times faster to compute than training a neural network and preserves the task performance ranking of event representations across multiple representations, network backbones, datasets, and tasks. Thus finding representations with high task scores is equivalent to finding representations with a low GWD. We use this insight to, for the first time, perform a hyperparameter search on a large family of event representations, revealing new and powerful representations that exceed the state-of-the-art. Our optimized representations outperform existing representations by 1.7 mAP on the 1 Mpx dataset and 0.3 mAP on the Gen1 dataset, two established object detection benchmarks, and reach a 3.8% higher classification score on the mini N-ImageNet benchmark. Moreover, we outperform state-of-the-art by 2.1 mAP on Gen1 and state-of-the-art feed-forward methods by 6.0 mAP on the 1 Mpx datasets. This work opens a new unexplored field of explicit representation optimization for event-based learning.

Transformer-based interpretable multi-modal data fusion for skin lesion classification. (arXiv:2304.14505v2 [eess.IV] UPDATED)

Authors: Theodor Cheslerean-Boghiu, Melia-Evelina Fleischmann, Theresa Willem, Tobias Lasser

A lot of deep learning (DL) research these days is mainly focused on improving quantitative metrics regardless of other factors. In human-centered applications, like skin lesion classification in dermatology, DL-driven clinical decision support systems are still in their infancy due to the limited transparency of their decision-making process. Moreover, the lack of procedures that can explain the behavior of trained DL algorithms leads to almost no trust from clinical physicians. To diagnose skin lesions, dermatologists rely on visual assessment of the disease and the data gathered from the patient's anamnesis. Data-driven algorithms dealing with multi-modal data are limited by the separation of feature-level and decision-level fusion procedures required by convolutional architectures. To address this issue, we enable single-stage multi-modal data fusion via the attention mechanism of transformer-based architectures to aid in diagnosing skin diseases. Our method beats other state-of-the-art single- and multi-modal DL architectures in image-rich and patient-data-rich environments. Additionally, the choice of the architecture enables native interpretability support for the classification task both in the image and metadata domain with no additional modifications necessary.

A Stable and Scalable Method for Solving Initial Value PDEs with Neural Networks. (arXiv:2304.14994v2 [cs.LG] UPDATED)

Authors: Marc Finzi, Andres Potapczynski, Matthew Choptuik, Andrew Gordon Wilson

Unlike conventional grid and mesh based methods for solving partial differential equations (PDEs), neural networks have the potential to break the curse of dimensionality, providing approximate solutions to problems where using classical solvers is difficult or impossible. While global minimization of the PDE residual over the network parameters works well for boundary value problems, catastrophic forgetting impairs the applicability of this approach to initial value problems (IVPs). In an alternative local-in-time approach, the optimization problem can be converted into an ordinary differential equation (ODE) on the network parameters and the solution propagated forward in time; however, we demonstrate that current methods based on this approach suffer from two key issues. First, following the ODE produces an uncontrolled growth in the conditioning of the problem, ultimately leading to unacceptably large numerical errors. Second, as the ODE methods scale cubically with the number of model parameters, they are restricted to small neural networks, significantly limiting their ability to represent intricate PDE initial conditions and solutions. Building on these insights, we develop Neural IVP, an ODE based IVP solver which prevents the network from getting ill-conditioned and runs in time linear in the number of parameters, enabling us to evolve the dynamics of challenging PDEs with neural networks.
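
To make the local-in-time idea concrete, here is one common way to write the conversion into an ODE on the parameters. The notation ($u_\theta$, $\mathcal{L}$, $J$, $M$, $F$, $\mu$, $p$) is introduced here for illustration and is an assumption, not taken from the abstract.

```latex
% PDE in evolution form and a network ansatz with time-dependent parameters:
\partial_t u = \mathcal{L}[u], \qquad u(x,t) \approx u_{\theta(t)}(x), \qquad \theta \in \mathbb{R}^p .

% Chain rule, with J(x) = \partial u_\theta(x) / \partial \theta :
\partial_t u_{\theta(t)}(x) = J(x)\,\dot{\theta} .

% Choose \dot{\theta} to match the PDE right-hand side in least squares over sample points x \sim \mu :
\dot{\theta} = \arg\min_{v} \int \bigl( J(x)\,v - \mathcal{L}[u_\theta](x) \bigr)^2 \, d\mu(x) .

% Normal equations give a linear system for the parameter velocity:
M(\theta)\,\dot{\theta} = F(\theta), \qquad
M(\theta) = \int J(x)^{\top} J(x)\, d\mu(x), \qquad
F(\theta) = \int J(x)^{\top} \mathcal{L}[u_\theta](x)\, d\mu(x) .
```

Under this formulation, $M(\theta)$ is a dense $p \times p$ matrix, so a direct solve costs $O(p^3)$, which is consistent with the cubic scaling noted above, and the accuracy of $\dot{\theta}$ depends on how well conditioned $M(\theta)$ remains as the parameters evolve.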

When Deep Learning Meets Polyhedral Theory: A Survey. (arXiv:2305.00241v2 [math.OC] UPDATED)

Authors: Joey Huchette, Gonzalo Muñoz, Thiago Serra, Calvin Tsay

In the past decade, deep learning became the prevalent methodology for predictive modeling thanks to the remarkable accuracy of deep neural networks in tasks such as computer vision and natural language processing. Meanwhile, the structure of neural networks converged back to simpler representations based on piecewise constant and piecewise linear functions such as the Rectified Linear Unit (ReLU), which became the most commonly used type of activation function in neural networks. That made certain types of network structure, such as the typical fully-connected feedforward neural network, amenable to analysis through polyhedral theory and to the application of methodologies such as Linear Programming (LP) and Mixed-Integer Linear Programming (MILP) for a variety of purposes. In this paper, we survey the main topics emerging from this fast-paced area of work, which bring a fresh perspective to understanding neural networks in more detail as well as to applying linear optimization techniques to train, verify, and reduce the size of such networks.
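
As an example of why ReLU networks fit MILP, a single unit can be written with the standard big-M encoding below. The symbols ($w$, $b$, $a$, $y$, $z$, $L$, $U$) are chosen here for illustration, and the encoding assumes that bounds $L < 0 < U$ on the pre-activation are known in advance.

```latex
% One ReLU unit: y = \max(0, a), with pre-activation a = w^{\top} x + b and known bounds L \le a \le U .
\begin{aligned}
  y &\ge a, & y &\ge 0, \\
  y &\le a - L\,(1 - z), & y &\le U z, \\
  z &\in \{0, 1\} .
\end{aligned}

% z = 1 (active):   y \le a and y \ge a force y = a, and y \ge 0 forces a \ge 0 .
% z = 0 (inactive): y \le 0 and y \ge 0 force y = 0, and y \ge a forces a \le 0 .
```

Stacking one such block per neuron turns questions about the whole network, such as finding an input that maximizes an output, into a mixed-integer linear program whose strength depends on how tight the bounds $L$ and $U$ are.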

MaxViT-UNet: Multi-Axis Attention for Medical Image Segmentation. (arXiv:2305.08396v4 [eess.IV] UPDATED)

Authors: Abdul Rehman Khan, Asifullah Khan

In this work, we present MaxViT-UNet, an Encoder-Decoder based hybrid vision transformer (CNN-Transformer) for medical image segmentation. The proposed Hybrid Decoder, based on MaxViT-block, is designed to harness the power of both the convolution and self-attention mechanisms at each decoding stage with a nominal memory and computational burden. The inclusion of multi-axis self-attention within each decoder stage significantly enhances the discriminating capacity between the object and background regions, thereby helping to improve segmentation efficiency. In the Hybrid Decoder block, the fusion process begins by integrating the upsampled lower-level decoder features, obtained through transpose convolution, with the skip-connection features derived from the hybrid encoder. The fused features are then refined by a multi-axis attention mechanism. The proposed decoder block is repeated multiple times to progressively segment the nuclei regions. Experimental results on the MoNuSeg18 and MoNuSAC20 datasets demonstrate the effectiveness of the proposed technique. Our MaxViT-UNet outperformed the previous CNN-based (UNet) and Transformer-based (Swin-UNet) techniques by a considerable margin on both of the standard datasets. The implementation and trained weights are available at https://github.com/PRLAB21/MaxViT-UNet.

Preserve Your Own Correlation: A Noise Prior for Video Diffusion Models. (arXiv:2305.10474v2 [cs.CV] UPDATED)

Authors: Songwei Ge, Seungjun Nah, Guilin Liu, Tyler Poon, Andrew Tao, Bryan Catanzaro, David Jacobs, Jia-Bin Huang, Ming-Yu Liu, Yogesh Balaji

Despite tremendous progress in generating high-quality images using diffusion models, synthesizing a sequence of animated frames that are both photorealistic and temporally coherent is still in its infancy. While off-the-shelf billion-scale datasets for image generation are available, collecting similar video data of the same scale is still challenging. Also, training a video diffusion model is computationally much more expensive than its image counterpart. In this work, we explore finetuning a pretrained image diffusion model with video data as a practical solution for the video synthesis task. We find that naively extending the image noise prior to video noise prior in video diffusion leads to sub-optimal performance. Our carefully designed video noise prior leads to substantially better performance. Extensive experimental validation shows that our model, Preserve Your Own Correlation (PYoCo), attains SOTA zero-shot text-to-video results on the UCF-101 and MSR-VTT benchmarks. It also achieves SOTA video generation quality on the small-scale UCF-101 benchmark with a $10\times$ smaller model using significantly less computation than the prior art.

pTSE: A Multi-model Ensemble Method for Probabilistic Time Series Forecasting. (arXiv:2305.11304v2 [cs.LG] UPDATED)

Authors: Yunyi Zhou, Zhixuan Chu, Yijia Ruan, Ge Jin, Yuchen Huang, Sheng Li

Various probabilistic time series forecasting models have sprung up and shown remarkably good performance. However, the choice of model highly relies on the characteristics of the input time series and the fixed distribution that the model is based on. Because probability distributions cannot be averaged over different models straightforwardly, the current time series model ensemble methods cannot be directly applied to improve the robustness and accuracy of forecasting. To address this issue, we propose pTSE, a multi-model distribution ensemble method for probabilistic forecasting based on Hidden Markov Model (HMM). pTSE only takes off-the-shelf outputs from member models without requiring further information about each model. Besides, we provide a complete theoretical analysis of pTSE to prove that the empirical distribution of time series subject to an HMM will converge to the stationary distribution almost surely. Experiments on benchmarks show the superiority of pTSE over all member models and competitive ensemble methods.

Generative Sliced MMD Flows with Riesz Kernels. (arXiv:2305.11463v2 [cs.LG] UPDATED)

Authors: Johannes Hertrich, Christian Wald, Fabian Altekrüger, Paul Hagemann

Maximum mean discrepancy (MMD) flows suffer from high computational costs in large scale computations. In this paper, we show that MMD flows with Riesz kernels $K(x,y) = - \Vert x-y\Vert^r$, $r \in (0,2)$ have exceptional properties which allow their efficient computation. We prove that the MMD of Riesz kernels coincides with the MMD of their sliced version. As a consequence, the computation of gradients of MMDs can be performed in the one-dimensional setting. Here, for $r=1$, a simple sorting algorithm can be applied to reduce the complexity from $O(MN+N^2)$ to $O((M+N)\log(M+N))$ for two measures with $M$ and $N$ support points. As another interesting follow-up result, the MMD of compactly supported measures can be estimated from above and below by the Wasserstein-1 distance. For the implementations we approximate the gradient of the sliced MMD by using only a finite number $P$ of slices. We show that the resulting error has complexity $O(\sqrt{d/P})$, where $d$ is the data dimension. These results enable us to train generative models by approximating MMD gradient flows by neural networks even for image applications. We demonstrate the efficiency of our model by image generation on MNIST, FashionMNIST and CIFAR10.
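
The sorting idea for $r=1$ can be illustrated in one dimension with the standard library alone. The sketch below computes the squared MMD value (not its gradient) for the kernel $K(x,y) = -|x-y|$ between two empirical measures; the function names, the V-statistic normalization, and the absence of a factor $1/2$ are assumptions made here, and the quadratic-time version is included only as a reference for checking.

```python
def pairwise_abs_sum(values):
    """Sum of |a - b| over all unordered pairs, in O(n log n) via sorting."""
    z = sorted(values)
    n = len(z)
    # In sorted order, z[k] is added k times and subtracted (n - 1 - k) times.
    return sum(zk * (2 * k - (n - 1)) for k, zk in enumerate(z))


def riesz_mmd2_1d(x, y):
    """Squared MMD for K(x, y) = -|x - y| between two 1D point sets (sorting-based)."""
    m, n = len(x), len(y)
    s_xx = pairwise_abs_sum(x)
    s_yy = pairwise_abs_sum(y)
    # Cross-pair sum: all pairs in the merged set minus the within-set pairs.
    s_xy = pairwise_abs_sum(list(x) + list(y)) - s_xx - s_yy
    return 2.0 * s_xy / (m * n) - 2.0 * s_xx / m**2 - 2.0 * s_yy / n**2


def riesz_mmd2_1d_bruteforce(x, y):
    """Same quantity by direct double sums, O(MN + M^2 + N^2); for checking only."""
    m, n = len(x), len(y)
    e_xy = sum(abs(a - b) for a in x for b in y) / (m * n)
    e_xx = sum(abs(a - b) for a in x for b in x) / m**2
    e_yy = sum(abs(a - b) for a in y for b in y) / n**2
    return 2.0 * e_xy - e_xx - e_yy


value = riesz_mmd2_1d([0.0, 1.0], [2.0])
```

A sliced estimator would apply this one-dimensional routine to the projections of the data onto $P$ random directions and average the results, which is where the finite number of slices enters.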

Dynamic Data Augmentation via MCTS for Prostate MRI Segmentation. (arXiv:2305.15777v2 [eess.IV] UPDATED)

Authors: Xinyue Xu, Yuhan Hsi, Haonan Wang, Xiaomeng Li

Medical image data are often limited due to the expensive acquisition and annotation process. Hence, training a deep-learning model with only raw data can easily lead to overfitting. One solution to this problem is to augment the raw data with various transformations, improving the model's ability to generalize to new data. However, manually configuring a generic augmentation combination and parameters for different datasets is non-trivial due to inconsistent acquisition approaches and data distributions. Therefore, automatic data augmentation is proposed to learn favorable augmentation strategies for different datasets while incurring large GPU overhead. To this end, we present a novel method, called Dynamic Data Augmentation (DDAug), which is efficient and has negligible computation cost. Our DDAug develops a hierarchical tree structure to represent various augmentations and utilizes an efficient Monte-Carlo tree searching algorithm to update, prune, and sample the tree. As a result, the augmentation pipeline can be optimized for each dataset automatically. Experiments on multiple Prostate MRI datasets show that our method outperforms the current state-of-the-art data augmentation strategies.

Knowledge Graph Embeddings in the Biomedical Domain: Are They Useful? A Look at Link Prediction, Rule Learning, and Downstream Polypharmacy Tasks. (arXiv:2305.19979v2 [cs.LG] UPDATED)

Authors: Aryo Pradipta Gema, Dominik Grabarczyk, Wolf De Wulf, Piyush Borole, Javier Antonio Alfaro, Pasquale Minervini, Antonio Vergari, Ajitha Rajan

Knowledge graphs are powerful tools for representing and organising complex biomedical data. Several knowledge graph embedding algorithms have been proposed to learn from and complete knowledge graphs. However, a recent study demonstrates the limited efficacy of these embedding algorithms when applied to biomedical knowledge graphs, raising the question of whether knowledge graph embeddings have limitations in biomedical settings. This study aims to apply state-of-the-art knowledge graph embedding models in the context of a recent biomedical knowledge graph, BioKG, and evaluate their performance and potential downstream uses. We achieve a three-fold improvement in terms of performance based on the HITS@10 score over previous work on the same biomedical knowledge graph. Additionally, we provide interpretable predictions through a rule-based method. We demonstrate that knowledge graph embedding models are applicable in practice by evaluating the best-performing model on four tasks that represent real-life polypharmacy situations. Results suggest that knowledge learnt from large biomedical knowledge graphs can be transferred to such downstream use cases. Our code is available at https://github.com/aryopg/biokge.
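
For readers unfamiliar with the HITS@10 score mentioned above, a minimal sketch of the metric follows. It assumes that each test triple has already been assigned the 1-based rank of its true entity among all candidates; the function name and the example ranks are hypothetical and are not taken from the paper's code.

```python
def hits_at_k(ranks, k=10):
    """Fraction of test triples whose true entity is ranked within the top k.

    `ranks` holds 1-based ranks, one per test triple.
    """
    if not ranks:
        raise ValueError("ranks must not be empty")
    return sum(1 for r in ranks if r <= k) / len(ranks)


if __name__ == "__main__":
    # Two of the four hypothetical ranks fall within the top 10.
    print(hits_at_k([1, 3, 11, 25], k=10))
```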

Mixed-type Distance Shrinkage and Selection for Clustering via Kernel Metric Learning. (arXiv:2306.01890v2 [cs.LG] UPDATED)

Authors: Jesse S. Ghashti, John R. J. Thompson

Distance-based clustering and classification are widely used in various fields to group mixed numeric and categorical data. In many algorithms, a predefined distance measurement is used to cluster data points based on their dissimilarity. While there exist numerous distance-based measures for data with pure numerical attributes and several ordered and unordered categorical metrics, an efficient and accurate distance for mixed-type data that utilizes the continuous and discrete properties simultaneously is an open problem. Many metrics convert numerical attributes to categorical ones or vice versa. They handle the data points as a single attribute type or calculate a distance between each attribute separately and add them up. We propose a metric called KDSUM that uses mixed kernels to measure dissimilarity, with cross-validated optimal bandwidth selection. We demonstrate that KDSUM is a shrinkage method from existing mixed-type metrics to a uniform dissimilarity metric, and improves clustering accuracy when utilized in existing distance-based clustering algorithms on simulated and real-world datasets containing continuous-only, categorical-only, and mixed-type data.

Yet Another ICU Benchmark: A Flexible Multi-Center Framework for Clinical ML. (arXiv:2306.05109v2 [cs.LG] UPDATED)

Authors: Robin van de Water, Hendrik Schmidt, Paul Elbers, Patrick Thoral, Bert Arnrich, Patrick Rockenschaub

Medical applications of machine learning (ML) have experienced a surge in popularity in recent years. The intensive care unit (ICU) is a natural habitat for ML given the abundance of available data from electronic health records. Models have been proposed to address numerous ICU prediction tasks like the early detection of complications. While authors frequently report state-of-the-art performance, it is challenging to verify claims of superiority. Datasets and code are not always published, and cohort definitions, preprocessing pipelines, and training setups are difficult to reproduce. This work introduces Yet Another ICU Benchmark (YAIB), a modular framework that allows researchers to define reproducible and comparable clinical ML experiments; we offer an end-to-end solution from cohort definition to model evaluation. The framework natively supports most open-access ICU datasets (MIMIC III/IV, eICU, HiRID, AUMCdb) and is easily adaptable to future ICU datasets. Combined with a transparent preprocessing pipeline and extensible training code for multiple ML and deep learning models, YAIB enables unified model development. Our benchmark comes with five predefined established prediction tasks (mortality, acute kidney injury, sepsis, kidney function, and length of stay) developed in collaboration with clinicians. Adding further tasks is straightforward by design. Using YAIB, we demonstrate that the choice of dataset, cohort definition, and preprocessing have a major impact on the prediction performance - often more so than model class - indicating an urgent need for YAIB as a holistic benchmarking tool. We provide our work to the clinical ML community to accelerate method development and enable real-world clinical implementations. Software Repository: https://github.com/rvandewater/YAIB.

The Role of Diverse Replay for Generalisation in Reinforcement Learning. (arXiv:2306.05727v2 [cs.LG] UPDATED)

Authors: Max Weltevrede, Matthijs T.J. Spaan, Wendelin Böhmer

In reinforcement learning (RL), key components of many algorithms are the exploration strategy and replay buffer. These strategies regulate what environment data is collected and trained on and have been extensively studied in the RL literature. In this paper, we investigate the impact of these components in the context of generalisation in multi-task RL. We investigate the hypothesis that collecting and training on more diverse data from the training environments will improve zero-shot generalisation to new tasks. We motivate mathematically and show empirically that generalisation to tasks that are "reachable" during training is improved by increasing the diversity of transitions in the replay buffer. Furthermore, we show empirically that this same strategy also shows improvement for generalisation to similar but "unreachable" tasks which could be due to improved generalisation of the learned latent representations.

Improving the Validity of Decision Trees as Explanations. (arXiv:2306.06777v3 [cs.LG] UPDATED)

Authors: Jiri Nemecek, Tomas Pevny, Jakub Marecek

In classification and forecasting with tabular data, one often utilizes tree-based models. Those can be competitive with deep neural networks on tabular data [cf. Grinsztajn et al., NeurIPS 2022, arXiv:2207.08815] and, under some conditions, explainable. The explainability depends on the depth of the tree and the accuracy in each leaf of the tree. Decision trees containing leaves with unbalanced accuracy can provide misleading explanations. Low-accuracy leaves give less valid explanations, which could be interpreted as unfairness among explanations. Here, we train a shallow tree with the objective of minimizing the maximum misclassification error across each leaf node. Then, we extend each leaf with a separate tree-based model. The shallow tree provides a global explanation, while the overall statistical performance of the shallow tree with extended leaves improves upon decision trees of unlimited depth trained using classical methods (e.g., CART) and is comparable to state-of-the-art methods (e.g., well-tuned XGBoost).
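
The objective of minimizing the maximum misclassification error across leaves can be illustrated on the smallest possible tree. The sketch below is not the authors' training procedure: it brute-forces a single threshold on one feature (a depth-1 tree) and picks the split whose worse leaf has the lowest error. All names are hypothetical.

```python
from collections import Counter


def leaf_error(labels):
    """Misclassification rate of a leaf that predicts its majority label."""
    if not labels:
        return 0.0
    majority_count = Counter(labels).most_common(1)[0][1]
    return 1.0 - majority_count / len(labels)


def best_minimax_stump(xs, ys):
    """Return (threshold, worst_leaf_error) minimizing the maximum leaf error.

    Candidate thresholds are midpoints between consecutive distinct values.
    Returns None when the feature has fewer than two distinct values.
    """
    best = None
    values = sorted(set(xs))
    for lo, hi in zip(values, values[1:]):
        threshold = (lo + hi) / 2
        left = [y for x, y in zip(xs, ys) if x <= threshold]
        right = [y for x, y in zip(xs, ys) if x > threshold]
        worst = max(leaf_error(left), leaf_error(right))
        if best is None or worst < best[1]:
            best = (threshold, worst)
    return best


if __name__ == "__main__":
    print(best_minimax_stump([1, 2, 3, 4], [0, 0, 1, 1]))
```

Unlike the usual average-impurity criterion, this criterion is driven entirely by the least accurate leaf, which is the property the abstract ties to the validity of explanations.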

Neural Mixed Effects for Nonlinear Personalized Predictions. (arXiv:2306.08149v3 [cs.LG] UPDATED)

Authors: Torsten Wörtwein, Nicholas Allen, Lisa B. Sheeber, Randy P. Auerbach, Jeffrey F. Cohn, Louis-Philippe Morency

Personalized prediction is a machine learning approach that predicts a person's future observations based on their past labeled observations and is typically used for sequential tasks, e.g., to predict daily mood ratings. When making personalized predictions, a model can combine two types of trends: (a) trends shared across people, i.e., person-generic trends, such as being happier on weekends, and (b) unique trends for each person, i.e., person-specific trends, such as a stressful weekly meeting. Mixed effect models are popular statistical models to study both trends by combining person-generic and person-specific parameters. Though linear mixed effect models are gaining popularity in machine learning by integrating them with neural networks, these integrations are currently limited to linear person-specific parameters: ruling out nonlinear person-specific trends. In this paper, we propose Neural Mixed Effect (NME) models to optimize nonlinear person-specific parameters anywhere in a neural network in a scalable manner. NME combines the efficiency of neural network optimization with nonlinear mixed effects modeling. Empirically, we observe that NME improves performance across six unimodal and multimodal datasets, including a smartphone dataset to predict daily mood and a mother-adolescent dataset to predict affective state sequences where half the mothers experience at least moderate symptoms of depression. Furthermore, we evaluate NME for two model architectures, including for neural conditional random fields (CRF) to predict affective state sequences where the CRF learns nonlinear person-specific temporal transitions between affective states. Analysis of these person-specific transitions on the mother-adolescent dataset shows interpretable trends related to the mother's depression symptoms.

Neural ShDF: Reviving an Efficient and Consistent Mesh Segmentation Method. (arXiv:2306.11737v2 [cs.GR] UPDATED)

Authors: Bruno Roy

Partitioning a polygonal mesh into meaningful parts can be challenging. Many applications require decomposing such structures for further processing in computer graphics. In the last decade, several methods were proposed to tackle this problem, at the cost of intensive computational times. Recently, machine learning has proven to be effective for the segmentation task on 3D structures. Nevertheless, these state-of-the-art methods are often hardly generalizable and require dividing the learned model into several specific classes of objects to avoid overfitting. We present a data-driven approach leveraging deep learning to encode a mapping function prior to mesh segmentation for multiple applications. Our network reproduces a neighborhood map using our knowledge of the Shape Diameter Function (SDF) method using similarities among vertex neighborhoods. Our approach is resolution-agnostic as we downsample the input meshes and query the full-resolution structure solely for neighborhood contributions. Using our predicted SDF values, we can inject the resulting structure into a graph-cut algorithm to generate an efficient and robust mesh segmentation while considerably reducing the required computation times.

Learning Delays in Spiking Neural Networks using Dilated Convolutions with Learnable Spacings. (arXiv:2306.17670v2 [cs.NE] UPDATED)

Authors: Ilyass Hammouamri, Ismail Khalfaoui-Hassani, Timothée Masquelier

Spiking Neural Networks (SNNs) are a promising research direction for building power-efficient information processing systems, especially for temporal tasks such as speech recognition. In SNNs, delays refer to the time needed for one spike to travel from one neuron to another. These delays matter because they influence the spike arrival times, and it is well-known that spiking neurons respond more strongly to coincident input spikes. More formally, it has been shown theoretically that plastic delays greatly increase the expressivity in SNNs. Yet, efficient algorithms to learn these delays have been lacking. Here, we propose a new discrete-time algorithm that addresses this issue in deep feedforward SNNs using backpropagation, in an offline manner. To simulate delays between consecutive layers, we use 1D convolutions across time. The kernels contain only a few non-zero weights - one per synapse - whose positions correspond to the delays. These positions are learned together with the weights using the recently proposed Dilated Convolution with Learnable Spacings (DCLS). We evaluated our method on three datasets: the Spiking Heidelberg Dataset (SHD), the Spiking Speech Commands (SSC) and its non-spiking version Google Speech Commands v0.02 (GSC) benchmarks, which require detecting temporal patterns. We used feedforward SNNs with two or three hidden fully connected layers, and vanilla leaky integrate-and-fire neurons. We showed that fixed random delays help and that learning them helps even more. Furthermore, our method outperformed the state-of-the-art in the three datasets without using recurrent connections and with substantially fewer parameters. Our work demonstrates the potential of delay learning in developing accurate and precise models for temporal data processing. Our code is based on PyTorch / SpikingJelly and available at: https://github.com/Thvnvtos/SNN-delays
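
The link between a synaptic delay and a 1D convolution across time can be made concrete with a toy example. The sketch below assumes a single synapse with a fixed integer delay and builds a causal kernel with one non-zero weight whose position encodes that delay. It does not show how DCLS learns the positions, and all names are hypothetical rather than taken from the authors' repository.

```python
def delayed_synapse_output(spikes, delay, weight, max_delay):
    """Apply a causal 1D convolution whose kernel has one non-zero tap.

    kernel[d] multiplies the input from d time steps ago, so placing the
    weight at index `delay` shifts the spike train by `delay` steps.
    """
    if not 0 <= delay <= max_delay:
        raise ValueError("delay must lie in [0, max_delay]")
    kernel = [0.0] * (max_delay + 1)
    kernel[delay] = weight
    output = []
    for t in range(len(spikes)):
        acc = 0.0
        for d, w in enumerate(kernel):
            if t - d >= 0:
                acc += w * spikes[t - d]
        output.append(acc)
    return output


if __name__ == "__main__":
    # A spike at t=0 and t=3 arrives two steps later, scaled by the weight.
    print(delayed_synapse_output([1, 0, 0, 1, 0, 0], delay=2, weight=0.5, max_delay=3))
```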

Data-driven Predictive Latency for 5G: A Theoretical and Experimental Analysis Using Network Measurements. (arXiv:2307.02329v3 [cs.NI] UPDATED)

Authors: Marco Skocaj, Francesca Conserva, Nicol Sarcone Grande, Andrea Orsi, Davide Micheli, Giorgio Ghinamo, Simone Bizzarri, Roberto Verdone

The advent of novel 5G services and applications with binding latency requirements and guaranteed Quality of Service (QoS) hastened the need to incorporate autonomous and proactive decision-making in network management procedures. The objective of our study is to provide a thorough analysis of predictive latency within 5G networks by utilizing real-world network data that is accessible to mobile network operators (MNOs). In particular, (i) we present an analytical formulation of the user-plane latency as a Hypoexponential distribution, which is validated by means of a comparative analysis with empirical measurements, and (ii) we present experimental results on probabilistic regression, anomaly detection, and predictive forecasting, leveraging emerging domains in Machine Learning (ML), such as Bayesian Learning (BL) and Machine Learning on Graphs (GML). We test our predictive framework using data gathered from scenarios of vehicular mobility, dense-urban traffic, and social gathering events. Our results provide valuable insights into the efficacy of predictive algorithms in practical applications.
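
A hypoexponential random variable is the sum of independent exponential stages with (possibly different) rates, so its mean is the sum of the reciprocal rates. The sketch below illustrates this with simulated draws. The three rates are made-up values, not parameters from the paper, and the function names are hypothetical.

```python
import random


def hypoexponential_mean(rates):
    """Mean of a sum of independent exponential stages: sum of 1 / rate."""
    return sum(1.0 / rate for rate in rates)


def sample_hypoexponential(rates, rng):
    """Draw one sample by adding one exponential draw per stage."""
    return sum(rng.expovariate(rate) for rate in rates)


if __name__ == "__main__":
    rates = [1.0, 2.0, 4.0]          # illustrative stage rates (assumption)
    rng = random.Random(0)
    draws = [sample_hypoexponential(rates, rng) for _ in range(20000)]
    print(hypoexponential_mean(rates), sum(draws) / len(draws))
```

The empirical mean of the draws should be close to the analytical value; comparing such simulated samples with measured latencies is one simple way to sanity-check a fitted model.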

Diffusion Policies for Out-of-Distribution Generalization in Offline Reinforcement Learning. (arXiv:2307.04726v2 [cs.LG] UPDATED)

Authors: Suzan Ece Ada, Erhan Oztop, Emre Ugur

Offline Reinforcement Learning (RL) methods leverage previous experiences to learn better policies than the behavior policy used for data collection. In contrast to behavior cloning, which assumes the data is collected from expert demonstrations, offline RL can work with non-expert data and multimodal behavior policies. However, offline RL algorithms face challenges in handling distribution shifts and effectively representing policies due to the lack of online interaction during training. Prior work on offline RL uses conditional diffusion models to represent multimodal behavior in the dataset. Nevertheless, these methods are not tailored toward alleviating the out-of-distribution state generalization. We introduce a novel method, named State Reconstruction for Diffusion Policies (SRDP), incorporating state reconstruction feature learning in the recent class of diffusion policies to address the out-of-distribution generalization problem. State reconstruction loss promotes more descriptive representation learning of states to alleviate the distribution shift incurred by the out-of-distribution (OOD) states. We design a novel 2D Multimodal Contextual Bandit environment to illustrate the OOD generalization of SRDP compared to prior algorithms. In addition, we assess the performance of our model on D4RL continuous control benchmarks, namely the navigation of an 8-DoF ant and forward locomotion of half-cheetah, hopper, and walker2d, achieving state-of-the-art results.

DNAGPT: A Generalized Pre-trained Tool for Versatile DNA Sequence Analysis Tasks. (arXiv:2307.05628v3 [q-bio.GN] UPDATED)

Authors: Daoan Zhang, Weitong Zhang, Yu Zhao, Jianguo Zhang, Bing He, Chenchen Qin, Jianhua Yao

Pre-trained large language models demonstrate potential in extracting information from DNA sequences, yet adapting to a variety of tasks and data modalities remains a challenge. To address this, we propose DNAGPT, a generalized DNA pre-training model trained on over 200 billion base pairs from all mammals. By enhancing the classic GPT model with a binary classification task (DNA sequence order), a numerical regression task (guanine-cytosine content prediction), and a comprehensive token language, DNAGPT can handle versatile DNA analysis tasks while processing both sequence and numerical data. Our evaluation of genomic signal and region recognition, mRNA abundance regression, and artificial genomes generation tasks demonstrates DNAGPT's superior performance compared to existing models designed for specific downstream tasks, benefiting from pre-training using the newly designed model structure.

Online Distributed Learning with Quantized Finite-Time Coordination. (arXiv:2307.06620v2 [cs.LG] UPDATED)

Authors: Nicola Bastianello, Apostolos I. Rikos, Karl H. Johansson

In this paper we consider online distributed learning problems. Online distributed learning refers to the process of training learning models on distributed data sources. In our setting a set of agents need to cooperatively train a learning model from streaming data. Differently from federated learning, the proposed approach does not rely on a central server but only on peer-to-peer communications among the agents. This approach is often used in scenarios where data cannot be moved to a centralized location due to privacy, security, or cost reasons. In order to overcome the absence of a central server, we propose a distributed algorithm that relies on a quantized, finite-time coordination protocol to aggregate the locally trained models. Furthermore, our algorithm allows for the use of stochastic gradients during local training. Stochastic gradients are computed using a randomly sampled subset of the local training data, which makes the proposed algorithm more efficient and scalable than traditional gradient descent. In our paper, we analyze the performance of the proposed algorithm in terms of the mean distance from the online solution. Finally, we present numerical results for a logistic regression task.

Why Does Little Robustness Help? Understanding and Improving Adversarial Transferability from Surrogate Training. (arXiv:2307.07873v5 [cs.LG] UPDATED)

Authors: Yechao Zhang, Shengshan Hu, Leo Yu Zhang, Junyu Shi, Minghui Li, Xiaogeng Liu, Wei Wan, Hai Jin

Adversarial examples (AEs) for DNNs have been shown to be transferable: AEs that successfully fool white-box surrogate models can also deceive other black-box models with different architectures. Although a bunch of empirical studies have provided guidance on generating highly transferable AEs, many of these findings lack explanations and even lead to inconsistent advice. In this paper, we take a further step towards understanding adversarial transferability, with a particular focus on surrogate aspects. Starting from the intriguing little robustness phenomenon, where models adversarially trained with mildly perturbed adversarial samples can serve as better surrogates, we attribute it to a trade-off between two predominant factors: model smoothness and gradient similarity. Our investigations focus on their joint effects, rather than their separate correlations with transferability. Through a series of theoretical and empirical analyses, we conjecture that the data distribution shift in adversarial training explains the degradation of gradient similarity. Building on these insights, we explore the impacts of data augmentation and gradient regularization on transferability and identify that the trade-off generally exists in the various training mechanisms, thus building a comprehensive blueprint for the regulation mechanism behind transferability. Finally, we provide a general route for constructing better surrogates to boost transferability which optimizes both model smoothness and gradient similarity simultaneously, e.g., the combination of input gradient regularization and sharpness-aware minimization (SAM), validated by extensive experiments. In summary, we call for attention to the united impacts of these two factors for launching effective transfer attacks, rather than optimizing one while ignoring the other, and emphasize the crucial role of manipulating surrogate models.

Multi-Modal Discussion Transformer: Integrating Text, Images and Graph Transformers to Detect Hate Speech on Social Media. (arXiv:2307.09312v2 [cs.CL] UPDATED)

Authors: Liam Hebert, Gaurav Sahu, Yuxuan Guo, Nanda Kishore Sreenivas, Lukasz Golab, Robin Cohen

We present the Multi-Modal Discussion Transformer (mDT), a novel multi-modal graph-based transformer model for detecting hate speech in online social networks, such as Reddit discussions. In contrast to traditional comment-only methods, our approach to labelling a comment as hate speech involves a holistic analysis of text and images grounded in the discussion context. This is done by leveraging graph transformers to capture the contextual relationships in the entire discussion surrounding a comment and grounding the interwoven fusion layers that combine individual comments' text and image embeddings instead of processing modalities separately. We compare the performance of our model to baselines that only process individual comments and conduct extensive ablation studies. To evaluate our work, we present a new dataset, HatefulDiscussions, comprising complete multi-modal discussions from multiple online communities on Reddit. We conclude with future work for multimodal solutions to deliver social value in online contexts, arguing that capturing a holistic view of a conversation significantly advances the effort to detect anti-social behaviour.

Speeding up Fourier Neural Operators via Mixed Precision. (arXiv:2307.15034v2 [cs.LG] UPDATED)

Authors: Colin White, Renbo Tu, Jean Kossaifi, Gennady Pekhimenko, Kamyar Azizzadenesheli, Anima Anandkumar

The Fourier neural operator (FNO) is a powerful technique for learning surrogate maps for partial differential equation (PDE) solution operators. For many real-world applications, which often require high-resolution data points, training time and memory usage are significant bottlenecks. While there are mixed-precision training techniques for standard neural networks, those work for real-valued datatypes on finite dimensions and therefore cannot be directly applied to FNO, which crucially operates in the (complex-valued) Fourier domain and in function spaces. On the other hand, since the Fourier transform is already an approximation (due to discretization error), we do not need to perform the operation at full precision. In this work, we (i) profile memory and runtime for FNO with full and mixed-precision training, (ii) conduct a study on the numerical stability of mixed-precision training of FNO, and (iii) devise a training routine which substantially decreases training time and memory usage (up to 34%), with little or no reduction in accuracy, on the Navier-Stokes and Darcy flow equations. Combined with the recently proposed tensorized FNO (Kossaifi et al., 2023), the resulting model has far better performance while also being significantly faster than the original FNO.

Symmetry-Preserving Program Representations for Learning Code Semantics. (arXiv:2308.03312v5 [cs.LG] UPDATED)

Authors: Kexin Pei, Weichen Li, Qirui Jin, Shuyang Liu, Scott Geng, Lorenzo Cavallaro, Junfeng Yang, Suman Jana

Large Language Models (LLMs) have shown promise in automated program reasoning, a crucial aspect of many security tasks. However, existing LLM architectures for code are often borrowed from other domains like natural language processing, raising concerns about their generalization and robustness to unseen code. A key generalization challenge is to incorporate the knowledge of code semantics, including control and data flow, into the LLM architectures.

Drawing inspiration from examples of convolution layers exploiting translation symmetry, we explore how code symmetries can enhance LLM architectures for program analysis and modeling. We present a rigorous group-theoretic framework that formally defines code symmetries as semantics-preserving transformations and provides techniques for precisely reasoning about symmetry preservation within LLM architectures. Using this framework, we introduce a novel variant of self-attention that preserves program symmetries, demonstrating its effectiveness in generalization and robustness through detailed experimental evaluations across different binary and source code analysis tasks. Overall, our code symmetry framework offers rigorous and powerful reasoning techniques that can guide the future development of specialized LLMs for code and advance LLM-guided program reasoning tasks.

Adaptive Uncertainty-Guided Model Selection for Data-Driven PDE Discovery. (arXiv:2308.10283v2 [cs.LG] UPDATED)

Authors: Pongpisit Thanasutives, Takashi Morita, Masayuki Numao, Ken-ichi Fukui

We propose a new parameter-adaptive uncertainty-penalized Bayesian information criterion (UBIC) to prioritize the parsimonious partial differential equation (PDE) that sufficiently governs noisy spatial-temporal observed data with few reliable terms. Since the naive use of the BIC for model selection has been known to yield an undesirable overfitted PDE, the UBIC penalizes the found PDE not only by its complexity but also the quantified uncertainty, derived from the model supports' coefficient of variation in a probabilistic view. We also introduce physics-informed neural network learning as a simulation-based approach to further validate the selected PDE flexibly against the other discovered PDE. Numerical results affirm the successful application of the UBIC in identifying the true governing PDE. Additionally, we reveal an interesting effect of denoising the observed data on improving the trade-off between the BIC score and model complexity. Code is available at https://github.com/Pongpisit-Thanasutives/UBIC.
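
As background for the criterion discussed above, the following sketch computes the plain BIC for a least-squares fit under a Gaussian noise assumption, $n \ln(\mathrm{RSS}/n) + k \ln n$ up to an additive constant. It does not reproduce the uncertainty penalty of the UBIC; the function name and the example numbers are hypothetical.

```python
import math


def bic_gaussian(rss, n_samples, n_terms):
    """Plain BIC (up to a constant) for a least-squares model.

    rss       -- residual sum of squares of the fitted model
    n_samples -- number of observations
    n_terms   -- number of candidate terms kept in the model
    """
    if rss <= 0 or n_samples <= 0:
        raise ValueError("rss and n_samples must be positive")
    return n_samples * math.log(rss / n_samples) + n_terms * math.log(n_samples)


if __name__ == "__main__":
    # Hypothetical candidates: a 2-term model and a 5-term model with a
    # marginally smaller residual. Lower BIC is preferred.
    print(bic_gaussian(10.0, 100, 2), bic_gaussian(9.9, 100, 5))
```

In this made-up case the small drop in residual does not offset the complexity penalty, so the 2-term model has the lower score.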

RBA-GCN: Relational Bilevel Aggregation Graph Convolutional Network for Emotion Recognition. (arXiv:2308.11029v2 [cs.AI] UPDATED)

Authors: Lin Yuan, Guoheng Huang, Fenghuan Li, Xiaochen Yuan, Chi-Man Pun, Guo Zhong

Emotion recognition in conversation (ERC) has received increasing attention from researchers due to its wide range of applications. As conversation has a natural graph structure, numerous approaches used to model ERC based on graph convolutional networks (GCNs) have yielded significant results. However, the aggregation approach of traditional GCNs suffers from the node information redundancy problem, leading to node discriminant information loss. Additionally, single-layer GCNs lack the capacity to capture long-range contextual information from the graph. Furthermore, the majority of approaches are based on textual modality or stitching together different modalities, resulting in a weak ability to capture interactions between modalities. To address these problems, we present the relational bilevel aggregation graph convolutional network (RBA-GCN), which consists of three modules: the graph generation module (GGM), similarity-based cluster building module (SCBM) and bilevel aggregation module (BiAM). First, GGM constructs a novel graph to reduce the redundancy of target node information. Then, SCBM calculates the node similarity in the target node and its structural neighborhood, where noisy information with low similarity is filtered out to preserve the discriminant information of the node. Meanwhile, BiAM is a novel aggregation method that can preserve the information of nodes during the aggregation process. This module can construct the interaction between different modalities and capture long-range contextual information based on similarity clusters. On both the IEMOCAP and MELD datasets, the weighted average F1 score of RBA-GCN has a 2.17$\sim$5.21\% improvement over that of the most advanced method. Our code is available at https://github.com/luftmenscher/RBA-GCN and our article with the same name has been published in IEEE/ACM Transactions on Audio, Speech, and Language Processing, vol. 31, 2023.

xxMD: Benchmarking Neural Force Fields Using Extended Dynamics beyond Equilibrium. (arXiv:2308.11155v2 [cs.LG] UPDATED)

Authors: Zihan Pengmei, Yinan Shu, Junyu Liu

Neural force fields (NFFs) have gained prominence in computational chemistry as surrogate models, superseding quantum-chemistry calculations in ab initio molecular dynamics. The prevalent benchmark for NFFs has been the MD17 dataset and its subsequent extension. These datasets predominantly comprise geometries from the equilibrium region of the ground electronic state potential energy surface, sampling from direct adiabatic dynamics. However, many chemical reactions entail significant molecular deformations, notably bond breaking. We demonstrate the constrained distribution of internal coordinates and energies in the MD17 datasets, underscoring their inadequacy for representing systems undergoing chemical reactions. Addressing this sampling limitation, we introduce the xxMD (Extended Excited-state Molecular Dynamics) dataset, derived from non-adiabatic dynamics. This dataset encompasses energies and forces ascertained from both multireference wave function theory and density functional theory. Furthermore, its nuclear configuration spaces authentically depict chemical reactions, making xxMD a more chemically relevant dataset. Our re-assessment of equivariant models on the xxMD datasets reveals notably higher mean absolute errors than those reported for MD17 and its variants. This observation underscores the challenges faced in crafting a generalizable NFF model with extrapolation capability. Our proposed xxMD-CASSCF and xxMD-DFT datasets are available at https://github.com/zpengmei/xxMD.

Quantization-based Optimization with Perspective of Quantum Mechanics. (arXiv:2308.11594v2 [quant-ph] UPDATED)

Authors: Jinwuk Seok, Changsik Cho

Statistical and stochastic analysis based on thermodynamics has been the main analysis framework for stochastic global optimization. With the recent emergence of quantum annealing and quantum tunneling algorithms for global optimization, a new research framework for global optimization algorithms is required. In this paper, we provide an analysis of quantization-based optimization based on the Schrödinger equation to reveal which property of quantum mechanics enables global optimization. We show that the tunneling effect derived from the Schrödinger equation in quantization-based optimization enables escape from a local minimum. Additionally, we confirm that this tunneling effect is the same property found in quantum mechanics-based global optimization. Experiments with standard multi-modal benchmark functions indicate that the proposed analysis is valid.

Stochastic Configuration Machines for Industrial Artificial Intelligence. (arXiv:2308.13570v2 [cs.LG] UPDATED)

Authors: Dianhui Wang, Matthew J. Felicetti

Real-time predictive modelling with desired accuracy is highly desirable in industrial artificial intelligence (IAI), where neural networks play a key role. Neural networks in IAI require powerful, high-performance computing devices to operate on large volumes of floating-point data. Based on stochastic configuration networks (SCNs), this paper proposes a new randomized learner model, termed stochastic configuration machines (SCMs), which emphasises effective modelling and data size savings that are useful and valuable for industrial applications. Compared to SCNs and random vector functional-link (RVFL) nets with binarized implementation, the model storage of SCMs can be significantly compressed while retaining favourable prediction performance. Besides the architecture of the SCM learner model and its learning algorithm, as an important part of this contribution, we also provide a theoretical basis on the learning capacity of SCMs by analysing the model's complexity. Experimental studies are carried out over some benchmark datasets and three industrial applications. The results demonstrate that SCM has great potential for dealing with industrial data analytics.

Hypergraph Structure Inference From Data Under Smoothness Prior. (arXiv:2308.14172v2 [cs.LG] UPDATED)

Authors: Bohan Tang, Siheng Chen, Xiaowen Dong

Hypergraphs are important for processing data with higher-order relationships involving more than two entities. In scenarios where explicit hypergraphs are not readily available, it is desirable to infer a meaningful hypergraph structure from the node features to capture the intrinsic relations within the data. However, existing methods either adopt simple pre-defined rules that fail to precisely capture the distribution of the potential hypergraph structure, or learn a mapping between hypergraph structures and node features but require a large amount of labelled data, i.e., pre-existing hypergraph structures, for training. Both restrict their applications in practical scenarios. To fill this gap, we propose a novel smoothness prior that enables us to design a method to infer the probability for each potential hyperedge without labelled data as supervision. The proposed prior indicates that the features of nodes in a hyperedge are highly correlated through the features of the hyperedge containing them. We use this prior to derive the relation between the hypergraph structure and the node features via probabilistic modelling. This allows us to develop an unsupervised inference method to estimate the probability for each potential hyperedge by solving an optimisation problem that has an analytical solution. Experiments on both synthetic and real-world data demonstrate that our method can learn meaningful hypergraph structures from data more efficiently than existing hypergraph structure inference methods.

Biclustering Methods via Sparse Penalty. (arXiv:2308.14388v2 [stat.ML] UPDATED)

Authors: Jiqiang Wang

In this paper, we first review several biclustering methods used to identify the most significant clusters in gene expression data. We focus mainly on the sparse SVD (SSVD) method and try a new sparse penalty, the "Prenet penalty", which has previously been used only in factor analysis to induce sparsity. In the simulation study, we try different types of generated datasets (with different sparsity and dimension), first with a 1-layer approximation and then with k layers; the results show that the mixed Prenet penalty is very effective for non-overlapped data. Finally, we use real gene expression data to show the behavior of our methods.
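
To make the 1-layer (rank-1) SSVD idea concrete, the following is a minimal sketch, assuming the familiar alternating scheme in which the left and right singular vectors are soft-thresholded in turn. It uses a plain L1 (lasso) penalty rather than the Prenet penalty studied in the paper, and the function names, the toy data generator, and the threshold values are all hypothetical choices made for illustration.

```python
import math
import random


def soft_threshold(values, lam):
    """Shrink each entry toward zero by lam; entries within [-lam, lam] become 0."""
    return [math.copysign(max(abs(x) - lam, 0.0), x) for x in values]


def normalize(values):
    """Scale to unit Euclidean norm (all-zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in values))
    return [x / norm for x in values] if norm > 0 else list(values)


def ssvd_rank1(X, lam_u, lam_v, n_iter=50):
    """Rank-1 sparse SVD by alternating soft-thresholded updates (L1 penalty)."""
    n_rows, n_cols = len(X), len(X[0])
    v = normalize([1.0] * n_cols)  # simple dense start; an SVD start is also common
    u = [0.0] * n_rows
    for _ in range(n_iter):
        # u-update: threshold X v, then rescale to unit norm
        Xv = [sum(X[i][j] * v[j] for j in range(n_cols)) for i in range(n_rows)]
        u = normalize(soft_threshold(Xv, lam_u))
        # v-update: threshold X^T u, then rescale to unit norm
        Xtu = [sum(X[i][j] * u[i] for i in range(n_rows)) for j in range(n_cols)]
        v = normalize(soft_threshold(Xtu, lam_v))
    return u, v


def planted_bicluster(n_rows=20, n_cols=15, rows=range(5), cols=range(4), seed=0):
    """Toy matrix: small Gaussian noise plus one constant block (the bicluster)."""
    rng = random.Random(seed)
    X = [[rng.gauss(0.0, 0.1) for _ in range(n_cols)] for _ in range(n_rows)]
    for i in rows:
        for j in cols:
            X[i][j] += 5.0
    return X


X = planted_bicluster()
u, v = ssvd_rank1(X, lam_u=1.0, lam_v=1.0)
# Nonzero entries of u and v mark the rows and columns of the recovered bicluster.
bicluster_rows = [i for i, x in enumerate(u) if x != 0.0]
bicluster_cols = [j for j, x in enumerate(v) if x != 0.0]
print(bicluster_rows, bicluster_cols)
```

Further layers would typically be obtained by subtracting the fitted rank-1 term from the matrix and repeating the procedure on the residual; that step is omitted here.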

Multi-Response Heteroscedastic Gaussian Process Models and Their Inference. (arXiv:2308.15370v2 [stat.ML] UPDATED)

Authors: Taehee Lee, Jun S. Liu

Despite the widespread utilization of Gaussian process models for versatile nonparametric modeling, they exhibit limitations in effectively capturing abrupt changes in function smoothness and accommodating relationships with heteroscedastic errors. Addressing these shortcomings, the heteroscedastic Gaussian process (HeGP) regression seeks to introduce flexibility by acknowledging the variability of residual variances across covariates in the regression model. In this work, we extend the HeGP concept, expanding its scope beyond regression tasks to encompass classification and state-space models. To achieve this, we propose a novel framework where the Gaussian process is coupled with a covariate-induced precision matrix process, adopting a mixture formulation. This approach enables the modeling of heteroscedastic covariance functions across covariates. To mitigate the computational challenges posed by sampling, we employ variational inference to approximate the posterior and facilitate posterior predictive modeling. Additionally, our training process leverages an EM algorithm featuring closed-form M-step updates to efficiently evaluate the heteroscedastic covariance function. A notable feature of our model is its consistent performance on multivariate responses, accommodating various types (continuous or categorical) seamlessly. Through a combination of simulations and real-world applications in climatology, we illustrate the model's prowess and advantages. By overcoming the limitations of traditional Gaussian process models, our proposed framework offers a robust and versatile tool for a wide array of applications.
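
For orientation, the textbook heteroscedastic GP regression setting that this work starts from can be written as below. This is a sketch in notation assumed here (the kernel k, the noise variance function \sigma^2(\cdot), and the diagonal matrix D are not taken from the abstract), and it treats the noise variances as known; it is not the paper's mixture formulation with a covariate-induced precision matrix process.

```latex
y_i = f(x_i) + \varepsilon_i, \qquad
f \sim \mathcal{GP}(0, k), \qquad
\varepsilon_i \sim \mathcal{N}\bigl(0, \sigma^2(x_i)\bigr)

\mathbb{E}\bigl[f(x_*) \mid \mathbf{y}\bigr]
  = \mathbf{k}_*^\top (K + D)^{-1} \mathbf{y}, \qquad
D = \operatorname{diag}\bigl(\sigma^2(x_1), \dots, \sigma^2(x_n)\bigr)
```

Here K is the kernel matrix over the training inputs and \mathbf{k}_* is the vector of kernel values between the test input x_* and the training inputs. The homoscedastic case is recovered when D is a constant multiple of the identity.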

CongNaMul: A Dataset for Advanced Image Processing of Soybean Sprouts. (arXiv:2308.15690v2 [cs.CV] UPDATED)

Authors: Byunghyun Ban, Donghun Ryu, Su-won Hwang

We present 'CongNaMul', a comprehensive dataset designed for various tasks in soybean sprout image analysis. The CongNaMul dataset is curated to facilitate tasks such as image classification, semantic segmentation, decomposition, and measurement of length and weight. The classification task provides four classes to determine the quality of soybean sprouts: normal, broken, spotted, and broken and spotted, for the development of AI-aided automatic quality inspection technology. For semantic segmentation, images of varying complexity, from single-sprout images to images with multiple sprouts, are included along with human-labelled mask images. The labels cover four classes: background, head, body, and tail. The dataset also provides images and masks for the image decomposition task, including two separate sprout images and their combined form. Lastly, five physical features of sprouts (head length, body length, body thickness, tail length, weight) are provided for image-based measurement tasks. This dataset is expected to be a valuable resource for a wide range of research and applications in the advanced analysis of images of soybean sprouts. We also hope that this dataset can assist researchers studying classification, semantic segmentation, decomposition, and physical feature measurement in other industrial fields in evaluating their models. The dataset is available at the authors' repository (https://bhban.kr/data).

MedShapeNet -- A Large-Scale Dataset of 3D Medical Shapes for Computer Vision. (arXiv:2308.16139v2 [cs.CV] UPDATED)

Authors: Jianning Li, Antonio Pepe, Christina Gsaxner, Gijs Luijten, Yuan Jin, Narmada Ambigapathy, Enrico Nasca, Naida Solak, Gian Marco Melito, Afaque R. Memon, Xiaojun Chen, Jan Stefan Kirschke, Ezequiel de la Rosa, Patrich Ferndinand Christ, Hongwei Bran Li, David G. Ellis, Michele R. Aizenberg, Sergios Gatidis, Thomas Kuestner, Nadya Shusharina, Nicholas Heller, Vincent Andrearczyk, Adrien Depeursinge, Mathieu Hatt, Anjany Sekuboyina, Maximilian Loeffler, Hans Liebl, Reuben Dorent, Tom Vercauteren, Jonathan Shapey, Aaron Kujawa, Stefan Cornelissen, Patrick Langenhuizen, Achraf Ben-Hamadou, Ahmed Rekik, Sergi Pujades, Edmond Boyer, Federico Bolelli, Costantino Grana, Luca Lumetti, Hamidreza Salehi, Jun Ma, Yao Zhang, Ramtin Gharleghi, Susann Beier, Arcot Sowmya, Eduardo A. Garza-Villarreal, Thania Balducci, et al. (68 additional authors not shown)

We present MedShapeNet, a large collection of anatomical shapes (e.g., bones, organs, vessels) and 3D surgical instrument models. Prior to the deep learning era, the broad application of statistical shape models (SSMs) in medical image analysis was evidence that shapes were commonly used to describe medical data. Nowadays, however, state-of-the-art (SOTA) deep learning algorithms in medical imaging are predominantly voxel-based. In computer vision, by contrast, shapes (including voxel occupancy grids, meshes, point clouds and implicit surface models) are the preferred data representations in 3D, as seen from the numerous shape-related publications in premier vision conferences, such as the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), as well as the increasing popularity of ShapeNet (about 51,300 models) and Princeton ModelNet (127,915 models) in computer vision research. MedShapeNet is created as an alternative to these commonly used shape benchmarks to facilitate the translation of data-driven vision algorithms to medical applications, and it extends the opportunities to adapt SOTA vision algorithms to solve critical medical problems. Besides, the majority of the medical shapes in MedShapeNet are modeled directly on the imaging data of real patients, and it therefore complements existing shape benchmarks, which consist of computer-aided design (CAD) models. MedShapeNet currently includes more than 100,000 medical shapes and provides annotations in the form of paired data. It is therefore also a freely available repository of 3D models for extended reality (virtual reality - VR, augmented reality - AR, mixed reality - MR) and medical 3D printing. This white paper describes in detail the motivations behind MedShapeNet, the shape acquisition procedures, the use cases, and the usage of the online shape search portal: https://medshapenet.ikim.nrw/