Authors: X. Xiao, P. Chen, X. Cao, K. Liu, L. Deng, D. Zhao, Z. Chen, Q. Deng, F. Yu, H. Zhang
Abstract: Epidemic outbreaks can cause critical health concerns and severe global economic crises. For countries or regions with new infectious disease outbreaks, it is essential to generate preventive strategies by learning lessons from others with similar risk profiles. A Strategy Transfer and Decision Support Approach (STDSA) is proposed based on the profile similarity evaluation. There are four steps in this method: (1) The similarity evaluation indicators are determined from three dimensions, i.e., the Basis of National Epidemic Prevention & Control, Social Resilience, and Infection Situation. (2) The data related to the indicators are collected and preprocessed. (3) The first round of screening on the preprocessed dataset is conducted through an improved collaborative filtering algorithm to calculate the preliminary similarity result from the perspective of the infection situation. (4) Finally, the K-Means model is used for the second round of screening to obtain the final similarity values. The approach will be applied to decision-making support in the context of COVID-19. Our results demonstrate that the recommendations generated by the STDSA model are more accurate and aligned better with the actual situation than those produced by pure K-means models. This study will provide new insights into preventing and controlling epidemics in regions that lack experience.
Authors: Nuzhat Prova
Abstract: Recent advances in natural language processing (NLP) may enable artificial intelligence (AI) models to generate writing that is identical to human written form in the future. This might have profound ethical, legal, and social repercussions. This study aims to address this problem by offering an accurate AI detector model that can differentiate between electronically produced text and human-written text. Our approach includes machine learning methods such as XGB Classifier, SVM, BERT architecture deep learning models. Furthermore, our results show that the BERT performs better than previous models in identifying information generated by AI from information provided by humans. Provide a comprehensive analysis of the current state of AI-generated text identification in our assessment of pertinent studies. Our testing yielded positive findings, showing that our strategy is successful, with the BERT emerging as the most probable answer. We analyze the research's societal implications, highlighting the possible advantages for various industries while addressing sustainability issues pertaining to morality and the environment. The XGB classifier and SVM give 0.84 and 0.81 accuracy in this article, respectively. The greatest accuracy in this research is provided by the BERT model, which provides 0.93% accuracy.
Authors: Micha{\l} Koziarski, Mohammed Abukalam, Vedant Shah, Louis Vaillancourt, Doris Alexandra Schuetz, Moksh Jain, Almer van der Sloot, Mathieu Bourgey, Anne Marinier, Yoshua Bengio
Abstract: DNA-encoded libraries (DELs) are a powerful approach for rapidly screening large numbers of diverse compounds. One of the key challenges in using DELs is library design, which involves choosing the building blocks that will be combinatorially combined to produce the final library. In this paper we consider the task of protein-protein interaction (PPI) biased DEL design. To this end, we evaluate several machine learning algorithms on the PPI modulation task and use them as a reward for the proposed GFlowNet-based generative approach. We additionally investigate the possibility of using structural information about building blocks to design a hierarchical action space for the GFlowNet. The observed results indicate that GFlowNets are a promising approach for generating diverse combinatorial library candidates.
Authors: Chong Yu, Shuaiqi Shen, Shiqiang Wang, Kuan Zhang, Hai Zhao
Abstract: E-health allows smart devices and medical institutions to collaboratively collect patients' data, which is trained by Artificial Intelligence (AI) technologies to help doctors make diagnosis. By allowing multiple devices to train models collaboratively, federated learning is a promising solution to address the communication and privacy issues in e-health. However, applying federated learning in e-health faces many challenges. First, medical data is both horizontally and vertically partitioned. Since single Horizontal Federated Learning (HFL) or Vertical Federated Learning (VFL) techniques cannot deal with both types of data partitioning, directly applying them may consume excessive communication cost due to transmitting a part of raw data when requiring high modeling accuracy. Second, a naive combination of HFL and VFL has limitations including low training efficiency, unsound convergence analysis, and lack of parameter tuning strategies. In this paper, we provide a thorough study on an effective integration of HFL and VFL, to achieve communication efficiency and overcome the above limitations when data is both horizontally and vertically partitioned. Specifically, we propose a hybrid federated learning framework with one intermediate result exchange and two aggregation phases. Based on this framework, we develop a Hybrid Stochastic Gradient Descent (HSGD) algorithm to train models. Then, we theoretically analyze the convergence upper bound of the proposed algorithm. Using the convergence results, we design adaptive strategies to adjust the training parameters and shrink the size of transmitted data. Experimental results validate that the proposed HSGD algorithm can achieve the desired accuracy while reducing communication cost, and they also verify the effectiveness of the adaptive strategies.
Authors: Fanny Lehmann, Filippo Gatti, Didier Clouteau
Abstract: Numerical simulations are essential tools to evaluate the solution of the wave equation in complex settings, such as three-dimensional (3D) domains with heterogeneous properties. However, their application is limited by high computational costs and existing surrogate models lack the flexibility of numerical solvers. This work introduces the Multiple-Input Fourier Neural Operator (MIFNO) to deal with structured 3D fields representing material properties as well as vectors describing the source characteristics. The MIFNO is applied to the problem of elastic wave propagation in the Earth's crust. It is trained on the HEMEW^S-3D database containing 30000 earthquake simulations in different heterogeneous domains with random source positions and orientations. Outputs are time- and space-dependent surface wavefields. The MIFNO predictions are assessed as good to excellent based on Goodness-Of-Fit (GOF) criteria. Wave arrival times and wave fronts' propagation are very accurate since 80% of the predictions have an excellent phase GOF. The fluctuations amplitudes are good for 87% of the predictions. The envelope score is hindered by the small-scale fluctuations that are challenging to capture due to the complex physical phenomena associated with high-frequency features. Nevertheless, the MIFNO can generalize to sources located outside the training domain and it shows good generalization ability to a real complex overthrust geology. When focusing on a region of interest, transfer learning improves the accuracy with limited additional costs, since GOF scores improved by more than 1 GOF unit with only 500 additional specific samples. The MIFNO is the first surrogate model offering the flexibility of an earthquake simulator with varying sources and material properties. Its good accuracy and massive speed-up offer new perspectives to replace numerical simulations in many-query problems.
Authors: Hanjing Wang, Qiang Ji
Abstract: Epistemic uncertainty quantification (UQ) identifies where models lack knowledge. Traditional UQ methods, often based on Bayesian neural networks, are not suitable for pre-trained non-Bayesian models. Our study addresses quantifying epistemic uncertainty for any pre-trained model, which does not need the original training data or model modifications and can ensure broad applicability regardless of network architectures or training techniques. Specifically, we propose a gradient-based approach to assess epistemic uncertainty, analyzing the gradients of outputs relative to model parameters, and thereby indicating necessary model adjustments to accurately represent the inputs. We first explore theoretical guarantees of gradient-based methods for epistemic UQ, questioning the view that this uncertainty is only calculable through differences between multiple models. We further improve gradient-driven UQ by using class-specific weights for integrating gradients and emphasizing distinct contributions from neural network layers. Additionally, we enhance UQ accuracy by combining gradient and perturbation methods to refine the gradients. We evaluate our approach on out-of-distribution detection, uncertainty calibration, and active learning, demonstrating its superiority over current state-of-the-art UQ methods for pre-trained models.
Authors: Yihan Wang
Abstract: Obtaining reliable precipitation estimation with high resolutions in time and space is of great importance to hydrological studies. However, accurately estimating precipitation is a challenging task over high mountainous complex terrain. The three widely used precipitation measurement approaches, namely rainfall gauge, precipitation radars, and satellite-based precipitation sensors, have their own pros and cons in producing reliable precipitation products over complex areas. One way to decrease the detection error probability and improve data reliability is precipitation data merging. With the rapid advancements in computational capabilities and the escalating volume and diversity of earth observational data, Deep Learning (DL) models have gained considerable attention in geoscience. In this study, a deep learning technique, namely Long Short-term Memory (LSTM), was employed to merge a radar-based and a satellite-based Global Precipitation Measurement (GPM) precipitation product Integrated Multi-Satellite Retrievals for GPM (IMERG) precipitation product at hourly scale. The merged results are compared with the widely used reanalysis precipitation product, Multi-Radar Multi-Sensor (MRMS), and assessed against gauge observational data from the California Data Exchange Center (CDEC). The findings indicated that the LSTM-based merged precipitation notably underestimated gauge observations and, at times, failed to provide meaningful estimates, showing predominantly near-zero values. Relying solely on individual Quantitative Precipitation Estimates (QPEs) without additional meteorological input proved insufficient for generating reliable merged QPE. However, the merged results effectively captured the temporal trends of the observations, outperforming MRMS in this aspect. This suggested that incorporating bias correction techniques could potentially enhance the accuracy of the merged product.
Authors: Khawir Mahmood, Jehandad Khan, Hammad Afzal
Abstract: GPU kernels have come to the forefront of computing due to their utility in varied fields, from high-performance computing to machine learning. A typical GPU compute kernel is invoked millions, if not billions of times in a typical application, which makes their performance highly critical. Due to the unknown nature of the optimization surface, an exhaustive search is required to discover the global optimum, which is infeasible due to the possible exponential number of parameter combinations. In this work, we propose a methodology that uses deep sequence-to-sequence models to predict the optimal tuning parameters governing compute kernels. This work considers the prediction of kernel parameters as a sequence to the sequence translation problem, borrowing models from the Natural Language Processing (NLP) domain. Parameters describing the input, output and weight tensors are considered as the input language to the model that emits the corresponding kernel parameters. In essence, the model translates the problem parameter language to kernel parameter language. The core contributions of this work are: a) Proposing that a sequence to sequence model can accurately learn the performance dynamics of a GPU compute kernel b) A novel network architecture which predicts the kernel tuning parameters for GPU kernels, c) A constrained beam search which incorporates the physical limits of the GPU hardware as well as other expert knowledge reducing the search space. The proposed algorithm can achieve more than 90% accuracy on various convolutional kernels in MIOpen, the AMD machine learning primitives library. As a result, the proposed technique can reduce the development time and compute resources required to tune unseen input configurations, resulting in shorter development cycles, reduced development costs, and better user experience.
Authors: Nian Ran, Bahrul Ilmi Nasution, Claire Little, Richard Allmendinger, Mark Elliot
Abstract: Synthetic data has a key role to play in data sharing by statistical agencies and other generators of statistical data products. Generative Adversarial Networks (GANs), typically applied to image synthesis, are also a promising method for tabular data synthesis. However, there are unique challenges in tabular data compared to images, eg tabular data may contain both continuous and discrete variables and conditional sampling, and, critically, the data should possess high utility and low disclosure risk (the risk of re-identifying a population unit or learning something new about them), providing an opportunity for multi-objective (MO) optimization. Inspired by MO GANs for images, this paper proposes a smart MO evolutionary conditional tabular GAN (SMOE-CTGAN). This approach models conditional synthetic data by applying conditional vectors in training, and uses concepts from MO optimisation to balance disclosure risk against utility. Our results indicate that SMOE-CTGAN is able to discover synthetic datasets with different risk and utility levels for multiple national census datasets. We also find a sweet spot in the early stage of training where a competitive utility and extremely low risk are achieved, by using an Improvement Score. The full code can be downloaded from https://github.com/HuskyNian/SMO\_EGAN\_pytorch.
Authors: Zhun Zhang, Yi Zeng, Qihe Liu, Shijie Zhou
Abstract: Enhancing our understanding of adversarial examples is crucial for the secure application of machine learning models in real-world scenarios. A prevalent method for analyzing adversarial examples is through a frequency-based approach. However, existing research indicates that attacks designed to exploit low-frequency or high-frequency information can enhance attack performance, leading to an unclear relationship between adversarial perturbations and different frequency components. In this paper, we seek to demystify this relationship by exploring the characteristics of adversarial perturbations within the frequency domain. We employ wavelet packet decomposition for detailed frequency analysis of adversarial examples and conduct statistical examinations across various frequency bands. Intriguingly, our findings indicate that significant adversarial perturbations are present within the high-frequency components of low-frequency bands. Drawing on this insight, we propose a black-box adversarial attack algorithm based on combining different frequency bands. Experiments conducted on multiple datasets and models demonstrate that combining low-frequency bands and high-frequency components of low-frequency bands can significantly enhance attack efficiency. The average attack success rate reaches 99\%, surpassing attacks that utilize a single frequency segment. Additionally, we introduce the normalized disturbance visibility index as a solution to the limitations of $L_2$ norm in assessing continuous and discrete perturbations.
Authors: Ziyou Gong, Xianwen Fang, Ping Wu
Abstract: Event log records all events that occur during the execution of business processes, so detecting and correcting anomalies in event log can provide reliable guarantee for subsequent process analysis. The previous works mainly include next event prediction based methods and autoencoder-based methods. These methods cannot accurately and efficiently detect anomalies and correct anomalies at the same time, and they all rely on the set threshold to detect anomalies. To solve these problems, we propose a business process anomaly correction method based on Transformer autoencoder. By using self-attention mechanism and autoencoder structure, it can efficiently process event sequences of arbitrary length, and can directly output corrected business process instances, so that it can adapt to various scenarios. At the same time, the anomaly detection is transformed into a classification problem by means of selfsupervised learning, so that there is no need to set a specific threshold in anomaly detection. The experimental results on several real-life event logs show that the proposed method is superior to the previous methods in terms of anomaly detection accuracy and anomaly correction results while ensuring high running efficiency.
Authors: Joshua Melton, Shannon Reid, Gabriel Terejanu, Siddharth Krishnan
Abstract: The high volume and rapid evolution of content on social media present major challenges for studying the stance of social media users. In this work, we develop a two stage stance labeling method that utilizes the user-hashtag bipartite graph and the user-user interaction graph. In the first stage, a simple and efficient heuristic for stance labeling uses the user-hashtag bipartite graph to iteratively update the stance association of user and hashtag nodes via a label propagation mechanism. This set of soft labels is then integrated with the user-user interaction graph to train a graph neural network (GNN) model using semi-supervised learning. We evaluate this method on two large-scale datasets containing tweets related to climate change from June 2021 to June 2022 and gun control from January 2022 to January 2023. Experiments demonstrate that our user-hashtag heuristic and the semi-supervised GNN method outperform zero-shot stance labeling using LLMs such as GPT4. Further analysis illustrates how the stance labeling information and interaction graph can be used for evaluating the polarization of social media interactions on divisive issues such as climate change and gun control.
Authors: Zhiyuan Wu, Sheng Sun, Yuwei Wang, Min Liu, Bo Gao, Tianliu He, Wen Wang
Abstract: On-device intelligence (ODI) enables artificial intelligence (AI) applications to run on end devices, providing real-time and customized AI services without relying on remote servers. However, training models for on-device deployment face significant challenges due to the decentralized and privacy-sensitive nature of users' data, along with end-side constraints related to network connectivity, computation efficiency, etc. Existing training paradigms, such as cloud-based training, federated learning, and transfer learning, fail to sufficiently address these practical constraints that are prevalent for devices. To overcome these challenges, we propose Privacy-Preserving Training-as-a-Service (PTaaS), a novel service computing paradigm that provides privacy-friendly, customized AI model training for end devices. PTaaS outsources the core training process to remote and powerful cloud or edge servers, efficiently developing customized on-device models based on uploaded anonymous queries, ensuring data privacy while reducing the computation load on individual devices. We explore the definition, goals, and design principles of PTaaS, alongside emerging technologies that support the PTaaS paradigm. An architectural scheme for PTaaS is also presented, followed by a series of open problems that set the stage for future research directions in the field of PTaaS.
Authors: Vincent Conitzer, Rachel Freedman, Jobst Heitzig, Wesley H. Holliday, Bob M. Jacobs, Nathan Lambert, Milan Moss\'e, Eric Pacuit, Stuart Russell, Hailey Schoelkopf, Emanuel Tewolde, William S. Zwicker
Abstract: Foundation models such as GPT-4 are fine-tuned to avoid unsafe or otherwise problematic behavior, so that, for example, they refuse to comply with requests for help with committing crimes or with producing racist text. One approach to fine-tuning, called reinforcement learning from human feedback, learns from humans' expressed preferences over multiple outputs. Another approach is constitutional AI, in which the input from humans is a list of high-level principles. But how do we deal with potentially diverging input from humans? How can we aggregate the input into consistent data about ''collective'' preferences or otherwise use it to make collective choices about model behavior? In this paper, we argue that the field of social choice is well positioned to address these questions, and we discuss ways forward for this agenda, drawing on discussions in a recent workshop on Social Choice for AI Ethics and Safety held in Berkeley, CA, USA in December 2023.
Authors: Vincent Grari, Marcin Detyniecki
Abstract: This paper presents a novel approach to optimizing profit margins in non-life insurance markets through a gradient descent-based method, targeting three key objectives: 1) maximizing profit margins, 2) ensuring conversion rates, and 3) enforcing fairness criteria such as demographic parity (DP). Traditional pricing optimization, which heavily lean on linear and semi definite programming, encounter challenges in balancing profitability and fairness. These challenges become especially pronounced in situations that necessitate continuous rate adjustments and the incorporation of fairness criteria. Specifically, indirect Ratebook optimization, a widely-used method for new business price setting, relies on predictor models such as XGBoost or GLMs/GAMs to estimate on downstream individually optimized prices. However, this strategy is prone to sequential errors and struggles to effectively manage optimizations for continuous rate scenarios. In practice, to save time actuaries frequently opt for optimization within discrete intervals (e.g., range of [-20\%, +20\%] with fix increments) leading to approximate estimations. Moreover, to circumvent infeasible solutions they often use relaxed constraints leading to suboptimal pricing strategies. The reverse-engineered nature of traditional models complicates the enforcement of fairness and can lead to biased outcomes. Our method addresses these challenges by employing a direct optimization strategy in the continuous space of rates and by embedding fairness through an adversarial predictor model. This innovation not only reduces sequential errors and simplifies the complexities found in traditional models but also directly integrates fairness measures into the commercial premium calculation. We demonstrate improved margin performance and stronger enforcement of fairness highlighting the critical need to evolve existing pricing strategies.
Authors: Kyle Hsu, Jubayer Ibn Hamid, Kaylee Burns, Chelsea Finn, Jiajun Wu
Abstract: Inductive biases are crucial in disentangled representation learning for narrowing down an underspecified solution set. In this work, we consider endowing a neural network autoencoder with three select inductive biases from the literature: data compression into a grid-like latent space via quantization, collective independence amongst latents, and minimal functional influence of any latent on how other latents determine data generation. In principle, these inductive biases are deeply complementary: they most directly specify properties of the latent space, encoder, and decoder, respectively. In practice, however, naively combining existing techniques instantiating these inductive biases fails to yield significant benefits. To address this, we propose adaptations to the three techniques that simplify the learning problem, equip key regularization terms with stabilizing invariances, and quash degenerate incentives. The resulting model, Tripod, achieves state-of-the-art results on a suite of four image disentanglement benchmarks. We also verify that Tripod significantly improves upon its naive incarnation and that all three of its "legs" are necessary for best performance.
Authors: Chanwook Park, Sourav Saha, Jiachen Guo, Xiaoyu Xie, Satyajit Mojumder, Miguel A. Bessa, Dong Qian, Wei Chen, Gregory J. Wagner, Jian Cao, Wing Kam Liu
Abstract: The evolution of artificial intelligence (AI) and neural network theories has revolutionized the way software is programmed, shifting from a hard-coded series of codes to a vast neural network. However, this transition in engineering software has faced challenges such as data scarcity, multi-modality of data, low model accuracy, and slow inference. Here, we propose a new network based on interpolation theories and tensor decomposition, the interpolating neural network (INN). Instead of interpolating training data, a common notion in computer science, INN interpolates interpolation points in the physical space whose coordinates and values are trainable. It can also extrapolate if the interpolation points reside outside of the range of training data and the interpolation functions have a larger support domain. INN features orders of magnitude fewer trainable parameters, faster training, a smaller memory footprint, and higher model accuracy compared to feed-forward neural networks (FFNN) or physics-informed neural networks (PINN). INN is poised to usher in Engineering Software 2.0, a unified neural network that spans various domains of space, time, parameters, and initial/boundary conditions. This has previously been computationally prohibitive due to the exponentially growing number of trainable parameters, easily exceeding the parameter size of ChatGPT, which is over 1 trillion. INN addresses this challenge by leveraging tensor decomposition and tensor product, with adaptable network architecture.
Authors: Shintaro Tamai, Masayuki Numao, Ken-ichi Fukui
Abstract: Recently, growing health awareness, novel methods allow individuals to monitor sleep at home. Utilizing sleep sounds offers advantages over conventional methods like smartwatches, being non-intrusive, and capable of detecting various physiological activities. This study aims to construct a machine learning-based sleep assessment model providing evidence-based assessments, such as poor sleep due to frequent movement during sleep onset. Extracting sleep sound events, deriving latent representations using VAE, clustering with GMM, and training LSTM for subjective sleep assessment achieved a high accuracy of 94.8% in distinguishing sleep satisfaction. Moreover, TimeSHAP revealed differences in impactful sound event types and timings for different individuals.
Authors: Woomin Song, Seunghyuk Oh, Sangwoo Mo, Jaehyung Kim, Sukmin Yun, Jung-Woo Ha, Jinwoo Shin
Abstract: Large language models (LLMs) have shown remarkable performance in various natural language processing tasks. However, a primary constraint they face is the context limit, i.e., the maximum number of tokens they can process. Previous works have explored architectural changes and modifications in positional encoding to relax the constraint, but they often require expensive training or do not address the computational demands of self-attention. In this paper, we present Hierarchical cOntext MERging (HOMER), a new training-free scheme designed to overcome the limitations. HOMER uses a divide-and-conquer algorithm, dividing long inputs into manageable chunks. Each chunk is then processed collectively, employing a hierarchical strategy that merges adjacent chunks at progressive transformer layers. A token reduction technique precedes each merging, ensuring memory usage efficiency. We also propose an optimized computational order reducing the memory requirement to logarithmically scale with respect to input length, making it especially favorable for environments with tight memory restrictions. Our experiments demonstrate the proposed method's superior performance and memory efficiency, enabling the broader use of LLMs in contexts requiring extended context. Code is available at https://github.com/alinlab/HOMER.
Authors: Christian G\"uck, Cyriana M. A. Roelofs, Stefan Faulstich
Abstract: Anomaly detection plays a crucial role in the field of predictive maintenance for wind turbines, yet the comparison of different algorithms poses a difficult task because domain specific public datasets are scarce. Many comparisons of different approaches either use benchmarks composed of data from many different domains, inaccessible data or one of the few publicly available datasets which lack detailed information about the faults. Moreover, many publications highlight a couple of case studies where fault detection was successful. With this paper we publish a high quality dataset that contains data from 36 wind turbines across 3 different wind farms as well as the most detailed fault information of any public wind turbine dataset as far as we know. The new dataset contains 89 years worth of real-world operating data of wind turbines, distributed across 44 labeled time frames for anomalies that led up to faults, as well as 51 time series representing normal behavior. Additionally, the quality of training data is ensured by turbine-status-based labels for each data point. Furthermore, we propose a new scoring method, called CARE (Coverage, Accuracy, Reliability and Earliness), which takes advantage of the information depth that is present in the dataset to identify a good all-around anomaly detection model. This score considers the anomaly detection performance, the ability to recognize normal behavior properly and the capability to raise as few false alarms as possible while simultaneously detecting anomalies early.
Authors: Zhiyu Zhang, Chenkaixiang Lu, Wenchong Tian, Zhenliang Liao, Zhiguo Yuan
Abstract: Physics-based models are computationally time-consuming and infeasible for real-time scenarios of urban drainage networks, and a surrogate model is needed to accelerate the online predictive modelling. Fully-connected neural networks (NNs) are potential surrogate models, but may suffer from low interpretability and efficiency in fitting complex targets. Owing to the state-of-the-art modelling power of graph neural networks (GNNs) and their match with urban drainage networks in the graph structure, this work proposes a GNN-based surrogate of the flow routing model for the hydraulic prediction problem of drainage networks, which regards recent hydraulic states as initial conditions, and future runoff and control policy as boundary conditions. To incorporate hydraulic constraints and physical relationships into drainage modelling, physics-guided mechanisms are designed on top of the surrogate model to restrict the prediction variables with flow balance and flooding occurrence constraints. According to case results in a stormwater network, the GNN-based model is more cost-effective with better hydraulic prediction accuracy than the NN-based model after equal training epochs, and the designed mechanisms further limit prediction errors with interpretable domain knowledge. As the model structure adheres to the flow routing mechanisms and hydraulic constraints in urban drainage networks, it provides an interpretable and effective solution for data-driven surrogate modelling. Simultaneously, the surrogate model accelerates the predictive modelling of urban drainage networks for real-time use compared with the physics-based model.
Authors: Arnulf Hagen, Trond Michael Andersen
Abstract: In April 2021 Stava bridge, a main bridge on E6 in Norway, was abruptly closed for traffic. A structural defect had seriously compromised the bridge structural integrity. The Norwegian Public Roads Administration (NPRA) closed it, made a temporary solution and reopened with severe traffic restrictions. The incident was alerted through what constitutes the bridge Digital Twin processing data from Internet of Things sensors. The solution was crucial in online and offline diagnostics, the case demonstrating the value of technologies to tackle emerging dangerous situations as well as acting preventively. A critical and rapidly developing damage was detected in time to stop the development, but not in time to avoid the incident altogether. The paper puts risk in a broader perspective for an organization responsible for highway infrastructure. It positions online monitoring and Digital Twins in the context of Risk- and Condition-Based Maintenance. The situation that arose at Stava bridge, and how it was detected, analyzed, and diagnosed during virtual inspection, is described. The case demonstrates how combining physics-based methods with Machine Learning can facilitate damage detection and diagnostics. A summary of lessons learnt, both from technical and organizational perspectives, as well as plans of future work, is presented.
Authors: Haodong Wen, Bodong Du, Ruixun Liu, Deyu Meng, Xiangyong Cao
Abstract: Recently, the optimization of polynomial filters within Spectral Graph Neural Networks (GNNs) has emerged as a prominent research focus. Existing spectral GNNs mainly emphasize polynomial properties in filter design, introducing computational overhead and neglecting the integration of crucial graph structure information. We argue that incorporating graph information into basis construction can enhance understanding of polynomial basis, and further facilitate simplified polynomial filter design. Motivated by this, we first propose a Positive and Negative Coupling Analysis (PNCA) framework, where the concepts of positive and negative activation are defined and their respective and mixed effects are analysed. Then, we explore PNCA from the message propagation perspective, revealing the subtle information hidden in the activation process. Subsequently, PNCA is used to analyze the mainstream polynomial filters, and a novel simple basis that decouples the positive and negative activation and fully utilizes graph structure information is designed. Finally, a simple GNN (called GSCNet) is proposed based on the new basis. Experimental results on the benchmark datasets for node classification verify that our GSCNet obtains better or comparable results compared with existing state-of-the-art GNNs while demanding relatively less computational time.
Authors: Payal Varshney, Adriano Lucieri, Christoph Balada, Andreas Dengel, Sheraz Ahmed
Abstract: Trustworthiness is a major prerequisite for the safe application of opaque deep learning models in high-stakes domains like medicine. Understanding the decision-making process not only contributes to fostering trust but might also reveal previously unknown decision criteria of complex models that could advance the state of medical research. The discovery of decision-relevant concepts from black box models is a particularly challenging task. This study proposes Concept Discovery through Latent Diffusion-based Counterfactual Trajectories (CDCT), a novel three-step framework for concept discovery leveraging the superior image synthesis capabilities of diffusion models. In the first step, CDCT uses a Latent Diffusion Model (LDM) to generate a counterfactual trajectory dataset. This dataset is used to derive a disentangled representation of classification-relevant concepts using a Variational Autoencoder (VAE). Finally, a search algorithm is applied to identify relevant concepts in the disentangled latent space. The application of CDCT to a classifier trained on the largest public skin lesion dataset revealed not only the presence of several biases but also meaningful biomarkers. Moreover, the counterfactuals generated within CDCT show better FID scores than those produced by a previously established state-of-the-art method, while being 12 times more resource-efficient. Unsupervised concept discovery holds great potential for the application of trustworthy AI and the further development of human knowledge in various domains. CDCT represents a further step in this direction.
Authors: Ayah Youssef (DIAPRO), Hassan Noura (DIAPRO), Abderrahim El Amrani (DIAPRO), El Mostafa El Adel (DIAPRO), Mustapha Ouladsine (DIAPRO)
Abstract: Fault diagnosis in marine diesel engines is vital for maritime safety and operational efficiency.These engines are integral to marine vessels, and their reliable performance is crucial for safenavigation. Swift identification and resolution of faults are essential to prevent breakdowns,enhance safety, and reduce the risk of catastrophic failures at sea. Proactive fault diagnosisfacilitates timely maintenance, minimizes downtime, and ensures the overall reliability andlongevity of marine diesel engines. This paper explores the importance of fault diagnosis,emphasizing subsystems, common faults, and recent advancements in data-driven approachesfor effective marine diesel engine maintenance
Authors: Ziqi Zhao, Zhaochun Ren, Liu Yang, Fajie Yuan, Pengjie Ren, Zhumin Chen, jun Ma, Xin Xin
Abstract: Offline reinforcement learning (RL) aims to learn policies from static datasets of previously collected trajectories. Existing methods for offline RL either constrain the learned policy to the support of offline data or utilize model-based virtual environments to generate simulated rollouts. However, these methods suffer from (i) poor generalization to unseen states; and (ii) trivial improvement from low-qualified rollout simulation. In this paper, we propose offline trajectory generalization through world transformers for offline reinforcement learning (OTTO). Specifically, we use casual Transformers, a.k.a. World Transformers, to predict state dynamics and the immediate reward. Then we propose four strategies to use World Transformers to generate high-rewarded trajectory simulation by perturbing the offline data. Finally, we jointly use offline data with simulated data to train an offline RL algorithm. OTTO serves as a plug-in module and can be integrated with existing offline RL methods to enhance them with better generalization capability of transformers and high-rewarded data augmentation. Conducting extensive experiments on D4RL benchmark datasets, we verify that OTTO significantly outperforms state-of-the-art offline RL methods.
Authors: Dayin Chen, Xiaodan Shi, Haoran Zhang, Xuan Song, Dongxiao Zhang, Yuntian Chen, Jinyue Yan
Abstract: Enhancing the energy efficiency of buildings significantly relies on monitoring indoor ambient temperature. The potential limitations of conventional temperature measurement techniques, together with the omnipresence of smartphones, have redirected researchers' attention towards the exploration of phone-based ambient temperature estimation technology. Nevertheless, numerous obstacles remain to be addressed in order to achieve a practical implementation of this technology. This study proposes a distributed phone-based ambient temperature estimation system which enables collaboration between multiple phones to accurately measure the ambient temperature in each small area of an indoor space. Besides, it offers a secure, efficient, and cost-effective training strategy to train a new estimation model for each newly added phone, eliminating the need for manual collection of labeled data. This innovative training strategy can yield a high-performing estimation model for a new phone with just 5 data points, requiring only a few iterations. Meanwhile, by crowdsourcing, our system automatically provides accurate inferred labels for all newly collected data. We also highlight the potential of integrating federated learning into our system to ensure privacy protection at the end of this study. We believe this study has the potential to advance the practical application of phone-based ambient temperature measurement, facilitating energy-saving efforts in buildings.
Authors: Ren\'e Heinrich, Bernhard Sick, Christoph Scholz
Abstract: Recently, scientists have proposed several deep learning models to monitor the diversity of bird species. These models can detect bird species with high accuracy by analyzing acoustic signals. However, traditional deep learning algorithms are black-box models that provide no insight into their decision-making process. For domain experts, such as ornithologists, it is crucial that these models are not only efficient, but also interpretable in order to be used as assistive tools. In this study, we present an adaption of the Prototypical Part Network (ProtoPNet) for audio classification that provides inherent interpretability through its model architecture. Our approach is based on a ConvNeXt backbone architecture for feature extraction and learns prototypical patterns for each bird species using spectrograms of the training data. Classification of new data is done by comparison with these prototypes in latent space, which simultaneously serve as easily understandable explanations for the model's decisions.
Authors: Sean O'Hagan, Jungeum Kim, Veronika Rockova
Abstract: In generative models with obscured likelihood, Approximate Bayesian Computation (ABC) is often the tool of last resort for inference. However, ABC demands many prior parameter trials to keep only a small fraction that passes an acceptance test. To accelerate ABC rejection sampling, this paper develops a self-aware framework that learns from past trials and errors. We apply recursive partitioning classifiers on the ABC lookup table to sequentially refine high-likelihood regions into boxes. Each box is regarded as an arm in a binary bandit problem treating ABC acceptance as a reward. Each arm has a proclivity for being chosen for the next ABC evaluation, depending on the prior distribution and past rejections. The method places more splits in those areas where the likelihood resides, shying away from low-probability regions destined for ABC rejections. We provide two versions: (1) ABC-Tree for posterior sampling, and (2) ABC-MAP for maximum a posteriori estimation. We demonstrate accurate ABC approximability at much lower simulation cost. We justify the use of our tree-based bandit algorithms with nearly optimal regret bounds. Finally, we successfully apply our approach to the problem of masked image classification using deep generative models.
Authors: Jinhui Yuan, Shan Lu, Peibo Duan, Jieyue He
Abstract: Recently, heterogeneous graph neural networks (HGNNs) have achieved impressive success in representation learning by capturing long-range dependencies and heterogeneity at the node level. However, few existing studies have delved into the utilization of node attributes in heterogeneous information networks (HINs). In this paper, we investigate the impact of inter-node attribute disparities on HGNNs performance within the benchmark task, i.e., node classification, and empirically find that typical models exhibit significant performance decline when classifying nodes whose attributes markedly differ from their neighbors. To alleviate this issue, we propose a novel Attribute-Guided heterogeneous Information Networks representation learning model with Transformer (AGHINT), which allows a more effective aggregation of neighbor node information under the guidance of attributes. Specifically, AGHINT transcends the constraints of the original graph structure by directly integrating higher-order similar neighbor features into the learning process and modifies the message-passing mechanism between nodes based on their attribute disparities. Extensive experimental results on three real-world heterogeneous graph benchmarks with target node attributes demonstrate that AGHINT outperforms the state-of-the-art.
Authors: Kafeng Wang, Jianfei Chen, He Li, Zhenpeng Mi, Jun Zhu
Abstract: Diffusion models have been extensively used in data generation tasks and are recognized as one of the best generative models. However, their time-consuming deployment, long inference time, and requirements on large memory limit their application on mobile devices. In this paper, we propose a method based on the improved Straight-Through Estimator to improve the deployment efficiency of diffusion models. Specifically, we add sparse masks to the Convolution and Linear layers in a pre-trained diffusion model, then use design progressive sparsity for model training in the fine-tuning stage, and switch the inference mask on and off, which supports a flexible choice of sparsity during inference according to the FID and MACs requirements. Experiments on four datasets conducted on a state-of-the-art Transformer-based diffusion model demonstrate that our method reduces MACs by $50\%$ while increasing FID by only 1.5 on average. Under other MACs conditions, the FID is also lower than 1$\sim$137 compared to other methods.
Authors: Mingda Xu, Peisheng Qian, Ziyuan Zhao, Zeng Zeng, Jianguo Chen, Weide Liu, Xulei Yang
Abstract: Protein-protein interactions (PPIs) play key roles in a broad range of biological processes. Numerous strategies have been proposed for predicting PPIs, and among them, graph-based methods have demonstrated promising outcomes owing to the inherent graph structure of PPI networks. This paper reviews various graph-based methodologies, and discusses their applications in PPI prediction. We classify these approaches into two primary groups based on their model structures. The first category employs Graph Neural Networks (GNN) or Graph Convolutional Networks (GCN), while the second category utilizes Graph Attention Networks (GAT), Graph Auto-Encoders and Graph-BERT. We highlight the distinctive methodologies of each approach in managing the graph-structured data inherent in PPI networks and anticipate future research directions in this domain.
Authors: Anton Bushuiev, Roman Bushuiev, Jiri Sedlar, Tomas Pluskal, Jiri Damborsky, Stanislav Mazurenko, Josef Sivic
Abstract: In recent years, there has been remarkable progress in machine learning for protein-protein interactions. However, prior work has predominantly focused on improving learning algorithms, with less attention paid to evaluation strategies and data preparation. Here, we demonstrate that further development of machine learning methods may be hindered by the quality of existing train-test splits. Specifically, we find that commonly used splitting strategies for protein complexes, based on protein sequence or metadata similarity, introduce major data leakage. This may result in overoptimistic evaluation of generalization, as well as unfair benchmarking of the models, biased towards assessing their overfitting capacity rather than practical utility. To overcome the data leakage, we recommend constructing data splits based on 3D structural similarity of protein-protein interfaces and suggest corresponding algorithms. We believe that addressing the data leakage problem is critical for further progress in this research area.
Authors: Qiuyi Hong, Fanlin Meng, Felipe Maldonado
Abstract: In the context of increasing demands for long-term multi-energy load forecasting in real-world applications, this paper introduces Patchformer, a novel model that integrates patch embedding with encoder-decoder Transformer-based architectures. To address the limitation in existing Transformer-based models, which struggle with intricate temporal patterns in long-term forecasting, Patchformer employs patch embedding, which predicts multivariate time-series data by separating it into multiple univariate data and segmenting each of them into multiple patches. This method effectively enhances the model's ability to capture local and global semantic dependencies. The numerical analysis shows that the Patchformer obtains overall better prediction accuracy in both multivariate and univariate long-term forecasting on the novel Multi-Energy dataset and other benchmark datasets. In addition, the positive effect of the interdependence among energy-related products on the performance of long-term time-series forecasting across Patchformer and other compared models is discovered, and the superiority of the Patchformer against other models is also demonstrated, which presents a significant advancement in handling the interdependence and complexities of long-term multi-energy forecasting. Lastly, Patchformer is illustrated as the only model that follows the positive correlation between model performance and the length of the past sequence, which states its ability to capture long-range past local semantic information.
Authors: Pietro Recalcati, Fabio Garcea, Luca Piano, Fabrizio Lamberti, Lia Morra
Abstract: Deep neural networks are increasingly used in a wide range of technologies and services, but remain highly susceptible to out-of-distribution (OOD) samples, that is, drawn from a different distribution than the original training set. A common approach to address this issue is to endow deep neural networks with the ability to detect OOD samples. Several benchmarks have been proposed to design and validate OOD detection techniques. However, many of them are based on far-OOD samples drawn from very different distributions, and thus lack the complexity needed to capture the nuances of real-world scenarios. In this work, we introduce a comprehensive benchmark for OOD detection, based on ImageNet and Places365, that assigns individual classes as in-distribution or out-of-distribution depending on the semantic similarity with the training set. Several techniques can be used to determine which classes should be considered in-distribution, yielding benchmarks with varying properties. Experimental results on different OOD detection techniques show how their measured efficacy depends on the selected benchmark and how confidence-based techniques may outperform classifier-based ones on near-OOD samples.
Authors: Ubaid Azam, Imran Razzak, Shelly Vishwakarma, Hakim Hacid, Dell Zhang, Shoaib Jameel
Abstract: Predicting legal judgments with reliable confidence is paramount for responsible legal AI applications. While transformer-based deep neural networks (DNNs) like BERT have demonstrated promise in legal tasks, accurately assessing their prediction confidence remains crucial. We present a novel Bayesian approach called BayesJudge that harnesses the synergy between deep learning and deep Gaussian Processes to quantify uncertainty through Bayesian kernel Monte Carlo dropout. Our method leverages informative priors and flexible data modelling via kernels, surpassing existing methods in both predictive accuracy and confidence estimation as indicated through brier score. Extensive evaluations of public legal datasets showcase our model's superior performance across diverse tasks. We also introduce an optimal solution to automate the scrutiny of unreliable predictions, resulting in a significant increase in the accuracy of the model's predictions by up to 27\%. By empowering judges and legal professionals with more reliable information, our work paves the way for trustworthy and transparent legal AI applications that facilitate informed decisions grounded in both knowledge and quantified uncertainty.
Authors: Ubaid Azam, Imran Razzak, Shelly Vishwakarma, Hakim Hacid, Dell Zhang, Shoaib Jameel
Abstract: The growing capabilities of AI raise questions about their trustworthiness in healthcare, particularly due to opaque decision-making and limited data availability. This paper proposes a novel approach to address these challenges, introducing a Bayesian Monte Carlo Dropout model with kernel modelling. Our model is designed to enhance reliability on small medical datasets, a crucial barrier to the wider adoption of AI in healthcare. This model leverages existing language models for improved effectiveness and seamlessly integrates with current workflows. We demonstrate significant improvements in reliability, even with limited data, offering a promising step towards building trust in AI-driven medical predictions and unlocking its potential to improve patient care.
Authors: Kuai Dai, Xutao Li, Junying Fang, Yunming Ye, Demin Yu, Di Xian, Danyu Qin
Abstract: Convection (thunderstorm) develops rapidly within hours and is highly destructive, posing a significant challenge for nowcasting and resulting in substantial losses to nature and society. After the emergence of artificial intelligence (AI)-based methods, convection nowcasting has experienced rapid advancements, with its performance surpassing that of physics-based numerical weather prediction and other conventional approaches. However, the lead time and coverage of it still leave much to be desired and hardly meet the needs of disaster emergency response. Here, we propose a deep diffusion model of satellite (DDMS) to establish an AI-based convection nowcasting system. On one hand, it employs diffusion processes to effectively simulate complicated spatiotemporal evolution patterns of convective clouds, significantly improving the forecast lead time. On the other hand, it utilizes geostationary satellite brightness temperature data, thereby achieving planetary-scale forecast coverage. During long-term tests and objective validation based on the FengYun-4A satellite, our system achieves, for the first time, effective convection nowcasting up to 4 hours, with broad coverage (about 20,000,000 km2), remarkable accuracy, and high resolution (15 minutes; 4 km). Its performance reaches a new height in convection nowcasting compared to the existing models. In terms of application, our system operates efficiently (forecasting 4 hours of convection in 8 minutes), and is highly transferable with the potential to collaborate with multiple satellites for global convection nowcasting. Furthermore, our results highlight the remarkable capabilities of diffusion models in convective clouds forecasting, as well as the significant value of geostationary satellite data when empowered by AI technologies.
Authors: Shiv Shankar, Ritwik Sinha, Yash Chandak, Saayan Mitra, Madalina Fiterau
Abstract: A/B tests are often required to be conducted on subjects that might have social connections. For e.g., experiments on social media, or medical and social interventions to control the spread of an epidemic. In such settings, the SUTVA assumption for randomized-controlled trials is violated due to network interference, or spill-over effects, as treatments to group A can potentially also affect the control group B. When the underlying social network is known exactly, prior works have demonstrated how to conduct A/B tests adequately to estimate the global average treatment effect (GATE). However, in practice, it is often impossible to obtain knowledge about the exact underlying network. In this paper, we present UNITE: a novel estimator that relax this assumption and can identify GATE while only relying on knowledge of the superset of neighbors for any subject in the graph. Through theoretical analysis and extensive experiments, we show that the proposed approach performs better in comparison to standard estimators.
Authors: Roumen Nikolaev Popov
Abstract: We propose an analytical solution for approximating the gradient of the Evidence Lower Bound (ELBO) in variational inference problems where the statistical model is a Bayesian network consisting of observations drawn from a mixture of a Gaussian distribution embedded in unrelated clutter, known as the clutter problem. The method employs the reparameterization trick to move the gradient operator inside the expectation and relies on the assumption that, because the likelihood factorizes over the observed data, the variational distribution is generally more compactly supported than the Gaussian distribution in the likelihood factors. This allows efficient local approximation of the individual likelihood factors, which leads to an analytical solution for the integral defining the gradient expectation. We integrate the proposed gradient approximation as the expectation step in an EM (Expectation Maximization) algorithm for maximizing ELBO and test against classical deterministic approaches in Bayesian inference, such as the Laplace approximation, Expectation Propagation and Mean-Field Variational Inference. The proposed method demonstrates good accuracy and rate of convergence together with linear computational complexity.
Authors: Bin Liu, Siqi Wu, Jin Wang, Xin Deng, Ao Zhou
Abstract: The discovery of drug-target interactions (DTIs) plays a crucial role in pharmaceutical development. The deep learning model achieves more accurate results in DTI prediction due to its ability to extract robust and expressive features from drug and target chemical structures. However, existing deep learning methods typically generate drug features via aggregating molecular atom representations, ignoring the chemical properties carried by motifs, i.e., substructures of the molecular graph. The atom-drug double-level molecular representation learning can not fully exploit structure information and fails to interpret the DTI mechanism from the motif perspective. In addition, sequential model-based target feature extraction either fuses limited contextual information or requires expensive computational resources. To tackle the above issues, we propose a hierarchical graph representation learning-based DTI prediction method (HiGraphDTI). Specifically, HiGraphDTI learns hierarchical drug representations from triple-level molecular graphs to thoroughly exploit chemical information embedded in atoms, motifs, and molecules. Then, an attentional feature fusion module incorporates information from different receptive fields to extract expressive target features.Last, the hierarchical attention mechanism identifies crucial molecular segments, which offers complementary views for interpreting interaction mechanisms. The experiment results not only demonstrate the superiority of HiGraphDTI to the state-of-the-art methods, but also confirm the practical ability of our model in interaction interpretation and new DTI discovery.
Authors: Chung-Yiu Yau, Hoi-To Wai, Parameswaran Raman, Soumajyoti Sarkar, Mingyi Hong
Abstract: A key challenge in contrastive learning is to generate negative samples from a large sample set to contrast with positive samples, for learning better encoding of the data. These negative samples often follow a softmax distribution which are dynamically updated during the training process. However, sampling from this distribution is non-trivial due to the high computational costs in computing the partition function. In this paper, we propose an Efficient Markov Chain Monte Carlo negative sampling method for Contrastive learning (EMC$^2$). We follow the global contrastive learning loss as introduced in SogCLR, and propose EMC$^2$ which utilizes an adaptive Metropolis-Hastings subroutine to generate hardness-aware negative samples in an online fashion during the optimization. We prove that EMC$^2$ finds an $\mathcal{O}(1/\sqrt{T})$-stationary point of the global contrastive loss in $T$ iterations. Compared to prior works, EMC$^2$ is the first algorithm that exhibits global convergence (to stationarity) regardless of the choice of batch size while exhibiting low computation and memory cost. Numerical experiments validate that EMC$^2$ is effective with small batch training and achieves comparable or better performance than baseline algorithms. We report the results for pre-training image encoders on STL-10 and Imagenet-100.
Authors: Eric Yeats, Cameron Darwin, Eduardo Ortega, Frank Liu, Hai Li
Abstract: We leverage diffusion models to study the robustness-performance tradeoff of robust classifiers. Our approach introduces a simple, pretrained diffusion method to generate low-norm counterfactual examples (CEs): semantically altered data which results in different true class membership. We report that the confidence and accuracy of robust models on their clean training data are associated with the proximity of the data to their CEs. Moreover, robust models perform very poorly when evaluated on the CEs directly, as they become increasingly invariant to the low-norm, semantic changes brought by CEs. The results indicate a significant overlap between non-robust and semantic features, countering the common assumption that non-robust features are not interpretable.
Authors: Zehao Zhou
Abstract: Distributed Distributional DrQ is a model-free and off-policy RL algorithm for continuous control tasks based on the state and observation of the agent, which is an actor-critic method with the data-augmentation and the distributional perspective of critic value function. Aim to learn to control the agent and master some tasks in a high-dimensional continuous space. DrQ-v2 uses DDPG as the backbone and achieves out-performance in various continuous control tasks. Here Distributed Distributional DrQ uses Distributed Distributional DDPG as the backbone, and this modification aims to achieve better performance in some hard continuous control tasks through the better expression ability of distributional value function and distributed actor policies.
Authors: Jinmei Liu, Wenbin Li, Xiangyu Yue, Shilin Zhang, Chunlin Chen, Zhi Wang
Abstract: We study continual offline reinforcement learning, a practical paradigm that facilitates forward transfer and mitigates catastrophic forgetting to tackle sequential offline tasks. We propose a dual generative replay framework that retains previous knowledge by concurrent replay of generated pseudo-data. First, we decouple the continual learning policy into a diffusion-based generative behavior model and a multi-head action evaluation model, allowing the policy to inherit distributional expressivity for encompassing a progressive range of diverse behaviors. Second, we train a task-conditioned diffusion model to mimic state distributions of past tasks. Generated states are paired with corresponding responses from the behavior generator to represent old tasks with high-fidelity replayed samples. Finally, by interleaving pseudo samples with real ones of the new task, we continually update the state and behavior generators to model progressively diverse behaviors, and regularize the multi-head critic via behavior cloning to mitigate forgetting. Experiments demonstrate that our method achieves better forward transfer with less forgetting, and closely approximates the results of using previous ground-truth data due to its high-fidelity replay of the sample space. Our code is available at \href{https://github.com/NJU-RL/CuGRO}{https://github.com/NJU-RL/CuGRO}.
URLs: https://github.com/NJU-RL/CuGRO, https://github.com/NJU-RL/CuGRO
Authors: Sree Pooja Akula, Mukund Telukunta, Venkata Sriram Siddhardh Nadendla
Abstract: Drivers in ridesharing platforms exhibit cognitive atrophy and fatigue as they accept ride offers along the day, which can have a significant impact on the overall efficiency of the ridesharing platform. In contrast to the current literature which focuses primarily on modeling and learning driver's preferences across different ride offers, this paper proposes a novel Dynamic Discounted Satisficing (DDS) heuristic to model and predict driver's sequential ride decisions during a given shift. Based on DDS heuristic, a novel stochastic neural network with random activations is proposed to model DDS heuristic and predict the final decision made by a given driver. The presence of random activations in the network necessitated the development of a novel training algorithm called Sampling-Based Back Propagation Through Time (SBPTT), where gradients are computed for independent instances of neural networks (obtained via sampling the distribution of activation threshold) and aggregated to update the network parameters. Using both simulation experiments as well as on real Chicago taxi dataset, this paper demonstrates the improved performance of the proposed approach, when compared to state-of-the-art methods.
Authors: Adarsha Balaji, Ramyad Hadidi, Gregory Kollmer, Mohammed E. Fouda, Prasanna Balaprakash
Abstract: X-ray and electron diffraction-based microscopy use bragg peak detection and ptychography to perform 3-D imaging at an atomic resolution. Typically, these techniques are implemented using computationally complex tasks such as a Psuedo-Voigt function or solving a complex inverse problem. Recently, the use of deep neural networks has improved the existing state-of-the-art approaches. However, the design and development of the neural network models depends on time and labor intensive tuning of the model by application experts. To that end, we propose a hyperparameter (HPS) and neural architecture search (NAS) approach to automate the design and optimization of the neural network models for model size, energy consumption and throughput. We demonstrate the improved performance of the auto-tuned models when compared to the manually tuned BraggNN and PtychoNN benchmark. We study and demonstrate the importance of the exploring the search space of tunable hyperparameters in enhancing the performance of bragg peak detection and ptychographic reconstruction. Our NAS and HPS of (1) BraggNN achieves a 31.03\% improvement in bragg peak detection accuracy with a 87.57\% reduction in model size, and (2) PtychoNN achieves a 16.77\% improvement in model accuracy and a 12.82\% reduction in model size when compared to the baseline PtychoNN model. When inferred on the Orin-AGX platform, the optimized Braggnn and Ptychonn models demonstrate a 10.51\% and 9.47\% reduction in inference latency and a 44.18\% and 15.34\% reduction in energy consumption when compared to their respective baselines, when inferred in the Orin-AGX edge platform.
Authors: Hao-Lun Hsu, Weixin Wang, Miroslav Pajic, Pan Xu
Abstract: We present the first study on provably efficient randomized exploration in cooperative multi-agent reinforcement learning (MARL). We propose a unified algorithm framework for randomized exploration in parallel Markov Decision Processes (MDPs), and two Thompson Sampling (TS)-type algorithms, CoopTS-PHE and CoopTS-LMC, incorporating the perturbed-history exploration (PHE) strategy and the Langevin Monte Carlo exploration (LMC) strategy respectively, which are flexible in design and easy to implement in practice. For a special class of parallel MDPs where the transition is (approximately) linear, we theoretically prove that both CoopTS-PHE and CoopTS-LMC achieve a $\widetilde{\mathcal{O}}(d^{3/2}H^2\sqrt{MK})$ regret bound with communication complexity $\widetilde{\mathcal{O}}(dHM^2)$, where $d$ is the feature dimension, $H$ is the horizon length, $M$ is the number of agents, and $K$ is the number of episodes. This is the first theoretical result for randomized exploration in cooperative MARL. We evaluate our proposed method on multiple parallel RL environments, including a deep exploration problem (\textit{i.e.,} $N$-chain), a video game, and a real-world problem in energy systems. Our experimental results support that our framework can achieve better performance, even under conditions of misspecified transition models. Additionally, we establish a connection between our unified framework and the practical application of federated learning.
Authors: Hieu Le, Zhenhua He, Mai Le, Dhruva K. Chakravorty, Lisa M. Perez, Akhil Chilumuru, Yan Yao, Jiefu Chen
Abstract: The discoveries in this paper show that Intelligence Processing Units (IPUs) offer a viable accelerator alternative to GPUs for machine learning (ML) applications within the fields of materials science and battery research. We investigate the process of migrating a model from GPU to IPU and explore several optimization techniques, including pipelining and gradient accumulation, aimed at enhancing the performance of IPU-based models. Furthermore, we have effectively migrated a specialized model to the IPU platform. This model is employed for predicting effective conductivity, a parameter crucial in ion transport processes, which govern the performance of multiple charge and discharge cycles of batteries. The model utilizes a Convolutional Neural Network (CNN) architecture to perform prediction tasks for effective conductivity. The performance of this model on the IPU is found to be comparable to its execution on GPUs. We also analyze the utilization and performance of Graphcore's Bow IPU. Through benchmark tests, we observe significantly improved performance with the Bow IPU when compared to its predecessor, the Colossus IPU.
Authors: Weitong Zhang, Zhiyuan Fan, Jiafan He, Quanquan Gu
Abstract: We study the constant regret guarantees in reinforcement learning (RL). Our objective is to design an algorithm that incurs only finite regret over infinite episodes with high probability. We introduce an algorithm, Cert-LSVI-UCB, for misspecified linear Markov decision processes (MDPs) where both the transition kernel and the reward function can be approximated by some linear function up to misspecification level $\zeta$. At the core of Cert-LSVI-UCB is an innovative certified estimator, which facilitates a fine-grained concentration analysis for multi-phase value-targeted regression, enabling us to establish an instance-dependent regret bound that is constant w.r.t. the number of episodes. Specifically, we demonstrate that for an MDP characterized by a minimal suboptimality gap $\Delta$, Cert-LSVI-UCB has a cumulative regret of $\tilde{\mathcal{O}}(d^3H^5/\Delta)$ with high probability, provided that the misspecification level $\zeta$ is below $\tilde{\mathcal{O}}(\Delta / (\sqrt{d}H^2))$. Remarkably, this regret bound remains constant relative to the number of episodes $K$. To the best of our knowledge, Cert-LSVI-UCB is the first algorithm to achieve a constant, instance-dependent, high-probability regret bound in RL with linear function approximation for infinite runs without relying on prior distribution assumptions. This not only highlights the robustness of Cert-LSVI-UCB to model misspecification but also introduces novel algorithmic designs and analytical techniques of independent interest.
Authors: Saeid Pourmand, Wyatt D. Whiting, Alireza Aghasi, Nicholas F. Marshall
Abstract: This paper studies the geometry of binary hyperdimensional computing (HDC), a computational scheme in which data are encoded using high-dimensional binary vectors. We establish a result about the similarity structure induced by the HDC binding operator and show that the Laplace kernel naturally arises in this setting, motivating our new encoding method Laplace-HDC, which improves upon previous methods. We describe how our results indicate limitations of binary HDC in encoding spatial information from images and discuss potential solutions, including using Haar convolutional features and the definition of a translation-equivariant HDC encoding. Several numerical experiments highlighting the improved accuracy of Laplace-HDC in contrast to alternative methods are presented. We also numerically study other aspects of the proposed framework such as robustness and the underlying translation-equivariant encoding.
Authors: M\'elodie Monod, Peter Krusche, Qian Cao, Berkman Sahiner, Nicholas Petrick, David Ohlssen, Thibaud Coroller
Abstract: TorchSurv is a Python package that serves as a companion tool to perform deep survival modeling within the PyTorch environment. Unlike existing libraries that impose specific parametric forms, TorchSurv enables the use of custom PyTorch-based deep survival models. With its lightweight design, minimal input requirements, full PyTorch backend, and freedom from restrictive survival model parameterizations, TorchSurv facilitates efficient deep survival model implementation and is particularly beneficial for high-dimensional and complex input data scenarios.
Authors: Zhuo Chen, Jacob McCarran, Esteban Vizcaino, Marin Solja\v{c}i\'c, Di Luo
Abstract: Partial differential equations (PDEs) are instrumental for modeling dynamical systems in science and engineering. The advent of neural networks has initiated a significant shift in tackling these complexities though challenges in accuracy persist, especially for initial value problems. In this paper, we introduce the $\textit{Time-Evolving Natural Gradient (TENG)}$, generalizing time-dependent variational principles and optimization-based time integration, leveraging natural gradient optimization to obtain high accuracy in neural-network-based PDE solutions. Our comprehensive development includes algorithms like TENG-Euler and its high-order variants, such as TENG-Heun, tailored for enhanced precision and efficiency. TENG's effectiveness is further validated through its performance, surpassing current leading methods and achieving machine precision in step-by-step optimizations across a spectrum of PDEs, including the heat equation, Allen-Cahn equation, and Burgers' equation.
Authors: Qiwei Di, Jiafan He, Quanquan Gu
Abstract: Learning from human feedback plays an important role in aligning generative models, such as large language models (LLM). However, the effectiveness of this approach can be influenced by adversaries, who may intentionally provide misleading preferences to manipulate the output in an undesirable or harmful direction. To tackle this challenge, we study a specific model within this problem domain--contextual dueling bandits with adversarial feedback, where the true preference label can be flipped by an adversary. We propose an algorithm namely robust contextual dueling bandit (\algo), which is based on uncertainty-weighted maximum likelihood estimation. Our algorithm achieves an $\tilde O(d\sqrt{T}+dC)$ regret bound, where $T$ is the number of rounds, $d$ is the dimension of the context, and $ 0 \le C \le T$ is the total number of adversarial feedback. We also prove a lower bound to show that our regret bound is nearly optimal, both in scenarios with and without ($C=0$) adversarial feedback. Additionally, we conduct experiments to evaluate our proposed algorithm against various types of adversarial feedback. Experimental results demonstrate its superiority over the state-of-the-art dueling bandit algorithms in the presence of adversarial feedback.
Authors: Patrick Geitner
Abstract: New technology for energy storage is necessary for the large-scale adoption of renewable energy sources like wind and solar. The ability to discover suitable catalysts is crucial for making energy storage more cost-effective and scalable. The Open Catalyst Project aims to apply advances in graph neural networks (GNNs) to accelerate progress in catalyst discovery, replacing Density Functional Theory-based (DFT) approaches that are computationally burdensome. Current approaches involve scaling GNNs to over 1 billion parameters, pushing the problem out of reach for a vast majority of machine learning practitioner around the world. This study aims to evaluate the performance and insights gained from using more lightweight approaches for this task that are more approachable for smaller teams to encourage participation from individuals from diverse backgrounds. By implementing robust design patterns like geometric and symmetric message passing, we were able to train a GNN model that reached a MAE of 0.0748 in predicting the per-atom forces of adsorbate-surface interactions, rivaling established model architectures like SchNet and DimeNet++ while using only a fraction of trainable parameters.
Authors: Subhadarsi Nayak, Hrithwik Shalu, Joseph Stember
Abstract: Tropospheric ozone, known as a concerning air pollutant, has been associated with health issues including asthma, bronchitis, and impaired lung function. The rates at which peroxy radicals react with NO play a critical role in the overall formation and depletion of tropospheric ozone. However, obtaining comprehensive kinetic data for these reactions remains challenging. Traditional approaches to determine rate constants are costly and technically intricate. Fortunately, the emergence of machine learning-based models offers a less resource and time-intensive alternative for acquiring kinetics information. In this study, we leveraged deep reinforcement learning to predict ranges of rate constants (\textit{k}) with exceptional accuracy, achieving a testing set accuracy of 100%. To analyze reactivity trends based on the molecular structure of peroxy radicals, we employed 51 global descriptors as input parameters. These descriptors were derived from optimized minimum energy geometries of peroxy radicals using the quantum composite G3B3 method. Through the application of Integrated Gradients (IGs), we gained valuable insights into the significance of the various descriptors in relation to reaction rates. We successfully validated and contextualized our findings by conducting cross-comparisons with established trends in the existing literature. These results establish a solid foundation for pioneering advancements in chemistry, where computer analysis serves as an inspirational source driving innovation.
Authors: Simon Eisenmann, Daniel Hein, Steffen Udluft, Thomas A. Runkler
Abstract: This paper presents the first algorithm for model-based offline quantum reinforcement learning and demonstrates its functionality on the cart-pole benchmark. The model and the policy to be optimized are each implemented as variational quantum circuits. The model is trained by gradient descent to fit a pre-recorded data set. The policy is optimized with a gradient-free optimization scheme using the return estimate given by the model as the fitness function. This model-based approach allows, in principle, full realization on a quantum computer during the optimization phase and gives hope that a quantum advantage can be achieved as soon as sufficiently powerful quantum computers are available.
Authors: Yu Wang, Shu-Rui Zhang, Aidin Momtaz, Rahim Moradi, Fatemeh Rastegarnia, Narek Sahakyan, Soroush Shakeri, Liang Li
Abstract: ChatGPT has been the most talked-about concept in recent months, captivating both professionals and the general public alike, and has sparked discussions about the changes that artificial intelligence (AI) will bring to the world. As physicists and astrophysicists, we are curious about if scientific data can be correctly analyzed by large language models (LLMs) and yield accurate physics. In this article, we fine-tune the generative pre-trained transformer (GPT) model by the astronomical data from the observations of galaxies, quasars, stars, gamma-ray bursts (GRBs), and the simulations of black holes (BHs), the fine-tuned model demonstrates its capability to classify astrophysical phenomena, distinguish between two types of GRBs, deduce the redshift of quasars, and estimate BH parameters. We regard this as a successful test, marking the LLM's proven efficacy in scientific research. With the ever-growing volume of multidisciplinary data and the advancement of AI technology, we look forward to the emergence of a more fundamental and comprehensive understanding of our universe. This article also shares some interesting thoughts on data collection and AI design. Using the approach of understanding the universe - looking outward at data and inward for fundamental building blocks - as a guideline, we propose a method of series expansion for AI, suggesting ways to train and control AI that is smarter than humans.
Authors: Yogesh Verma, Markus Heinonen, Vikas Garg
Abstract: Climate and weather prediction traditionally relies on complex numerical simulations of atmospheric physics. Deep learning approaches, such as transformers, have recently challenged the simulation paradigm with complex network forecasts. However, they often act as data-driven black-box models that neglect the underlying physics and lack uncertainty quantification. We address these limitations with ClimODE, a spatiotemporal continuous-time process that implements a key principle of advection from statistical mechanics, namely, weather changes due to a spatial movement of quantities over time. ClimODE models precise weather evolution with value-conserving dynamics, learning global weather transport as a neural flow, which also enables estimating the uncertainty in predictions. Our approach outperforms existing data-driven methods in global and regional forecasting with an order of magnitude smaller parameterization, establishing a new state of the art.
Authors: Lisang Zhou, Meng Wang, Ning Zhou
Abstract: Distributed training can facilitate the processing of large medical image datasets, and improve the accuracy and efficiency of disease diagnosis while protecting patient privacy, which is crucial for achieving efficient medical image analysis and accelerating medical research progress. This paper presents an innovative approach to medical image classification, leveraging Federated Learning (FL) to address the dual challenges of data privacy and efficient disease diagnosis. Traditional Centralized Machine Learning models, despite their widespread use in medical imaging for tasks such as disease diagnosis, raise significant privacy concerns due to the sensitive nature of patient data. As an alternative, FL emerges as a promising solution by allowing the training of a collective global model across local clients without centralizing the data, thus preserving privacy. Focusing on the application of FL in Magnetic Resonance Imaging (MRI) brain tumor detection, this study demonstrates the effectiveness of the Federated Learning framework coupled with EfficientNet-B0 and the FedAvg algorithm in enhancing both privacy and diagnostic accuracy. Through a meticulous selection of preprocessing methods, algorithms, and hyperparameters, and a comparative analysis of various Convolutional Neural Network (CNN) architectures, the research uncovers optimal strategies for image classification. The experimental results reveal that EfficientNet-B0 outperforms other models like ResNet in handling data heterogeneity and achieving higher accuracy and lower loss, highlighting the potential of FL in overcoming the limitations of traditional models. The study underscores the significance of addressing data heterogeneity and proposes further research directions for broadening the applicability of FL in medical image analysis.
Authors: Zhenwei Huang, Wen Huang, Pratik Jawanpuria, Bamdev Mishra
Abstract: In recent years, federated learning (FL) has emerged as a prominent paradigm in distributed machine learning. Despite the partial safeguarding of agents' information within FL systems, a malicious adversary can potentially infer sensitive information through various means. In this paper, we propose a generic private FL framework defined on Riemannian manifolds (PriRFed) based on the differential privacy (DP) technique. We analyze the privacy guarantee while establishing the convergence properties. To the best of our knowledge, this is the first federated learning framework on Riemannian manifold with a privacy guarantee and convergence results. Numerical simulations are performed on synthetic and real-world datasets to showcase the efficacy of the proposed PriRFed approach.
Authors: Wojciech Czaja, Jeremiah Emidih, Brandon Kolstoe, Richard G. Spencer
Abstract: Hyperspectral imagery (HSI) is an established technique with an array of applications, but its use is limited due to both practical and technical issues associated with spectral devices. The goal of the ICASSP 2024 'Hyper-Skin' Challenge is to extract skin HSI from matching RGB images and an infrared band. To address this problem we propose a model using features of the scattering transform - a type of convolutional neural network with predefined filters. Our model matches and inverts those features, rather than the pixel values, reducing the complexity of matching while grouping similar features together, resulting in an improved learning process.
Authors: Ammar Ahmed Pallikonda Latheef, Alberto Santamaria-Pang, Craig K Jones, Haris I Sair
Abstract: Brain networks display a hierarchical organization, a complexity that poses a challenge for existing deep learning models, often structured as flat classifiers, leading to difficulties in interpretability and the 'black box' issue. To bridge this gap, we propose a novel architecture: a symbolic autoencoder informed by weak supervision and an Emergent Language (EL) framework. This model moves beyond traditional flat classifiers by producing hierarchical clusters and corresponding imagery, subsequently represented through symbolic sentences to improve the clinical interpretability of hierarchically organized data such as intrinsic brain networks, which can be characterized using resting-state fMRI images. Our innovation includes a generalized hierarchical loss function designed to ensure that both sentences and images accurately reflect the hierarchical structure of functional brain networks. This enables us to model functional brain networks from a broader perspective down to more granular details. Furthermore, we introduce a quantitative method to assess the hierarchical consistency of these symbolic representations. Our qualitative analyses show that our model successfully generates hierarchically organized, clinically interpretable images, a finding supported by our quantitative evaluations. We find that our best performing loss function leads to a hierarchical consistency of over 97% when identifying images corresponding to brain networks. This approach not only advances the interpretability of deep learning models in neuroimaging analysis but also represents a significant step towards modeling the intricate hierarchical nature of brain networks.
Authors: Shakeeb Murtaza, Soufiane Belharbi, Marco Pedersoli, Eric Granger
Abstract: Weakly Supervised Object Localization (WSOL) allows for training deep learning models for classification and localization, using only global class-level labels. The lack of bounding box (bbox) supervision during training represents a considerable challenge for hyper-parameter search and model selection. Earlier WSOL works implicitly observed localization performance over a test set which leads to biased performance evaluation. More recently, a better WSOL protocol has been proposed, where a validation set with bbox annotations is held out for model selection. Although it does not rely on the test set, this protocol is unrealistic since bboxes are not available in real-world applications, and when available, it is better to use them directly to fit model weights. Our initial empirical analysis shows that the localization performance of a model declines significantly when using only image-class labels for model selection (compared to using bounding-box annotations). This suggests that adding bounding-box labels is preferable for selecting the best model for localization. In this paper, we introduce a new WSOL validation protocol that provides a localization signal without the need for manual bbox annotations. In particular, we leverage noisy pseudo boxes from an off-the-shelf ROI proposal generator such as Selective-Search, CLIP, and RPN pretrained models for model selection. Our experimental results with several WSOL methods on ILSVRC and CUB-200-2011 datasets show that our noisy boxes allow selecting models with performance close to those selected using ground truth boxes, and better than models selected using only image-class labels.
Authors: Ricard Puig i Valls, Marc Drudis, Supanut Thanasilp, Zo\"e Holmes
Abstract: The barren plateau phenomenon, characterized by loss gradients that vanish exponentially with system size, poses a challenge to scaling variational quantum algorithms. Here we explore the potential of warm starts, whereby one initializes closer to a solution in the hope of enjoying larger loss variances. Focusing on an iterative variational method for learning shorter-depth circuits for quantum real and imaginary time evolution we conduct a case study to elucidate the potential and limitations of warm starts. We start by proving that the iterative variational algorithm will exhibit substantial (at worst vanishing polynomially in system size) gradients in a small region around the initializations at each time-step. Convexity guarantees for these regions are then established, suggesting trainability for polynomial size time-steps. However, our study highlights scenarios where a good minimum shifts outside the region with trainability guarantees. Our analysis leaves open the question whether such minima jumps necessitate optimization across barren plateau landscapes or whether there exist gradient flows, i.e., fertile valleys away from the plateau with substantial gradients, that allow for training.
Authors: Ming Xiang, Stratis Ioannidis, Edmund Yeh, Carlee Joe-Wong, Lili Su
Abstract: Federated learning is a popular distributed learning approach for training a machine learning model without disclosing raw data. It consists of a parameter server and a possibly large collection of clients (e.g., in cross-device federated learning) that may operate in congested and changing environments. In this paper, we study federated learning in the presence of stochastic and dynamic communication failures wherein the uplink between the parameter server and client $i$ is on with unknown probability $p_i^t$ in round $t$. Furthermore, we allow the dynamics of $p_i^t$ to be arbitrary. We first demonstrate that when the $p_i^t$'s vary across clients, the most widely adopted federated learning algorithm, Federated Average (FedAvg), experiences significant bias. To address this observation, we propose Federated Postponed Broadcast (FedPBC), a simple variant of FedAvg. FedPBC differs from FedAvg in that the parameter server postpones broadcasting the global model till the end of each round. Despite uplink failures, we show that FedPBC converges to a stationary point of the original non-convex objective. On the technical front, postponing the global model broadcasts enables implicit gossiping among the clients with active links in round $t$. Despite the time-varying nature of $p_i^t$, we can bound the perturbation of the global model dynamics using techniques to control gossip-type information mixing errors. Extensive experiments have been conducted on real-world datasets over diversified unreliable uplink patterns to corroborate our analysis.
Authors: Amit Tewari
Abstract: A contract is a type of legal document commonly used in organizations. Contract review is an integral and repetitive process to avoid business risk and liability. Contract analysis requires the identification and classification of key provisions and paragraphs within an agreement. Identification and validation of contract clauses can be a time-consuming and challenging task demanding the services of trained and expensive lawyers, paralegals or other legal assistants. Classification of legal provisions in contracts using artificial intelligence and natural language processing is complex due to the requirement of domain-specialized legal language for model training and the scarcity of sufficient labeled data in the legal domain. Using general-purpose models is not effective in this context due to the use of specialized legal vocabulary in contracts which may not be recognized by a general model. To address this problem, we propose the use of a pre-trained large language model which is subsequently calibrated on legal taxonomy. We propose LegalPro-BERT, a BERT transformer architecture model that we fine-tune to efficiently handle classification task for legal provisions. We conducted experiments to measure and compare metrics with current benchmark results. We found that LegalPro-BERT outperforms the previous benchmark used for comparison in this research.
Authors: Immanuel Bomze, Federico D'Onofrio, Laura Palagi, Bo Peng
Abstract: In this paper, we study the embedded feature selection problem in linear Support Vector Machines (SVMs), in which a cardinality constraint is employed, leading to a fully explainable selection model. The problem is NP-hard due to the presence of the cardinality constraint, even though the original linear SVM amounts to a problem solvable in polynomial time. To handle the hard problem, we first introduce two mixed-integer formulations for which novel SDP relaxations are proposed. Exploiting the sparsity pattern of the relaxations, we decompose the problems and obtain equivalent relaxations in a much smaller cone, making the conic approaches scalable. To make the best usage of the decomposed relaxations, we propose heuristics using the information of its optimal solution. Moreover, an exact procedure is proposed by solving a sequence of mixed-integer decomposed SDPs. Numerical results on classical benchmarking datasets are reported, showing the efficiency and effectiveness of our approach.
Authors: Wenwen Lia, Chia-Yu Hsu, Sizhe Wang, Peter Kedron
Abstract: GeoAI has emerged as an exciting interdisciplinary research area that combines spatial theories and data with cutting-edge AI models to address geospatial problems in a novel, data-driven manner. While GeoAI research has flourished in the GIScience literature, its reproducibility and replicability (R&R), fundamental principles that determine the reusability, reliability, and scientific rigor of research findings, have rarely been discussed. This paper aims to provide an in-depth analysis of this topic from both computational and spatial perspectives. We first categorize the major goals for reproducing GeoAI research, namely, validation (repeatability), learning and adapting the method for solving a similar or new problem (reproducibility), and examining the generalizability of the research findings (replicability). Each of these goals requires different levels of understanding of GeoAI, as well as different methods to ensure its success. We then discuss the factors that may cause the lack of R&R in GeoAI research, with an emphasis on (1) the selection and use of training data; (2) the uncertainty that resides in the GeoAI model design, training, deployment, and inference processes; and more importantly (3) the inherent spatial heterogeneity of geospatial data and processes. We use a deep learning-based image analysis task as an example to demonstrate the results' uncertainty and spatial variance caused by different factors. The findings reiterate the importance of knowledge sharing, as well as the generation of a "replicability map" that incorporates spatial autocorrelation and spatial heterogeneity into consideration in quantifying the spatial replicability of GeoAI research.
Authors: Dylan J. Foster, Yanjun Han, Jian Qian, Alexander Rakhlin
Abstract: $ $The classical theory of statistical estimation aims to estimate a parameter of interest under data generated from a fixed design ("offline estimation"), while the contemporary theory of online learning provides algorithms for estimation under adaptively chosen covariates ("online estimation"). Motivated by connections between estimation and interactive decision making, we ask: is it possible to convert offline estimation algorithms into online estimation algorithms in a black-box fashion? We investigate this question from an information-theoretic perspective by introducing a new framework, Oracle-Efficient Online Estimation (OEOE), where the learner can only interact with the data stream indirectly through a sequence of offline estimators produced by a black-box algorithm operating on the stream. Our main results settle the statistical and computational complexity of online estimation in this framework. $\bullet$ Statistical complexity. We show that information-theoretically, there exist algorithms that achieve near-optimal online estimation error via black-box offline estimation oracles, and give a nearly-tight characterization for minimax rates in the OEOE framework. $\bullet$ Computational complexity. We show that the guarantees above cannot be achieved in a computationally efficient fashion in general, but give a refined characterization for the special case of conditional density estimation: computationally efficient online estimation via black-box offline estimation is possible whenever it is possible via unrestricted algorithms. Finally, we apply our results to give offline oracle-efficient algorithms for interactive decision making.
Authors: Neha Gupta, Harikrishna Narasimhan, Wittawat Jitkrittum, Ankit Singh Rawat, Aditya Krishna Menon, Sanjiv Kumar
Abstract: Recent advances in language models (LMs) have led to significant improvements in quality on complex NLP tasks, but at the expense of increased inference costs. Cascading offers a simple strategy to achieve more favorable cost-quality tradeoffs: here, a small model is invoked for most "easy" instances, while a few "hard" instances are deferred to the large model. While the principles underpinning cascading are well-studied for classification tasks - with deferral based on predicted class uncertainty favored theoretically and practically - a similar understanding is lacking for generative LM tasks. In this work, we initiate a systematic study of deferral rules for LM cascades. We begin by examining the natural extension of predicted class uncertainty to generative LM tasks, namely, the predicted sequence uncertainty. We show that this measure suffers from the length bias problem, either over- or under-emphasizing outputs based on their lengths. This is because LMs produce a sequence of uncertainty values, one for each output token; and moreover, the number of output tokens is variable across examples. To mitigate this issue, we propose to exploit the richer token-level uncertainty information implicit in generative LMs. We argue that naive predicted sequence uncertainty corresponds to a simple aggregation of these uncertainties. By contrast, we show that incorporating token-level uncertainty through learned post-hoc deferral rules can significantly outperform such simple aggregation strategies, via experiments on a range of natural language benchmarks with FLAN-T5 models. We further show that incorporating embeddings from the smaller model and intermediate layers of the larger model can give an additional boost in the overall cost-quality tradeoff.
Authors: Tvrtko Tadi\'c, Cassiano Becker, Jennifer Neville
Abstract: Random Projections have been widely used to generate embeddings for various graph tasks due to their computational efficiency. The majority of applications have been justified through the Johnson-Lindenstrauss Lemma. In this paper, we take a step further and investigate how well dot product and cosine similarity are preserved by Random Projections. Our analysis provides new theoretical results, identifies pathological cases, and tests them with numerical experiments. We find that, for nodes of lower or higher degrees, the method produces especially unreliable embeddings for the dot product, regardless of whether the adjacency or the (normalized version) transition is used. With respect to the statistical noise introduced by Random Projections, we show that cosine similarity produces remarkably more precise approximations.
Authors: Mohammed Latif Siddiq, Simantika Dristi, Joy Saha, Joanna C. S. Santos
Abstract: Large Language Models (LLMs) are gaining popularity among software engineers. A crucial aspect of developing effective code-generation LLMs is to evaluate these models using a robust benchmark. Evaluation benchmarks with quality issues can provide a false sense of performance. In this work, we conduct the first-of-its-kind study of the quality of prompts within benchmarks used to compare the performance of different code generation models. To conduct this study, we analyzed 3,566 prompts from 9 code generation benchmarks to identify quality issues in them. We also investigated whether fixing the identified quality issues in the benchmarks' prompts affects a model's performance. We also studied memorization issues of the evaluation dataset, which can put into question a benchmark's trustworthiness. We found that code generation evaluation benchmarks mainly focused on Python and coding exercises and had very limited contextual dependencies to challenge the model. These datasets and the developers' prompts suffer from quality issues like spelling and grammatical errors, unclear sentences to express developers' intent, and not using proper documentation style. Fixing all these issues in the benchmarks can lead to a better performance for Python code generation, but not a significant improvement was observed for Java code generation. We also found evidence that GPT-3.5-Turbo and CodeGen-2.5 models possibly have data contamination issues.
Authors: Amir Erfan Eshratifar, Joao V. B. Soares, Kapil Thadani, Shaunak Mishra, Mikhail Kuznetsov, Yueh-Ning Ku, Paloma de Juan
Abstract: Generating background scenes for salient objects plays a crucial role across various domains including creative design and e-commerce, as it enhances the presentation and context of subjects by integrating them into tailored environments. Background generation can be framed as a task of text-conditioned outpainting, where the goal is to extend image content beyond a salient object's boundaries on a blank background. Although popular diffusion models for text-guided inpainting can also be used for outpainting by mask inversion, they are trained to fill in missing parts of an image rather than to place an object into a scene. Consequently, when used for background creation, inpainting models frequently extend the salient object's boundaries and thereby change the object's identity, which is a phenomenon we call "object expansion." This paper introduces a model for adapting inpainting diffusion models to the salient object outpainting task using Stable Diffusion and ControlNet architectures. We present a series of qualitative and quantitative results across models and datasets, including a newly proposed metric to measure object expansion that does not require any human labeling. Compared to Stable Diffusion 2.0 Inpainting, our proposed approach reduces object expansion by 3.6x on average with no degradation in standard visual metrics across multiple datasets.
Authors: Yue Jiang, Zixin Guo, Hamed Rezazadegan Tavakoli, Luis A. Leiva, Antti Oulasvirta
Abstract: From a visual perception perspective, modern graphical user interfaces (GUIs) comprise a complex graphics-rich two-dimensional visuospatial arrangement of text, images, and interactive objects such as buttons and menus. While existing models can accurately predict regions and objects that are likely to attract attention ``on average'', so far there is no scanpath model capable of predicting scanpaths for an individual. To close this gap, we introduce EyeFormer, which leverages a Transformer architecture as a policy network to guide a deep reinforcement learning algorithm that controls gaze locations. Our model has the unique capability of producing personalized predictions when given a few user scanpath samples. It can predict full scanpath information, including fixation positions and duration, across individuals and various stimulus types. Additionally, we demonstrate applications in GUI layout optimization driven by our model. Our software and models will be publicly available.
Authors: Luffina C. Huang, Darren J. Chiu, Manish Mehta
Abstract: Automated medical diagnosis through image-based neural networks has increased in popularity and matured over years. Nevertheless, it is confined by the scarcity of medical images and the expensive labor annotation costs. Self-Supervised Learning (SSL) is an good alternative to Transfer Learning (TL) and is suitable for imbalanced image datasets. In this study, we assess four pretrained SSL models and two TL models in treatable retinal diseases classification using small-scale Optical Coherence Tomography (OCT) images ranging from 125 to 4000 with balanced or imbalanced distribution for training. The proposed SSL model achieves the state-of-art accuracy of 98.84% using only 4,000 training images. Our results suggest the SSL models provide superior performance under both the balanced and imbalanced training scenarios. The SSL model with MoCo-v2 scheme has consistent good performance under the imbalanced scenario and, especially, surpasses the other models when the training set is less than 500 images.
Authors: Giannis Daras, Alexandros G. Dimakis, Constantinos Daskalakis
Abstract: Ambient diffusion is a recently proposed framework for training diffusion models using corrupted data. Both Ambient Diffusion and alternative SURE-based approaches for learning diffusion models from corrupted data resort to approximations which deteriorate performance. We present the first framework for training diffusion models that provably sample from the uncorrupted distribution given only noisy training data, solving an open problem in this space. Our key technical contribution is a method that uses a double application of Tweedie's formula and a consistency loss function that allows us to extend sampling at noise levels below the observed data noise. We also provide further evidence that diffusion models memorize from their training sets by identifying extremely corrupted images that are almost perfectly reconstructed, raising copyright and privacy concerns. Our method for training using corrupted samples can be used to mitigate this problem. We demonstrate this by fine-tuning Stable Diffusion XL to generate samples from a distribution using only noisy samples. Our framework reduces the amount of memorization of the fine-tuning dataset, while maintaining competitive performance.
Authors: SIMA Team, Maria Abi Raad, Arun Ahuja, Catarina Barros, Frederic Besse, Andrew Bolt, Adrian Bolton, Bethanie Brownfield, Gavin Buttimore, Max Cant, Sarah Chakera, Stephanie C. Y. Chan, Jeff Clune, Adrian Collister, Vikki Copeman, Alex Cullum, Ishita Dasgupta, Dario de Cesare, Julia Di Trapani, Yani Donchev, Emma Dunleavy, Martin Engelcke, Ryan Faulkner, Frankie Garcia, Charles Gbadamosi, Zhitao Gong, Lucy Gonzales, Kshitij Gupta, Karol Gregor, Arne Olav Hallingstad, Tim Harley, Sam Haves, Felix Hill, Ed Hirst, Drew A. Hudson, Jony Hudson, Steph Hughes-Fitt, Danilo J. Rezende, Mimi Jasarevic, Laura Kampis, Rosemary Ke, Thomas Keck, Junkyung Kim, Oscar Knagg, Kavya Kopparapu, Andrew Lampinen, Shane Legg, Alexander Lerchner, Marjorie Limont, Yulan Liu, Maria Loks-Thompson, Joseph Marino, Kathryn Martin Cussons, Loic Matthey, Siobhan Mcloughlin, Piermaria Mendolicchio, Hamza Merzic, Anna Mitenkova, Alexandre Moufarek, Valeria Oliveira, Yanko Oliveira, Hannah Openshaw, Renke Pan, Aneesh Pappu, Alex Platonov, Ollie Purkiss, David Reichert, John Reid, Pierre Harvey Richemond, Tyson Roberts, Giles Ruscoe, Jaume Sanchez Elias, Tasha Sandars, Daniel P. Sawyer, Tim Scholtes, Guy Simmons, Daniel Slater, Hubert Soyer, Heiko Strathmann, Peter Stys, Allison C. Tam, Denis Teplyashin, Tayfun Terzi, Davide Vercelli, Bojan Vujatovic, Marcus Wainwright, Jane X. Wang, Zhengdong Wang, Daan Wierstra, Duncan Williams, Nathaniel Wong, Sarah York, Nick Young
Abstract: Building embodied AI systems that can follow arbitrary language instructions in any 3D environment is a key challenge for creating general AI. Accomplishing this goal requires learning to ground language in perception and embodied actions, in order to accomplish complex tasks. The Scalable, Instructable, Multiworld Agent (SIMA) project tackles this by training agents to follow free-form instructions across a diverse range of virtual 3D environments, including curated research environments as well as open-ended, commercial video games. Our goal is to develop an instructable agent that can accomplish anything a human can do in any simulated 3D environment. Our approach focuses on language-driven generality while imposing minimal assumptions. Our agents interact with environments in real-time using a generic, human-like interface: the inputs are image observations and language instructions and the outputs are keyboard-and-mouse actions. This general approach is challenging, but it allows agents to ground language across many visually complex and semantically rich environments while also allowing us to readily run agents in new environments. In this paper we describe our motivation and goal, the initial progress we have made, and promising preliminary results on several diverse research environments and a variety of commercial video games.
Authors: Zelin Wu, Gan Song, Christopher Li, Pat Rondon, Zhong Meng, Xavier Velez, Weiran Wang, Diamantino Caseiro, Golan Pundak, Tsendsuren Munkhdalai, Angad Chandorkar, Rohit Prabhavalkar
Abstract: Contextual biasing enables speech recognizers to transcribe important phrases in the speaker's context, such as contact names, even if they are rare in, or absent from, the training data. Attention-based biasing is a leading approach which allows for full end-to-end cotraining of the recognizer and biasing system and requires no separate inference-time components. Such biasers typically consist of a context encoder; followed by a context filter which narrows down the context to apply, improving per-step inference time; and, finally, context application via cross attention. Though much work has gone into optimizing per-frame performance, the context encoder is at least as important: recognition cannot begin before context encoding ends. Here, we show the lightweight phrase selection pass can be moved before context encoding, resulting in a speedup of up to 16.1 times and enabling biasing to scale to 20K phrases with a maximum pre-decoding delay under 33ms. With the addition of phrase- and wordpiece-level cross-entropy losses, our technique also achieves up to a 37.5% relative WER reduction over the baseline without the losses and lightweight phrase selection pass.
Authors: Hilal Asi, Vitaly Feldman, Jelani Nelson, Huy L. Nguyen, Samson Zhou, Kunal Talwar
Abstract: We study the problem of private vector mean estimation in the shuffle model of privacy where $n$ users each have a unit vector $v^{(i)} \in\mathbb{R}^d$. We propose a new multi-message protocol that achieves the optimal error using $\tilde{\mathcal{O}}\left(\min(n\varepsilon^2,d)\right)$ messages per user. Moreover, we show that any (unbiased) protocol that achieves optimal error requires each user to send $\Omega(\min(n\varepsilon^2,d)/\log(n))$ messages, demonstrating the optimality of our message complexity up to logarithmic factors. Additionally, we study the single-message setting and design a protocol that achieves mean squared error $\mathcal{O}(dn^{d/(d+2)}\varepsilon^{-4/(d+2)})$. Moreover, we show that any single-message protocol must incur mean squared error $\Omega(dn^{d/(d+2)})$, showing that our protocol is optimal in the standard setting where $\varepsilon = \Theta(1)$. Finally, we study robustness to malicious users and show that malicious users can incur large additive error with a single shuffler.
Authors: Md Kamrul Hossain Siam, Manidipa Bhattacharjee, Shakik Mahmud, Md. Saem Sarkar, Md. Masud Rana
Abstract: The Machine learning (ML) is a rapidly evolving field of technology that has the potential to greatly impact society in a variety of ways. However, there are also concerns about the potential negative effects of ML on society, such as job displacement and privacy issues. This research aimed to conduct a comprehensive analysis of the current and future impact of ML on society. The research included a thorough literature review, case studies, and surveys to gather data on the economic impact of ML, ethical and privacy implications, and public perceptions of the technology. The survey was conducted on 150 respondents from different areas. The case studies conducted were on the impact of ML on healthcare, finance, transportation, and manufacturing. The findings of this research revealed that the majority of respondents have a moderate level of familiarity with the concept of ML, believe that it has the potential to benefit society, and think that society should prioritize the development and use of ML. Based on these findings, it was recommended that more research is conducted on the impact of ML on society, stronger regulations and laws to protect the privacy and rights of individuals when it comes to ML should be developed, transparency and accountability in ML decision-making processes should be increased, and public education and awareness about ML should be enhanced.
Authors: Ruibo Yang, Jiazhou Wang, Andrew Mullhaupt
Abstract: In this paper, we study the stochastic multi-armed bandit problem, where the reward is driven by an unknown random variable. We propose a new variant of the Upper Confidence Bound (UCB) algorithm called Hellinger-UCB, which leverages the squared Hellinger distance to build the upper confidence bound. We prove that the Hellinger-UCB reaches the theoretical lower bound. We also show that the Hellinger-UCB has a solid statistical interpretation. We show that Hellinger-UCB is effective in finite time horizons with numerical experiments between Hellinger-UCB and other variants of the UCB algorithm. As a real-world example, we apply the Hellinger-UCB algorithm to solve the cold-start problem for a content recommender system of a financial app. With reasonable assumption, the Hellinger-UCB algorithm has a convenient but important lower latency feature. The online experiment also illustrates that the Hellinger-UCB outperforms both KL-UCB and UCB1 in the sense of a higher click-through rate (CTR).
Authors: Siqiao Xue, Danrui Qi, Caigao Jiang, Wenhui Shi, Fangyin Cheng, Keting Chen, Hongjun Yang, Zhiping Zhang, Jianshan He, Hongyang Zhang, Ganglin Wei, Wang Zhao, Fan Zhou, Hong Yi, Shaodong Liu, Hongjun Yang, Faqiang Chen
Abstract: The recent breakthroughs in large language models (LLMs) are positioned to transition many areas of software. The technologies of interacting with data particularly have an important entanglement with LLMs as efficient and intuitive data interactions are paramount. In this paper, we present DB-GPT, a revolutionary and product-ready Python library that integrates LLMs into traditional data interaction tasks to enhance user experience and accessibility. DB-GPT is designed to understand data interaction tasks described by natural language and provide context-aware responses powered by LLMs, making it an indispensable tool for users ranging from novice to expert. Its system design supports deployment across local, distributed, and cloud environments. Beyond handling basic data interaction tasks like Text-to-SQL with LLMs, it can handle complex tasks like generative data analysis through a Multi-Agents framework and the Agentic Workflow Expression Language (AWEL). The Service-oriented Multi-model Management Framework (SMMF) ensures data privacy and security, enabling users to employ DB-GPT with private LLMs. Additionally, DB-GPT offers a series of product-ready features designed to enable users to integrate DB-GPT within their product environments easily. The code of DB-GPT is available at Github(https://github.com/eosphoros-ai/DB-GPT) which already has over 10.7k stars.
Authors: Peiyuan Zhi, Zhiyuan Zhang, Muzhi Han, Zeyu Zhang, Zhitian Li, Ziyuan Jiao, Baoxiong Jia, Siyuan Huang
Abstract: Autonomous robot navigation and manipulation in open environments require reasoning and replanning with closed-loop feedback. We present COME-robot, the first closed-loop framework utilizing the GPT-4V vision-language foundation model for open-ended reasoning and adaptive planning in real-world scenarios. We meticulously construct a library of action primitives for robot exploration, navigation, and manipulation, serving as callable execution modules for GPT-4V in task planning. On top of these modules, GPT-4V serves as the brain that can accomplish multimodal reasoning, generate action policy with code, verify the task progress, and provide feedback for replanning. Such design enables COME-robot to (i) actively perceive the environments, (ii) perform situated reasoning, and (iii) recover from failures. Through comprehensive experiments involving 8 challenging real-world tabletop and manipulation tasks, COME-robot demonstrates a significant improvement in task success rate (~25%) compared to state-of-the-art baseline methods. We further conduct comprehensive analyses to elucidate how COME-robot's design facilitates failure recovery, free-form instruction following, and long-horizon task planning.
Authors: Elham J. Barezi, Parisa Kordjamshidi
Abstract: We analyze knowledge-based visual question answering, for which given a question, the models need to ground it into the visual modality and retrieve the relevant knowledge from a given large knowledge base (KB) to be able to answer. Our analysis has two folds, one based on designing neural architectures and training them from scratch, and another based on large pre-trained language models (LLMs). Our research questions are: 1) Can we effectively augment models by explicit supervised retrieval of the relevant KB information to solve the KB-VQA problem? 2) How do task-specific and LLM-based models perform in the integration of visual and external knowledge, and multi-hop reasoning over both sources of information? 3) Is the implicit knowledge of LLMs sufficient for KB-VQA and to what extent it can replace the explicit KB? Our results demonstrate the positive impact of empowering task-specific and LLM models with supervised external and visual knowledge retrieval models. Our findings show that though LLMs are stronger in 1-hop reasoning, they suffer in 2-hop reasoning in comparison with our fine-tuned NN model even if the relevant information from both modalities is available to the model. Moreover, we observed that LLM models outperform the NN model for KB-related questions which confirms the effectiveness of implicit knowledge in LLMs however, they do not alleviate the need for external KB.
Authors: Oren Kraus, Kian Kenyon-Dean, Saber Saberian, Maryam Fallah, Peter McLean, Jess Leung, Vasudev Sharma, Ayla Khan, Jia Balakrishnan, Safiye Celik, Dominique Beaini, Maciej Sypetkowski, Chi Vicky Cheng, Kristen Morse, Maureen Makes, Ben Mabey, Berton Earnshaw
Abstract: Featurizing microscopy images for use in biological research remains a significant challenge, especially for large-scale experiments spanning millions of images. This work explores the scaling properties of weakly supervised classifiers and self-supervised masked autoencoders (MAEs) when training with increasingly larger model backbones and microscopy datasets. Our results show that ViT-based MAEs outperform weakly supervised classifiers on a variety of tasks, achieving as much as a 11.5% relative improvement when recalling known biological relationships curated from public databases. Additionally, we develop a new channel-agnostic MAE architecture (CA-MAE) that allows for inputting images of different numbers and orders of channels at inference time. We demonstrate that CA-MAEs effectively generalize by inferring and evaluating on a microscopy image dataset (JUMP-CP) generated under different experimental conditions with a different channel structure than our pretraining data (RPI-93M). Our findings motivate continued research into scaling self-supervised learning on microscopy data in order to create powerful foundation models of cellular biology that have the potential to catalyze advancements in drug discovery and beyond.
Authors: Tunazzina Islam, Dan Goldwasser
Abstract: The widespread use of social media has led to a surge in popularity for automated methods of analyzing public opinion. Supervised methods are adept at text categorization, yet the dynamic nature of social media discussions poses a continual challenge for these techniques due to the constant shifting of the focus. On the other hand, traditional unsupervised methods for extracting themes from public discourse, such as topic modeling, often reveal overarching patterns that might not capture specific nuances. Consequently, a significant portion of research into social media discourse still depends on labor-intensive manual coding techniques and a human-in-the-loop approach, which are both time-consuming and costly. In this work, we study the problem of discovering arguments associated with a specific theme. We propose a generic LLMs-in-the-Loop strategy that leverages the advanced capabilities of Large Language Models (LLMs) to extract latent arguments from social media messaging. To demonstrate our approach, we apply our framework to contentious topics. We use two publicly available datasets: (1) the climate campaigns dataset of 14k Facebook ads with 25 themes and (2) the COVID-19 vaccine campaigns dataset of 9k Facebook ads with 14 themes. Furthermore, we analyze demographic targeting and the adaptation of messaging based on real-world events.
Authors: Eduardo Fernandes Montesuma, Fred Ngol\`e Mboula, Antoine Souloumiac
Abstract: In this paper, we tackle Multi-Source Domain Adaptation (MSDA), a task in transfer learning where one adapts multiple heterogeneous, labeled source probability measures towards a different, unlabeled target measure. We propose a novel framework for MSDA, based on Optimal Transport (OT) and Gaussian Mixture Models (GMMs). Our framework has two key advantages. First, OT between GMMs can be solved efficiently via linear programming. Second, it provides a convenient model for supervised learning, especially classification, as components in the GMM can be associated with existing classes. Based on the GMM-OT problem, we propose a novel technique for calculating barycenters of GMMs. Based on this novel algorithm, we propose two new strategies for MSDA: GMM-WBT and GMM-DaDiL. We empirically evaluate our proposed methods on four benchmarks in image classification and fault diagnosis, showing that we improve over the prior art while being faster and involving fewer parameters.
Authors: R V Raghavendra Rao, U Srinivasulu Reddy
Abstract: The challenge of imbalanced soil nutrient datasets significantly hampers accurate predictions of soil fertility. To tackle this, a new method is suggested in this research, combining Uniform Manifold Approximation and Projection (UMAP) with Least Absolute Shrinkage and Selection Operator (LASSO). The main aim is to counter the impact of uneven data distribution and improve soil fertility models' predictive precision. The model introduced uses Sparse Attention Regression, effectively incorporating pertinent features from the imbalanced dataset. UMAP is utilized initially to reduce data complexity, unveiling hidden structures and important patterns. Following this, LASSO is applied to refine features and enhance the model's interpretability. The experimental outcomes highlight the effectiveness of the UMAP and LASSO hybrid approach. The proposed model achieves outstanding performance metrics, reaching a predictive accuracy of 98%, demonstrating its capability in accurate soil fertility predictions. Additionally, it showcases a Precision of 91.25%, indicating its adeptness in identifying fertile soil instances accurately. The Recall metric stands at 90.90%, emphasizing the model's ability to capture true positive cases effectively.
Authors: Zach Evans, Julian D. Parker, CJ Carr, Zack Zukowski, Josiah Taylor, Jordi Pons
Abstract: Audio-based generative models for music have seen great strides recently, but so far have not managed to produce full-length music tracks with coherent musical structure. We show that by training a generative model on long temporal contexts it is possible to produce long-form music of up to 4m45s. Our model consists of a diffusion-transformer operating on a highly downsampled continuous latent representation (latent rate of 21.5Hz). It obtains state-of-the-art generations according to metrics on audio quality and prompt alignment, and subjective tests reveal that it produces full-length music with coherent structure.
Authors: Kaibo Liu, Yiyang Liu, Zhenpeng Chen, Jie M. Zhang, Yudong Han, Yun Ma, Ge Li, Gang Huang
Abstract: Conventional automated test generation tools struggle to generate test oracles and tricky bug-revealing test inputs. Large Language Models (LLMs) can be prompted to produce test inputs and oracles for a program directly, but the precision of the tests can be very low for complex scenarios (only 6.3% based on our experiments). To fill this gap, this paper proposes AID, which combines LLMs with differential testing to generate fault-revealing test inputs and oracles targeting plausibly correct programs (i.e., programs that have passed all the existing tests). In particular, AID selects test inputs that yield diverse outputs on a set of program variants generated by LLMs, then constructs the test oracle based on the outputs. We evaluate AID on two large-scale datasets with tricky bugs: TrickyBugs and EvalPlus, and compare it with three state-of-the-art baselines. The evaluation results show that the recall, precision, and F1 score of AID outperform the state-of-the-art by up to 1.80x, 2.65x, and 1.66x, respectively.
Authors: Hassam Khan Wazir, Zaid Waghoo, Vikram Kapila
Abstract: Several therapy routines require deep breathing exercises as a key component and patients undergoing such therapies must perform these exercises regularly. Assessing the outcome of a therapy and tailoring its course necessitates monitoring a patient's compliance with the therapy. While therapy compliance monitoring is routine in a clinical environment, it is challenging to do in an at-home setting. This is so because a home setting lacks access to specialized equipment and skilled professionals needed to effectively monitor the performance of a therapy routine by a patient. For some types of therapies, these challenges can be addressed with the use of consumer-grade hardware, such as earphones and smartphones, as practical solutions. To accurately monitor breathing exercises using wireless earphones, this paper proposes a framework that has the potential for assessing a patient's compliance with an at-home therapy. The proposed system performs real-time detection of breathing phases and channels with high accuracy by processing a $\mathbf{500}$ ms audio signal through two convolutional neural networks. The first network, called a channel classifier, distinguishes between nasal and oral breathing, and a pause. The second network, called a phase classifier, determines whether the audio segment is from inhalation or exhalation. According to $k$-fold cross-validation, the channel and phase classifiers achieved a maximum F1 score of $\mathbf{97.99\%}$ and $\mathbf{89.46\%}$, respectively. The results demonstrate the potential of using commodity earphones for real-time breathing channel and phase detection for breathing therapy compliance monitoring.
Authors: Alexey Kornaev, Elena Kornaeva, Oleg Ivanov, Ilya Pershin, Danis Alukaev
Abstract: One of the ways to make artificial intelligence more natural is to give it some room for doubt. Two main questions should be resolved in that way. First, how to train a model to estimate uncertainties of its own predictions? And then, what to do with the uncertain predictions if they appear? First, we proposed an uncertainty-aware negative log-likelihood loss for the case of N-dimensional multivariate normal distribution with spherical variance matrix to the solution of N-classes classification tasks. The loss is similar to the heteroscedastic regression loss. The proposed model regularizes uncertain predictions, and trains to calculate both the predictions and their uncertainty estimations. The model fits well with the label smoothing technique. Second, we expanded the limits of data augmentation at the training and test stages, and made the trained model to give multiple predictions for a given number of augmented versions of each test sample. Given the multi-view predictions together with their uncertainties and confidences, we proposed several methods to calculate final predictions, including mode values and bin counts with soft and hard weights. For the latter method, we formalized the model tuning task in the form of multimodal optimization with non-differentiable criteria of maximum accuracy, and applied particle swarm optimization to solve the tuning task. The proposed methodology was tested using CIFAR-10 dataset with clean and noisy labels and demonstrated good results in comparison with other uncertainty estimation methods related to sample selection, co-teaching, and label smoothing.
Authors: Danil Afonchikov, Elena Kornaeva, Irina Makovik, Alexey Kornaev
Abstract: Cells count become a challenging problem when the cells move in a continuous stream, and their boundaries are difficult for visual detection. To resolve this problem we modified the training and decision making processes using curriculum learning and multi-view predictions techniques, respectively.
Authors: Luke W. Yerbury, Ricardo J. G. B. Campello, G. C. Livingston Jr, Mark Goldsworthy, Lachlan O'Neil
Abstract: Relative Validity Indices (RVIs) such as the Silhouette Width Criterion, Calinski-Harabasz and Davie's Bouldin indices are the most popular tools for evaluating and optimising applications of clustering. Their ability to rank collections of candidate partitions has been used to guide the selection of the number of clusters, and to compare partitions from different clustering algorithms. Beyond these more conventional tasks, many examples can be found in the literature where RVIs have been used to compare and select other aspects of clustering approaches such as data normalisation procedures, data representation methods, and distance measures. The authors are not aware of any studies that have attempted to establish the suitability of RVIs for such comparisons. Moreover, given the impact of these aspects on pairwise similarities, it is not even immediately obvious how RVIs should be implemented when comparing these aspects. In this study, we conducted experiments with seven common RVIs on over 2.7 million clustering partitions for both synthetic and real-world datasets, encompassing feature-vector and time-series data. Our findings suggest that RVIs are not well-suited to these unconventional tasks, and that conclusions drawn from such applications may be misleading. It is recommended that normalisation procedures, representation methods, and distance measures instead be selected using external validation on high quality labelled datasets or carefully designed outcome-oriented objective criteria, both of which should be informed by relevant domain knowledge and clustering aims.
Authors: Ruifeng Li, Dongzhan Zhou, Ancheng Shen, Ao Zhang, Mao Su, Mingqian Li, Hongyang Chen, Gang Chen, Yin Zhang, Shufei Zhang, Yuqiang Li, Wanli Ouyang
Abstract: Artificial intelligence (AI) technology has demonstrated remarkable potential in drug dis-covery, where pharmacokinetics plays a crucial role in determining the dosage, safety, and efficacy of new drugs. A major challenge for AI-driven drug discovery (AIDD) is the scarcity of high-quality data, which often requires extensive wet-lab work. A typical example of this is pharmacokinetic experiments. In this work, we develop a physical formula enhanced mul-ti-task learning (PEMAL) method that predicts four key parameters of pharmacokinetics simultaneously. By incorporating physical formulas into the multi-task framework, PEMAL facilitates effective knowledge sharing and target alignment among the pharmacokinetic parameters, thereby enhancing the accuracy of prediction. Our experiments reveal that PEMAL significantly lowers the data demand, compared to typical Graph Neural Networks. Moreover, we demonstrate that PEMAL enhances the robustness to noise, an advantage that conventional Neural Networks do not possess. Another advantage of PEMAL is its high flexibility, which can be potentially applied to other multi-task machine learning scenarios. Overall, our work illustrates the benefits and potential of using PEMAL in AIDD and other scenarios with data scarcity and noise.
Authors: Yongming Huang, Xiaohu You, Hang Zhan, Shiwen He, Ningning Fu, Wei Xu
Abstract: Intelligent communications have played a pivotal role in shaping the evolution of 6G networks. Native artificial intelligence (AI) within green communication systems must meet stringent real-time requirements. To achieve this, deploying lightweight and resource-efficient AI models is necessary. However, as wireless networks generate a multitude of data fields and indicators during operation, only a fraction of them imposes significant impact on the network AI models. Therefore, real-time intelligence of communication systems heavily relies on a small but critical set of the data that profoundly influences the performance of network AI models. These challenges underscore the need for innovative architectures and solutions. In this paper, we propose a solution, termed the pervasive multi-level (PML) native AI architecture, which integrates the concept of knowledge graph (KG) into the intelligent operational manipulations of mobile networks, resulting in the establishment of a wireless data KG. Leveraging the wireless data KG, we characterize the massive and complex data collected from wireless communication networks and analyze the relationships among various data fields. The obtained graph of data field relations enables the on-demand generation of minimal and effective datasets, referred to as feature datasets, tailored to specific application requirements. Consequently, this architecture not only enhances AI training, inference, and validation processes but also significantly reduces resource wastage and overhead for communication networks. To implement this architecture, we have developed a specific solution comprising a spatio-temporal heterogeneous graph attention neural network model (STREAM) as well as a feature dataset generation algorithm. Experiments are conducted to validate the effectiveness of the proposed architecture.
Authors: Jiawen Xu
Abstract: Open set recognition (OSR) is a critical aspect of machine learning, addressing the challenge of detecting novel classes during inference. Within the realm of deep learning, neural classifiers trained on a closed set of data typically struggle to identify novel classes, leading to erroneous predictions. To address this issue, various heuristic methods have been proposed, allowing models to express uncertainty by stating "I don't know." However, a gap in the literature remains, as there has been limited exploration of the underlying mechanisms of these methods. In this paper, we conduct an analysis of open set recognition methods, focusing on the aspect of feature diversity. Our research reveals a significant correlation between learning diverse discriminative features and enhancing OSR performance. Building on this insight, we propose a novel OSR approach that leverages the advantages of feature diversity. The efficacy of our method is substantiated through rigorous evaluation on a standard OSR testbench, demonstrating a substantial improvement over state-of-the-art methods.
Authors: Ivan DeAndres-Tame, Ruben Tolosana, Pietro Melzi, Ruben Vera-Rodriguez, Minchul Kim, Christian Rathgeb, Xiaoming Liu, Aythami Morales, Julian Fierrez, Javier Ortega-Garcia, Zhizhou Zhong, Yuge Huang, Yuxi Mi, Shouhong Ding, Shuigeng Zhou, Shuai He, Lingzhi Fu, Heng Cong, Rongyu Zhang, Zhihong Xiao, Evgeny Smirnov, Anton Pimenov, Aleksei Grigorev, Denis Timoshenko, Kaleb Mesfin Asfaw, Cheng Yaw Low, Hao Liu, Chuyi Wang, Qing Zuo, Zhixiang He, Hatef Otroshi Shahreza, Anjith George, Alexander Unnervik, Parsa Rahimi, S\'ebastien Marcel, Pedro C. Neto, Marco Huber, Jan Niklas Kolf, Naser Damer, Fadi Boutros, Jaime S. Cardoso, Ana F. Sequeira, Andrea Atzori, Gianni Fenu, Mirko Marras, Vitomir \v{S}truc, Jiang Yu, Zhangjie Li, Jichun Li, Weisong Zhao, Zhen Lei, Xiangyu Zhu, Xiao-Yu Zhang, Bernardo Biesseck, Pedro Vidal, Luiz Coelho, Roger Granada, David Menotti
Abstract: Synthetic data is gaining increasing relevance for training machine learning models. This is mainly motivated due to several factors such as the lack of real data and intra-class variability, time and errors produced in manual labeling, and in some cases privacy concerns, among others. This paper presents an overview of the 2nd edition of the Face Recognition Challenge in the Era of Synthetic Data (FRCSyn) organized at CVPR 2024. FRCSyn aims to investigate the use of synthetic data in face recognition to address current technological limitations, including data privacy concerns, demographic biases, generalization to novel scenarios, and performance constraints in challenging situations such as aging, pose variations, and occlusions. Unlike the 1st edition, in which synthetic data from DCFace and GANDiffFace methods was only allowed to train face recognition systems, in this 2nd edition we propose new sub-tasks that allow participants to explore novel face generative methods. The outcomes of the 2nd FRCSyn Challenge, along with the proposed experimental protocol and benchmarking contribute significantly to the application of synthetic data to face recognition.
Authors: Noah Lewis, Jean Luca Bez, Suren Byna
Abstract: High-Performance Computing (HPC) systems excel in managing distributed workloads, and the growing interest in Artificial Intelligence (AI) has resulted in a surge in demand for faster methods of Machine Learning (ML) model training and inference. In the past, research on HPC I/O focused on optimizing the underlying storage system for modeling and simulation applications and checkpointing the results, causing writes to be the dominant I/O operation. These applications typically access large portions of the data written by simulations or experiments. ML workloads, in contrast, perform small I/O reads spread across a large number of random files. This shift of I/O access patterns poses several challenges to HPC storage systems. In this paper, we survey I/O in ML applications on HPC systems, and target literature within a 6-year time window from 2019 to 2024. We provide an overview of the common phases of ML, review available profilers and benchmarks, examine the I/O patterns encountered during ML training, explore I/O optimizations utilized in modern ML frameworks and proposed in recent literature, and lastly, present gaps requiring further R&D. We seek to summarize the common practices used in accessing data by ML applications and expose research gaps that could spawn further R&D.
Authors: Hao Feng, Yuanzhe Jia, Ruijia Xu, Mukesh Prasad, Ali Anaissi, Ali Braytee
Abstract: Image recognition techniques heavily rely on abundant labeled data, particularly in medical contexts. Addressing the challenges associated with obtaining labeled data has led to the prominence of self-supervised learning and semi-supervised learning, especially in scenarios with limited annotated data. In this paper, we proposed an innovative approach by integrating self-supervised learning into semi-supervised models to enhance medical image recognition. Our methodology commences with pre-training on unlabeled data utilizing the BYOL method. Subsequently, we merge pseudo-labeled and labeled datasets to construct a neural network classifier, refining it through iterative fine-tuning. Experimental results on three different datasets demonstrate that our approach optimally leverages unlabeled data, outperforming existing methods in terms of accuracy for medical image recognition.
Authors: Tiannuo Yang, Wen Hu, Wangqi Peng, Yusen Li, Jianguo Li, Gang Wang, Xiaoguang Liu
Abstract: Vector data management systems (VDMSs) have become an indispensable cornerstone in large-scale information retrieval and machine learning systems like large language models. To enhance the efficiency and flexibility of similarity search, VDMS exposes many tunable index parameters and system parameters for users to specify. However, due to the inherent characteristics of VDMS, automatic performance tuning for VDMS faces several critical challenges, which cannot be well addressed by the existing auto-tuning methods. In this paper, we introduce VDTuner, a learning-based automatic performance tuning framework for VDMS, leveraging multi-objective Bayesian optimization. VDTuner overcomes the challenges associated with VDMS by efficiently exploring a complex multi-dimensional parameter space without requiring any prior knowledge. Moreover, it is able to achieve a good balance between search speed and recall rate, delivering an optimal configuration. Extensive evaluations demonstrate that VDTuner can markedly improve VDMS performance (14.12% in search speed and 186.38% in recall rate) compared with default setting, and is more efficient compared with state-of-the-art baselines (up to 3.57 times faster in terms of tuning time). In addition, VDTuner is scalable to specific user preference and cost-aware optimization objective. VDTuner is available online at https://github.com/tiannuo-yang/VDTuner.
Authors: Christian Tinauer, Anna Damulina, Maximilian Sackl, Martin Soellradl, Reduan Achtibat, Maximilian Dreyer, Frederik Pahde, Sebastian Lapuschkin, Reinhold Schmidt, Stefan Ropele, Wojciech Samek, Christian Langkammer
Abstract: Motivation. While recent studies show high accuracy in the classification of Alzheimer's disease using deep neural networks, the underlying learned concepts have not been investigated. Goals. To systematically identify changes in brain regions through concepts learned by the deep neural network for model validation. Approach. Using quantitative R2* maps we separated Alzheimer's patients (n=117) from normal controls (n=219) by using a convolutional neural network and systematically investigated the learned concepts using Concept Relevance Propagation and compared these results to a conventional region of interest-based analysis. Results. In line with established histological findings and the region of interest-based analyses, highly relevant concepts were primarily found in and adjacent to the basal ganglia. Impact. The identification of concepts learned by deep neural networks for disease classification enables validation of the models and could potentially improve reliability.
Authors: Rui Qiu, Zhou Yu, Zhenhua Lin
Abstract: This paper explores the field of semi-supervised Fr\'echet regression, driven by the significant costs associated with obtaining non-Euclidean labels. Methodologically, we propose two novel methods: semi-supervised NW Fr\'echet regression and semi-supervised kNN Fr\'echet regression, both based on graph distance acquired from all feature instances. These methods extend the scope of existing semi-supervised Euclidean regression methods. We establish their convergence rates with limited labeled data and large amounts of unlabeled data, taking into account the low-dimensional manifold structure of the feature space. Through comprehensive simulations across diverse settings and applications to real data, we demonstrate the superior performance of our methods over their supervised counterparts. This study addresses existing research gaps and paves the way for further exploration and advancements in the field of semi-supervised Fr\'echet regression.
Authors: Viny Saajan Victor, Manuel Ettm\"uller, Andre Schmei{\ss}er, Heike Leitte, Simone Gramsch
Abstract: Several numerical differential equation solvers have been employed effectively over the years as an alternative to analytical solvers to quickly and conveniently solve differential equations. One category of these is boundary value solvers, which are used to solve real-world problems formulated as differential equations with boundary conditions. These solvers require certain numerical settings to solve the differential equations that affect their solvability and performance. A systematic fine-tuning of these settings is required to obtain the desired solution and performance. Currently, these settings are either selected by trial and error or require domain expertise. In this paper, we propose a machine learning-based optimization workflow for fine-tuning the numerical settings to reduce the time and domain expertise required in the process. In the evaluation section, we discuss the scalability, stability, and reliability of the proposed workflow. We demonstrate our workflow on a numerical boundary value problem solver.
Authors: Zhige Chen, Rui Yang, Mengjie Huang, Chengxuan Qin, Zidong Wang
Abstract: Because of "the non-repeatability of the experiment settings and conditions" and "the variability of brain patterns among subjects", the data distributions across sessions and electrodes are different in cross-subject motor imagery (MI) studies, eventually reducing the performance of the classification model. Systematically summarised based on the existing studies, a novel temporal-electrode data distribution problem is investigated under both intra-subject and inter-subject scenarios in this paper. Based on the presented issue, a novel bridging domain adaptation network (BDAN) is proposed, aiming to minimise the data distribution difference across sessions in the aspect of the electrode, thus improving and enhancing model performance. In the proposed BDAN, deep features of all the EEG data are extracted via a specially designed spatial feature extractor. With the obtained spatio-temporal features, a special generative bridging domain is established, bridging the data from all the subjects across sessions. The difference across sessions and electrodes is then minimized using the customized bridging loss functions, and the known knowledge is automatically transferred through the constructed bridging domain. To show the effectiveness of the proposed BDAN, comparison experiments and ablation studies are conducted on a public EEG dataset. The overall comparison results demonstrate the superior performance of the proposed BDAN compared with the other advanced deep learning and domain adaptation methods.
Authors: Ke Zhu, Liang Zhao, Zheng Ge, Xiangyu Zhang
Abstract: This paper makes the first attempt towards unsupervised preference alignment in Vision-Language Models (VLMs). We generate chosen and rejected responses with regard to the original and augmented image pairs, and conduct preference alignment with direct preference optimization. It is based on a core idea: properly designed augmentation to the image input will induce VLM to generate false but hard negative responses, which helps the model to learn from and produce more robust and powerful answers. The whole pipeline no longer hinges on supervision from GPT4 or human involvement during alignment, and is highly efficient with few lines of code. With only 8k randomly sampled unsupervised data, it achieves 90\% relative score to GPT-4 on complex reasoning in LLaVA-Bench, and improves LLaVA-7B/13B by 6.7\%/5.6\% score on complex multi-modal benchmark MM-Vet. Visualizations shows its improved ability to align with user-intentions. A series of ablations are firmly conducted to reveal the latent mechanism of the approach, which also indicates its potential towards further scaling. Code will be available.
Authors: Moshe Berchansky, Daniel Fleischer, Moshe Wasserblat, Peter Izsak
Abstract: State-of-the-art performance in QA tasks is currently achieved by systems employing Large Language Models (LLMs), however these models tend to hallucinate information in their responses. One approach focuses on enhancing the generation process by incorporating attribution from the given input to the output. However, the challenge of identifying appropriate attributions and verifying their accuracy against a source is a complex task that requires significant improvements in assessing such systems. We introduce an attribution-oriented Chain-of-Thought reasoning method to enhance the accuracy of attributions. This approach focuses the reasoning process on generating an attribution-centric output. Evaluations on two context-enhanced question-answering datasets using GPT-4 demonstrate improved accuracy and correctness of attributions. In addition, the combination of our method with finetuning enhances the response and attribution accuracy of two smaller LLMs, showing their potential to outperform GPT-4 in some cases.
Authors: Manideep Reddy Aliminati, Bharatesh Chakravarthi, Aayush Atul Verma, Arpitsinh Vaghela, Hua Wei, Xuesong Zhou, Yezhou Yang
Abstract: Recently, event-based vision sensors have gained attention for autonomous driving applications, as conventional RGB cameras face limitations in handling challenging dynamic conditions. However, the availability of real-world and synthetic event-based vision datasets remains limited. In response to this gap, we present SEVD, a first-of-its-kind multi-view ego, and fixed perception synthetic event-based dataset using multiple dynamic vision sensors within the CARLA simulator. Data sequences are recorded across diverse lighting (noon, nighttime, twilight) and weather conditions (clear, cloudy, wet, rainy, foggy) with domain shifts (discrete and continuous). SEVD spans urban, suburban, rural, and highway scenes featuring various classes of objects (car, truck, van, bicycle, motorcycle, and pedestrian). Alongside event data, SEVD includes RGB imagery, depth maps, optical flow, semantic, and instance segmentation, facilitating a comprehensive understanding of the scene. Furthermore, we evaluate the dataset using state-of-the-art event-based (RED, RVT) and frame-based (YOLOv8) methods for traffic participant detection tasks and provide baseline benchmarks for assessment. Additionally, we conduct experiments to assess the synthetic event-based dataset's generalization capabilities. The dataset is available at https://eventbasedvision.github.io/SEVD
Authors: Nico Meyer, Jakob Murauer, Alexander Popov, Christian Ufrecht, Axel Plinge, Christopher Mutschler, Daniel D. Scherer
Abstract: Reinforcement learning is a powerful framework aiming to determine optimal behavior in highly complex decision-making scenarios. This objective can be achieved using policy iteration, which requires to solve a typically large linear system of equations. We propose the variational quantum policy iteration (VarQPI) algorithm, realizing this step with a NISQ-compatible quantum-enhanced subroutine. Its scalability is supported by an analysis of the structure of generic reinforcement learning environments, laying the foundation for potential quantum advantage with utility-scale quantum computers. Furthermore, we introduce the warm-start initialization variant (WS-VarQPI) that significantly reduces resource overhead. The algorithm solves a large FrozenLake environment with an underlying 256x256-dimensional linear system, indicating its practical robustness.
Authors: Malte Rippa, Ruben Schulze, Marian Himstedt, Felice Burn
Abstract: Prostate cancer is a commonly diagnosed cancerous disease among men world-wide. Even with modern technology such as multi-parametric magnetic resonance tomography and guided biopsies, the process for diagnosing prostate cancer remains time consuming and requires highly trained professionals. In this paper, different convolutional neural networks (CNN) are evaluated on their abilities to reliably classify whether an MRI sequence contains malignant lesions. Implementations of a ResNet, a ConvNet and a ConvNeXt for 3D image data are trained and evaluated. The models are trained using different data augmentation techniques, learning rates, and optimizers. The data is taken from a private dataset, provided by Cantonal Hospital Aarau. The best result was achieved by a ResNet3D, yielding an average precision score of 0.4583 and AUC ROC score of 0.6214.
Authors: Mattia Litrico, Davide Talon, Sebastiano Battiato, Alessio Del Bue, Mario Valerio Giuffrida, Pietro Morerio
Abstract: Standard Unsupervised Domain Adaptation (UDA) aims to transfer knowledge from a labeled source domain to an unlabeled target but usually requires simultaneous access to both source and target data. Moreover, UDA approaches commonly assume that source and target domains share the same labels space. Yet, these two assumptions are hardly satisfied in real-world scenarios. This paper considers the more challenging Source-Free Open-set Domain Adaptation (SF-OSDA) setting, where both assumptions are dropped. We propose a novel approach for SF-OSDA that exploits the granularity of target-private categories by segregating their samples into multiple unknown classes. Starting from an initial clustering-based assignment, our method progressively improves the segregation of target-private samples by refining their pseudo-labels with the guide of an uncertainty-based sample selection module. Additionally, we propose a novel contrastive loss, named NL-InfoNCELoss, that, integrating negative learning into self-supervised contrastive learning, enhances the model robustness to noisy pseudo-labels. Extensive experiments on benchmark datasets demonstrate the superiority of the proposed method over existing approaches, establishing new state-of-the-art performance. Notably, additional analyses show that our method is able to learn the underlying semantics of novel classes, opening the possibility to perform novel class discovery.
Authors: Christof Naumzik, Alice Kongsted, Werner Vach, Stefan Feuerriegel
Abstract: Clinical data informs the personalization of health care with a potential for more effective disease management. In practice, this is achieved by subgrouping, whereby clusters with similar patient characteristics are identified and then receive customized treatment plans with the goal of targeting subgroup-specific disease dynamics. In this paper, we propose a novel mixture hidden Markov model for subgrouping patient trajectories from chronic diseases. Our model is probabilistic and carefully designed to capture different trajectory phases of chronic diseases (i.e., "severe", "moderate", and "mild") through tailored latent states. We demonstrate our subgrouping framework based on a longitudinal study across 847 patients with non-specific low back pain. Here, our subgrouping framework identifies 8 subgroups. Further, we show that our subgrouping framework outperforms common baselines in terms of cluster validity indices. Finally, we discuss the applicability of the model to other chronic and long-lasting diseases.
Authors: Isidora Teofilovic, Ali Cem, David Sanchez-Jacome, Daniel Perez-Lopez, Francesco Da Ros
Abstract: Photonic integrated circuits play an important role in the field of optical computing, promising faster and more energy-efficient operations compared to their digital counterparts. This advantage stems from the inherent suitability of optical signals to carry out matrix multiplication. However, even deterministic phenomena such as thermal crosstalk make precise programming of photonic chips a challenging task. Here, we train and experimentally evaluate three models incorporating varying degrees of physics intuition to predict the effect of thermal crosstalk in different locations of an integrated programmable photonic mesh. We quantify the effect of thermal crosstalk by the resonance wavelength shift in the power spectrum of a microring resonator implemented in the chip, achieving modelling errors <0.5 pm. We experimentally validate the models through compensation of the crosstalk-induced wavelength shift. Finally, we evaluate the generalization capabilities of one of the models by employing it to predict and compensate for the effect of thermal crosstalk for parts of the chip it was not trained on, revealing root-mean-square-errors of <2.0 pm.
Authors: Wei-Chung Shia, Yu-Len Huang, Yi-Chun Chen, Hwa-Koon Wu, Dar-Ren Chen
Abstract: A positive margin may result in an increased risk of local recurrences after breast retention surgery for any malignant tumour. In order to reduce the number of positive margins would offer surgeon real-time intra-operative information on the presence of positive resection margins. This study aims to design an intra-operative tumour margin evaluation scheme by using specimen mammography in breast-conserving surgery. Total of 30 cases were evaluated and compared with the manually determined contours by experienced physicians and pathology report. The proposed method utilizes image thresholding to extract regions of interest and then performs a deep learning model, i.e. SegNet, to segment tumour tissue. The margin width of normal tissues surrounding it is evaluated as the result. The desired size of margin around the tumor was set for 10 mm. The smallest average difference to manual sketched margin (6.53 mm +- 5.84). In the all case, the SegNet architecture was utilized to obtain tissue specimen boundary and tumor contour, respectively. The simulation results indicated that this technology is helpful in discriminating positive from negative margins in the intra-operative setting. The aim of proposed scheme was a potential procedure in the intra-operative measurement system. The experimental results reveal that deep learning techniques can draw results that are consistent with pathology reports.
Authors: Ruizhe Liu, Qian Luo, Yanchao Yang
Abstract: We focus on the self-supervised discovery of manipulation concepts that can be adapted and reassembled to address various robotic tasks. We propose that the decision to conceptualize a physical procedure should not depend on how we name it (semantics) but rather on the significance of the informativeness in its representation regarding the low-level physical state and state changes. We model manipulation concepts (discrete symbols) as generative and discriminative goals and derive metrics that can autonomously link them to meaningful sub-trajectories from noisy, unlabeled demonstrations. Specifically, we employ a trainable codebook containing encodings (concepts) capable of synthesizing the end-state of a sub-trajectory given the current state (generative informativeness). Moreover, the encoding corresponding to a particular sub-trajectory should differentiate the state within and outside it and confidently predict the subsequent action based on the gradient of its discriminative score (discriminative informativeness). These metrics, which do not rely on human annotation, can be seamlessly integrated into a VQ-VAE framework, enabling the partitioning of demonstrations into semantically consistent sub-trajectories, fulfilling the purpose of discovering manipulation concepts and the corresponding sub-goal (key) states. We evaluate the effectiveness of the learned concepts by training policies that utilize them as guidance, demonstrating superior performance compared to other baselines. Additionally, our discovered manipulation concepts compare favorably to human-annotated ones while saving much manual effort.
Authors: Phil Romero
Abstract: The sheer number of nodes continues to increase in todays supercomputers, the first half of Trinity alone contains more than 9400 compute nodes. Since the speed of todays clusters are limited by the slowest nodes, it more important than ever to identify slow nodes, improve their performance if it can be done, and assure minimal usage of slower nodes during performance critical runs. This is an ongoing maintenance task that occurs on a regular basis and, therefore, it is important to minimize the impact upon its users by assessing and addressing slow performing nodes and mitigating their consequences while minimizing down time. These issues can be solved, in large part, through a systematic application of fast running hardware assessment tests, the application of Machine Learning, and making use of performance data to increase efficiency of large clusters. Proxy applications utilizing both MPI and OpenMP were developed to produce data as a substitute for long runtime applications to evaluate node performance. Machine learning is applied to identify underperforming nodes, and policies are being discussed to both minimize the impact of underperforming nodes and increase the efficiency of the system. In this paper, I will describe the process used to produce quickly performing proxy tests, consider various methods to isolate the outliers, and produce ordered lists for use in scheduling to accomplish this task.
Authors: Batuhan T\"omek\c{c}e, Mark Vero, Robin Staab, Martin Vechev
Abstract: As large language models (LLMs) become ubiquitous in our daily tasks and digital interactions, associated privacy risks are increasingly in focus. While LLM privacy research has primarily focused on the leakage of model training data, it has recently been shown that the increase in models' capabilities has enabled LLMs to make accurate privacy-infringing inferences from previously unseen texts. With the rise of multimodal vision-language models (VLMs), capable of understanding both images and text, a pertinent question is whether such results transfer to the previously unexplored domain of benign images posted online. To investigate the risks associated with the image reasoning capabilities of newly emerging VLMs, we compile an image dataset with human-annotated labels of the image owner's personal attributes. In order to understand the additional privacy risk posed by VLMs beyond traditional human attribute recognition, our dataset consists of images where the inferable private attributes do not stem from direct depictions of humans. On this dataset, we evaluate the inferential capabilities of 7 state-of-the-art VLMs, finding that they can infer various personal attributes at up to 77.6% accuracy. Concerningly, we observe that accuracy scales with the general capabilities of the models, implying that future models can be misused as stronger adversaries, establishing an imperative for the development of adequate defenses.
Authors: Sinisa Stekovic, Stefan Ainetter, Mattia D'Urso, Friedrich Fraundorfer, Vincent Lepetit
Abstract: We propose PyTorchGeoNodes, a differentiable module for reconstructing 3D objects from images using interpretable shape programs. In comparison to traditional CAD model retrieval methods, the use of shape programs for 3D reconstruction allows for reasoning about the semantic properties of reconstructed objects, editing, low memory footprint, etc. However, the utilization of shape programs for 3D scene understanding has been largely neglected in past works. As our main contribution, we enable gradient-based optimization by introducing a module that translates shape programs designed in Blender, for example, into efficient PyTorch code. We also provide a method that relies on PyTorchGeoNodes and is inspired by Monte Carlo Tree Search (MCTS) to jointly optimize discrete and continuous parameters of shape programs and reconstruct 3D objects for input scenes. In our experiments, we apply our algorithm to reconstruct 3D objects in the ScanNet dataset and evaluate our results against CAD model retrieval-based reconstructions. Our experiments indicate that our reconstructions match well the input scenes while enabling semantic reasoning about reconstructed objects.
Authors: Haozheng Fan, Hao Zhou, Guangtai Huang, Parameswaran Raman, Xinwei Fu, Gaurav Gupta, Dhananjay Ram, Yida Wang, Jun Huan
Abstract: Getting large language models (LLMs) to perform well on the downstream tasks requires pre-training over trillions of tokens. This typically demands a large number of powerful computational devices in addition to a stable distributed training framework to accelerate the training. The growing number of applications leveraging AI/ML had led to a scarcity of the expensive conventional accelerators (such as GPUs), which begs the need for the alternative specialized-accelerators that are scalable and cost-efficient. AWS Trainium is the second-generation machine learning accelerator that has been purposely built for training large deep learning models. Its corresponding instance, Amazon EC2 trn1, is an alternative to GPU instances for LLM training. However, training LLMs with billions of parameters on trn1 is challenging due to its relatively nascent software ecosystem. In this paper, we showcase HLAT: a 7 billion parameter decoder-only LLM pre-trained using trn1 instances over 1.8 trillion tokens. The performance of HLAT is benchmarked against popular open source baseline models including LLaMA and OpenLLaMA, which have been trained on NVIDIA GPUs and Google TPUs, respectively. On various evaluation tasks, we show that HLAT achieves model quality on par with the baselines. We also share the best practice of using the Neuron Distributed Training Library (NDTL), a customized distributed training library for AWS Trainium to achieve efficient training. Our work demonstrates that AWS Trainium powered by the NDTL is able to successfully pre-train state-of-the-art LLM models with high performance and cost-effectiveness.
Authors: Raquel Lazcano, Daniel Madro\~nal, Giordana Florimbi, Jaime Sancho, Sergio Sanchez, Raquel Leon, Himar Fabelo, Samuel Ortega, Emanuele Torti, Ruben Salvador, Margarita Marrero-Martin, Francesco Leporati, Eduardo Juarez, Gustavo M Callico, Cesar Sanz
Abstract: Hyperspectral (HS) imaging presents itself as a non-contact, non-ionizing and non-invasive technique, proven to be suitable for medical diagnosis. However, the volume of information contained in these images makes difficult providing the surgeon with information about the boundaries in real-time. To that end, High-Performance-Computing (HPC) platforms become necessary. This paper presents a comparison between the performances provided by five different HPC platforms while processing a spatial-spectral approach to classify HS images, assessing their main benefits and drawbacks. To provide a complete study, two different medical applications, with two different requirements, have been analyzed. The first application consists of HS images taken from neurosurgical operations; the second one presents HS images taken from dermatological interventions. While the main constraint for neurosurgical applications is the processing time, in other environments, as the dermatological one, other requirements can be considered. In that sense, energy efficiency is becoming a major challenge, since this kind of applications are usually developed as hand-held devices, thus depending on the battery capacity. These requirements have been considered to choose the target platforms: on the one hand, three of the most powerful Graphic Processing Units (GPUs) available in the market; and, on the other hand, a low-power GPU and a manycore architecture, both specifically thought for being used in battery-dependent environments.
Authors: Ali Beikmohammadi, Sarit Khirirat, Sindri Magn\'usson
Abstract: Reinforcement learning has recently gained unprecedented popularity, yet it still grapples with sample inefficiency. Addressing this challenge, federated reinforcement learning (FedRL) has emerged, wherein agents collaboratively learn a single policy by aggregating local estimations. However, this aggregation step incurs significant communication costs. In this paper, we propose CompFedRL, a communication-efficient FedRL approach incorporating both \textit{periodic aggregation} and (direct/error-feedback) compression mechanisms. Specifically, we consider compressed federated $Q$-learning with a generative model setup, where a central server learns an optimal $Q$-function by periodically aggregating compressed $Q$-estimates from local agents. For the first time, we characterize the impact of these two mechanisms (which have remained elusive) by providing a finite-time analysis of our algorithm, demonstrating strong convergence behaviors when utilizing either direct or error-feedback compression. Our bounds indicate improved solution accuracy concerning the number of agents and other federated hyperparameters while simultaneously reducing communication costs. To corroborate our theory, we also conduct in-depth numerical experiments to verify our findings, considering Top-$K$ and Sparsified-$K$ sparsification operators.
Authors: Oliver Klingefjord, Ryan Lowe, Joe Edelman
Abstract: There is an emerging consensus that we need to align AI systems with human values (Gabriel, 2020; Ji et al., 2024), but it remains unclear how to apply this to language models in practice. We split the problem of "aligning to human values" into three parts: first, eliciting values from people; second, reconciling those values into an alignment target for training ML models; and third, actually training the model. In this paper, we focus on the first two parts, and ask the question: what are "good" ways to synthesize diverse human inputs about values into a target for aligning language models? To answer this question, we first define a set of 6 criteria that we believe must be satisfied for an alignment target to shape model behavior in accordance with human values. We then propose a process for eliciting and reconciling values called Moral Graph Elicitation (MGE), which uses a large language model to interview participants about their values in particular contexts; our approach is inspired by the philosophy of values advanced by Taylor (1977), Chang (2004), and others. We trial MGE with a representative sample of 500 Americans, on 3 intentionally divisive prompts (e.g. advice about abortion). Our results demonstrate that MGE is promising for improving model alignment across all 6 criteria. For example, almost all participants (89.1%) felt well represented by the process, and (89%) thought the final moral graph was fair, even if their value wasn't voted as the wisest. Our process often results in "expert" values (e.g. values from women who have solicited abortion advice) rising to the top of the moral graph, without defining who is considered an expert in advance.
Authors: Pengyu Cheng, Tianhao Hu, Han Xu, Zhisong Zhang, Yong Dai, Lei Han, Nan Du
Abstract: We explore the self-play training procedure of large language models (LLMs) in a two-player adversarial language game called Adversarial Taboo. In this game, an attacker and a defender communicate with respect to a target word only visible to the attacker. The attacker aims to induce the defender to utter the target word unconsciously, while the defender tries to infer the target word from the attacker's utterances. To win the game, both players should have sufficient knowledge about the target word and high-level reasoning ability to infer and express in this information-reserved conversation. Hence, we are curious about whether LLMs' reasoning ability can be further enhanced by Self-Play in this Adversarial language Game (SPAG). With this goal, we let LLMs act as the attacker and play with a copy of itself as the defender on an extensive range of target words. Through reinforcement learning on the game outcomes, we observe that the LLMs' performance uniformly improves on a broad range of reasoning benchmarks. Furthermore, iteratively adopting this self-play process can continuously promote LLM's reasoning ability. The code is at https://github.com/Linear95/SPAG.
Authors: Mohsen Hami, Mahdi JameBozorg
Abstract: Images captured from the real world are often affected by different types of noise, which can significantly impact the performance of Computer Vision systems and the quality of visual data. This study presents a novel approach for defect detection in casting product noisy images, specifically focusing on submersible pump impellers. The methodology involves utilizing deep learning models such as VGG16, InceptionV3, and other models in both the spatial and frequency domains to identify noise types and defect status. The research process begins with preprocessing images, followed by applying denoising techniques tailored to specific noise categories. The goal is to enhance the accuracy and robustness of defect detection by integrating noise detection and denoising into the classification pipeline. The study achieved remarkable results using VGG16 for noise type classification in the frequency domain, achieving an accuracy of over 99%. Removal of salt and pepper noise resulted in an average SSIM of 87.9, while Gaussian noise removal had an average SSIM of 64.0, and periodic noise removal yielded an average SSIM of 81.6. This comprehensive approach showcases the effectiveness of the deep AutoEncoder model and median filter, for denoising strategies in real-world industrial applications. Finally, our study reports significant improvements in binary classification accuracy for defect detection compared to previous methods. For the VGG16 classifier, accuracy increased from 94.6% to 97.0%, demonstrating the effectiveness of the proposed noise detection and denoising approach. Similarly, for the InceptionV3 classifier, accuracy improved from 84.7% to 90.0%, further validating the benefits of integrating noise analysis into the classification pipeline.
Authors: S Deepika Sri, Mohammed Aadil S, Sanjjushri Varshini R, Raja CSP Raman, Gopinath Rajagopal, S Taranath Chan
Abstract: In the contemporary landscape of technological advancements, the automation of manual processes is crucial, compelling the demand for huge datasets to effectively train and test machines. This research paper is dedicated to the exploration and implementation of an automated approach to generate test cases specifically using Large Language Models. The methodology integrates the use of Open AI to enhance the efficiency and effectiveness of test case generation for training and evaluating Large Language Models. This formalized approach with LLMs simplifies the testing process, making it more efficient and comprehensive. Leveraging natural language understanding, LLMs can intelligently formulate test cases that cover a broad range of REST API properties, ensuring comprehensive testing. The model that is developed during the research is trained using manually collected postman test cases or instances for various Rest APIs. LLMs enhance the creation of Postman test cases by automating the generation of varied and intricate test scenarios. Postman test cases offer streamlined automation, collaboration, and dynamic data handling, providing a user-friendly and efficient approach to API testing compared to traditional test cases. Thus, the model developed not only conforms to current technological standards but also holds the promise of evolving into an idea of substantial importance in future technological advancements.
Authors: Yutao Yuan, Chun Yuan
Abstract: Image super-resolution is a fundamentally ill-posed problem because multiple valid high-resolution images exist for one low-resolution image. Super-resolution methods based on diffusion probabilistic models can deal with the ill-posed nature by learning the distribution of high-resolution images conditioned on low-resolution images, avoiding the problem of blurry images in PSNR-oriented methods. However, existing diffusion-based super-resolution methods have high time consumption with the use of iterative sampling, while the quality and consistency of generated images are less than ideal due to problems like color shifting. In this paper, we propose Efficient Conditional Diffusion Model with Probability Flow Sampling (ECDP) for image super-resolution. To reduce the time consumption, we design a continuous-time conditional diffusion model for image super-resolution, which enables the use of probability flow sampling for efficient generation. Additionally, to improve the consistency of generated images, we propose a hybrid parametrization for the denoiser network, which interpolates between the data-predicting parametrization and the noise-predicting parametrization for different noise scales. Moreover, we design an image quality loss as a complement to the score matching loss of diffusion models, further improving the consistency and quality of super-resolution. Extensive experiments on DIV2K, ImageNet, and CelebA demonstrate that our method achieves higher super-resolution quality than existing diffusion-based image super-resolution methods while having lower time consumption. Our code is available at https://github.com/Yuan-Yutao/ECDP.
Authors: Philippe Gervais, Asya Fadeeva, Andrii Maksai
Abstract: We introduce MathWriting, the largest online handwritten mathematical expression dataset to date. It consists of 230k human-written samples and an additional 400k synthetic ones. MathWriting can also be used for offline HME recognition and is larger than all existing offline HME datasets like IM2LATEX-100K. We introduce a benchmark based on MathWriting data in order to advance research on both online and offline HME recognition.
Authors: Georgy Perevozchikov, Nancy Mehta, Mahmoud Afifi, Radu Timofte
Abstract: Modern smartphone camera quality heavily relies on the image signal processor (ISP) to enhance captured raw images, utilizing carefully designed modules to produce final output images encoded in a standard color space (e.g., sRGB). Neural-based end-to-end learnable ISPs offer promising advancements, potentially replacing traditional ISPs with their ability to adapt without requiring extensive tuning for each new camera model, as is often the case for nearly every module in traditional ISPs. However, the key challenge with the recent learning-based ISPs is the urge to collect large paired datasets for each distinct camera model due to the influence of intrinsic camera characteristics on the formation of input raw images. This paper tackles this challenge by introducing a novel method for unpaired learning of raw-to-raw translation across diverse cameras. Specifically, we propose Rawformer, an unsupervised Transformer-based encoder-decoder method for raw-to-raw translation. It accurately maps raw images captured by a certain camera to the target camera, facilitating the generalization of learnable ISPs to new unseen cameras. Our method demonstrates superior performance on real camera datasets, achieving higher accuracy compared to previous state-of-the-art techniques, and preserving a more robust correlation between the original and translated raw images.
Authors: Debopriya Roy Dipta, Thore Tiemann, Berk Gulmezoglu, Eduard Marin Fabregas, Thomas Eisenbarth
Abstract: The cloud computing landscape has evolved significantly in recent years, embracing various sandboxes to meet the diverse demands of modern cloud applications. These sandboxes encompass container-based technologies like Docker and gVisor, microVM-based solutions like Firecracker, and security-centric sandboxes relying on Trusted Execution Environments (TEEs) such as Intel SGX and AMD SEV. However, the practice of placing multiple tenants on shared physical hardware raises security and privacy concerns, most notably side-channel attacks. In this paper, we investigate the possibility of fingerprinting containers through CPU frequency reporting sensors in Intel and AMD CPUs. One key enabler of our attack is that the current CPU frequency information can be accessed by user-space attackers. We demonstrate that Docker images exhibit a unique frequency signature, enabling the distinction of different containers with up to 84.5% accuracy even when multiple containers are running simultaneously in different cores. Additionally, we assess the effectiveness of our attack when performed against several sandboxes deployed in cloud environments, including Google's gVisor, AWS' Firecracker, and TEE-based platforms like Gramine (utilizing Intel SGX) and AMD SEV. Our empirical results show that these attacks can also be carried out successfully against all of these sandboxes in less than 40 seconds, with an accuracy of over 70% in all cases. Finally, we propose a noise injection-based countermeasure to mitigate the proposed attack on cloud environments.
Authors: T. Crosta, L. Reb\'on, F. Vilari\~no, J. M. Matera, M. Bilkis
Abstract: During their operation, due to shifts in environmental conditions, devices undergo various forms of detuning from their optimal settings. Typically, this is addressed through control loops, which monitor variables and the device performance, to maintain settings at their optimal values. Quantum devices are particularly challenging since their functionality relies on precisely tuning their parameters. At the same time, the detailed modeling of the environmental behavior is often computationally unaffordable, while a direct measure of the parameters defining the system state is costly and introduces extra noise in the mechanism. In this study, we investigate the application of reinforcement learning techniques to develop a model-free control loop for continuous recalibration of quantum device parameters. Furthermore, we explore the advantages of incorporating minimal environmental noise models. As an example, the application to numerical simulations of a Kennedy receiver-based long-distance quantum communication protocol is presented.
Authors: Umberto Tomasini, Matthieu Wyart
Abstract: Understanding what makes high-dimensional data learnable is a fundamental question in machine learning. On the one hand, it is believed that the success of deep learning lies in its ability to build a hierarchy of representations that become increasingly more abstract with depth, going from simple features like edges to more complex concepts. On the other hand, learning to be insensitive to invariances of the task, such as smooth transformations for image datasets, has been argued to be important for deep networks and it strongly correlates with their performance. In this work, we aim to explain this correlation and unify these two viewpoints. We show that by introducing sparsity to generative hierarchical models of data, the task acquires insensitivity to spatial transformations that are discrete versions of smooth transformations. In particular, we introduce the Sparse Random Hierarchy Model (SRHM), where we observe and rationalize that a hierarchical representation mirroring the hierarchical model is learnt precisely when such insensitivity is learnt, thereby explaining the strong correlation between the latter and performance. Moreover, we quantify how the sample complexity of CNNs learning the SRHM depends on both the sparsity and hierarchical structure of the task.
Authors: Juno Nam, Rafael G\'omez-Bombarelli
Abstract: Machine learning interatomic potentials (MLIPs) have become a workhorse of modern atomistic simulations, and recently published universal MLIPs, pre-trained on large datasets, have demonstrated remarkable accuracy and generalizability. However, the computational cost of MLIPs limits their applicability to chemically disordered systems requiring large simulation cells or to sample-intensive statistical methods. Here, we report the use of continuous and differentiable alchemical degrees of freedom in atomistic materials simulations, exploiting the fact that graph neural network MLIPs represent discrete elements as real-valued tensors. The proposed method introduces alchemical atoms with corresponding weights into the input graph, alongside modifications to the message-passing and readout mechanisms of MLIPs, and allows smooth interpolation between the compositional states of materials. The end-to-end differentiability of MLIPs enables efficient calculation of the gradient of energy with respect to the compositional weights. Leveraging these gradients, we propose methodologies for optimizing the composition of solid solutions towards target macroscopic properties and conducting alchemical free energy simulations to quantify the free energy of vacancy formation and composition changes. The approach offers an avenue for extending the capabilities of universal MLIPs in the modeling of compositional disorder and characterizing the phase stabilities of complex materials systems.
Authors: Yu-Yang Li, Yu Bai, Cunshi Wang, Mengwei Qu, Ziteng Lu, Roberto Soria, Jifeng Liu
Abstract: Light curves serve as a valuable source of information on stellar formation and evolution. With the rapid advancement of machine learning techniques, it can be effectively processed to extract astronomical patterns and information. In this study, we present a comprehensive evaluation of deep-learning and large language model (LLM) based models for the automatic classification of variable star light curves, based on large datasets from the Kepler and K2 missions. Special emphasis is placed on Cepheids, RR Lyrae, and eclipsing binaries, examining the influence of observational cadence and phase distribution on classification precision. Employing AutoDL optimization, we achieve striking performance with the 1D-Convolution+BiLSTM architecture and the Swin Transformer, hitting accuracies of 94\% and 99\% correspondingly, with the latter demonstrating a notable 83\% accuracy in discerning the elusive Type II Cepheids-comprising merely 0.02\% of the total dataset.We unveil StarWhisper LightCurve (LC), an innovative Series comprising three LLM-based models: LLM, multimodal large language model (MLLM), and Large Audio Language Model (LALM). Each model is fine-tuned with strategic prompt engineering and customized training methods to explore the emergent abilities of these models for astronomical data. Remarkably, StarWhisper LC Series exhibit high accuracies around 90\%, significantly reducing the need for explicit feature engineering, thereby paving the way for streamlined parallel data processing and the progression of multifaceted multimodal models in astronomical applications. The study furnishes two detailed catalogs illustrating the impacts of phase and sampling intervals on deep learning classification accuracy, showing that a substantial decrease of up to 14\% in observation duration and 21\% in sampling points can be realized without compromising accuracy by more than 10\%.
Authors: Hubert Eichner, Daniel Ramage, Kallista Bonawitz, Dzmitry Huba, Tiziano Santoro, Brett McLarnon, Timon Van Overveldt, Nova Fallen, Peter Kairouz, Albert Cheu, Katharine Daly, Adria Gascon, Marco Gruteser, Brendan McMahan
Abstract: Federated Learning and Analytics (FLA) have seen widespread adoption by technology platforms for processing sensitive on-device data. However, basic FLA systems have privacy limitations: they do not necessarily require anonymization mechanisms like differential privacy (DP), and provide limited protections against a potentially malicious service provider. Adding DP to a basic FLA system currently requires either adding excessive noise to each device's updates, or assuming an honest service provider that correctly implements the mechanism and only uses the privatized outputs. Secure multiparty computation (SMPC) -based oblivious aggregations can limit the service provider's access to individual user updates and improve DP tradeoffs, but the tradeoffs are still suboptimal, and they suffer from scalability challenges and susceptibility to Sybil attacks. This paper introduces a novel system architecture that leverages trusted execution environments (TEEs) and open-sourcing to both ensure confidentiality of server-side computations and provide externally verifiable privacy properties, bolstering the robustness and trustworthiness of private federated computations.
Authors: Isao Ishikawa
Abstract: This paper introduces a theoretical framework for investigating analytic maps from finite discrete data, elucidating mathematical machinery underlying the polynomial approximation with least-squares in multivariate situations. Our approach is to consider the push-forward on the space of locally analytic functionals, instead of directly handling the analytic map itself. We establish a methodology enabling appropriate finite-dimensional approximation of the push-forward from finite discrete data, through the theory of the Fourier--Borel transform and the Fock space. Moreover, we prove a rigorous convergence result with a convergence rate. As an application, we prove that it is not the least-squares polynomial, but the polynomial obtained by truncating its higher-degree terms, that approximates analytic functions and further allows for approximation beyond the support of the data distribution. One advantage of our theory is that it enables us to apply linear algebraic operations to the finite-dimensional approximation of the push-forward. Utilizing this, we prove the convergence of a method for approximating an analytic vector field from finite data of the flow map of an ordinary differential equation.
Authors: Chris Cundy, Rishi Desai, Stefano Ermon
Abstract: As reinforcement learning techniques are increasingly applied to real-world decision problems, attention has turned to how these algorithms use potentially sensitive information. We consider the task of training a policy that maximizes reward while minimizing disclosure of certain sensitive state variables through the actions. We give examples of how this setting covers real-world problems in privacy for sequential decision-making. We solve this problem in the policy gradients framework by introducing a regularizer based on the mutual information (MI) between the sensitive state and the actions. We develop a model-based stochastic gradient estimator for optimization of privacy-constrained policies. We also discuss an alternative MI regularizer that serves as an upper bound to our main MI regularizer and can be optimized in a model-free setting, and a powerful direct estimator that can be used in an environment with differentiable dynamics. We contrast previous work in differentially-private RL to our mutual-information formulation of information disclosure. Experimental results show that our training method results in policies that hide the sensitive state, even in challenging high-dimensional tasks.
Authors: Zulqarnain Khan, Davin Hill, Aria Masoomi, Joshua Bone, Jennifer Dy
Abstract: Machine learning methods have significantly improved in their predictive capabilities, but at the same time they are becoming more complex and less transparent. As a result, explainers are often relied on to provide interpretability to these black-box prediction models. As crucial diagnostics tools, it is important that these explainers themselves are robust. In this paper we focus on one particular aspect of robustness, namely that an explainer should give similar explanations for similar data inputs. We formalize this notion by introducing and defining explainer astuteness, analogous to astuteness of prediction functions. Our formalism allows us to connect explainer robustness to the predictor's probabilistic Lipschitzness, which captures the probability of local smoothness of a function. We provide lower bound guarantees on the astuteness of a variety of explainers (e.g., SHAP, RISE, CXPlain) given the Lipschitzness of the prediction function. These theoretical results imply that locally smooth prediction functions lend themselves to locally robust explanations. We evaluate these results empirically on simulated as well as real datasets.
Authors: Jeremy M. Cohen, Behrooz Ghorbani, Shankar Krishnan, Naman Agarwal, Sourabh Medapati, Michal Badura, Daniel Suo, David Cardoze, Zachary Nado, George E. Dahl, Justin Gilmer
Abstract: Very little is known about the training dynamics of adaptive gradient methods like Adam in deep learning. In this paper, we shed light on the behavior of these algorithms in the full-batch and sufficiently large batch settings. Specifically, we empirically demonstrate that during full-batch training, the maximum eigenvalue of the preconditioned Hessian typically equilibrates at a certain numerical value -- the stability threshold of a gradient descent algorithm. For Adam with step size $\eta$ and $\beta_1 = 0.9$, this stability threshold is $38/\eta$. Similar effects occur during minibatch training, especially as the batch size grows. Yet, even though adaptive methods train at the ``Adaptive Edge of Stability'' (AEoS), their behavior in this regime differs in a significant way from that of non-adaptive methods at the EoS. Whereas non-adaptive algorithms at the EoS are blocked from entering high-curvature regions of the loss landscape, adaptive gradient methods at the AEoS can keep advancing into high-curvature regions, while adapting the preconditioner to compensate. Our findings can serve as a foundation for the community's future understanding of adaptive gradient methods in deep learning.
Authors: Hao Wang, Luxi He, Rui Gao, Flavio P. Calmon
Abstract: Machine learning (ML) models can underperform on certain population groups due to choices made during model development and bias inherent in the data. We categorize sources of discrimination in the ML pipeline into two classes: aleatoric discrimination, which is inherent in the data distribution, and epistemic discrimination, which is due to decisions made during model development. We quantify aleatoric discrimination by determining the performance limits of a model under fairness constraints, assuming perfect knowledge of the data distribution. We demonstrate how to characterize aleatoric discrimination by applying Blackwell's results on comparing statistical experiments. We then quantify epistemic discrimination as the gap between a model's accuracy when fairness constraints are applied and the limit posed by aleatoric discrimination. We apply this approach to benchmark existing fairness interventions and investigate fairness risks in data with missing values. Our results indicate that state-of-the-art fairness interventions are effective at removing epistemic discrimination on standard (overused) tabular datasets. However, when data has missing values, there is still significant room for improvement in handling aleatoric discrimination.
Authors: Dmitrii Usynin, Daniel Rueckert, Georgios Kaissis
Abstract: Obtaining high-quality data for collaborative training of machine learning models can be a challenging task due to A) regulatory concerns and B) a lack of data owner incentives to participate. The first issue can be addressed through the combination of distributed machine learning techniques (e.g. federated learning) and privacy enhancing technologies (PET), such as the differentially private (DP) model training. The second challenge can be addressed by rewarding the participants for giving access to data which is beneficial to the training model, which is of particular importance in federated settings, where the data is unevenly distributed. However, DP noise can adversely affect the underrepresented and the atypical (yet often informative) data samples, making it difficult to assess their usefulness. In this work, we investigate how to leverage gradient information to permit the participants of private training settings to select the data most beneficial for the jointly trained model. We assess two such methods, namely variance of gradients (VoG) and the privacy loss-input susceptibility score (PLIS). We show that these techniques can provide the federated clients with tools for principled data selection even in stricter privacy settings.
Authors: Carlos Aguirre, Mark Dredze
Abstract: Training supervised machine learning systems with a fairness loss can improve prediction fairness across different demographic groups. However, doing so requires demographic annotations for training data, without which we cannot produce debiased classifiers for most tasks. Drawing inspiration from transfer learning methods, we investigate whether we can utilize demographic data from a related task to improve the fairness of a target task. We adapt a single-task fairness loss to a multi-task setting to exploit demographic labels from a related task in debiasing a target task and demonstrate that demographic fairness objectives transfer fairness within a multi-task framework. Additionally, we show that this approach enables intersectional fairness by transferring between two datasets with different single-axis demographics. We explore different data domains to show how our loss can improve fairness domains and tasks.
Authors: Yu-Hu Yan, Peng Zhao, Zhi-Hua Zhou
Abstract: In this paper, we propose an online convex optimization approach with two different levels of adaptivity. On a higher level, our approach is agnostic to the unknown types and curvatures of the online functions, while at a lower level, it can exploit the unknown niceness of the environments and attain problem-dependent guarantees. Specifically, we obtain $\mathcal{O}(\log V_T)$, $\mathcal{O}(d \log V_T)$ and $\hat{\mathcal{O}}(\sqrt{V_T})$ regret bounds for strongly convex, exp-concave and convex loss functions, respectively, where $d$ is the dimension, $V_T$ denotes problem-dependent gradient variations and the $\hat{\mathcal{O}}(\cdot)$-notation omits $\log V_T$ factors. Our result not only safeguards the worst-case guarantees but also directly implies the small-loss bounds in analysis. Moreover, when applied to adversarial/stochastic convex optimization and game theory problems, our result enhances the existing universal guarantees. Our approach is based on a multi-layer online ensemble framework incorporating novel ingredients, including a carefully designed optimism for unifying diverse function types and cascaded corrections for algorithmic stability. Notably, despite its multi-layer structure, our algorithm necessitates only one gradient query per round, making it favorable when the gradient evaluation is time-consuming. This is facilitated by a novel regret decomposition equipped with carefully designed surrogate losses.
Authors: Runzhe Wang, Sadhika Malladi, Tianhao Wang, Kaifeng Lyu, Zhiyuan Li
Abstract: Momentum is known to accelerate the convergence of gradient descent in strongly convex settings without stochastic gradient noise. In stochastic optimization, such as training neural networks, folklore suggests that momentum may help deep learning optimization by reducing the variance of the stochastic gradient update, but previous theoretical analyses do not find momentum to offer any provable acceleration. Theoretical results in this paper clarify the role of momentum in stochastic settings where the learning rate is small and gradient noise is the dominant source of instability, suggesting that SGD with and without momentum behave similarly in the short and long time horizons. Experiments show that momentum indeed has limited benefits for both optimization and generalization in practical training regimes where the optimal learning rate is not very large, including small- to medium-batch training from scratch on ImageNet and fine-tuning language models on downstream tasks.
Authors: Albert Yu Sun, Eliott Zemour, Arushi Saxena, Udith Vaidyanathan, Eric Lin, Christian Lau, Vaikkunth Mugunthan
Abstract: Machine learning practitioners often fine-tune generative pre-trained models like GPT-3 to improve model performance at specific tasks. Previous works, however, suggest that fine-tuned machine learning models memorize and emit sensitive information from the original fine-tuning dataset. Companies such as OpenAI offer fine-tuning services for their models, but no prior work has conducted a memorization attack on any closed-source models. In this work, we simulate a privacy attack on GPT-3 using OpenAI's fine-tuning API. Our objective is to determine if personally identifiable information (PII) can be extracted from this model. We (1) explore the use of naive prompting methods on a GPT-3 fine-tuned classification model, and (2) we design a practical word generation task called Autocomplete to investigate the extent of PII memorization in fine-tuned GPT-3 within a real-world context. Our findings reveal that fine-tuning GPT3 for both tasks led to the model memorizing and disclosing critical personally identifiable information (PII) obtained from the underlying fine-tuning dataset. To encourage further research, we have made our codes and datasets publicly available on GitHub at: https://github.com/albertsun1/gpt3-pii-attacks
Authors: Daniel Haider, Vincent Lostanlen, Martin Ehler, Peter Balazs
Abstract: What makes waveform-based deep learning so hard? Despite numerous attempts at training convolutional neural networks (convnets) for filterbank design, they often fail to outperform hand-crafted baselines. These baselines are linear time-invariant systems: as such, they can be approximated by convnets with wide receptive fields. Yet, in practice, gradient-based optimization leads to suboptimal approximations. In our article, we approach this phenomenon from the perspective of initialization. We present a theory of large deviations for the energy response of FIR filterbanks with random Gaussian weights. We find that deviations worsen for large filters and locally periodic input signals, which are both typical for audio signal processing applications. Numerical simulations align with our theory and suggest that the condition number of a convolutional layer follows a logarithmic scaling law between the number and length of the filters, which is reminiscent of discrete wavelet bases.
Authors: Alan Q. Wang, Batuhan K. Karaman, Heejong Kim, Jacob Rosenthal, Rachit Saluja, Sean I. Young, Mert R. Sabuncu
Abstract: Interpretability for machine learning models in medical imaging (MLMI) is an important direction of research. However, there is a general sense of murkiness in what interpretability means. Why does the need for interpretability in MLMI arise? What goals does one actually seek to address when interpretability is needed? To answer these questions, we identify a need to formalize the goals and elements of interpretability in MLMI. By reasoning about real-world tasks and goals common in both medical image analysis and its intersection with machine learning, we identify five core elements of interpretability: localization, visual recognizability, physical attribution, model transparency, and actionability. From this, we arrive at a framework for interpretability in MLMI, which serves as a step-by-step guide to approaching interpretability in this context. Overall, this paper formalizes interpretability needs in the context of medical imaging, and our applied perspective clarifies concrete MLMI-specific goals and considerations in order to guide method design and improve real-world usage. Our goal is to provide practical and didactic information for model designers and practitioners, inspire developers of models in the medical imaging field to reason more deeply about what interpretability is achieving, and suggest future directions of interpretability research.
Authors: Tianyuan Zou, Zixuan Gu, Yu He, Hideaki Takahashi, Yang Liu, Ya-Qin Zhang
Abstract: Vertical Federated Learning (VFL) has emerged as a collaborative training paradigm that allows participants with different features of the same group of users to accomplish cooperative training without exposing their raw data or model parameters. VFL has gained significant attention for its research potential and real-world applications in recent years, but still faces substantial challenges, such as in defending various kinds of data inference and backdoor attacks. Moreover, most of existing VFL projects are industry-facing and not easily used for keeping track of the current research progress. To address this need, we present an extensible and lightweight VFL framework VFLAIR (available at https://github.com/FLAIR-THU/VFLAIR), which supports VFL training with a variety of models, datasets and protocols, along with standardized modules for comprehensive evaluations of attacks and defense strategies. We also benchmark 11 attacks and 8 defenses performance under different communication and model partition settings and draw concrete insights and recommendations on the choice of defense strategies for different practical VFL deployment scenarios.
Authors: Priyam Gupta, Peter J. Schmid, Denis Sipp, Taraneh Sayadi, Georgios Rigas
Abstract: The Koopman operator presents an attractive approach to achieve global linearization of nonlinear systems, making it a valuable method for simplifying the understanding of complex dynamics. While data-driven methodologies have exhibited promise in approximating finite Koopman operators, they grapple with various challenges, such as the judicious selection of observables, dimensionality reduction, and the ability to predict complex system behaviors accurately. This study presents a novel approach termed Mori-Zwanzig autoencoder (MZ-AE) to robustly approximate the Koopman operator in low-dimensional spaces. The proposed method leverages a nonlinear autoencoder to extract key observables for approximating a finite invariant Koopman subspace and integrates a non-Markovian correction mechanism using the Mori-Zwanzig formalism. Consequently, this approach yields a closed representation of dynamics within the latent manifold of the nonlinear autoencoder, thereby enhancing the precision and stability of the Koopman operator approximation. Demonstrations showcase the technique's ability to capture regime transitions in the flow around a cylinder. It also provides a low dimensional approximation for Kuramoto-Sivashinsky with promising short-term predictability and robust long-term statistical performance. By bridging the gap between data-driven techniques and the mathematical foundations of Koopman theory, MZ-AE offers a promising avenue for improved understanding and prediction of complex nonlinear dynamics.
Authors: Andrew Szot, Max Schwarzer, Harsh Agrawal, Bogdan Mazoure, Walter Talbott, Katherine Metcalf, Natalie Mackraz, Devon Hjelm, Alexander Toshev
Abstract: We show that large language models (LLMs) can be adapted to be generalizable policies for embodied visual tasks. Our approach, called Large LAnguage model Reinforcement Learning Policy (LLaRP), adapts a pre-trained frozen LLM to take as input text instructions and visual egocentric observations and output actions directly in the environment. Using reinforcement learning, we train LLaRP to see and act solely through environmental interactions. We show that LLaRP is robust to complex paraphrasings of task instructions and can generalize to new tasks that require novel optimal behavior. In particular, on 1,000 unseen tasks it achieves 42% success rate, 1.7x the success rate of other common learned baselines or zero-shot applications of LLMs. Finally, to aid the community in studying language conditioned, massively multi-task, embodied AI problems we release a novel benchmark, Language Rearrangement, consisting of 150,000 training and 1,000 testing tasks for language-conditioned rearrangement. Video examples of LLaRP in unseen Language Rearrangement instructions are at https://llm-rl.github.io.
Authors: Yilong Zhao, Chien-Yu Lin, Kan Zhu, Zihao Ye, Lequn Chen, Size Zheng, Luis Ceze, Arvind Krishnamurthy, Tianqi Chen, Baris Kasikci
Abstract: The growing demand for Large Language Models (LLMs) in applications such as content generation, intelligent chatbots, and sentiment analysis poses considerable challenges for LLM service providers. To efficiently use GPU resources and boost throughput, batching multiple requests has emerged as a popular paradigm; to further speed up batching, LLM quantization techniques reduce memory consumption and increase computing capacity. However, prevalent quantization schemes (e.g., 8-bit weight-activation quantization) cannot fully leverage the capabilities of modern GPUs, such as 4-bit integer operators, resulting in sub-optimal performance. To maximize LLMs' serving throughput, we introduce Atom, a low-bit quantization method that achieves high throughput improvements with negligible accuracy loss. Atom significantly boosts serving throughput by using low-bit operators and considerably reduces memory consumption via low-bit quantization. It attains high accuracy by applying a novel mixed-precision and fine-grained quantization process. We evaluate Atom on 4-bit weight-activation quantization in the serving context. Atom improves end-to-end throughput (token/s) by up to $7.7\times$ compared to the FP16 and by $2.5\times$ compared to INT8 quantization, while maintaining the same latency target.
Authors: Dongwei Ye, Mengwu Guo
Abstract: One of the pivotal tasks in scientific machine learning is to represent underlying dynamical systems from time series data. Many methods for such dynamics learning explicitly require the derivatives of state data, which are not directly available and can be approximated conventionally by finite differences. However, the discrete approximations of time derivatives may result in poor estimations when state data are scarce and/or corrupted by noise, thus compromising the predictiveness of the learned dynamical models. To overcome this technical hurdle, we propose a new method that learns nonlinear dynamics through a Bayesian inference of characterizing model parameters. This method leverages a Gaussian process representation of states, and constructs a likelihood function using the correlation between state data and their derivatives, yet prevents explicit evaluations of time derivatives. Through a Bayesian scheme, a probabilistic estimate of the model parameters is given by the posterior distribution, and thus a quantification is facilitated for uncertainties from noisy state data and the learning process. Specifically, we will discuss the applicability of the proposed method to several typical scenarios for dynamical systems: identification and estimation with an affine parametrization, nonlinear parametric approximation without prior knowledge, and general parameter estimation for a given dynamical system.
Authors: Pengfei Ding, Yan Wang, Guanfeng Liu, Nan Wang, Xiaofang Zhou
Abstract: Heterogeneous graph few-shot learning (HGFL) has been developed to address the label sparsity issue in heterogeneous graphs (HGs), which consist of various types of nodes and edges. The core concept of HGFL is to extract knowledge from rich-labeled classes in a source HG, transfer this knowledge to a target HG to facilitate learning new classes with few-labeled training data, and finally make predictions on unlabeled testing data. Existing methods typically assume that the source HG, training data, and testing data all share the same distribution. However, in practice, distribution shifts among these three types of data are inevitable due to two reasons: (1) the limited availability of the source HG that matches the target HG distribution, and (2) the unpredictable data generation mechanism of the target HG. Such distribution shifts result in ineffective knowledge transfer and poor learning performance in existing methods, thereby leading to a novel problem of out-of-distribution (OOD) generalization in HGFL. To address this challenging problem, we propose a novel Causal OOD Heterogeneous graph Few-shot learning model, namely COHF. In COHF, we first characterize distribution shifts in HGs with a structural causal model, establishing an invariance principle for OOD generalization in HGFL. Then, following this invariance principle, we propose a new variational autoencoder-based heterogeneous graph neural network to mitigate the impact of distribution shifts. Finally, by integrating this network with a novel meta-learning framework, COHF effectively transfers knowledge to the target HG to predict new classes with few-labeled data. Extensive experiments on seven real-world datasets have demonstrated the superior performance of COHF over the state-of-the-art methods.
Authors: Ahmad Peyvan, Vivek Oommen, Ameya D. Jagtap, George Em Karniadakis
Abstract: Developing the proper representations for simulating high-speed flows with strong shock waves, rarefactions, and contact discontinuities has been a long-standing question in numerical analysis. Herein, we employ neural operators to solve Riemann problems encountered in compressible flows for extreme pressure jumps (up to $10^{10}$ pressure ratio). In particular, we first consider the DeepONet that we train in a two-stage process, following the recent work of \cite{lee2023training}, wherein the first stage, a basis is extracted from the trunk net, which is orthonormalized and subsequently is used in the second stage in training the branch net. This simple modification of DeepONet has a profound effect on its accuracy, efficiency, and robustness and leads to very accurate solutions to Riemann problems compared to the vanilla version. It also enables us to interpret the results physically as the hierarchical data-driven produced basis reflects all the flow features that would otherwise be introduced using ad hoc feature expansion layers. We also compare the results with another neural operator based on the U-Net for low, intermediate, and very high-pressure ratios that are very accurate for Riemann problems, especially for large pressure ratios, due to their multiscale nature but computationally more expensive. Overall, our study demonstrates that simple neural network architectures, if properly pre-trained, can achieve very accurate solutions of Riemann problems for real-time forecasting. The source code, along with its corresponding data, can be found at the following URL: https://github.com/apey236/RiemannONet/tree/main
Authors: Kotaro Yoshida, Hiroki Naganuma
Abstract: Machine learning models traditionally assume that training and test data are independently and identically distributed. However, in real-world applications, the test distribution often differs from training. This problem, known as out-of-distribution generalization, challenges conventional models. Invariant Risk Minimization (IRM) emerges as a solution, aiming to identify features invariant across different environments to enhance out-of-distribution robustness. However, IRM's complexity, particularly its bi-level optimization, has led to the development of various approximate methods. Our study investigates these approximate IRM techniques, employing the Expected Calibration Error (ECE) as a key metric. ECE, which measures the reliability of model prediction, serves as an indicator of whether models effectively capture environment-invariant features. Through a comparative analysis of datasets with distributional shifts, we observe that Information Bottleneck-based IRM, which condenses representational information, achieves a balance in improving ECE while preserving accuracy relatively. This finding is pivotal, as it demonstrates a feasible path to maintaining robustness without compromising accuracy. Nonetheless, our experiments also caution against over-regularization, which can diminish accuracy. This underscores the necessity for a systematic approach in evaluating out-of-distribution generalization metrics, one that beyond mere accuracy to address the nuanced interplay between accuracy and calibration.
Authors: Yuchen Zeng, Wonjun Kang, Yicong Chen, Hyung Il Koo, Kangwook Lee
Abstract: The evolution from Large Language Models (LLMs) to Multimodal Large Language Models (MLLMs) has spurred research into extending In-Context Learning (ICL) to its multimodal counterpart. Existing such studies have primarily concentrated on image-to-text ICL. However, the Text-to-Image ICL (T2I-ICL), with its unique characteristics and potential applications, remains underexplored. To address this gap, we formally define the task of T2I-ICL and present CoBSAT, the first T2I-ICL benchmark dataset, encompassing ten tasks. Utilizing our dataset to benchmark six state-of-the-art MLLMs, we uncover considerable difficulties MLLMs encounter in solving T2I-ICL. We identify the primary challenges as the inherent complexity of multimodality and image generation, and show that strategies such as fine-tuning and Chain-of-Thought prompting help to mitigate these difficulties, leading to notable improvements in performance. Our code and dataset are available at https://github.com/UW-Madison-Lee-Lab/CoBSAT.
Authors: Dan MacKinlay, Russell Tsuchida, Dan Pagendam, Petra Kuhnert
Abstract: Efficient inference in high-dimensional models remains a central challenge in machine learning. This paper introduces the Gaussian Ensemble Belief Propagation (GEnBP) algorithm, a fusion of the Ensemble Kalman filter and Gaussian belief propagation (GaBP) methods. GEnBP updates ensembles by passing low-rank local messages in a graphical model structure. This combination inherits favourable qualities from each method. Ensemble techniques allow GEnBP to handle high-dimensional states, parameters and intricate, noisy, black-box generation processes. The use of local messages in a graphical model structure ensures that the approach is suited to distributed computing and can efficiently handle complex dependence structures. GEnBP is particularly advantageous when the ensemble size is considerably smaller than the inference dimension. This scenario often arises in fields such as spatiotemporal modelling, image processing and physical model inversion. GEnBP can be applied to general problem structures, including jointly learning system parameters, observation parameters, and latent state variables.
Authors: Yuwen Yang, Yuxiang Lu, Suizhi Huang, Shalayiding Sirejiding, Hongtao Lu, Yue Ding
Abstract: The innovative Federated Multi-Task Learning (FMTL) approach consolidates the benefits of Federated Learning (FL) and Multi-Task Learning (MTL), enabling collaborative model training on multi-task learning datasets. However, a comprehensive evaluation method, integrating the unique features of both FL and MTL, is currently absent in the field. This paper fills this void by introducing a novel framework, FMTL-Bench, for systematic evaluation of the FMTL paradigm. This benchmark covers various aspects at the data, model, and optimization algorithm levels, and comprises seven sets of comparative experiments, encapsulating a wide array of non-independent and identically distributed (Non-IID) data partitioning scenarios. We propose a systematic process for comparing baselines of diverse indicators and conduct a case study on communication expenditure, time, and energy consumption. Through our exhaustive experiments, we aim to provide valuable insights into the strengths and limitations of existing baseline methods, contributing to the ongoing discourse on optimal FMTL application in practical scenarios. The source code can be found on https://github.com/youngfish42/FMTL-Benchmark .
Authors: Ziyi Guan, Hantao Huang, Yupeng Su, Hong Huang, Ngai Wong, Hao Yu
Abstract: Large Language Models (LLMs) have greatly advanced the natural language processing paradigm. However, the high computational load and huge model sizes pose a grand challenge for deployment on edge devices. To this end, we propose APTQ (Attention-aware Post-Training Mixed-Precision Quantization) for LLMs, which considers not only the second-order information of each layer's weights, but also, for the first time, the nonlinear effect of attention outputs on the entire model. We leverage the Hessian trace as a sensitivity metric for mixed-precision quantization, ensuring an informed precision reduction that retains model performance. Experiments show APTQ surpasses previous quantization methods, achieving an average of 4 bit width a 5.22 perplexity nearly equivalent to full precision in the C4 dataset. In addition, APTQ attains state-of-the-art zero-shot accuracy of 68.24\% and 70.48\% at an average bitwidth of 3.8 in LLaMa-7B and LLaMa-13B, respectively, demonstrating its effectiveness to produce high-quality quantized LLMs.
Authors: Zihan Zhou, Jonathan Booher, Khashayar Rohanimanesh, Wei Liu, Aleksandr Petiushko, Animesh Garg
Abstract: Safe reinforcement learning tasks with multiple constraints are a challenging domain despite being very common in the real world. In safety-critical domains, properly handling the constraints becomes even more important. To address this challenge, we first describe the multi-constraint problem with a stronger Uniformly Constrained MDP (UCMDP) model; we then propose Objective Suppression, a novel method that adaptively suppresses the task reward maximizing objectives according to a safety critic, as a solution to the Lagrangian dual of a UCMDP. We benchmark Objective Suppression in two multi-constraint safety domains, including an autonomous driving domain where any incorrect behavior can lead to disastrous consequences. Empirically, we demonstrate that our proposed method, when combined with existing safe RL algorithms, can match the task reward achieved by our baselines with significantly fewer constraint violations.
Authors: Shouheng Li, Dongwoo Kim, Qing Wang
Abstract: Despite the celebrated popularity of Graph Neural Networks (GNNs) across numerous applications, the ability of GNNs to generalize remains less explored. In this work, we propose to study the generalization of GNNs through a novel perspective - analyzing the entropy of graph homomorphism. By linking graph homomorphism with information-theoretic measures, we derive generalization bounds for both graph and node classifications. These bounds are capable of capturing subtleties inherent in various graph structures, including but not limited to paths, cycles and cliques. This enables a data-dependent generalization analysis with robust theoretical guarantees. To shed light on the generality of of our proposed bounds, we present a unifying framework that can characterize a broad spectrum of GNN models through the lens of graph homomorphism. We validate the practical applicability of our theoretical findings by showing the alignment between the proposed bounds and the empirically observed generalization gaps over both real-world and synthetic datasets.
Authors: Jonathan Lebensold, Maziar Sanjabi, Pietro Astolfi, Adriana Romero-Soriano, Kamalika Chaudhuri, Mike Rabbat, Chuan Guo
Abstract: Text-to-image diffusion models have been shown to suffer from sample-level memorization, possibly reproducing near-perfect replica of images that they are trained on, which may be undesirable. To remedy this issue, we develop the first differentially private (DP) retrieval-augmented generation algorithm that is capable of generating high-quality image samples while providing provable privacy guarantees. Specifically, we assume access to a text-to-image diffusion model trained on a small amount of public data, and design a DP retrieval mechanism to augment the text prompt with samples retrieved from a private retrieval dataset. Our \emph{differentially private retrieval-augmented diffusion model} (DP-RDM) requires no fine-tuning on the retrieval dataset to adapt to another domain, and can use state-of-the-art generative models to generate high-quality image samples while satisfying rigorous DP guarantees. For instance, when evaluated on MS-COCO, our DP-RDM can generate samples with a privacy budget of $\epsilon=10$, while providing a $3.5$ point improvement in FID compared to public-only retrieval for up to $10,000$ queries.
Authors: Shawn Im, Yixuan Li
Abstract: Aligning large language models (LLMs) with human intentions has become a critical task for safely deploying models in real-world systems. While existing alignment approaches have seen empirical success, theoretically understanding how these methods affect model behavior remains an open question. Our work provides an initial attempt to theoretically analyze the learning dynamics of human preference alignment. We formally show how the distribution of preference datasets influences the rate of model updates and provide rigorous guarantees on the training accuracy. Our theory also reveals an intricate phenomenon where the optimization is prone to prioritizing certain behaviors with higher preference distinguishability. We empirically validate our findings on contemporary LLMs and alignment tasks, reinforcing our theoretical insights and shedding light on considerations for future alignment approaches. Disclaimer: This paper contains potentially offensive text; reader discretion is advised.
Authors: Zhishang Luo, Truong Son Hy, Puoya Tabaghi, Donghyeon Koh, Michael Defferrard, Elahe Rezaei, Ryan Carey, Rhett Davis, Rajeev Jain, Yusu Wang
Abstract: The run-time for optimization tools used in chip design has grown with the complexity of designs to the point where it can take several days to go through one design cycle which has become a bottleneck. Designers want fast tools that can quickly give feedback on a design. Using the input and output data of the tools from past designs, one can attempt to build a machine learning model that predicts the outcome of a design in significantly shorter time than running the tool. The accuracy of such models is affected by the representation of the design data, which is usually a netlist that describes the elements of the digital circuit and how they are connected. Graph representations for the netlist together with graph neural networks have been investigated for such models. However, the characteristics of netlists pose several challenges for existing graph learning frameworks, due to the large number of nodes and the importance of long-range interactions between nodes. To address these challenges, we represent the netlist as a directed hypergraph and propose a Directional Equivariant Hypergraph Neural Network (DE-HNN) for the effective learning of (directed) hypergraphs. Theoretically, we show that our DE-HNN can universally approximate any node or hyperedge based function that satisfies certain permutation equivariant and invariant properties natural for directed hypergraphs. We compare the proposed DE-HNN with several State-of-the-art (SOTA) machine learning models for (hyper)graphs and netlists, and show that the DE-HNN significantly outperforms them in predicting the outcome of optimized place-and-route tools directly from the input netlists. Our source code and the netlists data used are publicly available at https://github.com/YusuLab/chips.git
Authors: Tim K. Smit, Hajo A. Reijers, Xixi Lu
Abstract: Predictive Process Monitoring focuses on predicting future states of ongoing process executions, such as forecasting the remaining time. Recent developments in Object-Centric Process Mining have enriched event data with objects and their explicit relations between events. To leverage this enriched data, we propose the Heterogeneous Object Event Graph encoding (HOEG), which integrates events and objects into a graph structure with diverse node types. It does so without aggregating object features, thus creating a more nuanced and informative representation. We then adopt a heterogeneous Graph Neural Network architecture, which incorporates these diverse object features in prediction tasks. We evaluate the performance and scalability of HOEG in predicting remaining time, benchmarking it against two established graph-based encodings and two baseline models. Our evaluation uses three Object-Centric Event Logs (OCELs), including one from a real-life process at a major Dutch financial institution. The results indicate that HOEG competes well with existing models and surpasses them when OCELs contain informative object attributes and event-object interactions.
Authors: Jonathan D. Chang, Wenhao Zhan, Owen Oertell, Kiant\'e Brantley, Dipendra Misra, Jason D. Lee, Wen Sun
Abstract: Reinforcement Learning (RL) from Human Preference-based feedback is a popular paradigm for fine-tuning generative models, which has produced impressive models such as GPT-4 and Claude3 Opus. This framework often consists of two steps: learning a reward model from an offline preference dataset followed by running online RL to optimize the learned reward model. In this work, leveraging the idea of reset, we propose a new RLHF algorithm with provable guarantees. Motivated by the fact that offline preference dataset provides informative states (i.e., data that is preferred by the labelers), our new algorithm, Dataset Reset Policy Optimization (DR-PO), integrates the existing offline preference dataset into the online policy training procedure via dataset reset: it directly resets the policy optimizer to the states in the offline dataset, instead of always starting from the initial state distribution. In theory, we show that DR-PO learns to perform at least as good as any policy that is covered by the offline dataset under general function approximation with finite sample complexity. In experiments, we demonstrate that on both the TL;DR summarization and the Anthropic Helpful Harmful (HH) dataset, the generation from DR-PO is better than that from Proximal Policy Optimization (PPO) and Direction Preference Optimization (DPO), under the metric of GPT4 win-rate. Code for this work can be found at https://github.com/Cornell-RL/drpo.
Authors: Shreyas Chaudhari, Pranjal Aggarwal, Vishvak Murahari, Tanmay Rajpurohit, Ashwin Kalyan, Karthik Narasimhan, Ameet Deshpande, Bruno Castro da Silva
Abstract: State-of-the-art large language models (LLMs) have become indispensable tools for various tasks. However, training LLMs to serve as effective assistants for humans requires careful consideration. A promising approach is reinforcement learning from human feedback (RLHF), which leverages human feedback to update the model in accordance with human preferences and mitigate issues like toxicity and hallucinations. Yet, an understanding of RLHF for LLMs is largely entangled with initial design choices that popularized the method and current research focuses on augmenting those choices rather than fundamentally improving the framework. In this paper, we analyze RLHF through the lens of reinforcement learning principles to develop an understanding of its fundamentals, dedicating substantial focus to the core component of RLHF -- the reward model. Our study investigates modeling choices, caveats of function approximation, and their implications on RLHF training algorithms, highlighting the underlying assumptions made about the expressivity of reward. Our analysis improves the understanding of the role of reward models and methods for their training, concurrently revealing limitations of the current methodology. We characterize these limitations, including incorrect generalization, model misspecification, and the sparsity of feedback, along with their impact on the performance of a language model. The discussion and analysis are substantiated by a categorical review of current literature, serving as a reference for researchers and practitioners to understand the challenges of RLHF and build upon existing efforts.
Authors: Xuezhe Ma, Xiaomeng Yang, Wenhan Xiong, Beidi Chen, Lili Yu, Hao Zhang, Jonathan May, Luke Zettlemoyer, Omer Levy, Chunting Zhou
Abstract: The quadratic complexity and weak length extrapolation of Transformers limits their ability to scale to long sequences, and while sub-quadratic solutions like linear attention and state space models exist, they empirically underperform Transformers in pretraining efficiency and downstream task accuracy. We introduce Megalodon, a neural architecture for efficient sequence modeling with unlimited context length. Megalodon inherits the architecture of Mega (exponential moving average with gated attention), and further introduces multiple technical components to improve its capability and stability, including complex exponential moving average (CEMA), timestep normalization layer, normalized attention mechanism and pre-norm with two-hop residual configuration. In a controlled head-to-head comparison with Llama2, Megalodon achieves better efficiency than Transformer in the scale of 7 billion parameters and 2 trillion training tokens. Megalodon reaches a training loss of 1.70, landing mid-way between Llama2-7B (1.75) and 13B (1.67). Code: https://github.com/XuezheMax/megalodon
Authors: Tara Kelly, Jessica Gupta
Abstract: Traffic congestion at intersections is a significant issue in urban areas, leading to increased commute times, safety hazards, and operational inefficiencies. This study aims to develop a predictive model for congestion at intersections in major U.S. cities, utilizing a dataset of trip-logging metrics from commercial vehicles across 4,800 intersections. The dataset encompasses 27 features, including intersection coordinates, street names, time of day, and traffic metrics (Kashyap et al., 2019). Additional features, such as rainfall/snowfall percentage, distance from downtown and outskirts, and road types, were incorporated to enhance the model's predictive power. The methodology involves data exploration, feature transformation, and handling missing values through low-rank models and label encoding. The proposed model has the potential to assist city planners and governments in anticipating traffic hot spots, optimizing operations, and identifying infrastructure challenges.
Authors: Akshansh Mishra
Abstract: Architected materials with their unique topology and geometry offer the potential to modify physical and mechanical properties. Machine learning can accelerate the design and optimization of these materials by identifying optimal designs and forecasting performance. This work presents LatticeML, a data-driven application for predicting the effective Young's Modulus of high-temperature graph-based architected materials. The study considers eleven graph-based lattice structures with two high-temperature alloys, Ti-6Al-4V and Inconel 625. Finite element simulations were used to compute the effective Young's Modulus of the 2x2x2 unit cell configurations. A machine learning framework was developed to predict Young's Modulus, involving data collection, preprocessing, implementation of regression models, and deployment of the best-performing model. Five supervised learning algorithms were evaluated, with the XGBoost Regressor achieving the highest accuracy (MSE = 2.7993, MAE = 1.1521, R-squared = 0.9875). The application uses the Streamlit framework to create an interactive web interface, allowing users to input material and geometric parameters and obtain predicted Young's Modulus values.
Authors: Stephen Bates, Michael I. Jordan, Michael Sklar, Jake A. Soloff
Abstract: Consider the relationship between a regulator (the principal) and an experimenter (the agent) such as a pharmaceutical company. The pharmaceutical company wishes to sell a drug for profit, whereas the regulator wishes to allow only efficacious drugs to be marketed. The efficacy of the drug is not known to the regulator, so the pharmaceutical company must run a costly trial to prove efficacy to the regulator. Critically, the statistical protocol used to establish efficacy affects the behavior of a strategic, self-interested agent; a lower standard of statistical evidence incentivizes the agent to run more trials that are less likely to be effective. The interaction between the statistical protocol and the incentives of the pharmaceutical company is crucial for understanding this system and designing protocols with high social utility. In this work, we discuss how the regulator can set up a protocol with payoffs based on statistical evidence. We show how to design protocols that are robust to an agent's strategic actions, and derive the optimal protocol in the presence of strategic entrants.
Authors: Oleg Platonov, Denis Kuznedelev, Artem Babenko, Liudmila Prokhorenkova
Abstract: Homophily is a graph property describing the tendency of edges to connect similar nodes; the opposite is called heterophily. It is often believed that heterophilous graphs are challenging for standard message-passing graph neural networks (GNNs), and much effort has been put into developing efficient methods for this setting. However, there is no universally agreed-upon measure of homophily in the literature. In this work, we show that commonly used homophily measures have critical drawbacks preventing the comparison of homophily levels across different datasets. For this, we formalize desirable properties for a proper homophily measure and verify which measures satisfy which properties. In particular, we show that a measure that we call adjusted homophily satisfies more desirable properties than other popular homophily measures while being rarely used in graph machine learning literature. Then, we go beyond the homophily-heterophily dichotomy and propose a new characteristic that allows one to further distinguish different sorts of heterophily. The proposed label informativeness (LI) characterizes how much information a neighbor's label provides about a node's label. We prove that this measure satisfies important desirable properties. We also observe empirically that LI better agrees with GNN performance compared to homophily measures, which confirms that it is a useful characteristic of the graph structure.
Authors: Martin Magris, Mostafa Shabani, Alexandros Iosifidis
Abstract: We propose an optimization algorithm for Variational Inference (VI) in complex models. Our approach relies on natural gradient updates where the variational space is a Riemann manifold. We develop an efficient algorithm for Gaussian Variational Inference whose updates satisfy the positive definite constraint on the variational covariance matrix. Our Manifold Gaussian Variational Bayes on the Precision matrix (MGVBP) solution provides simple update rules, is straightforward to implement, and the use of the precision matrix parametrization has a significant computational advantage. Due to its black-box nature, MGVBP stands as a ready-to-use solution for VI in complex models. Over five datasets, we empirically validate our feasible approach on different statistical and econometric models, discussing its performance with respect to baseline methods.
Authors: Md Zobaer Islam, Brenden Martin, Carly Gotcher, Tyler Martinez, John F. O'Hara, Sabit Ekin
Abstract: Human respiratory rate and its pattern convey essential information about the physical and psychological states of the subject. Abnormal breathing can indicate fatal health issues leading to further diagnosis and treatment. Wireless light-wave sensing (LWS) using incoherent infrared light shows promise in safe, discreet, efficient, and non-invasive human breathing monitoring without raising privacy concerns. The respiration monitoring system needs to be trained on different types of breathing patterns to identify breathing anomalies.The system must also validate the collected data as a breathing waveform, discarding any faulty data caused by external interruption, user movement, or system malfunction. To address these needs, this study simulated normal and different types of abnormal respiration using a robot that mimics human breathing patterns. Then, time-series respiration data were collected using infrared light-wave sensing technology. Three machine learning algorithms, decision tree, random forest and XGBoost, were applied to detect breathing anomalies and faulty data. Model performances were evaluated through cross-validation, assessing classification accuracy, precision and recall scores. The random forest model achieved the highest classification accuracy of 96.75% with data collected at a 0.5m distance. In general, ensemble models like random forest and XGBoost performed better than a single model in classifying the data collected at multiple distances from the light-wave sensing setup.
Authors: Dorjan Hitaj, Giulio Pagnotta, Fabio De Gaspari, Lorenzo De Carli, Luigi V. Mancini
Abstract: Ransomware attacks have caused billions of dollars in damages in recent years, and are expected to cause billions more in the future. Consequently, significant effort has been devoted to ransomware detection and mitigation. Behavioral-based ransomware detection approaches have garnered considerable attention recently. These behavioral detectors typically rely on process-based behavioral profiles to identify malicious behaviors. However, with an increasing body of literature highlighting the vulnerability of such approaches to evasion attacks, a comprehensive solution to the ransomware problem remains elusive. This paper presents Minerva, a novel robust approach to ransomware detection. Minerva is engineered to be robust by design against evasion attacks, with architectural and feature selection choices informed by their resilience to adversarial manipulation. We conduct a comprehensive analysis of Minerva across a diverse spectrum of ransomware types, encompassing unseen ransomware as well as variants designed specifically to evade Minerva. Our evaluation showcases the ability of Minerva to accurately identify ransomware, generalize to unseen threats, and withstand evasion attacks. Furthermore, Minerva achieves remarkably low detection times, enabling the adoption of data loss prevention techniques with near-zero overhead.
Authors: Dan Andrei Iliescu, Devang Savita Ram Mohan, Tian Huey Teh, Zack Hodari
Abstract: We address the problem of human-in-the-loop control for generating prosody in the context of text-to-speech synthesis. Controlling prosody is challenging because existing generative models lack an efficient interface through which users can modify the output quickly and precisely. To solve this, we introduce a novel framework whereby the user provides partial inputs and the generative model generates the missing features. We propose a model that is specifically designed to encode partial prosodic features and output complete audio. We show empirically that our model displays two essential qualities of a human-in-the-loop control mechanism: efficiency and robustness. With even a very small number of input values (~4), our model enables users to improve the quality of the output significantly in terms of listener preference (4:1).
Authors: Robert Lefringhausen, Supitsana Srithasan, Armin Lederer, Sandra Hirche
Abstract: As control engineering methods are applied to increasingly complex systems, data-driven approaches for system identification appear as a promising alternative to physics-based modeling. While the Bayesian approaches prevalent for safety-critical applications usually rely on the availability of state measurements, the states of a complex system are often not directly measurable. It may then be necessary to jointly estimate the dynamics and the latent state, making the quantification of uncertainties and the design of controllers with formal performance guarantees considerably more challenging. This paper proposes a novel method for the computation of an optimal input trajectory for unknown nonlinear systems with latent states based on a combination of particle Markov chain Monte Carlo methods and scenario theory. Probabilistic performance guarantees are derived for the resulting input trajectory, and an approach to validate the performance of arbitrary control laws is presented. The effectiveness of the proposed method is demonstrated in a numerical simulation.
Authors: Alexander Scarlatos, Andrew Lan
Abstract: Recent developments in large pre-trained language models have enabled unprecedented performance on a variety of downstream tasks. Achieving best performance with these models often leverages in-context learning, where a model performs a (possibly new) task given one or more examples. However, recent work has shown that the choice of examples can have a large impact on task performance and that finding an optimal set of examples is non-trivial. While there are many existing methods for selecting in-context examples, they generally score examples independently, ignoring the dependency between them and the order in which they are provided to the model. In this work, we propose Retrieval for In-Context Learning (RetICL), a learnable method for modeling and optimally selecting examples sequentially for in-context learning. We frame the problem of sequential example selection as a Markov decision process and train an example retriever using reinforcement learning. We evaluate RetICL on math word problem solving and scientific question answering tasks and show that it consistently outperforms or matches heuristic and learnable baselines. We also use case studies to show that RetICL implicitly learns representations of problem solving strategies.
Authors: Abdulrahman Diaa, Lucas Fenaux, Thomas Humphries, Marian Dietz, Faezeh Ebrahimianghazani, Bailey Kacsmar, Xinda Li, Nils Lukas, Rasoul Akhavan Mahdavi, Simon Oya, Ehsan Amjadian, Florian Kerschbaum
Abstract: Machine Learning as a Service (MLaaS) is an increasingly popular design where a company with abundant computing resources trains a deep neural network and offers query access for tasks like image classification. The challenge with this design is that MLaaS requires the client to reveal their potentially sensitive queries to the company hosting the model. Multi-party computation (MPC) protects the client's data by allowing encrypted inferences. However, current approaches suffer from prohibitively large inference times. The inference time bottleneck in MPC is the evaluation of non-linear layers such as ReLU activation functions. Motivated by the success of previous work co-designing machine learning and MPC, we develop an activation function co-design. We replace all ReLUs with a polynomial approximation and evaluate them with single-round MPC protocols, which give state-of-the-art inference times in wide-area networks. Furthermore, to address the accuracy issues previously encountered with polynomial activations, we propose a novel training algorithm that gives accuracy competitive with plaintext models. Our evaluation shows between $3$ and $110\times$ speedups in inference time on large models with up to $23$ million parameters while maintaining competitive inference accuracy.
Authors: Janet Wang, Yunbei Zhang, Zhengming Ding, Jihun Hamm
Abstract: The development of reliable and fair diagnostic systems is often constrained by the scarcity of labeled data. To address this challenge, our work explores the feasibility of unsupervised domain adaptation (UDA) to integrate large external datasets for developing reliable classifiers. The adoption of UDA with multiple sources can simultaneously enrich the training set and bridge the domain gap between different skin lesion datasets, which vary due to distinct acquisition protocols. Particularly, UDA shows practical promise for improving diagnostic reliability when training with a custom skin lesion dataset, where only limited labeled data are available from the target domain. In this study, we investigate three UDA training schemes based on source data utilization: single-source, combined-source, and multi-source UDA. Our findings demonstrate the effectiveness of applying UDA on multiple sources for binary and multi-class classification. A strong correlation between test error and label shift in multi-class tasks has been observed in the experiment. Crucially, our study shows that UDA can effectively mitigate bias against minority groups and enhance fairness in diagnostic systems, while maintaining superior classification performance. This is achieved even without directly implementing fairness-focused techniques. This success is potentially attributed to the increased and well-adapted demographic information obtained from multiple sources.
Authors: Shuyan Zhou, Frank F. Xu, Hao Zhu, Xuhui Zhou, Robert Lo, Abishek Sridhar, Xianyi Cheng, Tianyue Ou, Yonatan Bisk, Daniel Fried, Uri Alon, Graham Neubig
Abstract: With advances in generative AI, there is now potential for autonomous agents to manage daily tasks via natural language commands. However, current agents are primarily created and tested in simplified synthetic environments, leading to a disconnect with real-world scenarios. In this paper, we build an environment for language-guided agents that is highly realistic and reproducible. Specifically, we focus on agents that perform tasks on the web, and create an environment with fully functional websites from four common domains: e-commerce, social forum discussions, collaborative software development, and content management. Our environment is enriched with tools (e.g., a map) and external knowledge bases (e.g., user manuals) to encourage human-like task-solving. Building upon our environment, we release a set of benchmark tasks focusing on evaluating the functional correctness of task completions. The tasks in our benchmark are diverse, long-horizon, and designed to emulate tasks that humans routinely perform on the internet. We experiment with several baseline agents, integrating recent techniques such as reasoning before acting. The results demonstrate that solving complex tasks is challenging: our best GPT-4-based agent only achieves an end-to-end task success rate of 14.41%, significantly lower than the human performance of 78.24%. These results highlight the need for further development of robust agents, that current state-of-the-art large language models are far from perfect performance in these real-life tasks, and that WebArena can be used to measure such progress.
Authors: Yuhang Zhou, Xuan Lu, Ge Gao, Qiaozhu Mei, Wei Ai
Abstract: Although remote working is increasingly adopted during the pandemic, many are concerned by the low-efficiency in the remote working. Missing in text-based communication are non-verbal cues such as facial expressions and body language, which hinders the effective communication and negatively impacts the work outcomes. Prevalent on social media platforms, emojis, as alternative non-verbal cues, are gaining popularity in the virtual workspaces well. In this paper, we study how emoji usage influences developer participation and issue resolution in virtual workspaces. To this end, we collect GitHub issues for a one-year period and apply causal inference techniques to measure the causal effect of emojis on the outcome of issues, controlling for confounders such as issue content, repository, and author information. We find that emojis can significantly reduce the resolution time of issues and attract more user participation. We also compare the heterogeneous effect on different types of issues. These findings deepen our understanding of the developer communities, and they provide design implications on how to facilitate interactions and broaden developer participation.
Authors: Yash Semlani, Mihir Relan, Krithik Ramesh
Abstract: Jet tagging is a classification problem in high-energy physics experiments that aims to identify the collimated sprays of subatomic particles, jets, from particle collisions and tag them to their emitter particle. Advances in jet tagging present opportunities for searches of new physics beyond the Standard Model. Current approaches use deep learning to uncover hidden patterns in complex collision data. However, the representation of jets as inputs to a deep learning model have been varied, and often, informative features are withheld from models. In this study, we propose a graph-based representation of a jet that encodes the most information possible. To learn best from this representation, we design Particle Chebyshev Network (PCN), a graph neural network (GNN) using Chebyshev graph convolutions (ChebConv). ChebConv has been demonstrated as an effective alternative to classical graph convolutions in GNNs and has yet to be explored in jet tagging. PCN achieves a substantial improvement in accuracy over existing taggers and opens the door to future studies into graph-based representations of jets and ChebConv layers in high-energy physics experiments. Code is available at https://github.com/YVSemlani/PCN-Jet-Tagging.
Authors: S\'ebastien Ragot
Abstract: This work proposes to measure the scope of a patent claim as the reciprocal of self-information contained in this claim. Self-information is calculated based on a probability of occurrence of the claim, where this probability is obtained from a language model. Grounded in information theory, this approach is based on the assumption that an unlikely concept is more informative than a usual concept, insofar as it is more surprising. In turn, the more surprising the information required to define the claim, the narrower its scope. Seven language models are considered, ranging from simplest models (each word or character has an identical probability) to intermediate models (based on average word or character frequencies), to large language models (LLMs) such as GPT2 and davinci-002. Remarkably, when using the simplest language models to compute the probabilities, the scope becomes proportional to the reciprocal of the number of words or characters involved in the claim, a metric already used in previous works. Application is made to multiple series of patent claims directed to distinct inventions, where each series consists of claims devised to have a gradually decreasing scope. The performance of the language models is then assessed through several ad hoc tests. The LLMs outperform models based on word and character frequencies, which themselves outdo the simplest models based on word or character counts. Interestingly, however, the character count appears to be a more reliable indicator than the word count.
Authors: Letian Zhang, Xiaotong Zhai, Zhongkai Zhao, Yongshuo Zong, Xin Wen, Bingchen Zhao
Abstract: Counterfactual reasoning, a fundamental aspect of human cognition, involves contemplating alternatives to established facts or past events, significantly enhancing our abilities in planning and decision-making. In light of the advancements in current multi-modal large language models, we explore their effectiveness in counterfactual reasoning. To facilitate this investigation, we introduce a novel dataset, C-VQA, specifically designed to test the counterfactual reasoning capabilities of modern multi-modal large language models. This dataset is constructed by infusing original questions with counterfactual presuppositions, spanning various types such as numerical and boolean queries. It encompasses a mix of real and synthetic data, representing a wide range of difficulty levels. Our thorough evaluations of contemporary vision-language models using this dataset have revealed substantial performance drops, with some models showing up to a 40% decrease, highlighting a significant gap between current models and human-like vision reasoning capabilities. We hope our dataset will serve as a vital benchmark for evaluating the counterfactual reasoning capabilities of models. Code and dataset are publicly available at https://bzhao.me/C-VQA/.
URLs: https://bzhao.me/C-VQA/.
Authors: Forrest Huang, Gang Li, Tao Li, Yang Li
Abstract: Macros are building block tasks of our everyday smartphone activity (e.g., "login", or "booking a flight"). Effectively extracting macros is important for understanding mobile interaction and enabling task automation. These macros are however difficult to extract at scale as they can be comprised of multiple steps yet hidden within programmatic components of mobile apps. In this paper, we introduce a novel approach based on Large Language Models (LLMs) to automatically extract semantically meaningful macros from both random and user-curated mobile interaction traces. The macros produced by our approach are automatically tagged with natural language descriptions and are fully executable. We conduct multiple studies to validate the quality of extracted macros, including user evaluation, comparative analysis against human-curated tasks, and automatic execution of these macros. These experiments and analyses show the effectiveness of our approach and the usefulness of extracted macros in various downstream applications.
Authors: Zhiyuan Zeng, Jiatong Yu, Tianyu Gao, Yu Meng, Tanya Goyal, Danqi Chen
Abstract: As research in large language models (LLMs) continues to accelerate, LLM-based evaluation has emerged as a scalable and cost-effective alternative to human evaluations for comparing the ever increasing list of models. This paper investigates the efficacy of these ``LLM evaluators'', particularly in using them to assess instruction following, a metric that gauges how closely generated text adheres to the given instruction. We introduce a challenging meta-evaluation benchmark, LLMBar, designed to test the ability of an LLM evaluator in discerning instruction-following outputs. The authors manually curated 419 pairs of outputs, one adhering to instructions while the other diverging, yet may possess deceptive qualities that mislead an LLM evaluator, e.g., a more engaging tone. Contrary to existing meta-evaluation, we discover that different evaluators (i.e., combinations of LLMs and prompts) exhibit distinct performance on LLMBar and even the highest-scoring ones have substantial room for improvement. We also present a novel suite of prompting strategies that further close the gap between LLM and human evaluators. With LLMBar, we hope to offer more insight into LLM evaluators and foster future research in developing better instruction-following models.
Authors: Enric Boix-Adsera, Omid Saremi, Emmanuel Abbe, Samy Bengio, Etai Littwin, Joshua Susskind
Abstract: We investigate the capabilities of transformer models on relational reasoning tasks. In these tasks, models are trained on a set of strings encoding abstract relations, and are then tested out-of-distribution on data that contains symbols that did not appear in the training dataset. We prove that for any relational reasoning task in a large family of tasks, transformers learn the abstract relations and generalize to the test set when trained by gradient descent on sufficiently large quantities of training data. This is in contrast to classical fully-connected networks, which we prove fail to learn to reason. Our results inspire modifications of the transformer architecture that add only two trainable parameters per head, and that we empirically demonstrate improve data efficiency for learning to reason.
Authors: Anass Aghbalou, Fran\c{c}ois Portier, Anne Sabourin
Abstract: When dealing with imbalanced classification data, reweighting the loss function is a standard procedure allowing to equilibrate between the true positive and true negative rates within the risk measure. Despite significant theoretical work in this area, existing results do not adequately address a main challenge within the imbalanced classification framework, which is the negligible size of one class in relation to the full sample size and the need to rescale the risk function by a probability tending to zero. To address this gap, we present two novel contributions in the setting where the rare class probability approaches zero: (1) a non asymptotic fast rate probability bound for constrained balanced empirical risk minimization, and (2) a consistent upper bound for balanced nearest neighbors estimates. Our findings provide a clearer understanding of the benefits of class-weighting in realistic settings, opening new avenues for further research in this field.
Authors: Eric J. Bigelow, Ekdeep Singh Lubana, Robert P. Dick, Hidenori Tanaka, Tomer D. Ullman
Abstract: Large language models (LLMs) trained on huge corpora of text datasets demonstrate intriguing capabilities, achieving state-of-the-art performance on tasks they were not explicitly trained for. The precise nature of LLM capabilities is often mysterious, and different prompts can elicit different capabilities through in-context learning. We propose a framework that enables us to analyze in-context learning dynamics to understand latent concepts underlying LLMs' behavioral patterns. This provides a more nuanced understanding than success-or-failure evaluation benchmarks, but does not require observing internal activations as a mechanistic interpretation of circuits would. Inspired by the cognitive science of human randomness perception, we use random binary sequences as context and study dynamics of in-context learning by manipulating properties of context data, such as sequence length. In the latest GPT-3.5+ models, we find emergent abilities to generate seemingly random numbers and learn basic formal languages, with striking in-context learning dynamics where model outputs transition sharply from seemingly random behaviors to deterministic repetition.
Authors: Alec Edwards, Andrea Peruffo, Alessandro Abate
Abstract: This paper presents Fossil 2.0, a new major release of a software tool for the synthesis of certificates (e.g., Lyapunov and barrier functions) for dynamical systems modelled as ordinary differential and difference equations. Fossil 2.0 is much improved from its original release, including new interfaces, a significantly expanded certificate portfolio, controller synthesis and enhanced extensibility. We present these new features as part of this tool paper. Fossil implements a counterexample-guided inductive synthesis (CEGIS) loop ensuring the soundness of the method. Our tool uses neural networks as templates to generate candidate functions, which are then formally proven by an SMT solver acting as an assertion verifier. Improvements with respect to the first release include a wider range of certificates, synthesis of control laws, and support for discrete-time models.
Authors: Yan Cathy Hua, Paul Denny, Katerina Taskova, J\"org Wicker
Abstract: Aspect-based Sentiment Analysis (ABSA) is a fine-grained type of sentiment analysis that identifies aspects and their associated opinions from a given text. With the surge of digital opinionated text data, ABSA gained increasing popularity for its ability to mine more detailed and targeted insights. Many review papers on ABSA subtasks and solution methodologies exist, however, few focus on trends over time or systemic issues relating to research application domains, datasets, and solution approaches. To fill the gap, this paper presents a Systematic Literature Review (SLR) of ABSA studies with a focus on trends and high-level relationships among these fundamental components. This review is one of the largest SLRs on ABSA, and also, to our knowledge, the first that systematically examines the trends and inter-relations among ABSA research and data distribution across domains and solution paradigms and approaches. Our sample includes 519 primary studies screened from 4191 search results without time constraints via an innovative automatic filtering process. Our quantitative analysis not only identifies trends in nearly two decades of ABSA research development but also unveils a systemic lack of dataset and domain diversity as well as domain mismatch that may hinder the development of future ABSA research. We discuss these findings and their implications and propose suggestions for future research.
Authors: Jeongsol Kim, Geon Yeong Park, Hyungjin Chung, Jong Chul Ye
Abstract: The recent advent of diffusion models has led to significant progress in solving inverse problems, leveraging these models as effective generative priors. Nonetheless, there remain challenges related to the ill-posed nature of such problems, often due to inherent ambiguities in measurements or intrinsic system symmetries. To address this, drawing inspiration from the human ability to resolve visual ambiguities through perceptual biases, here we introduce a novel latent diffusion inverse solver by regularization by texts (TReg). Specifically, TReg applies the textual description of the preconception of the solution during the reverse diffusion sampling, of which the description is dynamically reinforced through null-text optimization for adaptive negation. Our comprehensive experimental results demonstrate that TReg successfully mitigates ambiguity in the inverse problems, enhancing their effectiveness and accuracy.
Authors: Md Zobaer Islam, Sabit Ekin, John F. O'Hara, Gary Yen
Abstract: In this study, we present a deep learning-based approach for time-series respiration data classification. The dataset contains regular breathing patterns as well as various forms of abnormal breathing, obtained through non-contact incoherent light-wave sensing (LWS) technology. Given the one-dimensional (1D) nature of the data, we employed a 1D convolutional neural network (1D-CNN) for classification purposes. Genetic algorithm was employed to optimize the 1D-CNN architecture to maximize classification accuracy. Addressing the computational complexity associated with training the 1D-CNN across multiple generations, we implemented transfer learning from a pre-trained model. This approach significantly reduced the computational time required for training, thereby enhancing the efficiency of the optimization process. This study contributes valuable insights into the potential applications of deep learning methodologies for enhancing respiratory anomaly detection through precise and efficient respiration classification.
Authors: Wei Liu, Weihao Zeng, Keqing He, Yong Jiang, Junxian He
Abstract: Instruction tuning is a standard technique employed to align large language models to end tasks and user preferences after the initial pretraining phase. Recent research indicates the critical role of data engineering in instruction tuning -- when appropriately selected, only limited data is necessary to achieve superior performance. However, we still lack a principled understanding of what makes good instruction tuning data for alignment, and how we should select data automatically and effectively. In this work, we delve deeply into automatic data selection strategies for alignment. We start with controlled studies to measure data across three dimensions: complexity, quality, and diversity, along which we examine existing methods and introduce novel techniques for enhanced data measurement. Subsequently, we propose a simple strategy to select data samples based on the measurement. We present deita (short for Data-Efficient Instruction Tuning for Alignment), a series of models fine-tuned from LLaMA and Mistral models using data samples automatically selected with our proposed approach. Empirically, deita performs better or on par with the state-of-the-art open-source alignment models with only 6K SFT training data samples -- over 10x less than the data used in the baselines. When further trained with direct preference optimization (DPO), deita-Mistral-7B + DPO trained with 6K SFT and 10K DPO samples achieve 7.55 MT-Bench and 90.06% AlpacaEval scores. We anticipate this work to provide tools on automatic data selection, facilitating data-efficient alignment. We release our models as well as the selected datasets for future researches to effectively align models more efficiently.
Authors: Wesley H. Holliday, Alexander Kristoffersen, Eric Pacuit
Abstract: By classic results in social choice theory, any reasonable preferential voting method sometimes gives individuals an incentive to report an insincere preference. The extent to which different voting methods are more or less resistant to such strategic manipulation has become a key consideration for comparing voting methods. Here we measure resistance to manipulation by whether neural networks of varying sizes can learn to profitably manipulate a given voting method in expectation, given different types of limited information about how other voters will vote. We trained over 70,000 neural networks of 26 sizes to manipulate against 8 different voting methods, under 6 types of limited information, in committee-sized elections with 5-21 voters and 3-6 candidates. We find that some voting methods, such as Borda, are highly manipulable by networks with limited information, while others, such as Instant Runoff, are not, despite being quite profitably manipulated by an ideal manipulator with full information. For the two probability models for elections that we use, the overall least manipulable of the 8 methods we study are Condorcet methods, namely Minimax and Split Cycle.
Authors: Chanyoung Chung, Georgios Georgakis, Patrick Spieler, Curtis Padgett, Shehryar Khattak
Abstract: Understanding terrain topology at long-range is crucial for the success of off-road robotic missions, especially when navigating at high-speeds. LiDAR sensors, which are currently heavily relied upon for geometric mapping, provide sparse measurements when mapping at greater distances. To address this challenge, we present a novel learning-based approach capable of predicting terrain elevation maps at long-range using only onboard egocentric images in real-time. Our proposed method is comprised of three main elements. First, a transformer-based encoder is introduced that learns cross-view associations between the egocentric views and prior bird-eye-view elevation map predictions. Second, an orientation-aware positional encoding is proposed to incorporate the 3D vehicle pose information over complex unstructured terrain with multi-view visual image features. Lastly, a history-augmented learn-able map embedding is proposed to achieve better temporal consistency between elevation map predictions to facilitate the downstream navigational tasks. We experimentally validate the applicability of our proposed approach for autonomous offroad robotic navigation in complex and unstructured terrain using real-world offroad driving data. Furthermore, the method is qualitatively and quantitatively compared against the current state-of-the-art methods. Extensive field experiments demonstrate that our method surpasses baseline models in accurately predicting terrain elevation while effectively capturing the overall terrain topology at long-ranges. Finally, ablation studies are conducted to highlight and understand the effect of key components of the proposed approach and validate their suitability to improve offroad robotic navigation capabilities.
Authors: Syed Mekael Wasti, Ken Q. Pu, Ali Neshati
Abstract: The evolution of Large Language Models (LLMs) has showcased remarkable capacities for logical reasoning and natural language comprehension. These capabilities can be leveraged in solutions that semantically and textually model complex problems. In this paper, we present our efforts toward constructing a framework that can serve as an intermediary between a user and their user interface (UI), enabling dynamic and real-time interactions. We employ a system that stands upon textual semantic mappings of UI components, in the form of annotations. These mappings are stored, parsed, and scaled in a custom data structure, supplementary to an agent-based prompting backend engine. Employing textual semantic mappings allows each component to not only explain its role to the engine but also provide expectations. By comprehending the needs of both the user and the components, our LLM engine can classify the most appropriate application, extract relevant parameters, and subsequently execute precise predictions of the user's expected actions. Such an integration evolves static user interfaces into highly dynamic and adaptable solutions, introducing a new frontier of intelligent and responsive user experiences.
Authors: Hanling Yi, Feng Lin, Hongbin Li, Peiyang Ning, Xiaotian Yu, Rong Xiao
Abstract: This research aims to accelerate the inference speed of large language models (LLMs) with billions of parameters. We propose \textbf{S}mart \textbf{P}arallel \textbf{A}uto-\textbf{C}orrect d\textbf{E}coding (SPACE), an innovative approach designed for achieving lossless acceleration of LLMs. By integrating semi-autoregressive inference and speculative decoding capabilities, SPACE uniquely enables autoregressive LLMs to parallelize token generation and verification. This is realized through a specialized semi-autoregressive supervised fine-tuning process that equips existing LLMs with the ability to simultaneously predict multiple tokens. Additionally, an auto-correct decoding algorithm facilitates the simultaneous generation and verification of token sequences within a single model invocation. Through extensive experiments on a range of LLMs, SPACE has demonstrated inference speedup ranging from 2.7x-4.0x on HumanEval-X while maintaining output quality.
Authors: Sisipho Hamlomo, Marcellin Atemkeng, Yusuf Brima, Chuneeta Nunhokee, Jeremy Baxter
Abstract: The large volume and complexity of medical imaging datasets are bottlenecks for storage, transmission, and processing. To tackle these challenges, the application of low-rank matrix approximation (LRMA) and its derivative, local LRMA (LLRMA) has demonstrated potential. A detailed analysis of the literature identifies LRMA and LLRMA methods applied to various imaging modalities, and the challenges and limitations associated with existing LRMA and LLRMA methods are addressed. We note a significant shift towards a preference for LLRMA in the medical imaging field since 2015, demonstrating its potential and effectiveness in capturing complex structures in medical data compared to LRMA. Acknowledging the limitations of shallow similarity methods used with LLRMA, we suggest advanced semantic image segmentation for similarity measure, explaining in detail how it can measure similar patches and their feasibility. We note that LRMA and LLRMA are mainly applied to unstructured medical data, and we propose extending their application to different medical data types, including structured and semi-structured. This paper also discusses how LRMA and LLRMA can be applied to regular data with missing entries and the impact of inaccuracies in predicting missing values and their effects. We discuss the impact of patch size and propose the use of random search (RS) to determine the optimal patch size. To enhance feasibility, a hybrid approach using Bayesian optimization and RS is proposed, which could improve the application of LRMA and LLRMA in medical imaging.
Authors: Andrea Ferigo, Elia Cunegatti, Giovanni Iacca
Abstract: One of the most striking capabilities behind the learning mechanisms of the brain is the adaptation, through structural and functional plasticity, of its synapses. While synapses have the fundamental role of transmitting information across the brain, several studies show that it is the neuron activations that produce changes on synapses. Yet, most plasticity models devised for artificial Neural Networks (NNs), e.g., the ABCD rule, focus on synapses, rather than neurons, therefore optimizing synaptic-specific Hebbian parameters. This approach, however, increases the complexity of the optimization process since each synapse is associated to multiple Hebbian parameters. To overcome this limitation, we propose a novel plasticity model, called Neuron-centric Hebbian Learning (NcHL), where optimization focuses on neuron- rather than synaptic-specific Hebbian parameters. Compared to the ABCD rule, NcHL reduces the parameters from $5W$ to $5N$, being $W$ and $N$ the number of weights and neurons, and usually $N \ll W$. We also devise a ``weightless'' NcHL model, which requires less memory by approximating the weights based on a record of neuron activations. Our experiments on two robotic locomotion tasks reveal that NcHL performs comparably to the ABCD rule, despite using up to $\sim97$ times less parameters, thus allowing for scalable plasticity
Authors: Zeyu Liu, Souvik Kundu, Anni Li, Junrui Wan, Lianghao Jiang, Peter Anthony Beerel
Abstract: We present a novel Parameter-Efficient Fine-Tuning (PEFT) method, dubbed as Adaptive Freezing of Low Rank Adaptation (AFLoRA). Specifically, for each pre-trained frozen weight tensor, we add a parallel path of trainable low-rank matrices, namely a down-projection and an up-projection matrix, each of which is followed by a feature transformation vector. Based on a novel freezing score, we the incrementally freeze these projection matrices during fine-tuning to reduce the computation and alleviate over-fitting. Our experimental results demonstrate that we can achieve state-of-the-art performance with an average improvement of up to $0.85\%$ as evaluated on GLUE benchmark while yeilding up to $9.5\times$ fewer average trainable parameters. While compared in terms of runtime, AFLoRA can yield up to $1.86\times$ improvement as opposed to similar PEFT alternatives. Besides the practical utility of our approach, we provide insights on the trainability requirements of LoRA paths at different modules and the freezing schedule for the different projection matrices. Code will be released.
Authors: Hao Tang, Lianglun Cheng, Guoheng Huang, Zhengguang Tan, Junhao Lu, Kaihong Wu
Abstract: Image segmentation holds a vital position in the realms of diagnosis and treatment within the medical domain. Traditional convolutional neural networks (CNNs) and Transformer models have made significant advancements in this realm, but they still encounter challenges because of limited receptive field or high computing complexity. Recently, State Space Models (SSMs), particularly Mamba and its variants, have demonstrated notable performance in the field of vision. However, their feature extraction methods may not be sufficiently effective and retain some redundant structures, leaving room for parameter reduction. Motivated by previous spatial and channel attention methods, we propose Triplet Mamba-UNet. The method leverages residual VSS Blocks to extract intensive contextual features, while Triplet SSM is employed to fuse features across spatial and channel dimensions. We conducted experiments on ISIC17, ISIC18, CVC-300, CVC-ClinicDB, Kvasir-SEG, CVC-ColonDB, and Kvasir-Instrument datasets, demonstrating the superior segmentation performance of our proposed TM-UNet. Additionally, compared to the previous VM-UNet, our model achieves a one-third reduction in parameters.
Authors: Linchen Qian, Jiasong Chen, Linhai Ma, Timur Urakov, Weiyong Gu, Liang Liang
Abstract: Lumbar disc degeneration, a progressive structural wear and tear of lumbar intervertebral disc, is regarded as an essential role on low back pain, a significant global health concern. Automated lumbar spine geometry reconstruction from MR images will enable fast measurement of medical parameters to evaluate the lumbar status, in order to determine a suitable treatment. Existing image segmentation-based techniques often generate erroneous segments or unstructured point clouds, unsuitable for medical parameter measurement. In this work, we present TransDeformer: a novel attention-based deep learning approach that reconstructs the geometry of the lumbar spine with high spatial accuracy and mesh correspondence across patients, and we also present a variant of TransDeformer for error estimation. Specially, we devise new attention modules with a new attention formula, which integrate image features and tokenized contour features to predict the displacements of the points on a shape template without the need for image segmentation. The deformed template reveals the lumbar spine geometry in an image. Experiment results show that our TransDeformer generates artifact-free geometry outputs, and its variant predicts the error of a reconstructed geometry. Our code is available at https://github.com/linchenq/TransDeformer-Mesh.
Authors: Alexander Nemecek, Yuzhou Jiang, Erman Ayday
Abstract: Recent advancements of large language models (LLMs) have resulted in indistinguishable text outputs comparable to human-generated text. Watermarking algorithms are potential tools that offer a way to differentiate between LLM- and human-generated text by embedding detectable signatures within LLM-generated output. However, current watermarking schemes lack robustness against known attacks against watermarking algorithms. In addition, they are impractical considering an LLM generates tens of thousands of text outputs per day and the watermarking algorithm needs to memorize each output it generates for the detection to work. In this work, focusing on the limitations of current watermarking schemes, we propose the concept of a "topic-based watermarking algorithm" for LLMs. The proposed algorithm determines how to generate tokens for the watermarked LLM output based on extracted topics of an input prompt or the output of a non-watermarked LLM. Inspired from previous work, we propose using a pair of lists (that are generated based on the specified extracted topic(s)) that specify certain tokens to be included or excluded while generating the watermarked output of the LLM. Using the proposed watermarking algorithm, we show the practicality of a watermark detection algorithm. Furthermore, we discuss a wide range of attacks that can emerge against watermarking algorithms for LLMs and the benefit of the proposed watermarking scheme for the feasibility of modeling a potential attacker considering its benefit vs. loss.
Authors: Juan C. Mej\'ia-Fragoso, Manuel A. Florez, Roc\'io Bernal-Olaya
Abstract: Accurate determination of the geothermal gradient is critical for assessing the geothermal energy potential of a given region. Of particular interest is the case of Colombia, a country with abundant geothermal resources. A history of active oil and gas exploration and production has left drilled boreholes in different geological settings, providing direct measurements of the geothermal gradient. Unfortunately, large regions of the country where geothermal resources might exist lack such measurements. Indirect geophysical measurements are costly and difficult to perform at regional scales. Computational thermal models could be constructed, but they require very detailed knowledge of the underlying geology and uniform sampling of subsurface temperatures to be well-constrained. We present an alternative approach that leverages recent advances in supervised machine learning and available direct measurements to predict the geothermal gradient in regions where only global-scale geophysical datasets and course geological knowledge are available. We find that a Gradient Boosted Regression Tree algorithm yields optimal predictions and extensively validate the trained model. We show that predictions of our model are within 12\% accuracy and that independent measurements performed by other authors agree well with our model. Finnally, we present a geothermal gradient map for Colombia that highlights regions where futher exploration and data collection should be performed.
Authors: Hugo Caselles-Dupr\'e, Charles Mellerio, Paul H\'erent, Aliz\'ee Lopez-Persem, Benoit B\'eranger, Mathieu Soularue, Pierre Fautrel, Gauthier Vernier, Matthieu Cord
Abstract: The reconstruction of images observed by subjects from fMRI data collected during visual stimuli has made significant strides in the past decade, thanks to the availability of extensive fMRI datasets and advancements in generative models for image generation. However, the application of visual reconstruction has remained limited. Reconstructing visual imagination presents a greater challenge, with potentially revolutionary applications ranging from aiding individuals with disabilities to verifying witness accounts in court. The primary hurdles in this field are the absence of data collection protocols for visual imagery and the lack of datasets on the subject. Traditionally, fMRI-to-image relies on data collected from subjects exposed to visual stimuli, which poses issues for generating visual imagery based on the difference of brain activity between visual stimulation and visual imagery. For the first time, we have compiled a substantial dataset (around 6h of scans) on visual imagery along with a proposed data collection protocol. We then train a modified version of an fMRI-to-image model and demonstrate the feasibility of reconstructing images from two modes of imagination: from memory and from pure imagination. This marks an important step towards creating a technology that allow direct reconstruction of visual imagery.
Authors: Zohre Karimi, Shing-Hei Ho, Bao Thach, Alan Kuntz, Daniel S. Brown
Abstract: Automating robotic surgery via learning from demonstration (LfD) techniques is extremely challenging. This is because surgical tasks often involve sequential decision-making processes with complex interactions of physical objects and have low tolerance for mistakes. Prior works assume that all demonstrations are fully observable and optimal, which might not be practical in the real world. This paper introduces a sample-efficient method that learns a robust reward function from a limited amount of ranked suboptimal demonstrations consisting of partial-view point cloud observations. The method then learns a policy by optimizing the learned reward function using reinforcement learning (RL). We show that using a learned reward function to obtain a policy is more robust than pure imitation learning. We apply our approach on a physical surgical electrocautery task and demonstrate that our method can perform well even when the provided demonstrations are suboptimal and the observations are high-dimensional point clouds. Code and videos available here: https://sites.google.com/view/lfdinelectrocautery
Authors: Evan Shieh, Faye-Marie Vassel, Cassidy Sugimoto, Thema Monroe-White
Abstract: The rapid deployment of generative language models (LMs) has raised concerns about social biases affecting the well-being of diverse consumers. The extant literature on generative LMs has primarily examined bias via explicit identity prompting. However, prior research on bias in earlier language-based technology platforms, including search engines, has shown that discrimination can occur even when identity terms are not specified explicitly. Studies of bias in LM responses to open-ended prompts (where identity classifications are left unspecified) are lacking and have not yet been grounded in end-consumer harms. Here, we advance studies of generative LM bias by considering a broader set of natural use cases via open-ended prompting. In this "laissez-faire" setting, we find that synthetically generated texts from five of the most pervasive LMs (ChatGPT3.5, ChatGPT4, Claude2.0, Llama2, and PaLM2) perpetuate harms of omission, subordination, and stereotyping for minoritized individuals with intersectional race, gender, and/or sexual orientation identities (AI/AN, Asian, Black, Latine, MENA, NH/PI, Female, Non-binary, Queer). We find widespread evidence of bias to an extent that such individuals are hundreds to thousands of times more likely to encounter LM-generated outputs that portray their identities in a subordinated manner compared to representative or empowering portrayals. We also document a prevalence of stereotypes (e.g. perpetual foreigner) in LM-generated outputs that are known to trigger psychological harms that disproportionately affect minoritized individuals. These include stereotype threat, which leads to impaired cognitive performance and increased negative self-perception. Our findings highlight the urgent need to protect consumers from discriminatory harms caused by language models and invest in critical AI education programs tailored towards empowering diverse consumers.
Authors: Qingyi Liu, Yekun Chai, Shuohuan Wang, Yu Sun, Qiwei Peng, Keze Wang, Hua Wu
Abstract: Amidst the rapid advancements in generative language models, the investigation of how training data shapes the performance of GPT models is still emerging. This paper presents GPTfluence, a novel approach that leverages a featurized simulation to assess the impact of training examples on the training dynamics of GPT models. Our approach not only traces the influence of individual training instances on performance trajectories, such as loss and other key metrics, on targeted test points but also enables a comprehensive comparison with existing methods across various training scenarios in GPT models, ranging from 14 million to 2.8 billion parameters, across a range of downstream tasks. Contrary to earlier methods that struggle with generalization to new data, GPTfluence introduces a parameterized simulation of training dynamics, demonstrating robust generalization capabilities to unseen training data. This adaptability is evident across both fine-tuning and instruction-tuning scenarios, spanning tasks in natural language understanding and generation. We will make our code and data publicly available.
Authors: Chi Tran, Huong Le Thanh
Abstract: Large Language Models (LLMs) and Multimodal Large language models (MLLMs) have taken the world by storm with impressive abilities in complex reasoning and linguistic comprehension. Meanwhile there are plethora of works related to Vietnamese Large Language Models, the lack of high-quality resources in multimodality limits the progress of Vietnamese MLLMs. In this paper, we pioneer in address this by introducing LaVy, a state-of-the-art Vietnamese MLLM, and we also introduce LaVy-Bench benchmark designated for evaluating MLLMs's understanding on Vietnamese visual language tasks. Our project is public at https://github.com/baochi0212/LaVy
Authors: Charis Stamouli, Ingvar Ziemann, George J. Pappas
Abstract: We study the quadratic prediction error method -- i.e., nonlinear least squares -- for a class of time-varying parametric predictor models satisfying a certain identifiability condition. While this method is known to asymptotically achieve the optimal rate for a wide range of problems, there have been no non-asymptotic results matching these optimal rates outside of a select few, typically linear, model classes. By leveraging modern tools from learning with dependent data, we provide the first rate-optimal non-asymptotic analysis of this method for our more general setting of nonlinearly parametrized model classes. Moreover, we show that our results can be applied to a particular class of identifiable AutoRegressive Moving Average (ARMA) models, resulting in the first optimal non-asymptotic rates for identification of ARMA models.
Authors: Aref Azizpour, Tai D. Nguyen, Manil Shrestha, Kaidi Xu, Edward Kim, Matthew C. Stamm
Abstract: As generative AI progresses rapidly, new synthetic image generators continue to emerge at a swift pace. Traditional detection methods face two main challenges in adapting to these generators: the forensic traces of synthetic images from new techniques can vastly differ from those learned during training, and access to data for these new generators is often limited. To address these issues, we introduce the Ensemble of Expert Embedders (E3), a novel continual learning framework for updating synthetic image detectors. E3 enables the accurate detection of images from newly emerged generators using minimal training data. Our approach does this by first employing transfer learning to develop a suite of expert embedders, each specializing in the forensic traces of a specific generator. Then, all embeddings are jointly analyzed by an Expert Knowledge Fusion Network to produce accurate and reliable detection decisions. Our experiments demonstrate that E3 outperforms existing continual learning methods, including those developed specifically for synthetic image detection.
Authors: Siyuan Feng, Jiawei Liu, Ruihang Lai, Charlie F. Ruan, Yong Yu, Lingming Zhang, Tianqi Chen
Abstract: Deploying machine learning (ML) on diverse computing platforms is crucial to accelerate and broaden their applications. However, it presents significant software engineering challenges due to the fast evolution of models, especially the recent Large Language Models (LLMs), and the emergence of new computing platforms. Current ML frameworks are primarily engineered for CPU and CUDA platforms, leaving a big gap in enabling emerging ones like Metal, Vulkan, and WebGPU. While a traditional bottom-up development pipeline fails to close the gap timely, we introduce TapML, a top-down approach and tooling designed to streamline the deployment of ML systems on diverse platforms, optimized for developer productivity. Unlike traditional bottom-up methods, which involve extensive manual testing and debugging, TapML automates unit testing through test carving and adopts a migration-based strategy for gradually offloading model computations from mature source platforms to emerging target platforms. By leveraging realistic inputs and remote connections for gradual target offloading, TapML accelerates the validation and minimizes debugging scopes, significantly optimizing development efforts. TapML was developed and applied through a year-long, real-world effort that successfully deployed significant emerging models and platforms. Through serious deployments of 82 emerging models in 17 distinct architectures across 5 emerging platforms, we showcase the effectiveness of TapML in enhancing developer productivity while ensuring model reliability and efficiency. Furthermore, we summarize comprehensive case studies from our real-world development, offering best practices for developing emerging ML systems.
Authors: Scarlett Raine, Ross Marchant, Brano Kusy, Frederic Maire, Niko Suenderhauf, Tobias Fischer
Abstract: Broad-scale marine surveys performed by underwater vehicles significantly increase the availability of coral reef imagery, however it is costly and time-consuming for domain experts to label images. Point label propagation is an approach used to leverage existing image data labeled with sparse point labels. The resulting augmented ground truth generated is then used to train a semantic segmentation model. Here, we first demonstrate that recent advances in foundation models enable generation of multi-species coral augmented ground truth masks using denoised DINOv2 features and K-Nearest Neighbors (KNN), without the need for any pre-training or custom-designed algorithms. For extremely sparsely labeled images, we propose a labeling regime based on human-in-the-loop principles, resulting in significant improvement in annotation efficiency: If only 5 point labels per image are available, our proposed human-in-the-loop approach improves on the state-of-the-art by 17.3% for pixel accuracy and 22.6% for mIoU; and by 10.6% and 19.1% when 10 point labels per image are available. Even if the human-in-the-loop labeling regime is not used, the denoised DINOv2 features with a KNN outperforms the prior state-of-the-art by 3.5% for pixel accuracy and 5.7% for mIoU (5 grid points). We also provide a detailed analysis of how point labeling style and the quantity of points per image affects the point label propagation quality and provide general recommendations on maximizing point label efficiency.
Authors: Francis McCann Ramirez, Luka Chkhetiani, Andrew Ehrenberg, Robert McHardy, Rami Botros, Yash Khare, Andrea Vanzo, Taufiquzzaman Peyash, Gabriel Oexle, Michael Liang, Ilya Sklyar, Enver Fakhan, Ahmed Etefy, Daniel McCrystal, Sam Flamini, Domenic Donato, Takuya Yoshioka
Abstract: This paper describes AssemblyAI's industrial-scale automatic speech recognition (ASR) system, designed to meet the requirements of large-scale, multilingual ASR serving various application needs. Our system leverages a diverse training dataset comprising unsupervised (12.5M hours), supervised (188k hours), and pseudo-labeled (1.6M hours) data across four languages. We provide a detailed description of our model architecture, consisting of a full-context 600M-parameter Conformer encoder pre-trained with BEST-RQ and an RNN-T decoder fine-tuned jointly with the encoder. Our extensive evaluation demonstrates competitive word error rates (WERs) against larger and more computationally expensive models, such as Whisper large and Canary-1B. Furthermore, our architectural choices yield several key advantages, including an improved code-switching capability, a 5x inference speedup compared to an optimized Whisper baseline, a 30% reduction in hallucination rate on speech data, and a 90% reduction in ambient noise compared to Whisper, along with significantly improved time-stamp accuracy. Throughout this work, we adopt a system-centric approach to analyzing various aspects of fully-fledged ASR models to gain practically relevant insights useful for real-world services operating at scale.