Authors: Noël Vouitsis, Rasa Hosseinzadeh, Brendan Leigh Ross, Valentin Villecroze, Satya Krishna Gorti, Jesse C. Cresswell, Gabriel Loaiza-Ganem
Abstract: Although diffusion models can generate remarkably high-quality samples, they are intrinsically bottlenecked by their expensive iterative sampling procedure. Consistency models (CMs) have recently emerged as a promising diffusion model distillation method, reducing the cost of sampling by generating high-fidelity samples in just a few iterations. Consistency model distillation aims to solve the probability flow ordinary differential equation (ODE) defined by an existing diffusion model. CMs are not directly trained to minimize error against an ODE solver; rather, they use a more computationally tractable objective. As a way to study how effectively CMs solve the probability flow ODE, and the effect that any induced error has on the quality of generated samples, we introduce Direct CMs, which \textit{directly} minimize this error. Intriguingly, we find that Direct CMs reduce the ODE solving error compared to CMs but also result in significantly worse sample quality, calling into question why exactly CMs work well in the first place. Full code is available at: https://github.com/layer6ai-labs/direct-cms.
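To make the contrast concrete, a minimal sketch of the "direct" objective as the abstract describes it: regress the student's one-step prediction onto the full probability flow ODE solution, the quantity that standard CM training only approximates. `student` and `ode_solve` are hypothetical stand-ins, not the paper's API.

    import torch

    def direct_cm_loss(student, ode_solve, x_t, t):
        # Direct CM sketch: match the student's one-step prediction to the full
        # probability flow ODE solution obtained from the teacher diffusion model.
        with torch.no_grad():
            x_0 = ode_solve(x_t, t)  # expensive multi-step solve from time t to 0
        return ((student(x_t, t) - x_0) ** 2).mean()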
Authors: Sasha Doubov, Nikhil Sardana, Vitaliy Chiley
Abstract: Small, highly trained, open-source large language models are widely used due to their inference efficiency, but further improving their quality remains a challenge. Sparse upcycling is a promising approach that transforms a pretrained dense model into a Mixture-of-Experts (MoE) architecture, increasing the model's parameter count and quality. In this work, we compare the effectiveness of sparse upcycling against continued pretraining (CPT) across different model sizes, compute budgets, and pretraining durations. Our experiments show that sparse upcycling can achieve better quality, with improvements of over 20% relative to CPT in certain scenarios. However, this comes with a significant inference cost, leading to 40% slowdowns in high-demand inference settings for larger models. Our findings highlight the trade-off between model quality and inference efficiency, offering insights for practitioners seeking to balance model quality and deployment constraints.
Authors: Vima Gupta, Kartik Sinha, Ada Gavrilovska, Anand Padmanabha Iyer
Abstract: Mixture-of-Experts (MoE) architectures have recently gained popularity in enabling efficient scaling of large language models. However, we uncover a fundamental tension: while MoEs are designed for selective expert activation, production serving requires request batching, which forces the activation of all experts and negates MoE's efficiency benefits during the decode phase. We present Lynx, a system that enables efficient MoE inference through dynamic, batch-aware expert selection. Our key insight is that expert importance varies significantly across tokens and inference phases, creating opportunities for runtime optimization. Lynx leverages this insight through a lightweight framework that dynamically reduces active experts while preserving model accuracy. Our evaluations show that Lynx achieves up to 1.55x reduction in inference latency while incurring negligible accuracy loss relative to the baseline model across complex code generation and mathematical reasoning tasks.
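As an illustration of the batch-aware selection idea, the sketch below ranks experts by their total routing mass over a batch and keeps only the most important ones; the names, the aggregation rule, and the renormalization are assumptions, not Lynx's actual algorithm.

    import torch

    def select_active_experts(gate_logits, keep):
        # gate_logits: (tokens, num_experts) router scores for one decode step.
        probs = gate_logits.softmax(dim=-1)
        importance = probs.sum(dim=0)               # routing mass per expert
        active = importance.topk(keep).indices      # experts worth activating
        mask = torch.zeros_like(importance, dtype=torch.bool)
        mask[active] = True
        pruned = probs * mask                       # drop routes to inactive experts
        return active, pruned / pruned.sum(dim=-1, keepdim=True)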
Authors: Thomas Marshall, Adam Scherlis, Nora Belrose
Abstract: We propose affine concept editing (ACE) as an approach for steering language models' behavior by intervening directly in activations. We begin with an affine decomposition of model activation vectors and show that prior methods for steering model behavior correspond to subsets of terms of this decomposition. We then provide a derivation of ACE and test it on refusal using Llama 3 8B and Hermes Eagle RWKV v5. ACE ultimately combines affine subspace projection and activation addition to reliably control the model's refusal responses across prompt types. We evaluate the results using LLM-based scoring on a collection of harmful and harmless prompts. Our experiments demonstrate that ACE consistently achieves more precise control over model behavior and generalizes to models where directional ablation via affine subspace projection alone produces incoherent outputs. Code for reproducing our results is available at https://github.com/EleutherAI/steering-llama3 .
Authors: Erik Wijmans, Brody Huval, Alexander Hertzberg, Vladlen Koltun, Philipp Krähenbühl
Abstract: As language models grow ever larger, so do their vocabularies. This has shifted the memory footprint of LLMs during training disproportionately to one single layer: the cross-entropy in the loss computation. Cross-entropy builds up a logit matrix with entries for each pair of input tokens and vocabulary items and, for small models, consumes an order of magnitude more memory than the rest of the LLM combined. We propose Cut Cross-Entropy (CCE), a method that computes the cross-entropy loss without materializing the logits for all tokens into global memory. Rather, CCE only computes the logit for the correct token and evaluates the log-sum-exp over all logits on the fly. We implement a custom kernel that performs the matrix multiplications and the log-sum-exp reduction over the vocabulary in flash memory, making global memory consumption for the cross-entropy computation negligible. This has a dramatic effect. Taking the Gemma 2 (2B) model as an example, CCE reduces the memory footprint of the loss computation from 24 GB to 1 MB, and the total training-time memory consumption of the classifier head from 28 GB to 1 GB. To improve the throughput of CCE, we leverage the inherent sparsity of softmax and propose to skip elements of the gradient computation that have a negligible (i.e., below numerical precision) contribution to the gradient. Experiments demonstrate that the dramatic reduction in memory consumption is accomplished without sacrificing training speed or convergence.
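A minimal sketch of the idea in plain PyTorch (not the paper's fused kernel): compute the correct-token logit exactly and stream the log-sum-exp over vocabulary chunks, so the full (tokens x vocabulary) logit matrix is never materialized at once.

    import torch

    def chunked_cross_entropy(hidden, classifier, targets, chunk=4096):
        # hidden: (T, D) final hidden states; classifier: (V, D); targets: (T,)
        correct = (hidden * classifier[targets]).sum(-1)        # true-token logits
        lse = torch.full_like(correct, float("-inf"))
        for start in range(0, classifier.size(0), chunk):
            block = hidden @ classifier[start:start + chunk].T  # (T, chunk) logits
            lse = torch.logaddexp(lse, block.logsumexp(dim=-1)) # running log-sum-exp
        return (lse - correct).mean()                           # mean NLL

The chunked loop bounds peak logit memory at T x chunk entries; per the abstract, the paper's kernel goes further by fusing the matrix multiplication and log-sum-exp reduction on-chip.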
Authors: Soham Gadgil, Joshua Galanter, Mohammadreza Negahdar
Abstract: Chronic Obstructive Pulmonary Disease (COPD) is an irreversible and progressive disease which is highly heritable. Clinically, COPD is defined using summary measures derived from a spirometry test, but these are not always adequate. Here we show that the high-dimensional raw spirogram can provide a richer signal than the summary measures alone. We design a transformer-based deep learning technique to process the raw spirogram values along with demographic information and predict clinically relevant endpoints related to COPD. Our method performs better than prior works while being more computationally efficient. Using the weights learned by the model, we make the framework more interpretable by identifying parts of the spirogram that are important for the model predictions. Partnering with a board-certified pulmonologist, we also provide clinical insights into the different aspects of the spirogram and show that the explanations obtained from the model align with underlying medical knowledge.
Authors: Mohammad Saiful Islam, Mohamed Sami Rakha, William Pourmajidi, Janakan Sivaloganathan, John Steinbacher, Andriy Miranskyy
Abstract: As Large-Scale Cloud Systems (LCS) become increasingly complex, effective anomaly detection is critical for ensuring system reliability and performance. However, there is a shortage of large-scale, real-world datasets available for benchmarking anomaly detection methods. To address this gap, we introduce a new high-dimensional dataset from IBM Cloud, collected over 4.5 months from the IBM Cloud Console. This dataset comprises 39,365 rows and 117,448 columns of telemetry data. Additionally, we demonstrate the application of machine learning models for anomaly detection and discuss the key challenges faced in this process. This study and the accompanying dataset provide a resource for researchers and practitioners in cloud system monitoring. It facilitates more efficient testing of anomaly detection methods in real-world data, helping to advance the development of robust solutions to maintain the health and performance of large-scale cloud infrastructures.
Authors: Akhil Singampalli, Danish Gufran, Sudeep Pasricha
Abstract: Machine learning (ML) based indoor localization solutions are critical for many emerging applications, yet their efficacy is often compromised by hardware/software variations across mobile devices (i.e., device heterogeneity) and the threat of ML data poisoning attacks. Conventional methods aimed at countering these challenges show limited resilience to the uncertainties created by these phenomena. In response, in this paper, we introduce SAFELOC, a novel framework that not only minimizes localization errors under these challenging conditions but also ensures model compactness for efficient mobile device deployment. Our framework targets a distributed and co-operative learning environment that uses federated learning (FL) to preserve user data privacy and assumes heterogeneous mobile devices carried by users (just like in most real-world scenarios). Within this heterogeneous FL context, SAFELOC introduces a novel fused neural network architecture that performs data poisoning detection and localization, with a low model footprint. Additionally, a dynamic saliency map-based aggregation strategy is designed to adapt based on the severity of the detected data poisoning scenario. Experimental evaluations demonstrate that SAFELOC achieves improvements of up to 5.9x in mean localization error, 7.8x in worst-case localization error, and a 2.1x reduction in model inference latency compared to state-of-the-art indoor localization frameworks, across diverse building floorplans, mobile devices, and ML data poisoning attack scenarios.
Authors: Quan Zhou
Abstract: Ensuring fairness has emerged as one of the primary concerns in AI and its related algorithms. Over time, the field of machine learning fairness has evolved to address these issues. This paper provides an extensive overview of this field and introduces two formal frameworks to tackle open questions in machine learning fairness. In one framework, operator-valued optimisation and min-max objectives are employed to address unfairness in time-series problems. This approach showcases state-of-the-art performance on the notorious COMPAS benchmark dataset, demonstrating its effectiveness in real-world scenarios. In the second framework, the challenge of lacking sensitive attributes, such as gender and race, in commonly used datasets is addressed. This issue is particularly pressing because existing algorithms in this field predominantly rely on the availability or estimations of such attributes to assess and mitigate unfairness. Here, a framework for a group-blind bias-repair is introduced, aiming to mitigate bias without relying on sensitive attributes. The efficacy of this approach is showcased through analyses conducted on the Adult Census Income dataset. Additionally, detailed algorithmic analyses for both frameworks are provided, accompanied by convergence guarantees, ensuring the robustness and reliability of the proposed methodologies.
Authors: Sanggeon Yun, Ryozo Masukawa, William Youngwoo Chung, Minhyoung Na, Nathaniel Bastian, Mohsen Imani
Abstract: The increasing demand for robust security solutions across various industries has made Video Anomaly Detection (VAD) a critical task in applications such as intelligent surveillance, evidence investigation, and violence detection. Traditional approaches to VAD often rely on finetuning large pre-trained models, which can be computationally expensive and impractical for real-time or resource-constrained environments. To address this, MissionGNN introduced a more efficient method by training a graph neural network (GNN) using a fixed knowledge graph (KG) derived from large language models (LLMs) like GPT-4. While this approach demonstrated significant efficiency in computational power and memory, it faces limitations in dynamic environments where frequent updates to the KG are necessary due to evolving behavior trends and shifting data patterns. These updates typically require cloud-based computation, posing challenges for edge computing applications. In this paper, we propose a novel framework that facilitates continuous KG adaptation directly on edge devices, overcoming the limitations of cloud dependency. Our method dynamically modifies the KG through a three-phase process: pruning, alternating, and creating nodes, enabling real-time adaptation to changing data trends. This continuous learning approach enhances the robustness of anomaly detection models, making them more suitable for deployment in dynamic and resource-constrained environments.
Authors: Libo Wang
Abstract: To address the surge in inference cost caused by chain-of-thought reasoning in large language models, this research proposes a sparse attention mechanism that focuses only on a few relevant tokens. The researcher constructed a new attention mechanism and used GiantRabbit, trained with custom GPTs, as an experimental tool. The experiment tested and compared the reasoning time, correctness score, and chain-of-thought length of this model and o1 Preview in solving linear algebra test questions from MIT OpenCourseWare. The results show that GiantRabbit's reasoning time and chain-of-thought length are significantly lower than o1 Preview's, confirming the feasibility of the sparse attention mechanism for reducing chain-of-thought reasoning. Architectural details and the experimental process have been uploaded to GitHub; the link is: https://github.com/brucewang123456789/GeniusTrail.git.
URLs: https://github.com/brucewang123456789/GeniusTrail.git
Authors: Frederic Koehler, Holden Lee, Thuy-Duong Vuong
Abstract: We consider the problem of sampling a multimodal distribution with a Markov chain given a small number of samples from the stationary measure. Although mixing can be arbitrarily slow, we show that if the Markov chain has a $k$th order spectral gap, initialization from a set of $\tilde O(k/\varepsilon^2)$ samples from the stationary distribution will, with high probability over the samples, efficiently generate a sample whose conditional law is $\varepsilon$-close in TV distance to the stationary measure. In particular, this applies to mixtures of $k$ distributions satisfying a Poincaré inequality, with faster convergence when they satisfy a log-Sobolev inequality. Our bounds are stable to perturbations to the Markov chain, and in particular work for Langevin diffusion over $\mathbb R^d$ with score estimation error, as well as Glauber dynamics combined with approximation error from pseudolikelihood estimation. This justifies the success of data-based initialization for score matching methods despite slow mixing for the data distribution, and improves and generalizes the results of Koehler and Vuong (2023) to have linear, rather than exponential, dependence on $k$ and apply to arbitrary semigroups. As a consequence of our results, we show for the first time that a natural class of low-complexity Ising measures can be efficiently learned from samples.
Authors: Hoyun Choi, Sungyeop Lee, B. Kahng, Junghyo Jo
Abstract: Numerical simulation is a predominant tool for studying the dynamics in complex systems, but large-scale simulations are often intractable due to computational limitations. Here, we introduce the Neural Graph Simulator (NGS) for simulating time-invariant autonomous systems on graphs. Utilizing a graph neural network, the NGS provides a unified framework to simulate diverse dynamical systems with varying topologies and sizes without constraints on evaluation times through its non-uniform time step and autoregressive approach. The NGS offers significant advantages over numerical solvers by not requiring prior knowledge of governing equations and effectively handling noisy or missing data with a robust training scheme. It demonstrates superior computational efficiency over conventional methods, improving performance by over $10^5$ times in stiff problems. Furthermore, it is applied to real traffic data, forecasting traffic flow with state-of-the-art accuracy. The versatility of the NGS extends beyond the presented cases, offering numerous potential avenues for enhancement.
Authors: Valentin Frank Ingmar Guenter, Athanasios Sideris
Abstract: We propose a novel algorithm for combined unit/filter and layer pruning of deep neural networks that operates during training and does not require a pre-trained network. Our algorithm optimally trades off learning accuracy and pruning levels while balancing layer vs. unit/filter pruning and computational vs. parameter complexity using only three user-defined parameters, which are easy to interpret and tune. The optimal network structure is found as the solution of a stochastic optimization problem over the network weights and the parameters of variational Bernoulli distributions for 0/1 random variables scaling the units and layers of the network. Pruning occurs when a variational parameter converges to 0, rendering the corresponding structure permanently inactive, thus saving computations during training and prediction. A key contribution of our approach is to define a cost function that combines the objectives of prediction accuracy and network pruning in a computational/parameter complexity-aware manner, along with the automatic selection of the many regularization parameters. We show that the solutions of the optimization problem to which the algorithm converges are deterministic networks. We analyze the ODE system that underlies our stochastic optimization algorithm and establish domains of attraction around zero for the dynamics of the network parameters. These results provide theoretical support for safely pruning units/filters and/or layers during training and lead to practical pruning conditions. We evaluate our method on the CIFAR-10/100 and ImageNet datasets using ResNet architectures and demonstrate that it improves upon layer-only or unit-only pruning and competes favorably with combined unit/filter and layer pruning algorithms that require pre-trained networks, with respect to pruning ratios and test accuracy.
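For intuition, here is a hypothetical sketch of a 0/1 scaling gate with a trainable variational parameter, using a concrete (relaxed Bernoulli) sample during training; the paper's exact parameterization and gradient estimator are not specified here.

    import math
    import torch
    import torch.nn as nn

    class BernoulliGate(nn.Module):
        # Each unit/layer is multiplied by a relaxed Bernoulli sample whose
        # variational parameter is trained; a structure is pruned once its
        # parameter collapses to ~0, yielding a deterministic network.
        def __init__(self, n, init=0.9, tau=0.5):
            super().__init__()
            self.logit = nn.Parameter(torch.full((n,), math.log(init / (1 - init))))
            self.tau = tau

        def forward(self, x):
            if self.training:  # differentiable relaxed sample
                u = torch.rand_like(self.logit)
                z = torch.sigmoid((self.logit + u.log() - (1 - u).log()) / self.tau)
            else:              # deterministic network after convergence
                z = (self.logit.sigmoid() > 0.5).float()
            return x * z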
Authors: Rishav Chourasia, Uzair Javaid, Biplab Sikdar
Abstract: We introduce a set of useful expressions of Differential Privacy (DP) notions in terms of the Laplace transform of the privacy loss distribution. Its bare form expression appears in several related works on analyzing DP, either as an integral or an expectation. We show that recognizing the expression as a Laplace transform unlocks a new way to reason about DP properties by exploiting the duality between time and frequency domains. Leveraging our interpretation, we connect the $(q, \rho(q))$-Rényi DP curve and the $(\epsilon, \delta(\epsilon))$-DP curve as being the Laplace and inverse-Laplace transforms of one another. This connection shows that the Rényi divergence is well-defined for complex orders $q = \gamma + i \omega$. Using our Laplace transform-based analysis, we also prove an adaptive composition theorem for $(\epsilon, \delta)$-DP guarantees that is exactly tight (i.e., matches even in constants) for all values of $\epsilon$. Additionally, we resolve an issue regarding symmetry of $f$-DP on subsampling that prevented equivalence across all functional DP notions.
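In standard notation (ours, not necessarily the paper's), the objects the abstract connects can be written as follows, where $L$ is the privacy loss random variable with law $\omega$; the $\delta(\epsilon)$ expression is the usual tight conversion from the privacy loss distribution:

    \mathcal{L}\{\omega\}(s) = \mathbb{E}\!\left[e^{-sL}\right], \qquad
    \rho(q) = \frac{1}{q-1} \log \mathcal{L}\{\omega\}(1-q), \qquad
    \delta(\epsilon) = \mathbb{E}\!\left[\bigl(1 - e^{\epsilon - L}\bigr)_{+}\right].

Viewing $\mathbb{E}[e^{(q-1)L}]$ as the Laplace transform of $\omega$ evaluated at $s = 1-q$ is what allows the Rényi curve and the $\delta(\epsilon)$ curve to be treated as a transform pair.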
Authors: Bhavtosh Rath, Pushkar Chennu, David Relyea, Prathyusha Kanmanth Reddy, Amit Pande
Abstract: Recent advancements in session-based recommendation models using deep learning techniques have demonstrated significant performance improvements. While they can enhance model sophistication and improve the relevance of recommendations, they also make it challenging to implement a scalable real-time solution. To address this challenge, we propose GRAINRec, a Graph and Attention Integrated session-based recommendation model that generates recommendations in real time. Our scope of work is item recommendations in online retail, where a session is defined as an ordered sequence of digital guest actions, such as page views or adds to cart. The proposed model generates recommendations by considering the importance of all items in the session together, letting us predict relevant recommendations dynamically as the session evolves rather than relying on pre-computed recommendations for each item. We also propose a heuristic approach to implement real-time inferencing that meets the Target platform's service level agreement (SLA). Evaluation results show an average improvement of 1.5% across all offline evaluation metrics. A/B tests conducted over a two-week period showed a 10% increase in click-through rate and a 9% increase in attributable demand. Extensive ablation studies are also conducted to understand model performance across different parameters.
Authors: Jun-Young Kim, Deok-Seon Kim, Seo-Hyun Lee
Abstract: In recent years, brain-computer interfaces have made advances in decoding various motor-related tasks, including gesture recognition and movement classification, utilizing electroencephalogram (EEG) data. These developments are fundamental in exploring how neural signals can be interpreted to recognize specific physical actions. This study centers on a written alphabet classification task, where we aim to decode EEG signals associated with handwriting. To achieve this, we incorporate hand kinematics to guide the extraction of consistent embeddings from high-dimensional neural recordings using auxiliary variables (CEBRA). These CEBRA embeddings, along with the EEG, are processed by a parallel convolutional neural network model that extracts features from both data sources simultaneously. The model classifies nine different handwritten characters, including symbols such as exclamation marks and commas, within the alphabet. We evaluate the model using a quantitative five-fold cross-validation approach and explore the structure of the embedding space through visualizations. Our approach achieves a classification accuracy of 91% for the nine-class task, demonstrating the feasibility of fine-grained handwriting decoding from EEG.
Authors: Spencer Giddens, Fang Liu
Abstract: As data-driven and AI-based decision making gains widespread adoption in most disciplines, it is crucial that both data privacy and decision fairness are appropriately addressed. While differential privacy (DP) provides a robust framework for guaranteeing privacy and several widely accepted methods have been proposed for improving fairness, the vast majority of existing literature treats the two concerns independently. For methods that do consider privacy and fairness simultaneously, they often only apply to a specific machine learning task, limiting their generalizability. In response, we introduce SAFES, a Sequential PrivAcy and Fairness Enhancing data Synthesis procedure that sequentially combines DP data synthesis with a fairness-aware data transformation. SAFES allows full control over the privacy-fairness-utility trade-off via tunable privacy and fairness parameters. We illustrate SAFES by combining AIM, a graphical model-based DP data synthesizer, with a popular fairness-aware data pre-processing transformation. Empirical evaluations on the Adult and COMPAS datasets demonstrate that for reasonable privacy loss, SAFES-generated synthetic data achieve significantly improved fairness metrics with relatively low utility loss.
Authors: Youngjin Seol, Jaewoong Choi, Seunghyun Lee, Janghyeok Yoon
Abstract: Machine learning (ML) models are valuable tools for analyzing the impact of technology using patent citation information. However, existing ML-based methods often struggle to account for the dynamic nature of the technology impact over time and the interdependencies of these impacts across different periods. This study proposes a multi-task learning (MTL) approach to enhance the prediction of technology impact across various time frames by leveraging knowledge sharing and simultaneously monitoring the evolution of technology impact. First, we quantify the technology impacts and identify patterns through citation analysis over distinct time periods. Next, we develop MTL models to predict citation counts using multiple patent indicators over time. Finally, we examine the changes in key input indicators and their patterns over different periods using the SHapley Additive exPlanation method. We also offer guidelines for validating and interpreting the results by employing statistical methods and natural language processing techniques. A case study on battery technologies demonstrates that our approach not only deepens the understanding of technology impact, but also improves prediction accuracy, yielding valuable insights for both academia and industry.
Authors: Mary Isabelle Wisell, Salimeh Yasaei Sekeh
Abstract: Sparse deep neural networks (DNNs) excel in real-world applications like robotics and computer vision by reducing the computational demands that hinder usability. Recent studies aim to boost DNN efficiency by trimming redundant neurons or filters based on task relevance, but they neglect adaptability to distribution shifts. We aim to enhance these existing techniques by introducing a companion network, Ghost Connect-Net (GC-Net), to monitor the connections in the original network, conferring an advantage in generalization under distribution shift. GC-Net's weights represent connectivity measurements between consecutive layers of the original network. After pruning GC-Net, the pruned locations are mapped back to the original network as pruned connections, allowing for the combination of magnitude- and connectivity-based pruning methods. Experimental results using common DNN benchmarks, such as CIFAR-10, Fashion MNIST, and Tiny ImageNet, show promising results for the hybrid approach: using GC-Net guidance for the later layers of a network and direct pruning on the earlier layers. We provide theoretical foundations for GC-Net's approach to improving generalization under distribution shifts.
Authors: Xuanhao Pan, Chenguang Wang, Chaolong Ying, Ye Xue, Tianshu Yu
Abstract: The Travelling Salesman Problem (TSP) remains a fundamental challenge in combinatorial optimization, inspiring diverse algorithmic strategies. This paper revisits the "heatmap + Monte Carlo Tree Search (MCTS)" paradigm that has recently gained traction for learning-based TSP solutions. Within this framework, heatmaps encode the likelihood of edges forming part of the optimal tour, and MCTS refines this probabilistic guidance to discover optimal solutions. Contemporary approaches have predominantly emphasized the refinement of heatmap generation through sophisticated learning models, inadvertently sidelining the critical role of MCTS. Our extensive empirical analysis reveals two pivotal insights: 1) The configuration of MCTS strategies profoundly influences the solution quality, demanding meticulous tuning to leverage their full potential; 2) Our findings demonstrate that a rudimentary and parameter-free heatmap, derived from the intrinsic $k$-nearest nature of TSP, can rival or even surpass the performance of complicated heatmaps, with strong generalizability across various scales. Empirical evaluations across various TSP scales underscore the efficacy of our approach, achieving competitive results. These observations challenge the prevailing focus on heatmap sophistication, advocating a reevaluation of the paradigm to harness both components synergistically. Our code is available at: https://github.com/LOGO-CUHKSZ/rethink_mcts_tsp.
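The parameter-free heatmap the abstract alludes to can be sketched in a few lines: score an edge highly exactly when it connects a city to one of its $k$ nearest neighbors. Details such as the 1/k weighting and the symmetrization are our assumptions, not the paper's exact construction.

    import numpy as np

    def knn_heatmap(coords, k=10):
        # coords: (n, 2) city coordinates; returns an (n, n) edge-likelihood matrix.
        d = np.linalg.norm(coords[:, None] - coords[None, :], axis=-1)
        np.fill_diagonal(d, np.inf)
        nearest = np.argsort(d, axis=1)[:, :k]     # k nearest cities per city
        heat = np.zeros_like(d)
        rows = np.repeat(np.arange(len(coords)), k)
        heat[rows, nearest.ravel()] = 1.0 / k      # uniform mass over the k neighbors
        return np.maximum(heat, heat.T)            # symmetrize for undirected tours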
Authors: Jinjie Liu, Hang Qiu
Abstract: Machine learning models deployed on edge devices have enabled numerous exciting new applications, such as humanoid robots, AR glasses, and autonomous vehicles. However, the computing resources available on these edge devices are not catching up with the ever-growing number of parameters in these models. As the models become bigger and more complicated, their novel yet sophisticated structure challenges inference runtime optimization. We present FluidML, a generic runtime memory management and optimization framework that can flexibly transform the model execution blueprint to achieve faster and more memory-efficient inference. Evaluations across different platforms show that FluidML can consistently reduce the end-to-end inference latency by up to 25.38% for popular language models and reduce peak memory usage by up to 41.47%, compared to state-of-the-art approaches. FluidML comprises ~30K lines of code, is built for general-purpose usage, and will be released as an open-source inference runtime optimization framework to the community.
Authors: Hu Wang, Congbo Ma, Ibrahim Almakky, Ian Reid, Gustavo Carneiro, Mohammad Yaqub
Abstract: Weight-averaged model-merging has emerged as a powerful approach in deep learning, capable of enhancing model performance without fine-tuning or retraining. However, the underlying mechanisms that explain its effectiveness remain largely unexplored. In this paper, we investigate this technique from three novel perspectives to provide deeper insights into how and why weight-averaged model-merging works: (1) we examine the intrinsic patterns captured by the learned model weights through visualizations on several datasets, showing that these weights often encode structured and interpretable patterns; (2) we investigate model ensemble merging strategies based on averaging weights versus averaging features, providing detailed analyses across diverse architectures and datasets; and (3) we explore the impact of parameter magnitude on the prediction stability of model merging, revealing how weight averaging acts as regularization by showing robustness across different parameter scales. Our findings shed light on the "black box" of weight-averaged model-merging, offering valuable insights and practical recommendations that advance the model-merging process.
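The weight-averaging versus feature-averaging contrast in (2) boils down to where the mean is taken; a minimal sketch, assuming all models share one architecture:

    import copy
    import torch

    def average_weights(models):
        # Weight-space merge: one network whose parameters are the element-wise mean.
        merged = copy.deepcopy(models[0])
        with torch.no_grad():
            for name, p in merged.named_parameters():
                stacked = torch.stack([dict(m.named_parameters())[name] for m in models])
                p.copy_(stacked.mean(dim=0))
        return merged

    def average_features(models, x):
        # Feature/output-space ensemble: run every model, then average activations.
        return torch.stack([m(x) for m in models]).mean(dim=0)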
Authors: Pablo Fernández-Piñeiro, Manuel Fernández-Veiga, Rebeca P. Díaz-Redondo, Ana Fernández-Vilas, Martín González-Soto
Abstract: In prototype-based federated learning, the exchange of model parameters between clients and the master server is replaced by transmission of prototypes or quantized versions of the data samples to the aggregation server. A fully decentralized deployment of prototype-based learning, without a central aggregator of prototypes, is more robust to network failures and reacts faster to changes in the statistical distribution of the data, suggesting potential advantages and quick adaptation in dynamic learning tasks, e.g., when the data sources are IoT devices or when data is non-iid. In this paper, we consider the problem of designing a communication-efficient decentralized learning system based on prototypes. We address the challenge of prototype redundancy by leveraging a twofold data compression technique, i.e., sending update messages only if the prototypes are information-theoretically useful (via the Jensen-Shannon distance), and using clustering on the prototypes to compress the update messages used in the gossip protocol. We also use parallel instead of sequential gossiping, and present an analysis of its age-of-information (AoI). Our experimental results show that, with these improvements, the communications load can be substantially reduced without decreasing the convergence rate of the learning algorithm.
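The information-theoretic gate on update messages might look like the following sketch, where prototypes are summarized as histograms and an update is gossiped only when it moves the distribution by more than a threshold in Jensen-Shannon distance; the histogram representation and the threshold are our assumptions.

    import numpy as np
    from scipy.spatial.distance import jensenshannon

    def should_gossip(old_hist, new_hist, threshold=0.1):
        # jensenshannon returns the JS *distance* (square root of the divergence)
        # and normalizes its inputs, so raw counts are fine.
        return jensenshannon(np.asarray(old_hist), np.asarray(new_hist)) > threshold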
Authors: Ke Xu, Ziliang Wang, Wei Zheng, Yuhao Ma, Chenglin Wang, Nengxue Jiang, Cai Cao
Abstract: Cross-domain recommendation (CDR) methods are proposed to tackle the sparsity problem in click-through rate (CTR) estimation. Existing CDR methods directly transfer knowledge from the source domains to the target domain and ignore the heterogeneities among domains, including feature dimensional heterogeneity and latent space heterogeneity, which may lead to negative transfer. Besides, most existing methods are based on single-source transfer, which cannot simultaneously utilize knowledge from multiple source domains to further improve model performance in the target domain. In this paper, we propose a centralized-distributed transfer model (CDTM) for CDR based on multi-source heterogeneous transfer learning. To address the issue of feature dimension heterogeneity, we build a dual embedding structure: domain-specific embedding (DSE) and global shared embedding (GSE), which model the feature representation in each single domain and the commonalities in the global space, respectively. To solve the latent space heterogeneity, the transfer matrix and attention mechanism are used to map and combine DSE and GSE adaptively. Extensive offline and online experiments demonstrate the effectiveness of our model.
Authors: Jargalmaa Batmunkh, Yusuke Iida, Takayoshi Oba, Haruhisa Iijima
Abstract: The complex structure and extensive details of solar spectral data, combined with a recent surge in volume, present significant processing challenges. To address this, we propose a deep learning-based compression technique using deep autoencoder (DAE) and 1D-convolutional autoencoder (CAE) models developed with Hinode SOT/SP data. We focused on compressing Stokes I and V polarization spectra from the quiet Sun, as well as from active regions, providing a novel insight into comprehensive spectral analysis by incorporating spectra from extreme magnetic fields. The results indicate that the CAE model outperforms the DAE model in reconstructing Stokes profiles, demonstrating greater robustness and achieving reconstruction errors around the observational noise level. The proposed method has proven effective in compressing Stokes I and V spectra from both the quiet Sun and active regions, highlighting its potential for impactful applications in solar spectral analysis, such as detection of unusual spectral signals.
Authors: Anton Johansson, Arunselvan Ramaswamy
Abstract: The development of robust generative models for highly varied non-stationary time series data is a complex yet important problem. Traditional models for time series data prediction, such as Long Short-Term Memory (LSTM), are inefficient and generalize poorly as they cannot capture complex temporal relationships. In this paper, we present a probabilistic generative model that can be trained to capture temporal information, and that is robust to data errors. We call it the Time Deep Latent Gaussian Model (tDLGM). Its novel architecture is inspired by the Deep Latent Gaussian Model (DLGM). Our model is trained to minimize a loss function based on the negative log-likelihood. One factor contributing to tDLGM's robustness is our regularizer, which accounts for data trends. Experiments show that tDLGM is able to reconstruct and generate complex time series data, and that it is robust to noise and faulty data.
Authors: Yi Xu, Ziming Mao, Xiangxi Mo, Shu Liu, Ion Stoica
Abstract: The rapid growth of LLMs has revolutionized natural language processing and AI analysis, but their increasing size and memory demands present significant challenges. A common solution is to spill over to CPU memory; however, traditional GPU-CPU memory swapping often results in higher latency and lower throughput. This paper introduces Pie, an LLM inference framework that addresses these challenges with performance-transparent swapping and adaptive expansion. By leveraging predictable memory access patterns and the high bandwidth of modern hardware like the NVIDIA GH200 Grace Hopper Superchip, Pie enables concurrent data swapping without affecting foreground computation, expanding effective memory without added latency. Adaptive expansion dynamically adjusts CPU memory allocation based on real-time information, optimizing memory usage and performance under varying conditions. Pie maintains low computation latency, high throughput, and high elasticity. Our experimental evaluation demonstrates that Pie achieves optimal swapping policy during cache warmup and effectively balances increased memory capacity with negligible impact on computation. With its extended capacity, Pie outperforms vLLM by up to 1.9X in throughput and 2X in latency. Additionally, Pie can reduce GPU memory usage by up to 1.67X while maintaining the same performance. Compared to FlexGen, an offline profiling-based swapping solution, Pie achieves orders of magnitude lower latency and 9.4X higher throughput.
Authors: Yuang Cai, Yuyu Yuan, Jinsheng Shi, Qinhong Lin
Abstract: The alignment of large language models (LLMs) is crucial for generating helpful and harmless content. Existing approaches leverage preference-based human feedback data to learn the reward function and align the LLM with the feedback data. However, these approaches focus on modeling the reward difference between the chosen and rejected demonstrations, rather than directly modeling the true reward from each demonstration. Moreover, these approaches assume that the reward is only obtained at the end of the sentence, which overlooks the modeling of intermediate rewards. These issues lead to insufficient use of training signals in the feedback data, limiting the representation and generalization ability of the reward and potentially resulting in reward hacking. In this paper, we formulate LLM alignment as a Bayesian Inverse Reinforcement Learning (BIRL) problem and propose a novel training objective, Approximated Variational Alignment (AVA), to perform LLM alignment through Approximated Variational Reward Imitation Learning (AVRIL). The BIRL formulation facilitates intermediate reward modeling and direct reward modeling on each single demonstration, which enhances the utilization of training signals in the feedback data. Experiments show that AVA outperforms existing LLM alignment approaches in reward modeling, RL fine-tuning, and direct optimization.
Authors: Miaoxi Zhu, Yan Sun, Li Shen, Bo Du, Dacheng Tao
Abstract: Minimax optimization is gaining increasing attention in modern machine learning applications. Driven by large-scale models and massive volumes of data collected from edge devices, as well as the concern to preserve client privacy, communication-efficient distributed minimax optimization algorithms become popular, such as Local Stochastic Gradient Descent Ascent (Local-SGDA), and Local Decentralized SGDA (Local-DSGDA). While most existing research on distributed minimax algorithms focuses on convergence rates, computation complexity, and communication efficiency, the generalization performance remains underdeveloped, whereas generalization ability is a pivotal indicator for evaluating the holistic performance of a model when fed with unknown data. In this paper, we propose the stability-based generalization analytical framework for Distributed-SGDA, which unifies two popular distributed minimax algorithms including Local-SGDA and Local-DSGDA, and conduct a comprehensive analysis of stability error, generalization gap, and population risk across different metrics under various settings, e.g., (S)C-(S)C, PL-SC, and NC-NC cases. Our theoretical results reveal the trade-off between the generalization gap and optimization error and suggest hyperparameters choice to obtain the optimal population risk. Numerical experiments for Local-SGDA and Local-DSGDA validate the theoretical results.
Authors: Richard John, Lukas Herron, Pratyush Tiwary
Abstract: Generative artificial intelligence is now a widely used tool in molecular science. Despite the popularity of probabilistic generative models, numerical experiments benchmarking their performance on molecular data are lacking. In this work, we introduce and explain several classes of generative models, broadly sorted into two categories: flow-based models and diffusion models. We select three representative models: Neural Spline Flows, Conditional Flow Matching, and Denoising Diffusion Probabilistic Models, and examine their accuracy, computational cost, and generation speed across datasets with tunable dimensionality, complexity, and modal asymmetry. Our findings are varied, with no one framework being the best for all purposes. In a nutshell, (i) Neural Spline Flows do best at capturing mode asymmetry present in low-dimensional data, (ii) Conditional Flow Matching outperforms other models for high-dimensional data with low complexity, and (iii) Denoising Diffusion Probabilistic Models appears the best for low-dimensional data with high complexity. Our datasets include a Gaussian mixture model and the dihedral torsion angle distribution of the Aib\textsubscript{9} peptide, generated via a molecular dynamics simulation. We hope our taxonomy of probabilistic generative frameworks and numerical results may guide model selection for a wide range of molecular tasks.
Authors: Benjamin Kolicic, Alberto Caron, Chris Hicks, Vasilios Mavroudis
Abstract: In this paper, we address the critical need for interpretable and uncertainty-aware machine learning models in the context of online learning for high-risk industries, particularly cyber-security. While deep learning and other complex models have demonstrated impressive predictive capabilities, their opacity and lack of uncertainty quantification raise significant questions about their trustworthiness. We propose a novel pipeline for online supervised learning problems in cyber-security that harnesses the inherent interpretability and uncertainty awareness of Additive Gaussian Process (AGP) models. Our approach aims to balance predictive performance with transparency while improving the scalability of AGPs, which represents their main drawback, potentially enabling security analysts to better validate threat detection, troubleshoot, reduce false positives, and generally make trustworthy, informed decisions. This work contributes to the growing field of interpretable AI by proposing a class of models that can be significantly beneficial for high-stakes decision problems such as those typical of the cyber-security domain. The source code is available.
Authors: Guy Shalev, Frederik Kratzert
Abstract: The Caravan large-sample hydrology dataset (Kratzert et al., 2023) was created to standardize and harmonize streamflow data from various regional datasets, combined with globally available meteorological forcing and catchment attributes. This community-driven project also allows researchers to conveniently extend the dataset for additional basins, as has been done six times to date (see https://github.com/kratzert/Caravan/discussions/10). We present a novel extension to Caravan, focusing on enriching the meteorological forcing data. Our extension adds three precipitation nowcast products (CPC, IMERG v07 Early, and CHIRPS) and three weather forecast products (ECMWF IFS HRES, GraphCast, and CHIRPS-GEFS) to the existing ERA5-Land reanalysis data. The inclusion of diverse data sources, particularly weather forecasts, enables more robust evaluation and benchmarking of hydrological models, especially for real-time forecasting scenarios. To the best of our knowledge, this extension makes Caravan the first large-sample hydrology dataset to incorporate weather forecast data, significantly enhancing its capabilities and fostering advancements in hydrological research, benchmarking, and real-time hydrologic forecasting. The data is publicly available under a CC-BY-4.0 license on Zenodo in two parts (https://zenodo.org/records/14161235, https://zenodo.org/records/14161281) and on Google Cloud Platform (GCP); see the Data Availability section for details.
URLs: https://github.com/kratzert/Caravan/discussions/10, https://zenodo.org/records/14161235, https://zenodo.org/records/14161281
Authors: Till Korten (Helmholtz-Zentrum Dresden-Rossendorf HZDR), Vladimir Rybnikov (Deutsches Elektronen-Synchrotron DESY), Mathias Vogt (Deutsches Elektronen-Synchrotron DESY), Juliane Roensch-Schulenburg (Deutsches Elektronen-Synchrotron DESY), Peter Steinbach (Helmholtz-Zentrum Dresden-Rossendorf HZDR), Najmeh Mirian (Helmholtz-Zentrum Dresden-Rossendorf HZDR)
Abstract: Electron beam accelerators are essential in many scientific and technological fields. Their operation relies heavily on the stability and precision of the electron beam. Traditional diagnostic techniques encounter difficulties in addressing the complex and dynamic nature of electron beams. Particularly in the context of free-electron lasers (FELs), it is fundamentally impossible to measure the lasing-on and lasing-off electron power profiles for a single electron bunch. This is a crucial hurdle in the exact reconstruction of the photon pulse profile. To overcome this hurdle, we developed a machine learning model that predicts the temporal power profile of the electron bunch in the lasing-off regime using machine parameters that can be obtained when lasing is on. The model was statistically validated and showed superior predictions compared to the state-of-the-art batch calibrations. The work we present here is a critical element for a virtual pulse reconstruction diagnostic (VPRD) tool designed to reconstruct the power profile of individual photon pulses without requiring repeated measurements in the lasing-off regime. This promises to significantly enhance the diagnostic capabilities in FELs at large.
Authors: Sejik Park
Abstract: Residual connections are one of the most important components in neural network architectures for mitigating the vanishing gradient problem and facilitating the training of much deeper networks. One possible explanation for how residual connections aid deeper network training is by promoting feature reuse. However, we identify and analyze the limitations of feature reuse with vanilla residual connections. To address these limitations, we propose modifications in training methods. Specifically, we provide an additional opportunity for the model to learn feature reuse with residual connections through two types of iterations during training. The first type of iteration involves using droppath, which enforces feature reuse by randomly dropping a subset of layers. The second type of iteration focuses on training the dropped parts of the model while freezing the undropped parts. As a result, the dropped parts learn in a way that encourages feature reuse, as the model relies on the undropped parts with feature reuse in mind. Overall, we demonstrated performance improvements in models with residual connections for image classification in certain cases.
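The droppath iterations build on standard stochastic depth, sketched below: skipping the residual branch forces the identity path, i.e., pure feature reuse. The paper's alternating freeze/train schedule operates on top of this mechanism.

    import torch

    def drop_path(residual_fn, x, p=0.2, training=True):
        # Stochastic depth for one residual block: with probability p, drop the
        # branch entirely so the block output is the reused input feature.
        if training and torch.rand(()) < p:
            return x
        return x + residual_fn(x)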
Authors: Xiang-Rui Ni, Peng Pan, Jia-Rui Lin
Abstract: In the Architecture, Engineering & Construction (AEC) industry, how design behaviors impact design quality remains unclear. This study proposes a novel approach which, for the first time, identifies and quantitatively describes the relationship between design behaviors and quality of design based on Building Information Modeling (BIM). Real-time collection and log mining are integrated to collect raw data on design behaviors. Feature engineering and various machine learning models are then utilized for quantitative modeling and interpretation. Results confirm an existing quantifiable relationship which can be learned by various models. The best-performing model, using Extremely Random Trees, achieved an $R^2$ value of 0.88 on the test set. Behavioral features related to the designer's skill level and changes in design intentions are identified as having significant impacts on design quality. These findings deepen our understanding of the design process and help produce BIM designs of better quality.
Authors: Aditya Borse, Rutwik Gulakala, Marcus Stoffel
Abstract: Optimisation for crashworthiness is a critical part of the vehicle development process. Due to stringent regulations and increasing market demands, multiple factors must be considered within a limited timeframe. However, for optimal crashworthiness design, multi-objective optimisation is necessary, and for complex parts, multiple design parameters must be evaluated. This crashworthiness analysis requires computationally intensive finite element simulations, which leads to the need for multi-parameter, multi-objective inverse optimisation. This article investigates a machine learning-based method for this type of optimisation, focusing on the design optimisation of a multi-cell side sill to improve crashworthiness results. Furthermore, the optimiser is coupled with an FE solver to achieve improved results.
Authors: Zikai Zhou, Shitong Shao, Lichen Bai, Zhiqiang Xu, Bo Han, Zeke Xie
Abstract: Text-to-image diffusion model is a popular paradigm that synthesizes personalized images by providing a text prompt and a random Gaussian noise. While people observe that some noises are ``golden noises'' that can achieve better text-image alignment and higher human preference than others, we still lack a machine learning framework to obtain those golden noises. To learn golden noises for diffusion sampling, we make three main contributions in this paper. First, we identify a new concept termed the \textit{noise prompt}, which aims at turning a random Gaussian noise into a golden noise by adding a small desirable perturbation derived from the text prompt. Following this concept, we formulate the \textit{noise prompt learning} framework that systematically learns ``prompted'' golden noise associated with a text prompt for diffusion models. Second, we design a noise prompt data collection pipeline and collect a large-scale \textit{noise prompt dataset}~(NPD) that contains 100k pairs of random noises and golden noises with the associated text prompts. With the prepared NPD as the training dataset, we train a small \textit{noise prompt network}~(NPNet) that can directly learn to transform a random noise into a golden noise. The learned golden noise perturbation can be considered as a kind of prompt for noise, as it is rich in semantic information and tailored to the given text prompt. Third, our extensive experiments demonstrate the impressive effectiveness and generalization of NPNet on improving the quality of synthesized images across various diffusion models, including SDXL, DreamShaper-xl-v2-turbo, and Hunyuan-DiT. Moreover, NPNet is a small and efficient controller that acts as a plug-and-play module with very limited additional inference and computational costs, as it just provides a golden noise instead of a random noise without accessing the original pipeline.
Authors: Jan Hansen-Palmus, Michael Truong-Le, Oliver Hausdörfer, Alok Verma
Abstract: Large Language Models (LLMs) have pushed the frontier of artificial intelligence but comprise hundreds of billions of parameters and operations. To reduce inference latency, LLMs are deployed on multiple hardware accelerators through various Model Parallelism strategies. Our paper looks into the details of one such strategy, Tensor Parallelism, and proposes reducing latency by compressing inter-accelerator communication. We leverage fine-grained quantization techniques to compress selected activations by 3.5-4.5x. Our proposed method leads to up to a 2x reduction in time-to-first-token (TTFT) with negligible model performance degradation.
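A minimal sketch of what fine-grained activation quantization for the communication path could look like (per-group symmetric quantization; the paper's exact scheme, group size, and bit packing are not specified here):

    import torch

    def group_quant(x, group=128, bits=4):
        # Split the hidden dimension into groups; keep one fp scale per group plus
        # low-bit integers. Assumes x.numel() is divisible by `group`; 4-bit values
        # are stored in int8 here for simplicity (real kernels pack two per byte).
        qmax = 2 ** (bits - 1) - 1
        g = x.reshape(-1, group)
        scale = g.abs().amax(dim=1, keepdim=True).clamp_min(1e-8) / qmax
        q = torch.clamp((g / scale).round(), -qmax - 1, qmax).to(torch.int8)
        return q, scale        # communicate these instead of fp16/bf16 activations

    def group_dequant(q, scale, shape):
        return (q.float() * scale).reshape(shape)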
Authors: Lena Stempfle, Arthur James, Julie Josse, Tobias Gauss, Fredrik D. Johansson
Abstract: Inherently interpretable machine learning (IML) models provide valuable insights for clinical decision-making but face challenges when features have missing values. Classical solutions like imputation or excluding incomplete records are often unsuitable in applications where values are missing at test time. In this work, we conducted a survey with 71 clinicians from 29 trauma centers across France, including 20 complete responses to study the interaction between medical professionals and IML applied to data with missing values. This provided valuable insights into how missing data is interpreted in clinical machine learning. We used the prediction of hemorrhagic shock as a concrete example to gauge the willingness and readiness of the participants to adopt IML models from three classes of methods. Our findings show that, while clinicians value interpretability and are familiar with common IML methods, classical imputation techniques often misalign with their intuition, and that models that natively handle missing values are preferred. These results emphasize the need to integrate clinical intuition into future IML models for better human-computer interaction.
Authors: Zhengyi Wang, Jonathan Lorraine, Yikai Wang, Hang Su, Jun Zhu, Sanja Fidler, Xiaohui Zeng
Abstract: This work explores expanding the capabilities of large language models (LLMs) pretrained on text to generate 3D meshes within a unified model. This offers key advantages of (1) leveraging spatial knowledge already embedded in LLMs, derived from textual sources like 3D tutorials, and (2) enabling conversational 3D generation and mesh understanding. A primary challenge is effectively tokenizing 3D mesh data into discrete tokens that LLMs can process seamlessly. To address this, we introduce LLaMA-Mesh, a novel approach that represents the vertex coordinates and face definitions of 3D meshes as plain text, allowing direct integration with LLMs without expanding the vocabulary. We construct a supervised fine-tuning (SFT) dataset enabling pretrained LLMs to (1) generate 3D meshes from text prompts, (2) produce interleaved text and 3D mesh outputs as required, and (3) understand and interpret 3D meshes. Our work is the first to demonstrate that LLMs can be fine-tuned to acquire complex spatial knowledge for 3D mesh generation in a text-based format, effectively unifying the 3D and text modalities. LLaMA-Mesh achieves mesh generation quality on par with models trained from scratch while maintaining strong text generation performance.
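The plain-text mesh representation can be sketched as OBJ-style lines whose tokens an LLM tokenizer already covers; the exact precision and formatting LLaMA-Mesh uses are assumptions here.

    def mesh_to_text(vertices, faces):
        # vertices: iterable of (x, y, z) floats; faces: iterable of vertex-index
        # tuples (0-based). Emits OBJ-like "v ..." and "f ..." lines as one string.
        lines = [f"v {x:.2f} {y:.2f} {z:.2f}" for x, y, z in vertices]
        lines += ["f " + " ".join(str(i + 1) for i in face) for face in faces]
        return "\n".join(lines)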
Authors: Jifan Gao, Guanhua Chen
Abstract: Causal concept effect estimation is gaining increasing interest in the field of interpretable machine learning. This general approach explains the behaviors of machine learning models by estimating the causal effect of human-understandable concepts, which represent high-level knowledge more comprehensibly than raw inputs like tokens. However, existing causal concept effect explanation methods assume complete observation of all concepts involved within the dataset, which can fail in practice due to incomplete annotations or missing concept data. We theoretically demonstrate that unobserved concepts can bias the estimation of the causal effects of observed concepts. To address this limitation, we introduce the Missingness-aware Causal Concept Explainer (MCCE), a novel framework specifically designed to estimate causal concept effects when not all concepts are observable. Our framework learns to account for residual bias resulting from missing concepts and utilizes a linear predictor to model the relationships between these concepts and the outputs of black-box machine learning models. It can offer explanations on both local and global levels. We conduct validations using a real-world dataset, demonstrating that MCCE achieves promising performance compared to state-of-the-art explanation methods in causal concept effect estimation.
Authors: Alkis Kalavasis, Anay Mehrotra, Grigoris Velegkas
Abstract: Specifying all desirable properties of a language model is challenging, but certain requirements seem essential. Given samples from an unknown language, the trained model should produce valid strings not seen in training and be expressive enough to capture the language's full richness. Otherwise, outputting invalid strings constitutes "hallucination," and failing to capture the full range leads to "mode collapse." We ask if a language model can meet both requirements. We investigate this within a statistical language generation setting building on Gold and Angluin. Here, the model receives random samples from a distribution over an unknown language K, which belongs to a possibly infinite collection of languages. The goal is to generate unseen strings from K. We say the model generates from K with consistency and breadth if, as training size increases, its output converges to all unseen strings in K. Kleinberg and Mullainathan [KM24] asked if consistency and breadth in language generation are possible. We answer this negatively: for a large class of language models, including next-token prediction models, this is impossible for most collections of candidate languages. This contrasts with [KM24]'s result, showing consistent generation without breadth is possible for any countable collection of languages. Our finding highlights that generation with breadth fundamentally differs from generation without breadth. As a byproduct, we establish near-tight bounds on the number of samples needed for generation with or without breadth. Finally, our results offer hope: consistent generation with breadth is achievable for any countable collection of languages when negative examples (strings outside K) are available alongside positive ones. This suggests that post-training feedback, which encodes negative examples, can be crucial in reducing hallucinations while limiting mode collapse.
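One way to formalize the two requirements (our notation, not necessarily the paper's): after seeing the sample $S_n$ from the unknown language $K$, let $G_n$ be the set of strings the model can generate. Then

    \text{consistency:}\quad G_n \subseteq K \setminus S_n \ \text{for all sufficiently large } n,
    \qquad
    \text{breadth:}\quad K \setminus S_n \subseteq G_n \ \text{for all sufficiently large } n,

so consistency and breadth together require the generator's support to converge to exactly the unseen portion of $K$.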
Authors: Benedikt Alkin, Tobias Kronlachner, Samuele Papa, Stefan Pirker, Thomas Lichtenegger, Johannes Brandstetter
Abstract: Advancements in computing power have made it possible to numerically simulate large-scale fluid-mechanical and/or particulate systems, many of which are integral to core industrial processes. Among the different numerical methods available, the discrete element method (DEM) provides one of the most accurate representations of a wide range of physical systems involving granular and discontinuous materials. Consequently, DEM has become a widely accepted approach for tackling engineering problems connected to granular flows and powder mechanics. Additionally, DEM can be integrated with grid-based computational fluid dynamics (CFD) methods, enabling the simulation of chemical processes taking place, e.g., in fluidized beds. However, DEM is computationally intensive because of the intrinsic multiscale nature of particulate systems, restricting the simulation duration or the number of particles. To address this limitation, NeuralDEM presents an end-to-end approach to replace slow numerical DEM routines with fast, adaptable deep learning surrogates. NeuralDEM is capable of modeling long-term transport processes across different regimes using macroscopic observables without any reference to microscopic model parameters. First, NeuralDEM treats the Lagrangian discretization of DEM as an underlying continuous field, while simultaneously modeling macroscopic behavior directly as additional auxiliary fields. Second, NeuralDEM introduces multi-branch neural operators scalable to real-time modeling of industrially-sized scenarios, from slow and pseudo-steady to fast and transient. Such scenarios have previously posed insurmountable challenges for deep learning models. Notably, NeuralDEM faithfully models coupled CFD-DEM fluidized bed reactors of 160k CFD cells and 500k DEM particles for trajectories of 28s. NeuralDEM will open many new doors to advanced engineering and much faster process cycles.
Authors: Alexander C. Li, Yuandong Tian, Beidi Chen, Deepak Pathak, Xinlei Chen
Abstract: Conventional wisdom suggests that pre-training Vision Transformers (ViT) improves downstream performance by learning useful representations. Is this actually true? We investigate this question and find that the features and representations learned during pre-training are not essential. Surprisingly, using only the attention patterns from pre-training (i.e., guiding how information flows between tokens) is sufficient for models to learn high-quality features from scratch and achieve comparable downstream performance. We show this by introducing a simple method called attention transfer, where only the attention patterns from a pre-trained teacher ViT are transferred to a student, either by copying or distilling the attention maps. Since attention transfer lets the student learn its own features, ensembling it with a fine-tuned teacher also further improves accuracy on ImageNet. We systematically study various aspects of our findings on the sufficiency of attention maps, including distribution shift settings where they underperform fine-tuning. We hope our exploration provides a better understanding of what pre-training accomplishes and leads to a useful alternative to the standard practice of fine-tuning.
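A minimal sketch of the distillation variant of attention transfer might look as follows, assuming teacher and student expose per-layer attention maps of shape (batch, heads, tokens, tokens); the KL objective and the way maps are obtained are assumptions, not the paper's exact formulation:

```python
# A minimal sketch of attention distillation between a teacher and student ViT.
# We assume both models expose per-layer attention maps; the KL objective is an
# illustrative choice.
import torch
import torch.nn.functional as F

def attention_transfer_loss(student_attn, teacher_attn, eps=1e-8):
    """KL divergence between teacher and student attention distributions, averaged over layers."""
    loss = 0.0
    for s, t in zip(student_attn, teacher_attn):
        # Each map already sums to 1 over the last dim (softmax output).
        loss = loss + F.kl_div((s + eps).log(), t, reduction="batchmean")
    return loss / len(student_attn)

# Toy shapes: 2 layers, batch 4, 3 heads, 16 tokens.
student = [torch.softmax(torch.randn(4, 3, 16, 16), dim=-1) for _ in range(2)]
teacher = [torch.softmax(torch.randn(4, 3, 16, 16), dim=-1) for _ in range(2)]
print(attention_transfer_loss(student, teacher).item())
```

In the "copying" variant, the teacher maps would instead be substituted directly into the student's attention layers, with no loss term needed.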
Authors: Yang Xu, Fatemeh Pourahmadian
Abstract: A deep learning framework is developed for multiscale characterization of poroelastic media from full waveform data which is known as poroelastography. Special attention is paid to heterogeneous environments whose multiphase properties may drastically change across several scales. Described in space-frequency, the data takes the form of focal solid displacement and pore pressure fields in various neighborhoods furnished either by reconstruction from remote data or direct measurements depending on the application. The objective is to simultaneously recover the six hydromechanical properties germane to Biot equations and their spatial distribution in a robust and efficient manner. Two major challenges impede direct application of existing state-of-the-art techniques for this purpose: (i) the sought-for properties belong to vastly different and potentially uncertain scales, and (ii) the loss function is multi-objective and multi-scale (both in terms of its individual components and the total loss). To help bridge the gap, we propose the idea of \emph{network scaling} where the neural property maps are constructed by unit shape functions composed into a scaling layer. In this model, the unknown network parameters (weights and biases) remain of O(1) during training. This forms the basis for explicit scaling of the loss components and their derivatives with respect to the network parameters. Thereby, we propose the physics-based \emph{dynamic scaling} approach for adaptive loss balancing. The idea is first presented in a generic form for multi-physics and multi-scale PDE systems, and then applied through a set of numerical experiments to poroelastography. The results are presented along with reconstructions by way of gradient normalization (GradNorm) and Softmax adaptive weights (SoftAdapt) for loss balancing. A comparative analysis of the methods and corresponding results is provided.
Authors: Ruili Zhao, Jun Cai, Jiangtao Luo, Junpeng Gao, Yongyi Ran
Abstract: Low-Earth orbit (LEO) satellites utilizing beam hopping (BH) technology offer extensive coverage, low latency, high bandwidth, and significant flexibility. However, the uneven geographical distribution and temporal variability of ground traffic demands, combined with the high mobility of LEO satellites, present significant challenges for efficient beam resource utilization. Traditional BH methods based on GEO satellites fail to address issues such as satellite interference, overlapping coverage, and mobility. This paper explores a Digital Twin (DT)-based collaborative resource allocation network for multiple LEO satellites with overlapping coverage areas. A two-tier optimization problem, focusing on load balancing and cell service fairness, is proposed to maximize throughput and minimize inter-cell service delay. The DT layer optimizes the allocation of overlapping coverage cells by designing BH patterns for each satellite, while the LEO layer optimizes power allocation for each selected service cell. At the DT layer, an Actor-Critic network is deployed on each agent, with a global critic network in the cloud center. The A3C algorithm is employed to optimize the DT layer. Concurrently, the LEO layer optimization is performed using a Multi-Agent Reinforcement Learning algorithm, where each beam functions as an independent agent. The simulation results show that this method reduces satellite load disparity by about 72.5% and decreases the average delay to 12 ms. Additionally, our approach outperforms other benchmarks in terms of throughput, ensuring a better alignment between offered and requested data.
Authors: Yijia Xiao, Edward Sun, Yiqiao Jin, Wei Wang
Abstract: RNAs are essential molecules that carry genetic information vital for life, with profound implications for drug development and biotechnology. Despite this importance, RNA research is often hindered by the sheer volume of literature available on the topic. To streamline this process, we introduce RNA-GPT, a multi-modal RNA chat model designed to simplify RNA discovery by leveraging extensive RNA literature. RNA-GPT integrates RNA sequence encoders with linear projection layers and state-of-the-art large language models (LLMs) for precise representation alignment, enabling it to process user-uploaded RNA sequences and deliver concise, accurate responses. Built on a scalable training pipeline, RNA-GPT utilizes RNA-QA, an automated system that gathers RNA annotations from RNACentral using a divide-and-conquer approach with GPT-4o and latent Dirichlet allocation (LDA) to efficiently handle large datasets and generate instruction-tuning samples. Our experiments indicate that RNA-GPT effectively addresses complex RNA queries, thereby facilitating RNA research. Additionally, we present RNA-QA, a dataset of 407,616 RNA samples for modality alignment and instruction tuning, further advancing the potential of RNA research tools.
Authors: Finn Bartels, Lu Xing, Cise Midoglu, Matthias Boeker, Toralf Kirsten, P{\aa}l Halvorsen
Abstract: We present SoccerGuard, a novel framework for predicting injuries in women's soccer using Machine Learning (ML). This framework can ingest data from multiple sources, including subjective wellness and training load reports from players, objective GPS sensor measurements, third-party player statistics, and injury reports verified by medical personnel. We experiment with a number of different settings related to synthetic data generation, input and output window sizes, and ML models for prediction. Our results show that, given the right configurations and feature combinations, injury event prediction can be undertaken with considerable accuracy. The optimal results are achieved when input windows are reduced and larger combined output windows are defined, in combination with an ideally balanced data set. The framework also includes a dashboard with a user-friendly Graphical User Interface (GUI) to support interactive analysis and visualization.
Authors: Shrey Shah, Alex Lin, Scott Lin, Josh Patel, Michael Lam, Kevin Zhu
Abstract: Accurate damage prediction is crucial for disaster preparedness and response strategies, particularly given the frequent earthquakes in Turkey. Utilizing datasets on earthquake data, infrastructural quality metrics, and contemporary socioeconomic factors, we tested various machine-learning architectures to forecast death tolls and fatalities per affected population. Our findings indicate that the Random Forest model provides the most reliable predictions. The model highlights earthquake magnitude and building stability as the primary determinants of damage. This research contributes to the reduction of fatalities in future seismic events in Turkey.
Authors: Yingheng Wang, Zichen Wang, Gil Sadeh, Luca Zancato, Alessandro Achille, George Karypis, Huzefa Rangwala
Abstract: Self-supervised training of language models (LMs) has seen great success for protein sequences in learning meaningful representations and for generative drug design. Most protein LMs are based on the Transformer architecture trained on individual proteins with short context lengths. Such protein LMs cannot extrapolate well to longer proteins and protein complexes. They also fail to account for the underlying biological mechanisms carried out by biomolecular interactions and dynamics, i.e., proteins often interact with other proteins, molecules, and pathways in complex biological systems. In this work, we propose LC-PLM, based on an alternative protein LM architecture, BiMamba-S, built on selective structured state-space models, to learn high-quality universal protein representations at the amino acid token level using masked language modeling. We also introduce its graph-contextual variant, LC-PLM-G, which contextualizes protein-protein interaction (PPI) graphs for a second stage of training. LC-PLM demonstrates favorable neural scaling laws, better length extrapolation capability, and a 7% to 34% improvement on protein downstream tasks compared to the Transformer-based ESM-2. LC-PLM-G, further trained within the context of PPI graphs, shows promising results on protein structure and function prediction tasks. Our study demonstrates the benefit of increasing the context size with a computationally efficient LM architecture (e.g., structured state-space models) in learning universal protein representations and incorporating the molecular interaction context contained in biological graphs.
Authors: Sami Baral, Eamon Worden, Wen-Chiang Lim, Zhuang Luo, Christopher Santorelli, Ashish Gurung, Neil Heffernan
Abstract: The effectiveness of feedback in enhancing learning outcomes is well documented within Educational Data Mining (EDM). Prior research has explored various methodologies to enhance the effectiveness of feedback. Recent developments in Large Language Models (LLMs) have extended their utility in enhancing automated feedback systems. This study aims to explore the potential of LLMs in facilitating automated feedback in math education. We examine the effectiveness of LLMs in evaluating student responses by comparing three different models: Llama, SBERT-Canberra, and GPT-4. The evaluation requires the model to provide both a quantitative score and qualitative feedback on the student's responses to open-ended math problems. We employ Mistral and fine-tune this model for evaluating student responses by leveraging a dataset of student responses and teacher-written feedback for middle-school math problems. A similar approach was taken for training the SBERT model, while GPT-4 used a zero-shot learning approach. We evaluate the models' performance in scoring accuracy and the quality of feedback using judgments from two teachers, who applied a shared rubric to assess the accuracy and relevance of the generated feedback. We conduct both quantitative and qualitative analyses of model performance. By offering a detailed comparison of these methods, this study aims to further the ongoing development of automated feedback systems and outlines potential future directions for leveraging generative LLMs to create more personalized learning experiences.
Authors: Xuan Hu, Qijun Chen, Nicholas H. Luo, Richy J. Zheng, Shaofan Li
Abstract: Peridynamics is a non-local continuum mechanics theory that offers unique advantages for modeling problems involving discontinuities and complex deformations. Within the peridynamic framework, various formulations exist, among which the material correspondence formulation stands out for its ability to directly incorporate traditional continuum material models, making it highly applicable to a range of engineering challenges. A notable advancement in this area is the bond-associated correspondence model, which not only resolves issues of material instability but also achieves high computational accuracy. However, the bond-associated model typically requires higher computational costs than finite element analysis (FEA), which can limit its practical application. To address this computational challenge, we propose a novel surrogate model based on a message-passing neural network (MPNN) specifically designed for the bond-associated peridynamic material correspondence formulation. Leveraging the similarities between graph structures and the neighborhood connectivity inherent to peridynamics, we construct an MPNN that transfers domain knowledge from peridynamics into a computational graph and shortens the computation time via GPU acceleration. Unlike conventional graph neural networks that focus on node features, our model emphasizes edge-based features, capturing the essential material point interactions in the formulation. A key advantage of this neural network approach is its flexibility: it does not require fixed neighborhood connectivity, making it adaptable across diverse configurations and scalable for complex systems. Furthermore, the model inherently possesses translational and rotational invariance, enabling it to maintain physical objectivity, a critical requirement for accurate mechanical modeling.
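A minimal sketch of an edge-focused message-passing layer, where bonds between material points are the edges carrying the learned features, could look like this; the feature dimensions and update rule are illustrative assumptions:

```python
# A minimal sketch of edge-focused message passing, assuming bonds between
# peridynamic material points are graph edges carrying the learned features.
# The feature sizes and update rule are illustrative assumptions.
import torch
import torch.nn as nn

class EdgeMPNNLayer(nn.Module):
    def __init__(self, node_dim=8, edge_dim=16):
        super().__init__()
        # Update each bond (edge) from its two endpoints and its current state.
        self.edge_mlp = nn.Sequential(
            nn.Linear(2 * node_dim + edge_dim, edge_dim), nn.SiLU(),
            nn.Linear(edge_dim, edge_dim),
        )

    def forward(self, x, edge_index, edge_attr):
        src, dst = edge_index  # (2, num_edges) bond connectivity
        msg_in = torch.cat([x[src], x[dst], edge_attr], dim=-1)
        return self.edge_mlp(msg_in)  # new per-bond features

# Toy graph: 4 material points, 3 bonds.
x = torch.randn(4, 8)
edge_index = torch.tensor([[0, 1, 2], [1, 2, 3]])
edge_attr = torch.randn(3, 16)
layer = EdgeMPNNLayer()
print(layer(x, edge_index, edge_attr).shape)  # torch.Size([3, 16])
```

Because the layer acts on whatever edge list it is given, it does not assume fixed neighborhood connectivity, matching the flexibility claim in the abstract.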
Authors: Rohit Singh, Anil Kumar Yerrapragada, Radha Krishna Ganti
Abstract: Random Access is a critical procedure through which a User Equipment (UE) identifies itself to a Base Station (BS). Random Access starts with the UE transmitting a random preamble on the Physical Random Access Channel (PRACH). In a conventional BS receiver, the UE's specific preamble is identified by correlation with all the possible preambles. The PRACH signal is also used to estimate the timing advance, which is induced by propagation delay. Correlation-based receivers suffer from false peaks and missed detections in scenarios dominated by high fading and low signal-to-noise ratio. This paper describes the design of a hybrid receiver that consists of an AI/ML model for preamble detection followed by conventional peak detection for Timing Advance estimation. The proposed receiver combines the Power Delay Profiles of correlation windows across multiple antennas and uses the combination as input to a Neural Network model. The model predicts the presence or absence of a user in a particular preamble window, after which the timing advance is estimated by peak detection. Results show superior performance of the hybrid receiver compared to conventional receivers for both simulated and real hardware-captured datasets.
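The hybrid pipeline can be sketched as follows: combine per-antenna power delay profiles, apply a learned detector per preamble window, then run conventional peak detection for the timing advance. In the toy sketch below a simple energy-ratio rule stands in for the trained neural network, and the sample rate is an assumed value:

```python
# A minimal sketch of the hybrid detection idea: combine per-antenna power
# delay profiles, flag user presence in a preamble window, then estimate the
# timing advance from the correlation peak. Shapes, the stand-in detector, and
# the sample rate are illustrative assumptions, not the paper's exact design.
import numpy as np

rng = np.random.default_rng(0)
num_antennas, window_len = 4, 128

# Per-antenna power delay profiles for one preamble window (toy data).
pdp = rng.random((num_antennas, window_len))
combined = pdp.sum(axis=0)  # non-coherent combining across antennas

def detect_user(profile, threshold):
    # Stand-in for the trained neural detector: peak-to-mean energy ratio.
    return profile.max() / profile.mean() > threshold

if detect_user(combined, threshold=1.5):
    delay_samples = int(np.argmax(combined))   # conventional peak detection
    timing_advance = delay_samples / 30.72e6   # assumed sample rate of 30.72 MHz
    print(f"user detected, timing advance ~ {timing_advance * 1e6:.2f} us")
```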
Authors: Amirabbas Afzali, Borna Khodabandeh, Ali Rasekh, Mahyar JafariNodeh, Sepehr kazemi, Simon Gottschalk
Abstract: Contrastive learning models have demonstrated impressive abilities to capture semantic similarities by aligning representations in the embedding space. However, their performance can be limited by the quality of the training data and its inherent biases. While Reinforcement Learning from Human Feedback (RLHF) and Direct Preference Optimization (DPO) have been applied to generative models to align them with human preferences, their use in contrastive learning has yet to be explored. This paper introduces a novel method for training contrastive learning models using Preference Optimization (PO) to break down complex concepts. Our method systematically aligns model behavior with desired preferences, enhancing performance on the targeted task. In particular, we focus on enhancing model robustness against typographic attacks, commonly seen in contrastive models like CLIP. We further apply our method to disentangle gender understanding and mitigate gender biases, offering a more nuanced control over these sensitive attributes. Our experiments demonstrate that models trained using PO outperform standard contrastive learning techniques while retaining their ability to handle adversarial challenges and maintain accuracy on other downstream tasks. This makes our method well-suited for tasks requiring fairness, robustness, and alignment with specific preferences. We evaluate our method on several vision-language tasks, tackling challenges such as typographic attacks. Additionally, we explore the model's ability to disentangle gender concepts and mitigate gender bias, showcasing the versatility of our approach.
Authors: Suhyeok Jang, Seojin Kim, Jinwoo Shin, Jongheon Jeong
Abstract: The remarkable advances in deep learning have led to the emergence of many off-the-shelf classifiers, e.g., large pre-trained models. However, since they are typically trained on clean data, they remain vulnerable to adversarial attacks. Despite this vulnerability, their superior performance and transferability make off-the-shelf classifiers still valuable in practice, demanding further work to provide adversarial robustness for them in a post-hoc manner. A recently proposed method, denoised smoothing, leverages a denoiser model in front of the classifier to obtain provable robustness without additional training. However, the denoiser often creates hallucinations, i.e., images that have lost the semantics of their originally assigned class, leading to a drop in robustness. Furthermore, its noise-and-denoise procedure introduces a significant distribution shift from the original distribution, causing the denoised smoothing framework to achieve sub-optimal robustness. In this paper, we introduce Fine-Tuning with Confidence-Aware Denoised Image Selection (FT-CADIS), a novel fine-tuning scheme to enhance the certified robustness of off-the-shelf classifiers. FT-CADIS is inspired by the observation that the confidence of off-the-shelf classifiers can effectively identify hallucinated images during denoised smoothing. Based on this, we develop a confidence-aware training objective to handle such hallucinated images and improve the stability of fine-tuning from denoised images. In this way, the classifier can be fine-tuned using only images that are beneficial for adversarial robustness. We also find that such fine-tuning can be done by updating only a small fraction of the classifier's parameters. Extensive experiments demonstrate that FT-CADIS establishes state-of-the-art certified robustness among denoised smoothing methods across all $\ell_2$-adversary radii in various benchmarks.
Authors: Carles Mil\`a, Teodimiro Matsena, Edgar Jamisse, Jovito Nunes, Quique Bassat, Paula Petrone, Elisa Sicuri, Charfudin Sacoor, Cathryn Tonne
Abstract: Many studies have predicted SocioEconomic Position (SEP) for aggregated spatial units such as villages using satellite data, but SEP prediction at the household level and other sources of imagery have not yet been explored. We assembled a dataset of 975 households in a semi-rural district in southern Mozambique, consisting of self-reported asset, expenditure, and income SEP data, as well as multimodal imagery including satellite images and a ground-based photograph survey of 11 household elements. We fine-tuned a convolutional neural network to extract feature vectors from the images, which we then used in regression analyses to model household SEP using different sets of image types. The best prediction performance was found when modeling asset-based SEP using random forest models with all image types, while the performance for expenditure- and income-based SEP was lower. Using SHAP, we observed clear differences between the images with the largest positive and negative effects, and identified the most relevant household elements in the predictions. Finally, we fitted an additional reduced model using only the identified relevant household elements, which had only slightly lower performance compared to models using all images. Our results show how ground-based household photographs make it possible to zoom in from area-level to individual-household prediction while minimizing the data collection effort, using explainable machine learning. The developed workflow could potentially be integrated into routine household surveys, where the collected household imagery could be used for other purposes, such as refined asset characterization and environmental exposure assessment.
Authors: Maria Miguel Beir\~ao, Jo\~ao Matos, Tiago Gon\c{c}alves, Camila Kase, Luis Filipe Nakayama, Denise de Freitas, Jaime S. Cardoso
Abstract: Keratitis is an inflammatory corneal condition responsible for 10% of visual impairment in low- and middle-income countries (LMICs), with bacteria, fungi, or amoeba as the most common infection etiologies. While an accurate and timely diagnosis is crucial for selecting treatment and for patients' sight outcomes, due to the high cost and limited availability of laboratory diagnostics in LMICs, diagnosis is often made by clinical observation alone, despite its lower accuracy. In this study, we investigate and compare different deep learning approaches to diagnose the source of infection: 1) three separate binary models for infection type prediction; 2) a multitask model with a shared backbone and three parallel classification layers (Multitask V1); and 3) a multitask model with a shared backbone and a multi-head classification layer (Multitask V2). We used a private Brazilian cornea dataset for the empirical evaluation. We achieved the best results with Multitask V2, with area under the receiver operating characteristic curve (AUROC) confidence intervals of 0.7413-0.7740 (bacteria), 0.8395-0.8725 (fungi), and 0.9448-0.9616 (amoeba). A statistical analysis of the impact of patient features on model performance revealed that sex significantly affects amoeba infection prediction, and age seems to affect fungi and bacteria predictions.
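A minimal sketch of the Multitask V1 design, a shared backbone with three parallel binary heads, is shown below; the ResNet-18 backbone and head sizes are assumptions for illustration, not the study's exact configuration:

```python
# A minimal sketch of a "shared backbone + three parallel heads" multitask
# classifier (Multitask V1 in the abstract). Backbone and head sizes are
# illustrative assumptions.
import torch
import torch.nn as nn
from torchvision import models

class MultitaskKeratitis(nn.Module):
    def __init__(self):
        super().__init__()
        backbone = models.resnet18(weights=None)
        feat_dim = backbone.fc.in_features
        backbone.fc = nn.Identity()  # keep only the shared feature extractor
        self.backbone = backbone
        # One binary head per etiology: bacteria, fungi, amoeba.
        self.heads = nn.ModuleDict({
            name: nn.Linear(feat_dim, 1) for name in ("bacteria", "fungi", "amoeba")
        })

    def forward(self, x):
        feats = self.backbone(x)
        return {name: head(feats) for name, head in self.heads.items()}

model = MultitaskKeratitis()
logits = model(torch.randn(2, 3, 224, 224))
print({k: v.shape for k, v in logits.items()})
```

Multitask V2 would instead fold the three outputs into a single multi-head classification layer on the same shared features.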
Authors: Ravi Kant Gupta, Shounak Das, Amit Sethi
Abstract: Whole Slide Imaging (WSI) is a cornerstone of digital pathology, offering detailed insights critical for diagnosis and research. Yet, the gigapixel size of WSIs imposes significant computational challenges, limiting their practical utility. Our approach addresses these challenges by leveraging various encoders for intelligent data reduction and employing a different classification model to ensure robust, permutation-invariant representations of WSIs. A key innovation of our method is the ability to distill the complex information of an entire WSI into a single vector, effectively capturing the essential features needed for accurate analysis. This significantly improves the computational efficiency of WSI analysis, enabling accurate pathological assessment without extensive computational resources and paving the way for more scalable and effective use of WSIs in medical diagnostics and research.
Authors: Penghui Yang, Chen-Chen Zong, Sheng-Jun Huang, Lei Feng, Bo An
Abstract: Traditional knowledge distillation focuses on aligning the student's predicted probabilities with both ground-truth labels and the teacher's predicted probabilities. However, the transition to predicted probabilities from logits would obscure certain indispensable information. To address this issue, it is intuitive to additionally introduce a logit-level loss function as a supplement to the widely used probability-level loss function, for exploiting the latent information of logits. Unfortunately, we empirically find that the amalgamation of the newly introduced logit-level loss and the previous probability-level loss will lead to performance degeneration, even trailing behind the performance of employing either loss in isolation. We attribute this phenomenon to the collapse of the classification head, which is verified by our theoretical analysis based on the neural collapse theory. Specifically, the gradients of the two loss functions exhibit contradictions in the linear classifier yet display no such conflict within the backbone. Drawing from the theoretical analysis, we propose a novel method called dual-head knowledge distillation, which partitions the linear classifier into two classification heads responsible for different losses, thereby preserving the beneficial effects of both losses on the backbone while eliminating adverse influences on the classification head. Extensive experiments validate that our method can effectively exploit the information inside the logits and achieve superior performance against state-of-the-art counterparts.
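A minimal sketch of the dual-head idea follows: two linear heads share one backbone, one driven by the probability-level (cross-entropy plus KL) loss and the other by a logit-level (MSE) loss, so their conflicting gradients never meet in a single classifier. The loss weights and temperature are illustrative assumptions:

```python
# A minimal sketch of dual-head knowledge distillation: two classifier heads on
# a shared backbone, one trained with the probability-level loss (CE + KL on
# softened probabilities) and one with a logit-level MSE loss. Loss weights,
# the temperature, and which head is used at inference are assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F

backbone = nn.Sequential(nn.Linear(32, 64), nn.ReLU())
head_prob = nn.Linear(64, 10)    # driven by CE + KL on softened probabilities
head_logit = nn.Linear(64, 10)   # driven by MSE directly on teacher logits

def dual_head_loss(x, y, teacher_logits, T=4.0):
    feats = backbone(x)
    z_p, z_l = head_prob(feats), head_logit(feats)
    ce = F.cross_entropy(z_p, y)
    kl = F.kl_div(F.log_softmax(z_p / T, dim=-1),
                  F.softmax(teacher_logits / T, dim=-1),
                  reduction="batchmean") * T * T
    mse = F.mse_loss(z_l, teacher_logits)
    return ce + kl + mse  # both heads backpropagate into the shared backbone

x, y = torch.randn(8, 32), torch.randint(0, 10, (8,))
teacher_logits = torch.randn(8, 10)
print(dual_head_loss(x, y, teacher_logits).item())
```

The design choice mirrors the abstract's analysis: the backbone still receives both gradient signals, while the contradictory gradients are isolated in separate linear heads.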
Authors: Juan Pablo Contreras, Crist\'obal Guzm\'an, David Mart\'inez-Rubio
Abstract: We develop algorithms for the optimization of convex objectives that have H\"older continuous $q$-th derivatives with respect to a $p$-norm by using a $q$-th order oracle, for $p, q \geq 1$. We can also optimize other structured functions. We do this by developing a non-Euclidean inexact accelerated proximal point method that makes use of an inexact uniformly convex regularizer. We also provide nearly matching lower bounds for any deterministic algorithm that interacts with the function via a local oracle.
Authors: Nicklas Boserup, Gefan Yang, Michael Lind Severinsen, Christy Anna Hipsley, Stefan Sommer
Abstract: We introduce a methodology for performing parameter inference in high-dimensional, non-linear diffusion processes. We illustrate its applicability for obtaining insights into the evolution of and relationships between species, including ancestral state reconstruction. Estimation is performed by utilising score matching to approximate diffusion bridges, which are subsequently used in an importance sampler to estimate log-likelihoods. The entire setup is differentiable, allowing gradient ascent on approximated log-likelihoods. This allows both parameter inference and diffusion mean estimation. This novel, numerically stable, score matching-based parameter inference framework is presented and demonstrated on biological two- and three-dimensional morphometry data.
Authors: Daniele Bracale, Subha Maity, Felipe Maia Polo, Seamus Somerstep, Moulinath Banerjee, Yuekai Sun
Abstract: Often in prediction tasks, the predictive model itself can influence the distribution of the target variable, a phenomenon termed performative prediction. Generally, this influence stems from strategic actions taken by stakeholders with a vested interest in predictive models. A key challenge that hinders the widespread adoption of performative prediction in machine learning is that practitioners are generally unaware of the social impacts of their predictions. To address this gap, we propose a methodology for learning the distribution map that encapsulates the long-term impacts of predictive models on the population. Specifically, we model agents' responses as a cost-adjusted utility maximization problem and propose estimates for said cost. Our approach leverages optimal transport to align pre-model exposure (ex ante) and post-model exposure (ex post) distributions. We provide a rate of convergence for this proposed estimate and assess its quality through empirical demonstrations on a credit-scoring dataset.
Authors: Moran Yanuka, Assaf Ben Kish, Yonatan Bitton, Idan Szpektor, Raja Giryes
Abstract: Recent research increasingly focuses on training vision-language models (VLMs) with long, detailed image captions. However, small-scale VLMs often struggle to balance the richness of these captions with the risk of hallucinating content during fine-tuning. In this paper, we explore how well VLMs adapt to such captions. To quantify caption quality, we propose Decomposed NLI (DNLI), an evaluation framework that breaks down generated captions into individual propositions, assessing each in isolation. This fine-grained analysis reveals a critical balance between capturing descriptive details and preventing hallucinations. Our findings show that simply reducing caption complexity or employing standard data curation techniques does not effectively resolve this issue. To tackle this challenge, we introduce Knowledge Adapted (KnowAda) fine-tuning, a data-centric approach that automatically adapts training data with the model's existing knowledge and visual understanding. KnowAda minimizes hallucinations while preserving high descriptiveness. We validate this approach across several small-scale VLMs (up to 7B parameters) and dense caption datasets, demonstrating that KnowAda effectively balances hallucination reduction and descriptiveness. Our results show that KnowAda outperforms various baselines in both automatic metrics and human evaluations. We will release our code and models.
Authors: Sanjay Haresh, Daniel Dijkman, Apratim Bhattacharyya, Roland Memisevic
Abstract: Robotics tasks are highly compositional by nature. For example, to perform a high-level task like cleaning the table, a robot must employ low-level capabilities such as moving its effectors to the objects on the table, picking them up, and then moving them off the table one by one, while re-evaluating the continually changing scene in the process. Given that large vision language models (VLMs) have shown progress on many tasks that require high-level, human-like reasoning, we ask the question: if the models are taught the requisite low-level capabilities, can they compose them in novel ways to achieve interesting high-level tasks like cleaning the table without having to be explicitly taught so? To this end, we present ClevrSkills - a benchmark suite for compositional reasoning in robotics. ClevrSkills is an environment suite developed on top of the ManiSkill2 simulator and an accompanying dataset. The dataset contains trajectories generated on a range of robotics tasks with language and visual annotations, as well as multi-modal prompts as task specifications. The suite includes a curriculum of tasks with three levels of compositional understanding, starting with simple tasks requiring basic motor skills. We benchmark multiple VLM baselines on ClevrSkills and show that even after being pre-trained on large numbers of tasks, these models fail at compositional reasoning in robotics tasks.
Authors: Jongmin Mun, Seungwoo Kwak, Ilmun Kim
Abstract: We explore the trade-off between privacy and statistical utility in private two-sample testing under local differential privacy (LDP) for both multinomial and continuous data. We begin by addressing the multinomial case, where we introduce private permutation tests using practical privacy mechanisms such as Laplace, discrete Laplace, and Google's RAPPOR. We then extend our multinomial approach to continuous data via binning and study its uniform separation rates under LDP over H\"older and Besov smoothness classes. The proposed tests for both discrete and continuous cases rigorously control the type I error for any finite sample size, strictly adhere to LDP constraints, and achieve minimax separation rates under LDP. The attained minimax rates reveal inherent privacy-utility trade-offs that are unavoidable in private testing. To address scenarios with unknown smoothness parameters in density testing, we propose an adaptive test based on a Bonferroni-type approach that ensures robust performance without prior knowledge of the smoothness parameters. We validate our theoretical findings with extensive numerical experiments and demonstrate the practical relevance and effectiveness of our proposed methods.
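For the multinomial case, the basic recipe can be sketched as follows: each user releases a Laplace-noised one-hot encoding of their category (satisfying epsilon-LDP), and a permutation test compares the two noisy samples. The test statistic and noise calibration below are simplified illustrations, not the paper's exact mechanisms:

```python
# A simplified sketch of private two-sample testing for multinomial data:
# each user releases a one-hot encoding with Laplace noise (the L1 sensitivity
# of a one-hot vector is 2, so scale 2/eps gives eps-LDP), and a permutation
# test compares the two noisy samples. Statistic and calibration are
# illustrative, not the paper's exact procedure.
import numpy as np

rng = np.random.default_rng(0)

def privatize(categories, k, eps):
    onehot = np.eye(k)[categories]
    return onehot + rng.laplace(scale=2.0 / eps, size=onehot.shape)

def perm_test(x, y, n_perm=999):
    stat = lambda a, b: np.linalg.norm(a.mean(0) - b.mean(0))
    obs = stat(x, y)
    pooled = np.vstack([x, y])
    count = sum(
        stat(*np.split(rng.permutation(pooled), [len(x)])) >= obs
        for _ in range(n_perm)
    )
    return (count + 1) / (n_perm + 1)  # permutation p-value

k, eps = 5, 1.0
x = privatize(rng.integers(0, k, 200), k, eps)
y = privatize(rng.integers(0, k, 200), k, eps)  # same distribution: expect a large p-value
print(perm_test(x, y))
```

The permutation calibration is why such tests control the type I error at any finite sample size, the property the abstract emphasizes.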
Authors: Shiyu Wang, Hao Ding, Yupeng Gu, Sergul Aydore, Kousha Kalantari, Branislav Kveton
Abstract: The growth of recommender systems (RecSys) is driven by digitization and the need for personalized content in areas such as e-commerce and video streaming. The content in these systems often changes rapidly, and therefore they constantly face the cold-start problem, where new items lack interaction data and are hard to value. Existing solutions for the cold-start problem, such as content-based recommenders and hybrid methods, leverage item metadata to determine item similarities. The main challenge with these methods is their reliance on structured and informative metadata to capture detailed item similarities, which may not always be available. This paper introduces a novel approach for cold-start item recommendation that utilizes a language model (LM) to estimate item similarities, which are further integrated as a Bayesian prior with classic recommender systems. This approach is generic and able to boost the performance of various recommenders. Specifically, our experiments integrate it with both sequential and collaborative filtering-based recommenders and evaluate it on two real-world datasets, demonstrating the enhanced performance of the proposed approach.
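The core idea can be sketched in a few lines: embed item metadata with an LM, compute similarities between a cold item and warm items, and use a similarity-weighted average of warm-item scores as a prior. The random-projection stand-in for the LM encoder and the softmax blending are assumptions for illustration:

```python
# A minimal sketch: score a cold item by LM-embedding its metadata, then use
# similarity to warm items as a Bayesian-style prior to be blended with the
# (initially absent) interaction signal. The embedding stand-in, item names,
# and blending rule are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(0)

def lm_embed(texts):
    # Stand-in for a language-model encoder (e.g., a sentence-embedding model).
    return rng.normal(size=(len(texts), 16))

warm_titles = ["wireless earbuds", "running shoes", "yoga mat"]
warm_scores = np.array([4.2, 3.9, 4.5])     # e.g., learned item quality
cold_title = ["bluetooth headphones"]

E_warm = lm_embed(warm_titles)
e_cold = lm_embed(cold_title)[0]

# Cosine similarities between the cold item and all warm items.
sims = E_warm @ e_cold / (np.linalg.norm(E_warm, axis=1) * np.linalg.norm(e_cold))
weights = np.exp(sims) / np.exp(sims).sum()  # softmax over similarities

prior_score = weights @ warm_scores          # similarity-weighted prior
print(f"prior score for cold item: {prior_score:.2f}")
```

As interactions accumulate, the prior would be progressively dominated by the observed feedback, in the usual Bayesian fashion.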
Authors: Wenbo Zhang, Aditya Majumdar, Amulya Yadav
Abstract: Code-mixing (CM), or code-switching (CSW), refers to the juxtaposition of linguistic units from two or more languages within a conversation, or sometimes even within a single utterance. Code-mixing introduces unique challenges in daily life, such as syntactic mismatches and semantic blending, that are rarely encountered in monolingual settings. Large language models (LLMs) have revolutionized the field of natural language processing (NLP) by offering unprecedented capabilities in understanding human languages. However, the effectiveness of current state-of-the-art multilingual LLMs has not yet been fully explored in the CM scenario. To fill this gap, we first benchmark the performance of multilingual LLMs on various code-mixing NLP tasks. Then we propose to improve the multilingual LLMs' ability to understand code-mixing through reinforcement learning from human feedback (RLHF) and code-mixed machine translation tasks. Given the high cost and time-consuming nature of the preference labeling procedure, we address this by utilizing LLMs as annotators to perform reinforcement learning from AI feedback (RLAIF). The experiments show the effectiveness of the proposed method.
Authors: Mariusz Wisniewski, Zeeshan A. Rana, Ivan Petrunin, Alan Holt, Stephen Harman
Abstract: Drone detection has benefited from improvements in deep neural networks, but like many other applications, suffers from the limited availability of accurate training data. Synthetic data offers the potential for low-cost data generation and has been shown to improve data availability and quality. However, models trained on synthetic datasets need to prove their ability to perform on real-world data, known as the problem of sim-to-real transferability. Here, we present a drone detection Faster-RCNN model trained on a purely synthetic dataset that transfers to real-world data. We found that it achieves an AP_50 of 97.0% when evaluated on MAV-Vid, a real dataset of flying drones, compared with 97.8% for an equivalent model trained on real-world data. Our results show that using synthetic data for drone detection has the potential to reduce data collection costs and improve labelling quality. These findings could be a starting point for more elaborate synthetic drone datasets. For example, realistic recreations of specific scenarios could de-risk the dataset generation of safety-critical applications such as the detection of drones at airports. Further, synthetic data may enable reliable drone detection systems, which could benefit other areas, such as unmanned traffic management systems. The code is available at https://github.com/mazqtpopx/cranfield-synthetic-drone-detection alongside the datasets at https://huggingface.co/datasets/mazqtpopx/cranfield-synthetic-drone-detection.
URLs: https://github.com/mazqtpopx/cranfield-synthetic-drone-detection, https://huggingface.co/datasets/mazqtpopx/cranfield-synthetic-drone-detection.
Authors: Chaoyang Luo, Yan Zou, Wanying Li, Nanjing Huang
Abstract: Neural Ordinary Differential Equations (Neural ODEs), a novel class of methods for modeling big data, cleverly link traditional neural networks and dynamical systems. However, it is challenging to ensure the dynamical system reaches a correctly predicted state within a user-defined fixed time. To address this problem, we propose a new method for training Neural ODEs using fixed-time stability (FxTS) Lyapunov conditions. Our framework, called FxTS-Net, is based on the novel FxTS loss (FxTS-Loss) designed on Lyapunov functions, which aims to encourage convergence to accurate predictions in a user-defined fixed time. We also provide an innovative approach for constructing Lyapunov functions to meet various tasks and network architecture requirements, achieved by leveraging supervised information during training. By developing a more precise time upper bound estimation for bounded, non-vanishingly perturbed systems, we demonstrate that minimizing the FxTS-Loss not only guarantees FxTS behavior of the dynamics but also robustness to input perturbations. To optimize the FxTS-Loss, we also propose a learning algorithm in which a simulated perturbation sampling method captures sample points in critical regions to approximate the FxTS-Loss. Experimentally, we find that FxTS-Net provides better prediction performance and better robustness under input perturbation.
Authors: Qin Yang
Abstract: Innate values describe agents' intrinsic motivations, which reflect their inherent interests and preferences to pursue goals and drive them to develop diverse skills satisfying their various needs. The essence of reinforcement learning (RL) is learning from interaction based on reward-driven behaviors, much like natural agents, making it an excellent model for describing the innate-values-driven (IV) behaviors of AI agents. In particular, enabling an AI agent to balance internal and external utilities according to its needs across different tasks is crucial if such agents are to integrate into human society safely and harmoniously in the long term. This paper proposes a hierarchical compound intrinsic-value reinforcement learning model, termed innate-values-driven reinforcement learning (IVRL), to describe the complex interaction behaviors of AI agents. We formulate the IVRL model and propose two variants based on DQN and A2C. By comparing them with benchmark algorithms such as DQN, DDQN, A2C, and PPO in the role-playing game (RPG) reinforcement learning test platform ViZDoom, we demonstrate that rationally organizing various individual needs can effectively achieve better performance.
Authors: Md Fahim Anjum
Abstract: Recent advances in image generation, particularly via diffusion models, have led to impressive improvements in image synthesis quality. Despite this, diffusion models are still challenged by model-induced artifacts and limited stability in image fidelity. In this work, we hypothesize that the primary cause is the improper resampling operation, which introduces aliasing in the diffusion model, and that careful alias-free resampling, as dictated by image processing theory, can improve the model's performance in image synthesis. We propose the integration of alias-free resampling layers into the UNet architecture of diffusion models without adding extra trainable parameters, thereby maintaining computational efficiency. We then assess whether these theory-driven modifications enhance image quality and rotational equivariance. Our experimental results on benchmark datasets, including CIFAR-10, MNIST, and MNIST-M, reveal consistent gains in image quality, particularly in terms of FID and KID scores. Furthermore, we propose a modified diffusion process that enables user-controlled rotation of generated images without requiring additional training. Our findings highlight the potential of theory-driven enhancements such as alias-free resampling to improve image quality while maintaining model efficiency, and point toward incorporating them into video-generating diffusion models as a direction for future research.
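A minimal sketch of alias-free downsampling, applying a fixed low-pass filter before striding so that no trainable parameters are added, is shown below; the 3x3 binomial kernel and its placement inside the UNet are assumptions for illustration:

```python
# A minimal sketch of alias-free downsampling: a fixed low-pass (binomial)
# filter is applied before stride-2 subsampling, adding no trainable
# parameters. Kernel choice and placement in the UNet are assumptions.
import torch
import torch.nn.functional as F

def alias_free_downsample(x):
    """Low-pass filter with a fixed 3x3 binomial kernel, then stride-2 subsample."""
    k1d = torch.tensor([1.0, 2.0, 1.0])
    kernel = torch.outer(k1d, k1d)
    kernel = kernel / kernel.sum()
    c = x.shape[1]
    kernel = kernel.expand(c, 1, 3, 3).contiguous()  # depthwise: one filter per channel
    return F.conv2d(x, kernel, stride=2, padding=1, groups=c)

x = torch.randn(1, 8, 32, 32)
print(alias_free_downsample(x).shape)  # torch.Size([1, 8, 16, 16])
```

Naive stride-2 subsampling drops every other sample without removing high frequencies first, which is exactly the aliasing the low-pass step suppresses.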
Authors: Gyu Min Kim, Jeong Min Jeon
Abstract: Traditional neural networks (multi-layer perceptrons) have become an important tool in data science due to their success across a wide range of tasks. However, their performance is sometimes unsatisfactory, and they often require a large number of parameters, primarily due to their reliance on the linear combination structure. Meanwhile, additive regression has been a popular alternative to linear regression in statistics. In this work, we introduce novel deep neural networks that incorporate the idea of additive regression. Our neural networks share architectural similarities with Kolmogorov-Arnold networks but are based on simpler yet flexible activation and basis functions. Additionally, we introduce several hybrid neural networks that combine this architecture with that of traditional neural networks. We derive their universal approximation properties and demonstrate their effectiveness through simulation studies and a real-data application. The numerical results indicate that our neural networks generally achieve better performance than traditional neural networks while using fewer parameters.
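A minimal sketch of an additive layer, where each input feature passes through its own univariate sub-network and the outputs are summed, mirrors additive regression f(x) = f_1(x_1) + ... + f_p(x_p); the sub-network sizes and activations are illustrative assumptions:

```python
# A minimal sketch of an additive layer in the spirit of additive regression:
# each input coordinate gets its own small univariate network, and the outputs
# are summed. Sub-network sizes and activations are illustrative assumptions.
import torch
import torch.nn as nn

class AdditiveLayer(nn.Module):
    def __init__(self, in_features, hidden=8):
        super().__init__()
        # One univariate sub-network per input coordinate.
        self.fns = nn.ModuleList(
            nn.Sequential(nn.Linear(1, hidden), nn.Tanh(), nn.Linear(hidden, 1))
            for _ in range(in_features)
        )

    def forward(self, x):
        # x: (batch, in_features); apply f_j to column j and sum the results.
        parts = [fn(x[:, j:j + 1]) for j, fn in enumerate(self.fns)]
        return torch.stack(parts, dim=0).sum(dim=0)

model = AdditiveLayer(in_features=4)
print(model(torch.randn(16, 4)).shape)  # torch.Size([16, 1])
```

Replacing a dense linear combination with a sum of univariate functions is also what makes the learned per-feature effects directly inspectable.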
Authors: Xinyu He, Jose Sepulveda, Mostafa Rahmani, Alyssa Woo, Fei Wang, Hanghang Tong
Abstract: Due to the difficulty of acquiring large-scale explicit user feedback, implicit feedback (e.g., clicks or other interactions) is widely applied as an alternative source of data, where user-item interactions can be modeled as a bipartite graph. Due to the noisy and biased nature of implicit real-world user-item interactions, identifying and rectifying noisy interactions is vital to enhance model performance and robustness. Previous works on purifying user-item interactions in collaborative filtering mainly focus on mining the correlation between user/item embeddings and noisy interactions, neglecting the benefit of temporal patterns in determining noisy interactions. Time information, besides enhancing model utility, also has a natural advantage in helping to determine noisy edges: for example, if someone usually watches horror movies at night and talk shows in the morning, a record of watching a horror movie in the morning is more likely to be a noisy interaction. Armed with this observation, we introduce a simple yet effective mechanism for generating time-aware user/item embeddings and propose two strategies for denoising bipartite temporal graphs in recommender systems (DeBaTeR): the first reweights the adjacency matrix (DeBaTeR-A), where a reliability score is defined to reweight the edges through both soft and hard assignment; the second reweights the loss function (DeBaTeR-L), where weights are generated to reweight user-item samples in the losses. Extensive experiments demonstrate the efficacy of our methods and illustrate how time information indeed helps identify noisy edges.
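A minimal sketch of the adjacency-reweighting variant (in the spirit of DeBaTeR-A): score each user-item edge by the agreement of its time-aware embeddings, then use the score as a soft edge weight or threshold it for hard assignment. The cosine-based scoring function is an illustrative assumption:

```python
# A minimal sketch of adjacency reweighting in the spirit of DeBaTeR-A: each
# user-item edge gets a reliability score from its time-aware embeddings, used
# either as a soft weight or thresholded for hard assignment. The cosine-based
# score is an illustrative assumption, not the paper's exact definition.
import torch

def reliability_scores(user_emb, item_emb, edges):
    """Cosine agreement between the time-aware user and item embeddings of each edge."""
    u, i = edges[:, 0], edges[:, 1]
    scores = torch.cosine_similarity(user_emb[u], item_emb[i], dim=-1)
    return (scores + 1) / 2  # map to [0, 1]

# Toy time-aware embeddings (in practice produced by the model) and 3 edges.
user_emb, item_emb = torch.randn(5, 8), torch.randn(7, 8)
edges = torch.tensor([[0, 1], [2, 3], [4, 6]])

w = reliability_scores(user_emb, item_emb, edges)
soft_weights = w                    # soft assignment: reweight the adjacency
hard_mask = (w > 0.5).float()       # hard assignment: drop suspected noise
print(soft_weights, hard_mask)
```

DeBaTeR-L would apply the analogous scores not to the graph but to the per-sample weights inside the training loss.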
Authors: Yinghao Ma, Jiaxi Su, Dong-Ling Deng
Abstract: Classical verification of quantum learning allows classical clients to reliably leverage quantum computing advantages by interacting with untrusted quantum servers. Yet, quantum devices available in practice suffer from a variety of noise, and whether existing classical verification protocols carry over to noisy scenarios remains unclear. Here, we propose an efficient classical error rectification algorithm to reconstruct the noise-free results given by the quantum Fourier sampling circuit with practical constant-level noise. In particular, we prove that the error rectification algorithm can restore the heavy Fourier coefficients by using a small number of noisy samples that scales logarithmically with the problem size. We apply this algorithm to the agnostic parity learning task with uniform input marginal and prove that this task can be accomplished efficiently on noisy quantum devices with our algorithm. In addition, we prove that a classical client with access to the random example oracle can efficiently verify the agnostic parity learning results from the noisy quantum prover, under the condition that the Fourier coefficients are sparse. Our results demonstrate the feasibility of classical verification of quantum learning advantages in the presence of noise, providing a valuable guide for both theoretical studies and practical applications with current noisy intermediate-scale quantum devices.
Authors: Sahibzada Adil Shahzad, Ammarah Hashmi, Yan-Tsung Peng, Yu Tsao, Hsin-Min Wang
Abstract: Multimodal deepfakes involving audiovisual manipulations are a growing threat because they are difficult to detect with the naked eye or using unimodal deep learning-based forgery detection methods. Audiovisual forensic models, while more capable than unimodal models, require large training datasets and are computationally expensive for training and inference. Furthermore, these models lack interpretability and often do not generalize well to unseen manipulations. In this study, we examine the detection capabilities of a large language model (LLM) (i.e., ChatGPT) to identify and account for any possible visual and auditory artifacts and manipulations in audiovisual deepfake content. Extensive experiments are conducted on videos from a benchmark multimodal deepfake dataset to evaluate the detection performance of ChatGPT and compare it with the detection capabilities of state-of-the-art multimodal forensic models and humans. Experimental results demonstrate the importance of domain knowledge and prompt engineering for video forgery detection tasks using LLMs. Unlike approaches based on end-to-end learning, ChatGPT can account for spatial and spatiotemporal artifacts and inconsistencies that may exist within or across modalities. Additionally, we discuss the limitations of ChatGPT for multimedia forensic tasks.
Authors: Franck Rothen, Samuel Klein, Matthew Leigh, Tobias Golling
Abstract: Machine learning is becoming increasingly popular in the context of particle physics. Supervised learning, which uses labeled Monte Carlo (MC) simulations, remains one of the most widely used methods for discriminating signals beyond the Standard Model. However, this paper suggests that supervised models may depend excessively on artifacts and approximations from Monte Carlo simulations, potentially limiting their ability to generalize well to real data. This study aims to enhance the generalization properties of supervised models by reducing the sharpness of local minima. It reviews the application of four distinct white-box adversarial attacks in the context of classifying Higgs boson decay signals. The attacks are divided into weight space attacks and feature space attacks. To study and quantify the sharpness of different local minima, this paper presents two analysis methods: gradient ascent and reduced Hessian eigenvalue analysis. The results show that white-box adversarial attacks significantly improve generalization performance, albeit with increased computational complexity.
Authors: Thivin Anandh, Divij Ghose, Himanshu Jain, Pratham Sunkad, Sashikumaar Ganesan, Volker John
Abstract: This paper proposes and studies two extensions of applying hp-variational physics-informed neural networks, more precisely the FastVPINNs framework, to convection-dominated convection-diffusion-reaction problems. First, a term in the spirit of a SUPG stabilization is included in the loss functional and a network architecture is proposed that predicts spatially varying stabilization parameters. Having observed that the selection of the indicator function in hard-constrained Dirichlet boundary conditions has a big impact on the accuracy of the computed solutions, the second novelty is the proposal of a network architecture that learns good parameters for a class of indicator functions. Numerical studies show that both proposals lead to noticeably more accurate results than approaches that can be found in the literature.
Authors: Zepeng Huo, Jason Alan Fries, Alejandro Lozano, Jeya Maria Jose Valanarasu, Ethan Steinberg, Louis Blankemeier, Akshay S. Chaudhari, Curtis Langlotz, Nigam H. Shah
Abstract: With the rise of medical foundation models and the growing availability of imaging data, scalable pretraining techniques offer a promising way to identify imaging biomarkers predictive of future disease risk. While current self-supervised methods for 3D medical imaging models capture local structural features like organ morphology, they fail to link pixel biomarkers with long-term health outcomes due to a missing context problem. Current approaches lack the temporal context necessary to identify biomarkers correlated with disease progression, as they rely on supervision derived only from images and concurrent text descriptions. To address this, we introduce time-to-event pretraining, a pretraining framework for 3D medical imaging models that leverages large-scale temporal supervision from paired, longitudinal electronic health records (EHRs). Using a dataset of 18,945 CT scans (4.2 million 2D images) and time-to-event distributions across thousands of EHR-derived tasks, our method improves outcome prediction, achieving an average AUROC increase of 23.7% and a 29.4% gain in Harrell's C-index across 8 benchmark tasks. Importantly, these gains are achieved without sacrificing diagnostic classification performance. This study lays the foundation for integrating longitudinal EHR and 3D imaging data to advance clinical risk prediction.
Authors: Dhananjay Tomar, Alexander Binder, Andreas Kleppe
Abstract: Domain generalisation in computational histopathology is challenging because the images are substantially affected by differences among hospitals due to factors like fixation and staining of tissue and imaging equipment. We hypothesise that focusing on nuclei can improve out-of-domain (OOD) generalisation in cancer detection, and propose a simple approach that exploits nuclear morphology and organisation, as these are domain-invariant features critical in cancer detection. Our approach integrates original images with nuclear segmentation masks during training, encouraging the model to prioritise nuclei and their spatial arrangement. Going beyond mere data augmentation, we introduce a regularisation technique that aligns the representations of masks and original images. We show, using multiple datasets, that our method improves OOD generalisation and also leads to increased robustness to image corruptions and adversarial attacks. The source code is available at https://github.com/undercutspiky/SFL/
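A minimal sketch of the representation-alignment regularisation: encode the original image and its nuclear segmentation mask with the same backbone and penalise the distance between the two feature vectors. The tiny encoder and cosine loss are assumptions for illustration:

```python
# A minimal sketch of aligning mask and image representations: both inputs go
# through the same encoder and a cosine distance between the feature vectors
# serves as the regulariser. The tiny encoder and loss form are illustrative
# assumptions, not the paper's exact design.
import torch
import torch.nn as nn
import torch.nn.functional as F

encoder = nn.Sequential(
    nn.Conv2d(3, 16, 3, stride=2, padding=1), nn.ReLU(),
    nn.AdaptiveAvgPool2d(1), nn.Flatten(),
)

def alignment_loss(image, mask):
    # The binary mask is repeated to 3 channels so both inputs share the encoder.
    z_img = encoder(image)
    z_mask = encoder(mask.repeat(1, 3, 1, 1))
    return 1 - F.cosine_similarity(z_img, z_mask, dim=-1).mean()

image = torch.randn(2, 3, 64, 64)
mask = (torch.rand(2, 1, 64, 64) > 0.5).float()
print(alignment_loss(image, mask).item())
```

Pulling the two representations together pushes the image features toward information that survives in the mask, i.e., nuclear shape and arrangement rather than stain colour.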
Authors: Shuzhi Gong, Richard O. Sinnott, Jianzhong Qi, Cecile Paris
Abstract: The spread of fake news on social media poses significant threats to individuals and society. Text-based and graph-based models have been employed for fake news detection by analysing news content and propagation networks, showing promising results in specific scenarios. However, these data-driven models heavily rely on pre-existing in-distribution data for training, limiting their performance when confronted with fake news from emerging or previously unseen domains, known as out-of-distribution (OOD) data. Tackling OOD fake news is a challenging yet critical task. In this paper, we introduce the Causal Subgraph-oriented Domain Adaptive Fake News Detection (CSDA) model, designed to enhance zero-shot fake news detection by extracting causal substructures from propagation graphs using in-distribution data and generalising this approach to OOD data. The model employs a graph neural network-based mask generation process to identify dominant nodes and edges within the propagation graph, using these substructures for fake news detection. Additionally, the performance of CSDA is further improved through contrastive learning in few-shot scenarios, where a limited amount of OOD data is available for training. Extensive experiments on public social media datasets demonstrate that CSDA effectively handles OOD fake news detection, achieving a 7 to 16 percent accuracy improvement over other state-of-the-art models.
Authors: Shravan Venkatraman, Jaskaran Singh Walia, Joe Dhanith P R
Abstract: Image classification is a computer vision task where a model analyzes an image to categorize it into a specific label. Vision Transformers (ViT) improve this task by leveraging self-attention to capture complex patterns and long-range relationships between image patches. However, a key challenge for ViTs is efficiently incorporating multiscale feature representations, which are inherent to CNNs through their hierarchical structure. In this paper, we introduce the Scale-Aware Graph Attention Vision Transformer (SAG-ViT), a novel framework that addresses this challenge by integrating multi-scale features. Using EfficientNet as a backbone, the model extracts multi-scale feature maps, which are divided into patches to preserve semantic information. These patches are organized into a graph based on spatial and feature similarities, with a Graph Attention Network (GAT) refining the node embeddings. Finally, a Transformer encoder captures long-range dependencies and complex interactions. SAG-ViT is evaluated on benchmark datasets, demonstrating its effectiveness in enhancing image classification performance.
Authors: Joanna Kaleta, Pawe{\l} Skier\'s, Jan Dubi\'nski, Przemys{\l}aw Korzeniowski, Kamil Deja
Abstract: We introduce Mediffusion -- a new method for semi-supervised learning with explainable classification based on a joint diffusion model. The medical imaging domain faces unique challenges due to scarce data labelling, which is insufficient for standard training, and due to the critical nature of its applications, which require high performance, confidence, and explainability of the models. In this work, we propose to tackle those challenges with a single model that combines standard classification with a diffusion-based generative task in a single shared parametrisation. By sharing representations, our model effectively learns from both labeled and unlabeled data while at the same time providing accurate explanations through counterfactual examples. In our experiments, we show that Mediffusion achieves results comparable to recent semi-supervised methods while providing more reliable and precise explanations.
Authors: L. M. Kreusser, H. E. Lockyer, E. H. M\"uller, P. Singh
Abstract: Splitting methods are widely used for solving initial value problems (IVPs) due to their ability to simplify complicated evolutions into more manageable subproblems which can be solved efficiently and accurately. Traditionally, these methods are derived using analytic and algebraic techniques from numerical analysis, including truncated Taylor series and their Lie algebraic analogue, the Baker--Campbell--Hausdorff formula. These tools enable the development of high-order numerical methods that provide exceptional accuracy for small timesteps. Moreover, these methods often (nearly) conserve important physical invariants, such as mass, unitarity, and energy. However, in many practical applications the computational resources are limited. Thus, it is crucial to identify methods that achieve the best accuracy within a fixed computational budget, which might require taking relatively large timesteps. In this regime, high-order methods derived with traditional methods often exhibit large errors since they are only designed to be asymptotically optimal. Machine Learning techniques offer a potential solution since they can be trained to efficiently solve a given IVP with less computational resources. However, they are often purely data-driven, come with limited convergence guarantees in the small-timestep regime and do not necessarily conserve physical invariants. In this work, we propose a framework for finding machine learned splitting methods that are computationally efficient for large timesteps and have provable convergence and conservation guarantees in the small-timestep limit. We demonstrate numerically that the learned methods, which by construction converge quadratically in the timestep size, can be significantly more efficient than established methods for the Schr\"{o}dinger equation if the computational budget is limited.
Authors: Junjie Zhou, Lin Wang, Qiang Meng, Xiaofan Wang
Abstract: Generating realistic and diverse road scenarios is essential for autonomous vehicle testing and validation. Nevertheless, owing to the complexity and variability of real-world road environments, creating authentic and varied scenarios for intelligent driving testing is challenging. In this paper, we propose DiffRoad, a novel diffusion model designed to produce controllable and high-fidelity 3D road scenarios. DiffRoad leverages the generative capabilities of diffusion models to synthesize road layouts from white noise through an inverse denoising process, preserving real-world spatial features. To enhance the quality of generated scenarios, we design the Road-UNet architecture, optimizing the balance between backbone and skip connections for high-realism scenario generation. Furthermore, we introduce a road scenario evaluation module that screens adequate and reasonable scenarios for intelligent driving testing using two critical metrics: road continuity and road reasonableness. Experimental results on multiple real-world datasets demonstrate DiffRoad's ability to generate realistic and smooth road structures while maintaining the original distribution. Additionally, the generated scenarios can be fully automated into the OpenDRIVE format, facilitating generalized autonomous vehicle simulation testing. DiffRoad provides a rich and diverse scenario library for large-scale autonomous vehicle testing and offers valuable insights for future infrastructure designs that are better suited for autonomous vehicles.
Authors: Chen-Long Duan, Yong Li, Xiu-Shen Wei, Lin Zhao
Abstract: Pre-training plays a vital role in various vision tasks, such as object recognition and detection. Commonly used pre-training methods, which typically rely on randomized approaches like uniform or Gaussian distributions to initialize model parameters, often fall short when confronted with long-tailed distributions, especially in detection tasks. This is largely due to extreme data imbalance and the issue of simplicity bias. In this paper, we introduce a novel pre-training framework for object detection, called Dynamic Rebalancing Contrastive Learning with Dual Reconstruction (2DRCL). Our method builds on a Holistic-Local Contrastive Learning mechanism, which aligns pre-training with object detection by capturing both global contextual semantics and detailed local patterns. To tackle the imbalance inherent in long-tailed data, we design a dynamic rebalancing strategy that adjusts the sampling of underrepresented instances throughout the pre-training process, ensuring better representation of tail classes. Moreover, Dual Reconstruction addresses simplicity bias by enforcing a reconstruction task aligned with the self-consistency principle, specifically benefiting underrepresented tail classes. Experiments on COCO and LVIS v1.0 datasets demonstrate the effectiveness of our method, particularly in improving the mAP/AP scores for tail classes.
Authors: Mohamad Mohamad, Francesco Ponzio, Santa Di Cataldo, Damien Ambrosetti, Xavier Descombes
Abstract: Renal Cell Carcinoma is typically asymptomatic at the early stages for many patients. This leads to a late diagnosis of the tumor, when the likelihood of curability is lower, and makes the mortality rate of Renal Cell Carcinoma high relative to its incidence rate. To increase the chance of survival, fast and correct categorization of the tumor subtype is paramount. Nowadays, computerized methods based on artificial intelligence represent an interesting opportunity to improve the productivity and the objectivity of microscopy-based Renal Cell Carcinoma diagnosis. Nonetheless, much of their exploitation is hampered by the paucity of annotated datasets, essential for proficient training of supervised machine learning technologies. This study sets out to investigate a novel self-supervised training strategy for machine learning diagnostic tools, based on the multi-resolution nature of histological samples. We aim to reduce the need for annotated data without significantly reducing the accuracy of the tool. We demonstrate the classification capability of our tool on a whole slide imaging dataset for Renal Cancer subtyping, and we compare our solution with several state-of-the-art classification counterparts.
Authors: M. Quattromini, M. A. Bucci, S. Cherubini, O. Semeraro
Abstract: This study presents a novel hybrid approach that combines Graph Neural Networks (GNNs) with Reynolds-Averaged Navier Stokes (RANS) equations to enhance the accuracy of mean flow reconstruction across a range of fluid dynamics applications. Traditional, purely data-driven Neural Network (NN) models often struggle to maintain physical consistency. Moreover, they typically require large datasets to achieve reliable performance. The GNN framework, which naturally handles unstructured data such as complex geometries in Computational Fluid Dynamics (CFD), is here integrated with RANS equations as a physical baseline model. The methodology leverages the adjoint method, enabling the use of RANS-derived gradients as optimization terms in the GNN training process. This ensures that the learned model adheres to the governing physics, maintaining physical consistency while improving prediction accuracy. We test our approach on multiple CFD scenarios, including cases involving generalization with respect to the Reynolds number, sparse measurements, denoising, and inpainting of missing portions of the mean flow. The results demonstrate significant improvements in the accuracy of the reconstructed mean flow compared to purely data-driven models, using limited amounts of data in the training dataset. The key strengths of this study are the integration of physical laws into the training process of the GNN, and the ability to achieve high-accuracy predictions with a limited amount of data, making this approach particularly valuable for applications in fluid dynamics where data is often scarce.
Authors: Benedikt B\"ock, Sadaf Syed, Wolfgang Utschick
Abstract: This work addresses the fundamental linear inverse problem in compressive sensing (CS) by introducing a new type of regularizing generative prior. Our proposed method utilizes ideas from classical dictionary-based CS and, in particular, sparse Bayesian learning (SBL), to integrate a strong regularization towards sparse solutions. At the same time, by leveraging the notion of conditional Gaussianity, it also incorporates the adaptability from generative models to training data. However, unlike most state-of-the-art generative models, it is able to learn from a few compressed and noisy data samples and requires no optimization algorithm for solving the inverse problem. Additionally, similar to Dirichlet prior networks, our model parameterizes a conjugate prior enabling its application for uncertainty quantification. We support our approach theoretically through the concept of variational inference and validate it empirically using different types of compressible signals.
Authors: Yunuo Wang, Ningning Yang, Jialin Li
Abstract: Generative Adversarial Networks (GANs) have surfaced as a revolutionary element within the domain of low-dose computed tomography (LDCT) imaging, providing an advanced resolution to the enduring issue of reconciling radiation exposure with image quality. This comprehensive review synthesizes the rapid advancements in GAN-based LDCT denoising techniques, examining the evolution from foundational architectures to state-of-the-art models incorporating advanced features such as anatomical priors, perceptual loss functions, and innovative regularization strategies. We critically analyze various GAN architectures, including conditional GANs (cGANs), CycleGANs, and Super-Resolution GANs (SRGANs), elucidating their unique strengths and limitations in the context of LDCT denoising. The evaluation provides both qualitative and quantitative results on performance improvements across benchmark and clinical datasets, with metrics such as PSNR, SSIM, and LPIPS. After highlighting the positive results, we discuss some of the challenges preventing wider clinical use, including the interpretability of the images generated by GANs, synthetic artifacts, and the need for clinically relevant metrics. The review concludes by highlighting the essential significance of GAN-based methodologies in the progression of precision medicine via tailored LDCT denoising models, underlining the transformative possibilities presented by artificial intelligence within contemporary radiological practice.
Authors: Gagan Aggarwal, Anupam Gupta, Andres Perlroth, Grigoris Velegkas
Abstract: We study a setting where agents use no-regret learning algorithms to participate in repeated auctions. \citet{kolumbus2022auctions} showed, rather surprisingly, that when bidders participate in second-price auctions using no-regret bidding algorithms, no matter how large the number of interactions $T$ is, the runner-up bidder may not converge to bidding truthfully. Our first result shows that this holds for \emph{general deterministic} truthful auctions. We also show that the ratio of the learning rates of the bidders can \emph{qualitatively} affect the convergence of the bidders. Next, we consider the problem of revenue maximization in this environment. In the setting with fully rational bidders, \citet{myerson1981optimal} showed that revenue can be maximized by using a second-price auction with reserves. We show that, in stark contrast, in our setting with learning bidders, \emph{randomized} auctions can have strictly better revenue guarantees than second-price auctions with reserves, when $T$ is large enough. Finally, we study revenue maximization in the non-asymptotic regime. We define a notion of {\em auctioneer regret} comparing the revenue generated to the revenue of a second-price auction with truthful bids. When the auctioneer has to use the same auction throughout the interaction, we show an (almost) tight regret bound of $\smash{\widetilde \Theta(T^{3/4})}.$ If the auctioneer can change auctions during the interaction, but in a way that is oblivious to the bids, we show an (almost) tight bound of $\smash{\widetilde \Theta(\sqrt{T})}.$
Authors: M\'arton Sz\'ep, Daniel Rueckert, R\"udiger von Eisenhart-Rothe, Florian Hinterwimmer
Abstract: Employing pre-trained Large Language Models (LLMs) has become the de facto standard in Natural Language Processing (NLP) despite their extensive data requirements. Motivated by the recent surge in research focused on training LLMs with limited data, particularly in low-resource domains and languages, this paper surveys recent transfer learning approaches to optimize model performance in downstream tasks where data is scarce. We first address initial and continued pre-training strategies to better leverage prior knowledge in unseen domains and languages. We then examine how to maximize the utility of limited data during fine-tuning and few-shot learning. The final section takes a task-specific perspective, reviewing models and methods suited for different levels of data scarcity. Our goal is to provide practitioners with practical guidelines for overcoming the challenges posed by constrained data while also highlighting promising directions for future research.
Authors: Zi-Xuan Huang, Jia-Wei Chen, Zhi-Peng Zhang, Chia-Mu Yu
Abstract: Visual prompting (VP) is a new technique that adapts well-trained frozen models for source domain tasks to target domain tasks. This study examines VP's benefits for black-box model-level backdoor detection. The visual prompt in VP maps class subspaces between source and target domains. We identify a misalignment, termed class subspace inconsistency, between clean and poisoned datasets. Based on this, we introduce \textsc{BProm}, a black-box model-level detection method to identify backdoors in suspicious models, if any. \textsc{BProm} leverages the low classification accuracy of prompted models when backdoors are present. Extensive experiments confirm \textsc{BProm}'s effectiveness.
Authors: Nataliya Sevryugina, Serena Costanzo, Steve de Bruyn Kops, Colm-cille Caulfield, Iraj Mortazavi, Taraneh Sayadi
Abstract: Computational Fluid Dynamics (CFD) is an indispensable method of fluid modelling in engineering applications, reducing the need for physical prototypes and testing for tasks such as design optimisation and performance analysis. Depending on the complexity of the system under consideration, models ranging from low to high fidelity can be used for prediction, allowing significant speed-up. However, the choice of model requires information about the actual dynamics of the flow regime. Correctly identifying the regions/clusters of flow that share the same dynamics has been a challenging research topic to date. In this study, we propose a novel hybrid approach to flow clustering. It consists of characterising each sample point of the system with equation-based features, i.e. the features are budgets that represent the contribution of each term from the original governing equation to the local dynamics at each sample point. This was achieved by applying the Sparse Identification of Nonlinear Dynamical systems (SINDy) method pointwise to time evolution data. The method proceeds with equation-based clustering using the Girvan-Newman algorithm. This allows the detection of communities that share the same physical dynamics. The algorithm is implemented in both Eulerian and Lagrangian frameworks. In the Lagrangian, i.e. dynamic, approach, the clustering is performed on the trajectory of each point, allowing changes in cluster membership to be represented in time as well. The performance of the algorithm is first tested on a flow around a cylinder. The construction of the dynamic clusters in this test case clearly shows the evolution of the wake from the steady state solution through the transient to the oscillatory solution. Dynamic clustering was then successfully tested on turbulent flow data. Two distinct and well-defined clusters were identified and their temporal evolution was reconstructed.
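To make the budget-then-cluster idea concrete, here is a toy analogue under our own simplifications: instead of SINDy-identified budgets, we compute budgets directly from finite differences of a known 1D advection-diffusion field, build a similarity graph over sample points, and split it with NetworkX's Girvan-Newman implementation. The field, threshold, and graph construction are illustrative, not the authors' setup.

```python
# Illustrative analogue of equation-based clustering: per-point "budgets" of
# governing-equation terms, then Girvan-Newman community detection.
import numpy as np
import networkx as nx
from networkx.algorithms.community import girvan_newman

# Toy snapshot of u_t = -c*u_x + nu*u_xx on a periodic grid.
n, c, nu = 32, 1.0, 0.1
x = np.linspace(0, 2 * np.pi, n, endpoint=False)
u = np.sin(x) + 0.2 * np.sin(4 * x)
dx = x[1] - x[0]
u_x = np.gradient(u, dx)
u_xx = np.gradient(u_x, dx)

# Budget vector per point: contribution of each term to the local dynamics.
budgets = np.stack([-c * u_x, nu * u_xx], axis=1)

# Graph: connect sample points whose budget vectors are similar.
G = nx.Graph()
G.add_nodes_from(range(n))
for i in range(n):
    for j in range(i + 1, n):
        if np.linalg.norm(budgets[i] - budgets[j]) < 0.15:  # hypothetical threshold
            G.add_edge(i, j)

communities = next(girvan_newman(G))        # first split into dynamic regions
print([len(comm) for comm in communities])  # community sizes
```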
Authors: Anindya Sundar Das, Guansong Pang, Monowar Bhuyan
Abstract: Visual anomaly detection aims to detect images that deviate notably from normal patterns, and it has found extensive application in identifying defective parts within the manufacturing industry. These anomaly detection paradigms predominantly focus on training detection models using only clean, unlabeled normal samples, assuming an absence of contamination; a condition often unmet in real-world scenarios. The performance of these methods depends significantly on the quality of the data and usually degrades when exposed to noise. We introduce a systematic adaptive method that employs deviation learning to compute anomaly scores end-to-end while addressing data contamination by assigning relative importance to the weights of individual instances. In this approach, the anomaly scores for normal instances are designed to approximate scalar scores obtained from the known prior distribution. Meanwhile, anomaly scores for anomalous examples are adjusted to exhibit statistically significant deviations from these reference scores. Our approach incorporates a constrained optimization problem within the deviation learning framework to update instance weights, resolving this problem for each mini-batch. Comprehensive experiments on the MVTec and VisA benchmark datasets indicate that our proposed method surpasses competing techniques and exhibits both stability and robustness in the presence of data contamination.
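For intuition, the score-learning objective described here can be sketched with a deviation-style loss (our stand-in, following the general deviation-learning recipe of prior-referenced scores; the paper's constrained instance-weight optimization is simplified to a fixed weight vector here).

```python
# Deviation-loss sketch: normal scores match a standard-normal prior,
# anomalies are pushed at least `margin` deviations above it.
import torch

def deviation_loss(scores, labels, w, margin=5.0, n_ref=5000):
    ref = torch.randn(n_ref)                       # scores from the known prior
    dev = (scores - ref.mean()) / (ref.std() + 1e-8)
    per_instance = (1 - labels) * dev.abs() + labels * torch.clamp(margin - dev, min=0)
    return (w * per_instance).mean()               # w: per-instance importance weights

scores = torch.randn(8, requires_grad=True)        # anomaly scores from a network
labels = torch.tensor([0, 0, 0, 0, 0, 0, 1, 1.0])  # mostly-normal mini-batch
w = torch.ones(8)                                  # uniform weights in this sketch
deviation_loss(scores, labels, w).backward()
print(scores.grad is not None)                     # True: trainable end-to-end
```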
Authors: Qianqian Zhang, Ye Hu, Minchae Jung
Abstract: The rapid advancement of low Earth orbit (LEO) satellite communication systems has significantly enhanced global connectivity, offering high-capacity, low-latency services crucial for next-generation applications. However, the dense configuration of LEO constellations poses challenges in resource allocation optimization and interference management, complicating coexistence with other communication systems. To address these limitations, this paper proposes a novel framework for optimizing the beam scheduling and resource allocation in multi-beam LEO systems. To satisfy the uneven terrestrial traffic demand, a hybrid beam pattern is employed to enhance the downlink quality of service and minimize the transmission latency from LEO satellites to ground user terminals. Additionally, a dynamic co-channel interference (CCI) control mechanism is developed to mitigate inter-beam interference within the LEO constellation and limit cross-system interference affecting protected users from other networks. The problem of user-beam-frequency allocation with power optimization is formulated as a mixed-integer dynamic programming model and solved using a low-complexity neural network-based graph generation algorithm. Simulation results show that the proposed approach outperforms the baseline methods of full frequency reuse and single-channel transmission, and highlights the potential for further performance improvement with multi-user transmissions.
Authors: Renjie Cao, Miaoyan Hu, Jiahan Wei, Baha Ihnaini
Abstract: Moral sentiments expressed in natural language significantly influence both online and offline environments, shaping behavioral styles and interaction patterns, including social media self-presentation, cyberbullying, adherence to social norms, and ethical decision-making. To effectively measure moral sentiments in natural language processing texts, it is crucial to utilize large, annotated datasets that provide nuanced understanding for accurate analysis and model training. However, existing corpora, while valuable, often face linguistic limitations. To address this gap in the Chinese language domain, we introduce the Moral Foundation Weibo Corpus. This corpus consists of 25,671 Chinese comments on Weibo, encompassing six diverse topic areas. Each comment is manually annotated by at least three systematically trained annotators based on ten moral categories derived from a grounded theory of morality. To assess annotator reliability, we present the kappa test results, a gold standard for measuring consistency. Additionally, we apply several of the latest large language models to supplement the manual annotations, conducting analytical experiments to compare their performance and report baseline results for moral sentiment classification.
Authors: Nancy R. Newlin, Kurt Schilling, Serge Koudoro, Bramsh Qamar Chandio, Praitayini Kanakaraj, Daniel Moyer, Claire E. Kelly, Sila Genc, Jian Chen, Joseph Yuan-Mou Yang, Ye Wu, Yifei He, Jiawei Zhang, Qingrun Zeng, Fan Zhang, Nagesh Adluru, Vishwesh Nath, Sudhir Pathak, Walter Schneider, Anurag Gade, Yogesh Rathi, Tom Hendriks, Anna Vilanova, Maxime Chamberland, Tomasz Pieciak, Dominika Ciupek, Antonio Trist\'an Vega, Santiago Aja-Fern\'andez, Maciej Malawski, Gani Ouedraogo, Julia Machnio, Christian Ewert, Paul M. Thompson, Neda Jahanshad, Eleftherios Garyfallidis, Bennett A. Landman
Abstract: White matter alterations are increasingly implicated in neurological diseases and their progression. International-scale studies use diffusion-weighted magnetic resonance imaging (DW-MRI) to qualitatively identify changes in white matter microstructure and connectivity. Yet, quantitative analysis of DW-MRI data is hindered by inconsistencies stemming from varying acquisition protocols. There is a pressing need to harmonize the preprocessing of DW-MRI datasets to ensure the derivation of robust quantitative diffusion metrics across acquisitions. In the MICCAI-CDMRI 2023 QuantConn challenge, participants were provided raw data from the same individuals collected on the same scanner but with two different acquisitions and tasked with preprocessing the DW-MRI to minimize acquisition differences while retaining biological variation. Submissions are evaluated on the reproducibility and comparability of cross-acquisition bundle-wise microstructure measures, bundle shape features, and connectomics. The key innovations of the QuantConn challenge are that (1) we assess bundles and tractography in the context of harmonization for the first time, (2) we assess connectomics in the context of harmonization for the first time, and (3) we have 10x more subjects than the prior harmonization challenge, MUSHAC, and 100x more than SuperMUDI. We find that bundle surface area, fractional anisotropy, connectome assortativity, betweenness centrality, edge count, modularity, nodal strength, and participation coefficient measures are most biased by acquisition, and that machine learning voxel-wise correction, RISH mapping, and NeSH methods effectively reduce these biases. In addition, microstructure measures AD, MD, RD, bundle length, connectome density, efficiency, and path length are least biased by these acquisition differences.
Authors: Xun Zhou, Charlie Ruan, Zihe Zhao, Tianqi Chen, Chris Donahue
Abstract: We present MIDInfinite, a web application capable of generating symbolic music using a large-scale generative AI model locally on commodity hardware. Creating this demo involved porting the Anticipatory Music Transformer, a large language model (LLM) pre-trained on the Lakh MIDI dataset, to the Machine Learning Compilation (MLC) framework. Once the model is ported, MLC facilitates inference on a variety of runtimes including C++, mobile, and the browser. We envision that MLC has the potential to bridge the gap between the landscape of increasingly capable music AI models and technology more familiar to music software developers. As a proof of concept, we build a web application that allows users to generate endless streams of multi-instrumental MIDI in the browser, either from scratch or conditioned on a prompt. On commodity hardware (an M3 MacBook Pro), our demo can generate 51 notes per second, which is faster than real-time playback for 72.9% of generations, increasing to 86.3% with 2 seconds of upfront buffering.
Authors: Xingya Wang, Yang Han, Yushi Liu, Szu-Yu Tang, Jason C. Hsu
Abstract: The ideal estimand for comparing a new treatment $Rx$ with a control $C$ is the $\textit{counterfactual}$ efficacy $Rx:C$, the expected differential outcome between $Rx$ and $C$ if each patient were given $\textit{both}$. While counterfactual $\textit{point estimation}$ from $\textit{factual}$ Randomized Controlled Trials (RCTs) has been available, this article shows $\textit{counterfactual}$ uncertainty quantification (CUQ), quantifying uncertainty for factual point estimates but in a counterfactual setting, is surprisingly achievable. We achieve CUQ, whose variability is typically smaller than factual UQ, by creating a new statistical modeling principle called ETZ, which is applicable to RCTs with $\textit{Before-and-After}$ treatment Repeated Measures, common in many therapeutic areas. We urge caution when the estimate of the unobservable true condition of a patient before treatment has measurement error, because that violation of the standard regression assumption can cause attenuation in estimating treatment effects. Fortunately, we prove that, for traditional medicine in general, and for targeted therapy with efficacy defined as averaged over the population, counterfactual point estimation is unbiased. However, for targeted therapy, both Real Human and Digital Twins approaches should respect this limitation, lest the predicted treatment effect in $\textit{subgroups}$ be biased.
Authors: Guillermo Alvarez, Ibrahim Ekren, Anastasis Kratsios, Xuwei Yang
Abstract: Dynamic Stackelberg games are a broad class of two-player games in which the leader acts first, and the follower chooses a response strategy to the leader's strategy. Unfortunately, only stylized Stackelberg games are explicitly solvable since the follower's best-response operator (as a function of the control of the leader) is typically analytically intractable. This paper addresses this issue by showing that the \textit{follower's best-response operator} can be approximately implemented by an \textit{attention-based neural operator}, uniformly on compact subsets of adapted open-loop controls for the leader. We further show that the value of the Stackelberg game where the follower uses the approximate best-response operator approximates the value of the original Stackelberg game. Our main result is obtained using our universal approximation theorem for attention-based neural operators between spaces of square-integrable adapted stochastic processes, as well as stability results for a general class of Stackelberg games.
Authors: Joel Casta\~no, Rafael Caba\~nas, Antonio Salmer\'on, David Lo, Silverio Mart\'inez-Fern\'andez
Abstract: The proliferation of Machine Learning (ML) models and their open-source implementations has transformed Artificial Intelligence research and applications. Platforms like Hugging Face (HF) enable the development, sharing, and deployment of these models, fostering an evolving ecosystem. While previous studies have examined aspects of models hosted on platforms like HF, a comprehensive longitudinal study of how these models change remains underexplored. This study addresses this gap by utilizing both repository mining and longitudinal analysis methods to examine over 200,000 commits and 1,200 releases from over 50,000 models on HF. We replicate and extend an ML change taxonomy for classifying commits and utilize Bayesian networks to uncover patterns in commit and release activities over time. Our findings indicate that commit activities align with established data science methodologies, such as CRISP-DM, emphasizing iterative refinement and continuous improvement. Additionally, release patterns tend to consolidate significant updates, particularly in documentation, distinguishing between granular changes and milestone-based releases. Furthermore, projects with higher popularity prioritize infrastructure enhancements early in their lifecycle, and those with intensive collaboration practices exhibit improved documentation standards. These and other insights enhance the understanding of model changes on community platforms and provide valuable guidance for best practices in model maintenance.
Authors: Ahan Bhatt, Nandan Vaghela
Abstract: This paper introduces Med-Bot, an AI-powered chatbot designed to provide users with accurate and reliable medical information. Utilizing advanced libraries and frameworks such as PyTorch, Chromadb, Langchain and Autogptq, Med-Bot is built to handle the complexities of natural language understanding in a healthcare context. The integration of llama-assisted data processing and AutoGPT-Q provides enhanced performance in processing and responding to queries based on PDFs of medical literature, ensuring that users receive precise and trustworthy information. This research details the methodologies employed in developing Med-Bot and evaluates its effectiveness in disseminating healthcare information.
Authors: Alexandra Gonz\'alez, Xavier Franch, David Lo, Silverio Mart\'inez-Fern\'andez
Abstract: Background: Open-Source Pre-Trained Models (PTMs) and datasets provide extensive resources for various Machine Learning (ML) tasks, yet these resources lack a classification tailored to Software Engineering (SE) needs. Aims: We apply an SE-oriented classification to PTMs and datasets on a popular open-source ML repository, Hugging Face (HF), and analyze the evolution of PTMs over time. Method: We conducted a repository mining study. We started with a systematically gathered database of PTMs and datasets from the HF API. Our selection was refined by analyzing model and dataset cards and metadata, such as tags, and confirming SE relevance using Gemini 1.5 Pro. All analyses are replicable, with a publicly accessible replication package. Results: The most common SE task among PTMs and datasets is code generation, with a primary focus on software development and limited attention to software management. Popular PTMs and datasets mainly target software development. Among ML tasks, text generation is the most common in SE PTMs and datasets. There has been a marked increase in PTMs for SE since 2023 Q2. Conclusions: This study underscores the need for broader task coverage to enhance the integration of ML within SE practices.
Authors: Yantao Wu, Mauro Maggioni
Abstract: Several statistical models for regression of a function $F$ on $\mathbb{R}^d$ without the statistical and computational curse of dimensionality exist, for example by imposing and exploiting geometric assumptions on the distribution of the data (e.g. that its support is low-dimensional), or strong smoothness assumptions on $F$, or a special structure of $F$. Among the latter, compositional models, which assume $F=f\circ g$ with $g$ mapping to $\mathbb{R}^r$ with $r\ll d$, have been studied, and include classical single- and multi-index models and recent works on neural networks. While the case where $g$ is linear is rather well-understood, much less is known when $g$ is nonlinear, and in particular for which $g$'s the curse of dimensionality in estimating $F$, or both $f$ and $g$, may be circumvented. In this paper, we consider a model $F(X):=f(\Pi_\gamma X)$ where $\Pi_\gamma:\mathbb{R}^d\to[0,\rm{len}_\gamma]$ is the closest-point projection onto the parameter of a regular curve $\gamma: [0,\rm{len}_\gamma]\to\mathbb{R}^d$ and $f:[0,\rm{len}_\gamma]\to\mathbb{R}^1$. The input data $X$ is not low-dimensional and may be far from $\gamma$, conditioned only on $\Pi_\gamma(X)$ being well-defined. The distribution of the data, $\gamma$, and $f$ are unknown. This model is a natural nonlinear generalization of the single-index model, which corresponds to $\gamma$ being a line. We propose a nonparametric estimator, based on conditional regression, and show that under suitable assumptions, the strongest of which being that $f$ is coarsely monotone, it can achieve the $one$-$dimensional$ optimal min-max rate for non-parametric regression, up to the level of noise in the observations, and be constructed in time $\mathcal{O}(d^2n\log n)$. All the constants in the learning bounds, in the minimal number of samples required for our bounds to hold, and in the computational complexity are at most low-order polynomials in $d$.
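Under our own toy assumptions (a helix-like curve in $\mathbb{R}^{10}$, a dense discretization standing in for the closest-point projection, binned means as the 1D nonparametric regressor, and $f(t)=t$ as the coarsely monotone link), the estimator's two-step structure looks like this:

```python
# Toy sketch of conditional regression along a curve (our choices throughout:
# curve, noise levels, binning; not the paper's estimator or its rates).
import numpy as np

rng = np.random.default_rng(0)
d, n = 10, 1000

# Dense discretization of a regular curve gamma in R^d (helix-like).
t_grid = np.linspace(0, 2 * np.pi, 500)
gamma = np.zeros((t_grid.size, d))
gamma[:, 0], gamma[:, 1], gamma[:, 2] = np.cos(t_grid), np.sin(t_grid), 0.3 * t_grid

# Data: X near the curve, y = f(t) + noise with f coarsely monotone (f(t) = t).
t_true = rng.uniform(0, 2 * np.pi, n)
X = np.zeros((n, d))
X[:, 0], X[:, 1], X[:, 2] = np.cos(t_true), np.sin(t_true), 0.3 * t_true
X += 0.05 * rng.standard_normal((n, d))
y = t_true + 0.1 * rng.standard_normal(n)

# Step 1: closest-point projection Pi_gamma via the discretized curve.
dists = ((X[:, None, :] - gamma[None, :, :]) ** 2).sum(-1)     # (n, 500)
t_hat = t_grid[dists.argmin(axis=1)]

# Step 2: 1D nonparametric (binned-mean) regression of y on the projection.
bins = np.linspace(0, 2 * np.pi, 40)
idx = np.digitize(t_hat, bins)
f_hat = {b: y[idx == b].mean() for b in np.unique(idx)}
print(list(f_hat.values())[:5])    # approximately increasing, tracking f(t) = t
```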
Authors: Richard Nock, Mathieu Guillame-Bert
Abstract: We focus on generative AI for a type of data that still represents one of the most prevalent forms of data: tabular data. Our paper introduces two key contributions: a new powerful class of forest-based models fit for such tasks and a simple training algorithm with strong convergence guarantees in a boosting model that parallels that of the original weak / strong supervised learning setting. This algorithm can be implemented with a few tweaks to the most popular induction scheme for decision trees (i.e. supervised learning) with two classes. Experiments on the quality of generated data display substantial improvements compared to the state of the art. The losses our algorithm minimizes and the structure of our models make them practical for related tasks that require fast estimation of a density given a generative model and an observation (even partially specified): such tasks include missing data imputation and density estimation. Additional experiments on these tasks reveal that our models can be notably good contenders against diverse state-of-the-art methods, relying on models as diverse as (or mixing elements of) trees, neural nets, kernels, or graphical models.
Authors: Masane Fuchi, Amar Zanashir, Hiroto Minami, Tomohiro Takagi
Abstract: One-hot vectors, a common method for representing discrete/categorical data in machine learning, are widely used because of their simplicity and intuitiveness. However, one-hot vectors suffer from a linear increase in dimensionality, posing computational and memory challenges, especially when dealing with datasets containing numerous categories. In this paper, we focus on tabular data generation and reveal that multinomial diffusion faces mode collapse when the cardinality is high. Moreover, due to the limitations of one-hot vectors, the training phase takes longer in such situations. To address these issues, we propose Residual Bit Vectors (ResBit), a technique for densely representing categorical data. ResBit is an extension of analog bits and overcomes the limitations of analog bits when applied to tabular data generation. Our experiments demonstrate that ResBit not only accelerates training but also maintains performance relative to the settings before applying ResBit. Furthermore, our results indicate that many existing methods struggle with high-cardinality data, underscoring the need for lower-dimensional representations, such as ResBit and latent vectors.
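For context, the dimensionality gap motivating this line of work is easy to see by comparing one-hot with analog-bit encodings (a minimal sketch; ResBit itself extends analog bits in ways we do not reproduce here):

```python
# One-hot vs. analog-bit encodings of a categorical value.
import numpy as np

def one_hot(value, num_classes):
    v = np.zeros(num_classes)
    v[value] = 1.0
    return v                              # dimension grows linearly with cardinality

def analog_bits(value, num_classes):
    width = max(1, int(np.ceil(np.log2(num_classes))))
    bits = [(value >> i) & 1 for i in range(width)][::-1]
    return 2.0 * np.array(bits) - 1.0     # {0,1} -> {-1,+1}, log2-size dimension

K = 1000                                  # a high-cardinality categorical column
print(one_hot(737, K).shape)              # (1000,)
print(analog_bits(737, K))                # 10 values in {-1, +1}
```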
Authors: YuQing Xie, Tess Smidt
Abstract: Equivariant neural networks (ENNs) have been shown to be extremely effective in applications involving underlying symmetries. By construction ENNs cannot produce lower symmetry outputs given a higher symmetry input. However, symmetry breaking occurs in many physical systems and we may obtain a less symmetric stable state from an initial highly symmetric one. Hence, it is imperative that we understand how to systematically break symmetry in ENNs. In this work, we propose a novel symmetry breaking framework that is fully equivariant and is the first to fully address spontaneous symmetry breaking. We emphasize that our approach is general and applicable to equivariance under any group. To achieve this, we introduce the idea of symmetry breaking sets (SBS). Rather than redesign existing networks, we design sets of symmetry breaking objects which we feed into our network based on the symmetry of our inputs and outputs. We show there is a natural way to define equivariance on these sets, which gives an additional constraint. Minimizing the size of these sets equates to data efficiency. We prove that minimizing these sets translates to a well-studied group theory problem, and tabulate solutions to this problem for the point groups. Finally, we provide some examples of symmetry breaking to demonstrate how our approach works in practice. The code for these examples is available at \url{https://github.com/atomicarchitects/equivariant-SBS}.
Authors: Georgios Tsoumplekas, Vladislav Li, Panagiotis Sarigiannidis, Vasileios Argyriou
Abstract: Despite deep learning's widespread success, its data-hungry and computationally expensive nature makes it impractical for many data-constrained real-world applications. Few-Shot Learning (FSL) aims to address these limitations by enabling rapid adaptation to novel learning tasks, seeing significant growth in recent years. This survey provides a comprehensive overview of the field's latest advancements. Initially, FSL is formally defined, and its relationship with different learning fields is presented. A novel taxonomy is introduced, extending previously proposed ones, and real-world applications in classic and novel fields are described. Finally, recent trends shaping the field, outstanding challenges, and promising future research directions are discussed.
Authors: Yifan Zeng, Yiran Wu, Xiao Zhang, Huazheng Wang, Qingyun Wu
Abstract: Despite extensive pre-training in moral alignment to prevent generating harmful information, large language models (LLMs) remain vulnerable to jailbreak attacks. In this paper, we propose AutoDefense, a multi-agent defense framework that filters harmful responses from LLMs. With the response-filtering mechanism, our framework is robust against different jailbreak attack prompts, and can be used to defend different victim models. AutoDefense assigns different roles to LLM agents and employs them to complete the defense task collaboratively. The division in tasks enhances the overall instruction-following of LLMs and enables the integration of other defense components as tools. With AutoDefense, small open-source LMs can serve as agents and defend larger models against jailbreak attacks. Our experiments show that AutoDefense can effectively defend against different jailbreak attacks while maintaining performance on normal user requests. For example, we reduce the attack success rate on GPT-3.5 from 55.74% to 7.95% using LLaMA-2-13b with a 3-agent system. Our code and data are publicly available at https://github.com/XHMY/AutoDefense.
Authors: Patricia A. Apell\'aniz, Juan Parras, Santiago Zazo
Abstract: The rising use of machine learning in various fields requires robust methods to create synthetic tabular data. Data should preserve key characteristics while addressing data scarcity challenges. Current approaches based on Generative Adversarial Networks, such as the state-of-the-art CTGAN model, struggle with the complex structures inherent in tabular data. These data often contain both continuous and discrete features with non-Gaussian distributions. Therefore, we propose a novel Variational Autoencoder (VAE)-based model that addresses these limitations. Inspired by the TVAE model, our approach incorporates a Bayesian Gaussian Mixture model (BGM) within the VAE architecture. This avoids the limitations imposed by assuming a strictly Gaussian latent space, allowing for a more accurate representation of the underlying data distribution during data generation. Furthermore, our model offers enhanced flexibility by allowing the use of various differentiable distributions for individual features, making it possible to handle both continuous and discrete data types. We thoroughly validate our model on three real-world datasets with mixed data types, including two medically relevant ones, based on their resemblance and utility. This evaluation demonstrates significant outperformance against CTGAN and TVAE, establishing its potential as a valuable tool for generating synthetic tabular data in various domains, particularly in healthcare.
Authors: Minbiao Han, Fengxue Zhang, Yuxin Chen
Abstract: This paper investigates the challenge of learning in black-box games, where the underlying utility function is unknown to any of the agents. While there is an extensive body of literature on the theoretical analysis of algorithms for computing the Nash equilibrium with complete information about the game, studies on Nash equilibrium in black-box games are less common. In this paper, we focus on learning the Nash equilibrium when the only available information about an agent's payoff comes in the form of empirical queries. We provide a no-regret learning algorithm that utilizes Gaussian processes to identify the equilibrium in such games. Our approach not only ensures a theoretical convergence rate but also demonstrates effectiveness across a diverse collection of games through experimental validation.
Authors: Pengcheng Sun, Erwu Liu, Wei Ni, Kanglei Yu, Xinyu Qu, Rui Wang, Yanlong Bi, Chuanchun Zhang, Abbas Jamalipour
Abstract: Non-independent and identically distributed (non-IID) data adversely affects federated learning (FL), while heterogeneity in communication quality can undermine the reliability of model parameter transmission, potentially degrading wireless FL convergence. This paper proposes a novel dual-segment clustering (DSC) strategy that jointly addresses communication and data heterogeneity in FL. This is achieved by defining a new signal-to-noise ratio (SNR) matrix and an information quantity matrix to capture the communication and data heterogeneity, respectively. The celebrated affinity propagation algorithm is leveraged to iteratively refine the clustering of clients based on the newly defined matrices, effectively enhancing model aggregation in heterogeneous environments. The convergence analysis and experimental results show that the DSC strategy can improve the convergence rate of wireless FL and demonstrates superior accuracy in heterogeneous environments compared to classical clustering methods.
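A minimal sketch of the clustering step, under our own stand-ins: per-client descriptors summarizing channel SNR and data-information content replace the paper's SNR and information-quantity matrices, and scikit-learn's affinity propagation runs on a precomputed similarity.

```python
# Clustering clients by joint communication/data profiles with affinity
# propagation (descriptors and similarity construction are our stand-ins).
import numpy as np
from sklearn.cluster import AffinityPropagation

rng = np.random.default_rng(0)
# Two latent groups of 10 clients: good channel + rich data vs. poor channel + skewed data.
mean_snr_db = np.concatenate([rng.normal(25, 1, 10), rng.normal(5, 1, 10)])
label_entropy = np.concatenate([rng.normal(0.9, 0.05, 10), rng.normal(0.3, 0.05, 10)])
feats = np.stack([mean_snr_db / 30.0, label_entropy], axis=1)

# Precomputed similarity: negative squared Euclidean distance (AP's usual choice).
similarity = -((feats[:, None, :] - feats[None, :, :]) ** 2).sum(-1)

ap = AffinityPropagation(affinity="precomputed", random_state=0)
print(ap.fit_predict(similarity))       # one cluster id per client
```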
Authors: Nicolas Drapier, Aladine Chetouani, Aur\'elien Chateigner
Abstract: The prediction of ship trajectories is a growing field of study in artificial intelligence. Traditional methods rely on the use of LSTM, GRU networks, and even Transformer architectures for the prediction of spatio-temporal series. This study proposes a viable alternative for predicting these trajectories using only GNSS positions. It considers this spatio-temporal problem as a natural language processing problem. The latitude/longitude coordinates of AIS messages are transformed into cell identifiers using the H3 index. Thanks to the pseudo-octal representation, it becomes easier for language models to learn the spatial hierarchy of the H3 index. The method is compared with a classical Kalman filter, widely used in the maritime domain, and introduces the Fr\'echet distance as the main evaluation metric. We show that it is possible to predict ship trajectories quite precisely up to 8 hours ahead with 30 minutes of context, using solely GNSS positions, without relying on any additional information such as speed, course, or external conditions - unlike many traditional methods. We demonstrate that this alternative works well enough to predict trajectories worldwide.
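The tokenization step is straightforward with the h3-py bindings (a sketch; we use the v3 function name geo_to_h3, which v4 renames to latlng_to_cell, and the resolution choice is ours):

```python
# Convert AIS latitude/longitude fixes into H3 cell tokens for a language model.
import h3

track = [(48.8566, 2.3522), (48.8620, 2.3700), (48.8700, 2.3900)]  # lat/lon fixes
resolution = 9                       # ~0.1 km^2 hexagons; finer = longer sequences
tokens = [h3.geo_to_h3(lat, lon, resolution) for lat, lon in track]
print(tokens)                        # three 15-character hexadecimal cell ids
# Re-encoding the ids digit by digit (the "pseudo-octal" view: each finer
# resolution appends one of 7 child digits) exposes the index's spatial
# hierarchy to the tokenizer.
```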
Authors: Richard Nock, Yishay Mansour
Abstract: Boosting is a highly successful ML-born optimization setting in which one is required to computationally efficiently learn arbitrarily good models based on access to a weak learner oracle, providing classifiers performing at least slightly differently from random guessing. A key difference with gradient-based optimization is that boosting's original model does not require access to first-order information about a loss, yet the decades-long history of boosting has quickly evolved it into a first-order optimization setting -- sometimes even wrongfully defining it as such. Owing to recent progress extending gradient-based optimization to use only a loss' zeroth ($0^{th}$) order information to learn, this begs the question: what loss functions can be efficiently optimized with boosting, and what is the information really needed for boosting to meet the original boosting blueprint's requirements? We provide a constructive formal answer essentially showing that any loss function can be optimized with boosting, and thus boosting can achieve a feat not yet known to be possible in the classical $0^{th}$ order setting, since loss functions are not required to be convex, nor differentiable or Lipschitz -- and in fact not required to be continuous either. Some tools we use are rooted in quantum calculus, the mathematical field -- not to be confused with quantum computation -- that studies calculus without passing to the limit, and thus without using first-order information.
Authors: Josh Millar, Sarab Sethi, Hamed Haddadi, Anil Madhavapeddy
Abstract: In-situ sensing devices need to be deployed in remote environments for long periods of time; minimizing their power consumption is vital for maximising both their operational lifetime and coverage. We introduce Terracorder -- a versatile multi-sensor device -- and showcase its exceptionally low power consumption using an on-device reinforcement learning scheduler. We prototype a unique device setup for biodiversity monitoring and compare its battery life using our scheduler against a number of fixed schedules; the scheduler captures more than 80% of events at less than 50% of the number of activations of the best-performing fixed schedule. We then explore how a collaborative scheduler can maximise the useful operation of a network of devices, improving overall network power consumption and robustness.
Authors: Hanwen Zhang, Dusit Niyato, Wei Zhang, Changyuan Zhao, Hongyang Du, Abbas Jamalipour, Sumei Sun, Yiyang Pei
Abstract: With the advancements of generative artificial intelligence (GenAI) models, their capabilities are expanding significantly beyond content generation and the models are increasingly being used across diverse applications. Particularly, GenAI shows great potential in addressing challenges in the electric vehicle (EV) ecosystem ranging from charging management to cyber-attack prevention. In this paper, we specifically consider the Internet of electric vehicles (IoEV) and we categorize GenAI for IoEV into four different layers, namely the EV battery layer, the individual EV layer, the smart grid layer, and the security layer. We introduce various GenAI techniques used in each layer of IoEV applications. Subsequently, public datasets available for training the GenAI models are summarized. Finally, we provide recommendations for future directions. This survey not only categorizes the applications of GenAI in IoEV across different layers but also serves as a valuable resource for researchers and practitioners by highlighting the design and implementation challenges within each layer. Furthermore, it provides a roadmap for future research directions, enabling the development of more robust and efficient IoEV systems through the integration of advanced GenAI techniques.
Authors: Jintao Zhang, Jia wei, Haofeng Huang, Pengle Zhang, Jun Zhu, Jianfei Chen
Abstract: The transformer architecture predominates across various models. As the heart of the transformer, attention has a computational complexity of O(N^2), compared to O(N) for linear transformations. When handling large sequence lengths, attention becomes the primary time-consuming component. Although quantization has proven to be an effective method for accelerating model inference, existing quantization methods primarily focus on optimizing the linear layer. In response, we first analyze the feasibility of quantizing attention in detail. Following that, we propose SageAttention, a highly efficient and accurate quantization method for attention. The OPS (operations per second) of our approach outperforms FlashAttention2 and xformers by about 2.1 times and 2.7 times, respectively. SageAttention also achieves superior accuracy performance over FlashAttention3. Comprehensive experiments confirm that our approach incurs almost no end-to-end metrics loss across diverse models, including those for large language processing, image generation, and video generation. The codes are available at https://github.com/thu-ml/SageAttention.
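The core numerical idea, quantizing Q and K to INT8 and correcting with per-tensor scales while keeping softmax and the value product in higher precision, can be sketched as follows. This is an illustrative emulation in plain PyTorch, not the authors' fused kernels, and it omits SageAttention's smoothing and per-block refinements.

```python
# INT8-quantized attention scores, emulated numerically in PyTorch.
import torch

def quantize_int8(t):
    """Per-tensor symmetric quantization onto the int8 grid."""
    scale = t.abs().max() / 127.0
    q = (t / scale).round().clamp(-127, 127)
    return q, scale        # int8 values carried in fp32 for exact emulation

N, d = 1024, 64
q, k, v = (torch.randn(N, d) for _ in range(3))

q_int, q_scale = quantize_int8(q)
k_int, k_scale = quantize_int8(k)

# Integer matmul (products up to 127*127*64 fit exactly in fp32), then dequantize.
scores = (q_int @ k_int.T) * (q_scale * k_scale) / d**0.5
out = torch.softmax(scores, dim=-1) @ v            # softmax and P@V in full precision

exact = torch.softmax(q @ k.T / d**0.5, dim=-1) @ v
print((out - exact).abs().max())                   # small quantization error
```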
Authors: Yuanqi Du, Michael Plainer, Rob Brekelmans, Chenru Duan, Frank No\'e, Carla P. Gomes, Al\'an Aspuru-Guzik, Kirill Neklyudov
Abstract: Rare event sampling in dynamical systems is a fundamental problem arising in the natural sciences, which poses significant computational challenges due to an exponentially large space of trajectories. For settings where the dynamical system of interest follows a Brownian motion with known drift, the question of conditioning the process to reach a given endpoint or desired rare event is definitively answered by Doob's h-transform. However, the naive estimation of this transform is infeasible, as it requires simulating sufficiently many forward trajectories to estimate rare event probabilities. In this work, we propose a variational formulation of Doob's h-transform as an optimization problem over trajectories between a given initial point and the desired ending point. To solve this optimization, we propose a simulation-free training objective with a model parameterization that imposes the desired boundary conditions by design. Our approach significantly reduces the search space over trajectories and avoids expensive trajectory simulation and inefficient importance sampling estimators which are required in existing methods. We demonstrate the ability of our method to find feasible transition paths on real-world molecular simulation and protein folding tasks.
Authors: Sehun Kim
Abstract: A persistence diagram provides a compact summary of persistent homology, which captures the topological features of a space at different scales. However, due to its nature as a set, incorporating it as a feature into a machine learning framework is challenging. Several methods have been proposed to use persistence diagrams as input for machine learning models, but they often require complex preprocessing steps and extensive hyperparameter tuning. In this paper, we propose a novel transformer architecture called the \textit{Extended Persistence Transformer (xPerT)}, which is far more scalable than Persformer, an existing transformer for persistence diagrams. xPerT reduces GPU memory usage by over 90\% and improves accuracy on multiple datasets. Additionally, xPerT does not require complex preprocessing steps or extensive hyperparameter tuning, making it easy to use in practice. Our code is available at https://github.com/sehunfromdaegu/xpert.
Authors: Fu-Yun Wang, Zhengyang Geng, Hongsheng Li
Abstract: Diffusion models achieve superior generation quality but suffer from slow generation speed due to the iterative nature of denoising. In contrast, consistency models, a new generative family, achieve competitive performance with significantly faster sampling. These models are trained either through consistency distillation, which leverages pretrained diffusion models, or consistency training/tuning directly from raw data. In this work, we propose a novel framework for understanding consistency models by modeling the denoising process of the diffusion model as a Markov Decision Process (MDP) and framing consistency model training as value estimation through Temporal Difference~(TD) Learning. More importantly, this framework allows us to analyze the limitations of current consistency training/tuning strategies. Built upon Easy Consistency Tuning (ECT), we propose Stable Consistency Tuning (SCT), which incorporates variance-reduced learning using the score identity. SCT leads to significant performance improvements on benchmarks such as CIFAR-10 and ImageNet-64. On ImageNet-64, SCT achieves 1-step FID 2.42 and 2-step FID 1.55, a new SoTA for consistency models.
Authors: Etrit Haxholli, Yeti Z. G\"urb\"uz, O\u{g}ul Can, Eli Waxman
Abstract: Outperforming autoregressive models on categorical data distributions, such as textual data, remains challenging for continuous diffusion and flow models. Discrete flow matching, a recent framework for modeling categorical data, has shown competitive performance with autoregressive models. Despite its similarities with continuous flow matching, the rectification strategy applied in the continuous version does not directly extend to the discrete one due to the inherent stochasticity of discrete paths. This limitation necessitates exploring alternative methods to minimize state transitions during generation. To address this, we propose a dynamic-optimal-transport-like minimization objective for discrete flows with convex interpolants and derive its equivalent Kantorovich formulation. The latter defines transport cost solely in terms of inter-state similarity and is optimized using a minibatch strategy. Another limitation we address in the discrete flow framework is model evaluation. Unlike continuous flows, wherein the instantaneous change of variables enables density estimation, discrete models lack a similar mechanism due to the inherent non-determinism and discontinuity of their paths. To alleviate this issue, we propose an upper bound on the perplexity of discrete flow models, enabling performance evaluation and comparison with other methods.
Authors: Ismail Hakki Karaman, Gulser Koksal, Levent Eriskin, Salih Salihoglu
Abstract: In real-world applications, as data availability increases, obtaining labeled data for machine learning (ML) projects remains challenging due to the high costs and intensive efforts required for data annotation. Many ML projects, particularly those focused on multi-label classification, also grapple with data imbalance issues, where certain classes may lack sufficient data to train effective classifiers. This study introduces and examines a novel oversampling method for multi-label text classification, designed to address performance challenges associated with data imbalance. The proposed method identifies potential new samples from unlabeled data by leveraging similarity measures between instances. By iteratively searching the unlabeled dataset, the method locates instances similar to those in underrepresented classes and evaluates their contribution to classifier performance enhancement. Instances that demonstrate performance improvement are then added to the labeled dataset. Experimental results indicate that the proposed approach effectively enhances classifier performance post-oversampling.
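A simplified version of the loop, with our own illustrative data, classifier, and acceptance test (the paper's method evaluates each candidate instance's contribution more carefully and iterates over multiple labels):

```python
# Similarity-driven oversampling sketch for an underrepresented label.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import f1_score
from sklearn.metrics.pairwise import cosine_similarity

rng = np.random.default_rng(0)
X_lab = rng.standard_normal((200, 50)); y_lab = (X_lab[:, 0] > 1.0).astype(int)  # rare label
X_unlab = rng.standard_normal((2000, 50))
X_val = rng.standard_normal((300, 50)); y_val = (X_val[:, 0] > 1.0).astype(int)

def fit_score(X, y):
    clf = LogisticRegression(max_iter=1000).fit(X, y)
    return f1_score(y_val, clf.predict(X_val))

baseline = fit_score(X_lab, y_lab)
minority = X_lab[y_lab == 1]
sims = cosine_similarity(X_unlab, minority).max(axis=1)   # closeness to minority class
candidates = np.argsort(sims)[::-1][:50]                  # most similar unlabeled points

# Tentatively pseudo-label candidates as minority; keep them if validation F1 improves.
X_aug = np.vstack([X_lab, X_unlab[candidates]])
y_aug = np.concatenate([y_lab, np.ones(len(candidates), dtype=int)])
print(baseline, fit_score(X_aug, y_aug))
```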
Authors: Dhruthi, Nithin Nagaraj, Harikrishnan N B
Abstract: Inferring causal relationships in the decision-making processes of machine learning algorithms is a crucial step toward achieving explainable Artificial Intelligence (AI). In this research, we introduce a novel causality measure and a distance metric derived from Lempel-Ziv (LZ) complexity. We explore how the proposed causality measure can be used in decision trees by enabling splits based on features that most strongly \textit{cause} the outcome. We further evaluate the effectiveness of the causality-based decision tree and the distance-based decision tree in comparison to a traditional decision tree using Gini impurity. While the proposed methods demonstrate comparable classification performance overall, the causality-based decision tree significantly outperforms both the distance-based decision tree and the Gini-based decision tree on datasets generated from causal models. This result indicates that the proposed approach can capture insights beyond those of classical decision trees, especially in causally structured data. Based on the features used in the LZ-causality-based decision tree, we introduce a causal strength for each feature in the dataset so as to infer the predominant causal variables behind the occurrence of the outcome.
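The LZ76 complexity at the heart of such measures is easy to implement; the combined measure below is a normalized-compression-distance-style proxy built from it, not necessarily the paper's definition of the causality measure or distance metric.

```python
# Lempel-Ziv (LZ76) complexity of a symbol sequence, plus an NCD-style
# distance proxy built from it (our proxy, not the paper's measure).
def lz76_complexity(s):
    """Number of distinct phrases in the LZ76 parsing of sequence s."""
    i, c, n = 0, 0, len(s)
    while i < n:
        length = 1
        # Grow the phrase while it still occurs in the preceding prefix.
        while i + length <= n and s[i:i + length] in s[:i + length - 1]:
            length += 1
        c += 1
        i += length
    return c

def lz_distance(x, y):
    cx, cy, cxy = lz76_complexity(x), lz76_complexity(y), lz76_complexity(x + y)
    return (cxy - min(cx, cy)) / max(cx, cy)

x = "0101010101010101"              # highly regular sequence
y = "0110100110010110"              # Thue-Morse-like, more complex
print(lz76_complexity(x), lz76_complexity(y), lz_distance(x, y))
```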
Authors: Ankita Joshi, Ashutosh Sharma, Anoushkrit Goel, Ranjeet Ranjan Jha, Chirag Ahuja, Arnav Bhavsar, Aditya Nigam
Abstract: Fiber tractography is a cornerstone of neuroimaging, enabling the detailed mapping of the brain's white matter pathways through diffusion MRI. This is crucial for understanding brain connectivity and function, making it a valuable tool in neurological applications. Despite its importance, tractography faces challenges due to its complexity and susceptibility to false positives, misrepresenting vital pathways. To address these issues, recent strategies have shifted towards deep learning, utilizing supervised learning, which depends on precise ground truth, or reinforcement learning, which operates without it. In this work, we propose Tract-RLFormer, a network utilizing both supervised and reinforcement learning, in a two-stage policy refinement process that markedly improves the accuracy and generalizability across various datasets. By employing a tract-specific approach, our network directly delineates the tracts of interest, bypassing the traditional segmentation process. Through rigorous validation on datasets such as TractoInferno, HCP, and ISMRM-2015, our methodology demonstrates a leap forward in tractography, showcasing its ability to accurately map the brain's white matter tracts.
Authors: Guangyi Wang, Wei Peng, Lijiang Li, Wenyu Chen, Yuren Cai, Songzhi Su
Abstract: Diffusion Probabilistic Models (DPMs) have demonstrated exceptional performance in generative tasks, but this comes at the expense of sampling efficiency. To enhance sampling speed without sacrificing quality, various distillation-based accelerated sampling algorithms have been recently proposed. However, they typically require significant additional training costs and model parameter storage, which limit their practical application. In this work, we propose PCA-based Adaptive Search (PAS), which optimizes existing solvers for DPMs with minimal learnable parameters and training costs. Specifically, we first employ PCA to obtain a few orthogonal unit basis vectors to span the high-dimensional sampling space, which enables us to learn just a set of coordinates to correct the sampling direction; furthermore, based on the observation that the cumulative truncation error exhibits an ``S''-shape, we design an adaptive search strategy that further enhances the sampling efficiency and reduces the number of stored parameters to approximately 10. Extensive experiments demonstrate that PAS can significantly enhance existing fast solvers in a plug-and-play manner with negligible costs. For instance, on CIFAR10, PAS requires only 12 parameters and less than 1 minute of training on a single NVIDIA A100 GPU to optimize the DDIM from 15.69 FID (NFE=10) to 4.37.
Authors: Aixa X. Andrade, Son Nguyen, Albert Montillo
Abstract: Single-cell RNA sequencing (scRNA-seq) data are often confounded by technical or biological batch effects. Existing deep learning models mitigate these effects but often discard batch-specific information, potentially losing valuable biological insights. We propose a Mixed Effects Deep Learning (MEDL) autoencoder framework that separately models batch-invariant (fixed effects) and batch-specific (random effects) components. By decoupling batch-invariant biological states from batch variations, our framework integrates both into predictive models. Our approach also generates 2D visualizations of how the same cell appears across batches, enhancing interpretability. Retaining both fixed and random effect latent spaces improves classification accuracy. We applied our framework to three datasets spanning the cardiovascular system (Healthy Heart), Autism Spectrum Disorder (ASD), and Acute Myeloid Leukemia (AML). With 147 batches in the Healthy Heart dataset, far exceeding typical numbers, we tested our framework's ability to handle many batches. In the ASD dataset, our approach captured donor heterogeneity between autistic and healthy individuals. In the AML dataset, it distinguished donor heterogeneity despite missing cell types and diseased donors exhibiting both healthy and malignant cells. These results highlight our framework's ability to characterize fixed and random effects, enhance batch effect visualization, and improve prediction accuracy across diverse datasets.
Authors: Rafael Orozco, Huseyin Tuna Erdinc, Yunlin Zeng, Mathias Louboutin, Felix J. Herrmann
Abstract: Accurately characterizing migration velocity models is crucial for a wide range of geophysical applications, from hydrocarbon exploration to monitoring of CO2 sequestration projects. Traditional velocity model building methods such as Full-Waveform Inversion (FWI) are powerful but often struggle with the inherent complexities of the inverse problem, including noise, limited bandwidth, receiver aperture, and computational constraints. To address these challenges, we propose a scalable methodology that integrates generative modeling, in the form of Diffusion networks, with physics-informed summary statistics, making it suitable for complicated imaging problems including field datasets. By defining these summary statistics in terms of subsurface-offset image volumes for poor initial velocity models, our approach allows for computationally efficient generation of Bayesian posterior samples for migration velocity models that offer a useful assessment of uncertainty. To validate our approach, we introduce a battery of tests that measure the quality of the inferred velocity models, as well as the quality of the inferred uncertainties. With modern synthetic datasets, we reconfirm gains from using subsurface-image gathers as the conditioning observable. For complex velocity model building involving salt, we propose a new iterative workflow that refines amortized posterior approximations with salt flooding and demonstrate how the uncertainty in the velocity model can be propagated to the final product: reverse time migrated images. Finally, we present a proof of concept on field datasets to show that our method can scale to industry-sized problems.
Authors: Fuwei Li
Abstract: This paper presents a novel feature selection method leveraging the Wasserstein distance to improve feature selection in machine learning. Unlike traditional methods based on correlation or Kullback-Leibler (KL) divergence, our approach uses the Wasserstein distance to assess feature similarity, inherently capturing class relationships and making it robust to noisy labels. We introduce a Markov blanket-based feature selection algorithm and demonstrate its effectiveness. Our analysis shows that the Wasserstein distance-based feature selection method effectively reduces the impact of noisy labels without relying on specific noise models. We provide a lower bound on its effectiveness, which remains meaningful even in the presence of noise. Experimental results across multiple datasets demonstrate that our approach consistently outperforms traditional methods, particularly in noisy settings.
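As a concrete illustration of the distance this abstract builds on, the sketch below scores each feature by the 1-D Wasserstein distance between its per-class empirical distributions; the paper's full Markov blanket-based algorithm is not reproduced, and the function name is hypothetical.

```python
import numpy as np
from scipy.stats import wasserstein_distance

def wasserstein_feature_scores(X, y):
    """Score features by the Wasserstein distance between their empirical
    distributions in the two classes; larger = more discriminative."""
    c0, c1 = np.unique(y)               # sketch assumes binary labels
    return np.array([
        wasserstein_distance(X[y == c0, j], X[y == c1, j])
        for j in range(X.shape[1])
    ])

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
y = rng.integers(0, 2, size=200)
X[y == 1, 0] += 1.0                     # only feature 0 carries class signal
print(wasserstein_feature_scores(X, y).round(3))
```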
Authors: Benjamin P. S. Donitz, Declan Mages, Hiroyasu Tsukamoto, Peter Dixon, Damon Landau, Soon-Jo Chung, Erica Bufanda, Michel Ingham, Julie Castillo-Rogez
Abstract: Interstellar objects (ISOs) represent a compelling and under-explored category of celestial bodies, providing physical laboratories to understand the formation of our solar system and probe the composition and properties of material formed in exoplanetary systems. In this work, we investigate existing approaches to designing successful flyby missions to ISOs, including a deep learning-driven guidance and control algorithm for ISOs traveling at velocities over 60 km/s. We have generated spacecraft trajectories to a series of synthetic representative ISOs, simulating a ground campaign to observe the target and resolve its state, thereby determining the cruise and close approach delta-Vs required for the encounter. We discuss the accessibility of and mission design to ISOs with varying characteristics, with special focus on 1) state covariance estimation throughout the cruise, 2) handoffs from traditional navigation approaches to novel autonomous navigation for fast flyby regimes, and 3) overall recommendations about preparing for the future in situ exploration of these targets. The lessons learned also apply to the fast flyby of other small bodies, e.g., long-period comets and potentially hazardous asteroids, which also require tactical responses with similar characteristics.
Authors: Brendan Lucier, Sarath Pattathil, Aleksandrs Slivkins, Mengxiao Zhang
Abstract: We study a game between autobidding algorithms that compete in an online advertising platform. Each autobidder is tasked with maximizing its advertiser's total value over multiple rounds of a repeated auction, subject to budget and return-on-investment constraints. We propose a gradient-based learning algorithm that is guaranteed to satisfy all constraints and achieves vanishing individual regret. Our algorithm uses only bandit feedback and can be used with the first- or second-price auction, as well as with any "intermediate" auction format. Our main result is that when these autobidders play against each other, the resulting expected liquid welfare over all rounds is at least half of the expected optimal liquid welfare achieved by any allocation. This holds whether or not the bidding dynamics converges to an equilibrium.
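For intuition, here is the standard dual-gradient pacing template that constrained autobidders of this kind build on: bids are shaded as value/(1+mu), and the multiplier mu is updated from spend feedback. This is a generic sketch under a second-price auction, not the paper's algorithm, and it omits the ROI constraint for brevity.

```python
import numpy as np

def pacing_autobidder(values, prices, budget, eta=0.05):
    """Dual-gradient budget pacing: shade bids by a multiplier mu and
    take a gradient step on mu from realized spend each round."""
    T = len(values)
    mu, spend, won_value = 0.0, 0.0, 0.0
    target = budget / T                     # per-round spend target
    for t in range(T):
        bid = values[t] / (1.0 + mu)
        win = bid >= prices[t] and spend + prices[t] <= budget
        cost = prices[t] if win else 0.0    # second-price payment
        spend += cost
        won_value += values[t] if win else 0.0
        mu = max(0.0, mu + eta * (cost - target))  # dual update
    return spend, won_value

rng = np.random.default_rng(0)
print(pacing_autobidder(rng.uniform(0, 1, 1000),
                        rng.uniform(0, 1, 1000), budget=50.0))
```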
Authors: Benedikt Brantner, Guillaume de Romemont, Michael Kraus, Zeyuan Li
Abstract: Two of the many trends in neural network research of the past few years have been (i) the learning of dynamical systems, especially with recurrent neural networks such as long short-term memory networks (LSTMs) and (ii) the introduction of transformer neural networks for natural language processing (NLP) tasks. While some work has been performed on the intersection of these two trends, those efforts were largely limited to using the vanilla transformer directly without adjusting its architecture for the setting of a physical system. In this work we develop a transformer-inspired neural network and use it to learn a dynamical system. We (for the first time) change the activation function of the attention layer to imbue the transformer with structure-preserving properties to improve long-term stability. This is shown to be of great advantage when applying the neural network to learning the trajectory of a rigid body.
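One concrete way to realize a structure-preserving attention activation, offered purely as an illustration of the abstract's idea (whether it matches the authors' exact choice is an assumption), is to skew-symmetrize the score matrix and apply the Cayley transform, which always produces an orthogonal, volume-preserving weight matrix in place of the softmax's row-stochastic one:

```python
import numpy as np

def cayley_attention(scores):
    """Map raw attention scores to an orthogonal matrix: take the
    skew-symmetric part A, then apply the Cayley transform
    (I - A)^{-1}(I + A), which is orthogonal for any skew-symmetric A."""
    A = 0.5 * (scores - scores.T)
    I = np.eye(A.shape[0])
    return np.linalg.solve(I - A, I + A)

rng = np.random.default_rng(0)
W = cayley_attention(rng.normal(size=(4, 4)))
print(np.allclose(W.T @ W, np.eye(4)))   # True: volume-preserving weights
```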
Authors: Florentia Afentaki, Gurol Saglam, Argyris Kokkinis, Kostas Siozios, Georgios Zervakis, Mehdi B Tahoori
Abstract: Printed Electronics (PE) feature distinct and remarkable characteristics that make them a prominent technology for achieving true ubiquitous computing. This is particularly relevant in application domains that require conformal and ultra-low cost solutions, which have experienced limited penetration of computing until now. Unlike silicon-based technologies, PE offer unparalleled features such as non-recurring engineering costs, ultra-low manufacturing cost, and on-demand fabrication of conformal, flexible, non-toxic, and stretchable hardware. However, PE face certain limitations due to their large feature sizes, that impede the realization of complex circuits, such as machine learning classifiers. In this work, we address these limitations by leveraging the principles of Approximate Computing and Bespoke (fully-customized) design. We propose an automated framework for designing ultra-low power Multilayer Perceptron (MLP) classifiers which employs, for the first time, a holistic approach to approximate all functions of the MLP's neurons: multiplication, accumulation, and activation. Through comprehensive evaluation across various MLPs of varying size, our framework demonstrates the ability to enable battery-powered operation of even the most intricate MLP architecture examined, significantly surpassing the current state of the art.
Authors: Layton Aho, Mark Winter, Marc DeCarlo, Agne Frismantiene, Yannick Blum, Paolo Armando Gagliardi, Olivier Pertz, Andrew R. Cohen
Abstract: Live cell microscopy captures 5-D $(x,y,z,channel,time)$ movies that display patterns of cellular motion and signaling dynamics. We present here a metric kernel function for spatiotemporal patterns of cell signaling dynamics in 5-D live cell microscopy movies that is unique in requiring no a priori knowledge of expected pattern dynamics and no training data. The approach uses Kolmogorov complexity theory to compute a metric distance between movies and to measure the meaningful information among subsets of movies. Cell signaling kymographs store at each spatiotemporal cell centroid the cell signaling state, or a functional output such as velocity. Patterns of similarity are identified via the metric normalized compression distance (NCD). The NCD is a reproducing kernel for a Hilbert space that represents the input cell signaling kymographs as points in a low dimensional embedding that optimally captures the pattern similarity identified by the NCD throughout the space. The only parameter is the expected cell radius ($\mu m$). A new formulation of the cluster structure function optimally estimates the meaningful information captured by the embedding. Also presented is the cell signaling structure function (SSF), a Kolmogorov structure function that optimally measures cell signaling state as nuclear intensity w.r.t. surrounding cytoplasm, a significant improvement compared to the current state-of-the-art cytonuclear ratio. Results are presented quantifying the impact of ERK and AKT signaling across different oncogenic mutations, and the relation between ERK signaling and cellular velocity patterns, for movies of 2-D monolayers of human breast epithelial (MCF10A) cells, 3-D MCF10A spheroids under optogenetic manipulation of ERK, and human induced pluripotent stem cells.
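For readers unfamiliar with the NCD, the sketch below shows the standard formula with an off-the-shelf compressor standing in for the uncomputable Kolmogorov complexity; in the paper the inputs would be serialized cell signaling kymographs rather than these toy byte strings.

```python
import zlib

def ncd(x: bytes, y: bytes) -> float:
    """Normalized compression distance:
    NCD(x, y) = (C(xy) - min(C(x), C(y))) / max(C(x), C(y)),
    where C(.) is the compressed length under a real compressor."""
    cx, cy = len(zlib.compress(x)), len(zlib.compress(y))
    cxy = len(zlib.compress(x + y))
    return (cxy - min(cx, cy)) / max(cx, cy)

a = b"ERK pulse pattern " * 50
b_similar = b"ERK pulse pattern " * 48 + b"AKT burst "
c = b"unrelated random signal trace " * 30
print(round(ncd(a, b_similar), 3), round(ncd(a, c), 3))  # similar pair is closer
```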
Authors: Zitha Sasindran, Harsha Yelchuri, T. V. Prabhakar
Abstract: In this study, we present SeMaScore, generated using a segment-wise mapping and scoring algorithm that serves as an evaluation metric for automatic speech recognition tasks. SeMaScore leverages both the error rate and a more robust similarity score. We show that our algorithm's score generation improves upon the state-of-the-art BERTScore. Our experimental results show that SeMaScore corresponds well with expert human assessments, signal-to-noise ratio levels, and other natural language metrics. We outperform BERTScore by 41x in metric computation speed. Overall, we demonstrate that SeMaScore serves as a more dependable evaluation metric, particularly in real-world situations involving atypical speech patterns.
Authors: Lianghui Zhu, Bencheng Liao, Qian Zhang, Xinlong Wang, Wenyu Liu, Xinggang Wang
Abstract: Recently, state space models (SSMs) with efficient hardware-aware designs, i.e., the Mamba deep learning model, have shown great potential for long sequence modeling. Meanwhile, building efficient and generic vision backbones purely upon SSMs is an appealing direction. However, representing visual data is challenging for SSMs due to the position-sensitivity of visual data and the requirement of global context for visual understanding. In this paper, we show that the reliance on self-attention for visual representation learning is not necessary and propose a new generic vision backbone with bidirectional Mamba blocks (Vim), which marks the image sequences with position embeddings and compresses the visual representation with bidirectional state space models. On ImageNet classification, COCO object detection, and ADE20k semantic segmentation tasks, Vim achieves higher performance compared to well-established vision transformers like DeiT, while also demonstrating significantly improved computation & memory efficiency. For example, Vim is 2.8$\times$ faster than DeiT and saves 86.8% GPU memory when performing batch inference to extract features on images with a resolution of 1248$\times$1248. The results demonstrate that Vim is capable of overcoming the computation & memory constraints on performing Transformer-style understanding for high-resolution images and it has great potential to be the next-generation backbone for vision foundation models. Code is available at https://github.com/hustvl/Vim.
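The bidirectional idea is simple to state: run a causal sequence model over the token sequence forward and backward and combine the two passes so every position sees global context. The stand-in module below is a toy causal cumulative mean, not Mamba's selective SSM; only the wiring is illustrated.

```python
import torch

def bidirectional_scan(x, ssm):
    """x: (batch, seq, dim); ssm: any causal sequence-to-sequence module.
    Sum of a forward pass and a time-reversed backward pass."""
    fwd = ssm(x)
    bwd = torch.flip(ssm(torch.flip(x, dims=[1])), dims=[1])
    return fwd + bwd

causal_mean = lambda x: x.cumsum(dim=1) / torch.arange(
    1, x.size(1) + 1, dtype=x.dtype).view(1, -1, 1)
out = bidirectional_scan(torch.randn(2, 6, 4), causal_mean)
print(out.shape)                         # torch.Size([2, 6, 4])
```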
Authors: Florentia Afentaki, Michael Hefenbrock, Georgios Zervakis, Mehdi B. Tahoori
Abstract: Printed Electronics (PE) stands out as a promising technology for widespread computing due to its distinct attributes, such as low costs and flexible manufacturing. Unlike traditional silicon-based technologies, PE enables stretchable, conformal, and non-toxic hardware. However, PE are constrained by larger feature sizes, making it challenging to implement complex circuits such as machine learning (ML) classifiers. Approximate computing has been proven to reduce the hardware cost of ML circuits such as Multilayer Perceptrons (MLPs). In this paper, we maximize the benefits of approximate computing by integrating hardware approximation into the MLP training process. Due to the discrete nature of hardware approximation, we propose and implement a genetic-based, approximate, hardware-aware training approach specifically designed for printed MLPs. For a 5% accuracy loss, our MLPs achieve over 5x area and power reduction compared to the baseline while outperforming state-of-the-art approximate and stochastic printed MLPs.
Authors: Vincent Roca, Gr\'egory Kuchcinski, Jean-Pierre Pruvo, Dorian Manouvriez, Renaud Lopes
Abstract: In MRI studies, the aggregation of imaging data from multiple acquisition sites enhances sample size but may introduce site-related variabilities that hinder consistency in subsequent analyses. Deep learning methods for image translation have emerged as a solution for harmonizing MR images across sites. In this study, we introduce IGUANe (Image Generation with Unified Adversarial Networks), an original 3D model that leverages the strengths of domain translation and straightforward application of style transfer methods for multicenter brain MR image harmonization. IGUANe extends CycleGAN by integrating an arbitrary number of domains for training through a many-to-one architecture. The framework based on domain pairs enables the implementation of sampling strategies that prevent confusion between site-related and biological variabilities. During inference, the model can be applied to any image, even from an unknown acquisition site, making it a universal generator for harmonization. Trained on a dataset comprising T1-weighted images from 11 different scanners, IGUANe was evaluated on data from unseen sites. The assessments included the transformation of MR images with traveling subjects, the preservation of pairwise distances between MR images within domains, the evolution of volumetric patterns related to age and Alzheimer's disease (AD), and the performance in age regression and patient classification tasks. Comparisons with other harmonization and normalization methods suggest that IGUANe better preserves individual information in MR images and is more suitable for maintaining and reinforcing variabilities related to age and AD. Future studies may further assess IGUANe in other multicenter contexts, either using the same model or retraining it for applications to different image modalities. IGUANe is available at https://github.com/RocaVincent/iguane_harmonization.git.
URLs: https://github.com/RocaVincent/iguane_harmonization.git.
Authors: Linyi Li, Shijie Geng, Zhenwen Li, Yibo He, Hao Yu, Ziyue Hua, Guanghan Ning, Siwei Wang, Tao Xie, Hongxia Yang
Abstract: Large Language Models for code (code LLMs) have witnessed tremendous progress in recent years. With the rapid development of code LLMs, many popular evaluation benchmarks, such as HumanEval, DS-1000, and MBPP, have emerged to measure the performance of code LLMs with a particular focus on code generation tasks. However, they are insufficient to cover the full range of expected capabilities of code LLMs, which span beyond code generation to answering diverse coding-related questions. To fill this gap, we propose InfiBench, the first large-scale freeform question-answering (QA) benchmark for code to our knowledge, comprising 234 carefully selected high-quality Stack Overflow questions that span 15 programming languages. InfiBench uses four types of model-free automatic metrics to evaluate response correctness where domain experts carefully concretize the criterion for each question. We conduct a systematic evaluation of over 100 of the latest code LLMs on InfiBench, leading to a series of novel and insightful findings. Our detailed analyses showcase potential directions for further advancement of code LLMs. InfiBench is fully open source at https://infi-coder.github.io/infibench and continuously expanding to foster more scientific and systematic practices for code LLM evaluation.
Authors: Huy Nguyen, Pedram Akbarian, Trang Pham, Trang Nguyen, Shujian Zhang, Nhat Ho
Abstract: The cosine router in Mixture of Experts (MoE) has recently emerged as an attractive alternative to the conventional linear router. Indeed, the cosine router demonstrates favorable performance in image and language tasks and exhibits better ability to mitigate the representation collapse issue, which often leads to parameter redundancy and limited representation potential. Despite its empirical success, a comprehensive analysis of the cosine router in MoE has been lacking. Considering the least square estimation of the cosine routing MoE, we demonstrate that due to the intrinsic interaction of the model parameters in the cosine router via some partial differential equations, regardless of the structures of the experts, the estimation rates of experts and model parameters can be as slow as $\mathcal{O}(1/\log^{\tau}(n))$ where $\tau > 0$ is some constant and $n$ is the sample size. Surprisingly, these pessimistic non-polynomial convergence rates can be circumvented by the widely used technique in practice to stabilize the cosine router -- simply adding noise to the $L^2$ norms in the cosine router, which we refer to as the \textit{perturbed cosine router}. Under the strongly identifiable settings of the expert functions, we prove that the estimation rates for both the experts and model parameters under the perturbed cosine routing MoE are significantly improved to polynomial rates. Finally, we conduct extensive simulation studies in both synthetic and real data settings to empirically validate our theoretical results.
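A minimal sketch of the perturbed cosine router as described above; the exact placement and scale of the noise are assumptions.

```python
import torch

def perturbed_cosine_router(x, E, tau=0.07, noise_std=1e-2):
    """x: (batch, d) token features; E: (n_experts, d) expert embeddings.
    Cosine routing logits with noise added to the L2 norms, then softmax."""
    x_norm = x.norm(dim=-1, keepdim=True) + noise_std * torch.randn(x.size(0), 1)
    e_norm = E.norm(dim=-1, keepdim=True) + noise_std * torch.randn(E.size(0), 1)
    logits = (x @ E.T) / (x_norm * e_norm.T).clamp_min(1e-6) / tau
    return logits.softmax(dim=-1)

torch.manual_seed(0)
gates = perturbed_cosine_router(torch.randn(4, 16), torch.randn(8, 16))
print(gates.shape, gates.sum(dim=-1))    # (4, 8); each row sums to 1
```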
Authors: Amelia Kawasaki, Andrew Davis, Houssam Abbas
Abstract: The widespread adoption of Large Language Models (LLMs), exemplified by OpenAI's ChatGPT, brings to the forefront the imperative to defend against adversarial threats on these models. These attacks, which manipulate an LLM's output by introducing malicious inputs, undermine the model's integrity and the trust users place in its outputs. In response to this challenge, our paper presents an innovative defensive strategy, given white box access to an LLM, that harnesses residual activation analysis between transformer layers of the LLM. We apply a novel methodology for analyzing distinctive activation patterns in the residual streams for attack prompt classification. We curate multiple datasets to demonstrate how this method of classification has high accuracy across multiple types of attack scenarios, including our newly-created attack dataset. Furthermore, we enhance the model's resilience by integrating safety fine-tuning techniques for LLMs in order to measure their effect on our capability to detect attacks. The results underscore the effectiveness of our approach in enhancing the detection and mitigation of adversarial inputs, advancing the security framework within which LLMs operate.
Authors: Hiroyuki Hanada, Tatsuya Aoyama, Satoshi Akahane, Tomonari Tanaka, Yoshito Okura, Yu Inatsu, Noriaki Hashimoto, Shion Takeno, Taro Murayama, Hanju Lee, Shinya Kojima, Ichiro Takeuchi
Abstract: We consider a machine learning setup where one training dataset is used to train multiple models across slightly different data distributions. This occurs when customized models are needed for various deployment environments. To reduce storage and training costs, we propose the DRSSS method, which combines distributionally robust (DR) optimization and safe sample screening (SSS). The key benefit of this method is that models trained on the reduced dataset will perform the same as those trained on the full dataset for all possible different environments. In this paper, we focus on covariate shift as a type of data distribution change and demonstrate the effectiveness of our method through experiments.
Authors: Aman Gupta, Aditi Sheshadri, Sujit Roy, Vishal Gaur, Manil Maskey, Rahul Ramachandran
Abstract: Global climate models typically operate at a grid resolution of hundreds of kilometers and fail to resolve atmospheric mesoscale processes, e.g., clouds, precipitation, and gravity waves (GWs). Model representation of these processes and their sources is essential to the global circulation and planetary energy budget, but subgrid scale contributions from these processes are often only approximately represented in models using parameterizations. These parameterizations are subject to approximations and idealizations, which limit their capability and accuracy. The most drastic of these approximations is the "single-column approximation" which completely neglects the horizontal evolution of these processes, resulting in key biases in current climate models. With a focus on atmospheric GWs, we present the first-ever global simulation of atmospheric GW fluxes using machine learning (ML) models trained on the WINDSET dataset to emulate global GW fluxes in the atmosphere, as an alternative to traditional single-column parameterizations. Using an Attention U-Net-based architecture trained on globally resolved GW momentum fluxes, we illustrate the importance and effectiveness of global nonlocality when simulating GWs using data-driven schemes.
Authors: Nicy Scaria, Silvester John Joseph Kennedy, Deepak Subramani
Abstract: Small Language Models (SLMs) are generally considered more compact versions of large language models (LLMs). This study investigates the ability of SLMs with parameters between 1 and 3 billion to learn, retain, and subsequently eliminate different types of noise present in the data. Four pre-trained SLMs were utilized for this: Olmo 1B, Qwen1.5 1.8B, Gemma 2B, and Phi2 2.7B. The models were instruction-tuned on noise-free data and tested using in-context examples to determine if they could learn noise through examples. Subsequently, noise patterns were introduced in instruction tuning to evaluate the noise learning, unlearning, and retention capabilities of the models. Olmo, the smallest model, was highly sensitive to noise, quickly adapting to noisy patterns. Phi2 resisted learning character-level and transliteration noise, likely due to its carefully curated, structured, and high-quality pretraining data. Gemma excelled with transliteration noise, likely benefiting from its multilingual pretraining. The findings can be used to develop robust training strategies for SLMs.
Authors: Lars Ankile, Anthony Simeonov, Idan Shenfeld, Marcel Torne, Pulkit Agrawal
Abstract: Advances in behavior cloning (BC), like action-chunking and diffusion, have enabled impressive capabilities. Still, imitation alone remains insufficient for learning reliable policies for tasks requiring precise aligning and inserting of objects, like assembly. Our key insight is that chunked BC policies effectively function as trajectory planners, enabling long-horizon tasks. Conversely, as they execute action chunks open-loop, they lack the fine-grained reactivity necessary for reliable execution. Further, we find that the performance of BC policies saturates despite increasing data. Reinforcement learning (RL) is a natural way to overcome BC's limitations, but it is not straightforward to apply directly to action-chunked models like diffusion policies. We present a simple yet effective method, ResiP (Residual for Precise Manipulation), that sidesteps these challenges by augmenting a frozen, chunked BC model with a fully closed-loop residual policy trained with RL. The residual policy is trained via on-policy RL, addressing distribution shifts and introducing reactive control without altering the BC trajectory planner. Evaluation on high-precision manipulation tasks demonstrates strong performance of ResiP over BC methods and direct RL fine-tuning. Videos, code, and data are available at https://residual-assembly.github.io.
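A minimal sketch of the decomposition described above: a frozen, chunked BC policy proposes an action chunk open-loop, and a small residual network adds a closed-loop per-step correction. Shapes, the correction scale, and the omitted on-policy RL training loop are illustrative assumptions.

```python
import torch
import torch.nn as nn

class ResidualPolicy(nn.Module):
    """Adds a bounded, observation-conditioned correction to a base action."""
    def __init__(self, obs_dim, act_dim, scale=0.1):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(obs_dim + act_dim, 64),
                                 nn.Tanh(), nn.Linear(64, act_dim))
        self.scale = scale

    def forward(self, obs, base_action):
        delta = self.net(torch.cat([obs, base_action], dim=-1))
        return base_action + self.scale * delta

bc_chunk = torch.randn(8, 7)            # frozen planner's 8-step action chunk
policy = ResidualPolicy(obs_dim=10, act_dim=7)
for base in bc_chunk:                   # closed-loop execution of the chunk
    obs = torch.randn(10)               # fresh observation at every step
    action = policy(obs, base)
```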
Authors: Nikita Yudin, Roland Hildebrand, Sergey Bakhurin, Alexander Degtyarev, Anna Lisachenko, Ilya Kuruzov, Andrei Semenov, Mohammad Alkousa
Abstract: In this paper, we modify and apply the recently introduced Mixed Newton Method, which was originally designed for minimizing real-valued functions of complex variables, to the minimization of real-valued functions of real variables by extending the functions to complex space. We show that arbitrary regularizations preserve the favorable local convergence properties of the method, and construct a special type of regularization used to prevent convergence to complex minima. We compare several variants of the method applied to training neural networks with real and complex parameters.
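For reference, assuming the standard Wirtinger-calculus formulation of the Mixed Newton Method, a regularized step takes the following form, where the mixed Hessian replaces the full Hessian and the specific regularizer $R_k$ is the paper's construction:

```latex
% Mixed Newton step for a real-valued f of complex variables z:
% invert the mixed Wirtinger Hessian rather than the full Hessian.
z_{k+1} = z_k
  - \left( \frac{\partial^2 f}{\partial \bar{z}\,\partial z} + R_k \right)^{-1}
    \frac{\partial f}{\partial \bar{z}}
```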
Authors: Jinghan Li, Zhicheng Sun, Fei Li, Cao Sheng, Jiazhong Yu, Yadong Mu
Abstract: In the endeavor to make autonomous robots take actions, task planning is a major challenge that requires translating high-level task descriptions into long-horizon action sequences. Despite recent advances in language model agents, they remain prone to planning errors and limited in their ability to plan ahead. To address these limitations in robotic planning, we advocate a self-refining scheme that iteratively refines a draft plan until an equilibrium is reached. Remarkably, this process can be optimized end-to-end from an analytical perspective without the need to curate additional verifiers or reward models, allowing us to train self-refining planners in a simple supervised learning fashion. Meanwhile, a nested equilibrium sequence modeling procedure is devised for efficient closed-loop planning that incorporates useful feedback from the environment (or an internal world model). Our method is evaluated on the VirtualHome-Env benchmark, showing advanced performance with better scaling for inference computation. Code is available at https://github.com/Singularity0104/equilibrium-planner.
URLs: https://github.com/Singularity0104/equilibrium-planner.
Authors: Aoran Zhang, Wenbin Zhou, Liyan Xie, Shixiang Zhu
Abstract: Time series data are crucial across diverse domains such as finance and healthcare, where accurate forecasting and decision-making rely on advanced modeling techniques. While generative models have shown great promise in capturing the intricate dynamics inherent in time series, evaluating their performance remains a major challenge. Traditional evaluation metrics fall short due to the temporal dependencies and potential high dimensionality of the features. In this paper, we propose the REcurrent NeurAL (RENAL) Goodness-of-Fit test, a novel and statistically rigorous framework for evaluating generative time series models. By leveraging recurrent neural networks, we transform the time series into conditionally independent data pairs, enabling the application of a chi-square-based goodness-of-fit test to the temporal dependencies within the data. This approach offers a robust, theoretically grounded solution for assessing the quality of generative models, particularly in settings with limited time sequences. We demonstrate the efficacy of our method across both synthetic and real-world datasets, outperforming existing methods in terms of reliability and accuracy. Our method fills a critical gap in the evaluation of time series generative models, offering a tool that is both practical and adaptable to high-stakes applications.
Authors: Michael Ben Ali (IRIT), Omar El-Rifai (IRIT), Imen Megdiche (IRIT, IRIT-SIG, INUC), Andr\'e Peninou (IRIT, IRIT-SIG, UT2J), Olivier Teste (IRIT-SIG, IRIT, UT2J, UT)
Abstract: Over recent years, Federated Learning (FL) has proven to be one of the most promising methods of distributed learning that preserves data privacy. As the method evolved and was confronted with various real-world scenarios, new challenges have emerged. One such challenge is the presence of highly heterogeneous (often referred to as non-IID) data distributions among participants of the FL protocol. A popular solution to this hurdle is Clustered Federated Learning (CFL), which aims to partition clients into groups where the distributions are homogeneous. In the literature, state-of-the-art CFL algorithms are often tested using a few cases of data heterogeneities, without systematically justifying the choices. Further, the taxonomy used for differentiating the different heterogeneity scenarios is not always straightforward. In this paper, we explore the performance of two state-of-the-art CFL algorithms with respect to a proposed taxonomy of data heterogeneities in federated learning (FL). We work with three image classification datasets and analyze the resulting clusters against the heterogeneity classes using extrinsic clustering metrics. Our objective is to provide a clearer understanding of the relationship between CFL performance and data heterogeneity scenarios.
Authors: Wei Xie, Shuoyoucheng Ma, Zhenhua Wang, Enze Wang, Kai Chen, Xiaobing Sun, Baosheng Wang
Abstract: The cognitive mechanism by which Large Language Models (LLMs) solve mathematical problems remains a widely debated and unresolved issue. Currently, there is little interpretable experimental evidence that connects LLMs' problem-solving with human cognitive psychology. To determine if LLMs possess human-like mathematical reasoning, we modified the problems used in the human Cognitive Reflection Test (CRT). Our results show that, even with the use of Chains of Thought (CoT) prompts, mainstream LLMs, including the latest o1 model (noted for its reasoning capabilities), have a high error rate when solving these modified CRT problems. Specifically, the average accuracy rate dropped by up to 50% compared to the original questions. Further analysis of LLMs' incorrect answers suggests that they primarily rely on pattern matching from their training data, which aligns more with human intuition (System 1 thinking) rather than with human-like reasoning (System 2 thinking). This finding challenges the belief that LLMs have genuine mathematical reasoning abilities comparable to humans. As a result, this work may adjust overly optimistic views on LLMs' progress towards artificial general intelligence.
Authors: Damir Filipovic, Paul Schneider
Abstract: We propose a nonparametric, kernel-based joint estimator for conditional mean and covariance matrices in large unbalanced panels. Our estimator, with proven consistency and finite-sample guarantees, is applied to a comprehensive panel of monthly US stock excess returns from 1962 to 2021, conditioned on macroeconomic and firm-specific covariates. The estimator captures time-varying cross-sectional dependencies effectively, demonstrating robust statistical performance. In asset pricing, it generates conditional mean-variance efficient portfolios with out-of-sample Sharpe ratios that substantially exceed those of equal-weighted benchmarks.
Authors: Massimo Bilancia, Samuele Magro
Abstract: This paper presents a variant of the Multinomial mixture model tailored for the unsupervised classification of short text data. Traditionally, the Multinomial probability vector in this hierarchical model is assigned a Dirichlet prior distribution. Here, however, we explore an alternative prior--the Beta-Liouville distribution--which offers a more flexible correlation structure than the Dirichlet. We examine the theoretical properties of the Beta-Liouville distribution, focusing on its conjugacy with the Multinomial likelihood. This property enables the derivation of update equations for a CAVI (Coordinate Ascent Variational Inference) variational algorithm, facilitating the approximate posterior estimation of model parameters. Additionally, we propose a stochastic variant of the CAVI algorithm that enhances scalability. The paper concludes with data examples that demonstrate effective strategies for setting the Beta-Liouville hyperparameters.
Authors: Gabrielle Kaili-May Liu, Bowen Shi, Avi Caciularu, Idan Szpektor, Arman Cohan
Abstract: Multi-document (MD) processing is crucial for LLMs to handle real-world tasks such as summarization and question-answering across large sets of documents. While LLMs have improved at processing long inputs, MD contexts still present challenges, such as managing inter-document dependencies, redundancy, and incoherent structures. We introduce MDCure, a scalable and effective fine-tuning pipeline to enhance the MD capabilities of LLMs without the computational cost of pre-training or reliance on human-annotated data. MDCure is based on generation of high-quality synthetic MD instruction data from sets of related articles via targeted prompts. We further introduce MDCureRM, a multi-objective reward model which filters generated data based on their training utility for MD settings. With MDCure, we fine-tune a variety of LLMs, from the FlanT5, Qwen2, and LLAMA3.1 model families, up to 70B parameters in size. Extensive evaluations on a wide range of MD and long-context benchmarks spanning various tasks show MDCure consistently improves performance over pre-trained baselines and over corresponding base models by up to 75.5%. Our code, datasets, and models are available at https://github.com/yale-nlp/MDCure.
Authors: Jia Fu, Xiao Zhang, Sepideh Pashami, Fatemeh Rahimian, Anders Holst
Abstract: In the ever-evolving adversarial machine learning landscape, developing effective defenses against patch attacks has become a critical challenge, necessitating reliable solutions to safeguard real-world AI systems. Although diffusion models have shown remarkable capacity in image synthesis and have been recently utilized to counter $\ell_p$-norm bounded attacks, their potential in mitigating localized patch attacks remains largely underexplored. In this work, we propose DiffPAD, a novel framework that harnesses the power of diffusion models for adversarial patch decontamination. DiffPAD first performs super-resolution restoration on downsampled input images, then adopts binarization, a dynamic thresholding scheme, and a sliding window for effective localization of adversarial patches. Such a design is inspired by the theoretically derived correlation between patch size and diffusion restoration error that is generalized across diverse patch attack scenarios. Finally, DiffPAD applies inpainting techniques to the original input images with the estimated patch region being masked. By integrating closed-form solutions for super-resolution restoration and image inpainting into the conditional reverse sampling process of a pre-trained diffusion model, DiffPAD obviates the need for text guidance or fine-tuning. Through comprehensive experiments, we demonstrate that DiffPAD not only achieves state-of-the-art adversarial robustness against patch attacks but also excels in recovering naturalistic images without patch remnants. The source code is available at https://github.com/JasonFu1998/DiffPAD.
Authors: Jianyi Yang, Pengfei Li, Adam Wierman, Shaolei Ren
Abstract: Online Budgeted Matching (OBM) is a classic problem with important applications in online advertising, online service matching, revenue management, and beyond. Traditional online algorithms typically assume a small bid setting, where the maximum bid-to-budget ratio ($\kappa$) is infinitesimally small. While recent algorithms have tried to address scenarios with non-small or general bids, they often rely on the Fractional Last Matching (FLM) assumption, which allows for accepting partial bids when the remaining budget is insufficient. This assumption, however, does not hold for many applications with indivisible bids. In this paper, we remove the FLM assumption and tackle the open problem of OBM with general bids. We first establish an upper bound of $1-\kappa$ on the competitive ratio for any deterministic online algorithm. We then propose a novel meta algorithm, called MetaAd, which reduces to different algorithms with the first known provable competitive ratios parameterized by the maximum bid-to-budget ratio $\kappa \in [0, 1]$. As a by-product, we extend MetaAd to the FLM setting and get provable competitive algorithms. Finally, we apply our competitive analysis to the design of learning-augmented algorithms.
Authors: Turker Berk Donmez
Abstract: In democratic societies, electoral systems play a crucial role in translating public preferences into political representation. Among these, the D'Hondt method is widely used to ensure proportional representation, balancing fair representation with governmental stability. Recently, there has been a growing interest in applying similar principles of proportional representation to enhance interpretability in machine learning, specifically in Explainable AI (XAI). This study investigates the integration of D'Hondt-based voting principles in the DhondtXAI method, which leverages resource allocation concepts to interpret feature importance within AI models. Through a comparison of SHAP (Shapley Additive Explanations) and DhondtXAI, we evaluate their effectiveness in feature attribution within CatBoost and XGBoost models for breast cancer and diabetes prediction, respectively. The DhondtXAI approach allows for alliance formation and thresholding to enhance interpretability, representing feature importance as seats in a parliamentary view. Statistical correlation analyses between SHAP values and DhondtXAI allocations support the consistency of interpretations, demonstrating DhondtXAI's potential as a complementary tool for understanding feature importance in AI models. The results highlight that integrating electoral principles, such as proportional representation and alliances, into AI explainability can improve user understanding, especially in high-stakes fields like healthcare.
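The D'Hondt rule itself fits in a few lines: each seat goes to the party with the current highest quotient votes/(seats won + 1). In DhondtXAI's reading, parties become features (or alliances of features) and seats become importance shares; the vote counts below are made-up numbers.

```python
def dhondt(votes: dict, seats: int) -> dict:
    """Allocate seats by repeatedly awarding the highest quotient."""
    won = {p: 0 for p in votes}
    for _ in range(seats):
        best = max(votes, key=lambda p: votes[p] / (won[p] + 1))
        won[best] += 1
    return won

# e.g. feature-importance 'votes' mapped onto a 10-seat parliament
print(dhondt({"age": 34000, "bmi": 25000, "glucose": 41000}, 10))
```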
Authors: Yuming Feng, Chuye Hong, Yaru Niu, Shiqi Liu, Yuxiang Yang, Wenhao Yu, Tingnan Zhang, Jie Tan, Ding Zhao
Abstract: Recently, quadrupedal robots have achieved significant success in locomotion, but their manipulation capabilities, particularly in handling large objects, remain limited, restricting their usefulness in demanding real-world applications such as search and rescue, construction, industrial automation, and room organization. This paper tackles the task of obstacle-aware, long-horizon pushing by multiple quadrupedal robots. We propose a hierarchical multi-agent reinforcement learning framework with three levels of control. The high-level controller integrates an RRT planner and a centralized adaptive policy to generate subgoals, while the mid-level controller uses a decentralized goal-conditioned policy to guide the robots toward these subgoals. A pre-trained low-level locomotion policy executes the movement commands. We evaluate our method against several baselines in simulation, achieving 36.0% higher success rates and a 24.5% reduction in completion time relative to the best baseline. Our framework successfully enables long-horizon, obstacle-aware manipulation tasks like Push-Cuboid and Push-T on Go1 robots in the real world.
Authors: Ang Lv, Ruobing Xie, Shuaipeng Li, Jiayi Liao, Xingwu Sun, Zhanhui Kang, Di Wang, Rui Yan
Abstract: We propose a novel attention mechanism, named Cog Attention, that enables attention weights to be negative for enhanced expressiveness, which stems from two key factors: (1) Cog Attention can shift the token deletion and copying function from a static OV matrix to dynamic QK inner products, with the OV matrix now focusing more on refinement or modification. The attention head can simultaneously delete, copy, or retain tokens by assigning them negative, positive, or minimal attention weights, respectively. As a result, a single attention head becomes more flexible and expressive. (2) Cog Attention improves the model's robustness against representational collapse, which can occur when earlier tokens are over-squashed into later positions, leading to homogeneous representations. Negative weights reduce effective information paths from earlier to later tokens, helping to mitigate this issue. We develop Transformer-like models which use Cog Attention as attention modules, including decoder-only models for language modeling and U-ViT diffusion models for image generation. Experiments show that models using Cog Attention exhibit superior performance compared to those employing traditional softmax attention modules. Our approach suggests a promising research direction for rethinking and breaking the entrenched constraints of traditional softmax attention, such as the requirement for non-negative weights.
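One simple way to let attention weights go negative in the spirit of this abstract (the paper's exact formulation may differ) is to softmax the magnitudes of the QK scores and reinstate their signs, so a head can weight a token negatively (delete), positively (copy), or near zero (retain):

```python
import torch

def signed_attention(q, k, v):
    """Softmax over |scores| re-signed by sign(scores); rows of the
    resulting weight matrix may therefore contain negative entries."""
    scores = q @ k.transpose(-2, -1) / q.size(-1) ** 0.5
    weights = torch.softmax(scores.abs(), dim=-1) * torch.sign(scores)
    return weights @ v

torch.manual_seed(0)
q = k = v = torch.randn(1, 5, 8)
print(signed_attention(q, k, v).shape)   # torch.Size([1, 5, 8])
```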