Authors: Masoud Ataei, Mohammad Javad Khojasteh, Vikas Dhiman
Abstract: In this paper, we provide distance-aware error bounds for Kolmogorov Arnold Networks (KANs). We call our new error bounds estimator DAREK -- Distance Aware Error for Kolmogorov networks. Z. Liu et al. provide error bounds, which may be loose, lack distance-awareness, and are defined only up to an unknown constant of proportionality. We review the error bounds for Newton's polynomial, which is then generalized to an arbitrary spline, under Lipschitz continuity assumptions. We then extend these bounds to nested compositions of splines, arriving at error bounds for KANs. We evaluate our method by estimating an object's shape from sparse laser scan points. We use KAN to fit a smooth function to the scans and provide error bounds for the fit. We find that our method is faster than Monte Carlo approaches, and that our error bounds enclose the true obstacle shape reliably.
Authors: Luke Dramko, Claire Le Goues, Edward J. Schwartz
Abstract: Neural decompilers are machine learning models that reconstruct the source code from an executable program. Critical to the lifecycle of any machine learning model is an evaluation of its effectiveness. However, existing techniques for evaluating neural decompilation models have substantial weaknesses, especially when it comes to showing the correctness of the neural decompiler's predictions. To address this, we introduce codealign, a novel instruction-level code equivalence technique designed for neural decompilers. We provide a formal definition of a relation between equivalent instructions, which we term an equivalence alignment. We show how codealign generates equivalence alignments, then evaluate codealign by comparing it with symbolic execution. Finally, we show how the information codealign provides-which parts of the functions are equivalent and how well the variable names match-is substantially more detailed than existing state-of-the-art evaluation metrics, which report unitless numbers measuring similarity.
Authors: Felix Jimenez, Matthias Katzfuss
Abstract: Deterministic uncertainty quantification (UQ) in deep learning aims to estimate uncertainty with a single pass through a network by leveraging outputs from the network's feature extractor. Existing methods require that the feature extractor be both sensitive and smooth, ensuring meaningful input changes produce meaningful changes in feature vectors. Smoothness enables generalization, while sensitivity prevents feature collapse, where distinct inputs are mapped to identical feature vectors. To meet these requirements, current deterministic methods often retrain networks with spectral normalization. Instead of modifying training, we propose using measures of neural collapse to identify an existing intermediate layer that is both sensitive and smooth. We then fit a probabilistic model to the feature vector of this intermediate layer, which we call a probabilistic skip connection (PSC). Through empirical analysis, we explore the impact of spectral normalization on neural collapse and demonstrate that PSCs can effectively disentangle aleatoric and epistemic uncertainty. Additionally, we show that PSCs achieve uncertainty quantification and out-of-distribution (OOD) detection performance that matches or exceeds existing single-pass methods requiring training modifications. By retrofitting existing models, PSCs enable high-quality UQ and OOD capabilities without retraining.
Authors: Ziyuan Bao, Eiman Kanjo, Soumya Banerjee, Hasib-Al Rashid, Tinoosh Mohsenin
Abstract: With the growing computational capabilities of microcontroller units (MCUs), edge devices can now support machine learning models. However, deploying decentralised federated learning (DFL) on such devices presents key challenges, including intermittent connectivity, limited communication range, and dynamic network topologies. This paper proposes a novel framework, bilayer Gossip Decentralised Parallel Stochastic Gradient Descent (GD PSGD), designed to address these issues in resource-constrained environments. The framework incorporates a hierarchical communication structure using Distributed Kmeans (DKmeans) clustering for geographic grouping and a gossip protocol for efficient model aggregation across two layers: intra-cluster and inter-cluster. We evaluate the framework's performance against the Centralised Federated Learning (CFL) baseline using the MCUNet model on the CIFAR-10 dataset under IID and Non-IID conditions. Results demonstrate that the proposed method achieves comparable accuracy to CFL on IID datasets, requiring only 1.8 additional rounds for convergence. On Non-IID datasets, the accuracy loss remains under 8\% for moderate data imbalance. These findings highlight the framework's potential to support scalable and privacy-preserving learning on edge devices with minimal performance trade-offs.
Authors: Ismail B. Mustapha, Muyideen Abdulkareem, Shafaatunnur Hasan, Abideen Ganiyu, Hatem Nabus, Jin Chai Lee
Abstract: The performance of pavement under loading depends on the strength of the subgrade. However, experimental estimation of properties of pavement strengths such as California bearing ratio (CBR), unconfined compressive strength (UCS) and resistance value (R) are often tedious, time-consuming and costly, thereby inspiring a growing interest in machine learning based tools which are simple, cheap and fast alternatives. Thus, the potential application of two boosting techniques; categorical boosting (CatBoost) and extreme gradient boosting (XGBoost) and support vector regression (SVR), is similarly explored in this study for estimation of properties of subgrade soil modified with hydrated lime activated rice husk ash (HARSH). Using 121 experimental data samples of varying proportions of HARSH, plastic limit, liquid limit, plasticity index, clay activity, optimum moisture content, and maximum dry density as input for CBR, UCS and R estimation, four evaluation metrics namely coefficient of determination (R2), root mean squared error (RMSE), mean absolute error (MAE) and mean absolute percentage error (MAPE) are used to evaluate the models' performance. The results indicate that XGBoost outperformed CatBoost and SVR in estimating these properties, yielding R2 of 0.9994, 0.9995 and 0.9999 in estimating the CBR, UCS and R respectively. Also, SVR outperformed CatBoost in estimating the CBR and R with R2 of 0.9997 respectively. On the other hand, CatBoost outperformed SVR in estimating the UCS with R2 of 0.9994. Feature sensitivity analysis shows that the three machine learning techniques are unanimous that increasing HARSH proportion lead to values of the estimated properties respectively. A comparison with previous results also shows superiority of XGBoost in estimating subgrade properties.
Authors: Sergio Rozada, Hoi-To Wai, Antonio G. Marques
Abstract: Reinforcement learning (RL) aims to estimate the action to take given a (time-varying) state, with the goal of maximizing a cumulative reward function. Predominantly, there are two families of algorithms to solve RL problems: value-based and policy-based methods, with the latter designed to learn a probabilistic parametric policy from states to actions. Most contemporary approaches implement this policy using a neural network (NN). However, NNs usually face issues related to convergence, architectural suitability, hyper-parameter selection, and underutilization of the redundancies of the state-action representations (e.g. locally similar states). This paper postulates multi-linear mappings to efficiently estimate the parameters of the RL policy. More precisely, we leverage the PARAFAC decomposition to design tensor low-rank policies. The key idea involves collecting the policy parameters into a tensor and leveraging tensor-completion techniques to enforce low rank. We establish theoretical guarantees of the proposed methods for various policy classes and validate their efficacy through numerical experiments. Specifically, we demonstrate that tensor low-rank policy models reduce computational and sample complexities in comparison to NN models while achieving similar rewards.
Authors: MZ Naser
Abstract: The widespread acceptance of empirically derived codal provisions and equations in civil engineering stands in stark contrast to the skepticism facing machine learning (ML) models, despite their shared statistical foundations. This paper examines this philosophical tension through the lens of structural engineering and explores how integrating ML challenges traditional engineering philosophies and professional identities. Recent efforts have documented how ML enhances predictive accuracy, optimizes designs, and analyzes complex behaviors. However, one might also raise concerns about the diminishing role of human intuition and the interpretability of algorithms. To showcase this rarely explored front, this paper presents how ML can be successfully integrated into various engineering problems by means of formulation via deduction, induction, and abduction. Then, this paper identifies three principal paradoxes that could arise when adopting ML: analysis paralysis (increased prediction accuracy leading to a reduced understanding of physical mechanisms), infeasible solutions (optimization resulting in unconventional designs that challenge engineering intuition), and the Rashomon effect (where contradictions in explainability methods and physics arise). This paper concludes by addressing these paradoxes and arguing the need to rethink epistemological shifts in engineering and engineering education and methodologies to harmonize traditional principles with ML.
Authors: Michail Ouroutzoglou, Mingmin Zhao, Joshua Hellerstein, Hariharan Rahul, Asima Badic, Brian S. Kim, Dina Katabi
Abstract: Chronic itch affects 13% of the US population, is highly debilitating, and underlies many medical conditions. A major challenge in clinical care and new therapeutics development is the lack of an objective measure for quantifying itch, leading to reliance on subjective measures like patients' self-assessment of itch severity. In this paper, we show that a home radio device paired with artificial intelligence (AI) can concurrently capture scratching and evaluate its impact on sleep quality by analyzing radio signals bouncing in the environment. The device eliminates the need for wearable sensors or skin contact, enabling monitoring of chronic itch over extended periods at home without burdening patients or interfering with their skin condition. To validate the technology, we conducted an observational clinical study of chronic pruritus patients, monitored at home for one month using both the radio device and an infrared camera. Comparing the output of the device to ground truth data from the camera demonstrates its feasibility and accuracy (ROC AUC = 0.997, sensitivity = 0.825, specificity = 0.997). The results reveal a significant correlation between scratching and low sleep quality, manifested as a reduction in sleep efficiency (R = 0.6, p < 0.001) and an increase in sleep latency (R = 0.68, p < 0.001). Our study underscores the potential of passive, long-term, at-home monitoring of chronic scratching and its sleep implications, offering a valuable tool for both clinical care of chronic itch patients and pharmaceutical clinical trials.
Authors: Seyed Amir Bidaki, Amir Mohammadkhah, Kiyan Rezaee, Faeze Hassani, Sadegh Eskandari, Maziar Salahi, Mohammad M. Ghassemi
Abstract: Online Continual Learning (OCL) is a critical area in machine learning, focusing on enabling models to adapt to evolving data streams in real-time while addressing challenges such as catastrophic forgetting and the stability-plasticity trade-off. This study conducts the first comprehensive Systematic Literature Review (SLR) on OCL, analyzing 81 approaches, extracting over 1,000 features (specific tasks addressed by these approaches), and identifying more than 500 components (sub-models within approaches, including algorithms and tools). We also review 83 datasets spanning applications like image classification, object detection, and multimodal vision-language tasks. Our findings highlight key challenges, including reducing computational overhead, developing domain-agnostic solutions, and improving scalability in resource-constrained environments. Furthermore, we identify promising directions for future research, such as leveraging self-supervised learning for multimodal and sequential data, designing adaptive memory mechanisms that integrate sparse retrieval and generative replay, and creating efficient frameworks for real-world applications with noisy or evolving task boundaries. By providing a rigorous and structured synthesis of the current state of OCL, this review offers a valuable resource for advancing this field and addressing its critical challenges and opportunities. The complete SLR methodology steps and extracted data are publicly available through the provided link: https://github.com/kiyan-rezaee/ Systematic-Literature-Review-on-Online-Continual-Learning
Authors: Jake H. Lee, Michael Kiper, David R. Thompson, Philip G. Brodrick
Abstract: Current and upcoming generations of visible-shortwave infrared (VSWIR) imaging spectrometers promise unprecedented capacity to quantify Earth System processes across the globe. However, reliable cloud screening remains a fundamental challenge for these instruments, where traditional spatial and temporal approaches are limited by cloud variability and limited temporal coverage. The Spectroscopic Transformer (SpecTf) addresses these challenges with a spectroscopy-specific deep learning architecture that performs cloud detection using only spectral information (no spatial or temporal data are required). By treating spectral measurements as sequences rather than image channels, SpecTf learns fundamental physical relationships without relying on spatial context. Our experiments demonstrate that SpecTf significantly outperforms the current baseline approach implemented for the EMIT instrument, and performs comparably with other machine learning methods with orders of magnitude fewer learned parameters. Critically, we demonstrate SpecTf's inherent interpretability through its attention mechanism, revealing physically meaningful spectral features the model has learned. Finally, we present SpecTf's potential for cross-instrument generalization by applying it to a different instrument on a different platform without modifications, opening the door to instrument agnostic data driven algorithms for future imaging spectroscopy tasks.
Authors: Guannan Lai, Yihui Feng, Xin Yang, Xiaoyu Deng, Hao Yu, Shuyin Xia, Guoyin Wang, Tianrui Li
Abstract: Federated Learning (FL) facilitates collaborative model training while prioritizing privacy by avoiding direct data sharing. However, most existing articles attempt to address challenges within the model's internal parameters and corresponding outputs, while neglecting to solve them at the input level. To address this gap, we propose a novel framework called Granular-Ball Federated Learning (GrBFL) for image classification. GrBFL diverges from traditional methods that rely on the finest-grained input data. Instead, it segments images into multiple regions with optimal coarse granularity, which are then reconstructed into a graph structure. We designed a two-dimensional binary search segmentation algorithm based on variance constraints for GrBFL, which effectively removes redundant information while preserving key representative features. Extensive theoretical analysis and experiments demonstrate that GrBFL not only safeguards privacy and enhances efficiency but also maintains robust utility, consistently outperforming other state-of-the-art FL methods. The code is available at https://github.com/AIGNLAI/GrBFL.
Authors: Fazl Barez, Tingchen Fu, Ameya Prabhu, Stephen Casper, Amartya Sanyal, Adel Bibi, Aidan O'Gara, Robert Kirk, Ben Bucknall, Tim Fist, Luke Ong, Philip Torr, Kwok-Yan Lam, Robert Trager, David Krueger, S\"oren Mindermann, Jos\'e Hernandez-Orallo, Mor Geva, Yarin Gal
Abstract: As AI systems become more capable, widely deployed, and increasingly autonomous in critical areas such as cybersecurity, biological research, and healthcare, ensuring their safety and alignment with human values is paramount. Machine unlearning -- the ability to selectively forget or suppress specific types of knowledge -- has shown promise for privacy and data removal tasks, which has been the primary focus of existing research. More recently, its potential application to AI safety has gained attention. In this paper, we identify key limitations that prevent unlearning from serving as a comprehensive solution for AI safety, particularly in managing dual-use knowledge in sensitive domains like cybersecurity and chemical, biological, radiological, and nuclear (CBRN) safety. In these contexts, information can be both beneficial and harmful, and models may combine seemingly harmless information for harmful purposes -- unlearning this information could strongly affect beneficial uses. We provide an overview of inherent constraints and open problems, including the broader side effects of unlearning dangerous knowledge, as well as previously unexplored tensions between unlearning and existing safety mechanisms. Finally, we investigate challenges related to evaluation, robustness, and the preservation of safety features during unlearning. By mapping these limitations and open challenges, we aim to guide future research toward realistic applications of unlearning within a broader AI safety framework, acknowledging its limitations and highlighting areas where alternative approaches may be required.
Authors: Benjamin J. Choi (Harvard John A. Paulson School of Engineering and Applied Sciences, Cambridge, MA, United States of America), Griffin Milsap (Johns Hopkins University Applied Physics Laboratory, Laurel, MD, United States of America), Clara A. Scholl (Johns Hopkins University Applied Physics Laboratory, Laurel, MD, United States of America), Francesco Tenore (Johns Hopkins University Applied Physics Laboratory, Laurel, MD, United States of America), Mattson Ogg (Johns Hopkins University Applied Physics Laboratory, Laurel, MD, United States of America)
Abstract: Current machine learning (ML)-based algorithms for filtering electroencephalography (EEG) time series data face challenges related to cumbersome training times, regularization, and accurate reconstruction. To address these shortcomings, we present an ML filtration algorithm driven by a logistic covariance-targeted adversarial denoising autoencoder (TADA). We hypothesize that the expressivity of a targeted, correlation-driven convolutional autoencoder will enable effective time series filtration while minimizing compute requirements (e.g., runtime, model size). Furthermore, we expect that adversarial training with covariance rescaling will minimize signal degradation. To test this hypothesis, a TADA system prototype was trained and evaluated on the task of removing electromyographic (EMG) noise from EEG data in the EEGdenoiseNet dataset, which includes EMG and EEG data from 67 subjects. The TADA filter surpasses conventional signal filtration algorithms across quantitative metrics (Correlation Coefficient, Temporal RRMSE, Spectral RRMSE), and performs competitively against other deep learning architectures at a reduced model size of less than 400,000 trainable parameters. Further experimentation will be necessary to assess the viability of TADA on a wider range of deployment cases.
Authors: HyunGi Kim, Siwon Kim, Jisoo Mok, Sungroh Yoon
Abstract: Deep Neural Networks have spearheaded remarkable advancements in time series forecasting (TSF), one of the major tasks in time series modeling. Nonetheless, the non-stationarity of time series undermines the reliability of pre-trained source time series forecasters in mission-critical deployment settings. In this study, we introduce a pioneering test-time adaptation framework tailored for TSF (TSF-TTA). TAFAS, the proposed approach to TSF-TTA, flexibly adapts source forecasters to continuously shifting test distributions while preserving the core semantic information learned during pre-training. The novel utilization of partially-observed ground truth and gated calibration module enables proactive, robust, and model-agnostic adaptation of source forecasters. Experiments on diverse benchmark datasets and cutting-edge architectures demonstrate the efficacy and generality of TAFAS, especially in long-term forecasting scenarios that suffer from significant distribution shifts. The code is available at https://github.com/kimanki/TAFAS.
Authors: Sara Sameer, Wei Zhang, Xin Lou, Qingyu Yan, Terence Goh, Yulin Gao
Abstract: The surging demand for batteries requires advanced battery management systems, where battery capacity modelling is a key functionality. In this paper, we aim to achieve accurate battery capacity prediction by learning from historical measurements of battery dynamics. We propose GiNet, a gated recurrent units enhanced Informer network, for predicting battery's capacity. The novelty and competitiveness of GiNet lies in its capability of capturing sequential and contextual information from raw battery data and reflecting the battery's complex behaviors with both temporal dynamics and long-term dependencies. We conducted an experimental study based on a publicly available dataset to showcase GiNet's strength of gaining a holistic understanding of battery behavior and predicting battery capacity accurately. GiNet achieves 0.11 mean absolute error for predicting the battery capacity in a sequence of future time slots without knowing the historical battery capacity. It also outperforms the latest algorithms significantly with 27% error reduction on average compared to Informer. The promising results highlight the importance of customized and optimized integration of algorithm and battery knowledge and shed light on other industry applications as well.
Authors: Lukas Moosbrugger, Valentin Seiler, Philipp Wohlgenannt, Sebastian Hegenbart, Sashko Ristov, Peter Kepplinger
Abstract: Accurate load forecasting is crucial for predictive control in many energy domain applications, with significant economic and ecological implications. To address these implications, this study provides an extensive benchmark of state-of-the-art deep learning models for short-term load forecasting in energy communities. Namely, LSTM, xLSTM, and Transformers are compared with benchmarks such as KNNs, synthetic load models, and persistence forecasting models. This comparison considers different scales of aggregation (e.g., number of household loads) and varying training data availability (e.g., training data time spans). Further, the impact of transfer learning from synthetic (standard) load profiles and the deep learning model size (i.e., parameter count) is investigated in terms of forecasting error. Implementations are publicly available and other researchers are encouraged to benchmark models using this framework. Additionally, a comprehensive case study, comprising an energy community of 50 households and a battery storage demonstrates the beneficial financial implications of accurate predictions. Key findings of this research include: (1) Simple persistence benchmarks outperform deep learning models for short-term load forecasting when the available training data is limited to six months or less; (2) Pretraining with publicly available synthetic load profiles improves the normalized Mean Absolute Error (nMAE) by an average of 1.28%pt during the first nine months of training data; (3) Increased aggregation significantly enhances the performance of deep learning models relative to persistence benchmarks; (4) Improved load forecasting, with an nMAE reduction of 1.1%pt, translates to an economic benefit of approximately 600EUR per year in an energy community comprising 50 households.
Authors: Qinghao Zhang, Wenrui Li, Pinjia Zhang
Abstract: The thermal sensitive electrical parameter (TSEP) method is crucial for enhancing the reliability of power devices through junction temperature monitoring. The TSEP method comprises three key processes: calibration, regression, and application. While significant efforts have been devoted to improving regression algorithms and increasing TSEP sensitivity to enhance junction temperature monitoring accuracy, these approaches have reached a bottleneck. In reality, the calibration method significantly influences monitoring accuracy, an aspect often overlooked in conventional TSEP methods. To address this issue, we propose a high-accuracy calibration method for transient TSEPs. First, a temperature compensation strategy based on thermal analysis is introduced to mitigate the temperature difference caused by load current during dual pulse tests. Second, the impact of stray parameters is analyzed to identify coupled parameters, which are typically neglected in existing methods. Third, it is observed that random errors follow a logarithm Gaussian distribution, covering a hidden variable. A neural network is used to obtain the junction temperature predictive model. The proposed calibration method is experimental validated in threshold voltage as an example. Compared with conventional calibration methods, the mean absolute error is reduced by over 30%. Moreover, this method does not require additional hardware cost and has good generalization.
Authors: Hyeonsoo Jo, Hyunjin Hwang, Fanchen Bu, Soo Yong Lee, Chanyoung Park, Kijung Shin
Abstract: Adversarial attacks are allegedly unnoticeable. Prior studies have designed attack noticeability measures on graphs, primarily using statistical tests to compare the topology of original and (possibly) attacked graphs. However, we observe two critical limitations in the existing measures. First, because the measures rely on simple rules, attackers can readily enhance their attacks to bypass them, reducing their attack "noticeability" and, yet, maintaining their attack performance. Second, because the measures naively leverage global statistics, such as degree distributions, they may entirely overlook attacks until severe perturbations occur, letting the attacks be almost "totally unnoticeable." To address the limitations, we introduce HideNSeek, a learnable measure for graph attack noticeability. First, to mitigate the bypass problem, HideNSeek learns to distinguish the original and (potential) attack edges using a learnable edge scorer (LEO), which scores each edge on its likelihood of being an attack. Second, to mitigate the overlooking problem, HideNSeek conducts imbalance-aware aggregation of all the edge scores to obtain the final noticeability score. Using six real-world graphs, we empirically demonstrate that HideNSeek effectively alleviates the observed limitations, and LEO (i.e., our learnable edge scorer) outperforms eleven competitors in distinguishing attack edges under five different attack methods. For an additional application, we show that LEO boost the performance of robust GNNs by removing attack-like edges.
Authors: Tarun Ram Menta, Susmit Agrawal, Chirag Agarwal
Abstract: Large Language Models (LLMs) are prevalent in modern applications but often memorize training data, leading to privacy breaches and copyright issues. Existing research has mainly focused on posthoc analyses, such as extracting memorized content or developing memorization metrics, without exploring the underlying architectural factors that contribute to memorization. In this work, we investigate memorization from an architectural lens by analyzing how attention modules at different layers impact its memorization and generalization performance. Using attribution techniques, we systematically intervene in the LLM architecture by bypassing attention modules at specific blocks while keeping other components like layer normalization and MLP transformations intact. We provide theorems analyzing our intervention mechanism from a mathematical view, bounding the difference in layer outputs with and without our attributions. Our theoretical and empirical analyses reveal that attention modules in deeper transformer blocks are primarily responsible for memorization, whereas earlier blocks are crucial for the models generalization and reasoning capabilities. We validate our findings through comprehensive experiments on different LLM families (Pythia and GPTNeo) and five benchmark datasets. Our insights offer a practical approach to mitigate memorization in LLMs while preserving their performance, contributing to safer and more ethical deployment in real world applications.
Authors: Xuran Zheng, Chang D. Yoo
Abstract: In recent years, large language models have had a very impressive performance, which largely contributed to the development and application of artificial intelligence, and the parameters and performance of the models are still growing rapidly. In particular, multimodal large language models (MLLM) can combine multiple modalities such as pictures, videos, sounds, texts, etc., and have great potential in various tasks. However, most MLLMs require very high computational resources, which is a major challenge for most researchers and developers. In this paper, we explored the utility of small-scale MLLMs and applied small-scale MLLMs to the field of autonomous driving. We hope that this will advance the application of MLLMs in real-world scenarios.
Authors: Yoseob Han
Abstract: Objective: X-ray computed tomography employing sparse projection views has emerged as a contemporary technique to mitigate radiation dose. However, due to the inadequate number of projection views, an analytic reconstruction method utilizing filtered backprojection results in severe streaking artifacts. Recently, deep learning strategies employing image-domain networks have demonstrated remarkable performance in eliminating the streaking artifact caused by analytic reconstruction methods with sparse projection views. Nevertheless, it is difficult to clarify the theoretical justification for applying deep learning to sparse view CT reconstruction, and it has been understood as restoration by removing image artifacts, not reconstruction. Approach: By leveraging the theory of deep convolutional framelets and the hierarchical decomposition of measurement, this research reveals the constraints of conventional image- and projection-domain deep learning methodologies, subsequently, the research proposes a novel dual-domain deep learning framework utilizing hierarchical decomposed measurements. Specifically, the research elucidates how the performance of the projection-domain network can be enhanced through a low-rank property of deep convolutional framelets and a bowtie support of hierarchical decomposed measurement in the Fourier domain. Main Results: This study demonstrated performance improvement of the proposed framework based on the low-rank property, resulting in superior reconstruction performance compared to conventional analytic and deep learning methods. Significance: By providing a theoretically justified deep learning approach for sparse-view CT reconstruction, this study not only offers a superior alternative to existing methods but also opens new avenues for research in medical imaging.
Authors: Yixuan Yang, Xingyu Fang, Zhaowen Cheng, Pengju Yan, Xiaolin Li
Abstract: Molecular conformation generation plays key roles in computational drug design. Recently developed deep learning methods, particularly diffusion models have reached competitive performance over traditional cheminformatical approaches. However, these methods are often time-consuming or require extra support from traditional methods. We propose EquiBoost, a boosting model that stacks several equivariant graph transformers as weak learners, to iteratively refine 3D conformations of molecules. Without relying on diffusion techniques, EquiBoost balances accuracy and efficiency more effectively than diffusion-based methods. Notably, compared to the previous state-of-the-art diffusion method, EquiBoost improves generation quality and preserves diversity, achieving considerably better precision of Average Minimum RMSD (AMR) on the GEOM datasets. This work rejuvenates boosting and sheds light on its potential to be a robust alternative to diffusion models in certain scenarios.
Authors: William T. Lunardi, Abdulrahman Banabila, Dania Herzalla, Martin L. Andreoni
Abstract: Anomaly detection involves identifying data patterns that deviate from the anticipated norm. Traditional methods struggle in high-dimensional spaces due to the curse of dimensionality. In recent years, self-supervised learning, particularly through contrastive objectives, has driven advances in anomaly detection. However, vanilla contrastive learning struggles to align with the unique demands of anomaly detection, as it lacks a pretext task tailored to the homogeneous nature of In-Distribution (ID) data and the diversity of Out-of-Distribution (OOD) anomalies. Methods that attempt to address these challenges, such as introducing hard negatives through synthetic outliers, Outlier Exposure (OE), and supervised objectives, often rely on pretext tasks that fail to balance compact clustering of ID samples with sufficient separation from OOD data. In this work, we propose Focused In-distribution Representation Modeling (FIRM), a contrastive learning objective specifically designed for anomaly detection. Unlike existing approaches, FIRM incorporates synthetic outliers into its pretext task in a way that actively shapes the representation space, promoting compact clustering of ID samples while enforcing strong separation from outliers. This formulation addresses the challenges of class collision, enhancing both the compactness of ID representations and the discriminative power of the learned feature space. We show that FIRM surpasses other contrastive methods in standard benchmarks, significantly enhancing anomaly detection compared to both traditional and supervised contrastive learning objectives. Our ablation studies confirm that FIRM consistently improves the quality of representations and shows robustness across a range of scoring methods. The code is available at: https://github.com/willtl/firm.
Authors: Drago Plecko, Paul Secombe, Andrea Clarke, Amelia Fiske, Samarra Toby, Donisha Duff, David Pilcher, Leo Anthony Celi, Rinaldo Bellomo, Elias Bareinboim
Abstract: The new era of large-scale data collection and analysis presents an opportunity for diagnosing and understanding the causes of health inequities. In this study, we describe a framework for systematically analyzing health disparities using causal inference. The framework is illustrated by investigating racial and ethnic disparities in intensive care unit (ICU) outcome between majority and minority groups in Australia (Indigenous vs. Non-Indigenous) and the United States (African-American vs. White). We demonstrate that commonly used statistical measures for quantifying inequity are insufficient, and focus on attributing the observed disparity to the causal mechanisms that generate it. We find that minority patients are younger at admission, have worse chronic health, are more likely to be admitted for urgent and non-elective reasons, and have higher illness severity. At the same time, however, we find a protective direct effect of belonging to a minority group, with minority patients showing improved survival compared to their majority counterparts, with all other variables kept equal. We demonstrate that this protective effect is related to the increased probability of being admitted to ICU, with minority patients having an increased risk of ICU admission. We also find that minority patients, while showing improved survival, are more likely to be readmitted to ICU. Thus, due to worse access to primary health care, minority patients are more likely to end up in ICU for preventable conditions, causing a reduction in the mortality rates and creating an effect that appears to be protective. Since the baseline risk of ICU admission may serve as proxy for lack of access to primary care, we developed the Indigenous Intensive Care Equity (IICE) Radar, a monitoring system for tracking the over-utilization of ICU resources by the Indigenous population of Australia across geographical areas.
Authors: Laura Puccioni, Alireza Farshin, Mariano Scazzariello, Changjie Wang, Marco Chiesa, Dejan Kostic
Abstract: Large Language Models (LLMs) have demonstrated their exceptional performance in various complex code generation tasks. However, their broader adoption is limited by significant computational demands and high resource requirements, particularly memory and processing power. To mitigate such requirements, model pruning techniques are used to create more compact models with significantly fewer parameters. However, current approaches do not focus on the efficient extraction of programming-language-specific sub-models. In this work, we explore the idea of efficiently deriving coding-specific sub-models through unstructured pruning (i.e., Wanda). We investigate the impact of different domain-specific calibration datasets on pruning outcomes across three distinct domains and extend our analysis to extracting four language-specific sub-models: Python, Java, C++, and JavaScript. We are the first to efficiently extract programming-language-specific sub-models using appropriate calibration datasets while maintaining acceptable accuracy w.r.t. full models. We are also the first to provide analytical evidence that domain-specific tasks activate distinct regions within LLMs, supporting the creation of specialized sub-models through unstructured pruning. We believe that this work has significant potential to enhance LLM accessibility for coding by reducing computational requirements to enable local execution on consumer-grade hardware, and supporting faster inference times critical for real-time development feedback.
Authors: Emilia Magnani, Ernesto De Vito, Philipp Hennig, Lorenzo Rosasco
Abstract: We consider the problem of learning convolution operators associated to compact Abelian groups. We study a regularization-based approach and provide corresponding learning guarantees, discussing natural regularity condition on the convolution kernel. More precisely, we assume the convolution kernel is a function in a translation invariant Hilbert space and analyze a natural ridge regression (RR) estimator. Building on existing results for RR, we characterize the accuracy of the estimator in terms of finite sample bounds. Interestingly, regularity assumptions which are classical in the analysis of RR, have a novel and natural interpretation in terms of space/frequency localization. Theoretical results are illustrated by numerical simulations.
Authors: Hesham G. Moussa, Arashmid Akhavain, S. Maryam Hosseini, Bill McCormick
Abstract: Machine learning models have achieved, and in some cases surpassed, human-level performance in various tasks, mainly through centralized training of static models and the use of large models stored in centralized clouds for inference. However, this centralized approach has several drawbacks, including privacy concerns, high storage demands, a single point of failure, and significant computing requirements. These challenges have driven interest in developing alternative decentralized and distributed methods for AI training and inference. Distribution introduces additional complexity, as it requires managing multiple moving parts. To address these complexities and fill a gap in the development of distributed AI systems, this work proposes a novel framework, Data and Dynamics-Aware Inference and Training Networks (DA-ITN). The different components of DA-ITN and their functions are explored, and the associated challenges and research areas are highlighted.
Authors: Dmytro Kuzmenko, Nadiya Shvai
Abstract: We propose an efficient knowledge transfer approach for model-based reinforcement learning, addressing the challenge of deploying large world models in resource-constrained environments. Our method distills a high-capacity multi-task agent (317M parameters) into a compact 1M parameter model, achieving state-of-the-art performance on the MT30 benchmark with a normalized score of 28.45, a substantial improvement over the original 1M parameter model's score of 18.93. This demonstrates the ability of our distillation technique to consolidate complex multi-task knowledge effectively. Additionally, we apply FP16 post-training quantization, reducing the model size by 50% while maintaining performance. Our work bridges the gap between the power of large models and practical deployment constraints, offering a scalable solution for efficient and accessible multi-task reinforcement learning in robotics and other resource-limited domains.
Authors: Ari Blonda, Shan Gao, Hamed Hatami, Pooya Hatami
Abstract: Two seminal papers--Alon, Livni, Malliaris, Moran (STOC 2019) and Bun, Livni, and Moran (FOCS 2020)--established the equivalence between online learnability and globally stable PAC learnability in binary classification. However, Chase, Chornomaz, Moran, and Yehudayoff (STOC 2024) recently showed that this equivalence does not hold in the agnostic setting. Specifically, they proved that in the agnostic setting, only finite hypothesis classes are globally stable learnable. Therefore, agnostic global stability is too restrictive to capture interesting hypothesis classes. To address this limitation, Chase \emph{et al.} introduced two relaxations of agnostic global stability. In this paper, we characterize the classes that are learnable under their proposed relaxed conditions, resolving the two open problems raised in their work. First, we prove that in the setting where the stability parameter can depend on the excess error (the gap between the learner's error and the best achievable error by the hypothesis class), agnostic stability is fully characterized by the Littlestone dimension. Consequently, as in the realizable case, this form of learnability is equivalent to online learnability. As part of the proof of this theorem, we strengthen the celebrated result of Bun et al. by showing that classes with infinite Littlestone dimension are not stably PAC learnable, even if we allow the stability parameter to depend on the excess error. For the second relaxation proposed by Chase et al., we prove that only finite hypothesis classes are globally stable learnable even if we restrict the agnostic setting to distributions with small population loss.
Authors: Chong Liu, Dan Qiao, Ming Yin, Ilija Bogunovic, Yu-Xiang Wang
Abstract: This work studies linear bandits under a new notion of gap-adjusted misspecification and is an extension of Liu et al. (2023). When the underlying reward function is not linear, existing linear bandits work usually relies on a uniform misspecification parameter $\epsilon$ that measures the sup-norm error of the best linear approximation. This results in an unavoidable linear regret whenever $\epsilon > 0$. We propose a more natural model of misspecification which only requires the approximation error at each input $x$ to be proportional to the suboptimality gap at $x$. It captures the intuition that, for optimization problems, near-optimal regions should matter more and we can tolerate larger approximation errors in suboptimal regions. Quite surprisingly, we show that the classical LinUCB algorithm -- designed for the realizable case -- is automatically robust against such $\rho$-gap-adjusted misspecification with parameter $\rho$ diminishing at $O(1/(d \sqrt{\log T}))$. It achieves a near-optimal $O(\sqrt{T})$ regret for problems that the best-known regret is almost linear in time horizon $T$. We further advance this frontier by presenting a novel phased elimination-based algorithm whose gap-adjusted misspecification parameter $\rho = O(1/\sqrt{d})$ does not scale with $T$. This algorithm attains optimal $O(\sqrt{T})$ regret and is deployment-efficient, requiring only $\log T$ batches of exploration. It also enjoys an adaptive $O(\log T)$ regret when a constant suboptimality gap exists. Technically, our proof relies on a novel self-bounding argument that bounds the part of the regret due to misspecification by the regret itself, and a new inductive lemma that limits the misspecification error within the suboptimality gap for all valid actions in each batch selected by G-optimal design.
Authors: Valentin De Bortoli, Alexandre Galashov, Arthur Gretton, Arnaud Doucet
Abstract: Speculative sampling is a popular technique for accelerating inference in Large Language Models by generating candidate tokens using a fast draft model and accepting or rejecting them based on the target model's distribution. While speculative sampling was previously limited to discrete sequences, we extend it to diffusion models, which generate samples via continuous, vector-valued Markov chains. In this context, the target model is a high-quality but computationally expensive diffusion model. We propose various drafting strategies, including a simple and effective approach that does not require training a draft model and is applicable out of the box to any diffusion model. Our experiments demonstrate significant generation speedup on various diffusion models, halving the number of function evaluations, while generating exact samples from the target model.
Authors: Maximilian Dreyer, Jim Berend, Tobias Labarta, Johanna Vielhaben, Thomas Wiegand, Sebastian Lapuschkin, Wojciech Samek
Abstract: Unlike human-engineered systems such as aeroplanes, where each component's role and dependencies are well understood, the inner workings of AI models remain largely opaque, hindering verifiability and undermining trust. This paper introduces SemanticLens, a universal explanation method for neural networks that maps hidden knowledge encoded by components (e.g., individual neurons) into the semantically structured, multimodal space of a foundation model such as CLIP. In this space, unique operations become possible, including (i) textual search to identify neurons encoding specific concepts, (ii) systematic analysis and comparison of model representations, (iii) automated labelling of neurons and explanation of their functional roles, and (iv) audits to validate decision-making against requirements. Fully scalable and operating without human input, SemanticLens is shown to be effective for debugging and validation, summarizing model knowledge, aligning reasoning with expectations (e.g., adherence to the ABCDE-rule in melanoma classification), and detecting components tied to spurious correlations and their associated training data. By enabling component-level understanding and validation, the proposed approach helps bridge the "trust gap" between AI models and traditional engineered systems. We provide code for SemanticLens on https://github.com/jim-berend/semanticlens and a demo on https://semanticlens.hhi-research-insights.eu.
URLs: https://github.com/jim-berend/semanticlens, https://semanticlens.hhi-research-insights.eu.
Authors: Armando Collado-Villaverde, Pablo Mu\~noz, Maria D. R-Moreno
Abstract: Missing data in time-series analysis poses significant challenges, affecting the reliability of downstream applications. Imputation, the process of estimating missing values, has emerged as a key solution. This paper introduces BRATI, a novel deep-learning model designed to address multivariate time-series imputation by combining Bidirectional Recurrent Networks and Attention mechanisms. BRATI processes temporal dependencies and feature correlations across long and short time horizons, utilizing two imputation blocks that operate in opposite temporal directions. Each block integrates recurrent layers and attention mechanisms to effectively resolve long-term dependencies. We evaluate BRATI on three real-world datasets under diverse missing-data scenarios: randomly missing values, fixed-length missing sequences, and variable-length missing sequences. Our findings demonstrate that BRATI consistently outperforms state-of-the-art models, delivering superior accuracy and robustness in imputing multivariate time-series data.
Authors: Yu-Hao Huang, Chang Xu, Yueying Wu, Wu-Jun Li, Jiang Bian
Abstract: Time series generation models are crucial for applications like data augmentation and privacy preservation. Most existing time series generation models are typically designed to generate data from one specified domain. While leveraging data from other domain for better generalization is proved to work in other application areas, this approach remains challenging for time series modeling due to the large divergence in patterns among different real world time series categories. In this paper, we propose a multi-domain time series diffusion model with domain prompts, named TimeDP. In TimeDP, we utilize a time series semantic prototype module which defines time series prototypes to represent time series basis, each prototype vector serving as "word" representing some elementary time series feature. A prototype assignment module is applied to extract the extract domain specific prototype weights, for learning domain prompts as generation condition. During sampling, we extract "domain prompt" with few-shot samples from the target domain and use the domain prompts as condition to generate time series samples. Experiments demonstrate that our method outperforms baselines to provide the state-of-the-art in-domain generation quality and strong unseen domain generation capability.
Authors: Gerald Tesauro, Gregory R. Galperin
Abstract: We present a Monte-Carlo simulation algorithm for real-time policy improvement of an adaptive controller. In the Monte-Carlo simulation, the long-term expected reward of each possible action is statistically measured, using the initial policy to make decisions in each step of the simulation. The action maximizing the measured expected reward is then taken, resulting in an improved policy. Our algorithm is easily parallelizable and has been implemented on the IBM SP1 and SP2 parallel-RISC supercomputers. We have obtained promising initial results in applying this algorithm to the domain of backgammon. Results are reported for a wide variety of initial policies, ranging from a random policy to TD-Gammon, an extremely strong multi-layer neural network. In each case, the Monte-Carlo algorithm gives a substantial reduction, by as much as a factor of 5 or more, in the error rate of the base players. The algorithm is also potentially useful in many other adaptive control applications in which it is possible to simulate the environment.
Authors: Weihua Cheng, Hanwen Du, Chunxiao Li, Ersheng Ni, Liangdi Tan, Tianqi Xu, Yongxin Ni
Abstract: Knowledge Tracing (KT) is crucial in education assessment, which focuses on depicting students' learning states and assessing students' mastery of subjects. With the rise of modern online learning platforms, particularly massive open online courses (MOOCs), an abundance of interaction data has greatly advanced the development of the KT technology. Previous research commonly adopts deterministic representation to capture students' knowledge states, which neglects the uncertainty during student interactions and thus fails to model the true knowledge state in learning process. In light of this, we propose an Uncertainty-Aware Knowledge Tracing model (UKT) which employs stochastic distribution embeddings to represent the uncertainty in student interactions, with a Wasserstein self-attention mechanism designed to capture the transition of state distribution in student learning behaviors. Additionally, we introduce the aleatory uncertainty-aware contrastive learning loss, which strengthens the model's robustness towards different types of uncertainties. Extensive experiments on six real-world datasets demonstrate that UKT not only significantly surpasses existing deep learning-based models in KT prediction, but also shows unique advantages in handling the uncertainty of student interactions.
Authors: Yiwen Huang, Aaron Gokaslan, Volodymyr Kuleshov, James Tompkin
Abstract: There is a widely-spread claim that GANs are difficult to train, and GAN architectures in the literature are littered with empirical tricks. We provide evidence against this claim and build a modern GAN baseline in a more principled manner. First, we derive a well-behaved regularized relativistic GAN loss that addresses issues of mode dropping and non-convergence that were previously tackled via a bag of ad-hoc tricks. We analyze our loss mathematically and prove that it admits local convergence guarantees, unlike most existing relativistic losses. Second, our new loss allows us to discard all ad-hoc tricks and replace outdated backbones used in common GANs with modern architectures. Using StyleGAN2 as an example, we present a roadmap of simplification and modernization that results in a new minimalist baseline -- R3GAN. Despite being simple, our approach surpasses StyleGAN2 on FFHQ, ImageNet, CIFAR, and Stacked MNIST datasets, and compares favorably against state-of-the-art GANs and diffusion models.
Authors: Joris Bekkers
Abstract: Pressing is a fundamental defensive strategy in football, characterized by applying pressure on the ball owning team to regain possession. Despite its significance, existing metrics for measuring pressing often lack precision or comprehensive consideration of positional data, player movement and speed. This research introduces an innovative framework for quantifying pressing intensity, leveraging advancements in positional tracking data and components from Spearman's Pitch Control model. Our method integrates player velocities, movement directions, and reaction times to compute the time required for a defender to intercept an attacker or the ball. This time-to-intercept measure is then transformed into probabilistic values using a logistic function, enabling dynamic and intuitive analysis of pressing situations at the individual frame level. the model captures how every player's movement influences pressure on the field, offering actionable insights for coaches, analysts, and decision-makers. By providing a robust and intepretable metric, our approach facilitates the identification of pressing strategies, advanced situational analyses, and the derivation of metrics, advancing the analytical capabilities for modern football.
Authors: Shashwat Kumar, Arafat Rahman, Robert Gutierrez, Sarah Livermon, Allison N. McCrady, Silvia Blemker, Rebecca Scharf, Anuj Srivastava, Laura E. Barnes
Abstract: Clinical assessments for neuromuscular disorders, such as Spinal Muscular Atrophy (SMA) and Duchenne Muscular Dystrophy (DMD), continue to rely on subjective measures to monitor treatment response and disease progression. We introduce a novel method using wearable sensors to objectively assess motor function during daily activities in 19 patients with DMD, 9 with SMA, and 13 age-matched controls. Pediatric movement data is complex due to confounding factors such as limb length variations in growing children and variability in movement speed. Our approach uses Shape-based Principal Component Analysis to align movement trajectories and identify distinct kinematic patterns, including variations in motion speed and asymmetry. Both DMD and SMA cohorts have individuals with motor function on par with healthy controls. Notably, patients with SMA showed greater activation of the motion asymmetry pattern. We further combined projections on these principal components with partial least squares (PLS) to identify a covariation mode with a canonical correlation of r = 0.78 (95% CI: [0.34, 0.94]) with muscle fat infiltration, the Brooke score (a motor function score), and age-related degenerative changes, proposing a novel motor function index. This data-driven method can be deployed in home settings, enabling better longitudinal tracking of treatment efficacy for children with neuromuscular disorders.
Authors: Joe Omatoi, Abdul M Mohammed, Dennis Trujillo
Abstract: This study explores how causal inference models, specifically the Linear Non-Gaussian Acyclic Model (LiNGAM), can extract causal relationships between demographic factors, treatments, conditions, and outcomes from observational patient data, enabling insights beyond correlation. Unlike traditional randomized controlled trials (RCTs), which establish causal relationships within narrowly defined populations, our method leverages broader observational data, improving generalizability. Using over 40 features in the Duke MRI Breast Cancer dataset, we found that Adjuvant Anti-Her2 Neu Therapy increased local recurrence-free survival by 169 days, while Skin/Nipple involvement reduced it by 351 days. These findings highlight the therapy's importance for Her2-positive patients and the need for targeted interventions for high-risk cases, informing personalized treatment strategies.
Authors: Cuihui Xia, Lei Yue, Deliang Chen, Yuyang Li, Hongqiang Yang, Ancheng Xue, Zhiqiang Li, Qing He, Guoqing Zhang, Dambaru Ballab Kattel, Lei Lei, Ming Zhou
Abstract: Traditional equation-driven hydrological models often struggle to accurately predict streamflow in challenging regional Earth systems like the Tibetan Plateau, while hybrid and existing algorithm-driven models face difficulties in interpreting hydrological behaviors. This work introduces HydroTrace, an algorithm-driven, data-agnostic model that substantially outperforms these approaches, achieving a Nash-Sutcliffe Efficiency of 98% and demonstrating strong generalization on unseen data. Moreover, HydroTrace leverages advanced attention mechanisms to capture spatial-temporal variations and feature-specific impacts, enabling the quantification and spatial resolution of streamflow partitioning as well as the interpretation of hydrological behaviors such as glacier-snow-streamflow interactions and monsoon dynamics. Additionally, a large language model (LLM)-based application allows users to easily understand and apply HydroTrace's insights for practical purposes. These advancements position HydroTrace as a transformative tool in hydrological and broader Earth system modeling, offering enhanced prediction accuracy and interpretability.
Authors: Rancy Chepchirchir, Jill Sunday, Raymond Confidence, Dong Zhang, Talha Chaudhry, Udunna C. Anazodo, Kendi Muchungi, Yujing Zou
Abstract: In Sub-Saharan Africa (SSA), the utilization of lower-quality Magnetic Resonance Imaging (MRI) technology raises questions about the applicability of machine learning methods for clinical tasks. This study aims to provide a robust deep learning-based brain tumor segmentation (BraTS) method tailored for the SSA population using a threefold approach. Firstly, the impact of domain shift from the SSA training data on model efficacy was examined, revealing no significant effect. Secondly, a comparative analysis of 3D and 2D full-resolution models using the nnU-Net framework indicates similar performance of both the models trained for 300 epochs achieving a five-fold cross-validation score of 0.93. Lastly, addressing the performance gap observed in SSA validation as opposed to the relatively larger BraTS glioma (GLI) validation set, two strategies are proposed: fine-tuning SSA cases using the GLI+SSA best-pretrained 2D fullres model at 300 epochs, and introducing a novel neural style transfer-based data augmentation technique for the SSA cases. This investigation underscores the potential of enhancing brain tumor prediction within SSA's unique healthcare landscape.
Authors: Victor Nascimento Ribeiro, Nina S. T. Hirata
Abstract: Video-based Automatic License Plate Recognition (ALPR) involves extracting vehicle license plate text information from video captures. Traditional systems typically rely heavily on high-end computing resources and utilize multiple frames to recognize license plates, leading to increased computational overhead. In this paper, we propose two methods capable of efficiently extracting exactly one frame per vehicle and recognizing its license plate characters from this single image, thus significantly reducing computational demands. The first method uses Visual Rhythm (VR) to generate time-spatial images from videos, while the second employs Accumulative Line Analysis (ALA), a novel algorithm based on single-line video processing for real-time operation. Both methods leverage YOLO for license plate detection within the frame and a Convolutional Neural Network (CNN) for Optical Character Recognition (OCR) to extract textual information. Experiments on real videos demonstrate that the proposed methods achieve results comparable to traditional frame-by-frame approaches, with processing speeds three times faster.
Authors: Kirandeep Kaur, Manya Chadha, Vinayak Gupta, Chirag Shah
Abstract: Conventional recommendation systems (RSs) are typically optimized to enhance performance metrics uniformly across all training samples, inadvertently overlooking the needs of diverse user populations. The performance disparity among various populations can harm the model's robustness to sub-populations due to the varying user properties. While large language models (LLMs) show promise in enhancing RS performance, their practical applicability is hindered by high costs, inference latency, and degraded performance on long user queries. To address these challenges, we propose a hybrid task allocation framework designed to promote social good by equitably serving all user groups. By adopting a two-phase approach, we promote a strategic assignment of tasks for efficient and responsible adaptation of LLMs. Our strategy works by first identifying the weak and inactive users that receive a suboptimal ranking performance by RSs. Next, we use an in-context learning approach for such users, wherein each user interaction history is contextualized as a distinct ranking task. We evaluate our hybrid framework by incorporating eight different recommendation algorithms and three different LLMs -- both open and close-sourced. Our results on three real-world datasets show a significant reduction in weak users and improved robustness to subpopulations without disproportionately escalating costs.
Authors: Srikar Yellapragada, Kowshik Thopalli, Vivek Narayanaswamy, Wesam Sakla, Yang Liu, Yamen Mubarka, Dimitris Samaras, Jayaraman J. Thiagarajan
Abstract: Vision Transformers (ViTs) have shown success across a variety of tasks due to their ability to capture global image representations. Recent studies have identified the existence of high-norm tokens in ViTs, which can interfere with unsupervised object discovery. To address this, the use of "registers" which are additional tokens that isolate high norm patch tokens while capturing global image-level information has been proposed. While registers have been studied extensively for object discovery, their generalization properties particularly in out-of-distribution (OOD) scenarios, remains underexplored. In this paper, we examine the utility of register token embeddings in providing additional features for improving generalization and anomaly rejection. To that end, we propose a simple method that combines the special CLS token embedding commonly employed in ViTs with the average-pooled register embeddings to create feature representations which are subsequently used for training a downstream classifier. We find that this enhances OOD generalization and anomaly rejection, while maintaining in-distribution (ID) performance. Extensive experiments across multiple ViT backbones trained with and without registers reveal consistent improvements of 2-4\% in top-1 OOD accuracy and a 2-3\% reduction in false positive rates for anomaly detection. Importantly, these gains are achieved without additional computational overhead.
Authors: Gianfranco Cortes, Baba C. Vemuri
Abstract: Nonrigid registration is vital to medical image analysis but remains challenging for diffusion MRI (dMRI) due to its high-dimensional, orientation-dependent nature. While classical methods are accurate, they are computationally demanding, and deep neural networks, though efficient, have been underexplored for nonrigid dMRI registration compared to structural imaging. We present a novel, deep learning framework for model-free, nonrigid registration of raw diffusion MRI data that does not require explicit reorientation. Unlike previous methods relying on derived representations such as diffusion tensors or fiber orientation distribution functions, in our approach, we formulate the registration as an equivariant diffeomorphism of position-and-orientation space. Central to our method is an $\mathsf{SE}(3)$-equivariant UNet that generates velocity fields while preserving the geometric properties of a raw dMRI's domain. We introduce a new loss function based on the maximum mean discrepancy in Fourier space, implicitly matching ensemble average propagators across images. Experimental results on Human Connectome Project dMRI data demonstrate competitive performance compared to state-of-the-art approaches, with the added advantage of bypassing the overhead for estimating derived representations. This work establishes a foundation for data-driven, geometry-aware dMRI registration directly in the acquisition space.
Authors: Md Saif Hassan Onim, Travis S. Humble, Himanshu Thapliyal
Abstract: Stress can increase the possibility of cognitive impairment and decrease the quality of life in older adults. Smart healthcare can deploy quantum machine learning to enable preventive and diagnostic support. This work introduces a unique technique to address stress detection as an anomaly detection problem that uses quantum hybrid support vector machines. With the help of a wearable smartwatch, we mapped baseline sensor reading as normal data and stressed sensor reading as anomaly data using cortisol concentration as the ground truth. We have used quantum computing techniques to explore the complex feature spaces with kernel-based preprocessing. We illustrate the usefulness of our method by doing experimental validation on 40 older adults with the help of the TSST protocol. Our findings highlight that using a limited number of features, quantum machine learning provides improved accuracy compared to classical methods. We also observed that the recall value using quantum machine learning is higher compared to the classical method. The higher recall value illustrates the potential of quantum machine learning in healthcare, as missing anomalies could result in delayed diagnostics or treatment.
Authors: J. Kvapil (Los Alamos National Laboratory), G. Borca-Tasciuc (Rensselaer Polytechnic Institute), H. Bossi (Massachusetts Institute of Technology), K. Chen (Central China Normal University), Y. Chen (Central China Normal University), Y. Corrales Morales (Massachusetts Institute of Technology), H. Da Costa (Los Alamos National Laboratory), C. Da Silva (Los Alamos National Laboratory), C. Dean (Massachusetts Institute of Technology), J. Durham (Los Alamos National Laboratory), S. Fu (University of North Texas), C. Hao (Georgia Institute of Technology), P. Harris (Massachusetts Institute of Technology), O. Hen (Massachusetts Institute of Technology), H. Jheng (Massachusetts Institute of Technology), Y. Lee (Massachusetts Institute of Technology), P. Li (Georgia Institute of Technology), X. Li (Los Alamos National Laboratory), Y. Lin (Los Alamos National Laboratory), M. X. Liu (Los Alamos National Laboratory), V. Loncar (Massachusetts Institute of Technology), J. P. Mitrevski (Fermilab), A. Olvera (University of North Texas), M. L. Purschke (Brookhaven National Laboratory), J. S. Renck (Los Alamos National Laboratory), G. Roland (Massachusetts Institute of Technology), J. Schambach (Oak Ridge National Laboratory), Z. Shi (Los Alamos National Laboratory), N. Tran (Fermilab), N. Wuerfel (University of Michigan), B. Xu (Georgia Institute of Technology), D. Yu (New Jersey Institute of Technology), H. Zhang (Georgia Institute of Technology)
Abstract: This R\&D project, initiated by the DOE Nuclear Physics AI-Machine Learning initiative in 2022, leverages AI to address data processing challenges in high-energy nuclear experiments (RHIC, LHC, and future EIC). Our focus is on developing a demonstrator for real-time processing of high-rate data streams from sPHENIX experiment tracking detectors. The limitations of a 15 kHz maximum trigger rate imposed by the calorimeters can be negated by intelligent use of streaming technology in the tracking system. The approach efficiently identifies low momentum rare heavy flavor events in high-rate p+p collisions (3MHz), using Graph Neural Network (GNN) and High Level Synthesis for Machine Learning (hls4ml). Success at sPHENIX promises immediate benefits, minimizing resources and accelerating the heavy-flavor measurements. The approach is transferable to other fields. For the EIC, we develop a DIS-electron tagger using Artificial Intelligence - Machine Learning (AI-ML) algorithms for real-time identification, showcasing the transformative potential of AI and FPGA technologies in high-energy nuclear and particle experiments real-time data processing pipelines.
Authors: Jinhang Chai, Elynn Chen, Jianqing Fan
Abstract: In dynamic decision-making scenarios across business and healthcare, leveraging sample trajectories from diverse populations can significantly enhance reinforcement learning (RL) performance for specific target populations, especially when sample sizes are limited. While existing transfer learning methods primarily focus on linear regression settings, they lack direct applicability to reinforcement learning algorithms. This paper pioneers the study of transfer learning for dynamic decision scenarios modeled by non-stationary finite-horizon Markov decision processes, utilizing neural networks as powerful function approximators and backward inductive learning. We demonstrate that naive sample pooling strategies, effective in regression settings, fail in Markov decision processes.To address this challenge, we introduce a novel ``re-weighted targeting procedure'' to construct ``transferable RL samples'' and propose ``transfer deep $Q^*$-learning'', enabling neural network approximation with theoretical guarantees. We assume that the reward functions are transferable and deal with both situations in which the transition densities are transferable or nontransferable. Our analytical techniques for transfer learning in neural network approximation and transition density transfers have broader implications, extending to supervised transfer learning with neural networks and domain shift scenarios. Empirical experiments on both synthetic and real datasets corroborate the advantages of our method, showcasing its potential for improving decision-making through strategically constructing transferable RL samples in non-stationary reinforcement learning contexts.
Authors: Kaitlyn J. Lee, Alejandro Schuler
Abstract: Answering causal questions often involves estimating linear functionals of conditional expectations, such as the average treatment effect or the effect of a longitudinal modified treatment policy. By the Riesz representation theorem, these functionals can be expressed as the expected product of the conditional expectation of the outcome and the Riesz representer, a key component in doubly robust estimation methods. Traditionally, the Riesz representer is estimated indirectly by deriving its explicit analytical form, estimating its components, and substituting these estimates into the known form (e.g., the inverse propensity score). However, deriving or estimating the analytical form can be challenging, and substitution methods are often sensitive to practical positivity violations, leading to higher variance and wider confidence intervals. In this paper, we propose a novel gradient boosting algorithm to directly estimate the Riesz representer without requiring its explicit analytical form. This method is particularly suited for tabular data, offering a flexible, nonparametric, and computationally efficient alternative to existing methods for Riesz regression. Through simulation studies, we demonstrate that our algorithm performs on par with or better than indirect estimation techniques across a range of functionals, providing a user-friendly and robust solution for estimating causal quantities.
Authors: Tommaso Soru, Jim Marshall
Abstract: In the constantly changing field of data-driven decision making, accurately predicting future events is crucial for strategic planning in various sectors. The emergence of Large Language Models (LLMs) marks a significant advancement in this area, offering advanced tools that utilise extensive text data for prediction. In this industry paper, we introduce a novel method for AI-driven foresight using LLMs. Building on top of previous research, we employ data on current trends and their trajectories for generating forecasts on 15 different topics. Subsequently, we estimate their probabilities via a multi-step approach based on log probabilities. We show we achieve a Brier score of 0.186, meaning a +26% improvement over random chance and a +19% improvement over widely-available AI systems.
Authors: Matteo Ravasi
Abstract: Solving inverse problems with the reverse process of a diffusion model represents an appealing avenue to produce highly realistic, yet diverse solutions from incomplete and possibly noisy measurements, ultimately enabling uncertainty quantification at scale. However, because of the intractable nature of the score function of the likelihood term (i.e., $\nabla_{\mathbf{x}_t} p(\mathbf{y} | \mathbf{x}_t)$), various samplers have been proposed in the literature that use different (more or less accurate) approximations of such a gradient to guide the diffusion process towards solutions that match the observations. In this work, I consider two sampling algorithms recently proposed under the name of Diffusion Posterior Sampling (DPS) and Pseudo-inverse Guided Diffusion Model (PGDM), respectively. In DSP, the guidance term used at each step of the reverse diffusion process is obtained by applying the adjoint of the modeling operator to the residual obtained from a one-step denoising estimate of the solution. On the other hand, PGDM utilizes a pseudo-inverse operator that originates from the fact that the one-step denoised solution is not assumed to be deterministic, rather modeled as a Gaussian distribution. Through an extensive set of numerical examples on two geophysical inverse problems (namely, seismic interpolation and seismic inversion), I show that two key aspects for the success of any measurement-guided diffusion process are: i) our ability to re-parametrize the inverse problem such that the sought after model is bounded between -1 and 1 (a pre-requisite for any diffusion model); ii) the choice of the training dataset used to learn the implicit prior that guides the reverse diffusion process. Numerical examples on synthetic and field datasets reveal that PGDM outperforms DPS in both scenarios at limited additional cost.
Authors: Yuan Gao, Mu Qiao
Abstract: The growth in the use of online advertising to foster brand awareness over recent years is largely attributable to the ubiquity of social media. One pivotal technology contributing to the success of online brand advertising is frequency capping, a mechanism that enables marketers to control the number of times an ad is shown to a specific user. However, the very foundation of this technology is being scrutinized as the industry gravitates towards advertising solutions that prioritize user privacy. This paper delves into the issue of reach measurement and optimization within the context of $k$-anonymity, a privacy-preserving model gaining traction across major online advertising platforms. We outline how to report reach within this new privacy landscape and demonstrate how probabilistic discounting, a probabilistic adaptation of traditional frequency capping, can be employed to optimize campaign performance. Experiments are performed to assess the trade-off between user privacy and the efficacy of online brand advertising. Notably, we discern a significant dip in performance as long as privacy is introduced, yet this comes with a limited additional cost for advertising platforms to offer their users more privacy.
Authors: Juno Kim, Dimitri Meunier, Arthur Gretton, Taiji Suzuki, Zhu Li
Abstract: We provide a convergence analysis of deep feature instrumental variable (DFIV) regression (Xu et al., 2021), a nonparametric approach to IV regression using data-adaptive features learned by deep neural networks in two stages. We prove that the DFIV algorithm achieves the minimax optimal learning rate when the target structural function lies in a Besov space. This is shown under standard nonparametric IV assumptions, and an additional smoothness assumption on the regularity of the conditional distribution of the covariate given the instrument, which controls the difficulty of Stage 1. We further demonstrate that DFIV, as a data-adaptive algorithm, is superior to fixed-feature (kernel or sieve) IV methods in two ways. First, when the target function possesses low spatial homogeneity (i.e., it has both smooth and spiky/discontinuous regions), DFIV still achieves the optimal rate, while fixed-feature methods are shown to be strictly suboptimal. Second, comparing with kernel-based two-stage regression estimators, DFIV is provably more data efficient in the Stage 1 samples.
Authors: Nathan Phelps, Daniel J. Lizotte, Douglas G. Woolford
Abstract: There is a widespread and longstanding belief that machine learning models are biased towards the majority (or negative) class when learning from imbalanced data, leading them to neglect or ignore the minority (or positive) class. In this study, we show that this belief is not necessarily correct for decision trees, and that their bias can actually be in the opposite direction. Motivated by a recent simulation study that suggested that decision trees can be biased towards the minority class, our paper aims to reconcile the conflict between that study and decades of other works. First, we critically evaluate past literature on this problem, finding that failing to consider the data generating process has led to incorrect conclusions about the bias in decision trees. We then prove that, under specific conditions related to the predictors, decision trees fit to purity and trained on a dataset with only one positive case are biased towards the minority class. Finally, we demonstrate that splits in a decision tree are also biased when there is more than one positive case. Our findings have implications on the use of popular tree-based models, such as random forests.
Authors: Golriz Hosseinimanesh, Farnoosh Ghadiri, Francois Guibault, Farida Cheriet, Julia Keren
Abstract: Designing a dental crown is a time-consuming and labor intensive process. Our goal is to simplify crown design and minimize the tediousness of making manual adjustments while still ensuring the highest level of accuracy and consistency. To this end, we present a new end- to-end deep learning approach, coined Dental Mesh Completion (DMC), to generate a crown mesh conditioned on a point cloud context. The dental context includes the tooth prepared to receive a crown and its surroundings, namely the two adjacent teeth and the three closest teeth in the opposing jaw. We formulate crown generation in terms of completing this point cloud context. A feature extractor first converts the input point cloud into a set of feature vectors that represent local regions in the point cloud. The set of feature vectors is then fed into a transformer to predict a new set of feature vectors for the missing region (crown). Subsequently, a point reconstruction head, followed by a multi-layer perceptron, is used to predict a dense set of points with normals. Finally, a differentiable point-to-mesh layer serves to reconstruct the crown surface mesh. We compare our DMC method to a graph-based convolutional neural network which learns to deform a crown mesh from a generic crown shape to the target geometry. Extensive experiments on our dataset demonstrate the effectiveness of our method, which attains an average of 0.062 Chamfer Distance.The code is available at:https://github.com/Golriz-code/DMC.gi
Authors: Yijun Zuo
Abstract: The least trimmed squares (LTS) estimator is a renowned robust alternative to the classic least squares estimator and is popular in location, regression, machine learning, and AI literature. Many studies exist on LTS, including its robustness, computation algorithms, extension to non-linear cases, asymptotics, etc. The LTS has been applied in the penalized regression in a high-dimensional real-data sparse-model setting where dimension $p$ (in thousands) is much larger than sample size $n$ (in tens, or hundreds). In such a practical setting, the sample size $n$ often is the count of sub-population that has a special attribute (e.g. the count of patients of Alzheimer's, Parkinson's, Leukemia, or ALS, etc.) among a population with a finite fixed size N. Asymptotic analysis assuming that $n$ tends to infinity is not practically convincing and legitimate in such a scenario. A non-asymptotic or finite sample analysis will be more desirable and feasible. This article establishes some finite sample (non-asymptotic) error bounds for estimating and predicting based on LTS with high probability for the first time.
Authors: Zixuan Ke, Yifei Ming, Xuan-Phi Nguyen, Caiming Xiong, Shafiq Joty
Abstract: Domain-adaptive post-training of large language models (LLMs) has emerged as a promising approach for specialized domains such as medicine and finance. However, significant challenges remain in identifying optimal adaptation criteria and training strategies across varying data and model configurations. To address these challenges, we introduce FINDAP, a systematic and fine-grained investigation into domain-adaptive post-training of LLMs for the finance domain. Our approach begins by identifying the core capabilities required for the target domain and designing a comprehensive evaluation suite aligned with these needs. We then analyze the effectiveness of key post-training stages, including continual pretraining, instruction tuning, and preference alignment. Building on these insights, we propose an effective training recipe centered on a novel preference data distillation method, which leverages process signals from a generative reward model. The resulting model, Llama-Fin, achieves state-of-the-art performance across a wide range of financial tasks. Our analysis also highlights how each post-training stage contributes to distinct capabilities, uncovering specific challenges and effective solutions, providing valuable insights for domain adaptation of LLMs. Project page: https://github.com/SalesforceAIResearch/FinDap
Authors: Corentin Delacour
Abstract: Ising machines (IM) are physics-inspired alternatives to von Neumann architectures for solving hard optimization tasks. By mapping binary variables to coupled Ising spins, IMs can naturally solve unconstrained combinatorial optimization problems such as finding maximum cuts in graphs. However, despite their importance in practical applications, constrained problems remain challenging to solve for IMs that require large quadratic energy penalties to ensure the correspondence between energy ground states and constrained optimal solutions. To relax this requirement, we propose a self-adaptive IM that iteratively shapes its energy landscape using a Lagrange relaxation of constraints and avoids prior tuning of penalties. Using a probabilistic-bit (p-bit) IM emulated in software, we benchmark our algorithm with multidimensional knapsack problems (MKP) and quadratic knapsack problems (QKP), the latter being an Ising problem with linear constraints. For QKP with 300 variables, the proposed algorithm finds better solutions than state-of-the-art IMs such as Fujitsu's Digital Annealer and requires 7,500x fewer samples. Our results show that adapting the energy landscape during the search can speed up IMs for constrained optimization.
Authors: Bhargava Uppuluri, Anjel Patel, Neil Mehta, Sridhar Kamath, Pratyush Chakraborty
Abstract: In autonomous driving, traditional Computer Vision (CV) agents often struggle in unfamiliar situations due to biases in the training data. Deep Reinforcement Learning (DRL) agents address this by learning from experience and maximizing rewards, which helps them adapt to dynamic environments. However, ensuring their generalization remains challenging, especially with static training environments. Additionally, DRL models lack transparency, making it difficult to guarantee safety in all scenarios, particularly those not seen during training. To tackle these issues, we propose a method that combines DRL with Curriculum Learning for autonomous driving. Our approach uses a Proximal Policy Optimization (PPO) agent and a Variational Autoencoder (VAE) to learn safe driving in the CARLA simulator. The agent is trained using two-fold curriculum learning, progressively increasing environment difficulty and incorporating a collision penalty in the reward function to promote safety. This method improves the agent's adaptability and reliability in complex environments, and understand the nuances of balancing multiple reward components from different feedback signals in a single scalar reward function. Keywords: Computer Vision, Deep Reinforcement Learning, Variational Autoencoder, Proximal Policy Optimization, Curriculum Learning, Autonomous Driving.
Authors: Yota Maeda, Ken Arai, Yu Tanaka, Yu Terada, Hiroshi Ueno, Hiroyuki Tezuka
Abstract: The discovery of causal relationships from observed data has attracted significant interest from disciplines such as economics, social sciences, epidemiology, and biology. In practical applications, considerable knowledge of the underlying systems is often unavailable, and real data are often associated with nonlinear causal structures, which make the direct use of most conventional causality analysis methods difficult. This study proposes a novel quantum Peter-Clark (qPC) algorithm for causal discovery that does not assume any underlying model structures. Based on the independence conditional tests in a class of reproducing kernel Hilbert spaces characterized by quantum circuits, the proposed qPC algorithm can explore causal relationships from the observed data drawn from arbitrary distributions. We conducted systematic experiments on fundamental graph parts of causal structures, demonstrating that the qPC algorithm exhibits a significantly better performance, particularly with smaller sample sizes compared to its classical counterpart. Furthermore, we proposed a novel optimization approach based on Kernel Target Alignment (KTA) for determining hyperparameters of quantum kernels. This method effectively reduced the risk of false positives in causal discovery, enabling more reliable inference. Our theoretical and experimental results demonstrate that the proposed quantum algorithm can empower classical algorithms for robust and accurate inference in causal discovery, supporting them in regimes where classical algorithms typically fail. Additionally, the effectiveness of this method was validated using the Boston Housing dataset as a real-world application. These findings demonstrate the new potential of quantum circuit-based causal discovery methods in addressing practical challenges, particularly in small-sample scenarios where traditional approaches have shown limitations.
Authors: Oleg Sautenkov, Yasheerah Yaqoot, Artem Lykov, Muhammad Ahsan Mustafa, Grik Tadevosyan, Aibek Akhmetkazy, Miguel Altamirano Cabrera, Mikhail Martynov, Sausar Karaf, Dzmitry Tsetserukou
Abstract: The UAV-VLA (Visual-Language-Action) system is a tool designed to facilitate communication with aerial robots. By integrating satellite imagery processing with the Visual Language Model (VLM) and the powerful capabilities of GPT, UAV-VLA enables users to generate general flight paths-and-action plans through simple text requests. This system leverages the rich contextual information provided by satellite images, allowing for enhanced decision-making and mission planning. The combination of visual analysis by VLM and natural language processing by GPT can provide the user with the path-and-action set, making aerial operations more efficient and accessible. The newly developed method showed the difference in the length of the created trajectory in 22% and the mean error in finding the objects of interest on a map in 34.22 m by Euclidean distance in the K-Nearest Neighbors (KNN) approach.
Authors: Ronghao Dang, Yuqian Yuan, Wenqi Zhang, Yifei Xin, Boqiang Zhang, Long Li, Liuyi Wang, Qinyang Zeng, Xin Li, Lidong Bing
Abstract: The enhancement of generalization in robots by large vision-language models (LVLMs) is increasingly evident. Therefore, the embodied cognitive abilities of LVLMs based on egocentric videos are of great interest. However, current datasets for embodied video question answering lack comprehensive and systematic evaluation frameworks. Critical embodied cognitive issues, such as robotic self-cognition, dynamic scene perception, and hallucination, are rarely addressed. To tackle these challenges, we propose ECBench, a high-quality benchmark designed to systematically evaluate the embodied cognitive abilities of LVLMs. ECBench features a diverse range of scene video sources, open and varied question formats, and 30 dimensions of embodied cognition. To ensure quality, balance, and high visual dependence, ECBench uses class-independent meticulous human annotation and multi-round question screening strategies. Additionally, we introduce ECEval, a comprehensive evaluation system that ensures the fairness and rationality of the indicators. Utilizing ECBench, we conduct extensive evaluations of proprietary, open-source, and task-specific LVLMs. ECBench is pivotal in advancing the embodied cognitive capabilities of LVLMs, laying a solid foundation for developing reliable core models for embodied agents. All data and code are available at https://github.com/Rh-Dang/ECBench.
Authors: Laurenz Ruzicka, Alexander Spenke, Stephan Bergmann, Gerd Nolden, Bernhard Kohn, Clemens Heitzinger
Abstract: Fingerprint mosaicking, which is the process of combining multiple fingerprint images into a single master fingerprint, is an essential process in modern biometric systems. However, it is prone to errors that can significantly degrade fingerprint image quality. This paper proposes a novel deep learning-based approach to detect and score mosaicking artifacts in fingerprint images. Our method leverages a self-supervised learning framework to train a model on large-scale unlabeled fingerprint data, eliminating the need for manual artifact annotation. The proposed model effectively identifies mosaicking errors, achieving high accuracy on various fingerprint modalities, including contactless, rolled, and pressed fingerprints and furthermore proves to be robust to different data sources. Additionally, we introduce a novel mosaicking artifact score to quantify the severity of errors, enabling automated evaluation of fingerprint images. By addressing the challenges of mosaicking artifact detection, our work contributes to improving the accuracy and reliability of fingerprint-based biometric systems.
Authors: Rujie Wu, Xiaojian Ma, Hai Ci, Yue Fan, Yuxuan Wang, Haozhe Zhao, Qing Li, Yizhou Wang
Abstract: This paper introduce LongViTU, a large-scale (~121k QA pairs, ~900h videos), automatically generated dataset for long-form video understanding. We developed a systematic approach that organizes videos into a hierarchical tree structure and incorporates self-revision mechanisms to ensure high-quality QA pairs. Each QA pair in LongViTU features: 1) long-term context (average certificate length of 4.6 minutes); 2) rich knowledge and condensed reasoning (commonsense, causality, planning, etc.); and 3) explicit timestamp labels for relevant events. LongViTU also serves as a benchmark for instruction following in long-form and streaming video understanding. We evaluate the open-source state-of-the-art long video understanding model, LongVU, and the commercial model, Gemini-1.5-Pro, on our benchmark. They achieve GPT-4 scores of 49.9 and 52.3, respectively, underscoring the substantial challenge posed by our benchmark. Further supervised fine-tuning (SFT) on LongVU led to performance improvements of 12.0% on our benchmark, 2.2% on the in-distribution (ID) benchmark EgoSchema, 1.0%, 2.2% and 1.2% on the out-of-distribution (OOD) benchmarks VideoMME (Long), WorldQA and OpenEQA, respectively. These outcomes demonstrate LongViTU's high data quality and robust OOD generalizability.
Authors: Zengqi Peng, Yubin Wang, Xu Han, Lei Zheng, Jun Ma
Abstract: Recent advancements in reinforcement learning (RL) demonstrate the significant potential in autonomous driving. Despite this promise, challenges such as the manual design of reward functions and low sample efficiency in complex environments continue to impede the development of safe and effective driving policies. To tackle these issues, we introduce LearningFlow, an innovative automated policy learning workflow tailored to urban driving. This framework leverages the collaboration of multiple large language model (LLM) agents throughout the RL training process. LearningFlow includes a curriculum sequence generation process and a reward generation process, which work in tandem to guide the RL policy by generating tailored training curricula and reward functions. Particularly, each process is supported by an analysis agent that evaluates training progress and provides critical insights to the generation agent. Through the collaborative efforts of these LLM agents, LearningFlow automates policy learning across a series of complex driving tasks, and it significantly reduces the reliance on manual reward function design while enhancing sample efficiency. Comprehensive experiments are conducted in the high-fidelity CARLA simulator, along with comparisons with other existing methods, to demonstrate the efficacy of our proposed approach. The results demonstrate that LearningFlow excels in generating rewards and curricula. It also achieves superior performance and robust generalization across various driving tasks, as well as commendable adaptation to different RL algorithms.
Authors: Leonard Lupin-Jimenez, Moein Darman, Subhashis Hazarika, Tianning Wu, Michael Gray, Ruyoing He, Anthony Wong, Ashesh Chattopadhyay
Abstract: Building on top of the success in AI-based atmospheric emulation, we propose an AI-based ocean emulation and downscaling framework focusing on the high-resolution regional ocean over Gulf of Mexico. Regional ocean emulation presents unique challenges owing to the complex bathymetry and lateral boundary conditions as well as from fundamental biases in deep learning-based frameworks, such as instability and hallucinations. In this paper, we develop a deep learning-based framework to autoregressively integrate ocean-surface variables over the Gulf of Mexico at $8$ Km spatial resolution without unphysical drifts over decadal time scales and simulataneously downscale and bias-correct it to $4$ Km resolution using a physics-constrained generative model. The framework shows both short-term skills as well as accurate long-term statistics in terms of mean and variability.
Authors: Hounsu Kim, Taegyun Kwon, Juhan Nam
Abstract: Diffusion models have been widely used in the generative domain due to their convincing performance in modeling complex data distributions. Moreover, they have shown competitive results on discriminative tasks, such as image segmentation. While diffusion models have also been explored for automatic music transcription, their performance has yet to reach a competitive level. In this paper, we focus on discrete diffusion model's refinement capabilities and present a novel architecture for piano transcription. Our model utilizes Neighborhood Attention layers as the denoising module, gradually predicting the target high-resolution piano roll, conditioned on the finetuned features of a pretrained acoustic model. To further enhance refinement, we devise a novel strategy which applies distinct transition states during training and inference stage of discrete diffusion models. Experiments on the MAESTRO dataset show that our approach outperforms previous diffusion-based piano transcription models and the baseline model in terms of F1 score. Our code is available in https://github.com/hanshounsu/d3rm.
Authors: Shuo Tong, Han Liu, Runyuan Guo, Xueqiong Tian, Wenqing Wang, Ding Liu, Youmin Zhang
Abstract: Data-driven soft sensors (DDSS) have become mainstream methods for predicting key performance indicators in process industries. However, DDSS development requires complex and costly customized designs tailored to various tasks during the modeling process. Moreover, DDSS are constrained to a single structured data modality, limiting their ability to incorporate additional contextual knowledge. Furthermore, DDSSs' limited representation learning leads to weak predictive performance with scarce data. To address these challenges, we propose a general framework named LLM-TKESS (large language model for text-based knowledge-embedded soft sensing), harnessing the powerful general problem-solving capabilities, cross-modal knowledge transfer abilities, and few-shot capabilities of LLM for enhanced soft sensing modeling. Specifically, an auxiliary variable series encoder (AVS Encoder) is proposed to unleash LLM's potential for capturing temporal relationships within series and spatial semantic relationships among auxiliary variables. Then, we propose a two-stage fine-tuning alignment strategy: in the first stage, employing parameter-efficient fine-tuning through autoregressive training adjusts LLM to rapidly accommodate process variable data, resulting in a soft sensing foundation model (SSFM). Subsequently, by training adapters, we adapt the SSFM to various downstream tasks without modifying its architecture. Then, we propose two text-based knowledge-embedded soft sensors, integrating new natural language modalities to overcome the limitations of pure structured data models. Furthermore, benefiting from LLM's pre-existing world knowledge, our model demonstrates outstanding predictive capabilities in small sample conditions. Using the thermal deformation of air preheater rotor as a case study, we validate through extensive experiments that LLM-TKESS exhibits outstanding performance.
Authors: Laurenz Ruzicka, Bernhard Kohn, Clemens Heitzinger
Abstract: Contactless fingerprint recognition systems offer a hygienic, user-friendly, and efficient alternative to traditional contact-based methods. However, their accuracy heavily relies on precise fingertip detection and segmentation, particularly under challenging background conditions. This paper introduces TipSegNet, a novel deep learning model that achieves state-of-the-art performance in segmenting fingertips directly from grayscale hand images. TipSegNet leverages a ResNeXt-101 backbone for robust feature extraction, combined with a Feature Pyramid Network (FPN) for multi-scale representation, enabling accurate segmentation across varying finger poses and image qualities. Furthermore, we employ an extensive data augmentation strategy to enhance the model's generalizability and robustness. TipSegNet outperforms existing methods, achieving a mean Intersection over Union (mIoU) of 0.987 and an accuracy of 0.999, representing a significant advancement in contactless fingerprint segmentation. This enhanced accuracy has the potential to substantially improve the reliability and effectiveness of contactless biometric systems in real-world applications.
Authors: Zeyd Boukhers, Cong Yang
Abstract: The availability of metadata for scientific documents is pivotal in propelling scientific knowledge forward and for adhering to the FAIR principles (i.e. Findability, Accessibility, Interoperability, and Reusability) of research findings. However, the lack of sufficient metadata in published documents, particularly those from smaller and mid-sized publishers, hinders their accessibility. This issue is widespread in some disciplines, such as the German Social Sciences, where publications often employ diverse templates. To address this challenge, our study evaluates various feature learning and prediction methods, including natural language processing (NLP), computer vision (CV), and multimodal approaches, for extracting metadata from documents with high template variance. We aim to improve the accessibility of scientific documents and facilitate their wider use. To support our comparison of these methods, we provide comprehensive experimental results, analyzing their accuracy and efficiency in extracting metadata. Additionally, we provide valuable insights into the strengths and weaknesses of various feature learning and prediction methods, which can guide future research in this field.
Authors: Yoseob Han, Dufan Wu, Kyungsang Kim, Quanzheng Li
Abstract: Objective: There exist several X-ray computed tomography (CT) scanning strategies to reduce a radiation dose, such as (1) sparse-view CT, (2) low-dose CT, and (3) region-of-interest (ROI) CT (called interior tomography). To further reduce the dose, the sparse-view and/or low-dose CT settings can be applied together with interior tomography. Interior tomography has various advantages in terms of reducing the number of detectors and decreasing the X-ray radiation dose. However, a large patient or small field-of-view (FOV) detector can cause truncated projections, and then the reconstructed images suffer from severe cupping artifacts. In addition, although the low-dose CT can reduce the radiation exposure dose, analytic reconstruction algorithms produce image noise. Recently, many researchers have utilized image-domain deep learning (DL) approaches to remove each artifact and demonstrated impressive performances, and the theory of deep convolutional framelets supports the reason for the performance improvement. Approach: In this paper, we found that the image-domain convolutional neural network (CNN) is difficult to solve coupled artifacts, based on deep convolutional framelets. Significance: To address the coupled problem, we decouple it into two sub-problems: (i) image domain noise reduction inside truncated projection to solve low-dose CT problem and (ii) extrapolation of projection outside truncated projection to solve the ROI CT problem. The decoupled sub-problems are solved directly with a novel proposed end-to-end learning using dual-domain CNNs. Main results: We demonstrate that the proposed method outperforms the conventional image-domain deep learning methods, and a projection-domain CNN shows better performance than the image-domain CNNs which are commonly used by many researchers.
Authors: David J Poland
Abstract: This paper presents a novel predictive maintenance framework centered on Enhanced Quantile Regression Neural Networks EQRNNs, for anticipating system failures in industrial robotics. We address the challenge of early failure detection through a hybrid approach that combines advanced neural architectures. The system leverages dual computational stages: first implementing an EQRNN optimized for processing multi-sensor data streams including vibration, thermal, and power signatures, followed by an integrated Spiking Neural Network SNN, layer that enables microsecond-level response times. This architecture achieves notable accuracy rates of 92.3\% in component failure prediction with a 90-hour advance warning window. Field testing conducted on an industrial scale with 50 robotic systems demonstrates significant operational improvements, yielding a 94\% decrease in unexpected system failures and 76\% reduction in maintenance-related downtimes. The framework's effectiveness in processing complex, multi-modal sensor data while maintaining computational efficiency validates its applicability for Industry 4.0 manufacturing environments.
Authors: Ver\'onica \'Alvarez, Santiago Mazuelas, Jose A. Lozano
Abstract: Multiple supervised learning scenarios are composed by a sequence of classification tasks. For instance, multi-task learning and continual learning aim to learn a sequence of tasks that is either fixed or grows over time. Existing techniques for learning tasks that are in a sequence are tailored to specific scenarios, lacking adaptability to others. In addition, most of existing techniques consider situations in which the order of the tasks in the sequence is not relevant. However, it is common that tasks in a sequence are evolving in the sense that consecutive tasks often have a higher similarity. This paper presents a learning methodology that is applicable to multiple supervised learning scenarios and adapts to evolving tasks. Differently from existing techniques, we provide computable tight performance guarantees and analytically characterize the increase in the effective sample size. Experiments on benchmark datasets show the performance improvement of the proposed methodology in multiple scenarios and the reliability of the presented performance guarantees.
Authors: Van Thien Nguyen, William Guicquero, Gilles Sicard
Abstract: Even if Application-Specific Integrated Circuits (ASIC) have proven to be a relevant choice for integrating inference at the edge, they are often limited in terms of applicability. In this paper, we demonstrate that an ASIC neural network accelerator dedicated to image processing can be applied to multiple tasks of different levels: image classification and compression, while requiring a very limited hardware. The key component is a reconfigurable, mixed-precision (3b/2b/1b) encoder that takes advantage of proper weight and activation quantizations combined with convolutional layer structural pruning to lower hardware-related constraints (memory and computing). We introduce an automatic adaptation of linear symmetric quantizer scaling factors to perform quantized levels equalization, aiming at stabilizing quinary and ternary weights training. In addition, a proposed layer-shared Bit-Shift Normalization significantly simplifies the implementation of the hardware-expensive Batch Normalization. For a specific configuration in which the encoder design only requires 1Mb, the classification accuracy reaches 87.5% on CIFAR-10. Besides, we also show that this quantized encoder can be used to compress image patch-by-patch while the reconstruction can performed remotely, by a dedicated full-frame decoder. This solution typically enables an end-to-end compression almost without any block artifacts, outperforming patch-based state-of-the-art techniques employing a patch-constant bitrate.
Authors: Richard Schwank, Andrew McCormack, Mathias Drton
Abstract: Proposed in Hyv\"arinen (2005), score matching is a parameter estimation procedure that does not require computation of distributional normalizing constants. In this work we utilize the geometric median of means to develop a robust score matching procedure that yields consistent parameter estimates in settings where the observed data has been contaminated. A special appeal of the proposed method is that it retains convexity in exponential family models. The new method is therefore particularly attractive for non-Gaussian, exponential family graphical models where evaluation of normalizing constants is intractable. Support recovery guarantees for such models when contamination is present are provided. Additionally, support recovery is studied in numerical experiments and on a precipitation dataset. We demonstrate that the proposed robust score matching estimator performs comparably to the standard score matching estimator when no contamination is present but greatly outperforms this estimator in a setting with contamination.
Authors: Tobias Kortus (for the Bergen pCT Collaboration), Ralf Keidel (for the Bergen pCT Collaboration), Nicolas R. Gauger (for the Bergen pCT Collaboration), Jan Kieseler (for the Bergen pCT Collaboration)
Abstract: Reinforcement learning demonstrated immense success in modelling complex physics-driven systems, providing end-to-end trainable solutions by interacting with a simulated or real environment, maximizing a scalar reward signal. In this work, we propose, building upon previous work, a multi-agent reinforcement learning approach with assignment constraints for reconstructing particle tracks in pixelated particle detectors. Our approach optimizes collaboratively a parametrized policy, functioning as a heuristic to a multidimensional assignment problem, by jointly minimizing the total amount of particle scattering over the reconstructed tracks in a readout frame. To satisfy constraints, guaranteeing a unique assignment of particle hits, we propose a safety layer solving a linear assignment problem for every joint action. Further, to enforce cost margins, increasing the distance of the local policies predictions to the decision boundaries of the optimizer mappings, we recommend the use of an additional component in the blackbox gradient estimation, forcing the policy to solutions with lower total assignment costs. We empirically show on simulated data, generated for a particle detector developed for proton imaging, the effectiveness of our approach, compared to multiple single- and multi-agent baselines. We further demonstrate the effectiveness of constraints with cost margins for both optimization and generalization, introduced by wider regions with high reconstruction performance as well as reduced predictive instabilities. Our results form the basis for further developments in RL-based tracking, offering both enhanced performance with constrained policies and greater flexibility in optimizing tracking algorithms through the option for individual and team rewards.
Authors: Robin Burke, Gediminas Adomavicius, Toine Bogers, Tommaso Di Noia, Dominik Kowald, Julia Neidhardt, \"Ozlem \"Ozg\"obek, Maria Soledad Pera, Nava Tintarev, J\"urgen Ziegler
Abstract: Multistakeholder recommender systems are those that account for the impacts and preferences of multiple groups of individuals, not just the end users receiving recommendations. Due to their complexity, evaluating these systems cannot be restricted to the overall utility of a single stakeholder, as is often the case of more mainstream recommender system applications. In this article, we focus our discussion on the intricacies of the evaluation of multistakeholder recommender systems. We bring attention to the different aspects involved in the evaluation of multistakeholder recommender systems - from the range of stakeholders involved (including but not limited to producers and consumers) to the values and specific goals of each relevant stakeholder. Additionally, we discuss how to move from theoretical principles to practical implementation, providing specific use case examples. Finally, we outline open research directions for the RecSys community to explore. We aim to provide guidance to researchers and practitioners about how to think about these complex and domain-dependent issues of evaluation in the course of designing, developing, and researching applications with multistakeholder aspects.
Authors: Yuxuan Li, Cheng Zhang, Wen Wang, Yongming Huang
Abstract: Radio map, or pathloss map prediction, is a crucial method for wireless network modeling and management. By leveraging deep learning to construct pathloss patterns from geographical maps, an accurate digital replica of the transmission environment could be established with less computational overhead and lower prediction error compared to traditional model-driven techniques. While existing state-of-the-art (SOTA) methods predominantly rely on convolutional architectures, this paper introduces a hybrid transformer-convolution model, termed RadioTransformer, to enhance the accuracy of radio map prediction. The proposed model features a multi-scale transformer-based encoder for efficient feature extraction and a convolution-based decoder for precise pixel-level image reconstruction. Simulation results demonstrate that the proposed scheme significantly improves prediction accuracy, and over a 30% reduction in root mean square error (RMSE) is achieved compared to typical SOTA approaches.
Authors: Ruben Grandia, Espen Knoop, Michael A. Hopkins, Georg Wiedebach, Jared Bishop, Steven Pickles, David M\"uller, Moritz B\"acher
Abstract: Legged robots have achieved impressive feats in dynamic locomotion in challenging unstructured terrain. However, in entertainment applications, the design and control of these robots face additional challenges in appealing to human audiences. This work aims to unify expressive, artist-directed motions and robust dynamic mobility for legged robots. To this end, we introduce a new bipedal robot, designed with a focus on character-driven mechanical features. We present a reinforcement learning-based control architecture to robustly execute artistic motions conditioned on command signals. During runtime, these command signals are generated by an animation engine which composes and blends between multiple animation sources. Finally, an intuitive operator interface enables real-time show performances with the robot. The complete system results in a believable robotic character, and paves the way for enhanced human-robot engagement in various contexts, in entertainment robotics and beyond.
Authors: Shoucheng Song, Youfang Lin, Sheng Han, Chang Yao, Hao Wu, Shuo Wang, Kai Lv
Abstract: Communication has been widely employed to enhance multi-agent collaboration. Previous research has typically assumed delay-free communication, a strong assumption that is challenging to meet in practice. However, real-world agents suffer from channel delays, receiving messages sent at different time points, termed {\it{Asynchronous Communication}}, leading to cognitive biases and breakdowns in collaboration. This paper first defines two communication delay settings in MARL and emphasizes their harm to collaboration. To handle the above delays, this paper proposes a novel framework, Communication Delay-tolerant Multi-Agent Collaboration (CoDe). At first, CoDe learns an intent representation as messages through future action inference, reflecting the stable future behavioral trends of the agents. Then, CoDe devises a dual alignment mechanism of intent and timeliness to strengthen the fusion process of asynchronous messages. In this way, agents can extract the long-term intent of others, even from delayed messages, and selectively utilize the most recent messages that are relevant to their intent. Experimental results demonstrate that CoDe outperforms baseline algorithms in three MARL benchmarks without delay and exhibits robustness under fixed and time-varying delays.
Authors: Tianle Tao, Shizhao Peng, Tianyu Mei, Shoumo Li, Haogang Zhu
Abstract: Accurate nonlinear computation is a key challenge in privacy-preserving machine learning (PPML). Most existing frameworks approximate it through linear operations, resulting in significant precision loss. This paper proposes an efficient, verifiable and accurate security 2-party logistic regression framework (EVA-S2PLoR), which achieves accurate nonlinear function computation through a novel secure element-wise multiplication protocol and its derived protocols. Our framework primarily includes secure 2-party vector element-wise multiplication, addition to multiplication, reciprocal, and sigmoid function based on data disguising technology, where high efficiency and accuracy are guaranteed by the simple computation flow based on the real number domain and the few number of fixed communication rounds. We provide secure and robust anomaly detection through dimension transformation and Monte Carlo methods. EVA-S2PLoR outperforms many advanced frameworks in terms of precision (improving the performance of the sigmoid function by about 10 orders of magnitude compared to most frameworks) and delivers the best overall performance in secure logistic regression experiments.
Authors: Ludwic Leonard, Nils Thuerey, Ruediger Westermann
Abstract: We introduce a single-view reconstruction technique of volumetric fields in which multiple light scattering effects are omnipresent, such as in clouds. We model the unknown distribution of volumetric fields using an unconditional diffusion model trained on a novel benchmark dataset comprising 1,000 synthetically simulated volumetric density fields. The neural diffusion model is trained on the latent codes of a novel, diffusion-friendly, monoplanar representation. The generative model is used to incorporate a tailored parametric diffusion posterior sampling technique into different reconstruction tasks. A physically-based differentiable volume renderer is employed to provide gradients with respect to light transport in the latent space. This stands in contrast to classic NeRF approaches and makes the reconstructions better aligned with observed data. Through various experiments, we demonstrate single-view reconstruction of volumetric clouds at a previously unattainable quality.
Authors: Artem Fedorchenko, Tanel Alum\"ae
Abstract: This paper presents an approach for generating high-quality, same-language subtitles for Estonian TV content. We fine-tune the Whisper model on human-generated Estonian subtitles and enhance it with iterative pseudo-labeling and large language model (LLM) based post-editing. Our experiments demonstrate notable subtitle quality improvement through pseudo-labeling with an unlabeled dataset. We find that applying LLM-based editing at test time enhances subtitle accuracy, while its use during training does not yield further gains. This approach holds promise for creating subtitle quality close to human standard and could be extended to real-time applications.
Authors: Atharva Mutsaddi, Aditya Choudhary
Abstract: Plagiarism involves using another person's work or concepts without proper attribution, presenting them as original creations. With the growing amount of data communicated in regional languages such as Marathi -- one of India's regional languages -- it is crucial to design robust plagiarism detection systems tailored for low-resource languages. Language models like Bidirectional Encoder Representations from Transformers (BERT) have demonstrated exceptional capability in text representation and feature extraction, making them essential tools for semantic analysis and plagiarism detection. However, the application of BERT for low-resource languages remains under-explored, particularly in the context of plagiarism detection. This paper presents a method to enhance the accuracy of plagiarism detection for Marathi texts using BERT sentence embeddings in conjunction with Term Frequency-Inverse Document Frequency (TF-IDF) feature representation. This approach effectively captures statistical, semantic, and syntactic aspects of text features through a weighted voting ensemble of machine learning models.
Authors: Fabian H\"orst, Moritz Rempe, Helmut Becker, Lukas Heine, Julius Keyl, Jens Kleesiek
Abstract: Digital Pathology is a cornerstone in the diagnosis and treatment of diseases. A key task in this field is the identification and segmentation of cells in hematoxylin and eosin-stained images. Existing methods for cell segmentation often require extensive annotated datasets for training and are limited to a predefined cell classification scheme. To overcome these limitations, we propose $\text{CellViT}^{{\scriptscriptstyle ++}}$, a framework for generalized cell segmentation in digital pathology. $\text{CellViT}^{{\scriptscriptstyle ++}}$ utilizes Vision Transformers with foundation models as encoders to compute deep cell features and segmentation masks simultaneously. To adapt to unseen cell types, we rely on a computationally efficient approach. It requires minimal data for training and leads to a drastically reduced carbon footprint. We demonstrate excellent performance on seven different datasets, covering a broad spectrum of cell types, organs, and clinical settings. The framework achieves remarkable zero-shot segmentation and data-efficient cell-type classification. Furthermore, we show that $\text{CellViT}^{{\scriptscriptstyle ++}}$ can leverage immunofluorescence stainings to generate training datasets without the need for pathologist annotations. The automated dataset generation approach surpasses the performance of networks trained on manually labeled data, demonstrating its effectiveness in creating high-quality training datasets without expert annotations. To advance digital pathology, $\text{CellViT}^{{\scriptscriptstyle ++}}$ is available as an open-source framework featuring a user-friendly, web-based interface for visualization and annotation. The code is available under https://github.com/TIO-IKIM/CellViT-plus-plus.
Authors: Ritam Guha, Nilavra Pathak
Abstract: Counterfactual estimators are critical for learning and refining policies using logged data, a process known as Off-Policy Evaluation (OPE). OPE allows researchers to assess new policies without costly experiments, speeding up the evaluation process. Online experimental methods, such as A/B tests, are effective but often slow, thus delaying the policy selection and optimization process. In this work, we explore the application of OPE methods in the context of resource allocation in dynamic auction environments. Given the competitive nature of environments where rapid decision-making is crucial for gaining a competitive edge, the ability to quickly and accurately assess algorithmic performance is essential. By utilizing counterfactual estimators as a preliminary step before conducting A/B tests, we aim to streamline the evaluation process, reduce the time and resources required for experimentation, and enhance confidence in the chosen policies. Our investigation focuses on the feasibility and effectiveness of using these estimators to predict the outcomes of potential resource allocation strategies, evaluate their performance, and facilitate more informed decision-making in policy selection. Motivated by the outcomes of our initial study, we envision an advanced analytics system designed to seamlessly and dynamically assess new resource allocation strategies and policies.
Authors: Nora Gourmelon, Konrad Heidler, Erik Loebel, Daniel Cheng, Julian Klink, Anda Dong, Fei Wu, Noah Maul, Moritz Koch, Marcel Dreier, Dakota Pyles, Thorsten Seehaus, Matthias Braun, Andreas Maier, Vincent Christlein
Abstract: Calving front position variation of marine-terminating glaciers is an indicator of ice mass loss and a crucial parameter in numerical glacier models. Deep Learning (DL) systems can automatically extract this position from Synthetic Aperture Radar (SAR) imagery, enabling continuous, weather- and illumination-independent, large-scale monitoring. This study presents the first comparison of DL systems on a common calving front benchmark dataset. A multi-annotator study with ten annotators is performed to contrast the best-performing DL system against human performance. The best DL model's outputs deviate 221 m on average, while the average deviation of the human annotators is 38 m. This significant difference shows that current DL systems do not yet match human performance and that further research is needed to enable fully automated monitoring of glacier calving fronts. The study of Vision Transformers, foundation models, and the inclusion and processing strategy of more information are identified as avenues for future research.
Authors: Daniela Antonova, Allegra Laro, Audra McMillan, Lorenz Wolf
Abstract: Differentially private (DP) selection involves choosing a high-scoring candidate from a finite candidate pool, where each score depends on a sensitive dataset. This problem arises naturally in a variety of contexts including model selection, hypothesis testing, and within many DP algorithms. Classical methods, such as Report Noisy Max (RNM), assume all candidates' scores are equally sensitive to changes in a single individual's data, but this often isn't the case. To address this, algorithms like the Generalised Exponential Mechanism (GEM) leverage variability in candidate sensitivities. However, we observe that while these algorithms can outperform RNM in some situations, they may underperform in others - they can even perform worse than random selection. In this work, we explore how the distribution of scores and sensitivities impacts DP selection mechanisms. In all settings we study, we find that there exists a mechanism that utilises heterogeneity in the candidate sensitivities that outperforms standard mechanisms like RNM. However, no single mechanism uniformly outperforms RNM. We propose using the correlation between the scores and sensitivities as the basis for deciding which DP selection mechanism to use. Further, we design a slight variant of GEM, modified GEM that generally performs well whenever GEM performs poorly. Relying on the correlation heuristic we propose combined GEM, which adaptively chooses between GEM and modified GEM and outperforms both in polarised settings.
Authors: Mengfan Liu, Wei Wang, Chuan Wu
Abstract: With the advancement of serverless computing, running machine learning (ML) inference services over a serverless platform has been advocated, given its labor-free scalability and cost effectiveness. Mixture-of-Experts (MoE) models have been a dominant type of model architectures to enable large models nowadays, with parallel expert networks. Serving large MoE models on serverless computing is potentially beneficial, but has been underexplored due to substantial challenges in handling the skewed expert popularity and scatter-gather communication bottleneck in MoE model execution, for cost-efficient serverless MoE deployment and performance guarantee. We study optimized MoE model deployment and distributed inference serving on a serverless platform, that effectively predict expert selection, pipeline communication with model execution, and minimize the overall billed cost of serving MoE models. Especially, we propose a Bayesian optimization framework with multi-dimensional epsilon-greedy search to learn expert selections and optimal MoE deployment achieving optimal billed cost, including: 1) a Bayesian decision-making method for predicting expert popularity; 2) flexibly pipelined scatter-gather communication; and 3) an optimal model deployment algorithm for distributed MoE serving. Extensive experiments on AWS Lambda show that our designs reduce the billed cost of all MoE layers by at least 75.67% compared to CPU clusters while maintaining satisfactory inference throughput. As compared to LambdaML in serverless computing, our designs achieves 43.41% lower cost with a throughput decrease of at most 18.76%.
Authors: Laura State, Alejandra Bringas Colmenarejo, Andrea Beretta, Salvatore Ruggieri, Franco Turini, Stephanie Law
Abstract: Explainable AI (XAI) provides methods to understand non-interpretable machine learning models. However, we have little knowledge about what legal experts expect from these explanations, including their legal compliance with, and value against European Union legislation. To close this gap, we present the Explanation Dialogues, an expert focus study to uncover the expectations, reasoning, and understanding of legal experts and practitioners towards XAI, with a specific focus on the European General Data Protection Regulation. The study consists of an online questionnaire and follow-up interviews, and is centered around a use-case in the credit domain. We extract both a set of hierarchical and interconnected codes using grounded theory, and present the standpoints of the participating experts towards XAI. We find that the presented explanations are hard to understand and lack information, and discuss issues that can arise from the different interests of the data controller and subject. Finally, we present a set of recommendations for developers of XAI methods, and indications of legal areas of discussion. Among others, recommendations address the presentation, choice, and content of an explanation, technical risks as well as the end-user, while we provide legal pointers to the contestability of explanations, transparency thresholds, intellectual property rights as well as the relationship between involved parties.
Authors: Hantao Lou, Jiaming Ji, Kaile Wang, Yaodong Yang
Abstract: The rapid advancement of large language models (LLMs) has led to significant improvements in their capabilities, but also to increased concerns about their alignment with human values and intentions. Current alignment strategies, including adaptive training and inference-time methods, have demonstrated potential in this area. However, these approaches still struggle to balance deployment complexity and capability across various tasks and difficulties. In this work, we introduce the Streaming Distribution Induce Aligner (Stream Aligner), a novel alignment paradigm that combines efficiency with enhanced performance in various tasks throughout the generation process. Stream Aligner achieves dynamic sentence-level correction by using a small model to learn the preferences of the suffix sentence, iteratively correcting the suffix sentence output by the upstream model, and then using the corrected sentence to replace the suffix sentence in subsequent generations. Compared to Aligner, our experiments demonstrate that Stream Aligner reduces reliance on the capabilities of additional models, enhances the reasoning abilities of LLMs, and decreases latency during user interaction. Specifically, Stream Aligner-2B model has achieved an improvement of 76.1% in helpfulness, 36.0% in harmlessness on the tested Llama2-70B-chat model, and Stream Aligner-8B has achieved an improvement of 3.5% on the math ability of the tested Llama3-70B-Instruct model.
Authors: Nolan P Shaw, P Michael Furlong, Britt Anderson, Jeff Orchard
Abstract: At the risk of overstating the case, connectionist approaches to machine learning, i.e. neural networks, are enjoying a small vogue right now. However, these methods require large volumes of data and produce models that are uninterpretable to humans. An alternative framework that is compatible with neural networks and gradient-based learning, but explicitly models compositionality, is Vector Symbolic Architectures (VSAs). VSAs are a family of algebras on high-dimensional vector representations. They arose in cognitive science from the need to unify neural processing and the kind of symbolic reasoning that humans perform. While machine learning methods have benefited from category theoretical analyses, VSAs have not yet received similar treatment. In this paper, we present a first attempt at applying category theory to VSAs. Specifically, we conduct a brief literature survey demonstrating the lacking intersection of these two topics, provide a list of desiderata for VSAs, and propose that VSAs may be understood as a (division) rig in a category enriched over a monoid in Met (the category of Lawvere metric spaces). This final contribution suggests that VSAs may be generalised beyond current implementations. It is our hope that grounding VSAs in category theory will lead to more rigorous connections with other research, both within and beyond, learning and cognition.
Authors: Sileshi Nibret Zeleke, Amsalu Fentie Jember, Mario Bochicchio
Abstract: Encrypted network communication ensures confidentiality, integrity, and privacy between endpoints. However, attackers are increasingly exploiting encryption to conceal malicious behavior. Detecting unknown encrypted malicious traffic without decrypting the payloads remains a significant challenge. In this study, we investigate the integration of explainable artificial intelligence (XAI) techniques to detect malicious network traffic. We employ ensemble learning models to identify malicious activity using multi-view features extracted from various aspects of encrypted communication. To effectively represent malicious communication, we compiled a robust dataset with 1,127 unique connections, more than any other available open-source dataset, and spanning 54 malware families. Our models were benchmarked against the CTU-13 dataset, achieving performance of over 99% accuracy, precision, and F1-score. Additionally, the eXtreme Gradient Boosting (XGB) model demonstrated 99.32% accuracy, 99.53% precision, and 99.43% F1-score on our custom dataset. By leveraging Shapley Additive Explanations (SHAP), we identified that the maximum packet size, mean inter-arrival time of packets, and transport layer security version used are the most critical features for the global model explanation. Furthermore, key features were identified as important for local explanations across both datasets for individual traffic samples. These insights provide a deeper understanding of the model decision-making process, enhancing the transparency and reliability of detecting malicious encrypted traffic.
Authors: Pedro F. Silvestre, Peter Pietzuch
Abstract: Modern deep learning (DL) workloads increasingly use complex deep reinforcement learning (DRL) algorithms that generate training data within the learning loop. This results in programs with several nested loops and dynamic data dependencies between tensors. While DL systems with eager execution support such dynamism, they lack the optimizations and smart scheduling of graph-based execution. Graph-based execution, however, cannot express dynamic tensor shapes, instead requiring the use of multiple static subgraphs. Either execution model for DRL thus leads to redundant computation, reduced parallelism, and less efficient memory management. We describe TimeRL, a system for executing dynamic DRL programs that combines the dynamism of eager execution with the whole-program optimizations and scheduling of graph-based execution. TimeRL achieves this by introducing the declarative programming model of recurrent tensors, which allows users to define dynamic dependencies as intuitive recurrence equations. TimeRL translates recurrent tensors into a polyhedral dependence graph (PDG) with dynamic dependencies as symbolic expressions. Through simple PDG transformations, TimeRL applies whole-program optimizations, such as automatic vectorization, incrementalization, and operator fusion. The PDG also allows for the computation of an efficient program-wide execution schedule, which decides on buffer deallocations, buffer donations, and GPU/CPU memory swapping. We show that TimeRL executes current DRL algorithms up to 47$\times$ faster than existing DRL systems, while using 16$\times$ less GPU peak memory.
Authors: Maximilian Alber, Stephan Tietz, Jonas Dippel, Timo Milbich, Timoth\'ee Lesort, Panos Korfiatis, Moritz Kr\"ugener, Beatriz Perez Cancer, Neelay Shah, Alexander M\"ollers, Philipp Seegerer, Alexandra Carpen-Amarie, Kai Standvoss, Gabriel Dernbach, Edwin de Jong, Simon Schallenberg, Andreas Kunft, Helmut Hoffer von Ankershoffen, Gavin Schaeferle, Patrick Duffy, Matt Redlon, Philipp Jurmeister, David Horst, Lukas Ruff, Klaus-Robert M\"uller, Frederick Klauschen, Andrew Norgan
Abstract: Recent advances in digital pathology have demonstrated the effectiveness of foundation models across diverse applications. In this report, we present a novel vision foundation model based on the RudolfV approach. Our model was trained on a dataset comprising 1.2 million histopathology whole slide images, collected from two medical institutions: Mayo Clinic and Charit\'e - Universt\"atsmedizin Berlin. Comprehensive evaluations show that our model achieves state-of-the-art performance across twenty-one public benchmark datasets, even though it is neither the largest model by parameter count nor by training dataset size.
Authors: Jerry Chongyi Hu, Mohammed Shahid Modi, Boleslaw K. Szymanski
Abstract: Studying public sentiment during crises is crucial for understanding how opinions and sentiments shift, resulting in polarized societies. We study Weibo, the most popular microblogging site in China, using posts made during the outbreak of the COVID-19 crisis. The study period includes the pre-COVID-19 stage, the outbreak stage, and the early stage of epidemic prevention. We use Llama 3 8B, a Large Language Model, to analyze users' sentiments on the platform by classifying them into positive, negative, sarcastic, and neutral categories. Analyzing sentiment shifts on Weibo provides insights into how social events and government actions influence public opinion. This study contributes to understanding the dynamics of social sentiments during health crises, fulfilling a gap in sentiment analysis for Chinese platforms. By examining these dynamics, we aim to offer valuable perspectives on digital communication's role in shaping society's responses during unprecedented global challenges.
Authors: Ilias Diakonikolas, Daniel M. Kane, Sihan Liu, Thanasis Pittas
Abstract: We study the task of high-dimensional entangled mean estimation in the subset-of-signals model. Specifically, given $N$ independent random points $x_1,\ldots,x_N$ in $\mathbb{R}^D$ and a parameter $\alpha \in (0, 1)$ such that each $x_i$ is drawn from a Gaussian with mean $\mu$ and unknown covariance, and an unknown $\alpha$-fraction of the points have identity-bounded covariances, the goal is to estimate the common mean $\mu$. The one-dimensional version of this task has received significant attention in theoretical computer science and statistics over the past decades. Recent work [LY20; CV24] has given near-optimal upper and lower bounds for the one-dimensional setting. On the other hand, our understanding of even the information-theoretic aspects of the multivariate setting has remained limited. In this work, we design a computationally efficient algorithm achieving an information-theoretically near-optimal error. Specifically, we show that the optimal error (up to polylogarithmic factors) is $f(\alpha,N) + \sqrt{D/(\alpha N)}$, where the term $f(\alpha,N)$ is the error of the one-dimensional problem and the second term is the sub-Gaussian error rate. Our algorithmic approach employs an iterative refinement strategy, whereby we progressively learn more accurate approximations $\hat \mu$ to $\mu$. This is achieved via a novel rejection sampling procedure that removes points significantly deviating from $\hat \mu$, as an attempt to filter out unusually noisy samples. A complication that arises is that rejection sampling introduces bias in the distribution of the remaining points. To address this issue, we perform a careful analysis of the bias, develop an iterative dimension-reduction strategy, and employ a novel subroutine inspired by list-decodable learning that leverages the one-dimensional result.
Authors: Haozhi Qi, Brent Yi, Mike Lambeta, Yi Ma, Roberto Calandra, Jitendra Malik
Abstract: Learning policies in simulation and transferring them to the real world has become a promising approach in dexterous manipulation. However, bridging the sim-to-real gap for each new task requires substantial human effort, such as careful reward engineering, hyperparameter tuning, and system identification. In this work, we present a system that leverages low-level skills to address these challenges for more complex tasks. Specifically, we introduce a hierarchical policy for in-hand object reorientation based on previously acquired rotation skills. This hierarchical policy learns to select which low-level skill to execute based on feedback from both the environment and the low-level skill policies themselves. Compared to learning from scratch, the hierarchical policy is more robust to out-of-distribution changes and transfers easily from simulation to real-world environments. Additionally, we propose a generalizable object pose estimator that uses proprioceptive information, low-level skill predictions, and control errors as inputs to estimate the object pose over time. We demonstrate that our system can reorient objects, including symmetrical and textureless ones, to a desired pose.
Authors: Runjie Yan, Yinbo Chen, Xiaolong Wang
Abstract: Score Distillation Sampling (SDS) has made significant strides in distilling image-generative models for 3D generation. However, its maximum-likelihood-seeking behavior often leads to degraded visual quality and diversity, limiting its effectiveness in 3D applications. In this work, we propose Consistent Flow Distillation (CFD), which addresses these limitations. We begin by leveraging the gradient of the diffusion ODE or SDE sampling process to guide the 3D generation. From the gradient-based sampling perspective, we find that the consistency of 2D image flows across different viewpoints is important for high-quality 3D generation. To achieve this, we introduce multi-view consistent Gaussian noise on the 3D object, which can be rendered from various viewpoints to compute the flow gradient. Our experiments demonstrate that CFD, through consistent flows, significantly outperforms previous methods in text-to-3D generation.
Authors: David McAllister, Matthew Tancik, Jiaming Song, Angjoo Kanazawa
Abstract: Large-scale AI model training divides work across thousands of GPUs, then synchronizes gradients across them at each step. This incurs a significant network burden that only centralized, monolithic clusters can support, driving up infrastructure costs and straining power systems. We propose Decentralized Diffusion Models, a scalable framework for distributing diffusion model training across independent clusters or datacenters by eliminating the dependence on a centralized, high-bandwidth networking fabric. Our method trains a set of expert diffusion models over partitions of the dataset, each in full isolation from one another. At inference time, the experts ensemble through a lightweight router. We show that the ensemble collectively optimizes the same objective as a single model trained over the whole dataset. This means we can divide the training burden among a number of "compute islands," lowering infrastructure costs and improving resilience to localized GPU failures. Decentralized diffusion models empower researchers to take advantage of smaller, more cost-effective and more readily available compute like on-demand GPU nodes rather than central integrated systems. We conduct extensive experiments on ImageNet and LAION Aesthetics, showing that decentralized diffusion models FLOP-for-FLOP outperform standard diffusion models. We finally scale our approach to 24 billion parameters, demonstrating that high-quality diffusion models can now be trained with just eight individual GPU nodes in less than a week.
Authors: Numair Sani, Daniel Malinsky, Ilya Shpitser
Abstract: Causal approaches to post-hoc explainability for black-box prediction models (e.g., deep neural networks trained on image pixel data) have become increasingly popular. However, existing approaches have two important shortcomings: (i) the "explanatory units" are micro-level inputs into the relevant prediction model, e.g., image pixels, rather than interpretable macro-level features that are more useful for understanding how to possibly change the algorithm's behavior, and (ii) existing approaches assume there exists no unmeasured confounding between features and target model predictions, which fails to hold when the explanatory units are macro-level variables. Our focus is on the important setting where the analyst has no access to the inner workings of the target prediction algorithm, rather only the ability to query the output of the model in response to a particular input. To provide causal explanations in such a setting, we propose to learn causal graphical representations that allow for arbitrary unmeasured confounding among features. We demonstrate the resulting graph can differentiate between interpretable features that causally influence model predictions versus those that are merely associated with model predictions due to confounding. Our approach is motivated by a counterfactual theory of causal explanation wherein good explanations point to factors that are "difference-makers" in an interventionist sense.
Authors: Aseem Baranwal, Kimon Fountoulakis, Aukosh Jagannath
Abstract: We study the node classification problem on feature-decorated graphs in the sparse setting, i.e., when the expected degree of a node is $O(1)$ in the number of nodes, in the fixed-dimensional asymptotic regime, i.e., the dimension of the feature data is fixed while the number of nodes is large. Such graphs are typically known to be locally tree-like. We introduce a notion of Bayes optimality for node classification tasks, called asymptotic local Bayes optimality, and compute the optimal classifier according to this criterion for a fairly general statistical data model with arbitrary distributions of the node features and edge connectivity. The optimal classifier is implementable using a message-passing graph neural network architecture. We then compute the generalization error of this classifier and compare its performance against existing learning methods theoretically on a well-studied statistical model with naturally identifiable signal-to-noise ratios (SNRs) in the data. We find that the optimal message-passing architecture interpolates between a standard MLP in the regime of low graph signal and a typical convolution in the regime of high graph signal. Furthermore, we prove a corresponding non-asymptotic result.
Authors: Shintaro Nakamura, Masashi Sugiyama
Abstract: We study the real-valued combinatorial pure exploration problem in the stochastic multi-armed bandit (R-CPE-MAB). We study the case where the size of the action set is polynomial with respect to the number of arms. In such a case, the R-CPE-MAB can be seen as a special case of the so-called transductive linear bandits. We introduce an algorithm named the combinatorial gap-based exploration (CombGapE) algorithm, whose sample complexity upper bound matches the lower bound up to a problem-dependent constant factor. We numerically show that the CombGapE algorithm outperforms existing methods significantly in both synthetic and real-world datasets.
Authors: Jinghua Zhang, Li Liu, Olli Silv\'en, Matti Pietik\"ainen, Dewen Hu
Abstract: Few-shot Class-Incremental Learning (FSCIL) presents a unique challenge in Machine Learning (ML), as it necessitates the Incremental Learning (IL) of new classes from sparsely labeled training samples without forgetting previous knowledge. While this field has seen recent progress, it remains an active exploration area. This paper aims to provide a comprehensive and systematic review of FSCIL. In our in-depth examination, we delve into various facets of FSCIL, encompassing the problem definition, the discussion of the primary challenges of unreliable empirical risk minimization and the stability-plasticity dilemma, general schemes, and relevant problems of IL and Few-shot Learning (FSL). Besides, we offer an overview of benchmark datasets and evaluation metrics. Furthermore, we introduce the Few-shot Class-incremental Classification (FSCIC) methods from data-based, structure-based, and optimization-based approaches and the Few-shot Class-incremental Object Detection (FSCIOD) methods from anchor-free and anchor-based approaches. Beyond these, we present several promising research directions within FSCIL that merit further investigation.
Authors: Ziang Chen, Rong Ge
Abstract: In this work, we study the mean-field flow for learning subspace-sparse polynomials using stochastic gradient descent and two-layer neural networks, where the input distribution is standard Gaussian and the output only depends on the projection of the input onto a low-dimensional subspace. We establish a necessary condition for SGD-learnability, involving both the characteristics of the target function and the expressiveness of the activation function. In addition, we prove that the condition is almost sufficient, in the sense that a condition slightly stronger than the necessary condition can guarantee the exponential decay of the loss functional to zero.
Authors: Zhao Kang, Xuanting Xie, Bingheng Li, Erlin Pan
Abstract: In today's data-driven digital era, the amount as well as complexity, such as multi-view, non-Euclidean, and multi-relational, of the collected data are growing exponentially or even faster. Clustering, which unsupervisely extracts valid knowledge from data, is extremely useful in practice. However, existing methods are independently developed to handle one particular challenge at the expense of the others. In this work, we propose a simple but effective framework for complex data clustering (CDC) that can efficiently process different types of data with linear complexity. We first utilize graph filtering to fuse geometry structure and attribute information. We then reduce the complexity with high-quality anchors that are adaptively learned via a novel similarity-preserving regularizer. We illustrate the cluster-ability of our proposed method theoretically and experimentally. In particular, we deploy CDC to graph data of size 111M.
Authors: Jiaojiao Zhang, Linglingzhi Zhu, Mikael Johansson
Abstract: We introduce a novel differentially private algorithm for online federated learning that employs temporally correlated noise to enhance utility while ensuring privacy of continuously released models. To address challenges posed by DP noise and local updates with streaming non-iid data, we develop a perturbed iterate analysis to control the impact of the DP noise on the utility. Moreover, we demonstrate how the drift errors from local updates can be effectively managed under a quasi-strong convexity condition. Subject to an $(\epsilon, \delta)$-DP budget, we establish a dynamic regret bound over the entire time horizon, quantifying the impact of key parameters and the intensity of changes in dynamic environments. Numerical experiments confirm the efficacy of the proposed algorithm.
Authors: Joo Seung Lee, Malini Mahendra, Anil Aswani
Abstract: Mechanical ventilation is a critical life support intervention that delivers controlled air and oxygen to a patient's lungs, assisting or replacing spontaneous breathing. While several data-driven approaches have been proposed to optimize ventilator control strategies, they often lack interpretability and alignment with domain knowledge, hindering clinical adoption. This paper presents a methodology for interpretable reinforcement learning (RL) aimed at improving mechanical ventilation control as part of connected health systems. Using a causal, nonparametric model-based off-policy evaluation, we assess RL policies for their ability to enhance patient-specific outcomes-specifically, increasing blood oxygen levels (SpO2), while avoiding aggressive ventilator settings that may cause ventilator-induced lung injuries and other complications. Through numerical experiments on real-world ICU data from the MIMIC-III database, we demonstrate that our interpretable decision tree policy achieves performance comparable to state-of-the-art deep RL methods while outperforming standard behavior cloning approaches. The results highlight the potential of interpretable, data-driven decision support systems to improve safety and efficiency in personalized ventilation strategies, paving the way for seamless integration into connected healthcare environments.
Authors: Xingyu Li, Fei Tao, Wei Ye, Aydin Nassehi, John W. Sutherland
Abstract: In this study, we introduce Generative Manufacturing Systems (GMS) as a novel approach to effectively manage and coordinate autonomous manufacturing assets, thereby enhancing their responsiveness and flexibility to address a wide array of production objectives and human preferences. Deviating from traditional explicit modeling, GMS employs generative AI, including diffusion models and ChatGPT, for implicit learning from envisioned futures, marking a shift from a model-optimum to a training-sampling decision-making. Through the integration of generative AI, GMS enables complex decision-making through interactive dialogue with humans, allowing manufacturing assets to generate multiple high-quality global decisions that can be iteratively refined based on human feedback. Empirical findings showcase GMS's substantial improvement in system resilience and responsiveness to uncertainties, with decision times reduced from seconds to milliseconds. The study underscores the inherent creativity and diversity in the generated solutions, facilitating human-centric decision-making through seamless and continuous human-machine interactions.
Authors: Tobias Leemann, Alina Fastowski, Felix Pfeiffer, Gjergji Kasneci
Abstract: We address the critical challenge of applying feature attribution methods to the transformer architecture, which dominates current applications in natural language processing and beyond. Traditional attribution methods to explainable AI (XAI) explicitly or implicitly rely on linear or additive surrogate models to quantify the impact of input features on a model's output. In this work, we formally prove an alarming incompatibility: transformers are structurally incapable of representing linear or additive surrogate models used for feature attribution, undermining the grounding of these conventional explanation methodologies. To address this discrepancy, we introduce the Softmax-Linked Additive Log Odds Model (SLALOM), a novel surrogate model specifically designed to align with the transformer framework. SLALOM demonstrates the capacity to deliver a range of insightful explanations with both synthetic and real-world datasets. We highlight SLALOM's unique efficiency-quality curve by showing that SLALOM can produce explanations with substantially higher fidelity than competing surrogate models or provide explanations of comparable quality at a fraction of their computational costs. We release code for SLALOM as an open-source project online at https://github.com/tleemann/slalom_explanations.
Authors: Shai Feldman, Yaniv Romano
Abstract: We develop a method to generate prediction sets with a guaranteed coverage rate that is robust to corruptions in the training data, such as missing or noisy variables. Our approach builds on conformal prediction, a powerful framework to construct prediction sets that are valid under the i.i.d assumption. Importantly, naively applying conformal prediction does not provide reliable predictions in this setting, due to the distribution shift induced by the corruptions. To account for the distribution shift, we assume access to privileged information (PI). The PI is formulated as additional features that explain the distribution shift, however, they are only available during training and absent at test time. We approach this problem by introducing a novel generalization of weighted conformal prediction and support our method with theoretical coverage guarantees. Empirical experiments on both real and synthetic datasets indicate that our approach achieves a valid coverage rate and constructs more informative predictions compared to existing methods, which are not supported by theoretical guarantees.
Authors: Alexander Olza, David Soto, Roberto Santana
Abstract: In cognitive neuroscience and brain-computer interface research, accurately predicting imagined stimuli is crucial. This study investigates the effectiveness of Domain Adaptation (DA) in enhancing imagery prediction using primarily visual data from fMRI scans of 18 subjects. Initially, we train a baseline model on visual stimuli to predict imagined stimuli, utilizing data from 14 brain regions. We then develop several models to improve imagery prediction, comparing different DA methods. Our results demonstrate that DA significantly enhances imagery prediction in binary classification on our dataset, as well as in multiclass classification on a publicly available dataset. We then conduct a DA-enhanced searchlight analysis, followed by permutation-based statistical tests to identify brain regions where imagery decoding is consistently above chance across subjects. Our DA-enhanced searchlight predicts imagery contents in a highly distributed set of brain regions, including the visual cortex and the frontoparietal cortex, thereby outperforming standard cross-domain classification methods. The complete code and data for this paper have been made openly available for the use of the scientific community.
Authors: Youdong Guo, Timothy E. Holy
Abstract: Non-negative matrix factorization (NMF) is an important tool in signal processing and widely used to separate mixed sources into their components. Algorithms for NMF require that the user choose the number of components in advance, and if the results are unsatisfying one typically needs to start again with a different number of components. To make NMF more interactive and incremental, here we introduce GSVD-NMF, a method that proposes new components based on the generalized singular value decomposition (GSVD) to address discrepancies between the initial under-complete NMF results and the SVD of the original matrix. Simulation and experimental results demonstrate that GSVD-NMF often effectively recovers multiple missing components in under-complete NMF, with the recovered NMF solutions frequently reaching better local optima. The results further show that GSVD-NMF is compatible with various NMF algorithms and that directly augmenting components is more efficient than rerunning NMF from scratch with additional components. By deliberately starting from under-complete NMF, GSVD-NMF has the potential to be a recommended approach for a range of general NMF applications.
Authors: Wall Kim
Abstract: Sequence modeling with State Space models (SSMs) has demonstrated performance surpassing that of Transformers in various tasks, raising expectations for their potential to outperform the Decision Transformer and its enhanced variants in offline reinforcement learning (RL). However, decision models based on Mamba, a state-of-the-art SSM, failed to achieve superior performance compared to these enhanced Decision Transformers. We hypothesize that this limitation arises from information loss during the selective scanning phase. To address this, we propose the Decision MetaMamba (DMM), which augments Mamba with a token mixer in its input layer. This mixer explicitly accounts for the multimodal nature of offline RL inputs, comprising state, action, and return-to-go. The DMM demonstrates improved performance while significantly reducing parameter count compared to prior models. Notably, similar performance gains were achieved using a simple linear token mixer, emphasizing the importance of preserving information from proximate time steps rather than the specific design of the token mixer itself. This novel modification to Mamba's input layer represents a departure from conventional timestamp-based encoding approaches used in Transformers. By enhancing performance of Mamba in offline RL, characterized by memory efficiency and fast inference, this work opens new avenues for its broader application in future RL research.
Authors: Junxiong Wang, Daniele Paliotta, Avner May, Alexander M. Rush, Tri Dao
Abstract: Linear RNN architectures, like Mamba, can be competitive with Transformer models in language modeling while having advantageous deployment characteristics. Given the focus on training large-scale Transformer models, we consider the challenge of converting these pretrained models for deployment. We demonstrate that it is feasible to distill large Transformers into linear RNNs by reusing the linear projection weights from attention layers with academic GPU resources. The resulting hybrid model, which incorporates a quarter of the attention layers, achieves performance comparable to the original Transformer in chat benchmarks and outperforms open-source hybrid Mamba models trained from scratch with trillions of tokens in both chat benchmarks and general benchmarks. Moreover, we introduce a hardware-aware speculative decoding algorithm that accelerates the inference speed of Mamba and hybrid models. Overall we show how, with limited computation resources, we can remove many of the original attention layers and generate from the resulting model more efficiently. Our top-performing model, distilled from Llama3-8B-Instruct, achieves a 29.61 length-controlled win rate on AlpacaEval 2 against GPT-4 and 7.35 on MT-Bench, surpassing the best 8B scale instruction-tuned linear RNN model. We also find that the distilled model has natural length extrapolation, showing almost perfect accuracy in the needle-in-a-haystack test at 20x the distillation length. Code and pre-trained checkpoints are open-sourced at https://github.com/jxiw/MambaInLlama and https://github.com/itsdaniele/speculative_mamba.
URLs: https://github.com/jxiw/MambaInLlama, https://github.com/itsdaniele/speculative_mamba.
Authors: Natalia Zhang, Xinqi Wang, Qiwen Cui, Runlong Zhou, Sham M. Kakade, Simon S. Du
Abstract: We initiate the study of Preference-Based Multi-Agent Reinforcement Learning (PbMARL), exploring both theoretical foundations and empirical validations. We define the task as identifying the Nash equilibrium from a preference-only offline dataset in general-sum games, a problem marked by the challenge of sparse feedback signals. Our theory establishes the upper complexity bounds for Nash Equilibrium in effective PbMARL, demonstrating that single-policy coverage is inadequate and highlighting the importance of unilateral dataset coverage. These theoretical insights are verified through comprehensive experiments. To enhance the practical performance, we further introduce two algorithmic techniques. (1) We propose a Mean Squared Error (MSE) regularization along the time axis to achieve a more uniform reward distribution and improve reward learning outcomes. (2) We propose an additional penalty based on the distribution of the dataset to incorporate pessimism, improving stability and effectiveness during training. Our findings underscore the multifaceted approach required for PbMARL, paving the way for effective preference-based multi-agent systems.
Authors: Recep Can Yavas, Yuqi Huang, Vincent Y. F. Tan, Jonathan Scarlett
Abstract: We develop a general framework for clustering and distribution matching problems with bandit feedback. We consider a $K$-armed bandit model where some subset of $K$ arms is partitioned into $M$ groups. Within each group, the random variable associated to each arm follows the same distribution on a finite alphabet. At each time step, the decision maker pulls an arm and observes its outcome from the random variable associated to that arm. Subsequent arm pulls depend on the history of arm pulls and their outcomes. The decision maker has no knowledge of the distributions of the arms or the underlying partitions. The task is to devise an online algorithm to learn the underlying partition of arms with the least number of arm pulls on average and with an error probability not exceeding a pre-determined value~$\delta$. Several existing problems fall under our general framework, including finding $M$ pairs of arms, odd arm identification, and $N$-ary clustering of $K$ arms belong to our general framework. We derive a non-asymptotic lower bound on the average number of arm pulls for any online algorithm with an error probability not exceeding $\delta$. Furthermore, we develop a computationally-efficient online algorithm based on the Track-and-Stop method and Frank--Wolfe algorithm, and show that the average number of arm pulls of our algorithm asymptotically matches that of the lower bound. Our refined analysis also uncovers a novel bound on the speed at which the average number of arm pulls of our algorithm converges to the fundamental limit as $\delta$ vanishes.
Authors: Erik B. Terres-Escudero, Javier Del Ser, Pablo Garcia Bringas
Abstract: The so-called Forward-Forward Algorithm (FFA) has recently gained momentum as an alternative to the conventional back-propagation algorithm for neural network learning, yielding competitive performance across various modeling tasks. By replacing the backward pass of gradient back-propagation with two contrastive forward passes, the FFA avoids several shortcomings undergone by its predecessor (e.g., vanishing/exploding gradient) by enabling layer-wise training heuristics. In classification tasks, this contrastive method has been proven to effectively create a latent sparse representation of the input data, ultimately favoring discriminability. However, FFA exhibits an inherent asymmetric gradient behavior due to an imbalanced loss function between positive and negative data, adversely impacting on the model's generalization capabilities and leading to an accuracy degradation. To address this issue, this work proposes the Symmetric Forward-Forward Algorithm (SFFA), a novel modification of the original FFA which partitions each layer into positive and negative neurons. This allows the local fitness function to be defined as the ratio between the activation of positive neurons and the overall layer activity, resulting in a symmetric loss landscape during the training phase. To evaluate the enhanced convergence of our method, we conduct several experiments using multiple image classification benchmarks, comparing the accuracy of models trained with SFFA to those trained with its FFA counterpart. As a byproduct of this reformulation, we explore the advantages of using a layer-wise training algorithm for Continual Learning (CL) tasks. The specialization of neurons and the sparsity of their activations induced by layer-wise training algorithms enable efficient CL strategies that incorporate new knowledge (classes) into the neural network, while preventing catastrophic forgetting of previously...
Authors: Marko Tuononen, Dani Korpi, Ville Hautam\"aki
Abstract: We propose a novel method for interpreting neural networks, focusing on convolutional neural network-based receiver model. The method identifies which unit or units of the model contain most (or least) information about the channel parameter(s) of the interest, providing insights at both global and local levels -- with global explanations aggregating local ones. Experiments on link-level simulations demonstrate the method's effectiveness in identifying units that contribute most (and least) to signal-to-noise ratio processing. Although we focus on a radio receiver model, the method generalizes to other neural network architectures and applications, offering robust estimation even in high-dimensional settings.
Authors: Yuzheng Hu, Pingbang Hu, Han Zhao, Jiaqi W. Ma
Abstract: How can we attribute the behaviors of machine learning models to their training data? While the classic influence function sheds light on the impact of individual samples, it often fails to capture the more complex and pronounced collective influence of a set of samples. To tackle this challenge, we study the Most Influential Subset Selection (MISS) problem, which aims to identify a subset of training samples with the greatest collective influence. We conduct a comprehensive analysis of the prevailing approaches in MISS, elucidating their strengths and weaknesses. Our findings reveal that influence-based greedy heuristics, a dominant class of algorithms in MISS, can provably fail even in linear regression. We delineate the failure modes, including the errors of influence function and the non-additive structure of the collective influence. Conversely, we demonstrate that an adaptive version of these heuristics which applies them iteratively, can effectively capture the interactions among samples and thus partially address the issues. Experiments on real-world datasets corroborate these theoretical findings and further demonstrate that the merit of adaptivity can extend to more complex scenarios such as classification tasks and non-linear neural networks. We conclude our analysis by emphasizing the inherent trade-off between performance and computational efficiency, questioning the use of additive metrics such as the Linear Datamodeling Score, and offering a range of discussions.
Authors: Yilong Li, Jingyu Liu, Hao Zhang, M Badri Narayanan, Utkarsh Sharma, Shuai Zhang, Pan Hu, Yijing Zeng, Jayaram Raghuram, Suman Banerjee
Abstract: Deploying large language models (LLMs) locally on mobile devices is advantageous in scenarios where transmitting data to remote cloud servers is either undesirable due to privacy concerns or impractical due to network connection. Recent advancements (MLC, 2023a; Gerganov, 2023) have facilitated the local deployment of LLMs. However, local deployment also presents challenges, particularly in balancing quality (generative performance), latency, and throughput within the hardware constraints of mobile devices. In this paper, we introduce our lightweight, all-in-one automated benchmarking framework that allows users to evaluate LLMs on mobile devices. We provide a comprehensive benchmark of various popular LLMs with different quantization configurations (both weights and activations) across multiple mobile platforms with varying hardware capabilities. Unlike traditional benchmarks that assess full-scale models on high-end GPU clusters, we focus on evaluating resource efficiency (memory and power consumption) and harmful output for compressed models on mobile devices. Our key observations include i) differences in energy efficiency and throughput across mobile platforms; ii) the impact of quantization on memory usage, GPU execution time, and power consumption; and iii) accuracy and performance degradation of quantized models compared to their non-quantized counterparts; and iv) the frequency of hallucinations and toxic content generated by compressed LLMs on mobile devices.
Authors: Oleg Filatov, Jan Ebert, Jiangtao Wang, Stefan Kesselheim
Abstract: One of the main challenges in optimal scaling of large language models (LLMs) is the prohibitive cost of hyperparameter tuning, particularly learning rate $\eta$ and batch size $B$. While techniques like $\mu$P (Yang et al., 2022) provide scaling rules for optimal $\eta$ transfer in the infinite model size limit, the optimal scaling behavior in the infinite data size limit remains unknown. We fill in this gap by observing for the first time an intricate dependence of optimal $\eta$ scaling on the pretraining token budget $T$, $B$ and its relation to the critical batch size $B_\mathrm{crit}$, which we measure to evolve as $B_\mathrm{crit} \propto T$. Furthermore, we show that the optimal batch size is positively correlated with $B_\mathrm{crit}$: keeping it fixed becomes suboptimal over time even if learning rate is scaled optimally. Surprisingly, our results demonstrate that the observed optimal $\eta$ and $B$ dynamics are preserved with $\mu$P model scaling, challenging the conventional view of $B_\mathrm{crit}$ dependence solely on loss value. Complementing optimality, we examine the sensitivity of loss to changes in learning rate, where we find the sensitivity to decrease with increase of $T$ and to remain constant with $\mu$P model scaling. We hope our results make the first step towards a unified picture of the joint optimal data and model scaling.
Authors: Zhengyu Hu, Yichuan Li, Zhengyu Chen, Jingang Wang, Han Liu, Kyumin Lee, Kaize Ding
Abstract: Textual Attributed Graphs (TAGs) are crucial for modeling complex real-world systems, yet leveraging large language models (LLMs) for TAGs presents unique challenges due to the gap between sequential text processing and graph-structured data. We introduce AskGNN, a novel approach that bridges this gap by leveraging In-Context Learning (ICL) to integrate graph data and task-specific information into LLMs. AskGNN employs a Graph Neural Network (GNN)-powered structure-enhanced retriever to select labeled nodes across graphs, incorporating complex graph structures and their supervision signals. Our learning-to-retrieve algorithm optimizes the retriever to select example nodes that maximize LLM performance on graph. Experiments across three tasks and seven LLMs demonstrate AskGNN's superior effectiveness in graph task performance, opening new avenues for applying LLMs to graph-structured data without extensive fine-tuning.
Authors: Francisco Erivaldo Fernandes Junior, Antti Oulasvirta
Abstract: Developing a reinforcement learning (RL) agent often involves identifying values for numerous parameters, covering the policy, reward function, environment, and agent-internal architecture. Since these parameters are interrelated in complex ways, optimizing them is a black-box problem that proves especially challenging for nonexperts. Although existing optimization-as-a-service platforms (e.g., Vizier and Optuna) can handle such problems, they are impractical for RL systems, since the need for manual user mapping of each parameter to distinct components makes the effort cumbersome. It also requires understanding of the optimization process, limiting the systems' application beyond the machine learning field and restricting access in areas such as cognitive science, which models human decision-making. To tackle these challenges, the paper presents AgentForge, a flexible low-code platform to optimize any parameter set across an RL system. Available at https://github.com/feferna/AgentForge, it allows an optimization problem to be defined in a few lines of code and handed to any of the interfaced optimizers. With AgentForge, the user can optimize the parameters either individually or jointly. The paper presents an evaluation of its performance for a challenging vision-based RL problem.
Authors: Matthias Holzenkamp, Dongyu Lyu, Ulrich Kleinekath\"ofer, Peter Zaspel
Abstract: Uncertainty estimations for machine learning interatomic potentials (MLIPs) are crucial for quantifying model error and identifying informative training samples in active learning strategies. In this study, we evaluate uncertainty estimations of Gaussian process regression (GPR)-based MLIPs, including the predictive GPR standard deviation and ensemble-based uncertainties. We do this in terms of calibration and in terms of impact on model performance in an active learning scheme. We consider GPR models with Coulomb and Smooth Overlap of Atomic Positions (SOAP) representations as inputs to predict potential energy surfaces and excitation energies of molecules. Regarding calibration, we find that ensemble-based uncertainty estimations show already poor global calibration (e.g., averaged over the whole test set). In contrast, the GPR standard deviation shows good global calibration, but when grouping predictions by their uncertainty, we observe a systematical bias for predictions with high uncertainty. Although an increasing uncertainty correlates with an increasing bias, the bias is not captured quantitatively by the uncertainty. Therefore, the GPR standard deviation can be useful to identify predictions with a high bias and error but, without further knowledge, should not be interpreted as a quantitative measure for a potential error range. Selecting the samples with the highest GPR standard deviation from a fixed configuration space leads to a model that overemphasizes the borders of the configuration space represented in the fixed dataset. This may result in worse performance in more densely sampled areas but better generalization for extrapolation tasks.
Authors: Yujie Zhao, Jose Efraim Aguilar Escamill, Weyl Lu, Huazheng Wang
Abstract: Reinforcement Learning from Human Feedback (RLHF) has recently surged in popularity, particularly for aligning large language models and other AI systems with human intentions. At its core, RLHF can be viewed as a specialized instance of Preference-based Reinforcement Learning (PbRL), where the preferences specifically originate from human judgments rather than arbitrary evaluators. Despite this connection, most existing approaches in both RLHF and PbRL primarily focus on optimizing a mean reward objective, neglecting scenarios that necessitate risk-awareness, such as AI safety, healthcare, and autonomous driving. These scenarios often operate under a one-episode-reward setting, which makes conventional risk-sensitive objectives inapplicable. To address this, we explore and prove the applicability of two risk-aware objectives to PbRL : nested and static quantile risk objectives. We also introduce Risk-AwarePbRL (RA-PbRL), an algorithm designed to optimize both nested and static objectives. Additionally, we provide a theoretical analysis of the regret upper bounds, demonstrating that they are sublinear with respect to the number of episodes, and present empirical results to support our findings. Our code is available in https://github.com/aguilarjose11/PbRLNeurips.
Authors: Elia Cunegatti, Leonardo Lucio Custode, Giovanni Iacca
Abstract: Network pruning focuses on computational techniques that aim to reduce a given model's computational cost by removing a subset of its parameters while having minimal impact on performance. Throughout the last decade, the most widely used pruning paradigm has been pruning and re-training, which nowadays is inconvenient due to the vast amount of pre-trained models, which are in any case too expensive to re-train. In this paper, we exploit functional information from dense pre-trained models, i.e., their activations, to obtain sparse models that maximize the activations' alignment w.r.t. their corresponding dense models. Hence, we propose \textsc{NeuroAL}, a \emph{top-up} algorithm that can be used on top of any given pruning algorithm for LLMs, which modifies the block-wise and row-wise sparsity exploiting information from both the dense model and its sparse version to maximize the \emph{neuron alignment} among activations. Differently from existing methods, our approach adaptively selects the best hyperparameters for the block-wise and row-wise sparsity ratios w.r.t. the model and the desired sparsity, and requires \emph{no re-training}. We test our method over 276 cases combining four LLM families, three sparsity ratios, and ten language tasks (three language modeling and seven zero-shot datasets), showing how it consistently outperforms the latest state-of-the-art methods in terms of performance-runtime trade-off. The code is available at \href{https://github.com/eliacunegatti/NeuroAL}{https://github.com/eliacunegatti/NeuroAL}.
URLs: https://github.com/eliacunegatti/NeuroAL, https://github.com/eliacunegatti/NeuroAL
Authors: Einari Vaaras, Manu Airaksinen, Okko R\"as\"anen
Abstract: Self-supervised learning (SSL) is a data-driven learning approach that utilizes the innate structure of the data to guide the learning process. In contrast to supervised learning, which depends on external labels, SSL utilizes the inherent characteristics of the data to produce its own supervisory signal. However, one frequent issue with SSL methods is representation collapse, where the model outputs a constant input-invariant feature representation. This issue hinders the potential application of SSL methods to new data modalities, as trying to avoid representation collapse wastes researchers' time and effort. This paper introduces a novel SSL algorithm for time-series data called Prediction of Functionals from Masked Latents (PFML). Instead of predicting masked input signals or their latent representations directly, PFML operates by predicting statistical functionals of the input signal corresponding to masked embeddings, given a sequence of unmasked embeddings. The algorithm is designed to avoid representation collapse, rendering it straightforwardly applicable to different time-series data domains, such as novel sensor modalities in clinical data. We demonstrate the effectiveness of PFML through complex, real-life classification tasks across three different data modalities: infant posture and movement classification from multi-sensor inertial measurement unit data, emotion recognition from speech data, and sleep stage classification from EEG data. The results show that PFML is superior to a conceptually similar SSL method and a contrastive learning-based SSL method. Additionally, PFML is on par with the current state-of-the-art SSL method, while also being conceptually simpler and without suffering from representation collapse.
Authors: Feng Xiong, Runxi Cheng, Wang Chen, Zhanqiu Zhang, Yiwen Guo, Chun Yuan, Ruifeng Xu
Abstract: Model merging has recently gained attention as an economical and scalable approach to incorporate task-specific weights from various tasks into a unified multi-task model. For example, in Task Arithmetic (TA), adding the fine-tuned weights of different tasks can enhance the model's performance on those tasks, while subtracting them leads to task forgetting. Although TA is highly effective, interference among task still hampers the performance of the merged model. Existing methods for handling conflicts between task generally rely on empirical selection, resulting in suboptimal performance. In this paper, we introduce an Adaptive Weight Disentanglement method. We begin by theoretically proving that task vectors employed in model merging should be orthogonal to minimize interference among tasks. Guided by this insight, we initialize redundant vectors such that, when subtracted from the original task vectors, the resulting vectors exhibit increased orthogonality. Additionally, we impose an norm constraint on the redundant vectors to preserve the performance of the task-specific models. Experimental results demonstrate the effectiveness of our proposed technique: it successfully extracts redundant vectors, and after their subtraction, the task vectors not only retain robust performance but also achieve superior fusion outcomes. Our code is available at \href{https://github.com/FarisXiong/AWD.git}{https://github.com/FarisXiong/AWD.git}.
URLs: https://github.com/FarisXiong/AWD.git, https://github.com/FarisXiong/AWD.git
Authors: Jiaojiao Zhang, Linglingzhi Zhu, Dominik Fay, Mikael Johansson
Abstract: We introduce a locally differentially private (LDP) algorithm for online federated learning that employs temporally correlated noise to improve utility while preserving privacy. To address challenges posed by the correlated noise and local updates with streaming non-IID data, we develop a perturbed iterate analysis that controls the impact of the noise on the utility. Moreover, we demonstrate how the drift errors from local updates can be effectively managed for several classes of nonconvex loss functions. Subject to an $(\epsilon,\delta)$-LDP budget, we establish a dynamic regret bound that quantifies the impact of key parameters and the intensity of changes in the dynamic environment on the learning performance. Numerical experiments confirm the efficacy of the proposed algorithm.
Authors: Jiang Yang, Yuxiang Zhao, Quanhui Zhu
Abstract: In recent years, deep learning, powered by neural networks, has achieved widespread success in solving high-dimensional problems, particularly those with low-dimensional feature structures. This success stems from their ability to identify and learn low dimensional features tailored to the problems. Understanding how neural networks extract such features during training dynamics remains a fundamental question in deep learning theory. In this work, we propose a novel perspective by interpreting the neurons in the last hidden layer of a neural network as basis functions that represent essential features. To explore the linear independence of these basis functions throughout the deep learning dynamics, we introduce the concept of 'effective rank'. Our extensive numerical experiments reveal a notable phenomenon: the effective rank increases progressively during the learning process, exhibiting a staircase-like pattern, while the loss function concurrently decreases as the effective rank rises. We refer to this observation as the 'staircase phenomenon'. Specifically, for deep neural networks, we rigorously prove the negative correlation between the loss function and effective rank, demonstrating that the lower bound of the loss function decreases with increasing effective rank. Therefore, to achieve a rapid descent of the loss function, it is critical to promote the swift growth of effective rank. Ultimately, we evaluate existing advanced learning methodologies and find that these approaches can quickly achieve a higher effective rank, thereby avoiding redundant staircase processes and accelerating the rapid decline of the loss function.
Authors: Yun Qu, Yuhang Jiang, Boyuan Wang, Yixiu Mao, Cheems Wang, Chang Liu, Xiangyang Ji
Abstract: Reinforcement learning (RL) often encounters delayed and sparse feedback in real-world applications, even with only episodic rewards. Previous approaches have made some progress in reward redistribution for credit assignment but still face challenges, including training difficulties due to redundancy and ambiguous attributions stemming from overlooking the multifaceted nature of mission performance evaluation. Hopefully, Large Language Model (LLM) encompasses fruitful decision-making knowledge and provides a plausible tool for reward redistribution. Even so, deploying LLM in this case is non-trivial due to the misalignment between linguistic knowledge and the symbolic form requirement, together with inherent randomness and hallucinations in inference. To tackle these issues, we introduce LaRe, a novel LLM-empowered symbolic-based decision-making framework, to improve credit assignment. Key to LaRe is the concept of the Latent Reward, which works as a multi-dimensional performance evaluation, enabling more interpretable goal attainment from various perspectives and facilitating more effective reward redistribution. We examine that semantically generated code from LLM can bridge linguistic knowledge and symbolic latent rewards, as it is executable for symbolic objects. Meanwhile, we design latent reward self-verification to increase the stability and reliability of LLM inference. Theoretically, reward-irrelevant redundancy elimination in the latent reward benefits RL performance from more accurate reward estimation. Extensive experimental results witness that LaRe (i) achieves superior temporal credit assignment to SOTA methods, (ii) excels in allocating contributions among multiple agents, and (iii) outperforms policies trained with ground truth rewards for certain tasks.
Authors: Mohsen Rashki
Abstract: Machine learning (ML) has emerged as a powerful tool for tackling complex regression and classification tasks, yet its success often hinges on the quality of training data. This study introduces an ML paradigm inspired by domain knowledge of the structure of output function, akin to physics-informed ML, but rooted in probabilistic principles rather than physical laws. The proposed approach integrates the probabilistic structure of the target variable (such as its cumulative distribution function) into the training process. This probabilistic information is obtained from historical data or estimated using structural reliability methods during experimental design. By embedding domain-specific probabilistic insights into the learning process, the technique enhances model accuracy and mitigates risks of overfitting and underfitting. Applications in regression, image denoising, and classification demonstrate the approach's effectiveness in addressing real-world problems.
Authors: Yongyu Wang
Abstract: Graph Neural Networks (GNNs) are a highly effective neural network architecture for processing graph-structured data. Unlike traditional neural networks that rely solely on the features of the data as input, GNNs leverage both the graph structure, which represents the relationships between data points, and the feature matrix of the data to optimize their feature representation. This unique capability enables GNNs to achieve superior performance across various tasks. However, it also makes GNNs more susceptible to noise from both the graph structure and data features, which can significantly increase the training difficulty and degrade their performance. To address this issue, this paper proposes a novel method for selecting noise-sensitive training samples from the original training set to construct a smaller yet more effective training set for model training. These samples are then used to enhance the model's ability to handle noise-prone instances effectively. We have evaluated our approach on three of the most classical GNN models -- GCN, GAT, and GraphSAGE -- as well as three widely used benchmark datasets: Cora, Citeseer, and PubMed. Our experiments demonstrate that the proposed method can substantially boost the overall training of Graph Neural Networks compared to using randomly constructed training sets.
Authors: Jonathan Schmidt, Luca Schmidt, Felix Strnad, Nicole Ludwig, Philipp Hennig
Abstract: Local climate information is crucial for impact assessment and decision-making, yet coarse global climate simulations cannot capture small-scale phenomena. Current statistical downscaling methods infer these phenomena as temporally decoupled spatial patches. However, to preserve physical properties, estimating spatio-temporally coherent high-resolution weather dynamics for multiple variables across long time horizons is crucial. We present a novel generative approach that uses a score-based diffusion model trained on high-resolution reanalysis data to capture the statistical properties of local weather dynamics. After training, we condition on coarse climate model data to generate weather patterns consistent with the aggregate information. As this inference task is inherently uncertain, we leverage the probabilistic nature of diffusion models and sample multiple trajectories. We evaluate our approach with high-resolution reanalysis information before applying it to the climate model downscaling task. We then demonstrate that the model generates spatially and temporally coherent weather dynamics that align with global climate output.
Authors: Taneesh Gupta, Rahul Madhavan, Xuchao Zhang, Chetan Bansal, Saravan Rajmohan
Abstract: We introduce REFA, a family of reference-free alignment methods that optimize over multiple user preferences while enforcing fine-grained length control. Our approach integrates deviation-based weighting to emphasize high-quality responses more strongly, length normalization to prevent trivial short-response solutions, and an EOS-probability regularizer to mitigate dataset-induced brevity biases. Theoretically, we show that under the Uncertainty Reduction with Sequence Length Assertion (URSLA), naive length normalization can still incentivize length-based shortcuts. By contrast, REFA corrects these subtle incentives, guiding models toward genuinely more informative and higher-quality outputs. Empirically, REFA sets a new state-of-the-art among reference-free alignment methods, producing richer responses aligned more closely with human preferences. Compared to a base supervised fine-tuned (SFT) mistral-7b model that achieves 8.4% length-controlled win rate (LC-WR) and 6.2% win rate (WR), our best REFA configuration attains 21.62% LC-WR and 19.87% WR on the AlpacaEval v2 benchmark. This represents a substantial improvement over both the strongest multi-preference baseline, InfoNCA (16.82% LC-WR, 10.44% WR), and the strongest reference-free baseline, SimPO (20.01% LC-WR, 17.65% WR)
Authors: Orson Mengara
Abstract: With the rapid development of generative artificial intelligence, particularly large language models, a number of sub-fields of deep learning have made significant progress and are now very useful in everyday applications. For example, well-known financial institutions simulate a wide range of scenarios for various models created by their research teams using reinforcement learning, both before production and after regular operations. In this work, we propose a backdoor attack that focuses solely on data poisoning. This particular backdoor attack is classified as an attack without prior consideration or trigger, and we name it FinanceLLMsBackRL. Our aim is to examine the potential effects of large language models that use reinforcement learning systems for text production or speech recognition, finance, physics, or the ecosystem of contemporary artificial intelligence models.
Authors: Afek Steinberg, Ran Eisenberg, Ofir Lindenbaum
Abstract: Temporal alignment of sequences is a fundamental challenge in many applications, such as computer vision and bioinformatics, where local time shifting needs to be accounted for. Misalignment can lead to poor model generalization, especially in high-dimensional sequences. Existing methods often struggle with optimization when dealing with high-dimensional sparse data, falling into poor alignments. Feature selection is frequently used to enhance model performance for sparse data. However, a fixed set of selected features would not generally work for dynamically changing sequences and would need to be modified based on the state of the sequence. Therefore, modifying the selected feature based on contextual input would result in better alignment. Our suggested method, Conditional Deep Canonical Temporal Time Warping (CDCTW), is designed for temporal alignment in sparse temporal data to address these challenges. CDCTW enhances alignment accuracy for high dimensional time-dependent views be performing dynamic time warping on data embedded in maximally correlated subspace which handles sparsity with novel feature selection method. We validate the effectiveness of CDCTW through extensive experiments on various datasets, demonstrating superior performance over previous techniques.
Authors: Shuo Yu, Shan Jin, Ming Li, Tabinda Sarwar, Feng Xia
Abstract: Understanding communication and information processing among brain regions of interest (ROIs) is highly dependent on long-range connectivity, which plays a crucial role in facilitating diverse functional neural integration across the entire brain. However, previous studies generally focused on the short-range dependencies within brain networks while neglecting the long-range dependencies, limiting an integrated understanding of brain-wide communication. To address this limitation, we propose Adaptive Long-range aware TransformER (ALTER), a brain graph transformer to capture long-range dependencies between brain ROIs utilizing biased random walk. Specifically, we present a novel long-range aware strategy to explicitly capture long-range dependencies between brain ROIs. By guiding the walker towards the next hop with higher correlation value, our strategy simulates the real-world brain-wide communication. Furthermore, by employing the transformer framework, ALERT adaptively integrates both short- and long-range dependencies between brain ROIs, enabling an integrated understanding of multi-level communication across the entire brain. Extensive experiments on ABIDE and ADNI datasets demonstrate that ALTER consistently outperforms generalized state-of-the-art graph learning methods (including SAN, Graphormer, GraphTrans, and LRGNN) and other graph learning based brain network analysis methods (including FBNETGEN, BrainNetGNN, BrainGNN, and BrainNETTF) in neurological disease diagnosis. Cases of long-range dependencies are also presented to further illustrate the effectiveness of ALTER. The implementation is available at https://github.com/yushuowiki/ALTER.
Authors: Kunpeng Xu, Lifei Chen, Shengrui Wang
Abstract: In the realm of time series analysis, tackling the phenomenon of concept drift poses a significant challenge. Concept drift -- characterized by the evolving statistical properties of time series data, affects the reliability and accuracy of conventional analysis models. This is particularly evident in co-evolving scenarios where interactions among variables are crucial. This paper presents Drift2Matrix, a novel framework that leverages kernel-induced self-representation for adaptive responses to concept drift in time series. Drift2Matrix employs a kernel-based learning mechanism to generate a representation matrix, encapsulating the inherent dynamics of co-evolving time series. This matrix serves as a key tool for identification and adaptation to concept drift by observing its temporal variations. Furthermore, Drift2Matrix effectively identifies prevailing patterns and offers insights into emerging trends through pattern evolution analysis. Our empirical evaluation of Drift2Matrix across various datasets demonstrates its effectiveness in handling the complexities of concept drift. This approach introduces a novel perspective in the theoretical domain of co-evolving time series analysis, enhancing adaptability and accuracy in the face of dynamic data environments.
Authors: Yinkai Wang, Xiaohui Chen, Liping Liu, Soha Hassoun
Abstract: The annotation (assigning structural chemical identities) of MS/MS spectra remains a significant challenge due to the enormous molecular diversity in biological samples and the limited scope of reference databases. Currently, the vast majority of spectral measurements remain in the "dark chemical space" without structural annotations. To improve annotation, we propose MADGEN (Mass-spec Attends to De Novo Molecular GENeration), a scaffold-based method for de novo molecular structure generation guided by mass spectrometry data. MADGEN operates in two stages: scaffold retrieval and spectra-conditioned molecular generation starting with the scaffold. In the first stage, given an MS/MS spectrum, we formulate scaffold retrieval as a ranking problem and employ contrastive learning to align mass spectra with candidate molecular scaffolds. In the second stage, starting from the retrieved scaffold, we employ the MS/MS spectrum to guide an attention-based generative model to generate the final molecule. Our approach constrains the molecular generation search space, reducing its complexity and improving generation accuracy. We evaluate MADGEN on three datasets (NIST23, CANOPUS, and MassSpecGym) and evaluate MADGEN's performance with a predictive scaffold retriever and with an oracle retriever. We demonstrate the effectiveness of using attention to integrate spectral information throughout the generation process to achieve strong results with the oracle retriever.
Authors: David Restrepo, Chenwei Wu, Yueran Jia, Jaden K. Sun, Jack Gallifant, Catherine G. Bielick, Yugang Jia, Leo A. Celi
Abstract: Accurate imputation of missing laboratory values in electronic health records (EHRs) is critical to enable robust clinical predictions and reduce biases in AI systems in healthcare. Existing methods, such as variational autoencoders (VAEs) and decision tree-based approaches such as XGBoost, struggle to model the complex temporal and contextual dependencies in EHR data, mainly in underrepresented groups. In this work, we propose Lab-MAE, a novel transformer-based masked autoencoder framework that leverages self-supervised learning for the imputation of continuous sequential lab values. Lab-MAE introduces a structured encoding scheme that jointly models laboratory test values and their corresponding timestamps, enabling explicit capturing temporal dependencies. Empirical evaluation on the MIMIC-IV dataset demonstrates that Lab-MAE significantly outperforms the state-of-the-art baselines such as XGBoost across multiple metrics, including root mean square error (RMSE), R-squared (R2), and Wasserstein distance (WD). Notably, Lab-MAE achieves equitable performance across demographic groups of patients, advancing fairness in clinical predictions. We further investigate the role of follow-up laboratory values as potential shortcut features, revealing Lab-MAE's robustness in scenarios where such data is unavailable. The findings suggest that our transformer-based architecture, adapted to the characteristics of the EHR data, offers a foundation model for more accurate and fair clinical imputation models. In addition, we measure and compare the carbon footprint of Lab-MAE with the baseline XGBoost model, highlighting its environmental requirements.
Authors: Shi Bin Hoo, Samuel M\"uller, David Salinas, Frank Hutter
Abstract: Foundation models have become popular in forecasting due to their ability to make accurate predictions, even with minimal fine-tuning on specific datasets. In this paper, we demonstrate how the newly released regression variant of TabPFN, a general tabular foundation model, can be applied to time series forecasting. We propose a straightforward approach, TabPFN-TS, which pairs TabPFN with simple feature engineering to achieve strong forecasting performance. Despite its simplicity and with only 11M parameters, TabPFN-TS outperforms Chronos-Mini, a model of similar size, and matches or even slightly outperforms Chronos-Large, which has 65-fold more parameters. A key strength of our method lies in its reliance solely on artificial data during pre-training, avoiding the need for large training datasets and eliminating the risk of benchmark contamination.
Authors: Nandan Kumar Jha, Brandon Reagen
Abstract: The pervasiveness of proprietary language models has raised critical privacy concerns, necessitating advancements in private inference (PI), where computations are performed directly on encrypted data without revealing users' sensitive information. While PI offers a promising solution, its practical deployment is hindered by substantial communication and latency overheads, primarily stemming from nonlinear operations. To address this, we introduce an information-theoretic framework to characterize the role of nonlinearities in decoder-only language models, laying a principled foundation for optimizing transformer-architectures tailored to the demands of PI. By leveraging Shannon's entropy as a quantitative measure, we uncover the previously unexplored dual significance of nonlinearities: beyond ensuring training stability, they are crucial for maintaining attention head diversity. Specifically, we find that their removal triggers two critical failure modes: {\em entropy collapse} in deeper layers that destabilizes training, and {\em entropic overload} in earlier layers that leads to under-utilization of Multi-Head Attention's (MHA) representational capacity. We propose an entropy-guided attention mechanism paired with a novel entropy regularization technique to mitigate entropic overload. Additionally, we explore PI-friendly alternatives to layer normalization for preventing entropy collapse and stabilizing the training of LLMs with reduced-nonlinearities. Our study bridges the gap between information theory and architectural design, establishing entropy dynamics as a principled guide for developing efficient PI architectures. The code and implementation are available at https://github.com/Nandan91/entropy-guided-attention-llm
URLs: https://github.com/Nandan91/entropy-guided-attention-llm
Authors: Xiaoqing Zhang, Ang Lv, Yuhan Liu, Flood Sung, Wei Liu, Shuo Shang, Xiuying Chen, Rui Yan
Abstract: Large language models (LLMs) excel at few-shot in-context learning (ICL) without requiring parameter updates. However, as the number of ICL demonstrations increases from a few to many, performance tends to plateau and eventually decline. We identify two primary causes for this trend: the suboptimal negative log-likelihood (NLL) optimization objective and the incremental data noise. To address these issues, we introduce DrICL, a novel optimization method that enhances model performance through Differentiated Learning and advantage-based Reweighting objectives. Globally, DrICL utilizes differentiated learning to optimize the NLL objective, ensuring that many-shot performance surpasses zero-shot levels. Locally, it dynamically adjusts the weighting of many-shot demonstrations by leveraging cumulative advantages inspired by reinforcement learning, thereby improving generalization. This approach allows the model to handle varying numbers of shots effectively, mitigating the impact of noisy data. Recognizing the lack of multi-task datasets with diverse many-shot distributions, we develop the Many-Shot ICL Benchmark (ICL-50)-a large-scale benchmark of 50 tasks that cover shot numbers from 1 to 350 within sequences of up to 8,000 tokens-for fine-tuning purposes. ICL-50 facilitates the evaluation of many-shot ICL strategies across seven prominent NLP tasks and 50 distinct datasets. Experimental results demonstrate that LLMs enhanced with DrICL achieve significant improvements in many-shot setups across various tasks, including both in-domain and out-of-domain scenarios. We release the code and benchmark dataset hoping to facilitate further research in many-shot ICL.
Authors: Yaozhong Shi, Zachary E. Ross, Domniki Asimaki, Kamyar Azizzadenesheli
Abstract: Expanding on neural operators, we propose a novel framework for stochastic process learning across arbitrary domains. In particular, we develop operator flow matching (OFM) for learning stochastic process priors on function spaces. OFM provides the probability density of the values of any collection of points and enables mathematically tractable functional regression at new points with mean and density estimation. Our method outperforms state-of-the-art models in stochastic process learning, functional regression, and prior learning.
Authors: Wenying Duan, Shujun Guo, Wei huang, Hong Rao, Xiaoxi He
Abstract: Spatial-temporal data, fundamental to many intelligent applications, reveals dependencies indicating causal links between present measurements at specific locations and historical data at the same or other locations. Within this context, adaptive spatial-temporal graph neural networks (ASTGNNs) have emerged as valuable tools for modelling these dependencies, especially through a data-driven approach rather than pre-defined spatial graphs. While this approach offers higher accuracy, it presents increased computational demands. Addressing this challenge, this paper delves into the concept of localisation within ASTGNNs, introducing an innovative perspective that spatial dependencies should be dynamically evolving over time. We introduce \textit{DynAGS}, a localised ASTGNN framework aimed at maximising efficiency and accuracy in distributed deployment. This framework integrates dynamic localisation, time-evolving spatial graphs, and personalised localisation, all orchestrated around the Dynamic Graph Generator, a light-weighted central module leveraging cross attention. The central module can integrate historical information in a node-independent manner to enhance the feature representation of nodes at the current moment. This improved feature representation is then used to generate a dynamic sparse graph without the need for costly data exchanges, and it supports personalised localisation. Performance assessments across two core ASTGNN architectures and nine real-world datasets from various applications reveal that \textit{DynAGS} outshines current benchmarks, underscoring that the dynamic modelling of spatial dependencies can drastically improve model expressibility, flexibility, and system efficiency, especially in distributed settings.
Authors: Haonan Shi, Tu Ouyang, An Wang
Abstract: Instruction tuning has proven effective in enhancing Large Language Models' (LLMs) performance on downstream tasks. However, real-world fine-tuning faces inherent conflicts between model providers' intellectual property protection, clients' data privacy requirements, and tuning costs. While recent approaches like split learning and offsite tuning demonstrate promising architectures for privacy-preserving fine-tuning, there is a gap in systematically addressing the multidimensional trade-offs required for diverse real-world deployments. We propose several indicative evaluation metrics to guide design trade-offs for privacy-preserving fine-tuning and a series of example designs, collectively named GuardedTuning; they result from novel combinations of system architectures with adapted privacy-enhancement methods and emerging computation techniques. Each design represents distinct trade-offs across model utility, privacy guarantees, and costs. Experimental results demonstrate that these designs protect against data reconstruction attacks while maintaining competitive fine-tuning performance.
Authors: Van Thien Nguyen, William Guicquero, Gilles Sicard
Abstract: Adjusting the quantization according to the data or to the model loss seems mandatory to enable a high accuracy in the context of quantized neural networks. This work presents Histogram-Equalized Quantization (HEQ), an adaptive framework for linear symmetric quantization. HEQ automatically adapts the quantization thresholds using a unique step size optimization. We empirically show that HEQ achieves state-of-the-art performances on CIFAR-10. Experiments on the STL-10 dataset even show that HEQ enables a proper training of our proposed logic-gated (OR, MUX) residual networks with a higher accuracy at a lower hardware complexity than previous work.
Authors: Hugo Freeman
Abstract: This paper studies a linear and additively separable regression model for multidimensional panel data of three or more dimensions with unobserved interactive fixed effects. The main estimator follows a double debias approach, and requires two preliminary steps to control unobserved heterogeneity. First, the model is embedded within the standard two-dimensional panel framework and restrictions are formed under which the factor structure methods in Bai (2009) lead to consistent estimation of model parameters, but at slow rates of convergence. The second step develops a weighted fixed-effects method that is robust to the multidimensional nature of the problem and achieves the parametric rate of consistency. This second step is combined with a double debias procedure for asymptotically normal slope estimates. The methods are implemented to estimate the demand elasticity for beer.
Authors: Kun Li, Liangshu Zhu
Abstract: Uplift modeling has been used effectively in fields such as marketing and customer retention, to target those customers who are more likely to respond due to the campaign or treatment. Essentially, it is a machine learning technique that predicts the gain from performing some action with respect to not taking it. A popular class of uplift models is the transformation approach that redefines the target variable with the original treatment indicator. These transformation approaches only need to train and predict the difference in outcomes directly. The main drawback of these approaches is that in general it does not use the information in the treatment indicator beyond the construction of the transformed outcome and usually is not efficient. In this paper, we design a novel transformed outcome for the case of the binary target variable and unlock the full value of the samples with zero outcome. From a practical perspective, our new approach is flexible and easy to use. Experimental results on synthetic and real-world datasets obviously show that our new approach outperforms the traditional one. At present, our new approach has already been applied to precision marketing in a China nation-wide financial holdings group.
Authors: Yusuf Brima, Ulf Krumnack, Simone Pika, Gunther Heidemann
Abstract: Disentangled representation learning in speech processing has lagged behind other domains, largely due to the lack of datasets with annotated generative factors for robust evaluation. To address this, we propose SynSpeech, a novel large-scale synthetic speech dataset specifically designed to enable research on disentangled speech representations. SynSpeech includes controlled variations in speaker identity, spoken text, and speaking style, with three dataset versions to support experimentation at different levels of complexity. In this study, we present a comprehensive framework to evaluate disentangled representation learning techniques, applying both linear probing and established supervised disentanglement metrics to assess the modularity, compactness, and informativeness of the representations learned by a state-of-the-art model. Using the RAVE model as a test case, we find that SynSpeech facilitates benchmarking across a range of factors, achieving promising disentanglement of simpler features like gender and speaking style, while highlighting challenges in isolating complex attributes like speaker identity. This benchmark dataset and evaluation framework fills a critical gap, supporting the development of more robust and interpretable speech representation learning methods.
Authors: Tao Sun, Yan Hao, Shengyu Huang, Silvio Savarese, Konrad Schindler, Marc Pollefeys, Iro Armeni
Abstract: Building 3D geometric maps of man-made spaces is a well-established and active field that is fundamental to computer vision and robotics. However, considering the evolving nature of built environments, it is essential to question the capabilities of current mapping efforts in handling temporal changes. In addition, spatiotemporal mapping holds significant potential for achieving sustainability and circularity goals. Existing mapping approaches focus on small changes, such as object relocation or self-driving car operation; in all cases where the main structure of the scene remains fixed. Consequently, these approaches fail to address more radical changes in the structure of the built environment, such as geometry and topology. To this end, we introduce the Nothing Stands Still (NSS) benchmark, which focuses on the spatiotemporal registration of 3D scenes undergoing large spatial and temporal change, ultimately creating one coherent spatiotemporal map. Specifically, the benchmark involves registering two or more partial 3D point clouds (fragments) from the same scene but captured from different spatiotemporal views. In addition to the standard pairwise registration, we assess the multi-way registration of multiple fragments that belong to any temporal stage. As part of NSS, we introduce a dataset of 3D point clouds recurrently captured in large-scale building indoor environments that are under construction or renovation. The NSS benchmark presents three scenarios of increasing difficulty, to quantify the generalization ability of point cloud registration methods over space (within one building and across buildings) and time. We conduct extensive evaluations of state-of-the-art methods on NSS. The results demonstrate the necessity for novel methods specifically designed to handle large spatiotemporal changes. The homepage of our benchmark is at http://nothing-stands-still.com.
Authors: Jiaming Han, Kaixiong Gong, Yiyuan Zhang, Jiaqi Wang, Kaipeng Zhang, Dahua Lin, Yu Qiao, Peng Gao, Xiangyu Yue
Abstract: Multimodal large language models (MLLMs) have gained significant attention due to their strong multimodal understanding capability. However, existing works rely heavily on modality-specific encoders, which usually differ in architecture and are limited to common modalities. In this paper, we present OneLLM, an MLLM that aligns eight modalities to language using a unified framework. We achieve this through a unified multimodal encoder and a progressive multimodal alignment pipeline. In detail, we first train an image projection module to connect a vision encoder with LLM. Then, we build a universal projection module (UPM) by mixing multiple image projection modules and dynamic routing. Finally, we progressively align more modalities to LLM with the UPM. To fully leverage the potential of OneLLM in following instructions, we also curated a comprehensive multimodal instruction dataset, including 2M items from image, audio, video, point cloud, depth/normal map, IMU and fMRI brain activity. OneLLM is evaluated on 25 diverse benchmarks, encompassing tasks such as multimodal captioning, question answering and reasoning, where it delivers excellent performance. Code, data, model and online demo are available at https://github.com/csuhan/OneLLM
Authors: Dyah A. M. G. Wisnu, Stefano Rini, Ryandhimas E. Zezario, Hsin-Min Wang, Yu Tsao
Abstract: This paper introduces HAAQI-Net, a non-intrusive deep learning-based music audio quality assessment model for hearing aid users. Unlike traditional methods like the Hearing Aid Audio Quality Index (HAAQI) that require intrusive reference signal comparisons, HAAQI-Net offers a more accessible and computationally efficient alternative. By utilizing a Bidirectional Long Short-Term Memory (BLSTM) architecture with attention mechanisms and features extracted from the pre-trained BEATs model, it can predict HAAQI scores directly from music audio clips and hearing loss patterns. Experimental results demonstrate HAAQI-Net's effectiveness, achieving a Linear Correlation Coefficient (LCC) of 0.9368 , a Spearman's Rank Correlation Coefficient (SRCC) of 0.9486 , and a Mean Squared Error (MSE) of 0.0064 and inference time significantly reduces from 62.52 to 2.54 seconds. To address computational overhead, a knowledge distillation strategy was applied, reducing parameters by 75.85% and inference time by 96.46%, while maintaining strong performance (LCC: 0.9071 , SRCC: 0.9307 , MSE: 0.0091 ). To expand its capabilities, HAAQI-Net was adapted to predict subjective human scores like the Mean Opinion Score (MOS) through fine-tuning. This adaptation significantly improved prediction accuracy, validated through statistical analysis. Furthermore, the robustness of HAAQI-Net was evaluated under varying Sound Pressure Level (SPL) conditions, revealing optimal performance at a reference SPL of 65 dB, with accuracy gradually decreasing as SPL deviated from this point. The advancements in subjective score prediction, SPL robustness, and computational efficiency position HAAQI-Net as a scalable solution for music audio quality assessment in hearing aid applications, contributing to efficient and accurate models in audio signal processing and hearing aid technology.
Authors: Philipp Spitzer, Joshua Holstein, Patrick Hemmer, Michael V\"ossing, Niklas K\"uhl, Dominik Martin, Gerhard Satzger
Abstract: The integration of artificial intelligence (AI) into human decision-making processes at the workplace presents both opportunities and challenges. One promising approach to leverage existing complementary capabilities is allowing humans to delegate individual instances of decision tasks to AI. However, enabling humans to delegate instances effectively requires them to assess several factors. One key factor is the analysis of both their own capabilities and those of the AI in the context of the given task. In this work, we conduct a behavioral study to explore the effects of providing contextual information to support this delegation decision. Specifically, we investigate how contextual information about the AI and the task domain influence humans' delegation decisions to an AI and their impact on the human-AI team performance. Our findings reveal that access to contextual information significantly improves human-AI team performance in delegation settings. Finally, we show that the delegation behavior changes with the different types of contextual information. Overall, this research advances the understanding of computer-supported, collaborative work and provides actionable insights for designing more effective collaborative systems.
Authors: Massimiliano Datres, Gian Paolo Leonardi, Alessio Figalli, David Sutter
Abstract: We introduce a novel capacity measure 2sED for statistical models based on the effective dimension. The new quantity provably bounds the generalization error under mild assumptions on the model. Furthermore, simulations on standard data sets and popular model architectures show that 2sED correlates well with the training error. For Markovian models, we show how to efficiently approximate 2sED from below through a layerwise iterative approach, which allows us to tackle deep learning models with a large number of parameters. Simulation results suggest that the approximation is good for different prominent models and data sets.
Authors: Yihong Tang, Zhaokai Wang, Ao Qu, Yihao Yan, Zhaofeng Wu, Dingyi Zhuang, Jushi Kai, Kebing Hou, Xiaotong Guo, Han Zheng, Tiange Luo, Jinhua Zhao, Zhan Zhao, Wei Ma
Abstract: Citywalk, a recently popular form of urban travel, requires genuine personalization and understanding of fine-grained requests compared to traditional itinerary planning. In this paper, we introduce the novel task of Open-domain Urban Itinerary Planning (OUIP), which generates personalized urban itineraries from user requests in natural language. We then present ITINERA, an OUIP system that integrates spatial optimization with large language models to provide customized urban itineraries based on user needs. This involves decomposing user requests, selecting candidate points of interest (POIs), ordering the POIs based on cluster-aware spatial optimization, and generating the itinerary. Experiments on real-world datasets and the performance of the deployed system demonstrate our system's capacity to deliver personalized and spatially coherent itineraries compared to current solutions. Source codes of ITINERA are available at https://github.com/YihongT/ITINERA.
Authors: Pengchao Han, Chao Huang, Geng Tian, Ming Tang, Xin Liu
Abstract: Split federated learning (SFL) is a recent distributed approach for collaborative model training among multiple clients. In SFL, a global model is typically split into two parts, where clients train one part in a parallel federated manner, and a main server trains the other. Despite the recent research on SFL algorithm development, the convergence analysis of SFL is missing in the literature, and this paper aims to fill this gap. The analysis of SFL can be more challenging than that of federated learning (FL), due to the potential dual-paced updates at the clients and the main server. We provide convergence analysis of SFL for strongly convex and general convex objectives on heterogeneous data. The convergence rates are $O(1/T)$ and $O(1/\sqrt[3]{T})$, respectively, where $T$ denotes the total number of rounds for SFL training. We further extend the analysis to non-convex objectives and the scenario where some clients may be unavailable during training. Experimental experiments validate our theoretical results and show that SFL outperforms FL and split learning (SL) when data is highly heterogeneous across a large number of clients.
Authors: Pawanesh Pawanesh, Charu Sharma, Niteesh Sahni
Abstract: In this study, we model the Indian stock market as heterogenous scale free network, which is then embedded in a two dimensional hyperbolic space through a machine learning based technique called as coalescent embedding. This allows us to apply the hyperbolic kmeans algorithm on the Poincare disc and the clusters so obtained resemble the original network communities more closely than the clusters obtained via Euclidean kmeans on the basis of well-known measures normalised mutual information and adjusted mutual information. Through this, we are able to clearly distinguish between periods of market stability and volatility by applying non-parametric statistical tests with a significance level of 0.05 to geometric measures namely hyperbolic distance and hyperbolic shortest path distance. After that, we are able to spot significant market change early by leveraging the Bollinger Band analysis on the time series of modularity in the embedded networks of each window. Finally, the radial distance and the Equidistance Angular coordinates help in visualizing the embedded network in the Poincare disc and it is seen that specific market sectors cluster together.
Authors: Ruben Ciranni, Giorgio Mariani, Michele Mancusi, Emilian Postolache, Giorgio Fabbro, Emanuele Rodol\`a, Luca Cosmo
Abstract: We present COCOLA (Coherence-Oriented Contrastive Learning for Audio), a contrastive learning method for musical audio representations that captures the harmonic and rhythmic coherence between samples. Our method operates at the level of the stems composing music tracks and can input features obtained via Harmonic-Percussive Separation (HPS). COCOLA allows the objective evaluation of generative models for music accompaniment generation, which are difficult to benchmark with established metrics. In this regard, we evaluate recent music accompaniment generation models, demonstrating the effectiveness of the proposed method. We release the model checkpoints trained on public datasets containing separate stems (MUSDB18-HQ, MoisesDB, Slakh2100, and CocoChorales).
Authors: Niccolo Anceschi, Federico Ferrari, David B. Dunson, Himel Mallick
Abstract: It is increasingly common in a wide variety of applied settings to collect data of multiple different types on the same set of samples. Our particular focus in this article is on studying relationships between such multiview features and responses. A motivating application arises in the context of precision medicine where multi-omics data are collected to correlate with clinical outcomes. It is of interest to infer dependence within and across views while combining multimodal information to improve the prediction of outcomes. The signal-to-noise ratio can vary substantially across views, motivating more nuanced statistical tools beyond standard late and early fusion. This challenge comes with the need to preserve interpretability, select features, and obtain accurate uncertainty quantification. We propose a joint additive factor regression model (JAFAR) with a structured additive design, accounting for shared and view-specific components. We ensure identifiability via a novel dependent cumulative shrinkage process (D-CUSP) prior. We provide an efficient implementation via a partially collapsed Gibbs sampler and extend our approach to allow flexible feature and outcome distributions. Prediction of time-to-labor onset from immunome, metabolome, and proteome data illustrates performance gains against state-of-the-art competitors. Our open-source software (R package) is available at https://github.com/niccoloanceschi/jafar.
Authors: Rob Cornish
Abstract: We consider the problem of symmetrising a neural network along a group homomorphism: given a homomorphism $\varphi : H \to G$, we would like a procedure that converts $H$-equivariant neural networks to $G$-equivariant ones. We formulate this in terms of Markov categories, which allows us to consider neural networks whose outputs may be stochastic, but with measure-theoretic details abstracted away. We obtain a flexible and compositional framework for symmetrisation that relies on minimal assumptions about the structure of the group and the underlying neural network architecture. Our approach recovers existing canonicalisation and averaging techniques for symmetrising deterministic models, and extends to provide a novel methodology for symmetrising stochastic models also. Beyond this, our findings also demonstrate the utility of Markov categories for addressing complex problems in machine learning in a conceptually clear yet mathematically precise way.
Authors: Muhammad Akbar Husnoo, Adnan Anwar, Md Enamul Haque, A. N. Mahmood
Abstract: The increasing security and privacy concerns in the Smart Grid sector have led to a significant demand for robust intrusion detection systems within critical smart grid infrastructure. To address the challenges posed by privacy preservation and decentralized power system zones with distinct data ownership, Federated Learning (FL) has emerged as a promising privacy-preserving solution which facilitates collaborative training of attack detection models without necessitating the sharing of raw data. However, FL presents several implementation limitations in the power system domain due to its heavy reliance on a centralized aggregator and the risks of privacy leakage during model update transmission. To overcome these technical bottlenecks, this paper introduces a novel decentralized federated anomaly detection scheme based on two main gossip protocols namely Random Walk and Epidemic. Our findings indicate that the Random Walk protocol exhibits superior performance compared to the Epidemic protocol, highlighting its efficacy in decentralized federated learning environments. Experimental validation of the proposed framework utilizing publicly available industrial control systems datasets demonstrates superior attack detection accuracy while safeguarding data confidentiality and mitigating the impact of communication latency and stragglers. Furthermore, our approach yields a notable 35% improvement in training time compared to conventional FL, underscoring the efficacy and robustness of our decentralized learning method.
Authors: Nicholas Karris, Evangelos A. Nikitopoulos, Ioannis Kevrekidis, Seungjoon Lee, Alexander Cloninger
Abstract: We develop an algorithm to approximate the time evolution of a probability distribution without explicitly learning an operator that governs the evolution. A particular application of interest is discrete measures $\mu_t^N$ that arise from systems of $N$ particles in $\mathbb R^d$. In many such situations, the individual particles move chaotically on short time scales, making it difficult to learn the dynamics of a governing operator, but the bulk distribution $\mu_t^N$ approximates an absolutely continuous measure $\mu_t$ that evolves ``smoothly.'' If $\mu_t$ is known on some time interval, then linearized optimal transport theory provides an Euler-like scheme for approximating the evolution of $\mu_t$ using its ``tangent vector field'' (represented as a time-dependent vector field on $\mathbb R^d$), which can be computed as a limit of optimal transport maps. We propose an analog of this Euler approximation to predict the evolution of the discrete measure $\mu_t^N$ (without knowing $\mu_t$). To approximate the analogous tangent vector field, we use a finite difference over a time step that sits between two time scales of the system -- long enough for a large-$N$ evolution ($\mu_t$) to emerge but short enough to satisfactorily approximate the derivative object used in the Euler scheme. The emergence of the limiting behavior ensures the optimal transport maps closely approximate the vector field describing the bulk distribution's smooth evolution instead of the individual particles' more chaotic movements. We demonstrate the efficacy of our approach with two illustrative examples, Gaussian diffusion and a cell chemotaxis model, and show that our method succeeds in predicting the bulk behavior over relatively large steps.
Authors: Ying-Chieh Hsu, Stanley Yung-Chuan Liu, Chao-Jung Huang, Chi-Wei Wu, Ren-Kai Cheng, Jane Yung-Jen Hsu, Shang-Ran Huang, Yuan-Ren Cheng, Fu-Shun Hsu
Abstract: This study investigated the use of deep learning to identify multi-level upper airway collapses in obstructive sleep apnea (OSA) patients based on snoring sounds. We fi-ne-tuned ResNet-50 and Audio Spectrogram Transformer (AST) models using snoring recordings from 37 subjects undergoing drug-induced sleep endoscopy (DISE) between 2020 and 2021. Snoring sounds were labeled according to the VOTE (Velum, Orophar-ynx, Tongue Base, Epiglottis) classification, resulting in 259 V, 403 O, 77 T, 13 E, 1016 VO, 46 VT, 140 OT, 39 OE, 30 VOT, and 3150 non-snoring (N) 0.5-second clips. The models were trained for two multi-label classification tasks: identifying obstructions at V, O, T, and E levels, and identifying retropalatal (RP) and retroglossal (RG) obstruc-tions. Results showed AST slightly outperformed ResNet-50, demonstrating good abil-ity to identify V (F1-score: 0.71, MCC: 0.61, AUC: 0.89), O (F1-score: 0.80, MCC: 0.72, AUC: 0.94), and RP obstructions (F1-score: 0.86, MCC: 0.77, AUC: 0.97). However, both models struggled with T, E, and RG classifications due to limited data. Retrospective analysis of a full-night recording showed the potential to profile airway obstruction dynamics. We expect this information, combined with polysomnography and other clinical parameters, can aid clinical triage and treatment planning for OSA patients.
Authors: Will Dorrell, Kyle Hsu, Luke Hollingsworth, Jin Hwa Lee, Jiajun Wu, Chelsea Finn, Peter E Latham, Tim EJ Behrens, James CR Whittington
Abstract: Why do biological and artificial neurons sometimes modularise, each encoding a single meaningful variable, and sometimes entangle their representation of many variables? In this work, we develop a theory of when biologically inspired networks -- those that are nonnegative and energy efficient -- modularise their representation of source variables (sources). We derive necessary and sufficient conditions on a sample of sources that determine whether the neurons in an optimal biologically-inspired linear autoencoder modularise. Our theory applies to any dataset, extending far beyond the case of statistical independence studied in previous work. Rather we show that sources modularise if their support is ``sufficiently spread''. From this theory, we extract and validate predictions in a variety of empirical studies on how data distribution affects modularisation in nonlinear feedforward and recurrent neural networks trained on supervised and unsupervised tasks. Furthermore, we apply these ideas to neuroscience data, showing that range independence can be used to understand the mixing or modularising of spatial and reward information in entorhinal recordings in seemingly conflicting experiments. Further, we use these results to suggest alternate origins of mixed-selectivity, beyond the predominant theory of flexible nonlinear classification. In sum, our theory prescribes precise conditions on when neural activities modularise, providing tools for inducing and elucidating modular representations in brains and machines.
Authors: Mahjabin Nahar, Sian Lee, Becky Guillen, Dongwon Lee
Abstract: This paper explores the current state of generative AI policies of computer science conferences and offers guidelines for policy adoption.
Authors: Haokun Liu, Yangqiaoyu Zhou, Mingxuan Li, Chenfei Yuan, Chenhao Tan
Abstract: AI holds promise for transforming scientific processes, including hypothesis generation. Prior work on hypothesis generation can be broadly categorized into theory-driven and data-driven approaches. While both have proven effective in generating novel and plausible hypotheses, it remains an open question whether they can complement each other. To address this, we develop the first method that combines literature-based insights with data to perform LLM-powered hypothesis generation. We apply our method on five different datasets and demonstrate that integrating literature and data outperforms other baselines (8.97\% over few-shot, 15.75\% over literature-based alone, and 3.37\% over data-driven alone). Additionally, we conduct the first human evaluation to assess the utility of LLM-generated hypotheses in assisting human decision-making on two challenging tasks: deception detection and AI generated content detection. Our results show that human accuracy improves significantly by 7.44\% and 14.19\% on these tasks, respectively. These findings suggest that integrating literature-based and data-driven approaches provides a comprehensive and nuanced framework for hypothesis generation and could open new avenues for scientific inquiry.
Authors: Jakob Kienegger, Alina Mannanova, Timo Gerkmann
Abstract: Due to their robustness and flexibility, neural-driven beamformers are a popular choice for speech separation in challenging environments with a varying amount of simultaneous speakers alongside noise and reverberation. Time-frequency masks and relative directions of the speakers regarding a fixed spatial grid can be used to estimate the beamformer's parameters. To some degree, speaker-independence is achieved by ensuring a greater amount of spatial partitions than speech sources. In this work, we analyze how to encode both mask and positioning into such a grid to enable joint estimation of both quantities. We propose mask-weighted spatial likelihood coding and show that it achieves considerable performance in both tasks compared to baseline encodings optimized for either localization or mask estimation. In the same setup, we demonstrate superiority for joint estimation of both quantities. Conclusively, we propose a universal approach which can replace an upstream sound source localization system solely by adapting the training framework, making it highly relevant in performance-critical scenarios.
Authors: An Wang, Li Zou
Abstract: We propose a unified framework based on persistent homology (PH) to characterize both local and global structures in disordered systems. It can simultaneously generate local and global descriptors using the same algorithm and data structure, and has shown to be highly effective and interpretable in predicting particle rearrangements and classifying global phases. We also demonstrated that using a single variable enables a linear SVM to achieve nearly perfect three-phase classification. Inspired by this discovery, we define a non-parametric metric, the Separation Index (SI), which not only achieves this classification without sacrificing significant performance but also establishes a connection between particle environments and the global phase structure. Our methods provide an effective framework for understanding and analyzing the properties of disordered materials, with broad potential applications in materials science and even wider studies of complex systems.
Authors: Shahriar Soudeep, M. F. Mridha, Md Abrar Jahin, Nilanjan Dey
Abstract: The detection and tracking of small, occluded objects such as pedestrians, cyclists, and motorbikes pose significant challenges for traffic surveillance systems because of their erratic movement, frequent occlusion, and poor visibility in dynamic urban environments. Traditional methods like YOLO11, while proficient in spatial feature extraction for precise detection, often struggle with these small and dynamically moving objects, particularly in handling real-time data updates and resource efficiency. This paper introduces DGNN-YOLO, a novel framework that integrates dynamic graph neural networks (DGNNs) with YOLO11 to address these limitations. Unlike standard GNNs, DGNNs are chosen for their superior ability to dynamically update graph structures in real-time, which enables adaptive and robust tracking of objects in highly variable urban traffic scenarios. This framework constructs and regularly updates its graph representations, capturing objects as nodes and their interactions as edges, thus effectively responding to rapidly changing conditions. Additionally, DGNN-YOLO incorporates Grad-CAM, Grad-CAM++, and Eigen-CAM visualization techniques to enhance interpretability and foster trust, offering insights into the model's decision-making process. Extensive experiments validate the framework's performance, achieving a precision of 0.8382, recall of 0.6875, and mAP@0.5:0.95 of 0.6476, significantly outperforming existing methods. This study offers a scalable and interpretable solution for real-time traffic surveillance and significantly advances intelligent transportation systems' capabilities by addressing the critical challenge of detecting and tracking small, occluded objects.
Authors: Ben Liu, Jihai Zhang, Fangquan Lin, Cheng Yang, Min Peng
Abstract: Large Language Models (LLMs) present massive inherent knowledge and superior semantic comprehension capability, which have revolutionized various tasks in natural language processing. Despite their success, a critical gap remains in enabling LLMs to perform knowledge graph completion (KGC). Empirical evidence suggests that LLMs consistently perform worse than conventional KGC approaches, even through sophisticated prompt design or tailored instruction-tuning. Fundamentally, applying LLMs on KGC introduces several critical challenges, including a vast set of entity candidates, hallucination issue of LLMs, and under-exploitation of the graph structure. To address these challenges, we propose a novel instruction-tuning-based method, namely FtG. Specifically, we present a \textit{filter-then-generate} paradigm and formulate the KGC task into a multiple-choice question format. In this way, we can harness the capability of LLMs while mitigating the issue casused by hallucinations. Moreover, we devise a flexible ego-graph serialization prompt and employ a structure-text adapter to couple structure and text information in a contextualized manner. Experimental results demonstrate that FtG achieves substantial performance gain compared to existing state-of-the-art methods. The instruction dataset and code are available at \url{https://github.com/LB0828/FtG}.
Authors: Jaione Bengoetxea, Mikel Zubillaga, Ekhi Azurmendi, Maite Heredia, Julen Etxaniz, Markel Ferro, Jeremy Barnes
Abstract: In this paper we present our submission for the NorSID Shared Task as part of the 2025 VarDial Workshop (Scherrer et al., 2025), consisting of three tasks: Intent Detection, Slot Filling and Dialect Identification, evaluated using data in different dialects of the Norwegian language. For Intent Detection and Slot Filling, we have fine-tuned a multitask model in a cross-lingual setting, to leverage the xSID dataset available in 17 languages. In the case of Dialect Identification, our final submission consists of a model fine-tuned on the provided development set, which has obtained the highest scores within our experiments. Our final results on the test set show that our models do not drop in performance compared to the development set, likely due to the domain-specificity of the dataset and the similar distribution of both subsets. Finally, we also report an in-depth analysis of the provided datasets and their artifacts, as well as other sets of experiments that have been carried out but did not yield the best results. Additionally, we present an analysis on the reasons why some methods have been more successful than others; mainly the impact of the combination of languages and domain-specificity of the training data on the results.
Authors: Jiaqi Xiong, Nan Yin, Shiyang Liang, Haoyang Li, Yingxu Wang, Duo Ai, Fang Pan, Jingjie Wang
Abstract: Inferencing Gene Regulatory Networks (GRNs) from gene expression data is a pivotal challenge in systems biology, and several innovative computational methods have been introduced. However, most of these studies have not considered the skewed degree distribution of genes. Specifically, some genes may regulate multiple target genes while some genes may be regulated by multiple regulator genes. Such a skewed degree distribution issue significantly complicates the application of directed graph embedding methods. To tackle this issue, we propose the Cross-Attention Complex Dual Graph Embedding Model (XATGRN). Our XATGRN employs a cross-attention mechanism to effectively capture intricate gene interactions from gene expression profiles. Additionally, it uses a Dual Complex Graph Embedding approach to manage the skewed degree distribution, thereby ensuring precise prediction of regulatory relationships and their directionality. Our model consistently outperforms existing state-of-the-art methods across various datasets, underscoring its efficacy in elucidating complex gene regulatory mechanisms. Our codes used in this paper are publicly available at: https://github.com/kikixiong/XATGRN.
Authors: Melody Y. Guan, Manas Joglekar, Eric Wallace, Saachi Jain, Boaz Barak, Alec Helyar, Rachel Dias, Andrea Vallone, Hongyu Ren, Jason Wei, Hyung Won Chung, Sam Toyer, Johannes Heidecke, Alex Beutel, Amelia Glaese
Abstract: As large-scale language models increasingly impact safety-critical domains, ensuring their reliable adherence to well-defined principles remains a fundamental challenge. We introduce Deliberative Alignment, a new paradigm that directly teaches the model safety specifications and trains it to explicitly recall and accurately reason over the specifications before answering. We used this approach to align OpenAI's o-series models, and achieved highly precise adherence to OpenAI's safety policies, without requiring human-written chain-of-thoughts or answers. Deliberative Alignment pushes the Pareto frontier by simultaneously increasing robustness to jailbreaks while decreasing overrefusal rates, and also improves out-of-distribution generalization. We demonstrate that reasoning over explicitly specified policies enables more scalable, trustworthy, and interpretable alignment.
Authors: Anushrut Jignasu, Ethan Herron, Zhanhong Jiang, Soumik Sarkar, Chinmay Hegde, Baskar Ganapathysubramanian, Aditya Balu, Adarsh Krishnamurthy
Abstract: We present STITCH, a novel approach for neural implicit surface reconstruction of a sparse and irregularly spaced point cloud while enforcing topological constraints (such as having a single connected component). We develop a new differentiable framework based on persistent homology to formulate topological loss terms that enforce the prior of a single 2-manifold object. Our method demonstrates excellent performance in preserving the topology of complex 3D geometries, evident through both visual and empirical comparisons. We supplement this with a theoretical analysis, and provably show that optimizing the loss with stochastic (sub)gradient descent leads to convergence and enables reconstructing shapes with a single connected component. Our approach showcases the integration of differentiable topological data analysis tools for implicit surface reconstruction.
Authors: Yijia Xiao, Edward Sun, Di Luo, Wei Wang
Abstract: Significant progress has been made in automated problem-solving using societies of agents powered by large language models (LLMs). In finance, efforts have largely focused on single-agent systems handling specific tasks or multi-agent frameworks independently gathering data. However, multi-agent systems' potential to replicate real-world trading firms' collaborative dynamics remains underexplored. TradingAgents proposes a novel stock trading framework inspired by trading firms, featuring LLM-powered agents in specialized roles such as fundamental analysts, sentiment analysts, technical analysts, and traders with varied risk profiles. The framework includes Bull and Bear researcher agents assessing market conditions, a risk management team monitoring exposure, and traders synthesizing insights from debates and historical data to make informed decisions. By simulating a dynamic, collaborative trading environment, this framework aims to improve trading performance. Detailed architecture and extensive experiments reveal its superiority over baseline models, with notable improvements in cumulative returns, Sharpe ratio, and maximum drawdown, highlighting the potential of multi-agent LLM frameworks in financial trading. More details on TradingAgents are available at https://TradingAgents-AI.github.io.
Authors: Sharika J. Hegde, Max T. M. Ng, Marcos Rios, Hani S. Mahmassani, Ying Chen, Karen Smilowitz
Abstract: The COVID-19 vaccine development, manufacturing, transportation, and administration proved an extreme logistics operation of global magnitude. Global vaccination levels, however, remain a key concern in preventing the emergence of new strains and minimizing the impact of the pandemic's disruption of daily life. In this paper, country-level vaccination rates are analyzed through a queuing framework to extract service rates that represent the practical capacity of a country to administer vaccines. These rates are further characterized through regression and interpretable machine learning methods with country-level demographic, governmental, and socio-economic variates. Model results show that participation in multi-governmental collaborations such as COVAX may improve the ability to vaccinate. Similarly, improved transportation and accessibility variates such as roads per area for low-income countries and rail lines per area for high-income countries can improve rates. It was also found that for low-income countries specifically, improvements in basic and health infrastructure (as measured through spending on healthcare, number of doctors and hospital beds per 100k, population percent with access to electricity, life expectancy, and vehicles per 1000 people) resulted in higher vaccination rates. Of the high-income countries, those with larger 65-plus populations struggled to vaccinate at high rates, indicating potential accessibility issues for the elderly. This study finds that improving basic and health infrastructure, focusing on accessibility in the last mile, particularly for the elderly, and fostering global partnerships can improve logistical operations of such a scale. Such structural impediments and inequities in global health care must be addressed in preparation for future global public health crises.
Authors: Valery Istomin, Oleg Pereziabov, Ilya Afanasyev
Abstract: This research focuses on developing a method for restoring the topology of digital images of paper documents captured by a camera, using algorithms for detection, segmentation, geometry restoration, and dewarping. Our methodology employs deep learning (DL) for document outline detection, followed by computer vision (CV) to create a topological 2D grid using cubic polynomial interpolation and correct nonlinear distortions by remapping the image. Using classical CV methods makes the document topology restoration process more efficient and faster, as it requires significantly fewer computational resources and memory. We developed a new pipeline for automatic document dewarping and reconstruction, along with a framework and annotated dataset to demonstrate its efficiency. Our experiments confirm the promise of our methodology and its superiority over existing benchmarks (including mobile apps and popular DL solutions, such as RectiNet, DocGeoNet, and DocTr++) both visually and in terms of document readability via Optical Character Recognition (OCR) and geometry restoration metrics. This paves the way for creating high-quality digital copies of paper documents and enhancing the efficiency of OCR systems. Project page: https://github.com/HorizonParadox/DRCCBI
Authors: Yu Ishihara, Noriaki Takasugi, Kotaro Kawakami, Masaya Kinoshita, Kazumi Aoyama
Abstract: Reinforcement learning has become an essential algorithm for generating complex robotic behaviors. However, to learn such behaviors, it is necessary to design a reward function that describes the task, which often consists of multiple objectives that needs to be balanced. This tuning process is known as reward engineering and typically involves extensive trial-and-error. In this paper, to avoid this trial-and-error process, we propose the concept of Constraints as Rewards (CaR). CaR formulates the task objective using multiple constraint functions instead of a reward function and solves a reinforcement learning problem with constraints using the Lagrangian-method. By adopting this approach, different objectives are automatically balanced, because Lagrange multipliers serves as the weights among the objectives. In addition, we will demonstrate that constraints, expressed as inequalities, provide an intuitive interpretation of the optimization target designed for the task. We apply the proposed method to the standing-up motion generation task of a six-wheeled-telescopic-legged robot and demonstrate that the proposed method successfully acquires the target behavior, even though it is challenging to learn with manually designed reward functions.
Authors: Yichao Zhong, Chong Zhang, Tairan He, Guanya Shi
Abstract: Real-world legged locomotion systems often need to reconcile agility and safety for different scenarios. Moreover, the underlying dynamics are often unknown and time-variant (e.g., payload, friction). In this paper, we introduce BAS (Bridging Adaptivity and Safety), which builds upon the pipeline of prior work Agile But Safe (ABS)(He et al.) and is designed to provide adaptive safety even in dynamic environments with uncertainties. BAS involves an agile policy to avoid obstacles rapidly and a recovery policy to prevent collisions, a physical parameter estimator that is concurrently trained with agile policy, and a learned control-theoretic RA (reach-avoid) value network that governs the policy switch. Also, the agile policy and RA network are both conditioned on physical parameters to make them adaptive. To mitigate the distribution shift issue, we further introduce an on-policy fine-tuning phase for the estimator to enhance its robustness and accuracy. The simulation results show that BAS achieves 50% better safety than baselines in dynamic environments while maintaining a higher speed on average. In real-world experiments, BAS shows its capability in complex environments with unknown physics (e.g., slippery floors with unknown frictions, unknown payloads up to 8kg), while baselines lack adaptivity, leading to collisions or. degraded agility. As a result, BAS achieves a 19.8% increase in speed and gets a 2.36 times lower collision rate than ABS in the real world. Videos: https://adaptive-safe-locomotion.github.io.
Authors: Hyungjin Chung, Dohun Lee, Zihui Wu, Byung-Hoon Kim, Katherine L. Bouman, Jong Chul Ye
Abstract: Compressed sensing MRI seeks to accelerate MRI acquisition processes by sampling fewer k-space measurements and then reconstructing the missing data algorithmically. The success of these approaches often relies on strong priors or learned statistical models. While recent diffusion model-based priors have shown great potential, previous methods typically ignore clinically available metadata (e.g. patient demographics, imaging parameters, slice-specific information). In practice, metadata contains meaningful cues about the anatomy and acquisition protocol, suggesting it could further constrain the reconstruction problem. In this work, we propose ContextMRI, a text-conditioned diffusion model for MRI that integrates granular metadata into the reconstruction process. We train a pixel-space diffusion model directly on minimally processed, complex-valued MRI images. During inference, metadata is converted into a structured text prompt and fed to the model via CLIP text embeddings. By conditioning the prior on metadata, we unlock more accurate reconstructions and show consistent gains across multiple datasets, acceleration factors, and undersampling patterns. Our experiments demonstrate that increasing the fidelity of metadata, ranging from slice location and contrast to patient age, sex, and pathology, systematically boosts reconstruction performance. This work highlights the untapped potential of leveraging clinical context for inverse problems and opens a new direction for metadata-driven MRI reconstruction.
Authors: Travis E. Gibson, Sawal Acharya
Abstract: Online learning and model reference adaptive control have many interesting intersections. One area where they differ however is in how the algorithms are analyzed and what objective or metric is used to discriminate "good" algorithms from "bad" algorithms. In adaptive control there are usually two objectives: 1) prove that all time varying parameters/states of the system are bounded, and 2) that the instantaneous error between the adaptively controlled system and a reference system converges to zero over time (or at least a compact set). For online learning the performance of algorithms is often characterized by the regret the algorithm incurs. Regret is defined as the cumulative loss (cost) over time from the online algorithm minus the cumulative loss (cost) of the single optimal fixed parameter choice in hindsight. Another significant difference between the two areas of research is with regard to the assumptions made in order to obtain said results. Adaptive control makes assumptions about the input-output properties of the control problem and derives solutions for a fixed error model or optimization task. In the online learning literature results are derived for classes of loss functions (i.e. convex) while a priori assuming that all time varying parameters are bounded, which for many optimization tasks is not unrealistic, but is a non starter in control applications. In this work we discuss these differences in detail through the regret based analysis of gradient descent for convex functions and the control based analysis of a streaming regression problem. We close with a discussion about the newly defined paradigm of online adaptive control and ask the following question "Are regret optimal control strategies deployable?"
Authors: Daniele Molino, Francesco Di Feola, Eliodoro Faiella, Deborah Fazzini, Domiziana Santucci, Linlin Shen, Valerio Guarrasi, Paolo Soda
Abstract: Artificial Intelligence is revolutionizing medical practice, enhancing diagnostic accuracy and healthcare delivery. However, its adaptation in medical settings still faces significant challenges, related to data availability and privacy constraints. Synthetic data has emerged as a promising solution to mitigate these issues, addressing data scarcity while preserving privacy. Recently, Latent Diffusion Models have emerged as a powerful tool for generating high-quality synthetic data. Meanwhile, the integration of different modalities has gained interest, emphasizing the need of models capable of handle multimodal medical data. Existing approaches struggle to integrate complementary information and lack the ability to generate modalities simultaneously. To address this challenge, we present MedCoDi-M, a 6.77-billion-parameter model, designed for multimodal medical data generation, that, following Foundation Model paradigm, exploits contrastive learning and large quantity of data to build a shared latent space which capture the relationships between different data modalities. Further, we introduce the Multi-Prompt training technique, which significantly boosts MedCoDi-M's generation under different settings. We extensively validate MedCoDi-M: first we benchmark it against five competitors on the MIMIC-CXR dataset, a state-of-the-art dataset for Chest X-ray and radiological report generation. Secondly, we perform a Visual Turing Test with expert radiologists to assess the realism and clinical relevance of the generated data, ensuring alignment with real-world scenarios. Finally, we assess the utility of MedCoDi-M in addressing key challenges in the medical field, such as anonymization, data scarcity and imbalance learning. The results are promising, demonstrating the applicability of MedCoDi-M in medical contexts. Project page is at https://cosbidev.github.io/MedCoDi-M/.