new Tensor Generalized Approximate Message Passing

Authors: Yinchuan Li, Guangchen Lan, Xiaodong Wang

Abstract: We propose a tensor generalized approximate message passing (TeG-AMP) algorithm for low-rank tensor inference, which can be used to solve tensor completion and decomposition problems. We derive TeG-AMP algorithm as an approximation of the sum-product belief propagation algorithm in high dimensions where the central limit theorem and Taylor series approximations are applicable. As TeG-AMP is developed based on a general TR decomposition model, it can be directly applied to many low-rank tensor types. Moreover, our TeG-AMP can be simplified based on the CP decomposition model and a tensor simplified AMP is proposed for low CP-rank tensor inference problems. Experimental results demonstrate that the proposed methods significantly improve recovery performances since it takes full advantage of tensor structures.

new LayerCraft: Enhancing Text-to-Image Generation with CoT Reasoning and Layered Object Integration

Authors: Yuyao Zhang, Jinghao Li, Yu-Wing Tai

Abstract: Text-to-image generation (T2I) has become a key area of research with broad applications. However, existing methods often struggle with complex spatial relationships and fine-grained control over multiple concepts. Many existing approaches require significant architectural modifications, extensive training, or expert-level prompt engineering. To address these challenges, we introduce \textbf{LayerCraft}, an automated framework that leverages large language models (LLMs) as autonomous agents for structured procedural generation. LayerCraft enables users to customize objects within an image and supports narrative-driven creation with minimal effort. At its core, the system includes a coordinator agent that directs the process, along with two specialized agents: \textbf{ChainArchitect}, which employs chain-of-thought (CoT) reasoning to generate a dependency-aware 3D layout for precise instance-level control, and the \textbf{Object-Integration Network (OIN)}, which utilizes LoRA fine-tuning on pre-trained T2I models to seamlessly blend objects into specified regions of an image based on textual prompts without requiring architectural changes. Extensive evaluations demonstrate LayerCraft's versatility in applications ranging from multi-concept customization to storytelling. By providing non-experts with intuitive, precise control over T2I generation, our framework democratizes creative image creation. Our code will be released upon acceptance at github.com/PeterYYZhang/LayerCraft

new Revisiting the Relationship between Adversarial and Clean Training: Why Clean Training Can Make Adversarial Training Better

Authors: MingWei Zhou, Xiaobing Pei

Abstract: Adversarial training (AT) is an effective technique for enhancing adversarial robustness, but it usually comes at the cost of a decline in generalization ability. Recent studies have attempted to use clean training to assist adversarial training, yet there are contradictions among the conclusions. We comprehensively summarize the representative strategies and, with a focus on the multi - view hypothesis, provide a unified explanation for the contradictory phenomena among different studies. In addition, we conduct an in - depth analysis of the knowledge combinations transferred from clean - trained models to adversarially - trained models in previous studies, and find that they can be divided into two categories: reducing the learning difficulty and providing correct guidance. Based on this finding, we propose a new idea of leveraging clean training to further improve the performance of advanced AT methods.We reveal that the problem of generalization degradation faced by AT partly stems from the difficulty of adversarial training in learning certain sample features, and this problem can be alleviated by making full use of clean training.

new The Cursive Transformer

Authors: Sam Greydanus, Zachary Wimpee

Abstract: Transformers trained on tokenized text, audio, and images can generate high-quality autoregressive samples. But handwriting data, represented as sequences of pen coordinates, remains underexplored. We introduce a novel tokenization scheme that converts pen stroke offsets to polar coordinates, discretizes them into bins, and then turns them into sequences of tokens with which to train a standard GPT model. This allows us to capture complex stroke distributions without using any specialized architectures (eg. the mixture density network or the self-advancing ASCII attention head from Graves 2014). With just 3,500 handwritten words and a few simple data augmentations, we are able to train a model that can generate realistic cursive handwriting. Our approach is simpler and more performant than previous RNN-based methods.

new ModelRadar: Aspect-based Forecast Evaluation

Authors: Vitor Cerqueira, Luis Roque, Carlos Soares

Abstract: Accurate evaluation of forecasting models is essential for ensuring reliable predictions. Current practices for evaluating and comparing forecasting models focus on summarising performance into a single score, using metrics such as SMAPE. While convenient, averaging performance over all samples dilutes relevant information about model behavior under varying conditions. This limitation is especially problematic for time series forecasting, where multiple layers of averaging--across time steps, horizons, and multiple time series in a dataset--can mask relevant performance variations. We address this limitation by proposing ModelRadar, a framework for evaluating univariate time series forecasting models across multiple aspects, such as stationarity, presence of anomalies, or forecasting horizons. We demonstrate the advantages of this framework by comparing 24 forecasting methods, including classical approaches and different machine learning algorithms. NHITS, a state-of-the-art neural network architecture, performs best overall but its superiority varies with forecasting conditions. For instance, concerning the forecasting horizon, we found that NHITS (and also other neural networks) only outperforms classical approaches for multi-step ahead forecasting. Another relevant insight is that classical approaches such as ETS or Theta are notably more robust in the presence of anomalies. These and other findings highlight the importance of aspect-based model evaluation for both practitioners and researchers. ModelRadar is available as a Python package.

new CF-CAM: Gradient Perturbation Mitigation and Feature Stabilization for Reliable Interpretability

Authors: Hongjie He, Xu Pan, Yudong Yao

Abstract: As deep learning continues to advance, the opacity of neural network decision-making remains a critical challenge, limiting trust and applicability in high-stakes domains. Class Activation Mapping (CAM) techniques have emerged as a key approach to visualizing model decisions, yet existing methods face inherent trade-offs. Gradient-based CAM variants suffer from sensitivity to gradient perturbations, leading to unstable and unreliable explanations. Conversely, gradient-free approaches mitigate gradient instability but incur significant computational overhead and inference latency. To address these limitations, we propose Cluster Filter Class Activation Map (CF-CAM), a novel framework that reintroduces gradient-based weighting while enhancing robustness against gradient noise. CF-CAM employs a hierarchical importance weighting strategy to balance discriminative feature preservation and noise elimination. A density-aware channel clustering via Density-Based Spatial Clustering of Applications with Noise (DBSCAN) groups semantically relevant feature channels and discard noise-prone activations. Additionally, cluster-conditioned gradient filtering leverages bilateral filters to refine gradient signals, preserving edge-aware localization while suppressing noise impact. Experiment results demonstrate that CF-CAM achieves superior interpretability performance while maintaining resilience to gradient perturbations, outperforming state-of-the-art CAM methods in faithfulness and robustness. By effectively mitigating gradient instability without excessive computational cost, CF-CAM provides a reliable solution for enhancing the interpretability of deep neural networks in critical applications such as medical diagnosis and autonomous driving.

new Integrating Quantum-Classical Attention in Patch Transformers for Enhanced Time Series Forecasting

Authors: Sanjay Chakraborty, Fredrik Heintz

Abstract: QCAAPatchTF is a quantum attention network integrated with an advanced patch-based transformer, designed for multivariate time series forecasting, classification, and anomaly detection. Leveraging quantum superpositions, entanglement, and variational quantum eigensolver principles, the model introduces a quantum-classical hybrid self-attention mechanism to capture multivariate correlations across time points. For multivariate long-term time series, the quantum self-attention mechanism can reduce computational complexity while maintaining temporal relationships. It then applies the quantum-classical hybrid self-attention mechanism alongside a feed-forward network in the encoder stage of the advanced patch-based transformer. While the feed-forward network learns nonlinear representations for each variable frame, the quantum self-attention mechanism processes individual series to enhance multivariate relationships. The advanced patch-based transformer computes the optimized patch length by dividing the sequence length into a fixed number of patches instead of using an arbitrary set of values. The stride is then set to half of the patch length to ensure efficient overlapping representations while maintaining temporal continuity. QCAAPatchTF achieves state-of-the-art performance in both long-term and short-term forecasting, classification, and anomaly detection tasks, demonstrating state-of-the-art accuracy and efficiency on complex real-world datasets.

new Enhancing Time Series Forecasting with Fuzzy Attention-Integrated Transformers

Authors: Sanjay Chakraborty, Fredrik Heintz

Abstract: This paper introduces FANTF (Fuzzy Attention Network-Based Transformers), a novel approach that integrates fuzzy logic with existing transformer architectures to advance time series forecasting, classification, and anomaly detection tasks. FANTF leverages a proposed fuzzy attention mechanism incorporating fuzzy membership functions to handle uncertainty and imprecision in noisy and ambiguous time series data. The FANTF approach enhances its ability to capture complex temporal dependencies and multivariate relationships by embedding fuzzy logic principles into the self-attention module of the existing transformer's architecture. The framework combines fuzzy-enhanced attention with a set of benchmark existing transformer-based architectures to provide efficient predictions, classification and anomaly detection. Specifically, FANTF generates learnable fuzziness attention scores that highlight the relative importance of temporal features and data points, offering insights into its decision-making process. Experimental evaluatios on some real-world datasets reveal that FANTF significantly enhances the performance of forecasting, classification, and anomaly detection tasks over traditional transformer-based models.

new Times2D: Multi-Period Decomposition and Derivative Mapping for General Time Series Forecasting

Authors: Reza Nematirad, Anil Pahwa, Balasubramaniam Natarajan

Abstract: Time series forecasting is an important application in various domains such as energy management, traffic planning, financial markets, meteorology, and medicine. However, real-time series data often present intricate temporal variability and sharp fluctuations, which pose significant challenges for time series forecasting. Previous models that rely on 1D time series representations usually struggle with complex temporal variations. To address the limitations of 1D time series, this study introduces the Times2D method that transforms the 1D time series into 2D space. Times2D consists of three main parts: first, a Periodic Decomposition Block (PDB) that captures temporal variations within a period and between the same periods by converting the time series into a 2D tensor in the frequency domain. Second, the First and Second Derivative Heatmaps (FSDH) capture sharp changes and turning points, respectively. Finally, an Aggregation Forecasting Block (AFB) integrates the output tensors from PDB and FSDH for accurate forecasting. This 2D transformation enables the utilization of 2D convolutional operations to effectively capture long and short characteristics of the time series. Comprehensive experimental results across large-scale data in the literature demonstrate that the proposed Times2D model achieves state-of-the-art performance in both short-term and long-term forecasting. The code is available in this repository: https://github.com/Tims2D/Times2D.

URLs: https://github.com/Tims2D/Times2D.

new EMForecaster: A Deep Learning Framework for Time Series Forecasting in Wireless Networks with Distribution-Free Uncertainty Quantification

Authors: Xavier Mootoo, Hina Tabassum, Luca Chiaraviglio

Abstract: With the recent advancements in wireless technologies, forecasting electromagnetic field (EMF) exposure has become critical to enable proactive network spectrum and power allocation, as well as network deployment planning. In this paper, we develop a deep learning (DL) time series forecasting framework referred to as \textit{EMForecaster}. The proposed DL architecture employs patching to process temporal patterns at multiple scales, complemented by reversible instance normalization and mixing operations along both temporal and patch dimensions for efficient feature extraction. We augment {EMForecaster} with a conformal prediction mechanism, which is independent of the data distribution, to enhance the trustworthiness of model predictions via uncertainty quantification of forecasts. This conformal prediction mechanism ensures that the ground truth lies within a prediction interval with target error rate $\alpha$, where $1-\alpha$ is referred to as coverage. However, a trade-off exists, as increasing coverage often results in wider prediction intervals. To address this challenge, we propose a new metric called the \textit{Trade-off Score}, that balances trustworthiness of the forecast (i.e., coverage) and the width of prediction interval. Our experiments demonstrate that EMForecaster achieves superior performance across diverse EMF datasets, spanning both short-term and long-term prediction horizons. In point forecasting tasks, EMForecaster substantially outperforms current state-of-the-art DL approaches, showing improvements of 53.97\% over the Transformer architecture and 38.44\% over the average of all baseline models. EMForecaster also exhibits an excellent balance between prediction interval width and coverage in conformal forecasting, measured by the tradeoff score, showing marked improvements of 24.73\% over the average baseline and 49.17\% over the Transformer architecture.

new Lorentzian Graph Isomorphic Network

Authors: Srinitish Srinivasan, Omkumar CU

Abstract: We introduce the Lorentzian Graph Isomorphic Network (LGIN), a novel graph neural network (GNN) designed to operate in hyperbolic spaces, leveraging the Lorentzian model to enhance graph representation learning. Existing GNNs primarily operate in Euclidean spaces, which can limit their ability to capture hierarchical and multi-relational structures inherent to complex graphs. LGIN addresses this by incorporating curvature-aware aggregation functions that preserve the Lorentzian metric tensor, ensuring embeddings remain constrained within the hyperbolic space by proposing a new update rule that effectively captures both local neighborhood interactions and global structural properties, enabling LGIN to distinguish non-isomorphic graphs with expressiveness at least as powerful as the Weisfeiler-Lehman test. Through extensive evaluation across nine benchmark datasets, including molecular and protein structures, LGIN consistently outperforms or matches state-of-the-art GNNs, demonstrating its robustness and efficacy in modeling complex graph structures. To the best of our knowledge, this is the first study to extend the concept of a powerful graph neural network to Riemannian manifolds, paving the way for future advancements in hyperbolic graph learning. The code for our paper can be found at https://github.com/Deceptrax123/LGIN.

URLs: https://github.com/Deceptrax123/LGIN.

new Why risk matters for protein binder design

Authors: Tudor Cotet, Igor Krawczuk

Abstract: Bayesian optimization (BO) has recently become more prevalent in protein engineering applications and hence has become a fruitful target of benchmarks. However, current BO comparisons often overlook real-world considerations like risk and cost constraints. In this work, we compare 72 model combinations of encodings, surrogate models, and acquisition functions on 11 protein binder fitness landscapes, specifically from this perspective. Drawing from the portfolio optimization literature, we adopt metrics to quantify the cold-start performance relative to a random baseline, to assess the risk of an optimization campaign, and to calculate the overall budget required to reach a fitness threshold. Our results suggest the existence of Pareto-optimal models on the risk-performance axis, the shift of this preference depending on the landscape explored, and the robust correlation between landscape properties such as epistasis with the average and worst-case model performance. They also highlight that rigorous model selection requires substantial computational and statistical efforts.

new MetaCLBench: Meta Continual Learning Benchmark on Resource-Constrained Edge Devices

Authors: Sijia Li, Young D. Kwon, Lik-Hang Lee, Pan Hui

Abstract: Meta-Continual Learning (Meta-CL) has emerged as a promising approach to minimize manual labeling efforts and system resource requirements by enabling Continual Learning (CL) with limited labeled samples. However, while existing methods have shown success in image-based tasks, their effectiveness remains unexplored for sequential time-series data from sensor systems, particularly audio inputs. To address this gap, we conduct a comprehensive benchmark study evaluating six representative Meta-CL approaches using three network architectures on five datasets from both image and audio modalities. We develop MetaCLBench, an end-to-end Meta-CL benchmark framework for edge devices to evaluate system overheads and investigate trade-offs among performance, computational costs, and memory requirements across various Meta-CL methods. Our results reveal that while many Meta-CL methods enable to learn new classes for both image and audio modalities, they impose significant computational and memory costs on edge devices. Also, we find that pre-training and meta-training procedures based on source data before deployment improve Meta-CL performance. Finally, to facilitate further research, we provide practical guidelines for researchers and machine learning practitioners implementing Meta-CL on resource-constrained environments and make our benchmark framework and tools publicly available, enabling fair evaluation across both accuracy and system-level metrics.

new Discriminative Subspace Emersion from learning feature relevances across different populations

Authors: Marco Canducci, Lida Abdi, Alessandro Prete, Roland J. Veen, Michael Biehl, Wiebke Arlt, Peter Tino

Abstract: In a given classification task, the accuracy of the learner is often hampered by finiteness of the training set, high-dimensionality of the feature space and severe overlap between classes. In the context of interpretable learners, with (piecewise) linear separation boundaries, these issues can be mitigated by careful construction of optimization procedures and/or estimation of relevant features for the task. However, when the task is shared across two disjoint populations the main interest is shifted towards estimating a set of features that discriminate the most between the two, when performing classification. We propose a new Discriminative Subspace Emersion (DSE) method to extend subspace learning toward a general relevance learning framework. DSE allows us to identify the most relevant features in distinguishing the classification task across two populations, even in cases of high overlap between classes. The proposed methodology is designed to work with multiple sets of labels and is derived in principle without being tied to a specific choice of base learner. Theoretical and empirical investigations over synthetic and real-world datasets indicate that DSE accurately identifies a common subspace for the classification across different populations. This is shown to be true for a surprisingly high degree of overlap between classes.

new Are Domain Generalization Benchmarks with Accuracy on the Line Misspecified?

Authors: Olawale Salaudeen, Nicole Chiou, Shiny Weng, Sanmi Koyejo

Abstract: Spurious correlations are unstable statistical associations that hinder robust decision-making. Conventional wisdom suggests that models relying on such correlations will fail to generalize out-of-distribution (OOD), especially under strong distribution shifts. However, empirical evidence challenges this view as naive in-distribution empirical risk minimizers often achieve the best OOD accuracy across popular OOD generalization benchmarks. In light of these results, we propose a different perspective: many widely used benchmarks for evaluating robustness to spurious correlations are misspecified. Specifically, they fail to include shifts in spurious correlations that meaningfully impact OOD generalization, making them unsuitable for evaluating the benefit of removing such correlations. We establish conditions under which a distribution shift can reliably assess a model's reliance on spurious correlations. Crucially, under these conditions, we should not observe a strong positive correlation between in-distribution and OOD accuracy, often called "accuracy on the line." Yet, most state-of-the-art benchmarks exhibit this pattern, suggesting they do not effectively assess robustness. Our findings expose a key limitation in current benchmarks used to evaluate domain generalization algorithms, that is, models designed to avoid spurious correlations. We highlight the need to rethink how robustness to spurious correlations is assessed, identify well-specified benchmarks the field should prioritize, and enumerate strategies for designing future benchmarks that meaningfully reflect robustness under distribution shift.

new Identifying Sparsely Active Circuits Through Local Loss Landscape Decomposition

Authors: Brianna Chrisman, Lucius Bushnaq, Lee Sharkey

Abstract: Much of mechanistic interpretability has focused on understanding the activation spaces of large neural networks. However, activation space-based approaches reveal little about the underlying circuitry used to compute features. To better understand the circuits employed by models, we introduce a new decomposition method called Local Loss Landscape Decomposition (L3D). L3D identifies a set of low-rank subnetworks: directions in parameter space of which a subset can reconstruct the gradient of the loss between any sample's output and a reference output vector. We design a series of progressively more challenging toy models with well-defined subnetworks and show that L3D can nearly perfectly recover the associated subnetworks. Additionally, we investigate the extent to which perturbing the model in the direction of a given subnetwork affects only the relevant subset of samples. Finally, we apply L3D to a real-world transformer model and a convolutional neural network, demonstrating its potential to identify interpretable and relevant circuits in parameter space.

new Can Diffusion Models Disentangle? A Theoretical Perspective

Authors: Liming Wang, Muhammad Jehanzeb Mirza, Yishu Gong, Yuan Gong, Jiaqi Zhang, Brian H. Tracey, Katerina Placek, Marco Vilela, James R. Glass

Abstract: This paper presents a novel theoretical framework for understanding how diffusion models can learn disentangled representations. Within this framework, we establish identifiability conditions for general disentangled latent variable models, analyze training dynamics, and derive sample complexity bounds for disentangled latent subspace models. To validate our theory, we conduct disentanglement experiments across diverse tasks and modalities, including subspace recovery in latent subspace Gaussian mixture models, image colorization, image denoising, and voice conversion for speech classification. Additionally, our experiments show that training strategies inspired by our theory, such as style guidance regularization, consistently enhance disentanglement performance.

new A machine learning platform for development of low flammability polymers

Authors: Duy Nhat Phan, Alexander B. Morgan, Lokendra Poudel, Rahul Bhowmik

Abstract: Flammability index (FI) and cone calorimetry outcomes, such as maximum heat release rate, time to ignition, total smoke release, and fire growth rate, are critical factors in evaluating the fire safety of polymers. However, predicting these properties is challenging due to the complexity of material behavior under heat exposure. In this work, we investigate the use of machine learning (ML) techniques to predict these flammability metrics. We generated synthetic polymers using Synthetic Data Vault to augment the experimental dataset. Our comprehensive ML investigation employed both our polymer descriptors and those generated by the RDkit library. Despite the challenges of limited experimental data, our models demonstrate the potential to accurately predict FI and cone calorimetry outcomes, which could be instrumental in designing safer polymers. Additionally, we developed POLYCOMPRED, a module integrated into the cloud-based MatVerse platform, providing an accessible, web-based interface for flammability prediction. This work provides not only the predictive modeling of polymer flammability but also an interactive analysis tool for the discovery and design of new materials with tailored fire-resistant properties.

new Opportunistic Screening for Pancreatic Cancer using Computed Tomography Imaging and Radiology Reports

Authors: David Le, Ramon Correa-Medero, Amara Tariq, Bhavik Patel, Motoyo Yano, Imon Banerjee

Abstract: Pancreatic ductal adenocarcinoma (PDAC) is a highly aggressive cancer, with most cases diagnosed at stage IV and a five-year overall survival rate below 5%. Early detection and prognosis modeling are crucial for improving patient outcomes and guiding early intervention strategies. In this study, we developed and evaluated a deep learning fusion model that integrates radiology reports and CT imaging to predict PDAC risk. The model achieved a concordance index (C-index) of 0.6750 (95% CI: 0.6429, 0.7121) and 0.6435 (95% CI: 0.6055, 0.6789) on the internal and external dataset, respectively, for 5-year survival risk estimation. Kaplan-Meier analysis demonstrated significant separation (p<0.0001) between the low and high risk groups predicted by the fusion model. These findings highlight the potential of deep learning-based survival models in leveraging clinical and imaging data for pancreatic cancer.

new Over-the-Air Edge Inference via End-to-End Metasurfaces-Integrated Artificial Neural Networks

Authors: Kyriakos Stylianopoulos, Paolo Di Lorenzo, George C. Alexandropoulos

Abstract: In the Edge Inference (EI) paradigm, where a Deep Neural Network (DNN) is split across the transceivers to wirelessly communicate goal-defined features in solving a computational task, the wireless medium has been commonly treated as a source of noise. In this paper, motivated by the emerging technologies of Reconfigurable Intelligent Surfaces (RISs) and Stacked Intelligent Metasurfaces (SIM) that offer programmable propagation of wireless signals, either through controllable reflections or diffractions, we optimize the RIS/SIM-enabled smart wireless environment as a means of over-the-air computing, resembling the operations of DNN layers. We propose a framework of Metasurfaces-Integrated Neural Networks (MINNs) for EI, presenting its modeling, training through a backpropagation variation for fading channels, and deployment aspects. The overall end-to-end DNN architecture is general enough to admit RIS and SIM devices, through controllable reconfiguration before each transmission or fixed configurations after training, while both channel-aware and channel-agnostic transceivers are considered. Our numerical evaluation showcases metasurfaces to be instrumental in performing image classification under link budgets that impede conventional communications or metasurface-free systems. It is demonstrated that our MINN framework can significantly simplify EI requirements, achieving near-optimal performance with $50~$dB lower testing signal-to-noise ratio compared to training, even without transceiver channel knowledge.

new ElaLoRA: Elastic & Learnable Low-Rank Adaptation for Efficient Model Fine-Tuning

Authors: Huandong Chang, Zicheng Ma, Mingyuan Ma, Zhenting Qi, Andrew Sabot, Hong Jiang, H. T. Kung

Abstract: Low-Rank Adaptation (LoRA) has become a widely adopted technique for fine-tuning large-scale pre-trained models with minimal parameter updates. However, existing methods rely on fixed ranks or focus solely on either rank pruning or expansion, failing to adapt ranks dynamically to match the importance of different layers during training. In this work, we propose ElaLoRA, an adaptive low-rank adaptation framework that dynamically prunes and expands ranks based on gradient-derived importance scores. To the best of our knowledge, ElaLoRA is the first method that enables both rank pruning and expansion during fine-tuning. Experiments across multiple benchmarks demonstrate that ElaLoRA consistently outperforms existing PEFT methods across different parameter budgets. Furthermore, our studies validate that layers receiving higher rank allocations contribute more significantly to model performance, providing theoretical justification for our adaptive strategy. By introducing a principled and adaptive rank allocation mechanism, ElaLoRA offers a scalable and efficient fine-tuning solution, particularly suited for resource-constrained environments.

new Federated Learning for Cross-Domain Data Privacy: A Distributed Approach to Secure Collaboration

Authors: Yiwei Zhang, Jie Liu, Jiawei Wang, Lu Dai, Fan Guo, Guohui Cai

Abstract: This paper proposes a data privacy protection framework based on federated learning, which aims to realize effective cross-domain data collaboration under the premise of ensuring data privacy through distributed learning. Federated learning greatly reduces the risk of privacy breaches by training the model locally on each client and sharing only model parameters rather than raw data. The experiment verifies the high efficiency and privacy protection ability of federated learning under different data sources through the simulation of medical, financial, and user data. The results show that federated learning can not only maintain high model performance in a multi-domain data environment but also ensure effective protection of data privacy. The research in this paper provides a new technical path for cross-domain data collaboration and promotes the application of large-scale data analysis and machine learning while protecting privacy.

new A Deep Learning Approach to Anomaly Detection in High-Frequency Trading Data

Authors: Qiuliuyang Bao, Jiawei Wang, Hao Gong, Yiwei Zhang, Xiaojun Guo, Hanrui Feng

Abstract: This paper proposes an algorithm based on a staged sliding window Transformer architecture to detect abnormal behaviors in the microstructure of the foreign exchange market, focusing on high-frequency EUR/USD trading data. The method captures multi-scale temporal features through a staged sliding window, extracts global and local dependencies by combining the self-attention mechanism and weighted attention mechanism of the Transformer, and uses a classifier to identify abnormal events. Experimental results on a real high-frequency dataset containing order book depth, spread, and trading volume show that the proposed method significantly outperforms traditional machine learning (such as decision trees and random forests) and deep learning methods (such as MLP, CNN, RNN, LSTM) in terms of accuracy (0.93), F1-Score (0.91), and AUC-ROC (0.95). Ablation experiments verify the contribution of each component, and the visualization of order book depth and anomaly detection further reveals the effectiveness of the model under complex market dynamics. Despite the false positive problem, the model still provides important support for market supervision. In the future, noise processing can be optimized and extended to other markets to improve generalization and real-time performance.

new Inference-Time Scaling for Complex Tasks: Where We Stand and What Lies Ahead

Authors: Vidhisha Balachandran, Jingya Chen, Lingjiao Chen, Shivam Garg, Neel Joshi, Yash Lara, John Langford, Besmira Nushi, Vibhav Vineet, Yue Wu, Safoora Yousefi

Abstract: Inference-time scaling can enhance the reasoning capabilities of large language models (LLMs) on complex problems that benefit from step-by-step problem solving. Although lengthening generated scratchpads has proven effective for mathematical tasks, the broader impact of this approach on other tasks remains less clear. In this work, we investigate the benefits and limitations of scaling methods across nine state-of-the-art models and eight challenging tasks, including math and STEM reasoning, calendar planning, NP-hard problems, navigation, and spatial reasoning. We compare conventional models (e.g., GPT-4o) with models fine-tuned for inference-time scaling (e.g., o1) through evaluation protocols that involve repeated model calls, either independently or sequentially with feedback. These evaluations approximate lower and upper performance bounds and potential for future performance improvements for each model, whether through enhanced training or multi-model inference systems. Our extensive empirical analysis reveals that the advantages of inference-time scaling vary across tasks and diminish as problem complexity increases. In addition, simply using more tokens does not necessarily translate to higher accuracy in these challenging regimes. Results from multiple independent runs with conventional models using perfect verifiers show that, for some tasks, these models can achieve performance close to the average performance of today's most advanced reasoning models. However, for other tasks, a significant performance gap remains, even in very high scaling regimes. Encouragingly, all models demonstrate significant gains when inference is further scaled with perfect verifiers or strong feedback, suggesting ample potential for future improvements.

new LOCO-EPI: Leave-one-chromosome-out (LOCO) as a benchmarking paradigm for deep learning based prediction of enhancer-promoter interactions

Authors: Muhammad Tahir, Shehroz S. Khan, James Davie, Soichiro Yamanaka, Ahmed Ashraf

Abstract: In mammalian and vertebrate genomes, the promoter regions of the gene and their distal enhancers may be located millions of base-pairs from each other, while a promoter may not interact with the closest enhancer. Since base-pair proximity is not a good indicator of these interactions, there is considerable work toward developing methods for predicting Enhancer-Promoter Interactions (EPI). Several machine learning methods have reported increasingly higher accuracies for predicting EPI. Typically, these approaches randomly split the dataset of Enhancer-Promoter (EP) pairs into training and testing subsets followed by model training. However, the aforementioned random splitting causes information leakage by assigning EP pairs from the same genomic region to both testing and training sets, leading to performance overestimation. In this paper we propose to use a more thorough training and testing paradigm i.e., Leave-one-chromosome-out (LOCO) cross-validation for EPI-prediction. We demonstrate that a deep learning algorithm, which gives higher accuracies when trained and tested on random-splitting setting, drops drastically in performance under LOCO setting, confirming overestimation of performance. We further propose a novel hybrid deep neural network for EPI-prediction that fuses k-mer features of the nucleotide sequence. We show that the hybrid architecture performs significantly better in the LOCO setting, demonstrating it can learn more generalizable aspects of EP interactions. With this paper we are also releasing the LOCO splitting-based EPI dataset. Research data is available in this public repository: https://github.com/malikmtahir/EPI

URLs: https://github.com/malikmtahir/EPI

new Diffusion models for probabilistic precipitation generation from atmospheric variables

Authors: Michael Aich, Sebastian Bathiany, Philipp Hess, Yu Huang, Niklas Boers

Abstract: Improving the representation of precipitation in Earth system models (ESMs) is critical for assessing the impacts of climate change and especially of extreme events like floods and droughts. In existing ESMs, precipitation is not resolved explicitly, but represented by parameterizations. These typically rely on resolving approximated but computationally expensive column-based physics, not accounting for interactions between locations. They struggle to capture fine-scale precipitation processes and introduce significant biases. We present a novel approach, based on generative machine learning, which integrates a conditional diffusion model with a UNet architecture to generate accurate, high-resolution (0.25{\deg}) global daily precipitation fields from a small set of prognostic atmospheric variables. Unlike traditional parameterizations, our framework efficiently produces ensemble predictions, capturing uncertainties in precipitation, and does not require fine-tuning by hand. We train our model on the ERA5 reanalysis and present a method that allows us to apply it to arbitrary ESM data, enabling fast generation of probabilistic forecasts and climate scenarios. By leveraging interactions between global prognostic variables, our approach provides an alternative parameterization scheme that mitigates biases present in the ESM precipitation while maintaining consistency with its large-scale (annual) trends. This work demonstrates that complex precipitation patterns can be learned directly from large-scale atmospheric variables, offering a computationally efficient alternative to conventional schemes.

new FedPaI: Achieving Extreme Sparsity in Federated Learning via Pruning at Initialization

Authors: Haonan Wang, Zeli Liu, Kajimusugura Hoshino, Tuo Zhang, John Paul Walters, Stephen Crago

Abstract: Federated Learning (FL) enables distributed training on edge devices but faces significant challenges due to resource constraints in edge environments, impacting both communication and computational efficiency. Existing iterative pruning techniques improve communication efficiency but are limited by their centralized design, which struggles with FL's decentralized and data-imbalanced nature, resulting in suboptimal sparsity levels. To address these issues, we propose FedPaI, a novel efficient FL framework that leverages Pruning at Initialization (PaI) to achieve extreme sparsity. FedPaI identifies optimal sparse connections at an early stage, maximizing model capacity and significantly reducing communication and computation overhead by fixing sparsity patterns at the start of training. To adapt to diverse hardware and software environments, FedPaI supports both structured and unstructured pruning. Additionally, we introduce personalized client-side pruning mechanisms for improved learning capacity and sparsity-aware server-side aggregation for enhanced efficiency. Experimental results demonstrate that FedPaI consistently outperforms existing efficient FL that applies conventional iterative pruning with significant leading in efficiency and model accuracy. For the first time, our proposed FedPaI achieves an extreme sparsity level of up to 98% without compromising the model accuracy compared to unpruned baselines, even under challenging non-IID settings. By employing our FedPaI with joint optimization of model learning capacity and sparsity, FL applications can benefit from faster convergence and accelerate the training by 6.4 to 7.9 times.

new Simple yet Effective Node Property Prediction on Edge Streams under Distribution Shifts

Authors: Jongha Lee, Taehyung Kwon, Heechan Moon, Kijung Shin

Abstract: The problem of predicting node properties (e.g., node classes) in graphs has received significant attention due to its broad range of applications. Graphs from real-world datasets often evolve over time, with newly emerging edges and dynamically changing node properties, posing a significant challenge for this problem. In response, temporal graph neural networks (TGNNs) have been developed to predict dynamic node properties from a stream of emerging edges. However, our analysis reveals that most TGNN-based methods are (a) far less effective without proper node features and, due to their complex model architectures, (b) vulnerable to distribution shifts. In this paper, we propose SPLASH, a simple yet powerful method for predicting node properties on edge streams under distribution shifts. Our key contributions are as follows: (1) we propose feature augmentation methods and an automatic feature selection method for edge streams, which improve the effectiveness of TGNNs, (2) we propose a lightweight MLP-based TGNN architecture that is highly efficient and robust under distribution shifts, and (3) we conduct extensive experiments to evaluate the accuracy, efficiency, generalization, and qualitative performance of the proposed method and its competitors on dynamic node classification, dynamic anomaly detection, and node affinity prediction tasks across seven real-world datasets.

new SeizureTransformer: Scaling U-Net with Transformer for Simultaneous Time-Step Level Seizure Detection from Long EEG Recordings

Authors: Kerui Wu, Ziyue Zhao, B\"ulent Yener

Abstract: Epilepsy is a common neurological disorder that affects around 65 million people worldwide. Detecting seizures quickly and accurately is vital, given the prevalence and severity of the associated complications. Recently, deep learning-based automated seizure detection methods have emerged as solutions; however, most existing methods require extensive post-processing and do not effectively handle the crucial long-range patterns in EEG data. In this work, we propose SeizureTransformer, a simple model comprised of (i) a deep encoder comprising 1D convolutions (ii) a residual CNN stack and a transformer encoder to embed previous output into high-level representation with contextual information, and (iii) streamlined decoder which converts these features into a sequence of probabilities, directly indicating the presence or absence of seizures at every time step. Extensive experiments on public and private EEG seizure detection datasets demonstrate that our model significantly outperforms existing approaches (ranked in the first place in the 2025 "seizure detection challenge" organized in the International Conference on Artificial Intelligence in Epilepsy and Other Neurological Disorders), underscoring its potential for real-time, precise seizure detection.

new Agentic Multimodal AI for Hyperpersonalized B2B and B2C Advertising in Competitive Markets: An AI-Driven Competitive Advertising Framework

Authors: Sakhinana Sagar Srinivas, Akash Das, Shivam Gupta, Venkataramana Runkana

Abstract: The growing use of foundation models (FMs) in real-world applications demands adaptive, reliable, and efficient strategies for dynamic markets. In the chemical industry, AI-discovered materials drive innovation, but commercial success hinges on market adoption, requiring FM-driven advertising frameworks that operate in-the-wild. We present a multilingual, multimodal AI framework for autonomous, hyper-personalized advertising in B2B and B2C markets. By integrating retrieval-augmented generation (RAG), multimodal reasoning, and adaptive persona-based targeting, our system generates culturally relevant, market-aware ads tailored to shifting consumer behaviors and competition. Validation combines real-world product experiments with a Simulated Humanistic Colony of Agents to model consumer personas, optimize strategies at scale, and ensure privacy compliance. Synthetic experiments mirror real-world scenarios, enabling cost-effective testing of ad strategies without risky A/B tests. Combining structured retrieval-augmented reasoning with in-context learning (ICL), the framework boosts engagement, prevents market cannibalization, and maximizes ROAS. This work bridges AI-driven innovation and market adoption, advancing multimodal FM deployment for high-stakes decision-making in commercial marketing.

new Reducing Smoothness with Expressive Memory Enhanced Hierarchical Graph Neural Networks

Authors: Thomas Bailie, Yun Sing Koh, S. Karthik Mukkavilli, Varvara Vetrova

Abstract: Graphical forecasting models learn the structure of time series data via projecting onto a graph, with recent techniques capturing spatial-temporal associations between variables via edge weights. Hierarchical variants offer a distinct advantage by analysing the time series across multiple resolutions, making them particularly effective in tasks like global weather forecasting, where low-resolution variable interactions are significant. A critical challenge in hierarchical models is information loss during forward or backward passes through the hierarchy. We propose the Hierarchical Graph Flow (HiGFlow) network, which introduces a memory buffer variable of dynamic size to store previously seen information across variable resolutions. We theoretically show two key results: HiGFlow reduces smoothness when mapping onto new feature spaces in the hierarchy and non-strictly enhances the utility of message-passing by improving Weisfeiler-Lehman (WL) expressivity. Empirical results demonstrate that HiGFlow outperforms state-of-the-art baselines, including transformer models, by at least an average of 6.1% in MAE and 6.2% in RMSE. Code is available at https://github.com/TB862/ HiGFlow.git.

URLs: https://github.com/TB862/

new Deep learning for state estimation of commercial sodium-ion batteries using partial charging profiles: validation with a multi-temperature ageing dataset

Authors: Jiapeng Liu, Lunte Li, Jing Xiang, Laiyong Xie, Yuhao Wang, Francesco Ciucci

Abstract: Accurately predicting the state of health for sodium-ion batteries is crucial for managing battery modules, playing a vital role in ensuring operational safety. However, highly accurate models available thus far are rare due to a lack of aging data for sodium-ion batteries. In this study, we experimentally collected 53 single cells at four temperatures (0, 25, 35, and 45 {\deg}C), along with two battery modules in the lab. By utilizing the charging profiles, we were able to predict the SOC, capacity, and SOH simultaneously. This was achieved by designing a new framework that integrates the neural ordinary differential equation and 2D convolutional neural networks, using the partial charging profile as input. The charging profile is partitioned into segments, and each segment is fed into the network to output the SOC. For capacity and SOH prediction, we first aggregated the extracted features corresponding to segments from one cycle, after which an embedding block for temperature is concatenated for the final prediction. This novel approach eliminates the issue of multiple outputs for a single target. Our model demonstrated an $R^2$ accuracy of 0.998 for SOC and 0.997 for SOH across single cells at various temperatures. Furthermore, the trained model can be employed to predict single cells at temperatures outside the training set and battery modules with different capacity and current levels. The results presented here highlight the high accuracy of our model and its capability to predict multiple targets simultaneously using a partial charging profile.

new Minimum Description Length of a Spectrum Variational Autoencoder: A Theory

Authors: Canlin Zhang, Xiuwen Liu

Abstract: Deep neural networks (DNNs) trained through end-to-end learning have achieved remarkable success across diverse machine learning tasks, yet they are not explicitly designed to adhere to the Minimum Description Length (MDL) principle, which posits that the best model provides the shortest description of the data. In this paper, we argue that MDL is essential to deep learning and propose a further generalized principle: Understanding is the use of a small amount of information to represent a large amount of information. To this end, we introduce a novel theoretical framework for designing and evaluating deep Variational Autoencoders (VAEs) based on MDL. In our theory, we designed the Spectrum VAE, a specific VAE architecture whose MDL can be rigorously evaluated under given conditions. Additionally, we introduce the concept of latent dimension combination, or pattern of spectrum, and provide the first theoretical analysis of their role in achieving MDL. We claim that a Spectrum VAE understands the data distribution in the most appropriate way when the MDL is achieved. This work is entirely theoretical and lays the foundation for future research on designing deep learning systems that explicitly adhere to information-theoretic principles.

new Forward Learning with Differential Privacy

Authors: Mingqian Feng, Zeliang Zhang, Jinyang Jiang, Yijie Peng, Chenliang Xu

Abstract: Differential privacy (DP) in deep learning is a critical concern as it ensures the confidentiality of training data while maintaining model utility. Existing DP training algorithms provide privacy guarantees by clipping and then injecting external noise into sample gradients computed by the backpropagation algorithm. Different from backpropagation, forward-learning algorithms based on perturbation inherently add noise during the forward pass and utilize randomness to estimate the gradients. Although these algorithms are non-privatized, the introduction of noise during the forward pass indirectly provides internal randomness protection to the model parameters and their gradients, suggesting the potential for naturally providing differential privacy. In this paper, we propose a \blue{privatized} forward-learning algorithm, Differential Private Unified Likelihood Ratio (DP-ULR), and demonstrate its differential privacy guarantees. DP-ULR features a novel batch sampling operation with rejection, of which we provide theoretical analysis in conjunction with classic differential privacy mechanisms. DP-ULR is also underpinned by a theoretically guided privacy controller that dynamically adjusts noise levels to manage privacy costs in each training step. Our experiments indicate that DP-ULR achieves competitive performance compared to traditional differential privacy training algorithms based on backpropagation, maintaining nearly the same privacy loss limits.

new HERA: Hybrid Edge-cloud Resource Allocation for Cost-Efficient AI Agents

Authors: Shiyi Liu, Haiying Shen, Shuai Che, Mahdi Ghandi, Mingqin Li

Abstract: In the realm of AI, large language models (LLMs) like GPT-4, central to the operation of AI agents, predominantly operate in the cloud, incurring high operational costs. With local-based small language models (SLMs) becoming more accurate, the necessity of cloud-exclusive processing is being reconsidered. An AI agent's response to a user's request comprises a series of subtasks or iterations. Existing approaches only allocate a single request between SLM and LLM to ensure their outputs are similar, but adopting this approach in the AI agent scenario for assigning each subtask is not effective since SLM will output a different subsequent subtask, which affects the accuracy of the final output. In this paper, we first conduct experimental analysis to understand the features of AI agent operations. Leveraging our findings, we propose the Adaptive Iteration-level Model Selector (AIMS), a lightweight scheduler to automatically partition AI agent's subtasks between local-based SLM and cloud-based LLM. AIMS considers the varying subtask features and strategically decides the location for each subtask in order to use SLM as much as possible while attaining the accuracy level. Our experimental results demonstrate that AIMS increases accuracy by up to 9.1% and SLM usage by up to 10.8% compared to HybridLLM. It offloads 45.67% of subtasks to a local SLM while attaining similar accuracy on average compared with the cloud-only LLM approach.

new MetaLoRA: Tensor-Enhanced Adaptive Low-Rank Fine-tuning

Authors: Maolin Wang, Xiangyu Zhao

Abstract: There has been a significant increase in the deployment of neural network models, presenting substantial challenges in model adaptation and fine-tuning. Efficient adaptation is crucial in maintaining model performance across diverse tasks and domains. While Low-Rank Adaptation (LoRA) has emerged as a promising parameter-efficient fine-tuning method, its fixed parameter nature limits its ability to handle dynamic task requirements effectively. Adapting models to new tasks can be challenging due to the need for extensive fine-tuning. Current LoRA variants primarily focus on general parameter reduction while overlooking the importance of dynamic parameter adjustment and meta-learning capabilities. Moreover, existing approaches mainly address static adaptations, neglecting the potential benefits of task-aware parameter generation in handling diverse task distributions. To address these limitations, this Ph.D. research proposes a LoRA generation approach to model task relationships and introduces MetaLoRA, a novel parameter-efficient adaptation framework incorporating meta-learning principles. This work develops a comprehensive architecture that integrates meta-parameter generation with adaptive low-rank decomposition, enabling efficient handling of both task-specific and task-agnostic features. MetaLoRA accurately captures task patterns by incorporating meta-learning mechanisms and dynamic parameter adjustment strategies. To our knowledge, this research represents the first attempt to provide a meta-learning enhanced LoRA variant, offering improved adaptation capability while maintaining computational efficiency in model fine-tuning.

new Efficient Near-Optimal Algorithm for Online Shortest Paths in Directed Acyclic Graphs with Bandit Feedback Against Adaptive Adversaries

Authors: Arnab Maiti, Zhiyuan Fan, Kevin Jamieson, Lillian J. Ratliff, Gabriele Farina

Abstract: In this paper, we study the online shortest path problem in directed acyclic graphs (DAGs) under bandit feedback against an adaptive adversary. Given a DAG $G = (V, E)$ with a source node $v_{\mathsf{s}}$ and a sink node $v_{\mathsf{t}}$, let $X \subseteq \{0,1\}^{|E|}$ denote the set of all paths from $v_{\mathsf{s}}$ to $v_{\mathsf{t}}$. At each round $t$, we select a path $\mathbf{x}_t \in X$ and receive bandit feedback on our loss $\langle \mathbf{x}_t, \mathbf{y}_t \rangle \in [-1,1]$, where $\mathbf{y}_t$ is an adversarially chosen loss vector. Our goal is to minimize regret with respect to the best path in hindsight over $T$ rounds. We propose the first computationally efficient algorithm to achieve a near-minimax optimal regret bound of $\tilde O(\sqrt{|E|T\log |X|})$ with high probability against any adaptive adversary, where $\tilde O(\cdot)$ hides logarithmic factors in the number of edges $|E|$. Our algorithm leverages a novel loss estimator and a centroid-based decomposition in a nontrivial manner to attain this regret bound. As an application, we show that our algorithm for DAGs provides state-of-the-art efficient algorithms for $m$-sets, extensive-form games, the Colonel Blotto game, shortest walks in directed graphs, hypercubes, and multi-task multi-armed bandits, achieving improved high-probability regret guarantees in all these settings.

new Informed Greedy Algorithm for Scalable Bayesian Network Fusion via Minimum Cut Analysis

Authors: Pablo Torrijos, Jos\'e M. Puerta, Jos\'e A. G\'amez, Juan A. Aledo

Abstract: This paper presents the Greedy Min-Cut Bayesian Consensus (GMCBC) algorithm for the structural fusion of Bayesian Networks (BNs). The method is designed to preserve essential dependencies while controlling network complexity. It addresses the limitations of traditional fusion approaches, which often lead to excessively complex models that are impractical for inference, reasoning, or real-world applications. As the number and size of input networks increase, this issue becomes even more pronounced. GMCBC integrates principles from flow network theory into BN fusion, adapting the Backward Equivalence Search (BES) phase of the Greedy Equivalence Search (GES) algorithm and applying the Ford-Fulkerson algorithm for minimum cut analysis. This approach removes non-essential edges, ensuring that the fused network retains key dependencies while minimizing unnecessary complexity. Experimental results on synthetic Bayesian Networks demonstrate that GMCBC achieves near-optimal network structures. In federated learning simulations, GMCBC produces a consensus network that improves structural accuracy and dependency preservation compared to the average of the input networks, resulting in a structure that better captures the real underlying (in)dependence relationships. This consensus network also maintains a similar size to the original networks, unlike unrestricted fusion methods, where network size grows exponentially.

new Less is More: Efficient Black-box Attribution via Minimal Interpretable Subset Selection

Authors: Ruoyu Chen, Siyuan Liang, Jingzhi Li, Shiming Liu, Li Liu, Hua Zhang, Xiaochun Cao

Abstract: To develop a trustworthy AI system, which aim to identify the input regions that most influence the models decisions. The primary task of existing attribution methods lies in efficiently and accurately identifying the relationships among input-prediction interactions. Particularly when the input data is discrete, such as images, analyzing the relationship between inputs and outputs poses a significant challenge due to the combinatorial explosion. In this paper, we propose a novel and efficient black-box attribution mechanism, LiMA (Less input is More faithful for Attribution), which reformulates the attribution of important regions as an optimization problem for submodular subset selection. First, to accurately assess interactions, we design a submodular function that quantifies subset importance and effectively captures their impact on decision outcomes. Then, efficiently ranking input sub-regions by their importance for attribution, we improve optimization efficiency through a novel bidirectional greedy search algorithm. LiMA identifies both the most and least important samples while ensuring an optimal attribution boundary that minimizes errors. Extensive experiments on eight foundation models demonstrate that our method provides faithful interpretations with fewer regions and exhibits strong generalization, shows an average improvement of 36.3% in Insertion and 39.6% in Deletion. Our method also outperforms the naive greedy search in attribution efficiency, being 1.6 times faster. Furthermore, when explaining the reasons behind model prediction errors, the average highest confidence achieved by our method is, on average, 86.1% higher than that of state-of-the-art attribution algorithms. The code is available at https://github.com/RuoyuChen10/LIMA.

URLs: https://github.com/RuoyuChen10/LIMA.

new Preconditioned Additive Gaussian Processes with Fourier Acceleration

Authors: Theresa Wagner, Tianshi Xu, Franziska Nestler, Yuanzhe Xi, Martin Stoll

Abstract: Gaussian processes (GPs) are crucial in machine learning for quantifying uncertainty in predictions. However, their associated covariance matrices, defined by kernel functions, are typically dense and large-scale, posing significant computational challenges. This paper introduces a matrix-free method that utilizes the Non-equispaced Fast Fourier Transform (NFFT) to achieve nearly linear complexity in the multiplication of kernel matrices and their derivatives with vectors for a predetermined accuracy level. To address high-dimensional problems, we propose an additive kernel approach. Each sub-kernel in this approach captures lower-order feature interactions, allowing for the efficient application of the NFFT method and potentially increasing accuracy across various real-world datasets. Additionally, we implement a preconditioning strategy that accelerates hyperparameter tuning, further improving the efficiency and effectiveness of GPs.

new Enhancing stroke disease classification through machine learning models via a novel voting system by feature selection techniques

Authors: Mahade Hasan, Farhana Yasmin, Md. Mehedi Hassan, Xue Yu, Soniya Yeasmin, Herat Joshi, Sheikh Mohammed Shariful Islam

Abstract: Heart disease remains a leading cause of mortality and morbidity worldwide, necessitating the development of accurate and reliable predictive models to facilitate early detection and intervention. While state of the art work has focused on various machine learning approaches for predicting heart disease, but they could not able to achieve remarkable accuracy. In response to this need, we applied nine machine learning algorithms XGBoost, logistic regression, decision tree, random forest, k-nearest neighbors (KNN), support vector machine (SVM), gaussian na\"ive bayes (NB gaussian), adaptive boosting, and linear regression to predict heart disease based on a range of physiological indicators. Our approach involved feature selection techniques to identify the most relevant predictors, aimed at refining the models to enhance both performance and interpretability. The models were trained, incorporating processes such as grid search hyperparameter tuning, and cross-validation to minimize overfitting. Additionally, we have developed a novel voting system with feature selection techniques to advance heart disease classification. Furthermore, we have evaluated the models using key performance metrics including accuracy, precision, recall, F1-score, and the area under the receiver operating characteristic curve (ROC AUC). Among the models, XGBoost demonstrated exceptional performance, achieving 99% accuracy, precision, F1-Score, 98% recall, and 100% ROC AUC. This study offers a promising approach to early heart disease diagnosis and preventive healthcare.

new ParallelFlow: Parallelizing Linear Transformers via Flow Discretization

Authors: Nicola Muca Cirone, Cristopher Salvi

Abstract: We present a theoretical framework for analyzing linear attention models through matrix-valued state space models (SSMs). Our approach, Parallel Flows, provides a perspective that systematically decouples temporal dynamics from implementation constraints, enabling independent analysis of critical algorithmic components: chunking, parallelization, and information aggregation. Central to this framework is the reinterpretation of chunking procedures as computations of the flows governing system dynamics. This connection establishes a bridge to mathematical tools from rough path theory, opening the door to new insights into sequence modeling architectures. As a concrete application, we analyze DeltaNet in a generalized low-rank setting motivated by recent theoretical advances. Our methods allow us to design simple, streamlined generalizations of hardware-efficient algorithms present in the literature, and to provide completely different ones, inspired by rough paths techniques, with provably lower complexity. This dual contribution demonstrates how principled theoretical analysis can both explain existing practical methods and inspire fundamentally new computational approaches.

new Operator Learning with Domain Decomposition for Geometry Generalization in PDE Solving

Authors: Jianing Huang, Kaixuan Zhang, Youjia Wu, Ze Cheng

Abstract: Neural operators have become increasingly popular in solving \textit{partial differential equations} (PDEs) due to their superior capability to capture intricate mappings between function spaces over complex domains. However, the data-hungry nature of operator learning inevitably poses a bottleneck for their widespread applications. At the core of the challenge lies the absence of transferability of neural operators to new geometries. To tackle this issue, we propose operator learning with domain decomposition, a local-to-global framework to solve PDEs on arbitrary geometries. Under this framework, we devise an iterative scheme \textit{Schwarz Neural Inference} (SNI). This scheme allows for partitioning of the problem domain into smaller subdomains, on which local problems can be solved with neural operators, and stitching local solutions to construct a global solution. Additionally, we provide a theoretical analysis of the convergence rate and error bound. We conduct extensive experiments on several representative PDEs with diverse boundary conditions and achieve remarkable geometry generalization compared to alternative methods. These analysis and experiments demonstrate the proposed framework's potential in addressing challenges related to geometry generalization and data efficiency.

new Training Frozen Feature Pyramid DINOv2 for Eyelid Measurements with Infinite Encoding and Orthogonal Regularization

Authors: Chun-Hung Chen

Abstract: Accurate measurement of eyelid parameters such as Margin Reflex Distances (MRD1, MRD2) and Levator Function (LF) is critical in oculoplastic diagnostics but remains limited by manual, inconsistent methods. This study evaluates deep learning models: SE-ResNet, EfficientNet, and the vision transformer-based DINOv2 for automating these measurements using smartphone-acquired images. We assess performance across frozen and fine-tuned settings, using MSE, MAE, and R2 metrics. DINOv2, pretrained through self-supervised learning, demonstrates superior scalability and robustness, especially under frozen conditions ideal for mobile deployment. Lightweight regressors such as MLP and Deep Ensemble offer high precision with minimal computational overhead. To address class imbalance and improve generalization, we integrate focal loss, orthogonal regularization, and binary encoding strategies. Our results show that DINOv2 combined with these enhancements delivers consistent, accurate predictions across all tasks, making it a strong candidate for real-world, mobile-friendly clinical applications. This work highlights the potential of foundation models in advancing AI-powered ophthalmic care.

new Adversarial Curriculum Graph-Free Knowledge Distillation for Graph Neural Networks

Authors: Yuang Jia, Xiaojuan Shan, Jun Xia, Guancheng Wan, Yuchen Zhang, Wenke Huang, Mang Ye, Stan Z. Li

Abstract: Data-free Knowledge Distillation (DFKD) is a method that constructs pseudo-samples using a generator without real data, and transfers knowledge from a teacher model to a student by enforcing the student to overcome dimensional differences and learn to mimic the teacher's outputs on these pseudo-samples. In recent years, various studies in the vision domain have made notable advancements in this area. However, the varying topological structures and non-grid nature of graph data render the methods from the vision domain ineffective. Building upon prior research into differentiable methods for graph neural networks, we propose a fast and high-quality data-free knowledge distillation approach in this paper. Without compromising distillation quality, the proposed graph-free KD method (ACGKD) significantly reduces the spatial complexity of pseudo-graphs by leveraging the Binary Concrete distribution to model the graph structure and introducing a spatial complexity tuning parameter. This approach enables efficient gradient computation for the graph structure, thereby accelerating the overall distillation process. Additionally, ACGKD eliminates the dimensional ambiguity between the student and teacher models by increasing the student's dimensions and reusing the teacher's classifier. Moreover, it equips graph knowledge distillation with a CL-based strategy to ensure the student learns graph structures progressively. Extensive experiments demonstrate that ACGKD achieves state-of-the-art performance in distilling knowledge from GNNs without training data.

new Geometric Median Matching for Robust k-Subset Selection from Noisy Data

Authors: Anish Acharya, Sujay Sanghavi, Alexandros G Dimakis, Inderjit S Dhillon

Abstract: Data pruning -- the combinatorial task of selecting a small and representative subset from a large dataset, is crucial for mitigating the enormous computational costs associated with training data-hungry modern deep learning models at scale. Since large scale data collections are invariably noisy, developing data pruning strategies that remain robust even in the presence of corruption is critical in practice. However, existing data pruning methods often fail under high corruption rates due to their reliance on empirical mean estimation, which is highly sensitive to outliers. In response, we propose Geometric Median (GM) Matching, a novel k-subset selection strategy that leverages Geometric Median -- a robust estimator with an optimal breakdown point of 1/2; to enhance resilience against noisy data. Our method iteratively selects a k-subset such that the mean of the subset approximates the GM of the (potentially) noisy dataset, ensuring robustness even under arbitrary corruption. We provide theoretical guarantees, showing that GM Matching enjoys an improved O(1/k) convergence rate -- a quadratic improvement over random sampling, even under arbitrary corruption. Extensive experiments across image classification and image generation tasks demonstrate that GM Matching consistently outperforms existing pruning approaches, particularly in high-corruption settings and at high pruning rates; making it a strong baseline for robust data pruning.

new NeuraLUT-Assemble: Hardware-aware Assembling of Sub-Neural Networks for Efficient LUT Inference

Authors: Marta Andronic, George A. Constantinides

Abstract: Efficient neural networks (NNs) leveraging lookup tables (LUTs) have demonstrated significant potential for emerging AI applications, particularly when deployed on field-programmable gate arrays (FPGAs) for edge computing. These architectures promise ultra-low latency and reduced resource utilization, broadening neural network adoption in fields such as particle physics. However, existing LUT-based designs suffer from accuracy degradation due to the large fan-in required by neurons being limited by the exponential scaling of LUT resources with input width. In practice, in prior work this tension has resulted in the reliance on extremely sparse models. We present NeuraLUT-Assemble, a novel framework that addresses these limitations by combining mixed-precision techniques with the assembly of larger neurons from smaller units, thereby increasing connectivity while keeping the number of inputs of any given LUT manageable. Additionally, we introduce skip-connections across entire LUT structures to improve gradient flow. NeuraLUT-Assemble closes the accuracy gap between LUT-based methods and (fully-connected) MLP-based models, achieving competitive accuracy on tasks such as network intrusion detection, digit classification, and jet classification, demonstrating up to $8.42\times$ reduction in the area-delay product compared to the state-of-the-art at the time of the publication.

new Data Cleansing for GANs

Authors: Naoyuki Terashita, Hiroki Ohashi, Satoshi Hara

Abstract: As the application of generative adversarial networks (GANs) expands, it becomes increasingly critical to develop a unified approach that improves performance across various generative tasks. One effective strategy that applies to any machine learning task is identifying harmful instances, whose removal improves the performance. While previous studies have successfully estimated these harmful training instances in supervised settings, their approaches are not easily applicable to GANs. The challenge lies in two requirements of the previous approaches that do not apply to GANs. First, previous approaches require that the absence of a training instance directly affects the parameters. However, in the training for GANs, the instances do not directly affect the generator's parameters since they are only fed into the discriminator. Second, previous approaches assume that the change in loss directly quantifies the harmfulness of the instance to a model's performance, while common types of GAN losses do not always reflect the generative performance. To overcome the first challenge, we propose influence estimation methods that use the Jacobian of the generator's gradient with respect to the discriminator's parameters (and vice versa). Such a Jacobian represents the indirect effect between two models: how removing an instance from the discriminator's training changes the generator's parameters. Second, we propose an instance evaluation scheme that measures the harmfulness of each training instance based on how a GAN evaluation metric (e.g., Inception score) is expected to change by the instance's removal. Furthermore, we demonstrate that removing the identified harmful instances significantly improves the generative performance on various GAN evaluation metrics.

new Feature Subset Weighting for Distance-based Supervised Learning through Choquet Integration

Authors: Adnan Theerens, Yvan Saeys, Chris Cornelis

Abstract: This paper introduces feature subset weighting using monotone measures for distance-based supervised learning. The Choquet integral is used to define a distance metric that incorporates these weights. This integration enables the proposed distances to effectively capture non-linear relationships and account for interactions both between conditional and decision attributes and among conditional attributes themselves, resulting in a more flexible distance measure. In particular, we show how this approach ensures that the distances remain unaffected by the addition of duplicate and strongly correlated features. Another key point of this approach is that it makes feature subset weighting computationally feasible, since only $m$ feature subset weights should be calculated each time instead of calculating all feature subset weights ($2^m$), where $m$ is the number of attributes. Next, we also examine how the use of the Choquet integral for measuring similarity leads to a non-equivalent definition of distance. The relationship between distance and similarity is further explored through dual measures. Additionally, symmetric Choquet distances and similarities are proposed, preserving the classical symmetry between similarity and distance. Finally, we introduce a concrete feature subset weighting distance, evaluate its performance in a $k$-nearest neighbors (KNN) classification setting, and compare it against Mahalanobis distances and weighted distance methods.

new Impact of Data Duplication on Deep Neural Network-Based Image Classifiers: Robust vs. Standard Models

Authors: Alireza Aghabagherloo, Aydin Abadi, Sumanta Sarkar, Vishnu Asutosh Dasu, Bart Preneel

Abstract: The accuracy and robustness of machine learning models against adversarial attacks are significantly influenced by factors such as training data quality, model architecture, the training process, and the deployment environment. In recent years, duplicated data in training sets, especially in language models, has attracted considerable attention. It has been shown that deduplication enhances both training performance and model accuracy in language models. While the importance of data quality in training image classifier Deep Neural Networks (DNNs) is widely recognized, the impact of duplicated images in the training set on model generalization and performance has received little attention. In this paper, we address this gap and provide a comprehensive study on the effect of duplicates in image classification. Our analysis indicates that the presence of duplicated images in the training set not only negatively affects the efficiency of model training but also may result in lower accuracy of the image classifier. This negative impact of duplication on accuracy is particularly evident when duplicated data is non-uniform across classes or when duplication, whether uniform or non-uniform, occurs in the training set of an adversarially trained model. Even when duplicated samples are selected in a uniform way, increasing the amount of duplication does not lead to a significant improvement in accuracy.

new Learning to Normalize on the SPD Manifold under Bures-Wasserstein Geometry

Authors: Rui Wang, Shaocheng Jin, Ziheng Chen, Xiaoqing Luo, Xiao-Jun Wu

Abstract: Covariance matrices have proven highly effective across many scientific fields. Since these matrices lie within the Symmetric Positive Definite (SPD) manifold - a Riemannian space with intrinsic non-Euclidean geometry, the primary challenge in representation learning is to respect this underlying geometric structure. Drawing inspiration from the success of Euclidean deep learning, researchers have developed neural networks on the SPD manifolds for more faithful covariance embedding learning. A notable advancement in this area is the implementation of Riemannian batch normalization (RBN), which has been shown to improve the performance of SPD network models. Nonetheless, the Riemannian metric beneath the existing RBN might fail to effectively deal with the ill-conditioned SPD matrices (ICSM), undermining the effectiveness of RBN. In contrast, the Bures-Wasserstein metric (BWM) demonstrates superior performance for ill-conditioning. In addition, the recently introduced Generalized BWM (GBWM) parameterizes the vanilla BWM via an SPD matrix, allowing for a more nuanced representation of vibrant geometries of the SPD manifold. Therefore, we propose a novel RBN algorithm based on the GBW geometry, incorporating a learnable metric parameter. Moreover, the deformation of GBWM by matrix power is also introduced to further enhance the representational capacity of GBWM-based RBN. Experimental results on different datasets validate the effectiveness of our proposed method.

new Sim-is-More: Randomizing HW-NAS with Synthetic Devices

Authors: Francesco Capuano, Gabriele Tiboni, Niccol\`o Cavagnero, Giuseppe Averta

Abstract: Existing hardware-aware NAS (HW-NAS) methods typically assume access to precise information circa the target device, either via analytical approximations of the post-compilation latency model, or through learned latency predictors. Such approximate approaches risk introducing estimation errors that may prove detrimental in risk-sensitive applications. In this work, we propose a two-stage HW-NAS framework, in which we first learn an architecture controller on a distribution of synthetic devices, and then directly deploy the controller on a target device. At test-time, our network controller deploys directly to the target device without relying on any pre-collected information, and only exploits direct interactions. In particular, the pre-training phase on synthetic devices enables the controller to design an architecture for the target device by interacting with it through a small number of high-fidelity latency measurements. To guarantee accessibility of our method, we only train our controller with training-free accuracy proxies, allowing us to scale the meta-training phase without incurring the overhead of full network training. We benchmark on HW-NATS-Bench, demonstrating that our method generalizes to unseen devices and searches for latency-efficient architectures by in-context adaptation using only a few real-world latency evaluations at test-time.

new GraphMaster: Automated Graph Synthesis via LLM Agents in Data-Limited Environments

Authors: Enjun Du, Xunkai Li, Tian Jin, Zhihan Zhang, Rong-Hua Li, Guoren Wang

Abstract: The era of foundation models has revolutionized AI research, yet Graph Foundation Models (GFMs) remain constrained by the scarcity of large-scale graph corpora. Traditional graph data synthesis techniques primarily focus on simplistic structural operations, lacking the capacity to generate semantically rich nodes with meaningful textual attributes: a critical limitation for real-world applications. While large language models (LLMs) demonstrate exceptional text generation capabilities, their direct application to graph synthesis is impeded by context window limitations, hallucination phenomena, and structural consistency challenges. To address these issues, we introduce GraphMaster, the first multi-agent framework specifically designed for graph data synthesis in data-limited environments. GraphMaster orchestrates four specialized LLM agents (Manager, Perception, Enhancement, and Evaluation) that collaboratively optimize the synthesis process through iterative refinement, ensuring both semantic coherence and structural integrity. To rigorously evaluate our approach, we create new data-limited "Sub" variants of six standard graph benchmarks, specifically designed to test synthesis capabilities under realistic constraints. Additionally, we develop a novel interpretability assessment framework that combines human evaluation with a principled Grassmannian manifold-based analysis, providing both qualitative and quantitative measures of semantic coherence. Experimental results demonstrate that GraphMaster significantly outperforms traditional synthesis methods across multiple datasets, establishing a strong foundation for advancing GFMs in data-scarce environments.

new Spectral Normalization and Voigt-Reuss net: A universal approach to microstructure-property forecasting with physical guarantees

Authors: Sanath Keshav, Julius Herb, Felix Fritzen

Abstract: Heterogeneous materials are crucial to producing lightweight components, functional components, and structures composed of them. A crucial step in the design process is the rapid evaluation of their effective mechanical, thermal, or, in general, constitutive properties. The established procedure is to use forward models that accept microstructure geometry and local constitutive properties as inputs. The classical simulation-based approach, which uses, e.g., finite elements and FFT-based solvers, can require substantial computational resources. At the same time, simulation-based models struggle to provide gradients with respect to the microstructure and the constitutive parameters. Such gradients are, however, of paramount importance for microstructure design and for inverting the microstructure-property mapping. Machine learning surrogates can excel in these situations. However, they can lead to unphysical predictions that violate essential bounds on the constitutive response, such as the upper (Voigt-like) or the lower (Reuss-like) bound in linear elasticity. Therefore, we propose a novel spectral normalization scheme that a priori enforces these bounds. The approach is fully agnostic with respect to the chosen microstructural features and the utilized surrogate model. All of these will automatically and strictly predict outputs that obey the upper and lower bounds by construction. The technique can be used for any constitutive tensor that is symmetric and where upper and lower bounds (in the L\"owner sense) exist, i.e., for permeability, thermal conductivity, linear elasticity, and many more. We demonstrate the use of spectral normalization in the Voigt-Reuss net using a simple neural network. Numerical examples on truly extensive datasets illustrate the improved accuracy, robustness, and independence of the type of input features in comparison to much-used neural networks.

new Alleviating Performance Disparity in Adversarial Spatiotemporal Graph Learning Under Zero-Inflated Distribution

Authors: Songran Bai, Yuheng Ji, Yue Liu, Xingwei Zhang, Xiaolong Zheng, Daniel Dajun Zeng

Abstract: Spatiotemporal Graph Learning (SGL) under Zero-Inflated Distribution (ZID) is crucial for urban risk management tasks, including crime prediction and traffic accident profiling. However, SGL models are vulnerable to adversarial attacks, compromising their practical utility. While adversarial training (AT) has been widely used to bolster model robustness, our study finds that traditional AT exacerbates performance disparities between majority and minority classes under ZID, potentially leading to irreparable losses due to underreporting critical risk events. In this paper, we first demonstrate the smaller top-k gradients and lower separability of minority class are key factors contributing to this disparity. To address these issues, we propose MinGRE, a framework for Minority Class Gradients and Representations Enhancement. MinGRE employs a multi-dimensional attention mechanism to reweight spatiotemporal gradients, minimizing the gradient distribution discrepancies across classes. Additionally, we introduce an uncertainty-guided contrastive loss to improve the inter-class separability and intra-class compactness of minority representations with higher uncertainty. Extensive experiments demonstrate that the MinGRE framework not only significantly reduces the performance disparity across classes but also achieves enhanced robustness compared to existing baselines. These findings underscore the potential of our method in fostering the development of more equitable and robust models.

new EMO: Edge Model Overlays to Scale Model Size in Federated Learning

Authors: Di Wu, Weibo He, Wanglei Feng, Zhenyu Wen, Bin Qian, Blesson Varghese

Abstract: Federated Learning (FL) trains machine learning models on edge devices with distributed data. However, the computational and memory limitations of these devices restrict the training of large models using FL. Split Federated Learning (SFL) addresses this challenge by distributing the model across the device and server, but it introduces a tightly coupled data flow, leading to computational bottlenecks and high communication costs. We propose EMO as a solution to enable the training of large models in FL while mitigating the challenges of SFL. EMO introduces Edge Model Overlay(s) between the device and server, enabling the creation of a larger ensemble model without modifying the FL workflow. The key innovation in EMO is Augmented Federated Learning (AFL), which builds an ensemble model by connecting the original (smaller) FL model with model(s) trained in the overlay(s) to facilitate horizontal or vertical scaling. This is accomplished through three key modules: a hierarchical activation replay cache to decouple AFL from FL, a convergence-aware communication controller to optimize communication overhead, and an ensemble inference module. Evaluations on a real-world prototype show that EMO improves accuracy by up to 17.77% compared to FL, and reduces communication costs by up to 7.17x and decreases training time by up to 6.9x compared to SFL.

new Detection of Disease on Nasal Breath Sound by New Lightweight Architecture: Using COVID-19 as An Example

Authors: Jiayuan She, Lin Shi, Peiqi Li, Ziling Dong, Renxing Li, Shengkai Li, Liping Gu, Tong Zhao, Zhuochang Yang, Yajie Ji, Liang Feng, Jiangang Chen

Abstract: Background. Infectious diseases, particularly COVID-19, continue to be a significant global health issue. Although many countries have reduced or stopped large-scale testing measures, the detection of such diseases remains a propriety. Objective. This study aims to develop a novel, lightweight deep neural network for efficient, accurate, and cost-effective detection of COVID-19 using a nasal breathing audio data collected via smartphones. Methodology. Nasal breathing audio from 128 patients diagnosed with the Omicron variant was collected. Mel-Frequency Cepstral Coefficients (MFCCs), a widely used feature in speech and sound analysis, were employed for extracting important characteristics from the audio signals. Additional feature selection was performed using Random Forest (RF) and Principal Component Analysis (PCA) for dimensionality reduction. A Dense-ReLU-Dropout model was trained with K-fold cross-validation (K=3), and performance metrics like accuracy, precision, recall, and F1-score were used to evaluate the model. Results. The proposed model achieved 97% accuracy in detecting COVID-19 from nasal breathing sounds, outperforming state-of-the-art methods such as those by [23] and [13]. Our Dense-ReLU-Dropout model, using RF and PCA for feature selection, achieves high accuracy with greater computational efficiency compared to existing methods that require more complex models or larger datasets. Conclusion. The findings suggest that the proposed method holds significant potential for clinical implementation, advancing smartphone-based diagnostics in infectious diseases. The Dense-ReLU-Dropout model, combined with innovative feature processing techniques, offers a promising approach for efficient and accurate COVID-19 detection, showcasing the capabilities of mobile device-based diagnostics

new Automated Feature Labeling with Token-Space Gradient Descent

Authors: Julian Schulz, Seamus Fallows

Abstract: We present a novel approach to feature labeling using gradient descent in token-space. While existing methods typically use language models to generate hypotheses about feature meanings, our method directly optimizes label representations by using a language model as a discriminator to predict feature activations. We formulate this as a multi-objective optimization problem in token-space, balancing prediction accuracy, entropy minimization, and linguistic naturalness. Our proof-of-concept experiments demonstrate successful convergence to interpretable single-token labels across diverse domains, including features for detecting animals, mammals, Chinese text, and numbers. Although our current implementation is constrained to single-token labels and relatively simple features, the results suggest that token-space gradient descent could become a valuable addition to the interpretability researcher's toolkit.

new Integrating Fourier Neural Operators with Diffusion Models to improve Spectral Representation of Synthetic Earthquake Ground Motion Response

Authors: Niccol\`o Perrone, Fanny Lehmann, Hugo Gabrielidis, Stefania Fresca, Filippo Gatti

Abstract: Nuclear reactor buildings must be designed to withstand the dynamic load induced by strong ground motion earthquakes. For this reason, their structural behavior must be assessed in multiple realistic ground shaking scenarios (e.g., the Maximum Credible Earthquake). However, earthquake catalogs and recorded seismograms may not always be available in the region of interest. Therefore, synthetic earthquake ground motion is progressively being employed, although with some due precautions: earthquake physics is sometimes not well enough understood to be accurately reproduced with numerical tools, and the underlying epistemic uncertainties lead to prohibitive computational costs related to model calibration. In this study, we propose an AI physics-based approach to generate synthetic ground motion, based on the combination of a neural operator that approximates the elastodynamics Green's operator in arbitrary source-geology setups, enhanced by a denoising diffusion probabilistic model. The diffusion model is trained to correct the ground motion time series generated by the neural operator. Our results show that such an approach promisingly enhances the realism of the generated synthetic seismograms, with frequency biases and Goodness-Of-Fit (GOF) scores being improved by the diffusion model. This indicates that the latter is capable to mitigate the mid-frequency spectral falloff observed in the time series generated by the neural operator. Our method showcases fast and cheap inference in different site and source conditions.

new TAMIS: Tailored Membership Inference Attacks on Synthetic Data

Authors: Paul Andrey, Batiste Le Bars, Marc Tommasi

Abstract: Membership Inference Attacks (MIA) enable to empirically assess the privacy of a machine learning algorithm. In this paper, we propose TAMIS, a novel MIA against differentially-private synthetic data generation methods that rely on graphical models. This attack builds upon MAMA-MIA, a recently-published state-of-the-art method. It lowers its computational cost and requires less attacker knowledge. Our attack is the product of a two-fold improvement. First, we recover the graphical model having generated a synthetic dataset by using solely that dataset, rather than shadow-modeling over an auxiliary one. This proves less costly and more performant. Second, we introduce a more mathematically-grounded attack score, that provides a natural threshold for binary predictions. In our experiments, TAMIS achieves better or similar performance as MAMA-MIA on replicas of the SNAKE challenge.

new Automated Explanation of Machine Learning Models of Footballing Actions in Words

Authors: Pegah Rahimian, Jernej Flisar, David Sumpter

Abstract: While football analytics has changed the way teams and analysts assess performance, there remains a communication gap between machine learning practice and how coaching staff talk about football. Coaches and practitioners require actionable insights, which are not always provided by models. To bridge this gap, we show how to build wordalizations (a novel approach that leverages large language models) for shots in football. Specifically, we first build an expected goals model using logistic regression. We then use the co-efficients of this regression model to write sentences describing how factors (such as distance, angle and defensive pressure) contribute to the model's prediction. Finally, we use large language models to give an entertaining description of the shot. We describe our approach in a model card and provide an interactive open-source application describing shots in recent tournaments. We discuss how shot wordalisations might aid communication in coaching and football commentary, and give a further example of how the same approach can be applied to other actions in football.

new Conditional Temporal Neural Processes with Covariance Loss

Authors: Boseon Yoo, Jiwoo Lee, Janghoon Ju, Seijun Chung, Soyeon Kim, Jaesik Choi

Abstract: We introduce a novel loss function, Covariance Loss, which is conceptually equivalent to conditional neural processes and has a form of regularization so that is applicable to many kinds of neural networks. With the proposed loss, mappings from input variables to target variables are highly affected by dependencies of target variables as well as mean activation and mean dependencies of input and target variables. This nature enables the resulting neural networks to become more robust to noisy observations and recapture missing dependencies from prior information. In order to show the validity of the proposed loss, we conduct extensive sets of experiments on real-world datasets with state-of-the-art models and discuss the benefits and drawbacks of the proposed Covariance Loss.

new Mixture-of-Experts for Distributed Edge Computing with Channel-Aware Gating Function

Authors: Qiuchen Song, Shusen Jing, Shuai Zhang, Songyang Zhang, Chuan Huang

Abstract: In a distributed mixture-of-experts (MoE) system, a server collaborates with multiple specialized expert clients to perform inference. The server extracts features from input data and dynamically selects experts based on their areas of specialization to produce the final output. Although MoE models are widely valued for their flexibility and performance benefits, adapting distributed MoEs to operate effectively in wireless networks has remained unexplored. In this work, we introduce a novel channel-aware gating function for wireless distributed MoE, which incorporates channel conditions into the MoE gating mechanism. To train the channel-aware gating, we simulate various signal-to-noise ratios (SNRs) for each expert's communication channel and add noise to the features distributed to the experts based on these SNRs. The gating function then utilizes both features and SNRs to optimize expert selection. Unlike conventional MoE models which solely consider the alignment of features with the specializations of experts, our approach additionally considers the impact of channel conditions on expert performance. Experimental results demonstrate that the proposed channel-aware gating scheme outperforms traditional MoE models.

new Deep Generative Models: Complexity, Dimensionality, and Approximation

Authors: Kevin Wang, Hongqian Niu, Yixin Wang, Didong Li

Abstract: Generative networks have shown remarkable success in learning complex data distributions, particularly in generating high-dimensional data from lower-dimensional inputs. While this capability is well-documented empirically, its theoretical underpinning remains unclear. One common theoretical explanation appeals to the widely accepted manifold hypothesis, which suggests that many real-world datasets, such as images and signals, often possess intrinsic low-dimensional geometric structures. Under this manifold hypothesis, it is widely believed that to approximate a distribution on a $d$-dimensional Riemannian manifold, the latent dimension needs to be at least $d$ or $d+1$. In this work, we show that this requirement on the latent dimension is not necessary by demonstrating that generative networks can approximate distributions on $d$-dimensional Riemannian manifolds from inputs of any arbitrary dimension, even lower than $d$, taking inspiration from the concept of space-filling curves. This approach, in turn, leads to a super-exponential complexity bound of the deep neural networks through expanded neurons. Our findings thus challenge the conventional belief on the relationship between input dimensionality and the ability of generative networks to model data distributions. This novel insight not only corroborates the practical effectiveness of generative networks in handling complex data structures, but also underscores a critical trade-off between approximation error, dimensionality, and model complexity.

new Generalized Tensor-based Parameter-Efficient Fine-Tuning via Lie Group Transformations

Authors: Chongjie Si, Zhiyi Shi, Xuehui Wang, Yichen Xiao, Xiaokang Yang, Wei Shen

Abstract: Adapting pre-trained foundation models for diverse downstream tasks is a core practice in artificial intelligence. However, the wide range of tasks and high computational costs make full fine-tuning impractical. To overcome this, parameter-efficient fine-tuning (PEFT) methods like LoRA have emerged and are becoming a growing research focus. Despite the success of these methods, they are primarily designed for linear layers, focusing on two-dimensional matrices while largely ignoring higher-dimensional parameter spaces like convolutional kernels. Moreover, directly applying these methods to higher-dimensional parameter spaces often disrupts their structural relationships. Given the rapid advancements in matrix-based PEFT methods, rather than designing a specialized strategy, we propose a generalization that extends matrix-based PEFT methods to higher-dimensional parameter spaces without compromising their structural properties. Specifically, we treat parameters as elements of a Lie group, with updates modeled as perturbations in the corresponding Lie algebra. These perturbations are mapped back to the Lie group through the exponential map, ensuring smooth, consistent updates that preserve the inherent structure of the parameter space. Extensive experiments on computer vision and natural language processing validate the effectiveness and versatility of our approach, demonstrating clear improvements over existing methods.

new ReaLitE: Enrichment of Relation Embeddings in Knowledge Graphs using Numeric Literals

Authors: Antonis Klironomos, Baifan Zhou, Zhuoxun Zheng, Gad-Elrab Mohamed, Heiko Paulheim, Evgeny Kharlamov

Abstract: Most knowledge graph embedding (KGE) methods tailored for link prediction focus on the entities and relations in the graph, giving little attention to other literal values, which might encode important information. Therefore, some literal-aware KGE models attempt to either integrate numerical values into the embeddings of the entities or convert these numerics into entities during preprocessing, leading to information loss. Other methods concerned with creating relation-specific numerical features assume completeness of numerical data, which does not apply to real-world graphs. In this work, we propose ReaLitE, a novel relation-centric KGE model that dynamically aggregates and merges entities' numerical attributes with the embeddings of the connecting relations. ReaLitE is designed to complement existing conventional KGE methods while supporting multiple variations for numerical aggregations, including a learnable method. We comprehensively evaluated the proposed relation-centric embedding using several benchmarks for link prediction and node classification tasks. The results showed the superiority of ReaLitE over the state of the art in both tasks.

new P2NIA: Privacy-Preserving Non-Iterative Auditing

Authors: Jade Garcia Bourr\'ee, Hadrien Lautraite, S\'ebastien Gambs, Gilles Tredan, Erwan Le Merrer, Beno\^it Rottembourg

Abstract: The emergence of AI legislation has increased the need to assess the ethical compliance of high-risk AI systems. Traditional auditing methods rely on platforms' application programming interfaces (APIs), where responses to queries are examined through the lens of fairness requirements. However, such approaches put a significant burden on platforms, as they are forced to maintain APIs while ensuring privacy, facing the possibility of data leaks. This lack of proper collaboration between the two parties, in turn, causes a significant challenge to the auditor, who is subject to estimation bias as they are unaware of the data distribution of the platform. To address these two issues, we present P2NIA, a novel auditing scheme that proposes a mutually beneficial collaboration for both the auditor and the platform. Extensive experiments demonstrate P2NIA's effectiveness in addressing both issues. In summary, our work introduces a privacy-preserving and non-iterative audit scheme that enhances fairness assessments using synthetic or local data, avoiding the challenges associated with traditional API-based audits.

new Detection of Anomalous Vehicular Traffic and Sensor Failures Using Data Clustering Techniques

Authors: Davide Moretti, Elia Onofri, Emiliano Cristiani

Abstract: The increasing availability of traffic data from sensor networks has created new opportunities for understanding vehicular dynamics and identifying anomalies. In this study, we employ clustering techniques to analyse traffic flow data with the dual objective of uncovering meaningful traffic patterns and detecting anomalies, including sensor failures and irregular congestion events. We explore multiple clustering approaches, i.e partitioning and hierarchical methods, combined with various time-series representations and similarity measures. Our methodology is applied to real-world data from highway sensors, enabling us to assess the impact of different clustering frameworks on traffic pattern recognition. We also introduce a clustering-driven anomaly detection methodology that identifies deviations from expected traffic behaviour based on distance-based anomaly scores. Results indicate that hierarchical clustering with symbolic representations provides robust segmentation of traffic patterns, while partitioning methods such as k-means and fuzzy c-means yield meaningful results when paired with Dynamic Time Warping. The proposed anomaly detection strategy successfully identifies sensor malfunctions and abnormal traffic conditions with minimal false positives, demonstrating its practical utility for real-time monitoring. Real-world vehicular traffic data are provided by Autostrade Alto Adriatico S.p.A.

new Spectral Architecture Search for Neural Networks

Authors: Gianluca Peri, Lorenzo Giambagli, Lorenzo Chicchi, Duccio Fanelli

Abstract: Architecture design and optimization are challenging problems in the field of artificial neural networks. Working in this context, we here present SPARCS (SPectral ARchiteCture Search), a novel architecture search protocol which exploits the spectral attributes of the inter-layer transfer matrices. SPARCS allows one to explore the space of possible architectures by spanning continuous and differentiable manifolds, thus enabling for gradient-based optimization algorithms to be eventually employed. With reference to simple benchmark models, we show that the newly proposed method yields a self-emerging architecture with a minimal degree of expressivity to handle the task under investigation and with a reduced parameter count as compared to other viable alternatives.

new Provably accurate adaptive sampling for collocation points in physics-informed neural networks

Authors: Antoine Caradot, R\'emi Emonet, Amaury Habrard, Abdel-Rahim Mezidi, Marc Sebban

Abstract: Despite considerable scientific advances in numerical simulation, efficiently solving PDEs remains a complex and often expensive problem. Physics-informed Neural Networks (PINN) have emerged as an efficient way to learn surrogate solvers by embedding the PDE in the loss function and minimizing its residuals using automatic differentiation at so-called collocation points. Originally uniformly sampled, the choice of the latter has been the subject of recent advances leading to adaptive sampling refinements for PINNs. In this paper, leveraging a new quadrature method for approximating definite integrals, we introduce a provably accurate sampling method for collocation points based on the Hessian of the PDE residuals. Comparative experiments conducted on a set of 1D and 2D PDEs demonstrate the benefits of our method.

new Benchmarking Federated Machine Unlearning methods for Tabular Data

Authors: Chenguang Xiao, Abhirup Ghosh, Han Wu, Shuo Wang, Diederick van Thiel

Abstract: Machine unlearning, which enables a model to forget specific data upon request, is increasingly relevant in the era of privacy-centric machine learning, particularly within federated learning (FL) environments. This paper presents a pioneering study on benchmarking machine unlearning methods within a federated setting for tabular data, addressing the unique challenges posed by cross-silo FL where data privacy and communication efficiency are paramount. We explore unlearning at the feature and instance levels, employing both machine learning, random forest and logistic regression models. Our methodology benchmarks various unlearning algorithms, including fine-tuning and gradient-based approaches, across multiple datasets, with metrics focused on fidelity, certifiability, and computational efficiency. Experiments demonstrate that while fidelity remains high across methods, tree-based models excel in certifiability, ensuring exact unlearning, whereas gradient-based methods show improved computational efficiency. This study provides critical insights into the design and selection of unlearning algorithms tailored to the FL environment, offering a foundation for further research in privacy-preserving machine learning.

new CFIRE: A General Method for Combining Local Explanations

Authors: Sebastian M\"uller, Vanessa Toborek, Tam\'as Horv\'ath, Christian Bauckhage

Abstract: We propose a novel eXplainable AI algorithm to compute faithful, easy-to-understand, and complete global decision rules from local explanations for tabular data by combining XAI methods with closed frequent itemset mining. Our method can be used with any local explainer that indicates which dimensions are important for a given sample for a given black-box decision. This property allows our algorithm to choose among different local explainers, addressing the disagreement problem, \ie the observation that no single explanation method consistently outperforms others across models and datasets. Unlike usual experimental methodology, our evaluation also accounts for the Rashomon effect in model explainability. To this end, we demonstrate the robustness of our approach in finding suitable rules for nearly all of the 700 black-box models we considered across 14 benchmark datasets. The results also show that our method exhibits improved runtime, high precision and F1-score while generating compact and complete rules.

new Personalized Federated Training of Diffusion Models with Privacy Guarantees

Authors: Kumar Kshitij Patel, Weitong Zhang, Lingxiao Wang

Abstract: The scarcity of accessible, compliant, and ethically sourced data presents a considerable challenge to the adoption of artificial intelligence (AI) in sensitive fields like healthcare, finance, and biomedical research. Furthermore, access to unrestricted public datasets is increasingly constrained due to rising concerns over privacy, copyright, and competition. Synthetic data has emerged as a promising alternative, and diffusion models -- a cutting-edge generative AI technology -- provide an effective solution for generating high-quality and diverse synthetic data. In this paper, we introduce a novel federated learning framework for training diffusion models on decentralized private datasets. Our framework leverages personalization and the inherent noise in the forward diffusion process to produce high-quality samples while ensuring robust differential privacy guarantees. Our experiments show that our framework outperforms non-collaborative training methods, particularly in settings with high data heterogeneity, and effectively reduces biases and imbalances in synthetic data, resulting in fairer downstream models.

cross MedPix 2.0: A Comprehensive Multimodal Biomedical Data set for Advanced AI Applications with Retrieval Augmented Generation and Knowledge Graphs

Authors: Irene Siragusa, Salvatore Contino, Massimo La Ciura, Rosario Alicata, Roberto Pirrone

Abstract: The increasing interest in developing Artificial Intelligence applications in the medical domain, suffers from the lack of high-quality data set, mainly due to privacy-related issues. In addition, the recent increase in large multimodal models (LMM) leads to the need for multimodal medical data sets, where clinical reports and findings are attached to the corresponding CT or MRI scans. This paper illustrates the entire workflow for building the MedPix 2.0 data set. Starting with the well-known multimodal data set MedPix\textsuperscript{\textregistered}, mainly used by physicians, nurses, and healthcare students for Continuing Medical Education purposes, a semi-automatic pipeline was developed to extract visual and textual data followed by a manual curing procedure in which noisy samples were removed, thus creating a MongoDB database. Along with the data set, we developed a GUI aimed at navigating efficiently the MongoDB instance and obtaining the raw data that can be easily used for training and/or fine-tuning LMMs. To enforce this point, in this work, we first recall DR-Minerva, a RAG-based LMM trained using MedPix 2.0. DR-Minerva predicts the body part and the modality used to scan its input image. We also propose the extension of DR-Minerva with a Knowledge Graph that uses Llama 3.1 Instruct 8B, and leverages MedPix 2.0. The resulting architecture can be queried in a end-to-end manner, as a medical decision support system. MedPix 2.0 is available on GitHub. \url{https://github.com/CHILab1/MedPix-2.0}

URLs: https://github.com/CHILab1/MedPix-2.0

cross Deep Learning-Based Hypoglycemia Classification Across Multiple Prediction Horizons

Authors: Beyza Cinar, Jennifer Daniel Onwuchekwa, Maria Maleshkova

Abstract: Type 1 diabetes (T1D) management can be significantly enhanced through the use of predictive machine learning (ML) algorithms, which can mitigate the risk of adverse events like hypoglycemia. Hypoglycemia, characterized by blood glucose levels below 70 mg/dL, is a life-threatening condition typically caused by excessive insulin administration, missed meals, or physical activity. Its asymptomatic nature impedes timely intervention, making ML models crucial for early detection. This study integrates short- (up to 2h) and long-term (up to 24h) prediction horizons (PHs) within a single classification model to enhance decision support. The predicted times are 5-15 min, 15-30 min, 30 min-1h, 1-2h, 2-4h, 4-8h, 8-12h, and 12-24h before hypoglycemia. In addition, a simplified model classifying up to 4h before hypoglycemia is compared. We trained ResNet and LSTM models on glucose levels, insulin doses, and acceleration data. The results demonstrate the superiority of the LSTM models when classifying nine classes. In particular, subject-specific models yielded better performance but achieved high recall only for classes 0, 1, and 2 with 98%, 72%, and 50%, respectively. A population-based six-class model improved the results with at least 60% of events detected. In contrast, longer PHs remain challenging with the current approach and may be considered with different models.

cross Four Things People Should Know About Migraines

Authors: Mohammad S. Parsa, Lukasz Golab

Abstract: Migraine literacy among the public is known to be low, and this lack of understanding has a negative impact on migraineurs' quality of life. To understand this impact, we use text mining methods to study migraine discussion on the Reddit social media platform. We summarize the findings in the form of "four things people should know about chronic migraines": it is a serious disease that affects people of all ages, it can be triggered by many different factors, it affects women more than men, and it can get worse in combination with the COVID-19 virus.

cross I'm Sorry Dave: How the old world of personnel security can inform the new world of AI insider risk

Authors: Paul Martin, Sarah Mercer

Abstract: Organisations are rapidly adopting artificial intelligence (AI) tools to perform tasks previously undertaken by people. The potential benefits are enormous. Separately, some organisations deploy personnel security measures to mitigate the security risks arising from trusted human insiders. Unfortunately, there is no meaningful interplay between the rapidly evolving domain of AI and the traditional world of personnel security. This is a problem. The complex risks from human insiders are hard enough to understand and manage, despite many decades of effort. The emerging security risks from AI insiders are even more opaque. Both sides need all the help they can get. Some of the concepts and approaches that have proved useful in dealing with human insiders are also applicable to the emerging risks from AI insiders. Furthermore, AI can be used defensively to protect against both human and AI insiders.

cross Enhance Vision-based Tactile Sensors via Dynamic Illumination and Image Fusion

Authors: Artemii Redkin, Zdravko Dugonjic, Mike Lambeta, Roberto Calandra

Abstract: Vision-based tactile sensors use structured light to measure deformation in their elastomeric interface. Until now, vision-based tactile sensors such as DIGIT and GelSight have been using a single, static pattern of structured light tuned to the specific form factor of the sensor. In this work, we investigate the effectiveness of dynamic illumination patterns, in conjunction with image fusion techniques, to improve the quality of sensing of vision-based tactile sensors. Specifically, we propose to capture multiple measurements, each with a different illumination pattern, and then fuse them together to obtain a single, higher-quality measurement. Experimental results demonstrate that this type of dynamic illumination yields significant improvements in image contrast, sharpness, and background difference. This discovery opens the possibility of retroactively improving the sensing quality of existing vision-based tactile sensors with a simple software update, and for new hardware designs capable of fully exploiting dynamic illumination.

cross SandboxEval: Towards Securing Test Environment for Untrusted Code

Authors: Rafiqul Rabin, Jesse Hostetler, Sean McGregor, Brett Weir, Nick Judd

Abstract: While large language models (LLMs) are powerful assistants in programming tasks, they may also produce malicious code. Testing LLM-generated code therefore poses significant risks to assessment infrastructure tasked with executing untrusted code. To address these risks, this work focuses on evaluating the security and confidentiality properties of test environments, reducing the risk that LLM-generated code may compromise the assessment infrastructure. We introduce SandboxEval, a test suite featuring manually crafted test cases that simulate real-world safety scenarios for LLM assessment environments in the context of untrusted code execution. The suite evaluates vulnerabilities to sensitive information exposure, filesystem manipulation, external communication, and other potentially dangerous operations in the course of assessment activity. We demonstrate the utility of SandboxEval by deploying it on an open-source implementation of Dyff, an established AI assessment framework used to evaluate the safety of LLMs at scale. We show, first, that the test suite accurately describes limitations placed on an LLM operating under instructions to generate malicious code. Second, we show that the test results provide valuable insights for developers seeking to harden assessment infrastructure and identify risks associated with LLM execution activities.

cross Celler:A Genomic Language Model for Long-Tailed Single-Cell Annotation

Authors: Huan Zhao, Yiming Liu, Jina Yao, Ling Xiong, Zexin Zhou, Zixing Zhang

Abstract: Recent breakthroughs in single-cell technology have ushered in unparalleled opportunities to decode the molecular intricacy of intricate biological systems, especially those linked to diseases unique to humans. However, these progressions have also ushered in novel obstacles-specifically, the efficient annotation of extensive, long-tailed single-cell data pertaining to disease conditions. To effectively surmount this challenge, we introduce Celler, a state-of-the-art generative pre-training model crafted specifically for the annotation of single-cell data. Celler incorporates two groundbreaking elements: First, we introduced the Gaussian Inflation (GInf) Loss function. By dynamically adjusting sample weights, GInf Loss significantly enhances the model's ability to learn from rare categories while reducing the risk of overfitting for common categories. Secondly, we introduce an innovative Hard Data Mining (HDM) strategy into the training process, specifically targeting the challenging-to-learn minority data samples, which significantly improved the model's predictive accuracy. Additionally, to further advance research in this field, we have constructed a large-scale single-cell dataset: Celler-75, which encompasses 40 million cells distributed across 80 human tissues and 75 specific diseases. This dataset provides critical support for comprehensively exploring the potential of single-cell technology in disease research. Our code is available at https://github.com/AI4science-ym/HiCeller.

URLs: https://github.com/AI4science-ym/HiCeller.

cross A multi-locus predictiveness curve and its summary assessment for genetic risk prediction

Authors: Changshuai Wei, Ming Li, Yalu Wen, Chengyin Ye, Qing Lu

Abstract: With the advance of high-throughput genotyping and sequencing technologies, it becomes feasible to comprehensive evaluate the role of massive genetic predictors in disease prediction. There exists, therefore, a critical need for developing appropriate statistical measurements to access the combined effects of these genetic variants in disease prediction. Predictiveness curve is commonly used as a graphical tool to measure the predictive ability of a risk prediction model on a single continuous biomarker. Yet, for most complex diseases, risk prediciton models are formed on multiple genetic variants. We therefore propose a multi-marker predictiveness curve and provide a non-parametric method to construct the curve for case-control studies. We further introduce a global predictiveness U and a partial predictiveness U to summarize prediction curve across the whole population and sub-population of clinical interest, respectively. We also demonstrate the connections of predictiveness curve with ROC curve and Lorenz curve. Through simulation, we compared the performance of the predictiveness U to other three summary indices: R square, Total Gain, and Average Entropy, and showed that Predictiveness U outperformed the other three indexes in terms of unbiasedness and robustness. Moreover, we simulated a series of rare-variants disease model, found partial predictiveness U performed better than global predictiveness U. Finally, we conducted a real data analysis, using predictiveness curve and predictiveness U to evaluate a risk prediction model for Nicotine Dependence.

cross Token-Driven GammaTune: Adaptive Calibration for Enchanced Speculative Decoding

Authors: Aayush Gautam, Susav Shrestha, Narasimha Annapareddy

Abstract: Speculative decoding accelerates large language model (LLM) inference by using a smaller draft model to propose tokens, which are then verified by a larger target model. However, selecting an optimal speculation length is critical for maximizing speedup while minimizing wasted computation. We introduce \textit{GammaTune} and \textit{GammaTune+}, training-free adaptive algorithms that dynamically adjust speculation length based on token acceptance rates using a heuristic-based switching mechanism. Evaluated on SpecBench across multiple tasks and model pairs, our method outperforms other heuristic-based approaches and fixed-length speculative decoding, achieving an average speedup of 15\% ($\pm$5\%) with \textit{GammaTune} and 16\% ($\pm$3\%) with \textit{GammaTune+}, while reducing performance variance. This makes \textit{GammaTune} a robust and efficient solution for real-world deployment.

cross Quantum Generative Models for Image Generation: Insights from MNIST and MedMNIST

Authors: Chi-Sheng Chen, Wei An Hou, Siang-Wei Hu, Zhen-Sheng Cai

Abstract: Research on quantum generative models is currently in its early exploratory stages, with very few established methodologies. In this paper, we propose a novel hybrid quantum generative model based on variational quantum circuits for image generation tasks, introducing innovative noise techniques specifically tailored for quantum computation. Our approach utilizes two distinctive noise strategies: quantum-generated noise inherent to quantum circuits, and a newly developed noise scheduling method, applying different noise levels strategically across time steps during the training process. Experiments conducted on MNIST and MedMNIST datasets demonstrate that our hybrid quantum model, combined with these specialized noise techniques, achieves promising results, suggesting improved generative performance compared to baseline quantum generative approaches. This exploratory work lays a critical foundation and opens new avenues for advancing quantum generative modeling research.

cross Improving Diseases Predictions Utilizing External Bio-Banks

Authors: Hido Pinto, Eran Segal

Abstract: Machine learning has been successfully used in critical domains, such as medicine. However, extracting meaningful insights from biomedical data is often constrained by the lack of their available disease labels. In this research, we demonstrate how machine learning can be leveraged to enhance explainability and uncover biologically meaningful associations, even when predictive improvements in disease modeling are limited. We train LightGBM models from scratch on our dataset (10K) to impute metabolomics features and apply them to the UK Biobank (UKBB) for downstream analysis. The imputed metabolomics features are then used in survival analysis to assess their impact on disease-related risk factors. As a result, our approach successfully identified biologically relevant connections that were not previously known to the predictive models. Additionally, we applied a genome-wide association study (GWAS) on key metabolomics features, revealing a link between vascular dementia and smoking. Although being a well-established epidemiological relationship, this link was not embedded in the model's training data, which validated the method's ability to extract meaningful signals. Furthermore, by integrating survival models as inputs in the 10K data, we uncovered associations between metabolic substances and obesity, demonstrating the ability to infer disease risk for future patients without requiring direct outcome labels. These findings highlight the potential of leveraging external bio-banks to extract valuable biomedical insights, even in data-limited scenarios. Our results demonstrate that machine learning models trained on smaller datasets can still be used to uncover real biological associations when carefully integrated with survival analysis and genetic studies.

cross Imbalanced malware classification: an approach based on dynamic classifier selection

Authors: J. V. S. Souza, C. B. Vieira, G. D. C. Cunha, R. M. O. Cruz

Abstract: In recent years, the rise of cyber threats has emphasized the need for robust malware detection systems, especially on mobile devices. Malware, which targets vulnerabilities in devices and user data, represents a substantial security risk. A significant challenge in malware detection is the imbalance in datasets, where most applications are benign, with only a small fraction posing a threat. This study addresses the often-overlooked issue of class imbalance in malware detection by evaluating various machine learning strategies for detecting malware in Android applications. We assess monolithic classifiers and ensemble methods, focusing on dynamic selection algorithms, which have shown superior performance compared to traditional approaches. In contrast to balancing strategies performed on the whole dataset, we propose a balancing procedure that works individually for each classifier in the pool. Our empirical analysis demonstrates that the KNOP algorithm obtained the best results using a pool of Random Forest. Additionally, an instance hardness assessment revealed that balancing reduces the difficulty of the minority class and enhances the detection of the minority class (malware). The code used for the experiments is available at https://github.com/jvss2/Machine-Learning-Empirical-Evaluation.

URLs: https://github.com/jvss2/Machine-Learning-Empirical-Evaluation.

cross GAL-MAD: Towards Explainable Anomaly Detection in Microservice Applications Using Graph Attention Networks

Authors: Lahiru Akmeemana, Chamodya Attanayake, Husni Faiz, Sandareka Wickramanayake

Abstract: The transition to microservices has revolutionized software architectures, offering enhanced scalability and modularity. However, the distributed and dynamic nature of microservices introduces complexities in ensuring system reliability, making anomaly detection crucial for maintaining performance and functionality. Anomalies stemming from network and performance issues must be swiftly identified and addressed. Existing anomaly detection techniques often rely on statistical models or machine learning methods that struggle with the high-dimensional, interdependent data inherent in microservice applications. Current techniques and available datasets predominantly focus on system traces and logs, limiting their ability to support advanced detection models. This paper addresses these gaps by introducing the RS-Anomic dataset generated using the open-source RobotShop microservice application. The dataset captures multivariate performance metrics and response times under normal and anomalous conditions, encompassing ten types of anomalies. We propose a novel anomaly detection model called Graph Attention and LSTM-based Microservice Anomaly Detection (GAL-MAD), leveraging Graph Attention and Long Short-Term Memory architectures to capture spatial and temporal dependencies in microservices. We utilize SHAP values to localize anomalous services and identify root causes to enhance explainability. Experimental results demonstrate that GAL-MAD outperforms state-of-the-art models on the RS-Anomic dataset, achieving higher accuracy and recall across varying anomaly rates. The explanations provide actionable insights into service anomalies, which benefits system administrators.

cross Improving Predictions of Convective Storm Wind Gusts through Statistical Post-Processing of Neural Weather Models

Authors: Antoine Leclerc, Erwan Koch, Monika Feldmann, Daniele Nerini, Tom Beucler

Abstract: Issuing timely severe weather warnings helps mitigate potentially disastrous consequences. Recent advancements in Neural Weather Models (NWMs) offer a computationally inexpensive and fast approach for forecasting atmospheric environments on a 0.25{\deg} global grid. For thunderstorms, these environments can be empirically post-processed to predict wind gust distributions at specific locations. With the Pangu-Weather NWM, we apply a hierarchy of statistical and deep learning post-processing methods to forecast hourly wind gusts up to three days ahead. To ensure statistical robustness, we constrain our probabilistic forecasts using generalised extreme-value distributions across five regions in Switzerland. Using a convolutional neural network to post-process the predicted atmospheric environment's spatial patterns yields the best results, outperforming direct forecasting approaches across lead times and wind gust speeds. Our results confirm the added value of NWMs for extreme wind forecasting, especially for designing more responsive early-warning systems.

cross Contextualize-then-Aggregate: Circuits for In-Context Learning in Gemma-2 2B

Authors: Aleksandra Bakalova, Yana Veitsman, Xinting Huang, Michael Hahn

Abstract: In-Context Learning (ICL) is an intriguing ability of large language models (LLMs). Despite a substantial amount of work on its behavioral aspects and how it emerges in miniature setups, it remains unclear which mechanism assembles task information from the individual examples in a fewshot prompt. We use causal interventions to identify information flow in Gemma-2 2B for five naturalistic ICL tasks. We find that the model infers task information using a two-step strategy we call contextualize-then-aggregate: In the lower layers, the model builds up representations of individual fewshot examples, which are contextualized by preceding examples through connections between fewshot input and output tokens across the sequence. In the higher layers, these representations are aggregated to identify the task and prepare prediction of the next output. The importance of the contextualization step differs between tasks, and it may become more important in the presence of ambiguous examples. Overall, by providing rigorous causal analysis, our results shed light on the mechanisms through which ICL happens in language models.

cross Data-driven Power Loss Identification through Physics-Based Thermal Model Backpropagation

Authors: Mattia Scarpa, Francesco Pase, Ruggero Carli, Mattia Bruschetta, Franscesco Toso

Abstract: Digital twins for power electronics require accurate power losses whose direct measurements are often impractical or impossible in real-world applications. This paper presents a novel hybrid framework that combines physics-based thermal modeling with data-driven techniques to identify and correct power losses accurately using only temperature measurements. Our approach leverages a cascaded architecture where a neural network learns to correct the outputs of a nominal power loss model by backpropagating through a reduced-order thermal model. We explore two neural architectures, a bootstrapped feedforward network, and a recurrent neural network, demonstrating that the bootstrapped feedforward approach achieves superior performance while maintaining computational efficiency for real-time applications. Between the interconnection, we included normalization strategies and physics-guided training loss functions to preserve stability and ensure physical consistency. Experimental results show that our hybrid model reduces both temperature estimation errors (from 7.2+-6.8{\deg}C to 0.3+-0.3{\deg}C) and power loss prediction errors (from 5.4+-6.6W to 0.2+-0.3W) compared to traditional physics-based approaches, even in the presence of thermal model uncertainties. This methodology allows us to accurately estimate power losses without direct measurements, making it particularly helpful for real-time industrial applications where sensor placement is hindered by cost and physical limitations.

cross Towards Precise Action Spotting: Addressing Temporal Misalignment in Labels with Dynamic Label Assignment

Authors: Masato Tamura

Abstract: Precise action spotting has attracted considerable attention due to its promising applications. While existing methods achieve substantial performance by employing well-designed model architecture, they overlook a significant challenge: the temporal misalignment inherent in ground-truth labels. This misalignment arises when frames labeled as containing events do not align accurately with the actual event times, often as a result of human annotation errors or the inherent difficulties in precisely identifying event boundaries across neighboring frames. To tackle this issue, we propose a novel dynamic label assignment strategy that allows predictions to have temporal offsets from ground-truth action times during training, ensuring consistent event spotting. Our method extends the concept of minimum-cost matching, which is utilized in the spatial domain for object detection, to the temporal domain. By calculating matching costs based on predicted action class scores and temporal offsets, our method dynamically assigns labels to the most likely predictions, even when the predicted times of these predictions deviate from ground-truth times, alleviating the negative effects of temporal misalignment in labels. We conduct extensive experiments and demonstrate that our method achieves state-of-the-art performance, particularly in conditions where events are visually distinct and temporal misalignment in labels is common.

cross Nuclear Microreactor Control with Deep Reinforcement Learning

Authors: Leo Tunkle, Kamal Abdulraheem, Linyu Lin, Majdi I. Radaideh

Abstract: The economic feasibility of nuclear microreactors will depend on minimizing operating costs through advancements in autonomous control, especially when these microreactors are operating alongside other types of energy systems (e.g., renewable energy). This study explores the application of deep reinforcement learning (RL) for real-time drum control in microreactors, exploring performance in regard to load-following scenarios. By leveraging a point kinetics model with thermal and xenon feedback, we first establish a baseline using a single-output RL agent, then compare it against a traditional proportional-integral-derivative (PID) controller. This study demonstrates that RL controllers, including both single- and multi-agent RL (MARL) frameworks, can achieve similar or even superior load-following performance as traditional PID control across a range of load-following scenarios. In short transients, the RL agent was able to reduce the tracking error rate in comparison to PID. Over extended 300-minute load-following scenarios in which xenon feedback becomes a dominant factor, PID maintained better accuracy, but RL still remained within a 1% error margin despite being trained only on short-duration scenarios. This highlights RL's strong ability to generalize and extrapolate to longer, more complex transients, affording substantial reductions in training costs and reduced overfitting. Furthermore, when control was extended to multiple drums, MARL enabled independent drum control as well as maintained reactor symmetry constraints without sacrificing performance -- an objective that standard single-agent RL could not learn. We also found that, as increasing levels of Gaussian noise were added to the power measurements, the RL controllers were able to maintain lower error rates than PID, and to do so with less control effort.

cross Backdoor Detection through Replicated Execution of Outsourced Training

Authors: Hengrui Jia, Sierra Wyllie, Akram Bin Sediq, Ahmed Ibrahim, Nicolas Papernot

Abstract: It is common practice to outsource the training of machine learning models to cloud providers. Clients who do so gain from the cloud's economies of scale, but implicitly assume trust: the server should not deviate from the client's training procedure. A malicious server may, for instance, seek to insert backdoors in the model. Detecting a backdoored model without prior knowledge of both the backdoor attack and its accompanying trigger remains a challenging problem. In this paper, we show that a client with access to multiple cloud providers can replicate a subset of training steps across multiple servers to detect deviation from the training procedure in a similar manner to differential testing. Assuming some cloud-provided servers are benign, we identify malicious servers by the substantial difference between model updates required for backdooring and those resulting from clean training. Perhaps the strongest advantage of our approach is its suitability to clients that have limited-to-no local compute capability to perform training; we leverage the existence of multiple cloud providers to identify malicious updates without expensive human labeling or heavy computation. We demonstrate the capabilities of our approach on an outsourced supervised learning task where $50\%$ of the cloud providers insert their own backdoor; our approach is able to correctly identify $99.6\%$ of them. In essence, our approach is successful because it replaces the signature-based paradigm taken by existing approaches with an anomaly-based detection paradigm. Furthermore, our approach is robust to several attacks from adaptive adversaries utilizing knowledge of our detection scheme.

cross Self-Evolving Visual Concept Library using Vision-Language Critics

Authors: Atharva Sehgal, Patrick Yuan, Ziniu Hu, Yisong Yue, Jennifer J. Sun, Swarat Chaudhuri

Abstract: We study the problem of building a visual concept library for visual recognition. Building effective visual concept libraries is challenging, as manual definition is labor-intensive, while relying solely on LLMs for concept generation can result in concepts that lack discriminative power or fail to account for the complex interactions between them. Our approach, ESCHER, takes a library learning perspective to iteratively discover and improve visual concepts. ESCHER uses a vision-language model (VLM) as a critic to iteratively refine the concept library, including accounting for interactions between concepts and how they affect downstream classifiers. By leveraging the in-context learning abilities of LLMs and the history of performance using various concepts, ESCHER dynamically improves its concept generation strategy based on the VLM critic's feedback. Finally, ESCHER does not require any human annotations, and is thus an automated plug-and-play framework. We empirically demonstrate the ability of ESCHER to learn a concept library for zero-shot, few-shot, and fine-tuning visual classification tasks. This work represents, to our knowledge, the first application of concept library learning to real-world visual tasks.

cross Insight-RAG: Enhancing LLMs with Insight-Driven Augmentation

Authors: Pouya Pezeshkpour, Estevam Hruschka

Abstract: Retrieval Augmented Generation (RAG) frameworks have shown significant promise in leveraging external knowledge to enhance the performance of large language models (LLMs). However, conventional RAG methods often retrieve documents based solely on surface-level relevance, leading to many issues: they may overlook deeply buried information within individual documents, miss relevant insights spanning multiple sources, and are not well-suited for tasks beyond traditional question answering. In this paper, we propose Insight-RAG, a novel framework designed to address these issues. In the initial stage of Insight-RAG, instead of using traditional retrieval methods, we employ an LLM to analyze the input query and task, extracting the underlying informational requirements. In the subsequent stage, a specialized LLM -- trained on the document database -- is queried to mine content that directly addresses these identified insights. Finally, by integrating the original query with the retrieved insights, similar to conventional RAG approaches, we employ a final LLM to generate a contextually enriched and accurate response. Using two scientific paper datasets, we created evaluation benchmarks targeting each of the mentioned issues and assessed Insight-RAG against traditional RAG pipeline. Our results demonstrate that the Insight-RAG pipeline successfully addresses these challenges, outperforming existing methods by a significant margin in most cases. These findings suggest that integrating insight-driven retrieval within the RAG framework not only enhances performance but also broadens the applicability of RAG to tasks beyond conventional question answering.

cross Detecting Glioma, Meningioma, and Pituitary Tumors, and Normal Brain Tissues based on Yolov11 and Yolov8 Deep Learning Models

Authors: Ahmed M. Taha, Salah A. Aly, Mohamed F. Darwish

Abstract: Accurate and quick diagnosis of normal brain tissue Glioma, Meningioma, and Pituitary Tumors is crucial for optimal treatment planning and improved medical results. Magnetic Resonance Imaging (MRI) is widely used as a non-invasive diagnostic tool for detecting brain abnormalities, including tumors. However, manual interpretation of MRI scans is often time-consuming, prone to human error, and dependent on highly specialized expertise. This paper proposes an advanced AI-driven technique to detecting glioma, meningioma, and pituitary brain tumors using YoloV11 and YoloV8 deep learning models. Methods: Using a transfer learning-based fine-tuning approach, we integrate cutting-edge deep learning techniques with medical imaging to classify brain tumors into four categories: No-Tumor, Glioma, Meningioma, and Pituitary Tumors. Results: The study utilizes the publicly accessible CE-MRI Figshare dataset and involves fine-tuning pre-trained models YoloV8 and YoloV11 of 99.49% and 99.56% accuracies; and customized CNN accuracy of 96.98%. The results validate the potential of CNNs in achieving high precision in brain tumor detection and classification, highlighting their transformative role in medical imaging and diagnostics.

cross $\textit{Agents Under Siege}$: Breaking Pragmatic Multi-Agent LLM Systems with Optimized Prompt Attacks

Authors: Rana Muhammad Shahroz Khan, Zhen Tan, Sukwon Yun, Charles Flemming, Tianlong Chen

Abstract: Most discussions about Large Language Model (LLM) safety have focused on single-agent settings but multi-agent LLM systems now create novel adversarial risks because their behavior depends on communication between agents and decentralized reasoning. In this work, we innovatively focus on attacking pragmatic systems that have constrains such as limited token bandwidth, latency between message delivery, and defense mechanisms. We design a $\textit{permutation-invariant adversarial attack}$ that optimizes prompt distribution across latency and bandwidth-constraint network topologies to bypass distributed safety mechanisms within the system. Formulating the attack path as a problem of $\textit{maximum-flow minimum-cost}$, coupled with the novel $\textit{Permutation-Invariant Evasion Loss (PIEL)}$, we leverage graph-based optimization to maximize attack success rate while minimizing detection risk. Evaluating across models including $\texttt{Llama}$, $\texttt{Mistral}$, $\texttt{Gemma}$, $\texttt{DeepSeek}$ and other variants on various datasets like $\texttt{JailBreakBench}$ and $\texttt{AdversarialBench}$, our method outperforms conventional attacks by up to $7\times$, exposing critical vulnerabilities in multi-agent systems. Moreover, we demonstrate that existing defenses, including variants of $\texttt{Llama-Guard}$ and $\texttt{PromptGuard}$, fail to prohibit our attack, emphasizing the urgent need for multi-agent specific safety mechanisms.

cross Plane-Wave Decomposition and Randomised Training; a Novel Path to Generalised PINNs for SHM

Authors: Rory Clements, James Ellis, Geoff Hassall, Simon Horsley

Abstract: In this paper, we introduce a formulation of Physics-Informed Neural Networks (PINNs), based on learning the form of the Fourier decomposition, and a training methodology based on a spread of randomly chosen boundary conditions. By training in this way we produce a PINN that generalises; after training it can be used to correctly predict the solution for an arbitrary set of boundary conditions and interpolate this solution between the samples that spanned the training domain. We demonstrate for a toy system of two coupled oscillators that this gives the PINN formulation genuine predictive capability owing to an effective reduction of the training to evaluation times ratio due to this decoupling of the solution from specific boundary conditions.

cross Rack Position Optimization in Large-Scale Heterogeneous Data Centers

Authors: Chang-Lin Chen, Jiayu Chen, Tian Lan, Zhaoxia Zhao, Hongbo Dong, Vaneet Aggarwal

Abstract: As rapidly growing AI computational demands accelerate the need for new hardware installation and maintenance, this work explores optimal data center resource management by balancing operational efficiency with fault tolerance through strategic rack positioning considering diverse resources and locations. Traditional mixed-integer programming (MIP) approaches often struggle with scalability, while heuristic methods may result in significant sub-optimality. To address these issues, this paper presents a novel two-tier optimization framework using a high-level deep reinforcement learning (DRL) model to guide a low-level gradient-based heuristic for local search. The high-level DRL agent employs Leader Reward for optimal rack type ordering, and the low-level heuristic efficiently maps racks to positions, minimizing movement counts and ensuring fault-tolerant resource distribution. This approach allows scalability to over 100,000 positions and 100 rack types. Our method outperformed the gradient-based heuristic by 7\% on average and the MIP solver by over 30\% in objective value. It achieved a 100\% success rate versus MIP's 97.5\% (within a 20-minute limit), completing in just 2 minutes compared to MIP's 1630 minutes (i.e., almost 4 orders of magnitude improvement). Unlike the MIP solver, which showed performance variability under time constraints and high penalties, our algorithm consistently delivered stable, efficient results - an essential feature for large-scale data center management.

cross Aligning Diffusion Model with Problem Constraints for Trajectory Optimization

Authors: Anjian Li, Ryne Beeson

Abstract: Diffusion models have recently emerged as effective generative frameworks for trajectory optimization, capable of producing high-quality and diverse solutions. However, training these models in a purely data-driven manner without explicit incorporation of constraint information often leads to violations of critical constraints, such as goal-reaching, collision avoidance, and adherence to system dynamics. To address this limitation, we propose a novel approach that aligns diffusion models explicitly with problem-specific constraints, drawing insights from the Dynamic Data-driven Application Systems (DDDAS) framework. Our approach introduces a hybrid loss function that explicitly measures and penalizes constraint violations during training. Furthermore, by statistically analyzing how constraint violations evolve throughout the diffusion steps, we develop a re-weighting strategy that aligns predicted violations to ground truth statistics at each diffusion step. Evaluated on a tabletop manipulation and a two-car reach-avoid problem, our constraint-aligned diffusion model significantly reduces constraint violations compared to traditional diffusion models, while maintaining the quality of trajectory solutions. This approach is well-suited for integration into the DDDAS framework for efficient online trajectory adaptation as new environmental data becomes available.

cross Using machine learning method for variable star classification using the TESS Sectors 1-57 data

Authors: Li-Heng Wang, Kai Li, Xiang Gao, Ya-Ni Guo, Guo-You Sun

Abstract: The Transiting Exoplanet Survey Satellite (TESS) is a wide-field all-sky survey mission designed to detect Earth-sized exoplanets. After over four years photometric surveys, data from sectors 1-57, including approximately 1,050,000 light curves with a 2-minute cadence, were collected. By cross-matching the data with Gaia's variable star catalogue, we obtained labeled datasets for further analysis. Using a random forest classifier, we performed classification of variable stars and designed distinct classification processes for each subclass, 6770 EA, 2971 EW, 980 CEP, 8347 DSCT, 457 RRab, 404 RRc and 12348 ROT were identified. Each variable star was visually inspected to ensure the reliability and accuracy of the compiled catalog. Subsequently, we ultimately obtained 6046 EA, 3859 EW, 2058 CEP, 8434 DSCT, 482 RRab, 416 RRc, and 9694 ROT, and a total of 14092 new variable stars were discovered.

cross CopyQNN: Quantum Neural Network Extraction Attack under Varying Quantum Noise

Authors: Zhenxiao Fu, Leyi Zhao, Xuhong Zhang, Yilun Xu, Gang Huang, Fan Chen

Abstract: Quantum Neural Networks (QNNs) have shown significant value across domains, with well-trained QNNs representing critical intellectual property often deployed via cloud-based QNN-as-a-Service (QNNaaS) platforms. Recent work has examined QNN model extraction attacks using classical and emerging quantum strategies. These attacks involve adversaries querying QNNaaS platforms to obtain labeled data for training local substitute QNNs that replicate the functionality of cloud-based models. However, existing approaches have largely overlooked the impact of varying quantum noise inherent in noisy intermediate-scale quantum (NISQ) computers, limiting their effectiveness in real-world settings. To address this limitation, we propose the CopyQNN framework, which employs a three-step data cleaning method to eliminate noisy data based on its noise sensitivity. This is followed by the integration of contrastive and transfer learning within the quantum domain, enabling efficient training of substitute QNNs using a limited but cleaned set of queried data. Experimental results on NISQ computers demonstrate that a practical implementation of CopyQNN significantly outperforms state-of-the-art QNN extraction attacks, achieving an average performance improvement of 8.73% across all tasks while reducing the number of required queries by 90x, with only a modest increase in hardware overhead.

cross Spatiotemporal Attention Learning Framework for Event-Driven Object Recognition

Authors: Tiantian Xie, Pengpai Wang, Rosa H. M. Chan

Abstract: Event-based vision sensors, inspired by biological neural systems, asynchronously capture local pixel-level intensity changes as a sparse event stream containing position, polarity, and timestamp information. These neuromorphic sensors offer significant advantages in dynamic range, latency, and power efficiency. Their working principle inherently addresses traditional camera limitations such as motion blur and redundant background information, making them particularly suitable for dynamic vision tasks. While recent works have proposed increasingly complex event-based architectures, the computational overhead and parameter complexity of these approaches limit their practical deployment. This paper presents a novel spatiotemporal learning framework for event-based object recognition, utilizing a VGG network enhanced with Convolutional Block Attention Module (CBAM). Our approach achieves comparable performance to state-of-the-art ResNet-based methods while reducing parameter count by 2.3% compared to the original VGG model. Specifically, it outperforms ResNet-based methods like MVF-Net, achieving the highest Top-1 accuracy of 76.4% (pretrained) and 71.3% (not pretrained) on CIFAR10-DVS, and 72.4% (not pretrained) on N-Caltech101. These results highlight the robustness of our method when pretrained weights are not used, making it suitable for scenarios where transfer learning is unavailable. Moreover, our approach reduces reliance on data augmentation. Experimental results on standard event-based datasets demonstrate the framework's efficiency and effectiveness for real-world applications.

cross Using complex prompts to identify fine-grained biases in image generation through ChatGPT-4o

Authors: Marinus Ferreira

Abstract: There are not one but two dimensions of bias that can be revealed through the study of large AI models: not only bias in training data or the products of an AI, but also bias in society, such as disparity in employment or health outcomes between different demographic groups. Often training data and AI output is biased for or against certain demographics (i.e. older white people are overrepresented in image datasets), but sometimes large AI models accurately illustrate biases in the real world (i.e. young black men being disproportionately viewed as threatening). These social disparities often appear in image generation AI outputs in the form of 'marked' features, where some feature of an individual or setting is a social marker of disparity, and prompts both humans and AI systems to treat subjects that are marked in this way as exceptional and requiring special treatment. Generative AI has proven to be very sensitive to such marked features, to the extent of over-emphasising them and thus often exacerbating social biases. I briefly discuss how we can use complex prompts to image generation AI to investigate either dimension of bias, emphasising how we can probe the large language models underlying image generation AI through, for example, automated sentiment analysis of the text prompts used to generate images.

cross Flow Matching on Lie Groups

Authors: Finn M. Sherry, Bart M. N. Smets

Abstract: Flow Matching (FM) is a recent generative modelling technique: we aim to learn how to sample from distribution $\mathfrak{X}_1$ by flowing samples from some distribution $\mathfrak{X}_0$ that is easy to sample from. The key trick is that this flow field can be trained while conditioning on the end point in $\mathfrak{X}_1$: given an end point, simply move along a straight line segment to the end point (Lipman et al. 2022). However, straight line segments are only well-defined on Euclidean space. Consequently, Chen and Lipman (2023) generalised the method to FM on Riemannian manifolds, replacing line segments with geodesics or their spectral approximations. We take an alternative point of view: we generalise to FM on Lie groups by instead substituting exponential curves for line segments. This leads to a simple, intrinsic, and fast implementation for many matrix Lie groups, since the required Lie group operations (products, inverses, exponentials, logarithms) are simply given by the corresponding matrix operations. FM on Lie groups could then be used for generative modelling with data consisting of sets of features (in $\mathbb{R}^n$) and poses (in some Lie group), e.g. the latent codes of Equivariant Neural Fields (Wessels et al. 2025).

cross Recitation over Reasoning: How Cutting-Edge Language Models Can Fail on Elementary School-Level Reasoning Problems?

Authors: Kai Yan, Yufei Xu, Zhengyin Du, Xuesong Yao, Zheyu Wang, Xiaowen Guo, Jiecao Chen

Abstract: The rapid escalation from elementary school-level to frontier problems of the difficulty for LLM benchmarks in recent years have weaved a miracle for researchers that we are only inches away from surpassing human intelligence. However, is the LLMs' remarkable reasoning ability indeed comes from true intelligence by human standards, or are they simply reciting solutions witnessed during training at an Internet level? To study this problem, we propose RoR-Bench, a novel, multi-modal benchmark for detecting LLM's recitation behavior when asked simple reasoning problems but with conditions subtly shifted, and conduct empirical analysis on our benchmark. Surprisingly, we found existing cutting-edge LLMs unanimously exhibits extremely severe recitation behavior; by changing one phrase in the condition, top models such as OpenAI-o1 and DeepSeek-R1 can suffer $60\%$ performance loss on elementary school-level arithmetic and reasoning problems. Such findings are a wake-up call to the LLM community that compels us to re-evaluate the true intelligence level of cutting-edge LLMs.

cross SCRec: A Scalable Computational Storage System with Statistical Sharding and Tensor-train Decomposition for Recommendation Models

Authors: Jinho Yang, Ji-Hoon Kim, Joo-Young Kim

Abstract: Deep Learning Recommendation Models (DLRMs) play a crucial role in delivering personalized content across web applications such as social networking and video streaming. However, with improvements in performance, the parameter size of DLRMs has grown to terabyte (TB) scales, accompanied by memory bandwidth demands exceeding TB/s levels. Furthermore, the workload intensity within the model varies based on the target mechanism, making it difficult to build an optimized recommendation system. In this paper, we propose SCRec, a scalable computational storage recommendation system that can handle TB-scale industrial DLRMs while guaranteeing high bandwidth requirements. SCRec utilizes a software framework that features a mixed-integer programming (MIP)-based cost model, efficiently fetching data based on data access patterns and adaptively configuring memory-centric and compute-centric cores. Additionally, SCRec integrates hardware acceleration cores to enhance DLRM computations, particularly allowing for the high-performance reconstruction of approximated embedding vectors from extremely compressed tensor-train (TT) format. By combining its software framework and hardware accelerators, while eliminating data communication overhead by being implemented on a single server, SCRec achieves substantial improvements in DLRM inference performance. It delivers up to 55.77$\times$ speedup compared to a CPU-DRAM system with no loss in accuracy and up to 13.35$\times$ energy efficiency gains over a multi-GPU system.

cross MARIOH: Multiplicity-Aware Hypergraph Reconstruction

Authors: Kyuhan Lee, Geon Lee, Kijung Shin

Abstract: Hypergraphs offer a powerful framework for modeling higher-order interactions that traditional pairwise graphs cannot fully capture. However, practical constraints often lead to their simplification into projected graphs, resulting in substantial information loss and ambiguity in representing higher-order relationships. In this work, we propose MARIOH, a supervised approach for reconstructing the original hypergraph from its projected graph by leveraging edge multiplicity. To overcome the difficulties posed by the large search space, MARIOH integrates several key ideas: (a) identifying provable size-2 hyperedges, which reduces the candidate search space, (b) predicting the likelihood of candidates being hyperedges by utilizing both structural and multiplicity-related features, and (c) not only targeting promising hyperedge candidates but also examining less confident ones to explore alternative possibilities. Together, these ideas enable MARIOH to efficiently and effectively explore the search space. In our experiments using 10 real-world datasets, MARIOH achieves up to 74.51% higher reconstruction accuracy compared to state-of-the-art methods.

cross Efficient LLaMA-3.2-Vision by Trimming Cross-attended Visual Features

Authors: Jewon Lee, Ki-Ung Song, Seungmin Yang, Donguk Lim, Jaeyeon Kim, Wooksu Shin, Bo-Kyeong Kim, Yong Jae Lee, Tae-Ho Kim

Abstract: Visual token reduction lowers inference costs caused by extensive image features in large vision-language models (LVLMs). Unlike relevant studies that prune tokens in self-attention-only LVLMs, our work uniquely addresses cross-attention-based models, which achieve superior performance. We identify that the key-value (KV) cache size for image tokens in cross-attention layers significantly exceeds that of text tokens in self-attention layers, posing a major compute bottleneck. To mitigate this issue, we exploit the sparse nature in cross-attention maps to selectively prune redundant visual features. Our Trimmed Llama effectively reduces KV cache demands without requiring additional training. By benefiting from 50%-reduced visual features, our model can reduce inference latency and memory usage while achieving benchmark parity.

cross Efficient Annotator Reliablity Assessment with EffiARA

Authors: Owen Cook, Jake Vasilakes, Ian Roberts, Xingyi Song

Abstract: Data annotation is an essential component of the machine learning pipeline; it is also a costly and time-consuming process. With the introduction of transformer-based models, annotation at the document level is increasingly popular; however, there is no standard framework for structuring such tasks. The EffiARA annotation framework is, to our knowledge, the first project to support the whole annotation pipeline, from understanding the resources required for an annotation task to compiling the annotated dataset and gaining insights into the reliability of individual annotators as well as the dataset as a whole. The framework's efficacy is supported by two previous studies: one improving classification performance through annotator-reliability-based soft label aggregation and sample weighting, and the other increasing the overall agreement among annotators through removing identifying and replacing an unreliable annotator. This work introduces the EffiARA Python package and its accompanying webtool, which provides an accessible graphical user interface for the system. We open-source the EffiARA Python package at https://github.com/MiniEggz/EffiARA and the webtool is publicly accessible at https://effiara.gate.ac.uk.

URLs: https://github.com/MiniEggz/EffiARA, https://effiara.gate.ac.uk.

cross Near Field Localization via AI-Aided Subspace Methods

Authors: Arad Gast, Luc Le Magoarou, Nir Shlezinger

Abstract: The increasing demands for high-throughput and energy-efficient wireless communications are driving the adoption of extremely large antennas operating at high-frequency bands. In these regimes, multiple users will reside in the radiative near-field, and accurate localization becomes essential. Unlike conventional far-field systems that rely solely on DOA estimation, near-field localization exploits spherical wavefront propagation to recover both DOA and range information. While subspace-based methods, such as MUSIC and its extensions, offer high resolution and interpretability for near-field localization, their performance is significantly impacted by model assumptions, including non-coherent sources, well-calibrated arrays, and a sufficient number of snapshots. To address these limitations, this work proposes AI-aided subspace methods for near-field localization that enhance robustness to real-world challenges. Specifically, we introduce NF-SubspaceNet, a deep learning-augmented 2D MUSIC algorithm that learns a surrogate covariance matrix to improve localization under challenging conditions, and DCD-MUSIC, a cascaded AI-aided approach that decouples angle and range estimation to reduce computational complexity. We further develop a novel model-order-aware training method to accurately estimate the number of sources, that is combined with casting of near field subspace methods as AI models for learning. Extensive simulations demonstrate that the proposed methods outperform classical and existing deep-learning-based localization techniques, providing robust near-field localization even under coherent sources, miscalibrations, and few snapshots.

cross Bi-Grid Reconstruction for Image Anomaly Detection

Authors: Huichuan Huang, Zhiqing Zhong, Guangyu Wei, Yonghao Wan, Wenlong Sun, Aimin Feng

Abstract: In image anomaly detection, significant advancements have been made using un- and self-supervised methods with datasets containing only normal samples. However, these approaches often struggle with fine-grained anomalies. This paper introduces \textbf{GRAD}: Bi-\textbf{G}rid \textbf{R}econstruction for Image \textbf{A}nomaly \textbf{D}etection, which employs two continuous grids to enhance anomaly detection from both normal and abnormal perspectives. In this work: 1) Grids as feature repositories that improve generalization and mitigate the Identical Shortcut (IS) issue; 2) An abnormal feature grid that refines normal feature boundaries, boosting detection of fine-grained defects; 3) The Feature Block Paste (FBP) module, which synthesizes various anomalies at the feature level for quick abnormal grid deployment. GRAD's robust representation capabilities also allow it to handle multiple classes with a single model. Evaluations on datasets like MVTecAD, VisA, and GoodsAD show significant performance improvements in fine-grained anomaly detection. GRAD excels in overall accuracy and in discerning subtle differences, demonstrating its superiority over existing methods.

cross Deep Learning Model Predictive Control for Deep Brain Stimulation in Parkinson's Disease

Authors: Sebastian Steffen, Mark Cannon

Abstract: We present a nonlinear data-driven Model Predictive Control (MPC) algorithm for deep brain stimulation (DBS) for the treatment of Parkinson's disease (PD). Although DBS is typically implemented in open-loop, closed-loop DBS (CLDBS) uses the amplitude of neural oscillations in specific frequency bands (e.g. beta 13-30 Hz) as a feedback signal, resulting in improved treatment outcomes with reduced side effects and slower rates of patient habituation to stimulation. To date, CLDBS has only been implemented in vivo with simple control algorithms, such as proportional or proportional-integral control. Our approach employs a multi-step predictor based on differences of input-convex neural networks to model the future evolution of beta oscillations. The use of a multi-step predictor enhances prediction accuracy over the optimization horizon and simplifies online computation. In tests using a simulated model of beta-band activity response and data from PD patients, we achieve reductions of more than 20% in both tracking error and control activity in comparison with existing CLDBS algorithms. The proposed control strategy provides a generalizable data-driven technique that can be applied to the treatment of PD and other diseases targeted by CLDBS, as well as to other neuromodulation techniques.

cross On Benchmarking Code LLMs for Android Malware Analysis

Authors: Yiling He, Hongyu She, Xingzhi Qian, Xinran Zheng, Zhuo Chen, Zhan Qin, Lorenzo Cavallaro

Abstract: Large Language Models (LLMs) have demonstrated strong capabilities in various code intelligence tasks. However, their effectiveness for Android malware analysis remains underexplored. Decompiled Android code poses unique challenges for analysis, primarily due to its large volume of functions and the frequent absence of meaningful function names. This paper presents Cama, a benchmarking framework designed to systematically evaluate the effectiveness of Code LLMs in Android malware analysis tasks. Cama specifies structured model outputs (comprising function summaries, refined function names, and maliciousness scores) to support key malware analysis tasks, including malicious function identification and malware purpose summarization. Built on these, it integrates three domain-specific evaluation metrics, consistency, fidelity, and semantic relevance, enabling rigorous stability and effectiveness assessment and cross-model comparison. We construct a benchmark dataset consisting of 118 Android malware samples, encompassing over 7.5 million distinct functions, and use Cama to evaluate four popular open-source models. Our experiments provide insights into how Code LLMs interpret decompiled code and quantify the sensitivity to function renaming, highlighting both the potential and current limitations of Code LLMs in malware analysis tasks.

cross Command A: An Enterprise-Ready Large Language Model

Authors: Team Cohere, Aakanksha, Arash Ahmadian, Marwan Ahmed, Jay Alammar, Yazeed Alnumay, Sophia Althammer, Arkady Arkhangorodsky, Viraat Aryabumi, Dennis Aumiller, Rapha\"el Avalos, Zahara Aviv, Sammie Bae, Saurabh Baji, Alexandre Barbet, Max Bartolo, Bj\"orn Bebensee, Neeral Beladia, Walter Beller-Morales, Alexandre B\'erard, Andrew Berneshawi, Anna Bialas, Phil Blunsom, Matt Bobkin, Adi Bongale, Sam Braun, Maxime Brunet, Samuel Cahyawijaya, David Cairuz, Jon Ander Campos, Cassie Cao, Kris Cao, Roman Castagn\'e, Juli\'an Cendrero, Leila Chan Currie, Yash Chandak, Diane Chang, Giannis Chatziveroglou, Hongyu Chen, Claire Cheng, Alexis Chevalier, Justin T. Chiu, Eugene Cho, Eugene Choi, Eujeong Choi, Tim Chung, Volkan Cirik, Ana Cismaru, Pierre Clavier, Henry Conklin, Lucas Crawhall-Stein, Devon Crouse, Andres Felipe Cruz-Salinas, Ben Cyrus, Daniel D'souza, Hugo Dalla-Torre, John Dang, William Darling, Omar Darwiche Domingues, Saurabh Dash, Antoine Debugne, Th\'eo Dehaze, Shaan Desai, Joan Devassy, Rishit Dholakia, Kyle Duffy, Ali Edalati, Ace Eldeib, Abdullah Elkady, Sarah Elsharkawy, Irem Erg\"un, Beyza Ermis, Marzieh Fadaee, Boyu Fan, Lucas Fayoux, Yannis Flet-Berliac, Nick Frosst, Matthias Gall\'e, Wojciech Galuba, Utsav Garg, Matthieu Geist, Mohammad Gheshlaghi Azar, Seraphina Goldfarb-Tarrant, Tomas Goldsack, Aidan Gomez, Victor Machado Gonzaga, Nithya Govindarajan, Manoj Govindassamy, Nathan Grinsztajn, Nikolas Gritsch, Patrick Gu, Shangmin Guo, Kilian Haefeli, Rod Hajjar, Tim Hawes, Jingyi He, Sebastian Hofst\"atter, Sungjin Hong, Sara Hooker, Tom Hosking, Stephanie Howe, Eric Hu, Renjie Huang, Hemant Jain, Ritika Jain, Nick Jakobi, Madeline Jenkins, JJ Jordan, Dhruti Joshi, Jason Jung, Trushant Kalyanpur, Siddhartha Rao Kamalakara, Julia Kedrzycki, Gokce Keskin, Edward Kim, Joon Kim, Wei-Yin Ko, Tom Kocmi, Michael Kozakov, Wojciech Kry\'sci\'nski, Arnav Kumar Jain, Komal Kumar Teru, Sander Land, Michael Lasby, Olivia Lasche, Justin Lee, Patrick Lewis, Jeffrey Li, Jonathan Li, Hangyu Lin, Acyr Locatelli, Kevin Luong, Raymond Ma, Lukas Mach, Marina Machado, Joanne Magbitang, Brenda Malacara Lopez, Aryan Mann, Kelly Marchisio, Olivia Markham, Alexandre Matton, Alex McKinney, Dominic McLoughlin, Jozef Mokry, Adrien Morisot, Autumn Moulder, Harry Moynehan, Maximilian Mozes, Vivek Muppalla, Lidiya Murakhovska, Hemangani Nagarajan, Alekhya Nandula, Hisham Nasir, Shauna Nehra, Josh Netto-Rosen, Daniel Ohashi, James Owers-Bardsley, Jason Ozuzu, Dennis Padilla, Gloria Park, Sam Passaglia, Jeremy Pekmez, Laura Penstone, Aleksandra Piktus, Case Ploeg, Andrew Poulton, Youran Qi, Shubha Raghvendra, Miguel Ramos, Ekagra Ranjan, Pierre Richemond, C\'ecile Robert-Michon, Aur\'elien Rodriguez, Sudip Roy, Laura Ruis, Louise Rust, Anubhav Sachan, Alejandro Salamanca, Kailash Karthik Saravanakumar, Isha Satyakam, Alice Schoenauer Sebag, Priyanka Sen, Sholeh Sepehri, Preethi Seshadri, Ye Shen, Tom Sherborne, Sylvie Chang Shi, Sanal Shivaprasad, Vladyslav Shmyhlo, Anirudh Shrinivason, Inna Shteinbuk, Amir Shukayev, Mathieu Simard, Ella Snyder, Ava Spataru, Victoria Spooner, Trisha Starostina, Florian Strub, Yixuan Su, Jimin Sun, Dwarak Talupuru, Eugene Tarassov, Elena Tommasone, Jennifer Tracey, Billy Trend, Evren Tumer, Ahmet \"Ust\"un, Bharat Venkitesh, David Venuto, Pat Verga, Maxime Voisin, Alex Wang, Donglu Wang, Shijian Wang, Edmond Wen, Naomi White, Jesse Willman, Marysia Winkels, Chen Xia, Jessica Xie, Minjie Xu, Bowen Yang, Tan Yi-Chern, Ivan Zhang, Zhenyu Zhao, Zhoujie Zhao

Abstract: In this report we describe the development of Command A, a powerful large language model purpose-built to excel at real-world enterprise use cases. Command A is an agent-optimised and multilingual-capable model, with support for 23 languages of global business, and a novel hybrid architecture balancing efficiency with top of the range performance. It offers best-in-class Retrieval Augmented Generation (RAG) capabilities with grounding and tool use to automate sophisticated business processes. These abilities are achieved through a decentralised training approach, including self-refinement algorithms and model merging techniques. We also include results for Command R7B which shares capability and architectural similarities to Command A. Weights for both models have been released for research purposes. This technical report details our original training pipeline and presents an extensive evaluation of our models across a suite of enterprise-relevant tasks and public benchmarks, demonstrating excellent performance and efficiency.

cross Energy Weighted Learning Progress Guided Interleaved Multi-Task Learning

Authors: Hanne Say (Graduate School of Science and Engineering, Ozyegin University, Istanbul, Turkey), Suzan Ece Ada (Department of Computer Engineering, Bogazici University, Istanbul, Turkey), Emre Ugur (Department of Computer Engineering, Bogazici University, Istanbul, Turkey), Erhan Oztop (Graduate School of Science and Engineering, Ozyegin University, Istanbul, Turkey, OTRI, SISREC, Osaka University, Osaka, Japan)

Abstract: Humans can continuously acquire new skills and knowledge by exploiting existing ones for improved learning, without forgetting them. Similarly, 'continual learning' in machine learning aims to learn new information while preserving the previously acquired knowledge. Existing research often overlooks the nature of human learning, where tasks are interleaved due to human choice or environmental constraints. So, almost never do humans master one task before switching to the next. To investigate to what extent human-like learning can benefit the learner, we propose a method that interleaves tasks based on their 'learning progress' and energy consumption. From a machine learning perspective, our approach can be seen as a multi-task learning system that balances learning performance with energy constraints while mimicking ecologically realistic human task learning. To assess the validity of our approach, we consider a robot learning setting in simulation, where the robot learns the effect of its actions in different contexts. The conducted experiments show that our proposed method achieves better performance than sequential task learning and reduces energy consumption for learning the tasks.

cross Science Autonomy using Machine Learning for Astrobiology

Authors: Victoria Da Poian, Bethany Theiling, Eric Lyness, David Burtt, Abigail R. Azari, Joey Pasterski, Luoth Chou, Melissa Trainer, Ryan Danell, Desmond Kaplan, Xiang Li, Lily Clough, Brett McKinney, Lukas Mandrake, Bill Diamond, Caroline Freissinet

Abstract: In recent decades, artificial intelligence (AI) including machine learning (ML) have become vital for space missions enabling rapid data processing, advanced pattern recognition, and enhanced insight extraction. These tools are especially valuable in astrobiology applications, where models must distinguish biotic patterns from complex abiotic backgrounds. Advancing the integration of autonomy through AI and ML into space missions is a complex challenge, and we believe that by focusing on key areas, we can make significant progress and offer practical recommendations for tackling these obstacles.

cross Communication-Efficient l_0 Penalized Least Square

Authors: Chenqi Gong, Hu Yang

Abstract: In this paper, we propose a communication-efficient penalized regression algorithm for high-dimensional sparse linear regression models with massive data. This approach incorporates an optimized distributed system communication algorithm, named CESDAR algorithm, based on the Enhanced Support Detection and Root finding algorithm. The CESDAR algorithm leverages data distributed across multiple machines to compute and update the active set and introduces the communication-efficient surrogate likelihood framework to approximate the optimal solution for the full sample on the active set, resulting in the avoidance of raw data transmission, which enhances privacy and data security, while significantly improving algorithm execution speed and substantially reducing communication costs. Notably, this approach achieves the same statistical accuracy as the global estimator. Furthermore, this paper explores an extended version of CESDAR and an adaptive version of CESDAR to enhance algorithmic speed and optimize parameter selection, respectively. Simulations and real data benchmarks experiments demonstrate the efficiency and accuracy of the CESDAR algorithm.

cross $C^2$AV-TSE: Context and Confidence-aware Audio Visual Target Speaker Extraction

Authors: Wenxuan Wu, Xueyuan Chen, Shuai Wang, Jiadong Wang, Lingwei Meng, Xixin Wu, Helen Meng, Haizhou Li

Abstract: Audio-Visual Target Speaker Extraction (AV-TSE) aims to mimic the human ability to enhance auditory perception using visual cues. Although numerous models have been proposed recently, most of them estimate target signals by primarily relying on local dependencies within acoustic features, underutilizing the human-like capacity to infer unclear parts of speech through contextual information. This limitation results in not only suboptimal performance but also inconsistent extraction quality across the utterance, with some segments exhibiting poor quality or inadequate suppression of interfering speakers. To close this gap, we propose a model-agnostic strategy called the Mask-And-Recover (MAR). It integrates both inter- and intra-modality contextual correlations to enable global inference within extraction modules. Additionally, to better target challenging parts within each sample, we introduce a Fine-grained Confidence Score (FCS) model to assess extraction quality and guide extraction modules to emphasize improvement on low-quality segments. To validate the effectiveness of our proposed model-agnostic training paradigm, six popular AV-TSE backbones were adopted for evaluation on the VoxCeleb2 dataset, demonstrating consistent performance improvements across various metrics.

cross CellVTA: Enhancing Vision Foundation Models for Accurate Cell Segmentation and Classification

Authors: Yang Yang, Xijie Xu, Yixun Zhou, Jie Zheng

Abstract: Cell instance segmentation is a fundamental task in digital pathology with broad clinical applications. Recently, vision foundation models, which are predominantly based on Vision Transformers (ViTs), have achieved remarkable success in pathology image analysis. However, their improvements in cell instance segmentation remain limited. A key challenge arises from the tokenization process in ViTs, which substantially reduces the spatial resolution of input images, leading to suboptimal segmentation quality, especially for small and densely packed cells. To address this problem, we propose CellVTA (Cell Vision Transformer with Adapter), a novel method that improves the performance of vision foundation models for cell instance segmentation by incorporating a CNN-based adapter module. This adapter extracts high-resolution spatial information from input images and injects it into the ViT through a cross-attention mechanism. Our method preserves the core architecture of ViT, ensuring seamless integration with pretrained foundation models. Extensive experiments show that CellVTA achieves 0.538 mPQ on the CoNIC dataset and 0.506 mPQ on the PanNuke dataset, which significantly outperforms the state-of-the-art cell segmentation methods. Ablation studies confirm the superiority of our approach over other fine-tuning strategies, including decoder-only fine-tuning and full fine-tuning. Our code and models are publicly available at https://github.com/JieZheng-ShanghaiTech/CellVTA.

URLs: https://github.com/JieZheng-ShanghaiTech/CellVTA.

cross FeatInsight: An Online ML Feature Management System on 4Paradigm Sage-Studio Platform

Authors: Xin Tong, Xuanhe Zhou, Bingsheng He, Guoliang Li, Zirui Tang, Wei Zhou, Fan Wu, Mian Lu, Yuqiang Chen

Abstract: Feature management is essential for many online machine learning applications and can often become the performance bottleneck (e.g., taking up to 70% of the overall latency in sales prediction service). Improper feature configurations (e.g., introducing too many irrelevant features) can severely undermine the model's generalization capabilities. However, managing online ML features is challenging due to (1) large-scale, complex raw data (e.g., the 2018 PHM dataset contains 17 tables and dozens to hundreds of columns), (2) the need for high-performance, consistent computation of interdependent features with complex patterns, and (3) the requirement for rapid updates and deployments to accommodate real-time data changes. In this demo, we present FeatInsight, a system that supports the entire feature lifecycle, including feature design, storage, visualization, computation, verification, and lineage management. FeatInsight (with OpenMLDB as the execution engine) has been deployed in over 100 real-world scenarios on 4Paradigm's Sage Studio platform, handling up to a trillion-dimensional feature space and enabling millisecond-level feature updates. We demonstrate how FeatInsight enhances feature design efficiency (e.g., for online product recommendation) and improve feature computation performance (e.g., for online fraud detection). The code is available at https://github.com/4paradigm/FeatInsight.

URLs: https://github.com/4paradigm/FeatInsight.

cross Spingarn's Method and Progressive Decoupling Beyond Elicitable Monotonicity

Authors: Brecht Evens, Puya Latafat, Panagiotis Patrinos

Abstract: Spingarn's method of partial inverses and the progressive decoupling algorithm address inclusion problems involving the sum of an operator and the normal cone of a linear subspace, known as linkage problems. Despite their success, existing convergence results are limited to the so-called elicitable monotone setting, where nonmonotonicity is allowed only on the orthogonal complement of the linkage subspace. In this paper, we introduce progressive decoupling+, a generalized version of standard progressive decoupling that incorporates separate relaxation parameters for the linkage subspace and its orthogonal complement. We prove convergence under conditions that link the relaxation parameters to the nonmonotonicity of their respective subspaces and show that the special cases of Spingarn's method and standard progressive decoupling also extend beyond the elicitable monotone setting. Our analysis hinges upon an equivalence between progressive decoupling+ and the preconditioned proximal point algorithm, for which we develop a general local convergence analysis in a certain nonmonotone setting.

cross PRISM-0: A Predicate-Rich Scene Graph Generation Framework for Zero-Shot Open-Vocabulary Tasks

Authors: Abdelrahman Elskhawy, Mengze Li, Nassir Navab, Benjamin Busam

Abstract: In Scene Graphs Generation (SGG) one extracts structured representation from visual inputs in the form of objects nodes and predicates connecting them. This facilitates image-based understanding and reasoning for various downstream tasks. Although fully supervised SGG approaches showed steady performance improvements, they suffer from a severe training bias. This is caused by the availability of only small subsets of curated data and exhibits long-tail predicate distribution issues with a lack of predicate diversity adversely affecting downstream tasks. To overcome this, we introduce PRISM-0, a framework for zero-shot open-vocabulary SGG that bootstraps foundation models in a bottom-up approach to capture the whole spectrum of diverse, open-vocabulary predicate prediction. Detected object pairs are filtered and passed to a Vision Language Model (VLM) that generates descriptive captions. These are used to prompt an LLM to generate fine-andcoarse-grained predicates for the pair. The predicates are then validated using a VQA model to provide a final SGG. With the modular and dataset-independent PRISM-0, we can enrich existing SG datasets such as Visual Genome (VG). Experiments illustrate that PRIMS-0 generates semantically meaningful graphs that improve downstream tasks such as Image Captioning and Sentence-to-Graph Retrieval with a performance on par to the best fully supervised methods.

cross Logical perspectives on learning statistical objects

Authors: Aaron Anderson, Michael Benedikt

Abstract: We consider the relationship between learnability of a ``base class'' of functions on a set X and learnability of a class of statistical functions derived from the base class. For example, we refine results showing that learnability of a family of functions implies learnability of the family of functions mapping a function in the class to its expectation under a distribution. We will look at both Probably Approximately Correct (PAC) learning, where example inputs and outputs are chosen at random, and online learning, where the examples are chosen adversarially. We establish improved bounds on the sample complexity of learning for statistical classes, stated in terms of combinatorial dimensions of the base class. We do this by adapting techniques introduced in model theory for ``randomizing a structure''. We give particular attention to classes derived from logical formulas, and relate learnability of the statistical classes to properties of the formula. Finally, we provide bounds on the complexity of learning the statistical classes built on top of a logic-based hypothesis class.

cross Whispering Under the Eaves: Protecting User Privacy Against Commercial and LLM-powered Automatic Speech Recognition Systems

Authors: Weifei Jin, Yuxin Cao, Junjie Su, Derui Wang, Yedi Zhang, Minhui Xue, Jie Hao, Jin Song Dong, Yixian Yang

Abstract: The widespread application of automatic speech recognition (ASR) supports large-scale voice surveillance, raising concerns about privacy among users. In this paper, we concentrate on using adversarial examples to mitigate unauthorized disclosure of speech privacy thwarted by potential eavesdroppers in speech communications. While audio adversarial examples have demonstrated the capability to mislead ASR models or evade ASR surveillance, they are typically constructed through time-intensive offline optimization, restricting their practicality in real-time voice communication. Recent work overcame this limitation by generating universal adversarial perturbations (UAPs) and enhancing their transferability for black-box scenarios. However, they introduced excessive noise that significantly degrades audio quality and affects human perception, thereby limiting their effectiveness in practical scenarios. To address this limitation and protect live users' speech against ASR systems, we propose a novel framework, AudioShield. Central to this framework is the concept of Transferable Universal Adversarial Perturbations in the Latent Space (LS-TUAP). By transferring the perturbations to the latent space, the audio quality is preserved to a large extent. Additionally, we propose target feature adaptation to enhance the transferability of UAPs by embedding target text features into the perturbations. Comprehensive evaluation on four commercial ASR APIs (Google, Amazon, iFlytek, and Alibaba), three voice assistants, two LLM-powered ASR and one NN-based ASR demonstrates the protection superiority of AudioShield over existing competitors, and both objective and subjective evaluations indicate that AudioShield significantly improves the audio quality. Moreover, AudioShield also shows high effectiveness in real-time end-to-end scenarios, and demonstrates strong resilience against adaptive countermeasures.

cross Investigating the Capabilities and Limitations of Machine Learning for Identifying Bias in English Language Data with Information and Heritage Professionals

Authors: Lucy Havens, Benjamin Bach, Melissa Terras, Beatrice Alex

Abstract: Despite numerous efforts to mitigate their biases, ML systems continue to harm already-marginalized people. While predominant ML approaches assume bias can be removed and fair models can be created, we show that these are not always possible, nor desirable, goals. We reframe the problem of ML bias by creating models to identify biased language, drawing attention to a dataset's biases rather than trying to remove them. Then, through a workshop, we evaluated the models for a specific use case: workflows of information and heritage professionals. Our findings demonstrate the limitations of ML for identifying bias due to its contextual nature, the way in which approaches to mitigating it can simultaneously privilege and oppress different communities, and its inevitability. We demonstrate the need to expand ML approaches to bias and fairness, providing a mixed-methods approach to investigating the feasibility of removing bias or achieving fairness in a given ML use case.

cross CrackSQL: A Hybrid SQL Dialect Translation System Powered by Large Language Models

Authors: Wei Zhou, Yuyang Gao, Xuanhe Zhou, Guoliang Li

Abstract: Dialect translation plays a key role in enabling seamless interaction across heterogeneous database systems. However, translating SQL queries between different dialects (e.g., from PostgreSQL to MySQL) remains a challenging task due to syntactic discrepancies and subtle semantic variations. Existing approaches including manual rewriting, rule-based systems, and large language model (LLM)-based techniques often involve high maintenance effort (e.g., crafting custom translation rules) or produce unreliable results (e.g., LLM generates non-existent functions), especially when handling complex queries. In this demonstration, we present CrackSQL, the first hybrid SQL dialect translation system that combines rule and LLM-based methods to overcome these limitations. CrackSQL leverages the adaptability of LLMs to minimize manual intervention, while enhancing translation accuracy by segmenting lengthy complex SQL via functionality-based query processing. To further improve robustness, it incorporates a novel cross-dialect syntax embedding model for precise syntax alignment, as well as an adaptive local-to-global translation strategy that effectively resolves interdependent query operations. CrackSQL supports three translation modes and offers multiple deployment and access options including a web console interface, a PyPI package, and a command-line prompt, facilitating adoption across a variety of real-world use cases

cross Privacy-Preserving Transfer Learning for Community Detection using Locally Distributed Multiple Networks

Authors: Xiao Guo, Xuming He, Xiangyu Chang, Shujie Ma

Abstract: This paper develops a new spectral clustering-based method called TransNet for transfer learning in community detection of network data. Our goal is to improve the clustering performance of the target network using auxiliary source networks, which are heterogeneous, privacy-preserved, and locally stored across various sources. The edges of each locally stored network are perturbed using the randomized response mechanism to achieve differential privacy. Notably, we allow the source networks to have distinct privacy-preserving and heterogeneity levels as often desired in practice. To better utilize the information from the source networks, we propose a novel adaptive weighting method to aggregate the eigenspaces of the source networks multiplied by adaptive weights chosen to incorporate the effects of privacy and heterogeneity. We propose a regularization method that combines the weighted average eigenspace of the source networks with the eigenspace of the target network to achieve an optimal balance between them. Theoretically, we show that the adaptive weighting method enjoys the error-bound-oracle property in the sense that the error bound of the estimated eigenspace only depends on informative source networks. We also demonstrate that TransNet performs better than the estimator using only the target network and the estimator using only the weighted source networks.

cross Role and Use of Race in AI/ML Models Related to Health

Authors: Martin C. Were, Ang Li, Bradley A. Malin, Zhijun Yin, Joseph R. Coco, Benjamin X. Collins, Ellen Wright Clayton, Laurie L. Novak, Rachele Hendricks-Sturrup, Abiodun Oluyomi, Shilo Anders, Chao Yan

Abstract: The role and use of race within health-related artificial intelligence and machine learning (AI/ML) models has sparked increasing attention and controversy. Despite the complexity and breadth of related issues, a robust and holistic framework to guide stakeholders in their examination and resolution remains lacking. This perspective provides a broad-based, systematic, and cross-cutting landscape analysis of race-related challenges, structured around the AI/ML lifecycle and framed through "points to consider" to support inquiry and decision-making.

cross Explorable INR: An Implicit Neural Representation for Ensemble Simulation Enabling Efficient Spatial and Parameter Exploration

Authors: Yi-Tang Chen, Haoyu Li, Neng Shi, Xihaier Luo, Wei Xu, Han-Wei Shen

Abstract: With the growing computational power available for high-resolution ensemble simulations in scientific fields such as cosmology and oceanology, storage and computational demands present significant challenges. Current surrogate models fall short in the flexibility of point- or region-based predictions as the entire field reconstruction is required for each parameter setting, hence hindering the efficiency of parameter space exploration. Limitations exist in capturing physical attribute distributions and pinpointing optimal parameter configurations. In this work, we propose Explorable INR, a novel implicit neural representation-based surrogate model, designed to facilitate exploration and allow point-based spatial queries without computing full-scale field data. In addition, to further address computational bottlenecks of spatial exploration, we utilize probabilistic affine forms (PAFs) for uncertainty propagation through Explorable INR to obtain statistical summaries, facilitating various ensemble analysis and visualization tasks that are expensive with existing models. Furthermore, we reformulate the parameter exploration problem as optimization tasks using gradient descent and KL divergence minimization that ensures scalability. We demonstrate that the Explorable INR with the proposed approach for spatial and parameter exploration can significantly reduce computation and memory costs while providing effective ensemble analysis.

cross Agent S2: A Compositional Generalist-Specialist Framework for Computer Use Agents

Authors: Saaket Agashe, Kyle Wong, Vincent Tu, Jiachen Yang, Ang Li, Xin Eric Wang

Abstract: Computer use agents automate digital tasks by directly interacting with graphical user interfaces (GUIs) on computers and mobile devices, offering significant potential to enhance human productivity by completing an open-ended space of user queries. However, current agents face significant challenges: imprecise grounding of GUI elements, difficulties with long-horizon task planning, and performance bottlenecks from relying on single generalist models for diverse cognitive tasks. To this end, we introduce Agent S2, a novel compositional framework that delegates cognitive responsibilities across various generalist and specialist models. We propose a novel Mixture-of-Grounding technique to achieve precise GUI localization and introduce Proactive Hierarchical Planning, dynamically refining action plans at multiple temporal scales in response to evolving observations. Evaluations demonstrate that Agent S2 establishes new state-of-the-art (SOTA) performance on three prominent computer use benchmarks. Specifically, Agent S2 achieves 18.9% and 32.7% relative improvements over leading baseline agents such as Claude Computer Use and UI-TARS on the OSWorld 15-step and 50-step evaluation. Moreover, Agent S2 generalizes effectively to other operating systems and applications, surpassing previous best methods by 52.8% on WindowsAgentArena and by 16.52% on AndroidWorld relatively. Code available at https://github.com/simular-ai/Agent-S.

URLs: https://github.com/simular-ai/Agent-S.

cross AI Judges in Design: Statistical Perspectives on Achieving Human Expert Equivalence With Vision-Language Models

Authors: Kristen M. Edwards, Farnaz Tehranchi, Scarlett R. Miller, Faez Ahmed

Abstract: The subjective evaluation of early stage engineering designs, such as conceptual sketches, traditionally relies on human experts. However, expert evaluations are time-consuming, expensive, and sometimes inconsistent. Recent advances in vision-language models (VLMs) offer the potential to automate design assessments, but it is crucial to ensure that these AI ``judges'' perform on par with human experts. However, no existing framework assesses expert equivalence. This paper introduces a rigorous statistical framework to determine whether an AI judge's ratings match those of human experts. We apply this framework in a case study evaluating four VLM-based judges on key design metrics (uniqueness, creativity, usefulness, and drawing quality). These AI judges employ various in-context learning (ICL) techniques, including uni- vs. multimodal prompts and inference-time reasoning. The same statistical framework is used to assess three trained novices for expert-equivalence. Results show that the top-performing AI judge, using text- and image-based ICL with reasoning, achieves expert-level agreement for uniqueness and drawing quality and outperforms or matches trained novices across all metrics. In 6/6 runs for both uniqueness and creativity, and 5/6 runs for both drawing quality and usefulness, its agreement with experts meets or exceeds that of the majority of trained novices. These findings suggest that reasoning-supported VLM models can achieve human-expert equivalence in design evaluation. This has implications for scaling design evaluation in education and practice, and provides a general statistical framework for validating AI judges in other domains requiring subjective content evaluation.

cross Diffusion-model approach to flavor models: A case study for $S_4^\prime$ modular flavor model

Authors: Satsuki Nishimura, Hajime Otsuka, Haruki Uchiyama

Abstract: We propose a numerical method of searching for parameters with experimental constraints in generic flavor models by utilizing diffusion models, which are classified as a type of generative artificial intelligence (generative AI). As a specific example, we consider the $S_4^\prime$ modular flavor model and construct a neural network that reproduces quark masses, the CKM matrix, and the Jarlskog invariant by treating free parameters in the flavor model as generating targets. By generating new parameters with the trained network, we find various phenomenologically interesting parameter regions where an analytical evaluation of the $S_4^\prime$ model is challenging. Additionally, we confirm that the spontaneous CP violation occurs in the $S_4^\prime$ model. The diffusion model enables an inverse problem approach, allowing the machine to provide a series of plausible model parameters from given experimental data. Moreover, it can serve as a versatile analytical tool for extracting new physical predictions from flavor models.

cross QSViT: A Methodology for Quantizing Spiking Vision Transformers

Authors: Rachmad Vidya Wicaksana Putra, Saad Iftikhar, Muhammad Shafique

Abstract: Vision Transformer (ViT)-based models have shown state-of-the-art performance (e.g., accuracy) in vision-based AI tasks. However, realizing their capability in resource-constrained embedded AI systems is challenging due to their inherent large memory footprints and complex computations, thereby incurring high power/energy consumption. Recently, Spiking Vision Transformer (SViT)-based models have emerged as alternate low-power ViT networks. However, their large memory footprints still hinder their applicability for resource-constrained embedded AI systems. Therefore, there is a need for a methodology to compress SViT models without degrading the accuracy significantly. To address this, we propose QSViT, a novel design methodology to compress the SViT models through a systematic quantization strategy across different network layers. To do this, our QSViT employs several key steps: (1) investigating the impact of different precision levels in different network layers, (2) identifying the appropriate base quantization settings for guiding bit precision reduction, (3) performing a guided quantization strategy based on the base settings to select the appropriate quantization setting, and (4) developing an efficient quantized network based on the selected quantization setting. The experimental results demonstrate that, our QSViT methodology achieves 22.75% memory saving and 21.33% power saving, while also maintaining high accuracy within 2.1% from that of the original non-quantized SViT model on the ImageNet dataset. These results highlight the potential of QSViT methodology to pave the way toward the efficient SViT deployments on resource-constrained embedded AI systems.

cross Enabling Efficient Processing of Spiking Neural Networks with On-Chip Learning on Commodity Neuromorphic Processors for Edge AI Systems

Authors: Rachmad Vidya Wicaksana Putra, Pasindu Wickramasinghe, Muhammad Shafique

Abstract: The rising demand for energy-efficient edge AI systems (e.g., mobile agents/robots) has increased the interest in neuromorphic computing, since it offers ultra-low power/energy AI computation through spiking neural network (SNN) algorithms on neuromorphic processors. However, their efficient implementation strategy has not been comprehensively studied, hence limiting SNN deployments for edge AI systems. Toward this, we propose a design methodology to enable efficient SNN processing on commodity neuromorphic processors. To do this, we first study the key characteristics of targeted neuromorphic hardware (e.g., memory and compute budgets), and leverage this information to perform compatibility analysis for network selection. Afterward, we employ a mapping strategy for efficient SNN implementation on the targeted processor. Furthermore, we incorporate an efficient on-chip learning mechanism to update the systems' knowledge for adapting to new input classes and dynamic environments. The experimental results show that the proposed methodology leads the system to achieve low latency of inference (i.e., less than 50ms for image classification, less than 200ms for real-time object detection in video streaming, and less than 1ms in keyword recognition) and low latency of on-chip learning (i.e., less than 2ms for keyword recognition), while incurring less than 250mW of processing power and less than 15mJ of energy consumption across the respective different applications and scenarios. These results show the potential of the proposed methodology in enabling efficient edge AI systems for diverse application use-cases.

cross SentenceKV: Efficient LLM Inference via Sentence-Level Semantic KV Caching

Authors: Yuxuan Zhu, Ali Falahati, David H. Yang, Mohammad Mohammadi Amiri

Abstract: Large language models face significant computational and memory challenges when processing long contexts. During inference, efficient management of the key-value (KV) cache, which stores intermediate activations for autoregressive generation, is critical to reducing memory overhead and improving computational efficiency. Traditional token-level efficient KV caching methods overlook semantic information, treating tokens independently without considering their semantic relationships. Meanwhile, existing semantic-preserving KV cache management approaches often suffer from substantial memory usage and high time-to-first-token. To address these limitations, we propose SentenceKV, a novel sentence-level semantic KV caching approach designed to enhance inference efficiency while preserving semantic coherence. During prefilling, SentenceKV groups tokens based on sentence-level semantic similarity, compressing sentence representations into concise semantic vectors stored directly on the GPU, while individual KV pairs are offloaded to CPU. During decoding, SentenceKV generates tokens by selectively retrieving semantically relevant sentence-level KV entries, leveraging the semantic similarity between the prefilling-stage semantic vectors and decoding-stage queries. This ensures efficient and contextually accurate predictions, minimizing the loading of redundant or irrelevant data into GPU memory and significantly reducing memory overhead while maintaining stable inference latency, even for extremely long contexts. Extensive evaluations on benchmarks including PG-19, LongBench, and Needle-In-A-Haystack demonstrate that SentenceKV significantly outperforms state-of-the-art methods in both efficiency and memory usage, without compromising model accuracy.

cross When To Solve, When To Verify: Compute-Optimal Problem Solving and Generative Verification for LLM Reasoning

Authors: Nishad Singhi, Hritik Bansal, Arian Hosseini, Aditya Grover, Kai-Wei Chang, Marcus Rohrbach, Anna Rohrbach

Abstract: Scaling test-time compute has emerged as a key strategy for enhancing the reasoning capabilities of large language models (LLMs), particularly in tasks like mathematical problem-solving. A traditional approach, Self-Consistency (SC), generates multiple solutions to a problem and selects the most common answer via majority voting. Another common method involves scoring each solution with a reward model (verifier) and choosing the best one. Recent advancements in Generative Reward Models (GenRM) reframe verification as a next-token prediction task, enabling inference-time scaling along a new axis. Specifically, GenRM generates multiple verification chains-of-thought to score each solution. Under a limited inference budget, this introduces a fundamental trade-off: should you spend the budget on scaling solutions via SC or generate fewer solutions and allocate compute to verification via GenRM? To address this, we evaluate GenRM against SC under a fixed inference budget. Interestingly, we find that SC is more compute-efficient than GenRM for most practical inference budgets across diverse models and datasets. For instance, GenRM first matches SC after consuming up to 8x the inference compute and requires significantly more compute to outperform it. Furthermore, we derive inference scaling laws for the GenRM paradigm, revealing that compute-optimal inference favors scaling solution generation more aggressively than scaling the number of verifications. Our work provides practical guidance on optimizing test-time scaling by balancing solution generation and verification. The code is available at https://github.com/nishadsinghi/sc-genrm-scaling.

URLs: https://github.com/nishadsinghi/sc-genrm-scaling.

cross Data-Driven Safety Verification using Barrier Certificates and Matrix Zonotopes

Authors: Mohammed Adib Oumer, Amr Alanwar, Majid Zamani

Abstract: Ensuring safety in cyber-physical systems (CPSs) is a critical challenge, especially when system models are difficult to obtain or cannot be fully trusted due to uncertainty, modeling errors, or environmental disturbances. Traditional model-based approaches rely on precise system dynamics, which may not be available in real-world scenarios. To address this, we propose a data-driven safety verification framework that leverages matrix zonotopes and barrier certificates to verify system safety directly from noisy data. Instead of trusting a single unreliable model, we construct a set of models that capture all possible system dynamics that align with the observed data, ensuring that the true system model is always contained within this set. This model set is compactly represented using matrix zonotopes, enabling efficient computation and propagation of uncertainty. By integrating this representation into a barrier certificate framework, we establish rigorous safety guarantees without requiring an explicit system model. Numerical experiments demonstrate the effectiveness of our approach in verifying safety for dynamical systems with unknown models, showcasing its potential for real-world CPS applications.

replace Designing Heterogeneous GNNs with Desired Permutation Properties for Wireless Resource Allocation

Authors: Jianyu Zhao, Chenyang Yang, Tingting Liu

Abstract: Graph neural networks (GNNs) have been designed for learning a variety of wireless policies, i.e., the mappings from environment parameters to decision variables, thanks to their superior performance, and the potential in enabling scalability and size generalizability. These merits are rooted in leveraging permutation prior, i.e., satisfying the permutation property of the policy to be learned (referred to as desired permutation property). Many wireless policies are with complicated permutation properties. To satisfy these properties, heterogeneous GNNs (HetGNNs) should be used to learn such policies. There are two critical factors that enable a HetGNN to satisfy a desired permutation property: constructing an appropriate heterogeneous graph and judiciously designing the architecture of the HetGNN. However, both the graph and the HetGNN are designed heuristically so far. In this paper, we strive to provide a systematic approach for the design to satisfy the desired permutation property. We first propose a method for constructing a graph for a policy, where the edges and their types are defined for the sake of satisfying complicated permutation properties. Then, we provide and prove three sufficient conditions to design a HetGNN such that it can satisfy the desired permutation property when learning over an appropriate graph. These conditions suggest a method of designing the HetGNN with desired permutation property by sharing the processing, combining, and pooling functions according to the types of vertices and edges of the graph. We take power allocation and hybrid precoding policies as examples for demonstrating how to apply the proposed methods and validating the impact of the permutation prior by simulations.

replace MSCMNet: Multi-scale Semantic Correlation Mining for Visible-Infrared Person Re-Identification

Authors: Xuecheng Hua, Ke Cheng, Hu Lu, Juanjuan Tu, Yuanquan Wang, Shitong Wang

Abstract: The main challenge in the Visible-Infrared Person Re-Identification (VI-ReID) task lies in how to extract discriminative features from different modalities for matching purposes. While the existing well works primarily focus on minimizing the modal discrepancies, the modality information can not thoroughly be leveraged. To solve this problem, a Multi-scale Semantic Correlation Mining network (MSCMNet) is proposed to comprehensively exploit semantic features at multiple scales and simultaneously reduce modality information loss as small as possible in feature extraction. The proposed network contains three novel components. Firstly, after taking into account the effective utilization of modality information, the Multi-scale Information Correlation Mining Block (MIMB) is designed to explore semantic correlations across multiple scales. Secondly, in order to enrich the semantic information that MIMB can utilize, a quadruple-stream feature extractor (QFE) with non-shared parameters is specifically designed to extract information from different dimensions of the dataset. Finally, the Quadruple Center Triplet Loss (QCT) is further proposed to address the information discrepancy in the comprehensive features. Extensive experiments on the SYSU-MM01, RegDB, and LLCM datasets demonstrate that the proposed MSCMNet achieves the greatest accuracy.

replace ResNLS: An Improved Model for Stock Price Forecasting

Authors: Yuanzhe Jia, Ali Anaissi, Basem Suleiman

Abstract: Stock prices forecasting has always been a challenging task. Although many research projects try to address the problem, few of them pay attention to the varying degrees of dependencies between stock prices. In this paper, we introduce a hybrid model that improves the prediction of stock prices by emphasizing the dependencies between adjacent stock prices. The proposed model, ResNLS, is mainly composed of two neural architectures, ResNet and LSTM. ResNet serves as a feature extractor to identify dependencies between stock prices, while LSTM analyzes the initial time series data with the combination of dependencies, which are considered as residuals. Our experiment reveals that when the closing price data for the previous 5 consecutive trading days is used as input, the performance of the model (ResNLS-5) is optimal compared to those with other inputs. Furthermore, ResNLS-5 demonstrates at least a 20% improvement over current state-of-the-art baselines. To verify whether ResNLS-5 can help clients effectively avoid risks and earn profits in the stock market, we construct a quantitative trading framework for back testing. The result shows that the trading strategy based on ResNLS-5 predictions can successfully mitigate losses during declining stock prices and generate profits in periods of rising stock prices. The relevant code is publicly available on GitHub.

replace SVInvNet: A Densely Connected Encoder-Decoder Architecture for Seismic Velocity Inversion

Authors: Mojtaba Najafi Khatounabad, Hacer Yalim Keles, Selma Kadioglu

Abstract: This study presents a deep learning-based approach to seismic velocity inversion problem, focusing on both noisy and noiseless training datasets of varying sizes. Our Seismic Velocity Inversion Network (SVInvNet) introduces a novel architecture that contains a multi-connection encoder-decoder structure enhanced with dense blocks. This design is specifically tuned to effectively process time series data, which is essential for addressing the challenges of non-linear seismic velocity inversion. For training and testing, we created diverse seismic velocity models, including multi-layered, faulty, and salt dome categories. We also investigated how different kinds of ambient noise, both coherent and stochastic, and the size of the training dataset affect learning outcomes. SVInvNet is trained on datasets ranging from 750 to 6,000 samples and is tested using a large benchmark dataset of 12,000 samples. Despite its fewer parameters compared to the baseline model, SVInvNet achieves superior performance with this dataset. The performance of SVInvNet was further evaluated using the OpenFWI dataset and Marmousi-derived velocity models. The comparative analysis clearly reveals the effectiveness of the proposed model.

replace Holistic analysis on the sustainability of Federated Learning across AI product lifecycle

Authors: Hongliu Cao

Abstract: In light of emerging legal requirements and policies focused on privacy protection, there is a growing trend of companies across various industries adopting Federated Learning (FL). This decentralized approach involves multiple clients or silos, collaboratively training a global model under the coordination of a central server while utilizing their private local data. Unlike traditional methods that necessitate data sharing and transmission, Cross-Silo FL allows clients to share model updates rather than raw data, thereby enhancing privacy. Despite its growing adoption, the carbon impact associated with Cross-Silo FL remains poorly understood due to the limited research in this area. This study seeks to bridge this gap by evaluating the sustainability of Cross-Silo FL throughout the entire AI product lifecycle, extending the analysis beyond the model training phase alone. We systematically compare this decentralized method with traditional centralized approaches and present a robust quantitative framework for assessing the costs and CO2 emissions in real-world Cross-Silo FL environments. Our findings indicate that the energy consumption and costs of model training are comparable between Cross-Silo Federated Learning and Centralized Learning. However, the additional data transfer and storage requirements inherent in Centralized Learning can result in significant, often overlooked CO2 emissions. Moreover, we introduce an innovative data and application management system that integrates Cross-Silo FL and analytics, aiming at improving the sustainability and economic efficiency of IT enterprises.

replace Explainable Bayesian Optimization

Authors: Tanmay Chakraborty, Christian Wirth, Christin Seifert

Abstract: Manual parameter tuning of cyber-physical systems is a common practice, but it is labor-intensive. Bayesian Optimization (BO) offers an automated alternative, yet its black-box nature reduces trust and limits human-BO collaborative system tuning. Experts struggle to interpret BO recommendations due to the lack of explanations. This paper addresses the post-hoc BO explainability problem for cyber-physical systems. We introduce TNTRules (Tune-No-Tune Rules), a novel algorithm that provides both global and local explanations for BO recommendations. TNTRules generates actionable rules and visual graphs, identifying optimal solution bounds and ranges, as well as potential alternative solutions. Unlike existing explainable AI (XAI) methods, TNTRules is tailored specifically for BO, by encoding uncertainty via a variance pruning technique and hierarchical agglomerative clustering. A multi-objective optimization approach allows maximizing explanation quality. We evaluate TNTRules using established XAI metrics (Correctness, Completeness, and Compactness) and compare it against adapted baseline methods. The results demonstrate that TNTRules generates high-fidelity, compact, and complete explanations, significantly outperforming three baselines on 5 multi-objective testing functions and 2 hyperparameter tuning problems.

replace Sharp Rates in Dependent Learning Theory: Avoiding Sample Size Deflation for the Square Loss

Authors: Ingvar Ziemann, Stephen Tu, George J. Pappas, Nikolai Matni

Abstract: In this work, we study statistical learning with dependent ($\beta$-mixing) data and square loss in a hypothesis class $\mathscr{F}\subset L_{\Psi_p}$ where $\Psi_p$ is the norm $\|f\|_{\Psi_p} \triangleq \sup_{m\geq 1} m^{-1/p} \|f\|_{L^m} $ for some $p\in [2,\infty]$. Our inquiry is motivated by the search for a sharp noise interaction term, or variance proxy, in learning with dependent data. Absent any realizability assumption, typical non-asymptotic results exhibit variance proxies that are deflated multiplicatively by the mixing time of the underlying covariates process. We show that whenever the topologies of $L^2$ and $\Psi_p$ are comparable on our hypothesis class $\mathscr{F}$ -- that is, $\mathscr{F}$ is a weakly sub-Gaussian class: $\|f\|_{\Psi_p} \lesssim \|f\|_{L^2}^\eta$ for some $\eta\in (0,1]$ -- the empirical risk minimizer achieves a rate that only depends on the complexity of the class and second order statistics in its leading term. Our result holds whether the problem is realizable or not and we refer to this as a \emph{near mixing-free rate}, since direct dependence on mixing is relegated to an additive higher order term. We arrive at our result by combining the above notion of a weakly sub-Gaussian class with mixed tail generic chaining. This combination allows us to compute sharp, instance-optimal rates for a wide range of problems. Examples that satisfy our framework include sub-Gaussian linear regression, more general smoothly parameterized function classes, finite hypothesis classes, and bounded smoothness classes.

replace Towards Adversarially Robust Dataset Distillation by Curvature Regularization

Authors: Eric Xue, Yijiang Li, Haoyang Liu, Peiran Wang, Yifan Shen, Haohan Wang

Abstract: Dataset distillation (DD) allows datasets to be distilled to fractions of their original size while preserving the rich distributional information so that models trained on the distilled datasets can achieve a comparable accuracy while saving significant computational loads. Recent research in this area has been focusing on improving the accuracy of models trained on distilled datasets. In this paper, we aim to explore a new perspective of DD. We study how to embed adversarial robustness in distilled datasets, so that models trained on these datasets maintain the high accuracy and meanwhile acquire better adversarial robustness. We propose a new method that achieves this goal by incorporating curvature regularization into the distillation process with much less computational overhead than standard adversarial training. Extensive empirical experiments suggest that our method not only outperforms standard adversarial training on both accuracy and robustness with less computation overhead but is also capable of generating robust distilled datasets that can withstand various adversarial attacks.

replace VL-ICL Bench: The Devil in the Details of Multimodal In-Context Learning

Authors: Yongshuo Zong, Ondrej Bohdal, Timothy Hospedales

Abstract: Large language models (LLMs) famously exhibit emergent in-context learning (ICL) -- the ability to rapidly adapt to new tasks using few-shot examples provided as a prompt, without updating the model's weights. Built on top of LLMs, vision large language models (VLLMs) have advanced significantly in areas such as recognition, reasoning, and grounding. However, investigations into \emph{multimodal ICL} have predominantly focused on few-shot visual question answering (VQA), and image captioning, which we will show neither exploit the strengths of ICL, nor test its limitations. The broader capabilities and limitations of multimodal ICL remain under-explored. In this study, we introduce a comprehensive benchmark VL-ICL Bench for multimodal in-context learning, encompassing a broad spectrum of tasks that involve both images and text as inputs and outputs, and different types of challenges, from {perception to reasoning and long context length}. We evaluate the abilities of state-of-the-art VLLMs against this benchmark suite, revealing their diverse strengths and weaknesses, and showing that even the most advanced models, such as GPT-4, find the tasks challenging. By highlighting a range of new ICL tasks, and the associated strengths and limitations of existing models, we hope that our dataset will inspire future work on enhancing the in-context learning capabilities of VLLMs, as well as inspire new applications that leverage VLLM ICL. The code and dataset are available at https://github.com/ys-zong/VL-ICL.

URLs: https://github.com/ys-zong/VL-ICL.

replace A Clustering Method with Graph Maximum Decoding Information

Authors: Xinrun Xu, Manying Lv, Zhanbiao Lian, Yurong Wu, Jin Yan, Shan Jiang, Zhiming Ding

Abstract: The clustering method based on graph models has garnered increased attention for its widespread applicability across various knowledge domains. Its adaptability to integrate seamlessly with other relevant applications endows the graph model-based clustering analysis with the ability to robustly extract "natural associations" or "graph structures" within datasets, facilitating the modelling of relationships between data points. Despite its efficacy, the current clustering method utilizing the graph-based model overlooks the uncertainty associated with random walk access between nodes and the embedded structural information in the data. To address this gap, we present a novel Clustering method for Maximizing Decoding Information within graph-based models, named CMDI. CMDI innovatively incorporates two-dimensional structural information theory into the clustering process, consisting of two phases: graph structure extraction and graph vertex partitioning. Within CMDI, graph partitioning is reformulated as an abstract clustering problem, leveraging maximum decoding information to minimize uncertainty associated with random visits to vertices. Empirical evaluations on three real-world datasets demonstrate that CMDI outperforms classical baseline methods, exhibiting a superior decoding information ratio (DI-R). Furthermore, CMDI showcases heightened efficiency, particularly when considering prior knowledge (PK). These findings underscore the effectiveness of CMDI in enhancing decoding information quality and computational efficiency, positioning it as a valuable tool in graph-based clustering analyses.

replace CodingTeachLLM: Empowering LLM's Coding Ability via AST Prior Knowledge

Authors: Zhangquan Chen, Chunjiang Liu, Haobin Duan

Abstract: In this paper, we introduce CodingTeachLLM, a large language model (LLM) designed for coding teaching. Specially, we aim to enhance the coding ability of LLM and lead it to better teaching mode in education context. Thus, we propose an end-to-end prior-based three-phases supervised fine-tuned model, which is proved more competitive than traditional fine-tuning method. More specifically, our model realizes the structural disassembly and incremental guided output of educational knowledge. To this end, we robustify data classification of three types via a sampler and overlap estimation neural network, and inject the preprocessing datasets into pre-trained model in three batches for LORA fine-tuning. Then, we design a prior module couples system prompt, vector databases, and abstract syntax tree task segmentation. Finally, the compression method and regularization constraint are applied to the prior-based fine-tuned model, followed by text filter at the output end to obtain incremental guided results. Our model represents the first research effort to truly embody the tutor role with the features of abundant educational knowledge, step-by-step incremental guided outputs and non-disclosure of answers. Extensive experiments report that our model also achieves state-of-the-art in code abilities compared to open-source models, reaching an impressive 75.10% on the HumanEval (@pass 1) benchmark. Additionally, our model maintains strong conversational capabilities, with the 13B quantized version achieving scores of 56.34, 50.60, and 45.27 respectively on the MMLU, C-Eval, and AGIEval (5 shot) dialogue evaluation benchmarks.

replace Rehearsal-free Federated Domain-incremental Learning

Authors: Rui Sun, Haoran Duan, Jiahua Dong, Varun Ojha, Tejal Shah, Rajiv Ranjan

Abstract: We introduce a rehearsal-free federated domain incremental learning framework, RefFiL, based on a global prompt-sharing paradigm to alleviate catastrophic forgetting challenges in federated domain-incremental learning, where unseen domains are continually learned. Typical methods for mitigating forgetting, such as the use of additional datasets and the retention of private data from earlier tasks, are not viable in federated learning (FL) due to devices' limited resources. Our method, RefFiL, addresses this by learning domain-invariant knowledge and incorporating various domain-specific prompts from the domains represented by different FL participants. A key feature of RefFiL is the generation of local fine-grained prompts by our domain adaptive prompt generator, which effectively learns from local domain knowledge while maintaining distinctive boundaries on a global scale. We also introduce a domain-specific prompt contrastive learning loss that differentiates between locally generated prompts and those from other domains, enhancing RefFiL's precision and effectiveness. Compared to existing methods, RefFiL significantly alleviates catastrophic forgetting without requiring extra memory space, making it ideal for privacy-sensitive and resource-constrained devices.

replace Enhancing Domain Adaptation through Prompt Gradient Alignment

Authors: Hoang Phan, Lam Tran, Quyen Tran, Trung Le

Abstract: Prior Unsupervised Domain Adaptation (UDA) methods often aim to train a domain-invariant feature extractor, which may hinder the model from learning sufficiently discriminative features. To tackle this, a line of works based on prompt learning leverages the power of large-scale pre-trained vision-language models to learn both domain-invariant and specific features through a set of domain-agnostic and domain-specific learnable prompts. Those studies typically enforce invariant constraints on representation, output, or prompt space to learn such prompts. In contrast, we cast UDA as a multiple-objective optimization problem in which each objective is represented by a domain loss. Under this new framework, we propose to align per-objective gradients to foster consensus between them. Additionally, to prevent potential overfitting when fine-tuning this deep learning architecture, we penalize the norm of these gradients. To achieve these goals, we devise a practical gradient update procedure that can work under both single-source and multi-source UDA. Empirically, our method consistently outperforms other vision-language model adaptation methods. The implementation is available at https://github.com/VietHoang1512/PGA.

URLs: https://github.com/VietHoang1512/PGA.

replace Generative Data Assimilation of Sparse Weather Station Observations at Kilometer Scales

Authors: Peter Manshausen, Yair Cohen, Peter Harrington, Jaideep Pathak, Mike Pritchard, Piyush Garg, Morteza Mardani, Karthik Kashinath, Simon Byrne, Noah Brenowitz

Abstract: Data assimilation of observational data into full atmospheric states is essential for weather forecast model initialization. Recently, methods for deep generative data assimilation have been proposed which allow for using new input data without retraining the model. They could also dramatically accelerate the costly data assimilation process used in operational regional weather models. Here, in a central US testbed, we demonstrate the viability of score-based data assimilation in the context of realistically complex km-scale weather. We train an unconditional diffusion model to generate snapshots of a state-of-the-art km-scale analysis product, the High Resolution Rapid Refresh. Then, using score-based data assimilation to incorporate sparse weather station data, the model produces maps of precipitation and surface winds. The generated fields display physically plausible structures, such as gust fronts, and sensitivity tests confirm learnt physics through multivariate relationships. Preliminary skill analysis shows the approach already outperforms a naive baseline of the High-Resolution Rapid Refresh system itself. By incorporating observations from 40 weather stations, 10% lower RMSEs on left-out stations are attained. Despite some lingering imperfections such as insufficiently disperse ensemble DA estimates, we find the results overall an encouraging proof of concept, and the first at km-scale. It is a ripe time to explore extensions that combine increasingly ambitious regional state generators with an increasing set of in situ, ground-based, and satellite remote sensing data streams.

replace Machine Unlearning Fails to Remove Data Poisoning Attacks

Authors: Martin Pawelczyk, Jimmy Z. Di, Yiwei Lu, Ayush Sekhari, Gautam Kamath, Seth Neel

Abstract: We revisit the efficacy of several practical methods for approximate machine unlearning developed for large-scale deep learning. In addition to complying with data deletion requests, one often-cited potential application for unlearning methods is to remove the effects of poisoned data. We experimentally demonstrate that, while existing unlearning methods have been demonstrated to be effective in a number of settings, they fail to remove the effects of data poisoning across a variety of types of poisoning attacks (indiscriminate, targeted, and a newly-introduced Gaussian poisoning attack) and models (image classifiers and LLMs); even when granted a relatively large compute budget. In order to precisely characterize unlearning efficacy, we introduce new evaluation metrics for unlearning based on data poisoning. Our results suggest that a broader perspective, including a wider variety of evaluations, are required to avoid a false sense of confidence in machine unlearning procedures for deep learning without provable guarantees. Moreover, while unlearning methods show some signs of being useful to efficiently remove poisoned data without having to retrain, our work suggests that these methods are not yet ``ready for prime time,'' and currently provide limited benefit over retraining.

replace NNsight and NDIF: Democratizing Access to Open-Weight Foundation Model Internals

Authors: Jaden Fiotto-Kaufman, Alexander R. Loftus, Eric Todd, Jannik Brinkmann, Koyena Pal, Dmitrii Troitskii, Michael Ripa, Adam Belfki, Can Rager, Caden Juang, Aaron Mueller, Samuel Marks, Arnab Sen Sharma, Francesca Lucchetti, Nikhil Prakash, Carla Brodley, Arjun Guha, Jonathan Bell, Byron C. Wallace, David Bau

Abstract: We introduce NNsight and NDIF, technologies that work in tandem to enable scientific study of the representations and computations learned by very large neural networks. NNsight is an open-source system that extends PyTorch to introduce deferred remote execution. The National Deep Inference Fabric (NDIF) is a scalable inference service that executes NNsight requests, allowing users to share GPU resources and pretrained models. These technologies are enabled by the Intervention Graph, an architecture developed to decouple experimental design from model runtime. Together, this framework provides transparent and efficient access to the internals of deep neural networks such as very large language models (LLMs) without imposing the cost or complexity of hosting customized models individually. We conduct a quantitative survey of the machine learning literature that reveals a growing gap in the study of the internals of large-scale AI. We demonstrate the design and use of our framework to address this gap by enabling a range of research methods on huge models. Finally, we conduct benchmarks to compare performance with previous approaches. Code, documentation, and tutorials are available at https://nnsight.net/.

URLs: https://nnsight.net/.

replace Towards Robust and Parameter-Efficient Knowledge Unlearning for LLMs

Authors: Sungmin Cha, Sungjun Cho, Dasol Hwang, Moontae Lee

Abstract: Large Language Models (LLMs) have demonstrated strong reasoning and memorization capabilities via pretraining on massive textual corpora. However, this poses risk of privacy and copyright violations, highlighting the need for efficient machine unlearning methods that remove sensitive data without retraining from scratch. While Gradient Ascent (GA) is commonly used to unlearn by reducing the likelihood of generating unwanted content, it leads to unstable optimization and catastrophic forgetting of retrained knowledge. We find that combining GA with low-rank adaptation results in poor trade-offs between computational cost and generative performance. To address these challenges, we propose Low-rank Knowledge Unlearning (LoKU), a novel framework that enables robust and efficient unlearning for LLMs. First, we introduce Inverted Hinge Loss, which suppresses unwanted tokens while maintaining fluency by boosting the probability of the next most likely token. Second, we develop a data-adaptive initialization for LoRA adapters via low-rank approximation weighted with relative Fisher information, thereby focusing updates on parameters critical for removing targeted knowledge. Experiments on the Training Data Extraction Challenge dataset using GPT-Neo models as well as on the TOFU benchmark with Phi-1.5B and Llama2-7B models demonstrate that our approach effectively removes sensitive information while maintaining reasoning and generative capabilities with minimal impact. Our implementation can be found in https://github.com/csm9493/efficient-llm-unlearning.

URLs: https://github.com/csm9493/efficient-llm-unlearning.

replace Prior Learning in Introspective VAEs

Authors: Ioannis Athanasiadis, Fredrik Lindsten, Michael Felsberg

Abstract: Variational Autoencoders (VAEs) are a popular framework for unsupervised learning and data generation. A plethora of methods have been proposed focusing on improving VAEs, with the incorporation of adversarial objectives and the integration of prior learning mechanisms being prominent directions. When it comes to the former, an indicative instance is the recently introduced family of Introspective VAEs aiming at ensuring that a low likelihood is assigned to unrealistic samples. In this study, we focus on the Soft-IntroVAE (S-IntroVAE) and investigate the implication of incorporating a multimodal and learnable prior into this framework. Namely, we formulate the prior as a third player and show that when trained in cooperation with the decoder constitutes an effective way for prior learning, which shares the Nash Equilibrium with the vanilla S-IntroVAE. Furthermore, based on a modified formulation of the optimal ELBO in S-IntroVAE, we develop theoretically motivated regularizations, that is (i) adaptive variance clipping to stabilize training when learning the prior and (ii) responsibility regularization to discourage the formation of inactive prior mode. Finally, we perform a series of targeted experiments on a 2D density estimation benchmark and in an image generation setting comprised of the (F)-MNIST and CIFAR-10 datasets demonstrating the benefit of prior learning in S-IntroVAE in generation and representation learning.

replace Large-Scale Multi-omic Biosequence Transformers for Modeling Protein-Nucleic Acid Interactions

Authors: Sully F. Chen, Robert J. Steele, Glen M. Hocky, Beakal Lemeneh, Shivanand P. Lad, Eric K. Oermann

Abstract: The transformer architecture has revolutionized bioinformatics and driven progress in the understanding and prediction of the properties of biomolecules. Almost all research on large-scale biosequence transformers has focused on one domain at a time (single-omic), usually DNA/RNA or proteins. These models have seen incredible success in downstream tasks in each domain, and have achieved particularly noteworthy breakthroughs in sequence modeling and structural modeling. However, these single-omic models are naturally incapable of efficiently modeling multi-omic tasks, one of the most biologically critical being protein-nucleic acid interactions. We present our work training the largest open-source multi-omic foundation model to date. We show that these multi-omic models (MOMs) can learn joint representations between various single-omic distributions that are emergently consistent with the Central Dogma of molecular biology despite only being trained on unlabeled biosequences. We further demonstrate that MOMs can be fine-tuned to achieve state-of-the-art results on protein-nucleic acid interaction tasks, namely predicting the change in Gibbs free energy ($\Delta G$) of the binding interaction between a given nucleic acid and protein. Remarkably, we show that multi-omic biosequence transformers emergently learn useful structural information without any \textit{a priori} structural training, allowing us to predict which protein residues are most involved in the protein-nucleic acid binding interaction. Lastly, we provide evidence that multi-omic biosequence models are in many cases superior to foundation models trained on single-omics distributions, both in performance-per-FLOP and absolute performance, suggesting a more generalized or foundational approach to building these models for biology.

replace Illuminating the Diversity-Fitness Trade-Off in Black-Box Optimization

Authors: Maria Laura Santoni, Elena Raponi, Aneta Neumann, Frank Neumann, Mike Preuss, Carola Doerr

Abstract: In real-world applications, users often favor structurally diverse design choices over one high-quality solution. It is hence important to consider more solutions that decision makers can compare and further explore based on additional criteria. Alongside the existing approaches of evolutionary diversity optimization, quality diversity, and multimodal optimization, this paper presents a fresh perspective on this challenge by considering the problem of identifying a fixed number of solutions with a pairwise distance above a specified threshold while maximizing their average quality. We obtain first insight into these objectives by performing a subset selection on the search trajectories of different well-established search heuristics, whether they have been specifically designed with diversity in mind or not. We emphasize that the main goal of our work is not to present a new algorithm but to understand the capability of off-the-shelf algorithms to quantify the trade-off between the minimum pairwise distance within batches of solutions and their average quality. We also analyze how this trade-off depends on the properties of the underlying optimization problem. A possibly surprising outcome of our empirical study is the observation that naive uniform random sampling establishes a very strong baseline for our problem, hardly ever outperformed by the search trajectories of the considered heuristics. We interpret these results as a motivation to develop algorithms tailored to produce diverse solutions of high average quality.

replace Optimal or Greedy Decision Trees? Revisiting their Objectives, Tuning, and Performance

Authors: Jacobus G. M. van der Linden, Dani\"el Vos, Mathijs M. de Weerdt, Sicco Verwer, Emir Demirovi\'c

Abstract: Recently there has been a surge of interest in optimal decision tree (ODT) methods that globally optimize accuracy directly, in contrast to traditional approaches that locally optimize an impurity or information metric. However, the value of optimal methods is not well understood yet, as the literature provides conflicting results, with some demonstrating superior out-of-sample performance of ODTs over greedy approaches, while others show the opposite. Through a novel extensive experimental study, we provide new insights into the design and behavior of learning decision trees. In particular, we identify and analyze two relatively unexplored aspects of ODTs: the objective function used in training trees, and tuning techniques. Thus, we address these three questions: what objective to optimize in ODTs; how to tune ODTs; and how do optimal and greedy methods compare? Our experimental evaluation examines 11 objective functions, six tuning methods, and six claims from the literature on optimal and greedy methods on 180 real and synthetic data sets. Through our analysis, both conceptually and experimentally, we show the effect of (non-)concave objectives in greedy and optimal approaches; we highlight the importance of proper tuning of ODTs; support and refute several claims from the literature; provide clear recommendations for researchers and practitioners on the usage of greedy and optimal methods; and code for future comparisons.

replace Dataset Distillation via Knowledge Distillation: Towards Efficient Self-Supervised Pre-Training of Deep Networks

Authors: Siddharth Joshi, Jiayi Ni, Baharan Mirzasoleiman

Abstract: Dataset distillation (DD) generates small synthetic datasets that can efficiently train deep networks with a limited amount of memory and compute. Despite the success of DD methods for supervised learning, DD for self-supervised pre-training of deep models has remained unaddressed. Pre-training on unlabeled data is crucial for efficiently generalizing to downstream tasks with limited labeled data. In this work, we propose the first effective DD method for SSL pre-training. First, we show, theoretically and empirically, that naive application of supervised DD methods to SSL fails, due to the high variance of the SSL gradient. Then, we address this issue by relying on insights from knowledge distillation (KD) literature. Specifically, we train a small student model to match the representations of a larger teacher model trained with SSL. Then, we generate a small synthetic dataset by matching the training trajectories of the student models. As the KD objective has considerably lower variance than SSL, our approach can generate synthetic datasets that can successfully pre-train high-quality encoders. Through extensive experiments, we show that our distilled sets lead to up to 13% higher accuracy than prior work, on a variety of downstream tasks, in the presence of limited labeled data. Code at https://github.com/BigML-CS-UCLA/MKDT.

URLs: https://github.com/BigML-CS-UCLA/MKDT.

replace A Formal Framework for Understanding Length Generalization in Transformers

Authors: Xinting Huang, Andy Yang, Satwik Bhattamishra, Yash Sarrof, Andreas Krebs, Hattie Zhou, Preetum Nakkiran, Michael Hahn

Abstract: A major challenge for transformers is generalizing to sequences longer than those observed during training. While previous works have empirically shown that transformers can either succeed or fail at length generalization depending on the task, theoretical understanding of this phenomenon remains limited. In this work, we introduce a rigorous theoretical framework to analyze length generalization in causal transformers with learnable absolute positional encodings. In particular, we characterize those functions that are identifiable in the limit from sufficiently long inputs with absolute positional encodings under an idealized inference scheme using a norm-based regularizer. This enables us to prove the possibility of length generalization for a rich family of problems. We experimentally validate the theory as a predictor of success and failure of length generalization across a range of algorithmic and formal language tasks. Our theory not only explains a broad set of empirical observations but also opens the way to provably predicting length generalization capabilities in transformers.

replace Diffusion State-Guided Projected Gradient for Inverse Problems

Authors: Rayhan Zirvi, Bahareh Tolooshams, Anima Anandkumar

Abstract: Recent advancements in diffusion models have been effective in learning data priors for solving inverse problems. They leverage diffusion sampling steps for inducing a data prior while using a measurement guidance gradient at each step to impose data consistency. For general inverse problems, approximations are needed when an unconditionally trained diffusion model is used since the measurement likelihood is intractable, leading to inaccurate posterior sampling. In other words, due to their approximations, these methods fail to preserve the generation process on the data manifold defined by the diffusion prior, leading to artifacts in applications such as image restoration. To enhance the performance and robustness of diffusion models in solving inverse problems, we propose Diffusion State-Guided Projected Gradient (DiffStateGrad), which projects the measurement gradient onto a subspace that is a low-rank approximation of an intermediate state of the diffusion process. DiffStateGrad, as a module, can be added to a wide range of diffusion-based inverse solvers to improve the preservation of the diffusion process on the prior manifold and filter out artifact-inducing components. We highlight that DiffStateGrad improves the robustness of diffusion models in terms of the choice of measurement guidance step size and noise while improving the worst-case performance. Finally, we demonstrate that DiffStateGrad improves upon the state-of-the-art on linear and nonlinear image restoration inverse problems. Our code is available at https://github.com/Anima-Lab/DiffStateGrad.

URLs: https://github.com/Anima-Lab/DiffStateGrad.

replace Scalable Mechanistic Neural Networks for Differential Equations and Machine Learning

Authors: Jiale Chen, Dingling Yao, Adeel Pervez, Dan Alistarh, Francesco Locatello

Abstract: We propose Scalable Mechanistic Neural Network (S-MNN), an enhanced neural network framework designed for scientific machine learning applications involving long temporal sequences. By reformulating the original Mechanistic Neural Network (MNN) (Pervez et al., 2024), we reduce the computational time and space complexities from cubic and quadratic with respect to the sequence length, respectively, to linear. This significant improvement enables efficient modeling of long-term dynamics without sacrificing accuracy or interpretability. Extensive experiments demonstrate that S-MNN matches the original MNN in precision while substantially reducing computational resources. Consequently, S-MNN can drop-in replace the original MNN in applications, providing a practical and efficient tool for integrating mechanistic bottlenecks into neural network models of complex dynamical systems. Source code is available at https://github.com/IST-DASLab/ScalableMNN.

URLs: https://github.com/IST-DASLab/ScalableMNN.

replace MTL-LoRA: Low-Rank Adaptation for Multi-Task Learning

Authors: Yaming Yang, Dilxat Muhtar, Yelong Shen, Yuefeng Zhan, Jianfeng Liu, Yujing Wang, Hao Sun, Denvy Deng, Feng Sun, Qi Zhang, Weizhu Chen, Yunhai Tong

Abstract: Parameter-efficient fine-tuning (PEFT) has been widely employed for domain adaptation, with LoRA being one of the most prominent methods due to its simplicity and effectiveness. However, in multi-task learning (MTL) scenarios, LoRA tends to obscure the distinction between tasks by projecting sparse high-dimensional features from different tasks into the same dense low-dimensional intrinsic space. This leads to task interference and suboptimal performance for LoRA and its variants. To tackle this challenge, we propose MTL-LoRA, which retains the advantages of low-rank adaptation while significantly enhancing MTL capabilities. MTL-LoRA augments LoRA by incorporating additional task-adaptive parameters that differentiate task-specific information and capture shared knowledge across various tasks within low-dimensional spaces. This approach enables pre-trained models to jointly adapt to different target domains with a limited number of trainable parameters. Comprehensive experimental results, including evaluations on public academic benchmarks for natural language understanding, commonsense reasoning, and image-text understanding, as well as real-world industrial text Ads relevance datasets, demonstrate that MTL-LoRA outperforms LoRA and its various variants with comparable or even fewer learnable parameters in MTL setting.

replace Mixture of Experts Made Personalized: Federated Prompt Learning for Vision-Language Models

Authors: Jun Luo, Chen Chen, Shandong Wu

Abstract: Federated prompt learning benefits federated learning with CLIP-like Vision-Language Model's (VLM's) robust representation learning ability through prompt learning. However, current federated prompt learning methods are habitually restricted to the traditional FL paradigm, where the participating clients are generally only allowed to download a single globally aggregated model from the server. While justifiable for training full-sized models under federated settings, in this work, we argue that this paradigm is ill-suited for lightweight prompts. By facilitating the clients to download multiple pre-aggregated prompts as fixed non-local experts, we propose Personalized Federated Mixture of Adaptive Prompts (pFedMoAP), a novel FL framework that personalizes the prompt learning process through the lens of Mixture of Experts (MoE). pFedMoAP implements a local attention-based gating network that learns to generate enhanced text features for better alignment with local image data, benefiting from both local and downloaded non-local adaptive prompt experts. Extensive experiments on 9 datasets under various federated settings demonstrate the efficacy of the proposed pFedMoAP algorithm. The code is available at https://github.com/ljaiverson/pFedMoAP.

URLs: https://github.com/ljaiverson/pFedMoAP.

replace Improving Vector-Quantized Image Modeling with Latent Consistency-Matching Diffusion

Authors: Bac Nguyen, Chieh-Hsin Lai, Yuhta Takida, Naoki Murata, Toshimitsu Uesaka, Stefano Ermon, Yuki Mitsufuji

Abstract: By embedding discrete representations into a continuous latent space, we can leverage continuous-space latent diffusion models to handle generative modeling of discrete data. However, despite their initial success, most latent diffusion methods rely on fixed pretrained embeddings, limiting the benefits of joint training with the diffusion model. While jointly learning the embedding (via reconstruction loss) and the latent diffusion model (via score matching loss) could enhance performance, end-to-end training risks embedding collapse, degrading generation quality. To mitigate this issue, we introduce VQ-LCMD, a continuous-space latent diffusion framework within the embedding space that stabilizes training. VQ-LCMD uses a novel training objective combining the joint embedding-diffusion variational lower bound with a consistency-matching (CM) loss, alongside a shifted cosine noise schedule and random dropping strategy. Experiments on several benchmarks show that the proposed VQ-LCMD yields superior results on FFHQ, LSUN Churches, and LSUN Bedrooms compared to discrete-state latent diffusion models. In particular, VQ-LCMD achieves an FID of 6.81 for class-conditional image generation on ImageNet with 50 steps.

replace Optimizing Posterior Samples for Bayesian Optimization via Rootfinding

Authors: Taiwo A. Adebiyi, Bach Do, Ruda Zhang

Abstract: Bayesian optimization devolves the global optimization of a costly objective function to the global optimization of a sequence of acquisition functions. This inner-loop optimization can be catastrophically difficult if it involves posterior sample paths, especially in higher dimensions. We introduce an efficient global optimization strategy for posterior samples based on global rootfinding. It provides gradient-based optimizers with two sets of judiciously selected starting points, designed to combine exploration and exploitation. The number of starting points can be kept small without sacrificing optimization quality. Remarkably, even with just one point from each set, the global optimum is discovered most of the time. The algorithm scales practically linearly to high dimensions, breaking the curse of dimensionality. For Gaussian process Thompson sampling (GP-TS), we demonstrate remarkable improvement in both inner- and outer-loop optimization, surprisingly outperforming alternatives like EI and GP-UCB in most cases. Our approach also improves the performance of other posterior sample-based acquisition functions, such as variants of entropy search. Furthermore, we propose a sample-average formulation of GP-TS, which has a parameter to explicitly control exploitation and can be computed at the cost of one posterior sample. Our implementation is available at https://github.com/UQUH/TSRoots .

URLs: https://github.com/UQUH/TSRoots

replace Conditional Variable Flow Matching: Transforming Conditional Densities with Amortized Conditional Optimal Transport

Authors: Adam P. Generale, Andreas E. Robertson, Surya R. Kalidindi

Abstract: Forecasting conditional stochastic nonlinear dynamical systems is a fundamental challenge repeatedly encountered across the biological and physical sciences. While flow-based models can impressively predict the temporal evolution of probability distributions representing possible outcomes of a specific process, existing frameworks cannot satisfactorily account for the impact of conditioning variables on these dynamics. Amongst several limitations, existing methods require training data with paired conditions and are developed for discrete conditioning variables. We propose Conditional Variable Flow Matching (CVFM), a framework for learning flows transforming conditional distributions with amortization across continuous conditioning variables - permitting predictions across the conditional density manifold. This is accomplished through several novel advances. In particular, simultaneous sample conditioned flows over the main and conditioning variables, alongside a conditional Wasserstein distance combined with a loss reweighting kernel facilitating conditional optimal transport. Collectively, these advances allow for learning system dynamics provided measurement data whose states and conditioning variables are not in correspondence. We demonstrate CVFM on a suite of increasingly challenging problems, including discrete and continuous conditional mapping benchmarks, image-to-image domain transfer, and modeling the temporal evolution of materials internal structure during manufacturing processes. We observe that CVFM results in improved performance and convergence characteristics over alternative conditional variants.

replace Features that Make a Difference: Leveraging Gradients for Improved Dictionary Learning

Authors: Jeffrey Olmo, Jared Wilson, Max Forsey, Bryce Hepner, Thomas Vin Howe, David Wingate

Abstract: Sparse Autoencoders (SAEs) are a promising approach for extracting neural network representations by learning a sparse and overcomplete decomposition of the network's internal activations. However, SAEs are traditionally trained considering only activation values and not the effect those activations have on downstream computations. This limits the information available to learn features, and biases the autoencoder towards neglecting features which are represented with small activation values but strongly influence model outputs. To address this, we introduce Gradient SAEs (g-SAEs), which modify the $k$-sparse autoencoder architecture by augmenting the TopK activation function to rely on the gradients of the input activation when selecting the $k$ elements. For a given sparsity level, g-SAEs produce reconstructions that are more faithful to original network performance when propagated through the network. Additionally, we find evidence that g-SAEs learn latents that are on average more effective at steering models in arbitrary contexts. By considering the downstream effects of activations, our approach leverages the dual nature of neural network features as both $\textit{representations}$, retrospectively, and $\textit{actions}$, prospectively. While previous methods have approached the problem of feature discovery primarily focused on the former aspect, g-SAEs represent a step towards accounting for the latter as well.

replace UniFlow: A Foundation Model for Unified Urban Spatio-Temporal Flow Prediction

Authors: Yuan Yuan, Jingtao Ding, Chonghua Han, Zhi Sheng, Depeng Jin, Yong Li

Abstract: Urban spatio-temporal flow prediction, encompassing traffic flows and crowd flows, is crucial for optimizing city infrastructure and managing traffic and emergency responses. Traditional approaches have relied on separate models tailored to either grid-based data, representing cities as uniform cells, or graph-based data, modeling cities as networks of nodes and edges. In this paper, we build UniFlow, a foundational model for general urban flow prediction that unifies both grid-based and graphbased data. We first design a multi-view spatio-temporal patching mechanism to standardize different data into a consistent sequential format and then introduce a spatio-temporal transformer architecture to capture complex correlations and dynamics. To leverage shared spatio-temporal patterns across different data types and facilitate effective cross-learning, we propose SpatioTemporal Memory Retrieval Augmentation (ST-MRA). By creating structured memory modules to store shared spatio-temporal patterns, ST-MRA enhances predictions through adaptive memory retrieval. Extensive experiments demonstrate that UniFlow outperforms existing models in both grid-based and graph-based flow prediction, excelling particularly in scenarios with limited data availability, showcasing its superior performance and broad applicability. The datasets and code implementation have been released on https://github.com/YuanYuan98/UniFlow.

URLs: https://github.com/YuanYuan98/UniFlow.

replace Robust Bayesian Optimization via Localized Online Conformal Prediction

Authors: Dongwon Kim, Matteo Zecchin, Sangwoo Park, Joonhyuk Kang, Osvaldo Simeone

Abstract: Bayesian optimization (BO) is a sequential approach for optimizing black-box objective functions using zeroth-order noisy observations. In BO, Gaussian processes (GPs) are employed as probabilistic surrogate models to estimate the objective function based on past observations, guiding the selection of future queries to maximize utility. However, the performance of BO heavily relies on the quality of these probabilistic estimates, which can deteriorate significantly under model misspecification. To address this issue, we introduce localized online conformal prediction-based Bayesian optimization (LOCBO), a BO algorithm that calibrates the GP model through localized online conformal prediction (CP). LOCBO corrects the GP likelihood based on predictive sets produced by LOCBO, and the corrected GP likelihood is then denoised to obtain a calibrated posterior distribution on the objective function. The likelihood calibration step leverages an input-dependent calibration threshold to tailor coverage guarantees to different regions of the input space. Under minimal noise assumptions, we provide theoretical performance guarantees for LOCBO's iterates that hold for the unobserved objective function. These theoretical findings are validated through experiments on synthetic and real-world optimization tasks, demonstrating that LOCBO consistently outperforms state-of-the-art BO algorithms in the presence of model misspecification.

replace TOBUGraph: Knowledge Graph-Based Retrieval for Enhanced LLM Performance Beyond RAG

Authors: Savini Kashmira, Jayanaka L. Dantanarayana, Joshua Brodsky, Ashish Mahendra, Yiping Kang, Krisztian Flautner, Lingjia Tang, Jason Mars

Abstract: Retrieval-Augmented Generation (RAG) is one of the leading and most widely used techniques for enhancing LLM retrieval capabilities, but it still faces significant limitations in commercial use cases. RAG primarily relies on the query-chunk text-to-text similarity in the embedding space for retrieval and can fail to capture deeper semantic relationships across chunks, is highly sensitive to chunking strategies, and is prone to hallucinations. To address these challenges, we propose TOBUGraph, a graph-based retrieval framework that first constructs the knowledge graph from unstructured data dynamically and automatically. Using LLMs, TOBUGraph extracts structured knowledge and diverse relationships among data, going beyond RAG's text-to-text similarity. Retrieval is achieved through graph traversal, leveraging the extracted relationships and structures to enhance retrieval accuracy, eliminating the need for chunking configurations while reducing hallucination. We demonstrate TOBUGraph's effectiveness in TOBU, a real-world application in production for personal memory organization and retrieval. Our evaluation using real user data demonstrates that TOBUGraph outperforms multiple RAG implementations in both precision and recall, significantly improving user experience through improved retrieval accuracy.

replace Identifying Predictions That Influence the Future: Detecting Performative Concept Drift in Data Streams

Authors: Brandon Gower-Winter, Georg Krempl, Sergey Dragomiretskiy, Tineke Jelsma, Arno Siebes

Abstract: Concept Drift has been extensively studied within the context of Stream Learning. However, it is often assumed that the deployed model's predictions play no role in the concept drift the system experiences. Closer inspection reveals that this is not always the case. Automated trading might be prone to self-fulfilling feedback loops. Likewise, malicious entities might adapt to evade detectors in the adversarial setting resulting in a self-negating feedback loop that requires the deployed models to constantly retrain. Such settings where a model may induce concept drift are called performative. In this work, we investigate this phenomenon. Our contributions are as follows: First, we define performative drift within a stream learning setting and distinguish it from other causes of drift. We introduce a novel type of drift detection task, aimed at identifying potential performative concept drift in data streams. We propose a first such performative drift detection approach, called CheckerBoard Performative Drift Detection (CB-PDD). We apply CB-PDD to both synthetic and semi-synthetic datasets that exhibit varying degrees of self-fulfilling feedback loops. Results are positive with CB-PDD showing high efficacy, low false detection rates, resilience to intrinsic drift, comparability to other drift detection techniques, and an ability to effectively detect performative drift in semi-synthetic datasets. Secondly, we highlight the role intrinsic (traditional) drift plays in obfuscating performative drift and discuss the implications of these findings as well as the limitations of CB-PDD.

replace Optimization Insights into Deep Diagonal Linear Networks

Authors: Hippolyte Labarri\`ere, Cesare Molinari, Lorenzo Rosasco, Silvia Villa, Cristian Vega

Abstract: Overparameterized models trained with (stochastic) gradient descent are ubiquitous in modern machine learning. These large models achieve unprecedented performance on test data, but their theoretical understanding is still limited. In this paper, we take a step towards filling this gap by adopting an optimization perspective. More precisely, we study the implicit regularization properties of the gradient flow "algorithm" for estimating the parameters of a deep diagonal neural network. Our main contribution is showing that this gradient flow induces a mirror flow dynamic on the model, meaning that it is biased towards a specific solution of the problem depending on the initialization of the network. Along the way, we prove several properties of the trajectory.

replace Balls-and-Bins Sampling for DP-SGD

Authors: Lynn Chua, Badih Ghazi, Charlie Harrison, Ethan Leeman, Pritish Kamath, Ravi Kumar, Pasin Manurangsi, Amer Sinha, Chiyuan Zhang

Abstract: We introduce the Balls-and-Bins sampling for differentially private (DP) optimization methods such as DP-SGD. While it has been common practice to use some form of shuffling in DP-SGD implementations, privacy accounting algorithms have typically assumed that Poisson subsampling is used instead. Recent work by Chua et al. (ICML 2024), however, pointed out that shuffling based DP-SGD can have a much larger privacy cost in practical regimes of parameters. In this work we show that the Balls-and-Bins sampling achieves the "best-of-both" samplers, namely, the implementation of Balls-and-Bins sampling is similar to that of Shuffling and models trained using DP-SGD with Balls-and-Bins sampling achieve utility comparable to those trained using DP-SGD with Shuffling at the same noise multiplier, and yet, Balls-and-Bins sampling enjoys similar-or-better privacy amplification as compared to Poisson subsampling in practical regimes.

replace Provably-Safe Neural Network Training Using Hybrid Zonotope Reachability Analysis

Authors: Long Kiu Chung, Shreyas Kousik

Abstract: Even though neural networks are being increasingly deployed in safety-critical control applications, it remains difficult to enforce constraints on their output, meaning that it is hard to guarantee safety in such settings. While many existing methods seek to verify a neural network's satisfaction of safety constraints, few address how to correct an unsafe network. The handful of works that extract a training signal from verification cannot handle non-convex sets, and are either conservative or slow. To begin addressing these challenges, this work proposes a neural network training method that can encourage the exact image of a non-convex input set for a neural network with rectified linear unit (ReLU) nonlinearities to avoid a non-convex unsafe region. This is accomplished by reachability analysis with scaled hybrid zonotopes, a modification of the existing hybrid zonotope set representation that enables parameterized scaling of non-convex polytopic sets with a differentiable collision check via mixed-integer linear programs (MILPs). The proposed method was shown to be effective and fast for networks with up to 240 neurons, with the computational complexity dominated by inverse operations on matrices that scale linearly in size with the number of neurons and complexity of input and unsafe sets. We demonstrate the practicality of our method by training a forward-invariant neural network controller for a non-convex input set to an affine system, as well as generating safe reach-avoid plans for a black-box dynamical system.

replace Sparse identification of nonlinear dynamics and Koopman operators with Shallow Recurrent Decoder Networks

Authors: Mars Liyao Gao, Jan P. Williams, J. Nathan Kutz

Abstract: Modeling real-world spatio-temporal data is exceptionally difficult due to inherent high dimensionality, measurement noise, partial observations, and often expensive data collection procedures. In this paper, we present Sparse Identification of Nonlinear Dynamics with SHallow REcurrent Decoder networks (SINDy-SHRED), a method to jointly solve the sensing and model identification problems with simple implementation, efficient computation, and robust performance. SINDy-SHRED uses Gated Recurrent Units to model the temporal sequence of sparse sensor measurements along with a shallow decoder network to reconstruct the full spatio-temporal field from the latent state space. Our algorithm introduces a SINDy-based regularization for which the latent space progressively converges to a SINDy-class functional, provided the projection remains within the set. In restricting SINDy to a linear model, a Koopman-SHRED model is generated. SINDy-SHRED (i) learns a symbolic and interpretable generative model of a parsimonious and low-dimensional latent space for the complex spatio-temporal dynamics, (ii) discovers new physics models even for well-known physical systems, (iii) achieves provably robust convergence with an observed globally convex loss landscape, and (iv) achieves superior accuracy, data efficiency, and training time, all with fewer model parameters. We conduct systematic experimental studies on PDE data such as turbulent flows, real-world sensor measurements for sea surface temperature, and direct video data. The interpretable SINDy and Koopman models of latent state dynamics enable stable and accurate long-term video predictions, outperforming all current baseline deep learning models in accuracy, training time, and data requirements, including Convolutional LSTM, PredRNN, ResNet, and SimVP.

replace ZETA: Leveraging Z-order Curves for Efficient Top-k Attention

Authors: Qiuhao Zeng, Jerry Huang, Peng Lu, Gezheng Xu, Boxing Chen, Charles Ling, Boyu Wang

Abstract: Over recent years, the Transformer has become a fundamental building block for sequence modeling architectures. Yet at its core is the use of self-attention, whose memory and computational cost grow quadratically with the sequence length $N$, rendering it prohibitively expensive for long sequences. A promising approach is top-$k$ attention, which selects only the $k$ most relevant tokens and achieves performance comparable to vanilla self-attention while significantly reducing space and computational demands. However, causal masks require the current query token to only attend to past tokens, preventing the existing top-$k$ attention method from efficiently searching for the most relevant tokens in parallel, thereby limiting training efficiency. In this work, we propose ZETA, leveraging \textbf{Z}-Order Curves for \textbf{E}fficient \textbf{T}op-$k$ \textbf{A}ttention, to enable parallel querying of past tokens for entire sequences. % in both space and time complexity of $\mathcal{O}(N \log N)$. We first theoretically show that the choice of key and query dimensions involves a trade-off between the curse of dimensionality and the preservation of relative distances after projection. In light of this insight, we propose reducing the dimensionality of keys and queries in contrast to values and further leverage $Z$-order curves to map low-dimensional keys and queries into \emph{one}-dimensional space, which permits parallel sorting, thereby largely improving the efficiency for top-$k$ token selection. Experimental results demonstrate that ZETA matches the performance of standard attention on the synthetic \textsc{Multi-Query Associative Recall} task and outperforms attention and its variants on \textsc{Long Range Arena} and \textsc{WikiText-103} language modeling.

replace Leveraging Joint Predictive Embedding and Bayesian Inference in Graph Self Supervised Learning

Authors: Srinitish Srinivasan, Omkumar CU

Abstract: Graph representation learning has emerged as a cornerstone for tasks like node classification and link prediction, yet prevailing self-supervised learning (SSL) methods face challenges such as computational inefficiency, reliance on contrastive objectives, and representation collapse. Existing approaches often depend on feature reconstruction, negative sampling, or complex decoders, which introduce training overhead and hinder generalization. Further, current techniques which address such limitations fail to account for the contribution of node embeddings to a certain prediction in the absence of labeled nodes. To address these limitations, we propose a novel joint embedding predictive framework for graph SSL that eliminates contrastive objectives and negative sampling while preserving semantic and structural information. Additionally, we introduce a semantic-aware objective term that incorporates pseudo-labels derived from Gaussian Mixture Models (GMMs), enhancing node discriminability by evaluating latent feature contributions. Extensive experiments demonstrate that our framework outperforms state-of-the-art graph SSL methods across benchmarks, achieving superior performance without contrastive loss or complex decoders. Key innovations include (1) a non-contrastive, view-invariant joint embedding predictive architecture, (2) Leveraging single context and multiple targets relationship between subgraphs, and (3) GMM-based pseudo-label scoring to capture semantic contributions. This work advances graph SSL by offering a computationally efficient, collapse-resistant paradigm that bridges spatial and semantic graph features for downstream tasks. The code for our paper can be found at https://github.com/Deceptrax123/JPEB-GSSL

URLs: https://github.com/Deceptrax123/JPEB-GSSL

replace AnyTouch: Learning Unified Static-Dynamic Representation across Multiple Visuo-tactile Sensors

Authors: Ruoxuan Feng, Jiangyu Hu, Wenke Xia, Tianci Gao, Ao Shen, Yuhao Sun, Bin Fang, Di Hu

Abstract: Visuo-tactile sensors aim to emulate human tactile perception, enabling robots to precisely understand and manipulate objects. Over time, numerous meticulously designed visuo-tactile sensors have been integrated into robotic systems, aiding in completing various tasks. However, the distinct data characteristics of these low-standardized visuo-tactile sensors hinder the establishment of a powerful tactile perception system. We consider that the key to addressing this issue lies in learning unified multi-sensor representations, thereby integrating the sensors and promoting tactile knowledge transfer between them. To achieve unified representation of this nature, we introduce TacQuad, an aligned multi-modal multi-sensor tactile dataset from four different visuo-tactile sensors, which enables the explicit integration of various sensors. Recognizing that humans perceive the physical environment by acquiring diverse tactile information such as texture and pressure changes, we further propose to learn unified multi-sensor representations from both static and dynamic perspectives. By integrating tactile images and videos, we present AnyTouch, a unified static-dynamic multi-sensor representation learning framework with a multi-level structure, aimed at both enhancing comprehensive perceptual abilities and enabling effective cross-sensor transfer. This multi-level architecture captures pixel-level details from tactile data via masked modeling and enhances perception and transferability by learning semantic-level sensor-agnostic features through multi-modal alignment and cross-sensor matching. We provide a comprehensive analysis of multi-sensor transferability, and validate our method on various datasets and in the real-world pouring task. Experimental results show that our method outperforms existing methods, exhibits outstanding static and dynamic perception capabilities across various sensors.

replace On Creating a Causally Grounded Usable Rating Method for Assessing the Robustness of Foundation Models Supporting Time Series

Authors: Kausik Lakkaraju, Rachneet Kaur, Parisa Zehtabi, Sunandita Patra, Siva Likitha Valluru, Zhen Zeng, Biplav Srivastava, Marco Valtorta

Abstract: Foundation Models (FMs) have improved time series forecasting in various sectors, such as finance, but their vulnerability to input disturbances can hinder their adoption by stakeholders, such as investors and analysts. To address this, we propose a causally grounded rating framework to study the robustness of Foundational Models for Time Series (FMTS) with respect to input perturbations. We evaluate our approach to the stock price prediction problem, a well-studied problem with easily accessible public data, evaluating six state-of-the-art (some multi-modal) FMTS across six prominent stocks spanning three industries. The ratings proposed by our framework effectively assess the robustness of FMTS and also offer actionable insights for model selection and deployment. Within the scope of our study, we find that (1) multi-modal FMTS exhibit better robustness and accuracy compared to their uni-modal versions and, (2) FMTS pre-trained on time series forecasting task exhibit better robustness and forecasting accuracy compared to general-purpose FMTS pre-trained across diverse settings. Further, to validate our framework's usability, we conduct a user study showcasing FMTS prediction errors along with our computed ratings. The study confirmed that our ratings reduced the difficulty for users in comparing the robustness of different systems.

replace FedORGP: Guiding Heterogeneous Federated Learning with Orthogonality Regularization on Global Prototypes

Authors: Fucheng Guo, Zeyu Luan, Qing Li, Dan Zhao, Yong Jiang

Abstract: Federated Learning (FL) has emerged as an essential framework for distributed machine learning, especially with its potential for privacy-preserving data processing. However, existing FL frameworks struggle to address statistical and model heterogeneity, which severely impacts model performance. While Heterogeneous Federated Learning (HtFL) introduces prototype-based strategies to address the challenges, current approaches face limitations in achieving optimal separation of prototypes. This paper presents FedORGP, a novel HtFL algorithm designed to improve global prototype separation through orthogonality regularization, which not only encourages intra-class prototype similarity but also significantly expands the inter-class angular separation. With the guidance of the global prototype, each client keeps its embeddings aligned with the corresponding prototype in the feature space, promoting directional independence that integrates seamlessly with the cross-entropy (CE) loss. We provide theoretical proof of FedORGP's convergence under non-convex conditions. Extensive experiments demonstrate that FedORGP outperforms seven state-of-the-art baselines, achieving up to 10.12\% accuracy improvement in scenarios where statistical and model heterogeneity coexist.

replace Class-Dependent Perturbation Effects in Evaluating Time Series Attributions

Authors: Gregor Baer, Isel Grau, Chao Zhang, Pieter Van Gorp

Abstract: As machine learning models become increasingly prevalent in time series applications, Explainable Artificial Intelligence (XAI) methods are essential for understanding their predictions. Within XAI, feature attribution methods aim to identify which input features contribute the most to a model's prediction, with their evaluation typically relying on perturbation-based metrics. Through systematic empirical analysis across multiple datasets, model architectures, and perturbation strategies, we reveal previously overlooked class-dependent effects in these metrics: they show varying effectiveness across classes, achieving strong results for some while remaining less sensitive to others. In particular, we find that the most effective perturbation strategies often demonstrate the most pronounced class differences. Our analysis suggests that these effects arise from the learned biases of classifiers, indicating that perturbation-based evaluation may reflect specific model behaviors rather than intrinsic attribution quality. We propose an evaluation framework with a class-aware penalty term to help assess and account for these effects in evaluating feature attributions, offering particular value for class-imbalanced datasets. Although our analysis focuses on time series classification, these class-dependent effects likely extend to other structured data domains where perturbation-based evaluation is common.

replace Forgetting Transformer: Softmax Attention with a Forget Gate

Authors: Zhixuan Lin, Evgenii Nikishin, Xu Owen He, Aaron Courville

Abstract: An essential component of modern recurrent sequence models is the forget gate. While Transformers do not have an explicit recurrent form, we show that a forget gate can be naturally incorporated into Transformers by down-weighting the unnormalized attention scores in a data-dependent way. We name this attention mechanism Forgetting Attention and the resulting model the Forgetting Transformer (FoX). We show that FoX outperforms the Transformer on long-context language modeling, length extrapolation, and short-context downstream tasks, while performing on par with the Transformer on long-context downstream tasks. Moreover, it is compatible with the FlashAttention algorithm and does not require any positional embeddings. Several analyses, including the needle-in-the-haystack test, show that FoX also retains the Transformer's superior long-context capabilities over recurrent sequence models such as Mamba-2, HGRN2, and DeltaNet. We also introduce a "Pro" block design that incorporates some common architectural components in recurrent sequence models and find it significantly improves the performance of both FoX and the Transformer. Our code is available at https://github.com/zhixuan-lin/forgetting-transformer.

URLs: https://github.com/zhixuan-lin/forgetting-transformer.

replace ExMAG: Learning of Maximally Ancestral Graphs

Authors: Petr Ry\v{s}av\'y, Pavel Ryt\'i\v{r}, Xiaoyu He, Georgios Korpas, Jakub Mare\v{c}ek

Abstract: As one transitions from statistical to causal learning, one is seeking the most appropriate causal model. Dynamic Bayesian networks are a popular model, where a weighted directed acyclic graph represents the causal relationships. Stochastic processes are represented by its vertices, and weighted oriented edges suggest the strength of the causal relationships. When there are confounders, one would like to utilize both oriented edges (when the direction of causality is clear) and edges that are not oriented (when there is a confounder or not a relationship), yielding mixed graphs. A little-studied extension of acyclicity to this mixed-graph setting is known as maximally ancestral graphs with consideration of confounders. We propose a score-based learning algorithm for learning maximally ancestral graphs. A mixed-integer quadratic program is formulated, and an algorithmic approach is proposed, in which the pre-generation of exponentially many constraints is avoided by generating only violated constraints in the so-called branch-and-cut (``lazy constraint'') method. Comparing the novel approach to the state-of-the-art, we show that the proposed approach turns out to produce more accurate results when applied to small and medium-sized synthetic instances containing up to 25 variables.

replace Don't lie to your friends: Learning what you know from collaborative self-play

Authors: Jacob Eisenstein, Reza Aghajani, Adam Fisch, Dheeru Dua, Fantine Huot, Mirella Lapata, Vicky Zayats, Jonathan Berant

Abstract: To be helpful assistants, AI agents must be aware of their own capabilities and limitations. This includes knowing when to answer from parametric knowledge versus using tools, when to trust tool outputs, and when to abstain or hedge. Such capabilities are hard to teach through supervised fine-tuning because they require constructing examples that reflect the agent's specific capabilities. We therefore propose a radically new approach to teaching agents what they know: \emph{collaborative self-play}. We construct multi-agent collaborations in which the group is rewarded for collectively arriving at correct answers. The desired meta-knowledge emerges from the incentives built into the structure of the interaction. We focus on small societies of agents that have access to heterogeneous tools (corpus-specific retrieval), and therefore must collaborate to maximize their success while minimizing their effort. Experiments show that group-level rewards for multi-agent communities can induce policies that \emph{transfer} to improve tool use and selective prediction in settings where individual agents are deployed in isolation.

replace Lie Detector: Unified Backdoor Detection via Cross-Examination Framework

Authors: Xuan Wang, Siyuan Liang, Dongping Liao, Han Fang, Aishan Liu, Xiaochun Cao, Yu-liang Lu, Ee-Chien Chang, Xitong Gao

Abstract: Institutions with limited data and computing resources often outsource model training to third-party providers in a semi-honest setting, assuming adherence to prescribed training protocols with pre-defined learning paradigm (e.g., supervised or semi-supervised learning). However, this practice can introduce severe security risks, as adversaries may poison the training data to embed backdoors into the resulting model. Existing detection approaches predominantly rely on statistical analyses, which often fail to maintain universally accurate detection accuracy across different learning paradigms. To address this challenge, we propose a unified backdoor detection framework in the semi-honest setting that exploits cross-examination of model inconsistencies between two independent service providers. Specifically, we integrate central kernel alignment to enable robust feature similarity measurements across different model architectures and learning paradigms, thereby facilitating precise recovery and identification of backdoor triggers. We further introduce backdoor fine-tuning sensitivity analysis to distinguish backdoor triggers from adversarial perturbations, substantially reducing false positives. Extensive experiments demonstrate that our method achieves superior detection performance, improving accuracy by 5.4%, 1.6%, and 11.9% over SoTA baselines across supervised, semi-supervised, and autoregressive learning tasks, respectively. Notably, it is the first to effectively detect backdoors in multimodal large language models, further highlighting its broad applicability and advancing secure deep learning.

replace Severing Spurious Correlations with Data Pruning

Authors: Varun Mulchandani, Jung-Eun Kim

Abstract: Deep neural networks have been shown to learn and rely on spurious correlations present in the data that they are trained on. Reliance on such correlations can cause these networks to malfunction when deployed in the real world, where these correlations may no longer hold. To overcome the learning of and reliance on such correlations, recent studies propose approaches that yield promising results. These works, however, study settings where the strength of the spurious signal is significantly greater than that of the core, invariant signal, making it easier to detect the presence of spurious features in individual training samples and allow for further processing. In this paper, we identify new settings where the strength of the spurious signal is relatively weaker, making it difficult to detect any spurious information while continuing to have catastrophic consequences. We also discover that spurious correlations are learned primarily due to only a handful of all the samples containing the spurious feature and develop a novel data pruning technique that identifies and prunes small subsets of the training data that contain these samples. Our proposed technique does not require inferred domain knowledge, information regarding the sample-wise presence or nature of spurious information, or human intervention. Finally, we show that such data pruning attains state-of-the-art performance on previously studied settings where spurious information is identifiable.

replace Statistically Testing Training Data for Unwanted Error Patterns using Rule-Oriented Regression

Authors: Stefan Rass, Martin Dallinger

Abstract: Artificial intelligence models trained from data can only be as good as the underlying data is. Biases in training data propagating through to the output of a machine learning model are a well-documented and well-understood phenomenon, but the machinery to prevent these undesired effects is much less developed. Efforts to ensure data is clean during collection, such as using bias-aware sampling, are most effective when the entity controlling data collection also trains the AI. In cases where the data is already available, how do we find out if the data was already manipulated, i.e., ``poisoned'', so that an undesired behavior would be trained into a machine learning model? This is a challenge fundamentally different to (just) improving approximation accuracy or efficiency, and we provide a method to test training data for flaws, to establish a trustworthy ground-truth for a subsequent training of machine learning models (of any kind). Unlike the well-studied problem of approximating data using fuzzy rules that are generated from the data, our method hinges on a prior definition of rules to happen before seeing the data to be tested. Therefore, the proposed method can also discover hidden error patterns, which may also have substantial influence. Our approach extends the abilities of conventional statistical testing by letting the ``test-condition'' be any Boolean condition to describe a pattern in the data, whose presence we wish to determine. The method puts fuzzy inference into a regression model, to get the best of the two: explainability from fuzzy logic with statistical properties and diagnostics from the regression, and finally also being applicable to ``small data'', hence not requiring large datasets as deep learning methods do. We provide an open source implementation for demonstration and experiments.

replace Lean Formalization of Generalization Error Bound by Rademacher Complexity

Authors: Sho Sonoda, Kazumi Kasaura, Yuma Mizuno, Kei Tsukamoto, Naoto Onda

Abstract: We formalize the generalization error bound using Rademacher complexity in the Lean 4 theorem prover. Generalization error quantifies the gap between a learning machine's performance on given training data versus unseen test data, and Rademacher complexity serves as an estimate of this error based on the complexity of learning machines, or hypothesis class. Unlike traditional methods such as PAC learning and VC dimension, Rademacher complexity is applicable across diverse machine learning scenarios including deep learning and kernel methods. We formalize key concepts and theorems, including the empirical and population Rademacher complexities, and establish generalization error bounds through formal proofs of McDiarmid's inequality, Hoeffding's lemma, and symmetrization arguments.

replace Innovative LSGTime Model for Crime Spatiotemporal Prediction Based on MindSpore Framework

Authors: Zhenkai Qin, BaoZhong Wei, Caifeng Gao

Abstract: With the acceleration of urbanization, the spatiotemporal characteristics of criminal activities have become increasingly complex. Accurate prediction of crime distribution is crucial for optimizing the allocation of police resources and preventing crime. This paper proposes LGSTime, a crime spatiotemporal prediction model that integrates Long Short-Term Memory (LSTM), Gated Recurrent Unit (GRU), and the Multi-head Sparse Self-attention mechanism. LSTM and GRU capture long-term dependencies in crime time series, such as seasonality and periodicity, through their unique gating mechanisms. The Multi-head Sparse Self-attention mechanism, on the other hand, focuses on both temporal and spatial features of criminal events simultaneously through parallel processing and sparsification techniques, significantly improving computational efficiency and prediction accuracy. The integrated model leverages the strengths of each technique to better handle complex spatiotemporal data. Experimental findings demonstrate that the model attains optimal performance across four real - world crime datasets. In comparison to the CNN model, it exhibits performance enhancements of 2.8\%, 1.9\%, and 1.4\% in the Mean Squared Error (MSE), Mean Absolute Error (MAE), and Root Mean Squared Error (RMSE) metrics respectively. These results offer a valuable reference for tackling the challenges in crime prediction.

replace DC-SGD: Differentially Private SGD with Dynamic Clipping through Gradient Norm Distribution Estimation

Authors: Chengkun Wei, Weixian Li, Chen Gong, Wenzhi Chen

Abstract: Differentially Private Stochastic Gradient Descent (DP-SGD) is a widely adopted technique for privacy-preserving deep learning. A critical challenge in DP-SGD is selecting the optimal clipping threshold C, which involves balancing the trade-off between clipping bias and noise magnitude, incurring substantial privacy and computing overhead during hyperparameter tuning. In this paper, we propose Dynamic Clipping DP-SGD (DC-SGD), a framework that leverages differentially private histograms to estimate gradient norm distributions and dynamically adjust the clipping threshold C. Our framework includes two novel mechanisms: DC-SGD-P and DC-SGD-E. DC-SGD-P adjusts the clipping threshold based on a percentile of gradient norms, while DC-SGD-E minimizes the expected squared error of gradients to optimize C. These dynamic adjustments significantly reduce the burden of hyperparameter tuning C. The extensive experiments on various deep learning tasks, including image classification and natural language processing, show that our proposed dynamic algorithms achieve up to 9 times acceleration on hyperparameter tuning than DP-SGD. And DC-SGD-E can achieve an accuracy improvement of 10.62% on CIFAR10 than DP-SGD under the same privacy budget of hyperparameter tuning. We conduct rigorous theoretical privacy and convergence analyses, showing that our methods seamlessly integrate with the Adam optimizer. Our results highlight the robust performance and efficiency of DC-SGD, offering a practical solution for differentially private deep learning with reduced computational overhead and enhanced privacy guarantees.

replace Buyer-Initiated Auction Mechanism for Data Redemption in Machine Unlearning

Authors: Bin Han, Di Feng, Jie Wang, Hans D. Schotten

Abstract: The rapid growth of artificial intelligence (AI) has raised privacy concerns over user data, leading to regulations like the General Data Protection Regulation (GDPR) and the California Consumer Privacy Act (CCPA). With the essential toolbox provided by machine unlearning, AI service providers are now able to remove user data from their trained models as well as the training datasets, so as to comply with such regulations. However, extensive data redemption can be costly and degrade model accuracy. To balance the cost of unlearning and the privacy protection, we propose a buyer-initiated auction mechanism for data redemption, enabling the service provider to purchase data from willing users with appropriate compensation. This approach does not require the server to have any a priori knowledge about the users' privacy preference, and provides an efficient solution for maximizing the social welfare in the investigated problem.

replace Reasoning-SQL: Reinforcement Learning with SQL Tailored Partial Rewards for Reasoning-Enhanced Text-to-SQL

Authors: Mohammadreza Pourreza, Shayan Talaei, Ruoxi Sun, Xingchen Wan, Hailong Li, Azalia Mirhoseini, Amin Saberi, Sercan "O. Arik

Abstract: Text-to-SQL is a challenging task involving multiple reasoning-intensive subtasks, including natural language understanding, database schema comprehension, and precise SQL query formulation. Existing approaches often rely on handcrafted reasoning paths with inductive biases that can limit their overall effectiveness. Motivated by the recent success of reasoning-enhanced models such as DeepSeek R1 and OpenAI o1, which effectively leverage reward-driven self-exploration to enhance reasoning capabilities and generalization, we propose a novel set of partial rewards tailored specifically for the Text-to-SQL task. Our reward set includes schema-linking, AI feedback, n-gram similarity, and syntax check, explicitly designed to address the reward sparsity issue prevalent in reinforcement learning (RL). Leveraging group relative policy optimization (GRPO), our approach explicitly encourages large language models (LLMs) to develop intrinsic reasoning skills necessary for accurate SQL query generation. With models of different sizes, we demonstrate that RL-only training with our proposed rewards consistently achieves higher accuracy and superior generalization compared to supervised fine-tuning (SFT). Remarkably, our RL-trained 14B-parameter model significantly outperforms larger proprietary models, e.g. o3-mini by 4% and Gemini-1.5-Pro-002 by 3% on the BIRD benchmark. These highlight the efficacy of our proposed RL-training framework with partial rewards for enhancing both accuracy and reasoning capabilities in Text-to-SQL tasks.

replace A Survey on Unlearnable Data

Authors: Jiahao Li, Yiqiang Chen, Yunbing Xing, Yang Gu, Xiangyuan Lan

Abstract: Unlearnable data (ULD) has emerged as an innovative defense technique to prevent machine learning models from learning meaningful patterns from specific data, thus protecting data privacy and security. By introducing perturbations to the training data, ULD degrades model performance, making it difficult for unauthorized models to extract useful representations. Despite the growing significance of ULD, existing surveys predominantly focus on related fields, such as adversarial attacks and machine unlearning, with little attention given to ULD as an independent area of study. This survey fills that gap by offering a comprehensive review of ULD, examining unlearnable data generation methods, public benchmarks, evaluation metrics, theoretical foundations and practical applications. We compare and contrast different ULD approaches, analyzing their strengths, limitations, and trade-offs related to unlearnability, imperceptibility, efficiency and robustness. Moreover, we discuss key challenges, such as balancing perturbation imperceptibility with model degradation and the computational complexity of ULD generation. Finally, we highlight promising future research directions to advance the effectiveness and applicability of ULD, underscoring its potential to become a crucial tool in the evolving landscape of data protection in machine learning.

replace Time-Series Forecasting via Topological Information Supervised Framework with Efficient Topological Feature Learning

Authors: ZiXin Lin, Nur Fariha Syaqina Zulkepli

Abstract: Topological Data Analysis (TDA) has emerged as a powerful tool for extracting meaningful features from complex data structures, driving significant advancements in fields such as neuroscience, biology, machine learning, and financial modeling. Despite its success, the integration of TDA with time-series prediction remains underexplored due to three primary challenges: the limited utilization of temporal dependencies within topological features, computational bottlenecks associated with persistent homology, and the deterministic nature of TDA pipelines restricting generalized feature learning. This study addresses these challenges by proposing the Topological Information Supervised (TIS) Prediction framework, which leverages neural networks and Conditional Generative Adversarial Networks (CGANs) to generate synthetic topological features, preserving their distribution while significantly reducing computational time. We propose a novel training strategy that integrates topological consistency loss to improve the predictive accuracy of deep learning models. Specifically, we introduce two state-of-the-art models, TIS-BiGRU and TIS-Informer, designed to capture short-term and long-term temporal dependencies, respectively. Comparative experimental results demonstrate the superior performance of TIS models over conventional predictors, validating the effectiveness of integrating topological information. This work not only advances TDA-based time-series prediction but also opens new avenues for utilizing topological features in deep learning architectures.

replace When Counterfactual Reasoning Fails: Chaos and Real-World Complexity

Authors: Yahya Aalaila, Gerrit Gro{\ss}mann, Sumantrak Mukherjee, Jonas Wahl, Sebastian Vollmer

Abstract: Counterfactual reasoning, a cornerstone of human cognition and decision-making, is often seen as the 'holy grail' of causal learning, with applications ranging from interpreting machine learning models to promoting algorithmic fairness. While counterfactual reasoning has been extensively studied in contexts where the underlying causal model is well-defined, real-world causal modeling is often hindered by model and parameter uncertainty, observational noise, and chaotic behavior. The reliability of counterfactual analysis in such settings remains largely unexplored. In this work, we investigate the limitations of counterfactual reasoning within the framework of Structural Causal Models. Specifically, we empirically investigate \emph{counterfactual sequence estimation} and highlight cases where it becomes increasingly unreliable. We find that realistic assumptions, such as low degrees of model uncertainty or chaotic dynamics, can result in counterintuitive outcomes, including dramatic deviations between predicted and true counterfactual trajectories. This work urges caution when applying counterfactual reasoning in settings characterized by chaos and uncertainty. Furthermore, it raises the question of whether certain systems may pose fundamental limitations on the ability to answer counterfactual questions about their behavior.

replace Evaluating machine learning models for predicting pesticides toxicity to honey bees

Authors: Jakub Adamczyk, Jakub Poziemski, Pawel Siedlecki

Abstract: Small molecules play a critical role in the biomedical, environmental, and agrochemical domains, each with distinct physicochemical requirements and success criteria. Although biomedical research benefits from extensive datasets and established benchmarks, agrochemical data remain scarce, particularly with respect to species-specific toxicity. This work focuses on ApisTox, the most comprehensive dataset of experimentally validated chemical toxicity to the honey bee (Apis mellifera), an ecologically vital pollinator. We evaluate ApisTox using a diverse suite of machine learning approaches, including molecular fingerprints, graph kernels, and graph neural networks, as well as pretrained models. Comparative analysis with medicinal datasets from the MoleculeNet benchmark reveals that ApisTox represents a distinct chemical space. Performance degradation on non-medicinal datasets, such as ApisTox, demonstrates their limited generalizability of current state-of-the-art algorithms trained solely on biomedical data. Our study highlights the need for more diverse datasets and for targeted model development geared toward the agrochemical domain.

replace-cross Predicting human decisions with behavioral theories and machine learning

Authors: Ori Plonsky, Reut Apel, Eyal Ert, Moshe Tennenholtz, David Bourgin, Joshua C. Peterson, Daniel Reichman, Thomas L. Griffiths, Stuart J. Russell, Evan C. Carter, James F. Cavanagh, Ido Erev

Abstract: Predicting human decisions under risk and uncertainty remains a fundamental challenge across disciplines. Existing models often struggle even in highly stylized tasks like choice between lotteries. We introduce BEAST Gradient Boosting (BEAST-GB), a hybrid model integrating behavioral theory (BEAST) with machine learning. We first present CPC18, a competition for predicting risky choice, in which BEAST-GB won. Then, using two large datasets, we demonstrate BEAST-GB predicts more accurately than neural networks trained on extensive data and dozens of existing behavioral models. BEAST-GB also generalizes robustly across unseen experimental contexts, surpassing direct empirical generalization, and helps refine and improve the behavioral theory itself. Our analyses highlight the potential of anchoring predictions on behavioral theory even in data-rich settings and even when the theory alone falters. Our results underscore how integrating machine learning with theoretical frameworks, especially those-like BEAST-designed for prediction, can improve our ability to predict and understand human behavior.

replace-cross Safe Policy Learning through Extrapolation: Application to Pre-trial Risk Assessment

Authors: Eli Ben-Michael, D. James Greiner, Kosuke Imai, Zhichao Jiang

Abstract: Algorithmic recommendations and decisions have become ubiquitous in today's society. Many of these data-driven policies, especially in the realm of public policy, are based on known, deterministic rules to ensure their transparency and interpretability. We examine a particular case of algorithmic pre-trial risk assessments in the US criminal justice system, which provide deterministic classification scores and recommendations to help judges make release decisions. Our goal is to analyze data from a unique field experiment on an algorithmic pre-trial risk assessment to investigate whether the scores and recommendations can be improved. Unfortunately, prior methods for policy learning are not applicable because they require existing policies to be stochastic. We develop a maximin robust optimization approach that partially identifies the expected utility of a policy, and then finds a policy that maximizes the worst-case expected utility. The resulting policy has a statistical safety property, limiting the probability of producing a worse policy than the existing one, under structural assumptions about the outcomes. Our analysis of data from the field experiment shows that we can safely improve certain components of the risk assessment instrument by classifying arrestees as lower risk under a wide range of utility specifications, though the analysis is not informative about several components of the instrument.

replace-cross Causal Dynamic Variational Autoencoder for Counterfactual Regression in Longitudinal Data

Authors: Mouad El Bouchattaoui, Myriam Tami, Benoit Lepetit, Paul-Henry Courn\`ede

Abstract: Estimating treatment effects over time is relevant in many real-world applications, such as precision medicine, epidemiology, economy, and marketing. Many state-of-the-art methods either assume the observations of all confounders or seek to infer the unobserved ones. We take a different perspective by assuming unobserved risk factors, i.e., adjustment variables that affect only the sequence of outcomes. Under unconfoundedness, we target the Individual Treatment Effect (ITE) estimation with unobserved heterogeneity in the treatment response due to missing risk factors. We address the challenges posed by time-varying effects and unobserved adjustment variables. Led by theoretical results over the validity of the learned adjustment variables and generalization bounds over the treatment effect, we devise Causal DVAE (CDVAE). This model combines a Dynamic Variational Autoencoder (DVAE) framework with a weighting strategy using propensity scores to estimate counterfactual responses. The CDVAE model allows for accurate estimation of ITE and captures the underlying heterogeneity in longitudinal data. Evaluations of our model show superior performance over state-of-the-art models.

replace-cross Individualized Policy Evaluation and Learning under Clustered Network Interference

Authors: Yi Zhang, Kosuke Imai

Abstract: Although there is now a large literature on policy evaluation and learning, much of the prior work assumes that the treatment assignment of one unit does not affect the outcome of another unit. Unfortunately, ignoring interference can lead to biased policy evaluation and ineffective learned policies. For example, treating influential individuals who have many friends can generate positive spillover effects, thereby improving the overall performance of an individualized treatment rule (ITR). We consider the problem of evaluating and learning an optimal ITR under clustered network interference (also known as partial interference), where clusters of units are sampled from a population and units may influence one another within each cluster. Unlike previous methods that impose strong restrictions on spillover effects, such as anonymous interference, the proposed methodology only assumes a semiparametric structural model, where each unit's outcome is an additive function of individual treatments within the cluster. Under this model, we propose an estimator that can be used to evaluate the empirical performance of an ITR. We show that this estimator is substantially more efficient than the standard inverse probability weighting estimator, which does not impose any assumption about spillover effects. We derive the finite-sample regret bound for a learned ITR, showing that the use of our efficient evaluation estimator leads to the improved performance of learned policies. We consider both experimental and observational studies, and for the latter, we develop a doubly robust estimator that is semiparametrically efficient and yields an optimal regret bound. Finally, we conduct simulation and empirical studies to illustrate the advantages of the proposed methodology.

replace-cross DG-TTA: Out-of-domain Medical Image Segmentation through Augmentation and Descriptor-driven Domain Generalization and Test-Time Adaptation

Authors: Christian Weihsbach, Christian N. Kruse, Alexander Bigalke, Mattias P. Heinrich

Abstract: Purpose: Applying pre-trained medical deep learning segmentation models on out-of-domain images often yields predictions of insufficient quality. In this study, we propose to use a powerful generalizing descriptor along with augmentation to enable domain-generalized pre-training and test-time adaptation, achieving high-quality segmentation in unseen domains. Materials and Methods: In this retrospective study five different publicly available datasets (2012 to 2022) including 3D CT and MRI images are used to evaluate segmentation performance in out-of-domain scenarios. The settings include abdominal, spine, and cardiac imaging. The data is randomly split into training and test samples. Domain-generalized pre-training on source data is used to obtain the best initial performance in the target domain. We introduce the combination of the generalizing SSC descriptor and GIN intensity augmentation for optimal generalization. Segmentation results are subsequently optimized at test time, where we propose to adapt the pre-trained models for every unseen scan with a consistency scheme using the same augmentation-descriptor combination. The segmentation is evaluated using Dice similarity and Hausdorff distance and the significance of improvements is tested with the Wilcoxon signed-rank test. Results: The proposed generalized pre-training and subsequent test-time adaptation improves model performance significantly in CT to MRI cross-domain prediction for abdominal (+46.2% and +28.2% Dice), spine (+72.9%), and cardiac (+14.2% and +55.7% Dice) scenarios (p<0.001). Conclusion: Our method enables optimal, independent usage of medical image source and target data and bridges domain gaps successfully with a compact and efficient methodology. Open-source code available at: https://github.com/multimodallearning/DG-TTA

URLs: https://github.com/multimodallearning/DG-TTA

replace-cross DT-DDNN: A Physical Layer Security Attack Detector in 5G RF Domain for CAVs

Authors: Ghazal Asemian, Mohammadreza Amini, Burak Kantarci, Melike Erol-Kantarci

Abstract: The Synchronization Signal Block (SSB) is a fundamental component of the 5G New Radio (NR) air interface, crucial for the initial access procedure of Connected and Automated Vehicles (CAVs), and serves several key purposes in the network's operation. However, due to the predictable nature of SSB transmission, including the Primary and Secondary Synchronization Signals (PSS and SSS), jamming attacks are critical threats. These attacks, which can be executed without requiring high power or complex equipment, pose substantial risks to the 5G network, particularly as a result of the unencrypted transmission of control signals. Leveraging RF domain knowledge, this work presents a novel deep learning-based technique for detecting jammers in CAV networks. Unlike the existing jamming detection algorithms that mostly rely on network parameters, we introduce a double-threshold deep learning jamming detector by focusing on the SSB. The detection method is focused on RF domain features and improves the robustness of the network without requiring integration with the pre-existing network infrastructure. By integrating a preprocessing block to extract PSS correlation and energy per null resource elements (EPNRE) characteristics, our method distinguishes between normal and jammed received signals with high precision. Additionally, by incorporating of Discrete Wavelet Transform (DWT), the efficacy of training and detection are optimized. A double-threshold double Deep Neural Network (DT-DDNN) is also introduced to the architecture complemented by a deep cascade learning model to increase the sensitivity of the model to variations of signal-to-jamming noise ratio (SJNR). Results show that the proposed method achieves 96.4% detection rate in extra low jamming power, i.e., SJNR between 15 to 30 dB. Further, performance of DT-DDNN is validated by analyzing real 5G signals obtained from a practical testbed.

replace-cross Temporal and Semantic Evaluation Metrics for Foundation Models in Post-Hoc Analysis of Robotic Sub-tasks

Authors: Jonathan Salfity, Selma Wanna, Minkyu Choi, Mitch Pryor

Abstract: Recent works in Task and Motion Planning (TAMP) show that training control policies on language-supervised robot trajectories with quality labeled data markedly improves agent task success rates. However, the scarcity of such data presents a significant hurdle to extending these methods to general use cases. To address this concern, we present an automated framework to decompose trajectory data into temporally bounded and natural language-based descriptive sub-tasks by leveraging recent prompting strategies for Foundation Models (FMs) including both Large Language Models (LLMs) and Vision Language Models (VLMs). Our framework provides both time-based and language-based descriptions for lower-level sub-tasks that comprise full trajectories. To rigorously evaluate the quality of our automatic labeling framework, we contribute an algorithm SIMILARITY to produce two novel metrics, temporal similarity and semantic similarity. The metrics measure the temporal alignment and semantic fidelity of language descriptions between two sub-task decompositions, namely an FM sub-task decomposition prediction and a ground-truth sub-task decomposition. We present scores for temporal similarity and semantic similarity above 90%, compared to 30% of a randomized baseline, for multiple robotic environments, demonstrating the effectiveness of our proposed framework. Our results enable building diverse, large-scale, language-supervised datasets for improved robotic TAMP.

replace-cross A predictive machine learning force field framework for liquid electrolyte development

Authors: Sheng Gong, Yumin Zhang, Zhenliang Mu, Zhichen Pu, Hongyi Wang, Zhiao Yu, Mengyi Chen, Tianze Zheng, Zhi Wang, Lifei Chen, Zhenze Yang, Xiaojie Wu, Shaochen Shi, Weihao Gao, Wen Yan, Liang Xiang

Abstract: Despite the widespread applications of machine learning force fields (MLFF) in solids and small molecules, there is a notable gap in applying MLFF to simulate liquid electrolyte, a critical component of the current commercial lithium-ion battery. In this work, we introduce BAMBOO (\textbf{B}yteDance \textbf{A}I \textbf{M}olecular Simulation \textbf{Boo}ster), a predictive framework for molecular dynamics (MD) simulations, with a demonstration of its capability in the context of liquid electrolyte for lithium batteries. We design a physics-inspired graph equivariant transformer architecture as the backbone of BAMBOO to learn from quantum mechanical simulations. Additionally, we introduce an ensemble knowledge distillation approach and apply it to MLFFs to reduce the fluctuation of observations from MD simulations. Finally, we propose a density alignment algorithm to align BAMBOO with experimental measurements. BAMBOO demonstrates state-of-the-art accuracy in predicting key electrolyte properties such as density, viscosity, and ionic conductivity across various solvents and salt combinations. The current model, trained on more than 15 chemical species, achieves the average density error of 0.01 g/cm$^3$ on various compositions compared with experiment.

replace-cross FastLloyd: Federated, Accurate, Secure, and Tunable $k$-Means Clustering with Differential Privacy

Authors: Abdulrahman Diaa, Thomas Humphries, Florian Kerschbaum

Abstract: We study the problem of privacy-preserving $k$-means clustering in the horizontally federated setting. Existing federated approaches using secure computation suffer from substantial overheads and do not offer output privacy. At the same time, differentially private (DP) $k$-means algorithms either assume a trusted central curator or significantly degrade utility by adding noise in the local DP model. Naively combining the secure and central DP solutions results in a protocol with impractical overhead. Instead, our work provides enhancements to both the DP and secure computation components, resulting in a design that is faster, more private, and more accurate than previous work. By utilizing the computational DP model, we design a lightweight, secure aggregation-based approach that achieves five orders of magnitude speed-up over state-of-the-art related work. Furthermore, we not only maintain the utility of the state-of-the-art in the central model of DP, but we improve the utility further by designing a new DP clustering mechanism.

replace-cross GP-MoLFormer: A Foundation Model For Molecular Generation

Authors: Jerret Ross, Brian Belgodere, Samuel C. Hoffman, Vijil Chenthamarakshan, Jiri Navratil, Youssef Mroueh, Payel Das

Abstract: Transformer-based models trained on large and general purpose datasets consisting of molecular strings have recently emerged as a powerful tool for successfully modeling various structure-property relations. Inspired by this success, we extend the paradigm of training chemical language transformers on large-scale chemical datasets to generative tasks in this work. Specifically, we propose GP-MoLFormer, an autoregressive molecular string generator that is trained on more than 1.1B (billion) chemical SMILES. GP-MoLFormer uses a 46.8M parameter transformer decoder model with linear attention and rotary positional encodings as the base architecture. GP-MoLFormer's utility is evaluated and compared with that of existing baselines on three different tasks: de novo generation, scaffold-constrained molecular decoration, and unconstrained property-guided optimization. While the first two are handled with no additional training, we propose a parameter-efficient fine-tuning method for the last task, which uses property-ordered molecular pairs as input. We call this new approach pair-tuning. Our results show GP-MoLFormer performs better or comparable with baselines across all three tasks, demonstrating its general utility for a variety of molecular generation tasks. We further report strong memorization of training data in GP-MoLFormer generations, which has so far remained unexplored for chemical language models. Our analyses reveal that training data memorization and novelty in generations are impacted by the quality and scale of the training data; duplication bias in training data can enhance memorization at the cost of lowering novelty. We further establish a scaling law relating inference compute and novelty in generations.

replace-cross Hierarchical Procedural Framework for Low-latency Robot-Assisted Hand-Object Interaction

Authors: Mingqi Yuan, Huijiang Wang, Kai-Fung Chu, Fumiya Iida, Bo Li, Wenjun Zeng

Abstract: Advances in robotics have been driving the development of human-robot interaction (HRI) technologies. However, accurately perceiving human actions and achieving adaptive control remains a challenge in facilitating seamless coordination between human and robotic movements. In this paper, we propose a hierarchical procedural framework to enable dynamic robot-assisted hand-object interaction. An open-loop hierarchy leverages the computer vision (CV)-based 3D reconstruction of the human hand, based on which motion primitives have been designed to translate hand motions into robotic actions. The low-level coordination hierarchy fine-tunes the robot's action by using the continuously updated 3D hand models. Experimental validation demonstrates the effectiveness of the hierarchical control architecture. The adaptive coordination between human and robot behavior has achieved a delay of $\leq 0.3$ seconds in the tele-interaction scenario. A case study of ring-wearing tasks indicates the potential application of this work in assistive technologies such as healthcare and manufacturing.

replace-cross Eliminating Position Bias of Language Models: A Mechanistic Approach

Authors: Ziqi Wang, Hanlin Zhang, Xiner Li, Kuan-Hao Huang, Chi Han, Shuiwang Ji, Sham M. Kakade, Hao Peng, Heng Ji

Abstract: Position bias has proven to be a prevalent issue of modern language models (LMs), where the models prioritize content based on its position within the given context. This bias often leads to unexpected model failures and hurts performance, robustness, and reliability across various applications. Our mechanistic analysis attributes the position bias to two components employed in nearly all state-of-the-art LMs: causal attention and relative positional encodings. Based on the analyses, we propose to eliminate position bias (e.g., different retrieved documents' orders in QA affect performance) with a training-free zero-shot approach. Our method changes the causal attention to bidirectional attention between documents and utilizes model attention values to decide the relative orders of documents instead of using the order provided in input prompts, therefore enabling Position-INvariant inferencE (PINE) at the document level. By eliminating position bias, models achieve better performance and reliability in downstream tasks, including LM-as-a-judge, retrieval-augmented QA, molecule generation, and math reasoning. Notably, PINE is especially useful when adapting LMs for evaluating reasoning pairs: it consistently provides 8 to 10 percentage points performance gains, making Llama-3-70B-Instruct perform even better than GPT-4-0125-preview and GPT-4o-2024-08-06 on the RewardBench reasoning set.

replace-cross Non-Determinism of "Deterministic" LLM Settings

Authors: Berk Atil, Sarp Aykent, Alexa Chittams, Lisheng Fu, Rebecca J. Passonneau, Evan Radcliffe, Guru Rajan Rajagopal, Adam Sloan, Tomasz Tudrej, Ferhan Ture, Zhe Wu, Lixinyu Xu, Breck Baldwin

Abstract: LLM (large language model) practitioners commonly notice that outputs can vary for the same inputs under settings expected to be deterministic. Yet the questions of how pervasive this is, and with what impact on results, have not to our knowledge been systematically investigated. We investigate non-determinism in five LLMs configured to be deterministic when applied to eight common tasks in across 10 runs, in both zero-shot and few-shot settings. We see accuracy variations up to 15% across naturally occurring runs with a gap of best possible performance to worst possible performance up to 70%. In fact, none of the LLMs consistently delivers repeatable accuracy across all tasks, much less identical output strings. Sharing preliminary results with insiders has revealed that non-determinism perhaps essential to the efficient use of compute resources via co-mingled data in input buffers so this issue is not going away anytime soon. To better quantify our observations, we introduce metrics focused on quantifying determinism, TARr@N for the total agreement rate at N runs over raw output, and TARa@N for total agreement rate of parsed-out answers. Our code and data are publicly available at http://github.com/REDACTED.

URLs: http://github.com/REDACTED.

replace-cross Modeling and Analyzing the Influence of Non-Item Pages on Sequential Next-Item Prediction

Authors: Elisabeth Fischer, Albin Zehe, Andreas Hotho, Daniel Schl\"or

Abstract: Analyzing sequences of interactions between users and items, sequential recommendation models can learn user intent and make predictions about the next item. Next to item interactions, most systems also have interactions with what we call non-item pages: these pages are not related to specific items but still can provide insights into the user's interests, as, for example, navigation pages. We therefore propose a general way to include these non-item pages in sequential recommendation models to enhance next-item prediction. First, we demonstrate the influence of non-item pages on following interactions using the hypotheses testing framework HypTrails and propose methods for representing non-item pages in sequential recommendation models. Subsequently, we adapt popular sequential recommender models to integrate non-item pages and investigate their performance with different item representation strategies as well as their ability to handle noisy data. To show the general capabilities of the models to integrate non-item pages, we create a synthetic dataset for a controlled setting and then evaluate the improvements from including non-item pages on two real-world datasets. Our results show that non-item pages are a valuable source of information, and incorporating them in sequential recommendation models increases the performance of next-item prediction across all analyzed model architectures.

replace-cross One Policy to Run Them All: an End-to-end Learning Approach to Multi-Embodiment Locomotion

Authors: Nico Bohlinger, Grzegorz Czechmanowski, Maciej Krupka, Piotr Kicki, Krzysztof Walas, Jan Peters, Davide Tateo

Abstract: Deep Reinforcement Learning techniques are achieving state-of-the-art results in robust legged locomotion. While there exists a wide variety of legged platforms such as quadruped, humanoids, and hexapods, the field is still missing a single learning framework that can control all these different embodiments easily and effectively and possibly transfer, zero or few-shot, to unseen robot embodiments. We introduce URMA, the Unified Robot Morphology Architecture, to close this gap. Our framework brings the end-to-end Multi-Task Reinforcement Learning approach to the realm of legged robots, enabling the learned policy to control any type of robot morphology. The key idea of our method is to allow the network to learn an abstract locomotion controller that can be seamlessly shared between embodiments thanks to our morphology-agnostic encoders and decoders. This flexible architecture can be seen as a potential first step in building a foundation model for legged robot locomotion. Our experiments show that URMA can learn a locomotion policy on multiple embodiments that can be easily transferred to unseen robot platforms in simulation and the real world.

replace-cross Privacy Vulnerabilities in Marginals-based Synthetic Data

Authors: Steven Golob, Sikha Pentyala, Anuar Maratkhan, Martine De Cock

Abstract: When acting as a privacy-enhancing technology, synthetic data generation (SDG) aims to maintain a resemblance to the real data while excluding personally-identifiable information. Many SDG algorithms provide robust differential privacy (DP) guarantees to this end. However, we show that the strongest class of SDG algorithms--those that preserve \textit{marginal probabilities}, or similar statistics, from the underlying data--leak information about individuals that can be recovered more efficiently than previously understood. We demonstrate this by presenting a novel membership inference attack, MAMA-MIA, and evaluate it against three seminal DP SDG algorithms: MST, PrivBayes, and Private-GSD. MAMA-MIA leverages knowledge of which SDG algorithm was used, allowing it to learn information about the hidden data more accurately, and orders-of-magnitude faster, than other leading attacks. We use MAMA-MIA to lend insight into existing SDG vulnerabilities. Our approach went on to win the first SNAKE (SaNitization Algorithm under attacK ... $\varepsilon$) competition.

replace-cross Exact full-RSB SAT/UNSAT transition in infinitely wide two-layer neural networks

Authors: Brandon L. Annesi, Enrico M. Malatesta, Francesco Zamponi

Abstract: We analyze the problem of storing random pattern-label associations using two classes of continuous non-convex weights models, namely the perceptron with negative margin and an infinite-width two-layer neural network with non-overlapping receptive fields and generic activation function. Using a full-RSB ansatz we compute the exact value of the SAT/UNSAT transition. Furthermore, in the case of the negative perceptron we show that the overlap distribution of typical states displays an overlap gap (a disconnected support) in certain regions of the phase diagram defined by the value of the margin and the density of patterns to be stored. This implies that some recent theorems that ensure convergence of Approximate Message Passing (AMP) based algorithms to capacity are not applicable. Finally, we show that Gradient Descent is not able to reach the maximal capacity, irrespectively of the presence of an overlap gap for typical states. This finding, similarly to what occurs in binary weight models, suggests that gradient-based algorithms are biased towards highly atypical states, whose inaccessibility determines the algorithmic threshold.

replace-cross Assessing and improving reliability of neighbor embedding methods: a map-continuity perspective

Authors: Zhexuan Liu, Rong Ma, Yiqiao Zhong

Abstract: Visualizing high-dimensional data is essential for understanding biomedical data and deep learning models. Neighbor embedding methods, such as t-SNE and UMAP, are widely used but can introduce misleading visual artifacts. We find that the manifold learning interpretations from many prior works are inaccurate and that the misuse stems from a lack of data-independent notions of embedding maps, which project high-dimensional data into a lower-dimensional space. Leveraging the leave-one-out principle, we introduce LOO-map, a framework that extends embedding maps beyond discrete points to the entire input space. We identify two forms of map discontinuity that distort visualizations: one exaggerates cluster separation and the other creates spurious local structures. As a remedy, we develop two types of point-wise diagnostic scores to detect unreliable embedding points and improve hyperparameter selection, which are validated on datasets from computer vision and single-cell omics.

replace-cross 1-2-3-Go! Policy Synthesis for Parameterized Markov Decision Processes via Decision-Tree Learning and Generalization

Authors: Muqsit Azeem, Debraj Chakraborty, Sudeep Kanav, Jan Kretinsky, Mohammadsadegh Mohagheghi, Stefanie Mohr, Maximilian Weininger

Abstract: Despite the advances in probabilistic model checking, the scalability of the verification methods remains limited. In particular, the state space often becomes extremely large when instantiating parameterized Markov decision processes (MDPs) even with moderate values. Synthesizing policies for such \emph{huge} MDPs is beyond the reach of available tools. We propose a learning-based approach to obtain a reasonable policy for such huge MDPs. The idea is to generalize optimal policies obtained by model-checking small instances to larger ones using decision-tree learning. Consequently, our method bypasses the need for explicit state-space exploration of large models, providing a practical solution to the state-space explosion problem. We demonstrate the efficacy of our approach by performing extensive experimentation on the relevant models from the quantitative verification benchmark set. The experimental results indicate that our policies perform well, even when the size of the model is orders of magnitude beyond the reach of state-of-the-art analysis tools.

replace-cross MambaPEFT: Exploring Parameter-Efficient Fine-Tuning for Mamba

Authors: Masakazu Yoshimura, Teruaki Hayashi, Yota Maeda

Abstract: An ecosystem of Transformer-based models has been established by building large models with extensive data. Parameter-efficient fine-tuning (PEFT) is a crucial technology for deploying these models to downstream tasks with minimal cost while achieving effective performance. Recently, Mamba, a State Space Model (SSM)-based model, has attracted attention as a potential alternative to Transformers. While many large-scale Mamba-based models have been proposed, efficiently adapting pre-trained Mamba-based models to downstream tasks remains unexplored. In this paper, we conduct an exploratory analysis of PEFT methods for Mamba. We investigate the effectiveness of existing PEFT methods for Transformers when applied to Mamba. We also modify these methods to better align with the Mamba architecture. Additionally, we propose new Mamba-specific PEFT methods that leverage the distinctive structure of Mamba. Our experiments indicate that PEFT performs more effectively for Mamba than Transformers. Lastly, we demonstrate how to effectively combine multiple PEFT methods and provide a framework that outperforms previous works. To ensure reproducibility, we will release the code after publication.

replace-cross VERA: Explainable Video Anomaly Detection via Verbalized Learning of Vision-Language Models

Authors: Muchao Ye, Weiyang Liu, Pan He

Abstract: The rapid advancement of vision-language models (VLMs) has established a new paradigm in video anomaly detection (VAD): leveraging VLMs to simultaneously detect anomalies and provide comprehendible explanations for the decisions. Existing work in this direction often assumes the complex reasoning required for VAD exceeds the capabilities of pretrained VLMs. Consequently, these approaches either incorporate specialized reasoning modules during inference or rely on instruction tuning datasets through additional training to adapt VLMs for VAD. However, such strategies often incur substantial computational costs or data annotation overhead. To address these challenges in explainable VAD, we introduce a verbalized learning framework named VERA that enables VLMs to perform VAD without model parameter modifications. Specifically, VERA automatically decomposes the complex reasoning required for VAD into reflections on simpler, more focused guiding questions capturing distinct abnormal patterns. It treats these reflective questions as learnable parameters and optimizes them through data-driven verbal interactions between learner and optimizer VLMs, using coarsely labeled training data. During inference, VERA embeds the learned questions into model prompts to guide VLMs in generating segment-level anomaly scores, which are then refined into frame-level scores via the fusion of scene and temporal contexts. Experimental results on challenging benchmarks demonstrate that the learned questions of VERA are highly adaptable, significantly improving both detection performance and explainability of VLMs for VAD.

replace-cross Patient-specific prediction of glioblastoma growth via reduced order modeling and neural networks

Authors: D. Cerrone, D. Riccobelli, S. Gazzoni, P. Vitullo, F. Ballarin, J. Falco, F. Acerbi, A. Manzoni, P. Zunino, P. Ciarletta

Abstract: Glioblastoma is among the most aggressive brain tumors in adults, characterized by patient-specific invasion patterns driven by the underlying brain microstructure. In this work, we present a proof-of-concept for a mathematical model of GBL growth, enabling real-time prediction and patient-specific parameter identification from longitudinal neuroimaging data. The framework exploits a diffuse-interface mathematical model to describe the tumor evolution and a reduced-order modeling strategy, relying on proper orthogonal decomposition, trained on synthetic data derived from patient-specific brain anatomies reconstructed from magnetic resonance imaging and diffusion tensor imaging. A neural network surrogate learns the inverse mapping from tumor evolution to model parameters, achieving significant computational speed-up while preserving high accuracy. To ensure robustness and interpretability, we perform both global and local sensitivity analyses, identifying the key biophysical parameters governing tumor dynamics and assessing the stability of the inverse problem solution. These results establish a methodological foundation for future clinical deployment of patient-specific digital twins in neuro-oncology.

replace-cross Multi-objective Combinatorial Methodology for Nuclear Reactor Site Assessment: A Case Study for the United States

Authors: Omer Erdem, Kevin Daley, Gabrielle Hoelzle, Majdi I. Radaideh

Abstract: As clean energy demand grows to meet sustainability and net-zero goals, nuclear energy emerges as a reliable option. However, high capital costs remain a challenge for nuclear power plants (NPP), where repurposing coal power plant sites (CPP) with existing infrastructure is one way to reduce these costs. Additionally, Brownfield sites-previously developed or underutilized lands often impacted by industrial activity-present another compelling alternative. This study introduces a novel multi-objective optimization methodology, leveraging combinatorial search to evaluate over 30,000 potential NPP sites in the United States. Our approach addresses gaps in the current practice of assigning pre-determined weights to each site attribute that could lead to bias in the ranking. Each site is assigned a performance-based score, derived from a detailed combinatorial analysis of its site attributes. The methodology generates a comprehensive database comprising site locations (inputs), attributes (outputs), site score (outputs), and the contribution of each attribute to the site score. We then use this database to train a neural network model, enabling rapid predictions of nuclear siting suitability across any location in the United States. Our findings highlight that CPP sites are highly competitive for nuclear development, but some Brownfield sites are able to compete with them. Notably, four CPP sites in Ohio, North Carolina, and New Hampshire, and two Brownfield sites in Florida and California rank among the most promising locations. These results underscore the potential of integrating machine learning and optimization techniques to transform nuclear siting, paving the way for a cost-effective and sustainable energy future.

replace-cross Data-Free Group-Wise Fully Quantized Winograd Convolution via Learnable Scales

Authors: Shuokai Pan, Gerti Tuzi, Sudarshan Sreeram, Dibakar Gope

Abstract: Despite the revolutionary breakthroughs of large-scale text-to-image diffusion models for complex vision and downstream tasks, their extremely high computational and storage costs limit their usability. Quantization of diffusion models has been explored in recent works to reduce compute costs and memory bandwidth usage. To further improve inference time, fast convolution algorithms such as Winograd can be used for convolution layers, which account for a significant portion of computations in diffusion models. However, the significant quality loss of fully quantized Winograd using existing coarser-grained post-training quantization methods, combined with the complexity and cost of finetuning the Winograd transformation matrices for such large models to recover quality, makes them unsuitable for large-scale foundation models. Motivated by the presence of a large range of values in them, we investigate the impact of finer-grained group-wise quantization in quantizing diffusion models. While group-wise quantization can largely handle the fully quantized Winograd convolution, it struggles to deal with the large distribution imbalance in a sizable portion of the Winograd domain computation. To reduce range differences in the Winograd domain, we propose finetuning only the scale parameters of the Winograd transform matrices without using any domain-specific training data. Because our method does not depend on any training data, the generalization performance of quantized diffusion models is safely guaranteed. For text-to-image generation task, the 8-bit fully-quantized diffusion model with Winograd provides near-lossless quality (FID and CLIP scores) in comparison to the full-precision model. For image classification, our method outperforms the state-of-the-art Winograd PTQ method by 1.62% and 2.56% in top-1 ImageNet accuracy on ResNet18 and ResNet-34, respectively, with Winograd F(6, 3).

replace-cross Disentangling Safe and Unsafe Corruptions via Anisotropy and Locality

Authors: Ramchandran Muthukumar, Ambar Pal, Jeremias Sulam, Rene Vidal

Abstract: State-of-the-art machine learning systems are vulnerable to small perturbations to their input, where ``small'' is defined according to a threat model that assigns a positive threat to each perturbation. Most prior works define a task-agnostic, isotropic, and global threat, like the $\ell_p$ norm, where the magnitude of the perturbation fully determines the degree of the threat and neither the direction of the attack nor its position in space matter. However, common corruptions in computer vision, such as blur, compression, or occlusions, are not well captured by such threat models. This paper proposes a novel threat model called \texttt{Projected Displacement} (PD) to study robustness beyond existing isotropic and global threat models. The proposed threat model measures the threat of a perturbation via its alignment with \textit{unsafe directions}, defined as directions in the input space along which a perturbation of sufficient magnitude changes the ground truth class label. Unsafe directions are identified locally for each input based on observed training data. In this way, the PD threat model exhibits anisotropy and locality. Experiments on Imagenet-1k data indicate that, for any input, the set of perturbations with small PD threat includes \textit{safe} perturbations of large $\ell_p$ norm that preserve the true label, such as noise, blur and compression, while simultaneously excluding \textit{unsafe} perturbations that alter the true label. Unlike perceptual threat models based on embeddings of large-vision models, the PD threat model can be readily computed for arbitrary classification tasks without pre-training or finetuning. Further additional task annotation such as sensitivity to image regions or concept hierarchies can be easily integrated into the assessment of threat and thus the PD threat model presents practitioners with a flexible, task-driven threat specification.

replace-cross Optimal generalisation and learning transition in extensive-width shallow neural networks near interpolation

Authors: Jean Barbier, Francesco Camilli, Minh-Toan Nguyen, Mauro Pastore, Rudy Skerk

Abstract: We consider a teacher-student model of supervised learning with a fully-trained two-layer neural network whose width $k$ and input dimension $d$ are large and proportional. We provide an effective theory for approximating the Bayes-optimal generalisation error of the network for any activation function in the regime of sample size $n$ scaling quadratically with the input dimension, i.e., around the interpolation threshold where the number of trainable parameters $kd+k$ and of data $n$ are comparable. Our analysis tackles generic weight distributions. We uncover a discontinuous phase transition separating a "universal" phase from a "specialisation" phase. In the first, the generalisation error is independent of the weight distribution and decays slowly with the sampling rate $n/d^2$, with the student learning only some non-linear combinations of the teacher weights. In the latter, the error is weight distribution-dependent and decays faster due to the alignment of the student towards the teacher network. We thus unveil the existence of a highly predictive solution near interpolation, which is however potentially hard to find by practical algorithms.

replace-cross SE Arena: An Interactive Platform for Evaluating Foundation Models in Software Engineering

Authors: Zhimin Zhao

Abstract: Foundation models (FMs), particularly large language models (LLMs), have shown significant promise in various software engineering (SE) tasks, including code generation, debugging, and requirement refinement. Despite these advances, existing evaluation frameworks are insufficient for assessing model performance in iterative, context-rich workflows characteristic of SE activities. To address this limitation, we introduce SE Arena, an interactive platform designed to evaluate SE-focused chatbots. SE Arena provides a transparent, open-source leaderboard, supports multi-round conversational workflows, and enables end-to-end model comparisons. Moreover, SE Arena incorporates a new feature called RepoChat, which automatically injects repository-related context (e.g., issues, commits, pull requests) into the conversation, further aligning evaluations with real-world development processes. This paper outlines the design and capabilities of SE Arena, emphasizing its potential to advance the evaluation and practical application of FMs in software engineering.

replace-cross Semantic Learning for Molecular Communication in Internet of Bio-Nano Things

Authors: Hanlin Cai, Ozgur B. Akan

Abstract: Molecular communication (MC) provides a foundational framework for information transmission in the Internet of Bio-Nano Things (IoBNT), where efficiency and reliability are crucial. However, the inherent limitations of molecular channels, such as low transmission rates, noise, and intersymbol interference (ISI), limit their ability to support complex data transmission. This paper proposes an end-to-end semantic learning framework designed to optimize task-oriented molecular communication, with a focus on biomedical diagnostic tasks under resource-constrained conditions. The proposed framework employs a deep encoder-decoder architecture to efficiently extract, quantize, and decode semantic features, prioritizing taskrelevant semantic information to enhance diagnostic classification performance. Additionally, a probabilistic channel network is introduced to approximate molecular propagation dynamics, enabling gradient-based optimization for end-to-end learning. Experimental results demonstrate that the proposed semantic framework improves diagnostic accuracy by at least 25% compared to conventional JPEG compression with LDPC coding methods under resource-constrained communication scenarios.

replace-cross Low-Rank Thinning

Authors: Annabelle Michael Carrell, Albert Gong, Abhishek Shetty, Raaz Dwivedi, Lester Mackey

Abstract: The goal in thinning is to summarize a dataset using a small set of representative points. Remarkably, sub-Gaussian thinning algorithms like Kernel Halving and Compress can match the quality of uniform subsampling while substantially reducing the number of summary points. However, existing guarantees cover only a restricted range of distributions and kernel-based quality measures and suffer from pessimistic dimension dependence. To address these deficiencies, we introduce a new low-rank analysis of sub-Gaussian thinning that applies to any distribution and any kernel, guaranteeing high-quality compression whenever the kernel or data matrix is approximately low-rank. To demonstrate the broad applicability of the techniques, we design practical sub-Gaussian thinning approaches that improve upon the best known guarantees for approximating attention in transformers, accelerating stochastic gradient training through reordering, and distinguishing distributions in near-linear time.

replace-cross Diversity-driven Data Selection for Language Model Tuning through Sparse Autoencoder

Authors: Xianjun Yang, Shaoliang Nie, Lijuan Liu, Suchin Gururangan, Ujjwal Karn, Rui Hou, Madian Khabsa, Yuning Mao

Abstract: Instruction tuning data are often quantity-saturated due to the large volume of data collection and fast model iteration, leaving data selection important but underexplored. Existing quality-driven data selection methods, such as LIMA (NeurIPS 2023 \citep{zhou2024lima}) and AlpaGasus (ICLR 2024 \citep{chenalpagasus}) generally ignore the equal importance of data diversity and complexity. In this work, we aim to design a diversity-aware data selection strategy and creatively propose using sparse autoencoders (SAEs) to tackle the challenge of data diversity measure. In addition, SAEs can also provide more interpretability of model behavior and explain, e.g., the surprising effectiveness of selecting the longest response (ICML 2024 \citep{zhaolong}). Using effective data selection, we experimentally prove that models trained on our selected data can outperform other methods in terms of model capabilities, reduce training cost, and potentially gain more control over model behaviors. We prove that SAEs can serve as a good alternative to diversity measure and design our method to be scalable for potential industrial large-scale pruning, and we will also release our trained SAEs for use by the broader community.

replace-cross Reactive Diffusion Policy: Slow-Fast Visual-Tactile Policy Learning for Contact-Rich Manipulation

Authors: Han Xue, Jieji Ren, Wendi Chen, Gu Zhang, Yuan Fang, Guoying Gu, Huazhe Xu, Cewu Lu

Abstract: Humans can accomplish complex contact-rich tasks using vision and touch, with highly reactive capabilities such as quick adjustments to environmental changes and adaptive control of contact forces; however, this remains challenging for robots. Existing visual imitation learning (IL) approaches rely on action chunking to model complex behaviors, which lacks the ability to respond instantly to real-time tactile feedback during the chunk execution. Furthermore, most teleoperation systems struggle to provide fine-grained tactile / force feedback, which limits the range of tasks that can be performed. To address these challenges, we introduce TactAR, a low-cost teleoperation system that provides real-time tactile feedback through Augmented Reality (AR), along with Reactive Diffusion Policy (RDP), a novel slow-fast visual-tactile imitation learning algorithm for learning contact-rich manipulation skills. RDP employs a two-level hierarchy: (1) a slow latent diffusion policy for predicting high-level action chunks in latent space at low frequency, (2) a fast asymmetric tokenizer for closed-loop tactile feedback control at high frequency. This design enables both complex trajectory modeling and quick reactive behavior within a unified framework. Through extensive evaluation across three challenging contact-rich tasks, RDP significantly improves performance compared to state-of-the-art visual IL baselines through rapid response to tactile / force feedback. Furthermore, experiments show that RDP is applicable across different tactile / force sensors. Code and videos are available on https://reactive-diffusion-policy.github.io.

URLs: https://reactive-diffusion-policy.github.io.

replace-cross Addressing pitfalls in implicit unobserved confounding synthesis using explicit block hierarchical ancestral sampling

Authors: Xudong Sun, Alex Markham, Pratik Misra, Carsten Marr

Abstract: Unbiased data synthesis is crucial for evaluating causal discovery algorithms in the presence of unobserved confounding, given the scarcity of real-world datasets. A common approach, implicit parameterization, encodes unobserved confounding by modifying the off-diagonal entries of the idiosyncratic covariance matrix while preserving positive definiteness. Within this approach, we identify that state-of-the-art protocols have two distinct issues that hinder unbiased sampling from the complete space of causal models: first, we give a detailed analysis of use of diagonally dominant constructions restricts the spectrum of partial correlation matrices; and second, the restriction of possible graphical structures when sampling bidirected edges, unnecessarily ruling out valid causal models. To address these limitations, we propose an improved explicit modeling approach for unobserved confounding, leveraging block-hierarchical ancestral generation of ground truth causal graphs. Algorithms for converting the ground truth DAG into ancestral graph is provided so that the output of causal discovery algorithms could be compared with. We draw connections between implicit and explicit parameterization, prove that our approach fully covers the space of causal models, including those generated by the implicit parameterization, thus enabling more robust evaluation of methods for causal discovery and inference.

replace-cross Light-R1: Curriculum SFT, DPO and RL for Long COT from Scratch and Beyond

Authors: Liang Wen, Yunke Cai, Fenrui Xiao, Xin He, Qi An, Zhenyu Duan, Yimin Du, Junchen Liu, Lifu Tang, Xiaowei Lv, Haosheng Zou, Yongchao Deng, Shousheng Jia, Xiangzheng Zhang

Abstract: This paper introduces Light-R1, an open-source suite for training long reasoning models using reproducible and cost-effective methodology. Given the proprietary nature of data used in the DeepSeek-R1 series, we develop an alternative approach leveraging exclusively public data and models. Our curriculum training progressively increases data difficulty, combined with multi-staged post-training. Our Light-R1-32B model, trained from Qwen2.5-32B-Instruct, outperforms DeepSeek-R1-Distill-Qwen-32B in math reasoning. Experimental results show that this curriculum approach becomes more effective when distinct, diverse datasets are available for different training stages: fine-tuning DeepSeek-R1-Distilled models (pre-tuned by DeepSeek team on proprietary data) with 3,000 challenging examples from our curriculum dataset yielded state-of-the-art 7B and 14B models, while the 32B model, Light-R1-32B-DS performed comparably to QwQ-32B and DeepSeek-R1. Furthermore, we extend our work by applying GRPO on long reasoning models. Our final Light-R1-14B-DS achieves SOTA performance among 14B models in math, with AIME24 \& 25 scores of 74.0 and 60.2 respectively, surpassing many 32B models and DeepSeek-R1-Distill-Llama-70B. Despite math-focused training, Light-R1-14B-DS demonstrates strong cross-domain generalization. Light-R1 represents a significant advancement in making sophisticated reasoning models more accessible and implementable in real-world applications. Our models, training data and code have been made available at https://github.com/Qihoo360/Light-R1.

URLs: https://github.com/Qihoo360/Light-R1.

replace-cross Self-Vocabularizing Training for Neural Machine Translation

Authors: Pin-Jie Lin, Ernie Chang, Yangyang Shi, Vikas Chandra

Abstract: Past vocabulary learning techniques identify relevant vocabulary before training, relying on statistical and entropy-based assumptions that largely neglect the role of model training. Empirically, we observe that trained translation models are induced to use a byte-pair encoding (BPE) vocabulary subset distinct from the original BPE vocabulary, leading to performance improvements when retrained with the induced vocabulary. In this paper, we analyze this discrepancy in neural machine translation by examining vocabulary and entropy shifts during self-training--where each iteration generates a labeled dataset by pairing source sentences with the model's predictions to define a new vocabulary. Building on these insights, we propose self-vocabularizing training, an iterative method that self-selects a smaller, more optimal vocabulary, yielding up to a 1.49 BLEU improvement. Moreover, we find that deeper model architectures lead to both an increase in unique token usage and a 6-8% reduction in vocabulary size.

replace-cross Vision-Language Models for Acute Tuberculosis Diagnosis: A Multimodal Approach Combining Imaging and Clinical Data

Authors: Ananya Ganapthy, Praveen Shastry, Naveen Kumarasami, Anandakumar D, Keerthana R, Mounigasri M, Varshinipriya M, Kishore Prasath Venkatesh, Bargava Subramanian, Kalyan Sivasailam

Abstract: Background: This study introduces a Vision-Language Model (VLM) leveraging SIGLIP and Gemma-3b architectures for automated acute tuberculosis (TB) screening. By integrating chest X-ray images and clinical notes, the model aims to enhance diagnostic accuracy and efficiency, particularly in resource-limited settings. Methods: The VLM combines visual data from chest X-rays with clinical context to generate detailed, context-aware diagnostic reports. The architecture employs SIGLIP for visual encoding and Gemma-3b for decoding, ensuring effective representation of acute TB-specific pathologies and clinical insights. Results: Key acute TB pathologies, including consolidation, cavities, and nodules, were detected with high precision (97percent) and recall (96percent). The model demonstrated strong spatial localization capabilities and robustness in distinguishing TB-positive cases, making it a reliable tool for acute TB diagnosis. Conclusion: The multimodal capability of the VLM reduces reliance on radiologists, providing a scalable solution for acute TB screening. Future work will focus on improving the detection of subtle pathologies and addressing dataset biases to enhance its generalizability and application in diverse global healthcare settings.

replace-cross Calibration Strategies for Robust Causal Estimation: Theoretical and Empirical Insights on Propensity Score Based Estimators

Authors: Jan Rabenseifner, Sven Klaassen, Jannis Kueck, Philipp Bach

Abstract: The partitioning of data for estimation and calibration critically impacts the performance of propensity score based estimators like inverse probability weighting (IPW) and double/debiased machine learning (DML) frameworks. We extend recent advances in calibration techniques for propensity score estimation, improving the robustness of propensity scores in challenging settings such as limited overlap, small sample sizes, or unbalanced data. Our contributions are twofold: First, we provide a theoretical analysis of the properties of calibrated estimators in the context of DML. To this end, we refine existing calibration frameworks for propensity score models, with a particular emphasis on the role of sample-splitting schemes in ensuring valid causal inference. Second, through extensive simulations, we show that calibration reduces variance of inverse-based propensity score estimators while also mitigating bias in IPW, even in small-sample regimes. Notably, calibration improves stability for flexible learners (e.g., gradient boosting) while preserving the doubly robust properties of DML. A key insight is that, even when methods perform well without calibration, incorporating a calibration step does not degrade performance, provided that an appropriate sample-splitting approach is chosen.

replace-cross A stochastic gradient descent algorithm with random search directions

Authors: Em\'eric Gbaguidi

Abstract: Stochastic coordinate descent algorithms are efficient methods in which each iterate is obtained by fixing most coordinates at their values from the current iteration, and approximately minimizing the objective with respect to the remaining coordinates. However, this approach is usually restricted to canonical basis vectors of $\mathbb{R}^d$. In this paper, we develop a new class of stochastic gradient descent algorithms with random search directions which uses the directional derivative of the gradient estimate following more general random vectors. We establish the almost sure convergence of these algorithms with decreasing step. We further investigate their central limit theorem and pay particular attention to analyze the impact of the search distributions on the asymptotic covariance matrix. We also provide non-asymptotic $\mathbb{L}^p$ rates of convergence.

replace-cross Can Zero-Shot Commercial APIs Deliver Regulatory-Grade Clinical Text DeIdentification?

Authors: Veysel Kocaman, Muhammed Santas, Yigit Gul, Mehmet Butgul, David Talby

Abstract: We evaluate the performance of four leading solutions for de-identification of unstructured medical text - Azure Health Data Services, AWS Comprehend Medical, OpenAI GPT-4o, and John Snow Labs - on a ground truth dataset of 48 clinical documents annotated by medical experts. The analysis, conducted at both entity-level and token-level, suggests that John Snow Labs' Medical Language Models solution achieves the highest accuracy, with a 96% F1-score in protected health information (PHI) detection, outperforming Azure (91%), AWS (83%), and GPT-4o (79%). John Snow Labs is not only the only solution which achieves regulatory-grade accuracy (surpassing that of human experts) but is also the most cost-effective solution: It is over 80% cheaper compared to Azure and GPT-4o, and is the only solution not priced by token. Its fixed-cost local deployment model avoids the escalating per-request fees of cloud-based services, making it a scalable and economical choice.

replace-cross Rerouting Connection: Hybrid Computer Vision Analysis Reveals Visual Similarity Between Indus and Tibetan-Yi Corridor Writing Systems

Authors: Ooha Lakkadi Reddy

Abstract: This thesis employs a hybrid CNN-Transformer architecture, in conjunction with a detailed anthropological framework, to investigate potential historical connections between the visual morphology of the Indus Valley script and pictographic systems of the Tibetan-Yi Corridor. Through an ensemble methodology of three target scripts across 15 independently trained models, we demonstrate that Tibetan-Yi Corridor scripts exhibit approximately six-fold higher visual similarity to the Indus script (61.7%-63.5%) than to the Bronze Age Proto-Cuneiform (10.2%-10.9%) or Proto-Elamite (7.6%-8.7%) systems. Additionally and contrarily to our current understanding of the networks of the Indus Valley Civilization, the Indus script unexpectedly maps closer to Tibetan-Yi Corridor scripts, with a mean cosine similarity of 0.629, than to the aforementioned contemporaneous West Asian signaries, both of which recorded mean cosine similarities of 0.104 and 0.080 despite their close geographic proximity and evident trade relations. Across various dimensionality reduction practices and clustering methodologies, the Indus script consistently clusters closest to Tibetan-Yi Corridor scripts. Our computational results align with qualitative observations of specific pictorial parallels in numeral systems, gender markers, and key iconographic elements; this is further supported by archaeological evidence of sustained contact networks along the ancient Shu-Shendu road in tandem with the Indus Valley Civilization's decline, providing a plausible transmission pathway. While alternative explanations cannot be ruled out, the specificity and consistency of observed similarities challenge conventional narratives of isolated script development and suggest more complex ancient cultural transmission networks between South and East Asia than previously recognized.

replace-cross PharMolixFM: All-Atom Foundation Models for Molecular Modeling and Generation

Authors: Yizhen Luo, Jiashuo Wang, Siqi Fan, Zaiqing Nie

Abstract: Structural biology relies on accurate three-dimensional biomolecular structures to advance our understanding of biological functions, disease mechanisms, and therapeutics. While recent advances in deep learning have enabled the development of all-atom foundation models for molecular modeling and generation, existing approaches face challenges in generalization due to the multi-modal nature of atomic data and the lack of comprehensive analysis of training and sampling strategies. To address these limitations, we propose PharMolixFM, a unified framework for constructing all-atom foundation models based on multi-modal generative techniques. Our framework includes three variants using state-of-the-art multi-modal generative models. By formulating molecular tasks as a generalized denoising process with task-specific priors, PharMolixFM achieves robust performance across various structural biology applications. Experimental results demonstrate that PharMolixFM-Diff achieves competitive prediction accuracy in protein-small-molecule docking (83.9% vs. 90.2% RMSD < 2{\AA}, given pocket) with significantly improved inference speed. Moreover, we explore the empirical inference scaling law by introducing more sampling repeats or steps. Our code and model are available at https://github.com/PharMolix/OpenBioMed.

URLs: https://github.com/PharMolix/OpenBioMed.

replace-cross Nonhuman Primate Brain Tissue Segmentation Using a Transfer Learning Approach

Authors: Zhen Lin, Hongyu Yuan, Richard Barcus, Qing Lyu, Sucheta Chakravarty, Megan E. Lipford, Carol A. Shively, Suzanne Craft, Mohammad Kawas, Jeongchul Kim, Christopher T. Whitlow

Abstract: Non-human primates (NHPs) serve as critical models for understanding human brain function and neurological disorders due to their close evolutionary relationship with humans. Accurate brain tissue segmentation in NHPs is critical for understanding neurological disorders, but challenging due to the scarcity of annotated NHP brain MRI datasets, the small size of the NHP brain, the limited resolution of available imaging data and the anatomical differences between human and NHP brains. To address these challenges, we propose a novel approach utilizing STU-Net with transfer learning to leverage knowledge transferred from human brain MRI data to enhance segmentation accuracy in the NHP brain MRI, particularly when training data is limited. The combination of STU-Net and transfer learning effectively delineates complex tissue boundaries and captures fine anatomical details specific to NHP brains. Notably, our method demonstrated improvement in segmenting small subcortical structures such as putamen and thalamus that are challenging to resolve with limited spatial resolution and tissue contrast, and achieved DSC of over 0.88, IoU over 0.8 and HD95 under 7. This study introduces a robust method for multi-class brain tissue segmentation in NHPs, potentially accelerating research in evolutionary neuroscience and preclinical studies of neurological disorders relevant to human health.

replace-cross Convolutional Neural Networks Can (Meta-)Learn the Same-Different Relation

Authors: Max Gupta, Sunayana Rane, R. Thomas McCoy, Thomas L. Griffiths

Abstract: While convolutional neural networks (CNNs) have come to match and exceed human performance in many settings, the tasks these models optimize for are largely constrained to the level of individual objects, such as classification and captioning. Humans remain vastly superior to CNNs in visual tasks involving relations, including the ability to identify two objects as `same' or `different'. A number of studies have shown that while CNNs can be coaxed into learning the same-different relation in some settings, they tend to generalize poorly to other instances of this relation. In this work we show that the same CNN architectures that fail to generalize the same-different relation with conventional training are able to succeed when trained via meta-learning, which explicitly encourages abstraction and generalization across tasks.

replace-cross $p$-Adic Polynomial Regression as Alternative to Neural Network for Approximating $p$-Adic Functions of Many Variables

Authors: Alexander P. Zubarev

Abstract: A method for approximating continuous functions $\mathbb{Z}_{p}^{n}\rightarrow\mathbb{Z}_{p}$ by a linear superposition of continuous functions $\mathbb{Z}_{p}\rightarrow\mathbb{Z}_{p}$ is presented and a polynomial regression model is constructed that allows approximating such functions with any degree of accuracy. A physical interpretation of such a model is given and possible methods for its training are discussed. The proposed model can be considered as a simple alternative to possible $p$-adic models based on neural network architecture.

replace-cross Self-Supervised Pretraining for Aerial Road Extraction

Authors: Rupert Polley, Sai Vignesh Abishek Deenadayalan, J. Marius Z\"ollner

Abstract: Deep neural networks for aerial image segmentation require large amounts of labeled data, but high-quality aerial datasets with precise annotations are scarce and costly to produce. To address this limitation, we propose a self-supervised pretraining method that improves segmentation performance while reducing reliance on labeled data. Our approach uses inpainting-based pretraining, where the model learns to reconstruct missing regions in aerial images, capturing their inherent structure before being fine-tuned for road extraction. This method improves generalization, enhances robustness to domain shifts, and is invariant to model architecture and dataset choice. Experiments show that our pretraining significantly boosts segmentation accuracy, especially in low-data regimes, making it a scalable solution for aerial image analysis.