arXiv:2601.11229v4 Announce Type: replace Abstract: Qualitative Comparative Analysis (QCA) requires researchers to choose calibration and dichotomization thresholds, and these choices can substantially affect truth tables, minimization, and resulting solution formulas. Despite this dependency, threshold sensitivity is often examined only in an ad hoc manner because repeated analyses are time-intensive and error-prone. We present ThSQCA, an....

arXiv:2601.15880v3 Announce Type: replace Abstract: The Mann-Whitney effect is an effect measure for the order of two sample-specific outcome variables. It has the interpretation of a probability and also a connection to the area under the ROC curve. In the literature it has been considered for both ordinal and right-censored time-to-event outcomes. For both cases, the present paper introduces a distribution-free regression model that rela....
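As a rough illustration of the quantity named in this abstract, the Mann-Whitney effect for two fully observed samples can be estimated by counting ordered pairs. The sketch below is not the paper's regression model and ignores censoring entirely. The function name `mann_whitney_effect` and the convention of counting ties as one half are assumptions made here for illustration.

```python
from itertools import product


def mann_whitney_effect(xs, ys):
    """Estimate P(X < Y) + 0.5 * P(X = Y) from two uncensored samples.

    Assumption: ties contribute 0.5, a common convention that the
    abstract itself does not spell out.
    """
    if not xs or not ys:
        raise ValueError("both samples must be non-empty")
    score = 0.0
    # Compare every observation in xs with every observation in ys.
    for x, y in product(xs, ys):
        if x < y:
            score += 1.0
        elif x == y:
            score += 0.5
    # Normalise by the number of pairs so the result lies in [0, 1].
    return score / (len(xs) * len(ys))
```

A value of 0.5 indicates no tendency for either sample to yield larger outcomes. With binary class labels defining the two samples, the same pairwise count is the empirical area under the ROC curve.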

arXiv:2601.21696v2 Announce Type: replace Abstract: Advances in data collection are producing growing volumes of temporal count observations, making adapted modeling increasingly necessary. In this work, we introduce a generative framework for independent component analysis of temporal count data, combining regime-adaptive dynamics with Poisson log-normal emissions. The model identifies disentangled components with regime-dependent contrib..

arXiv:2601.21959v2 Announce Type: replace Abstract: We develop a near-optimal testing procedure under the framework of Gaussian differential privacy for simple as well as one- and two-sided tests under monotone likelihood ratio conditions. Our mechanism is based on a private mean estimator with data-driven clamping bounds, whose population risk matches the private minimax rate up to logarithmic factors. Using this estimator, we construct p..

arXiv:2601.22784v2 Announce Type: replace Abstract: We introduce a rank-statistic approximation of $f$-divergences that avoids explicit density-ratio estimation by working directly with the distribution of ranks. For a resolution parameter $K$, we map the mismatch between two univariate distributions $\mu$ and $\nu$ to a rank histogram on $\{ 0, \ldots, K\}$ and measure its deviation from uniformity via a discrete $f$-divergence, yielding ....

Persuasive Privacy - arxiv.org - 2 days ago - eng
arXiv:2601.22945v2 Announce Type: replace Abstract: We propose a novel framework for measuring privacy from a Bayesian game-theoretic perspective. This framework enables the creation of new, purpose-driven privacy definitions that are rigorously justified, while also allowing for the assessment of existing privacy guarantees through game theory. We show that pure and probabilistic differential privacy are special cases of our framework, an..

arXiv:2602.00878v2 Announce Type: replace Abstract: Slice sampling is a standard Monte Carlo technique for Dirichlet process (DP)-based models, widely used in posterior simulation. However, formal assessments of the scalability of posterior slice samplers have remained largely unexplored, primarily because the computational cost of a slice-sampling iteration is random and potentially unbounded. In this work, we obtain high-probability boun....

arXiv:2602.03970v3 Announce Type: replace Abstract: We study the statistical behavior of reasoning probes in a stylized model of iterative computation inspired by neural algorithmic reasoning. The underlying computation is given by a looped Boolean circuit whose graph is a perfect $\nu$-ary tree ($\nu\ge 2$), with outputs recursively fed back as inputs across computation rounds. A probe observes a sampled subset of internal nodes and seeks....

arXiv:2602.03972v3 Announce Type: replace Abstract: The best-arm identification (BAI) problem is one of the most fundamental problems in interactive machine learning and comes in two flavors: the fixed-budget setting (FB) and the fixed-confidence setting (FC). For $K$-armed bandits with a unique best arm, the optimal sample complexities for both settings have been settled, and they match up to logarithmic factors. This prompts an intere....

arXiv:2602.05395v2 Announce Type: replace Abstract: A simple strategy for improving LLM accuracy, especially in math and reasoning problems, is to sample multiple responses and submit the answer most consistently reached. In this paper we leverage Bayesian prior information to save on sampling costs, stopping once sufficient consistency is reached. Although the exact posterior is computationally intractable, we further introduce an efficie..
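The baseline strategy described in the first sentence of this abstract, sampling several responses and submitting the most frequent answer, can be sketched as a plain majority vote. This is only the baseline. It does not implement the paper's Bayesian stopping rule, which is not specified in the truncated abstract. The function name `most_consistent_answer` is hypothetical.

```python
from collections import Counter


def most_consistent_answer(answers):
    """Return the most frequent answer and its share of the votes.

    Assumption: answers are already normalised to comparable strings,
    so equal answers compare equal.
    """
    if not answers:
        raise ValueError("need at least one sampled answer")
    counts = Counter(answers)
    # most_common(1) yields the (answer, count) pair with the highest count.
    answer, votes = counts.most_common(1)[0]
    return answer, votes / len(answers)
```

For example, `most_consistent_answer(["42", "41", "42", "42", "7"])` selects `"42"` with a vote share of 0.6. An early-stopping variant would check the vote share after each new sample rather than once at the end.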

arXiv:2602.06065v3 Announce Type: replace Abstract: Understanding how the structure of language can be learned from sentences alone is a central question in both cognitive science and machine learning. Studies of the internal representations of Large Language Models (LLMs) support their ability to parse text when predicting the next word, while representing semantic notions independently of surface form. Yet, which data statistics make the....

arXiv:2602.09651v2 Announce Type: replace Abstract: Diffusion models do not recover semantic structure uniformly over time. Instead, samples transition from semantic ambiguity to class commitment within a narrow regime. Recent theoretical work attributes this transition to dynamical instabilities along class-separating directions, but practical methods to detect and exploit these windows in trained models are still limited. We show that tr....

arXiv:2602.13906v2 Announce Type: replace Abstract: Stochastic approximation (SA) is a method for finding the root of an operator perturbed by noise. The focus of this paper is studying the distribution of SA iterates in finite time. In general, it is not possible to characterize the exact distribution, and therefore our goal is to find an approximation which can yield useful tail bounds. Inspired by the rich literature on the asymptotic n....

arXiv:2602.16794v2 Announce Type: replace Abstract: Conformal prediction (CP) offers distribution-free uncertainty quantification for machine learning models, yet its interplay with fairness in downstream decision-making remains underexplored. Moving beyond CP as a standalone operation (procedural fairness), we analyze the holistic decision-making pipeline to evaluate substantive fairness, that is, the equity of downstream outcomes. Theoretically, w....

arXiv:2602.22768v2 Announce Type: replace Abstract: Multi-armed bandit (MAB) processes constitute a foundational subclass of reinforcement learning problems and represent a central topic in statistical decision theory. Yet, conducting valid sequential testing under adaptive allocation remains challenging due to the lack of asymptotic theory under non-i.i.d. reward sequences and sublinear sample sizes for some arms. To address this open cha....

arXiv:2602.24219v2 Announce Type: replace Abstract: A statistic can be a function of multiple samples. There is little existing work on asymptotic theory for such statistics when group membership is random. We propose a flexible framework that can handle both deterministic and random membership. We prove some asymptotic properties and apply the framework to the stratified sampling context.

arXiv:2603.07563v2 Announce Type: replace Abstract: In this paper, we address a fundamental limitation of the classical Wasserstein barycenter: its sensitivity to outliers. To overcome this issue, we propose the robust Wasserstein barycenter (RWB), based on the recent concept of robust optimal transport. Theoretical guarantees, including existence and consistency, are established for the proposed RWB. Through extensive numerical exper..

arXiv:2603.09919v2 Announce Type: replace Abstract: Adaptive enrichment trials aim to identify and recruit participants most likely to benefit from treatment based on evolving biomarker evidence, with the goal of informing individualized treatment recommendations. Bayesian methods are well suited to these designs because they allow external information to be incorporated in a principled manner. In practice, prior studies often provide only....

arXiv:2603.14798v2 Announce Type: replace Abstract: We propose a machine-learning algorithm for Bayesian inverse problems in the function-space regime. Based on one-step generative transport, the method learns an amortized neural operator whose pushforward of a Gaussian source approximates the posterior distribution conditioned on each new observation. We show that white-noise sources are incompatible with the function-space limit, and the....

arXiv:2603.22215v2 Announce Type: replace Abstract: Joint modeling of multiview graphs with a common set of nodes between views and auxiliary predictors is an essential, yet less explored, area in statistical methodology. Traditional approaches often treat graphs in different views as independent or fail to adequately incorporate predictors, potentially missing complex dependencies within and across graph views and leading to reduced infer....

arXiv:2603.24170v3 Announce Type: replace Abstract: In the end, the house always wins! This simple truth holds for all public games of chance. Nevertheless, for as long as lotteries have existed, people have tried everything to give luck a helping hand. This article compares objective scientific approaches to tackling the 6/49 lottery: probabilistic methods and combinatorial designs. The mathematical models developed herein can be modified and applie..

arXiv:2605.00696v2 Announce Type: replace Abstract: We study adaptive querying for learning user-dependent quantities of interest, such as responses to held-out items and psychometric indicators, within tight query budgets. Classical Bayesian design and computerized adaptive testing typically rely on restrictive parametric assumptions or expensive posterior approximations, limiting their use in heterogeneous, high-dimensional, and cold-sta....

arXiv:2605.03781v4 Announce Type: replace Abstract: Using standard-normal critical-value calibration (SNC) to construct a kernel-smoother-based confidence interval faces a fundamental challenge: the normalization makes a small estimation bias become a non-negligible inferential bias. This paper takes a different route by replacing the SNC control with empirical Bernstein tail control. The resulting confidence intervals control stochastic v....

arXiv:2605.07818v2 Announce Type: replace Abstract: The expectation--maximization (EM) algorithm combines global monotonicity, local linear convergence, and strong practical robustness, but these features are usually analyzed separately. Global descent is nonlinear, whereas local convergence is governed by the spectrum of the linearized EM map. How these two levels fit into a single dynamical picture has remained less transparent. We make....

arXiv:2605.12768v2 Announce Type: replace Abstract: Open time-series forecasting (TSF) benchmarks cover retail, energy, weather, and traffic, but supply-chain logistics remains underserved. We introduce ISOMORPH, the first public digital twin of a multi-echelon logistics network with interpretable, user-configurable parameters and modular topology, demand, and control rules. The simulator advances a directed routing graph in discrete time:....

arXiv:2605.13203v2 Announce Type: replace Abstract: This paper investigates the predictive performance of model averaging in high-dimensional linear regression where the number of regressors is comparable to the sample size. Leveraging tools from random matrix theory, we derive the exact limiting out-of-sample risk under a nested model setting and comprehensively characterize the risk landscape. This limiting risk helps to reveal two pheno....

arXiv:2605.13397v2 Announce Type: replace Abstract: Inference for models with recursively defined likelihoods is computationally demanding, limiting scalability to large datasets. We propose a stabilised weighted subsampling methodology for accelerated inference based on an unbiased estimator of the log-likelihood. By assigning higher sampling probabilities to early observations, the method reduces the effective depth of recursive likeliho....

arXiv:2605.13430v3 Announce Type: replace Abstract: Selection bias is pervasive in observational studies. For example, large-scale biobank data can exhibit ``healthy volunteer bias'' when respondents are healthier and of higher socio-economic status than the population they are meant to represent. Recovering causal effects from such a sub-population is an important problem in causal inference, as estimating average treatment effects (ATE) f....

arXiv:2605.20615v2 Announce Type: replace Abstract: Causal mediation analysis is essential for disentangling the mechanisms by which investigational therapeutic and preventive agents impact clinical outcomes. However, the measurement of biological mediators is often subject to left-censoring by technical measurement limitations, most commonly an assay's limit of quantification. This form of censoring can pose severe challenges for both ide....

arXiv:2605.24377v2 Announce Type: replace Abstract: Real-World Data (RWD), with its large sample sizes and rich clinical detail, offers a compelling alternative to randomized controlled trials (RCTs) for studying treatment effects in diverse and complex patient populations. However, its observational nature introduces confounding that prevents straightforward comparative effectiveness research. Target trial emulation leverages RWD to estim....

arXiv:2605.24847v2 Announce Type: replace Abstract: Introduction: Logistic regression (LR)-type model limitations for causal inference are explained theoretically and empirically through the lens of the purported gateway effect from e-cigarette use to smoking. Previous studies have reported that baseline e-cigarette use quadruples odds of follow-up smoking (binarized) in LR-type models of adolescent longitudinal cohorts (LCs), such that in....

arXiv:2605.29200v2 Announce Type: replace Abstract: Conformal prediction is a framework for providing prediction intervals with distribution-free validity, guaranteeing predictive coverage for data drawn from any distribution. Its two main variants are full conformal prediction and split conformal prediction (also called transductive and inductive). Full conformal prediction is widely considered to be statistically more efficient (since sp....

arXiv:2605.29388v2 Announce Type: replace Abstract: This paper develops a framework for differentially private $e$-values under Gaussian differential privacy ($\mu$-GDP). We characterize the canonical noise mechanism, establishing that optimal multiplicative perturbation follows a Gaussian distribution. Using this distribution, we derive a globally sharp rejection threshold that strictly improves upon the standard Markov bound. Asymptotic ....

arXiv:2605.30242v2 Announce Type: replace Abstract: The airborne fraction is the share of anthropogenic carbon dioxide emissions that remains in the atmosphere and is a key indicator of carbon-cycle response and remaining carbon budgets under continued emissions. Whether this share is rising remains debated because inference is sensitive to uncertainty in land-use and land-cover change (LULC) emissions. Here we use all available LULC measu....

arXiv:2310.20545v3 Announce Type: replace-cross Abstract: We present a multi-task optimization approach based on a deep learning architecture for time series forecasting. We leverage large collections of time series to identify the weights of forecasting models that can be combined to produce forecasts for each series. This method jointly addresses two tasks: the selection of different forecasting models, and their effective combination. I....

arXiv:2403.07008v3 Announce Type: replace-cross Abstract: The evaluation of machine learning models using human-labeled validation data can be expensive and time-consuming. AI-labeled synthetic data can be used to decrease the number of human annotations required for this purpose in a process called autoevaluation. We suggest efficient and statistically principled algorithms for this purpose that improve sample efficiency while remaining u..

arXiv:2409.15532v3 Announce Type: replace-cross Abstract: Stochastic differential equations are ubiquitous modelling tools in physics and the sciences. In most modelling scenarios, random fluctuations driving dynamics or motion have some non-trivial temporal correlation structure, which renders the SDE non-Markovian; a phenomenon commonly known as ``colored'' noise. Thus, an important objective is to develop effective tools for mathematica....

arXiv:2410.17105v4 Announce Type: replace-cross Abstract: We develop a flexible framework for Bayesian estimation of impulse responses using Local Projections (LPs) with instrumental variables. It accommodates multiple shocks and instruments, accounts for autocorrelation in multi-step forecasts by jointly modeling all LPs as a seemingly unrelated system of equations, defines a flexible yet parsimonious joint prior for impulse responses bas..

arXiv:2411.08692v2 Announce Type: replace-cross Abstract: The inherent complexity of biological agents often leads to motility behavior that appears to have random components. Robust stochastic inference methods are therefore required to understand and predict the motion patterns from time-discrete trajectory data provided by experiments. In many cases, second-order Langevin models are needed to adequately capture the motility. Additionall....

arXiv:2411.12438v2 Announce Type: replace-cross Abstract: We develop a new approach for clustering non-spherical (i.e., arbitrary component covariances) Gaussian mixture models via a subroutine, based on the sum-of-squares method, that finds a low-dimensional separation-preserving projection of the input data. Our method gives a non-spherical analog of the classical dimension reduction, based on singular value decomposition, that, among se....

arXiv:2412.04177v2 Announce Type: replace-cross Abstract: Recently, there has been increasing interest in post-hoc uncertainty estimation for the predictions of pre-trained deep neural networks (DNNs). Given a DNN pre-trained via back-propagation, these methods enhance the original network by adding output confidence measures, such as error bars, without compromising its initial accuracy. In this context, we introduce a nov....

arXiv:2412.16209v5 Announce Type: replace-cross Abstract: When using machine learning for imbalanced binary classification problems, it is common to subsample the majority class to create a (more) balanced training dataset. This biases the model's predictions because the model learns from data that is not fully representative of the underlying population of interest. One way of accounting for this bias is analytically mapping the resulting....

arXiv:2412.19444v2 Announce Type: replace-cross Abstract: Optimization algorithms such as AdaGrad and Adam have significantly advanced the training of deep models by dynamically adjusting the learning rate during the optimization process. However, ad-hoc tuning of learning rates poses a challenge and leads to inefficiencies in practice. To address this issue, recent research has focused on developing ``parameter-free'' algorithms that oper....

arXiv:2501.08640v2 Announce Type: replace-cross Abstract: We propose a way to bound the generalisation errors of several classes of quantum reservoirs using the Rademacher complexity. We give specific, parameter-dependent bounds for two particular quantum reservoir classes. We analyse how the generalisation bounds scale with growing numbers of qubits. Applying our results to classes with polynomial readout functions, we find that the risk ..
