
arXiv:2606.01645v1 Announce Type: new Abstract: Diffusion models have emerged as a leading framework for deep generative modeling. While the standard Gaussian formulation is theoretically convenient, its suitability for heavy-tailed datasets remains unclear. To address this, heavy-tailed diffusion models (HTDMs) extend the standard formulation by replacing the Gaussian distribution with a Student's t-distribution, thereby improving tail fi..

arXiv:2606.01669v1 Announce Type: new Abstract: Principal stratification is an effective framework addressing intermediate variables in causal inference. However, point identification of the principal causal effects (PCEs) often requires the untestable principal ignorability (PI) assumption. This article develops a nonparametric sensitivity analysis framework for evaluating PI violations. We introduce a margin-free bounding factor paramete....

arXiv:2606.01674v1 Announce Type: new Abstract: Higher-order efficient estimators extend standard first-order semiparametric estimators by replacing second-order residuals with third- or higher-order terms, potentially enabling asymptotic efficiency under slower nuisance function convergence rates and improving finite-sample performance. Existing methods achieve higher-order expansions through structurally different approximation strategie....

arXiv:2606.01724v1 Announce Type: new Abstract: LEO satellite constellations, led by deployments such as Starlink, are playing an increasingly pivotal role in enabling global broadband connectivity. However, the reliability and performance of these space-based networks are highly sensitive to environmental dynamics, particularly localized weather phenomena that exhibit strong spatio-temporal variability. In this study, we present a contine....

arXiv:2606.01796v1 Announce Type: new Abstract: Human viral challenge studies, in which participants are deliberately inoculated with influenza strains such as H1N1 or H3N2 and monitored through longitudinal transcriptomic profiling before and after inoculation, are critical for characterizing dynamic biological immune responses to viral infection. A key analytical goal in such settings is to detect critical transition times, or change poi....

arXiv:2606.01854v1 Announce Type: new Abstract: This paper presents closed BH, a uniform improvement of the False Discovery Rate controlling method of Benjamini and Hochberg (BH). Closed BH is valid under the same assumption of Positive Regression Dependency on a Subset (PRDS) as BH. As a uniform improvement, closed BH never rejects fewer hypotheses than BH, but it may reject quite a few more. An increase in power is observed especially wh..

arXiv:2606.01932v1 Announce Type: new Abstract: Spatial capture-recapture models are routinely used to estimate the abundance and distribution of wild animal populations and involve a latent spatial point process of animal activity centres that describes the spatial distribution of individuals. While traditional spatial capture-recapture models use a Poisson process, the assumption of conditional independence between points is often violat....

arXiv:2606.01960v1 Announce Type: new Abstract: We consider the problem of detecting a Return to Baseline (RtB) in high-frequency monitoring data preceding and following an intervention, where the aim is to identify the time at which the data-generating distribution realigns with its pre-intervention distribution. We propose a sequential, distribution-free testing procedure that does not rely on specifying a null model and provides anytime....

arXiv:2606.01990v1 Announce Type: new Abstract: The Admixture Model describes genetic marker data by representing each individual's genome as a mixture of contributions from $K$ ancestral populations, with the individual admixture vector summarizing the corresponding ancestry proportions. In population and forensic genetics, a key question is whether an individual's genome supports a predominantly single-ancestry interpretation or whether ....

arXiv:2606.02008v1 Announce Type: new Abstract: Pre-training has become a fundamental paradigm in modern machine learning, with one of its key empirical benefits being reduced downstream sample complexity as the scale of pre-training data increases. However, existing theoretical frameworks for pre-training do not fully explain this phenomenon. In this paper, we introduce complexity minimization, a novel meta-representation learning framewo....

arXiv:2606.02017v1 Announce Type: new Abstract: High-dimensional interaction models are useful for studying, for example, how a large set of variables of interest, such as gene expression or other omics features, interact with a smaller set of modifying variables, such as clinical covariates. In this context, the pliable lasso has recently been proposed as an efficient method for screening large numbers of potential interaction terms under....

arXiv:2606.02047v1 Announce Type: new Abstract: We introduce Convex Distance Operator Transport (CDOT), the first convex optimal transport framework that aligns distributions across heterogeneous domains by jointly preserving feature correspondence and intrinsic geometric structure. Specifically, CDOT employs an operator-based regularization that aligns aggregated distance structures by introducing distance and conditional expectation oper....

arXiv:2606.02059v1 Announce Type: new Abstract: The intraclass correlation coefficient (ICC) is among the most widely used statistics in reliability research, playing a central role in medical measurement, psychological assessment, and behavioral science. However, practical application of ICC faces two major obstacles. First, ICC can be organized into multiple forms under the McGraw and Wong (1996) framework -- including six widely reporte....

arXiv:2606.02062v1 Announce Type: new Abstract: Different methods have been employed to estimate models maximizing the area under the receiver operating characteristic curve (ROC-AUC). Once a model is developed, integrating novel biomarkers may improve its diagnostic ability. However, the discrimination improvement from adding a new biomarker is not always evident, even if the marker itself has good discriminatory power. The sign and magni....

arXiv:2606.02065v1 Announce Type: new Abstract: While it is well known how to compute the cells of a Laguerre tessellation for a given set of weighted generator points, it is not obvious how to invert a Laguerre tessellation. That is, given that one observes a Laguerre tessellation, how can one retrieve the weighted generators corresponding to the observed cells? In this paper, we consider inversion of a class of random Laguerre tessellati....

arXiv:2606.02076v1 Announce Type: new Abstract: Background: Multistate models (MSMs) applied to screening data can characterise the natural history of cancer and predict "stage-shifts" from screening. However, inferring parameters like mean sojourn time (MST) is challenging as disease onset is inherently unobserved in these data. This is even more challenging when characterising heterogeneity between cancer types in multicancer early detec....

arXiv:2606.02101v1 Announce Type: new Abstract: This paper proposes a method of creating synthetic data (SD) that will have two important advantages for the user compared to other methods currently available. The first is transparency; unlike other methods, the person in receipt of the SD will know which of the relationships between variables in the original data will be approximately maintained in the SD. The second is a guarantee that th....

arXiv:2606.02115v1 Announce Type: new Abstract: Parameter estimation in stochastic differential equations is a classical statistical problem of much importance in many scientific fields. Recent work of Tapia Costa et al. (2026) introduced a novel technique for estimating the drift when the diffusion parameter is known, using discrete samples from multiple trajectories. Their method treats drift estimation as a denoising problem, and levera....

arXiv:2606.02117v1 Announce Type: new Abstract: Probabilistic time series forecasting has attracted increasing attention in financial applications due to the need to quantify risk and uncertainty in future observations. We propose ProbRes, a post-hoc probabilistic calibration method that explicitly learns and incorporates volatility dynamics into probabilistic forecasting, enabling effective handling of heteroskedastic data. During trainin....

arXiv:2606.02118v1 Announce Type: new Abstract: In this work we provide analytical and closed-form expressions for the exact computation of the score and the observed Fisher information matrix in a Gaussian random walk observed through Gaussian noise. Our method is based on the Oakes' identity and, as for the computation of the log-likelihood, its complexity in time is linear in the length of the sequence with the forward-backward (or Baum..

arXiv:2606.02130v1 Announce Type: new Abstract: This article describes the design of a neutral comparison study in the context of empirical studies where the interest is in learning the functional relationship between a continuous error-prone exposure variable and a binary outcome. The performance of combinations of measurement error correction methods and flexible regression modeling techniques was compared using a simulation study. The pr....

arXiv:2606.02144v1 Announce Type: new Abstract: We investigate support thresholds for fully smeary and directionally smeary absolutely continuous probability measures on the sphere \(\mathbb{S}^m\). The motivation is inferential: smeariness is caused by degeneracy of the Hessian of the Fréchet function, and such degeneracy can invalidate the classical central limit theorem (CLT) for Fréchet means and the corresponding Wald-type \(\chi^....

arXiv:2606.02199v1 Announce Type: new Abstract: Multinomial count data, such as microbial composition profiles derived from sequencing studies, frequently contain anomalous observations that distort parameter estimates. The Dirichlet-multinomial (DM) distribution is widely used in this setting but remains sensitive to such contamination. We propose the contaminated Dirichlet-multinomial (CDM) distribution, a two-component mixture in which ....

arXiv:2606.02228v1 Announce Type: new Abstract: Predicting whether an individual with Alzheimer's disease will experience mild or severe disease progression is essential for personalized treatment. Typically, practitioners seek to predict the distribution of a discrete disease score, conditional on an individual's current MRI volume and their historical disease trajectory. Classical statistical regression models and single-task neural netw....

arXiv:2606.02231v1 Announce Type: new Abstract: Temporal systems often exhibit non-stationary behaviour, such as seasonal climate variation or glucose fluctuations in patients with type-1 diabetes. One way to model non-stationarity is through discrete latent regimes, i.e., stationary segments of time. Such systems induce a Markov Switching Model (MSM), a class of Hidden Markov Models with autoregressive dependencies among latent regimes an....

arXiv:2606.02247v1 Announce Type: new Abstract: Shapley values are a principled attribution measure widely used in interpretable machine learning, but their exact computation scales exponentially with the number of players, motivating a wide range of approximation methods based on value function evaluations of sampled coalitions. This raises the question of whether approximation accuracy can be improved by adaptively selecting coalitions f....

arXiv:2606.02295v1 Announce Type: new Abstract: When it comes to estimating an unknown spectral density as simply and reliably as possible, parametric spectral density estimation using AR models and order selection via AIC is the method of choice. In contrast, no standard method has yet emerged for automatic nonparametric spectral density estimation, and there seems to be little willingness to weigh the advantages and disadvantages of diff....

arXiv:2606.02345v1 Announce Type: new Abstract: Many machine learning problems, including similarity learning, ranking, and clustering, rely on empirical pairwise loss functions whose quadratic computational cost quickly becomes prohibitive at scale. We demonstrate how a frugal approach that retains only a fraction of the available information on pairs can achieve estimation or optimization performance comparable to that obtained by using ..

arXiv:2606.02410v1 Announce Type: new Abstract: Two-arm phase II clinical trials often benefit from an interim analysis that allows early stopping for futility, but Bayesian calibration of such designs is usually based on computationally intensive Monte Carlo simulation. In this work, a simulation-free methodology is developed to obtain Bayesian optimal two-stage designs in two-arm phase II trials with binary endpoints using Bayes factors ....

arXiv:2606.02508v1 Announce Type: new Abstract: In the last few years, AI-based models have become the centre of attention in weather forecasting due to their increasing accuracy and efficiency. Pioneering among weather services, ECMWF has developed its Artificial Intelligence Forecasting System (AIFS) model, which was first to provide data-driven ensemble forecasts in June 2024. Since July 2025, the AIFS ensemble model has been operationa....

arXiv:2606.02533v1 Announce Type: new Abstract: Space-filling designs are commonly used in deterministic computer experiments. However, they are ineffective for factor screening, which makes them inefficient when only a small subset of input factors is influential to the output. Recently developed screening designs, such as MOFAT designs, are effective at identifying important factors but lack space-filling properties, limiting their usefu..

arXiv:2606.02550v1 Announce Type: new Abstract: A fundamental goal in climate attribution is to estimate how forced climate change contributes to observed extreme weather events. The storyline attribution method compares an observed weather event, conditional on its atmospheric dynamic state (i.e., atmospheric circulation), in the current, 'factual' climate to an event with very similar circulation conditions in a hypothetical, 'counterfac....

arXiv:2605.28952v2 Announce Type: cross Abstract: E-values have attracted considerable interest in recent years as flexible tools for enabling anytime-valid and adaptive data analysis. Hypothesis testing is at the core of many of these applications, which can often involve private or sensitive data. In this work, we answer a simple but important question: given two distributions $\mathbb{P}$ and $\mathbb{Q}$, what is the maximum achievable....

arXiv:2606.00082v1 Announce Type: cross Abstract: Explainability of deep learning algorithms is critical for computer-vision applications with high-stake decisions. Concept bottleneck models (CBM) have recently shown promising performance to provide explainable and accurate predictions for classification problems, based on a bottleneck of high-level concepts. Existing CBM methods rely on a linear aggregation of the concept scores to comput....

arXiv:2606.00115v1 Announce Type: cross Abstract: Bridging the gap between visual realism and physical understanding is a core challenge for video-based world models. We study the structural identifiability of continuous-time physical laws from raw pixels, focusing on whether an encoder-only pipeline can uniquely recover the parameters of second-order linear ODEs. We prove that a level-set slope-coverage condition ensures the learned laten....

arXiv:2606.00183v1 Announce Type: cross Abstract: Tree search is a central abstraction behind many language-agent reasoning and decision-making tasks: agents must explore actions, remember failures, and backtrack toward promising alternatives. Yet, we lack a theoretical understanding of how transformer-based policies acquire such search capabilities from the training dynamics of reinforcement learning (RL). We study this question in a stoc....

arXiv:2606.00241v1 Announce Type: cross Abstract: Measuring statistical dependency between high-dimensional random variables is a fundamental task in data science and machine learning. Neural mutual information (MI) estimators offer a promising avenue, but they typically require costly iterative optimization for each new dataset, making them impractical for real-time applications. We present InfoAtlas, a foundation model-like architecture ....

arXiv:2606.00243v1 Announce Type: cross Abstract: Biological and neuromorphic recurrent neural networks (RNNs) are subject to spatial and temporal locality constraints on the information that can plausibly be used during learning. A common strategy to satisfy these constraints is to modify gradient descent by neglecting non-local terms to varying degrees, as in random feedback local online (RFLO) learning and truncated backpropagation thro....

arXiv:2606.00262v1 Announce Type: cross Abstract: InfoNCE is the standard contrastive learning objective, but its softmax form is not only a computational convenience: it also encodes a statistical assumption about how the top-scoring example is selected. Using extreme value theory, we show that this assumption is often misaligned with the normalized embedding setting used in modern contrastive learning. Motivated by this mismatch, we prop..

arXiv:2606.00293v1 Announce Type: cross Abstract: Tuning algorithms such as stochastic gradient descent (SGD) and stochastic gradient Langevin dynamics (SGLD) for approximate sampling and uncertainty quantification remains challenging, particularly in the practically relevant settings when the batch size is large or the model is misspecified. Existing theory that provides tuning guidance relies on continuous-time limits or strong statistic....

arXiv:2606.00309v1 Announce Type: cross Abstract: Stochastic gradient Langevin dynamics combined with Gibbs updates (SGLD--Gibbs) provides a highly scalable approach to approximate Bayesian inference in latent variable models. However, it remains unclear how to tune the algorithm's hyperparameters in a principled manner to ensure the uncertainty estimates are statistically meaningful. In this work, we address this gap in tuning guidance by....

arXiv:2606.00322v1 Announce Type: cross Abstract: We introduce a perturbative approach for nonparametric instrumental variable (NPIV) estimation. By drawing inspiration from perturbation theory in physics, we extend standard kernel ridge methods with systematic higher perturbation order corrections that significantly improve estimation accuracy. Spectrally, the perturbation introduces mixing between different eigenmodes of the expectation ....

arXiv:2606.00329v1 Announce Type: cross Abstract: Recursive systems can enter collapse-like regimes -- self-reinforcing amplification, persistent recursion, and narrowing diversity that mask accelerating internal degradation -- before overt failure becomes visible. We introduce Loopzero, a claim-bounded benchmark framework for testing whether recursive failures follow a directional telemetry pattern: rising gain (G), recursive persistence ....

arXiv:2606.00384v1 Announce Type: cross Abstract: Fitting quantitative models to data is a central step in scientific workflows, yet it remains one of the least automated. Recent agent-based systems leverage language and vision-language models (VLMs) to iteratively propose and refine statistical models, but these systems struggle on more challenging modeling tasks. To address these limitations, we introduce VESTA: Visual Exploration with S....
