20260422: update

2026-04-22 16:56:53 +08:00
parent dd8345a6ea
commit 0b1535dfaf
34 changed files with 4111 additions and 19 deletions
--- a/raw/papers/nikolopoulos-spurious-predictability-2026.md
+++ b/raw/papers/nikolopoulos-spurious-predictability-2026.md
@@ -0,0 +1,92 @@
+# Spurious Predictability in Financial Machine Learning
+
+**Authors:** Sotirios D. Nikolopoulos  
+**arXiv ID:** 2604.15531v1  
+**Published:** 2026-04-16  
+**Categories:** q-fin.ST, stat.ME, stat.ML  
+**Comments:** 49 pages, 10 figures. The QuantAudit R package and full replication scripts will be made publicly available upon journal publication  
+**Subjects:** Statistical Finance (q-fin.ST); Methodology (stat.ME); Machine Learning (stat.ML)  
+**MSC classes:** 91G70, 62P20, 62M20, 68T05  
+**DOI:** https://doi.org/10.48550/arXiv.2604.15531
+
+## Abstract
+
+Adaptive specification search generates statistically significant backtests even under martingale-difference nulls. We introduce a falsification audit testing complete predictive workflows against synthetic reference classes, including zero-predictability environments and microstructure placebos. Workflows generating significant walk-forward evidence in these environments are falsified. For passing workflows, we quantify selection-induced performance inflation using an absolute magnitude gap linking optimized in-sample evidence to disjoint walk-forward realizations, adjusted for effective multiplicity. Simulations validate extreme-value scaling under correlated searches and demonstrate detection power under genuine structure. Empirical case studies confirm that many apparent findings represent methodological artifacts rather than genuine predictability.
+
+## Key Concepts
+
+### 1. Spurious Predictability
+The phenomenon where adaptive specification search (data mining, model selection, hyperparameter tuning) generates statistically significant backtest results even when the underlying data-generating process has no genuine predictive structure, i.e. under martingale-difference nulls, where expected returns conditional on past information are zero.
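+
+A minimal Python sketch of the effect, assuming i.i.d. Gaussian returns (one simple martingale-difference null) and a search over hypothetical random long/short sign rules. `best_in_sample_tstat` is an illustrative name, not part of the paper or of QuantAudit. No rule has any real edge, yet the best of many candidates routinely looks significant.
+
+```python
+import numpy as np
+
+
+def best_in_sample_tstat(n_obs: int, n_rules: int, seed: int = 0) -> float:
+    """Largest t-statistic found by searching n_rules random sign rules on pure noise."""
+    rng = np.random.default_rng(seed)
+    returns = rng.standard_normal(n_obs)  # zero predictability by construction
+    # Each hypothetical rule is a random +1/-1 position path, independent of returns.
+    positions = rng.choice([-1.0, 1.0], size=(n_rules, n_obs))
+    pnl = positions * returns  # per-period strategy P&L
+    tstats = pnl.mean(axis=1) / (pnl.std(axis=1, ddof=1) / np.sqrt(n_obs))
+    return float(tstats.max())  # report only the "winner" of the search
+
+
+hits = sum(best_in_sample_tstat(500, 200, seed=s) > 1.96 for s in range(50))
+print(f"{hits}/50 searches produced a 'significant' best rule")
+```
+
+With 200 independent candidates, the chance that at least one exceeds 1.96 is roughly 1 - 0.975^200, i.e. above 99%, so nearly every search "succeeds".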
+
+### 2. Falsification Audit
+A methodological framework for testing complete predictive workflows against synthetic reference classes:
+- **Zero-predictability environments**: Simulated data with no genuine predictive structure
+- **Microstructure placebos**: Realistic but non-predictive market microstructure features
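+
+One way a microstructure placebo could be built, assuming the classic Roll bid-ask bounce model: the efficient price is a random walk, but trades print at the bid or the ask at random, which induces negative first-order autocorrelation in observed price changes. This illustrates the idea only and is not the paper's placebo construction; `roll_placebo_prices` and `lag1_autocorr` are hypothetical names.
+
+```python
+import numpy as np
+
+
+def roll_placebo_prices(n: int, sigma: float, half_spread: float, seed: int = 0) -> np.ndarray:
+    """Observed trade prices = random-walk efficient price + random bid/ask bounce."""
+    rng = np.random.default_rng(seed)
+    efficient = np.cumsum(sigma * rng.standard_normal(n))  # unpredictable fundamental
+    side = rng.choice([-1.0, 1.0], size=n)  # trade at ask (+1) or bid (-1)
+    return efficient + half_spread * side
+
+
+def lag1_autocorr(x: np.ndarray) -> float:
+    """Sample correlation between consecutive observations."""
+    return float(np.corrcoef(x[:-1], x[1:])[0, 1])
+
+
+prices = roll_placebo_prices(100_000, sigma=0.01, half_spread=0.05, seed=1)
+dp = np.diff(prices)
+# Roll model: corr(dp_t, dp_{t-1}) = -c^2 / (sigma^2 + 2 c^2), with c the half-spread.
+theory = -0.05**2 / (0.01**2 + 2 * 0.05**2)
+print(round(lag1_autocorr(dp), 3), round(theory, 3))
+```
+
+The measured autocorrelation should land close to the theoretical value of about -0.49, so a short-horizon reversal rule applied to these prints would look predictive even though the efficient price is a pure random walk.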
+
+### 3. Selection-Induced Performance Inflation
+The bias introduced by model selection and optimization, quantified as the gap between:
+- Optimized in-sample performance
+- Out-of-sample (walk-forward) performance on disjoint data
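+
+The gap can be illustrated with a minimal Python sketch, assuming pure-noise returns, a hypothetical pool of random sign rules, and a single split into an in-sample half and a disjoint out-of-sample half. Using the t-statistic as the performance measure, and the names `tstat` and `selection_gap`, are illustrative choices rather than the paper's definitions.
+
+```python
+import numpy as np
+
+
+def tstat(pnl: np.ndarray) -> np.ndarray:
+    """t-statistic of mean P&L along the last axis (one row per candidate rule)."""
+    n = pnl.shape[-1]
+    return pnl.mean(axis=-1) / (pnl.std(axis=-1, ddof=1) / np.sqrt(n))
+
+
+def selection_gap(n_obs: int, n_rules: int, seed: int = 0) -> tuple[float, float]:
+    """Return (best in-sample t, out-of-sample t of that same rule)."""
+    rng = np.random.default_rng(seed)
+    returns = rng.standard_normal(n_obs)  # no genuine signal
+    positions = rng.choice([-1.0, 1.0], size=(n_rules, n_obs))
+    pnl = positions * returns
+    half = n_obs // 2
+    is_t = tstat(pnl[:, :half])  # optimized in-sample evidence
+    best = int(np.argmax(is_t))  # the selection step
+    oos_t = float(tstat(pnl[best, half:]))  # disjoint walk-forward slice
+    return float(is_t[best]), oos_t
+
+
+results = np.array([selection_gap(1_000, 100, seed=s) for s in range(200)])
+print("mean best in-sample t:", results[:, 0].mean().round(2))
+print("mean out-of-sample t of the same rule:", results[:, 1].mean().round(2))
+```
+
+Averaged over seeds, the in-sample winner's t-statistic sits well above 2 while the same rule's out-of-sample t-statistic averages roughly zero; that difference is the selection-induced inflation.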
+
+### 4. Effective Multiplicity
+Adjustment for the multiple comparisons problem in adaptive specification search, accounting for correlated search paths and dependencies between model specifications.
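+
+The paper's specific adjustment is not reproduced here. As one established way to turn a set of correlated specifications into an effective count, the sketch below swaps in Nyholt's (2004) eigenvalue estimator, M_eff = 1 + (M - 1)(1 - Var(λ)/M), where λ are the eigenvalues of the correlation matrix of the candidate strategies' returns. `effective_multiplicity` is a hypothetical name.
+
+```python
+import numpy as np
+
+
+def effective_multiplicity(strategy_returns: np.ndarray) -> float:
+    """Nyholt-style effective number of tests.
+
+    strategy_returns: array of shape (n_strategies, n_obs).
+    """
+    corr = np.corrcoef(strategy_returns)  # M x M correlation matrix
+    eigvals = np.linalg.eigvalsh(corr)  # real eigenvalues of a symmetric matrix
+    m = corr.shape[0]
+    # Sample variance (ddof=1) gives M_eff = M when uncorrelated and 1 when identical.
+    return float(1 + (m - 1) * (1 - np.var(eigvals, ddof=1) / m))
+
+
+rng = np.random.default_rng(0)
+independent = rng.standard_normal((20, 2_000))
+common = rng.standard_normal(2_000)
+# 20 near-duplicate specifications: shared component plus small idiosyncratic noise.
+near_duplicates = common + 0.1 * rng.standard_normal((20, 2_000))
+print(round(effective_multiplicity(independent), 1))
+print(round(effective_multiplicity(near_duplicates), 1))
+```
+
+The independent pool should come out close to 20 and the near-duplicate pool far lower, near 1, even though both contain 20 specifications.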
+
+## Methodology
+
+### Falsification Framework
+1. **Reference class construction**: Create synthetic environments with known properties
+2. **Workflow testing**: Apply the complete predictive workflow to reference classes
+3. **Falsification criteria**: Falsify workflows that generate significant walk-forward evidence in zero-predictability environments or microstructure placebos
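+
+The falsification steps can be sketched as a Monte Carlo audit. This assumes a workflow is a Python callable that takes a simulated return series and returns `True` when it reports significant evidence; it is run on many zero-predictability simulations and falsified when it rejects more often than the nominal level α allows (a one-sided binomial test at 1%). `falsification_audit`, both example workflows, and the thresholds are hypothetical choices, not the paper's criteria.
+
+```python
+import math
+from typing import Callable
+
+import numpy as np
+
+
+def binom_upper_tail(k: int, n: int, p: float) -> float:
+    """P(X >= k) for X ~ Binomial(n, p)."""
+    return sum(math.comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))
+
+
+def falsification_audit(workflow: Callable[[np.ndarray], bool], n_sims: int = 200,
+                        n_obs: int = 500, alpha: float = 0.05, seed: int = 0) -> dict:
+    """Run a workflow on zero-predictability data and test its false-positive rate."""
+    rng = np.random.default_rng(seed)
+    # Reference class: i.i.d. Gaussian returns, so every "significant" finding is false.
+    hits = sum(workflow(rng.standard_normal(n_obs)) for _ in range(n_sims))
+    p_value = binom_upper_tail(hits, n_sims, alpha)  # one-sided: too many rejections?
+    return {"false_positive_rate": hits / n_sims, "p_value": p_value,
+            "falsified": p_value < 0.01}
+
+
+def naive_workflow(returns: np.ndarray, n_rules: int = 50) -> bool:
+    """Hypothetical workflow: report the best of n_rules random sign rules."""
+    rng = np.random.default_rng(123)  # rule pool fixed in advance, independent of returns
+    pnl = rng.choice([-1.0, 1.0], size=(n_rules, returns.size)) * returns
+    t = pnl.mean(axis=1) / (pnl.std(axis=1, ddof=1) / np.sqrt(returns.size))
+    return bool(t.max() > 1.96)
+
+
+def honest_workflow(returns: np.ndarray) -> bool:
+    """Hypothetical workflow: one pre-specified long-only rule, one-sided 5% test."""
+    t = returns.mean() / (returns.std(ddof=1) / np.sqrt(returns.size))
+    return bool(t > 1.645)
+
+
+print(falsification_audit(naive_workflow))   # expected: falsified
+print(falsification_audit(honest_workflow))  # expected: false-positive rate near 5%
+```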
+
+### Performance Gap Quantification
+For workflows that pass falsification tests:
+1. **In-sample optimization**: Measure performance on training data
+2. **Walk-forward validation**: Test on disjoint out-of-sample periods
+3. **Gap calculation**: Compute the absolute magnitude gap between the optimized in-sample evidence and the disjoint walk-forward realizations, adjusted for effective multiplicity
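+
+A hedged sketch of the gap step. It assumes the gap is the signed drop from the in-sample winner's t-statistic to its walk-forward t-statistic (the paper's absolute-magnitude definition may differ), and that the multiplicity adjustment compares this drop with the expected maximum of M_eff independent standard normals, which is what selection on pure noise would produce. `expected_max_normal` and `adjusted_gap` are hypothetical names.
+
+```python
+import numpy as np
+
+
+def expected_max_normal(m_eff: float, n_draws: int = 5_000, seed: int = 0) -> float:
+    """Monte Carlo estimate of E[max of m_eff i.i.d. N(0, 1)], m_eff rounded up."""
+    rng = np.random.default_rng(seed)
+    k = max(1, int(np.ceil(m_eff)))
+    return float(rng.standard_normal((n_draws, k)).max(axis=1).mean())
+
+
+def adjusted_gap(is_t_best: float, oos_t: float, m_eff: float) -> dict:
+    """In-sample minus walk-forward t, set against the extreme-value null benchmark."""
+    gap = is_t_best - oos_t
+    benchmark = expected_max_normal(m_eff)
+    return {"gap": gap, "null_benchmark": benchmark, "excess_over_null": gap - benchmark}
+
+
+# Hypothetical inputs: best in-sample t = 3.1, walk-forward t = 0.4, M_eff = 40.
+print(adjusted_gap(3.1, 0.4, 40.0))
+```
+
+Under this framing, a gap close to the benchmark is what selection alone would produce, while a gap well below it suggests that part of the in-sample evidence survives out of sample.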
+
+## Empirical Findings
+
+### Case Studies
+The paper presents empirical case studies demonstrating that many apparent findings in financial machine learning represent methodological artifacts rather than genuine predictability.
+
+### Implications
+1. **Methodological rigor**: Need for robust validation frameworks
+2. **Publication bias**: Tendency to publish positive results without proper falsification
+3. **Replication crisis**: Challenges similar to those in other empirical sciences
+
+## Technical Contributions
+
+### 1. QuantAudit R Package
+The author plans to release the QuantAudit R package, implementing the falsification audit framework, together with full replication scripts upon journal publication.
+
+### 2. Statistical Framework
+- Extreme-value theory for correlated searches
+- Effective multiplicity adjustments
+- Walk-forward validation protocols
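+
+For the walk-forward protocol, a minimal splitter, assuming a rolling fixed-length training window followed by a non-overlapping test window. The window lengths and the optional `gap` (an embargo between training and test data) are illustrative assumptions rather than the paper's protocol; `walk_forward_splits` is a hypothetical name.
+
+```python
+from collections.abc import Iterator
+
+
+def walk_forward_splits(n_obs: int, train: int, test: int,
+                        gap: int = 0) -> Iterator[tuple[range, range]]:
+    """Yield (train_idx, test_idx) ranges; test windows are disjoint and move forward."""
+    start = 0
+    while start + train + gap + test <= n_obs:
+        train_idx = range(start, start + train)
+        test_start = start + train + gap  # optional embargo between the two windows
+        yield train_idx, range(test_start, test_start + test)
+        start += test  # roll forward by one test window
+
+
+for tr, te in walk_forward_splits(n_obs=10, train=4, test=2, gap=1):
+    print(list(tr), list(te))
+# [0, 1, 2, 3] [5, 6]
+# [2, 3, 4, 5] [7, 8]
+```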
+
+### 3. Simulation Studies
+Simulations that validate extreme-value scaling under correlated searches and demonstrate the framework's detection power under genuine predictive structure.
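+
+A sketch of a detection-power check under genuine structure. It assumes one candidate rule carries a hypothetical planted edge (`edge`, its mean P&L per period in units of return standard deviation), and defines power as the fraction of simulations in which the in-sample winner is also significant (one-sided, t > 1.645) on the disjoint second half. The rule pool, single split, and threshold are illustrative, not the paper's simulation design.
+
+```python
+import numpy as np
+
+
+def detection_power(edge: float, n_obs: int = 2_000, n_rules: int = 50,
+                    n_sims: int = 200, seed: int = 0) -> float:
+    """Fraction of simulations where the selected rule is significant out of sample."""
+    rng = np.random.default_rng(seed)
+    half = n_obs // 2
+    detected = 0
+    for _ in range(n_sims):
+        returns = rng.standard_normal(n_obs)
+        positions = rng.choice([-1.0, 1.0], size=(n_rules, n_obs))
+        pnl = positions * returns
+        pnl[0] += edge  # rule 0 has genuine predictive value when edge > 0
+        ins = pnl[:, :half]
+        is_t = ins.mean(axis=1) / (ins.std(axis=1, ddof=1) / np.sqrt(half))
+        best = int(np.argmax(is_t))  # select on in-sample evidence only
+        oos = pnl[best, half:]
+        oos_t = oos.mean() / (oos.std(ddof=1) / np.sqrt(oos.size))
+        detected += bool(oos_t > 1.645)
+    return detected / n_sims
+
+
+print("null:", detection_power(0.0), "edge=0.15:", detection_power(0.15))
+```
+
+With `edge=0.0` the rate should sit near the 5% false-positive level; a large enough edge should push it toward 1.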
+
+## Related Concepts
+
+- [[cramer-rao-lower-bound]] - Theoretical bounds on parameter estimation
+- [[computerized-adaptive-testing]] - Adaptive testing methodologies
+- [[symbolic-regression]] - Machine learning for discovering mathematical expressions
+- [[formal-verification]] - Formal methods for validation
+
+## References
+
+- arXiv: https://arxiv.org/abs/2604.15531
+- PDF: https://arxiv.org/pdf/2604.15531
+- HTML: https://arxiv.org/html/2604.15531v1
+
+## BibTeX
+
+```bibtex
+@article{nikolopoulos2026spurious,
+  title={Spurious Predictability in Financial Machine Learning},
+  author={Nikolopoulos, Sotirios D.},
+  journal={arXiv preprint arXiv:2604.15531},
+  year={2026}
+}
+```