Commit

Fixed bug in wrapping algorithm
WGUNDERWOOD committed May 3, 2024
1 parent 10000dc commit 41d873a
Showing 3 changed files with 140 additions and 132 deletions.
3 changes: 2 additions & 1 deletion src/format.rs
@@ -12,7 +12,8 @@ pub fn format_file(file: &str, debug: bool) -> String {
     new_file = apply_indent(&new_file, debug);
 
     let mut wrap_tries = 0;
-    while needs_wrap(&file) && wrap_tries < MAX_WRAP_TRY {
+    while needs_wrap(&new_file) && wrap_tries < MAX_WRAP_TRY {
+        dbg!("wrapping");
         wrap_tries += 1;
         new_file = wrap(&new_file);
         new_file = remove_trailing_spaces(&new_file);
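The one-character fix above changes the loop guard from the original input `file` to the rewritten buffer `new_file`: checking the stale input meant the loop condition never reflected the wrapping already performed. A minimal self-contained sketch of the fixed control flow follows; the bodies of `needs_wrap` and `wrap` are simplified stand-ins (byte-based, ASCII-only) rather than tex-fmt's real implementation, and the `MAX_LINE`/`MAX_WRAP_TRY` values are illustrative assumptions:

```rust
// Illustrative constants; the real values live elsewhere in the crate.
const MAX_WRAP_TRY: usize = 10;
const MAX_LINE: usize = 80;

// Does any line still exceed the target width?
fn needs_wrap(text: &str) -> bool {
    text.lines().any(|l| l.len() > MAX_LINE)
}

// Break each overlong line at the last space before the limit.
// (Byte-index slicing: assumes ASCII input for simplicity.)
fn wrap(text: &str) -> String {
    let mut out = String::new();
    for line in text.lines() {
        let mut rest = line;
        while rest.len() > MAX_LINE {
            match rest[..MAX_LINE].rfind(' ') {
                Some(i) => {
                    out.push_str(&rest[..i]);
                    out.push('\n');
                    rest = &rest[i + 1..];
                }
                None => break, // unbreakable run: leave it overlong
            }
        }
        out.push_str(rest);
        out.push('\n');
    }
    out
}

fn format_file(file: &str) -> String {
    let mut new_file = file.to_string();
    let mut wrap_tries = 0;
    // The fix: test the *rewritten* buffer, not the original input.
    // Guarding on `file` would keep iterating up to MAX_WRAP_TRY times
    // whenever the input needed wrapping, even after it had been fixed.
    while needs_wrap(&new_file) && wrap_tries < MAX_WRAP_TRY {
        wrap_tries += 1;
        new_file = wrap(&new_file);
    }
    new_file
}
```

With the guard on `new_file`, the loop is a fixed-point iteration: it stops as soon as a pass produces no overlong lines, with `MAX_WRAP_TRY` as a safety cap against lines that can never be wrapped.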
115 changes: 56 additions & 59 deletions tests/phd_dissertation_in.tex
@@ -352,23 +352,23 @@ \chapter{Introduction}
 % nonparametric estimation is good
 The benefits of the nonparametric framework are clear: statistical procedures
 can be formulated in cases where the stringent assumptions of parametric models
-are untestable, demonstrably violated, or simply unreasonable.
-As a consequence,
-the resulting methods often inherit desirable robustness properties against
-various forms of misspecification or misuse. The class of problems that can be
-formulated is correspondingly larger: arbitrary distributions and
-relationships can be characterized and estimated in a principled manner.
+are untestable, demonstrably violated, or simply unreasonable. As a
+consequence, the resulting methods often inherit desirable robustness
+properties against various forms of misspecification or misuse. The class of
+problems that can be formulated is correspondingly larger: arbitrary
+distributions and relationships can be characterized and estimated in a
+principled manner.
 
 % nonparametric estimation is hard
 Nonetheless, these attractive properties do come at a price. In particular, as
-its name suggests, the nonparametric approach forgoes the ability to reduce
-a complex statistical problem to that of estimating a fixed, finite number of
-parameters. Rather, nonparametric procedures typically involve making inferences
-about a growing number of parameters simultaneously, as witnessed in
+its name suggests, the nonparametric approach forgoes the ability to reduce a
+complex statistical problem to that of estimating a fixed, finite number of
+parameters. Rather, nonparametric procedures typically involve making
+inferences about a growing number of parameters simultaneously, as witnessed in
 high-dimensional regimes, or even directly handling infinite-dimensional
 objects such as entire regression or density functions. As a consequence,
-nonparametric estimators are usually less efficient than their
-correctly specified parametric counterparts, when they are available; rates of
+nonparametric estimators are usually less efficient than their correctly
+specified parametric counterparts, when they are available; rates of
 convergence tend to be slower, and confidence sets more conservative. Another
 challenge is that theoretical mathematical analyses of nonparametric estimators
 are often significantly more demanding than those required for low-dimensional
@@ -386,27 +386,26 @@ \chapter{Introduction}
 ubiquitous component of modern data science tool kits. Valid uncertainty
 quantification is essential for hypothesis testing, error bar construction,
 assessing statistical significance, and performing power analyses. Inference is
-a central concept in classical statistics, and despite the rapid
-recent development of theory for modern nonparametric estimators, their
-applicability to statistical inference is in certain cases rather less well
-studied; theoretically sound and practically implementable inference procedures
-are sometimes absent in the literature.
+a central concept in classical statistics, and despite the rapid recent
+development of theory for modern nonparametric estimators, their applicability
+to statistical inference is in certain cases rather less well studied;
+theoretically sound and practically implementable inference procedures are
+sometimes absent in the literature.
 
 % complex data
 In any statistical modeling problem, the selection and application of an
 estimator must naturally be tailored to the available data. Today, much of the
 data produced and analyzed does not necessarily fit neatly into the classical
 framework of independent and identically distributed samples, and instead might
-consist of time series, stochastic processes, networks,
-or high-dimensional or functional data, to name just a few.
-Therefore, it is important to understand how nonparametric methods might be
-adapted to correctly handle these data types, maintaining fast estimation rates
-and valid techniques for statistical inference. The technical challenges
-associated with such an endeavor are non-trivial; many standard techniques are
-ineffective in the presence of dependent or infinite-dimensional data, for
-example. As such, the development of new mathematical results in probability
-theory plays an important role in the comprehensive treatment of nonparametric
-statistics with complex data.
+consist of time series, stochastic processes, networks, or high-dimensional or
+functional data, to name just a few. Therefore, it is important to understand
+how nonparametric methods might be adapted to correctly handle these data
+types, maintaining fast estimation rates and valid techniques for statistical
+inference. The technical challenges associated with such an endeavor are
+non-trivial; many standard techniques are ineffective in the presence of
+dependent or infinite-dimensional data, for example. As such, the development
+of new mathematical results in probability theory plays an important role in
+the comprehensive treatment of nonparametric statistics with complex data.
 
 \section*{Overview of the dissertation}
 
@@ -438,23 +437,22 @@ \section*{Overview of the dissertation}
 % mondrian random forests
 One interesting such example is that of the Mondrian random forest, in which
 the underlying partitions (or trees) are constructed independently of the data.
-Naturally, this restriction rules out many classical random forest models, which
-exhibit a complex and data-dependent partitioning scheme. Instead, trees are
-sampled from a canonical stochastic process known as the Mondrian process,
+Naturally, this restriction rules out many classical random forest models,
+which exhibit a complex and data-dependent partitioning scheme. Instead, trees
+are sampled from a canonical stochastic process known as the Mondrian process,
 which endows the resulting tree and forest estimators with various agreeable
 features.
 
 % what we do
-We study the estimation and inference properties of Mondrian
-random forests in the nonparametric regression setting. In particular, we
-establish a novel central limit theorem for the estimates made by a Mondrian
-random forest which, when combined with a characterization of the bias and a
-consistent variance estimator, allows one to perform asymptotically valid
-statistical inference, such as constructing confidence intervals, on the
-unknown regression function. We also provide a debiasing procedure for Mondrian
-random forests, which allows them to achieve minimax-optimal estimation rates
-with H{\"o}lder smooth regression functions, for any smoothness parameter and
-in arbitrary dimension.
+We study the estimation and inference properties of Mondrian random forests in
+the nonparametric regression setting. In particular, we establish a novel
+central limit theorem for the estimates made by a Mondrian random forest which,
+when combined with a characterization of the bias and a consistent variance
+estimator, allows one to perform asymptotically valid statistical inference,
+such as constructing confidence intervals, on the unknown regression function.
+We also provide a debiasing procedure for Mondrian random forests, which allows
+them to achieve minimax-optimal estimation rates with H{\"o}lder smooth
+regression functions, for any smoothness parameter and in arbitrary dimension.
 
 % kernel
 Chapter~\ref{ch:kernel}, titled ``Dyadic Kernel Density Estimators,'' is based
@@ -470,24 +468,24 @@ \section*{Overview of the dissertation}
 complex structure present in the network.
 
 % broad scope
-We focus on nonparametric estimation and inference with dyadic
-data, and in particular we seek methods that are robust in the sense that our
-results should hold uniformly across the support of the data. Such uniformity
-guarantees allow for statistical inference in a broader range of settings,
-including specification testing and distributional counterfactual analysis. We
-specifically consider the problem of uniformly estimating a dyadic
-density function, focusing on kernel estimators taking the form of dyadic
-empirical processes.
+We focus on nonparametric estimation and inference with dyadic data, and in
+particular we seek methods that are robust in the sense that our results should
+hold uniformly across the support of the data. Such uniformity guarantees allow
+for statistical inference in a broader range of settings, including
+specification testing and distributional counterfactual analysis. We
+specifically consider the problem of uniformly estimating a dyadic density
+function, focusing on kernel estimators taking the form of dyadic empirical
+processes.
 
 % main contributions
 Our main contributions include the minimax-optimal uniform convergence rate of
 the dyadic kernel density estimator, along with strong approximation results
 for the associated standardized and Studentized $t$-processes. A consistent
-variance estimator enables the construction of feasible uniform
-confidence bands for the unknown density function. We showcase the broad
-applicability of our results by developing novel counterfactual density
-estimation and inference methodology for dyadic data, which can be used for
-causal inference and program evaluation.
+variance estimator enables the construction of feasible uniform confidence
+bands for the unknown density function. We showcase the broad applicability of
+our results by developing novel counterfactual density estimation and inference
+methodology for dyadic data, which can be used for causal inference and program
+evaluation.
 % why it is difficult
 A crucial feature of dyadic distributions is that they may be ``degenerate'' at
 certain points in the support of the data, a property that makes our analysis
@@ -496,12 +494,11 @@ \section*{Overview of the dissertation}
 % applications
 For implementation purposes, we discuss inference procedures based on positive
 semi-definite covariance estimators, mean squared error optimal bandwidth
-selectors, and robust bias correction. We illustrate the empirical
-performance of our methods in simulations and with
-real-world trade data, for which we make comparisons between observed and
-counterfactual trade distributions in different years. Our technical results
-on strong approximations and maximal inequalities are of potential
-independent interest.
+selectors, and robust bias correction. We illustrate the empirical performance
+of our methods in simulations and with real-world trade data, for which we make
+comparisons between observed and counterfactual trade distributions in
+different years. Our technical results on strong approximations and maximal
+inequalities are of potential independent interest.
 
 % yurinskii
 Finally, Chapter~\ref{ch:yurinskii}, titled ``Yurinskii's Coupling for