Commit

Fixed bug in wrapping algorithm
WGUNDERWOOD committed May 3, 2024
1 parent 10000dc commit 41d873a
Showing 3 changed files with 140 additions and 132 deletions.
3 changes: 2 additions & 1 deletion src/format.rs
@@ -12,7 +12,8 @@ pub fn format_file(file: &str, debug: bool) -> String {
     new_file = apply_indent(&new_file, debug);
 
     let mut wrap_tries = 0;
-    while needs_wrap(&file) && wrap_tries < MAX_WRAP_TRY {
+    while needs_wrap(&new_file) && wrap_tries < MAX_WRAP_TRY {
+        dbg!("wrapping");
         wrap_tries += 1;
         new_file = wrap(&new_file);
         new_file = remove_trailing_spaces(&new_file);
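The one-character fix above changes the loop guard from the original input `file` to the rewritten buffer `new_file`: checking the stale input meant the loop condition never reflected the wrapping already performed. A minimal self-contained sketch of the fixed control flow follows; the bodies of `needs_wrap` and `wrap` are simplified stand-ins (byte-based, ASCII-only) rather than tex-fmt's real implementation, and the `MAX_LINE`/`MAX_WRAP_TRY` values are illustrative assumptions:

```rust
// Illustrative constants; the real values live elsewhere in the crate.
const MAX_WRAP_TRY: usize = 10;
const MAX_LINE: usize = 80;

// Does any line still exceed the target width?
fn needs_wrap(text: &str) -> bool {
    text.lines().any(|l| l.len() > MAX_LINE)
}

// Break each overlong line at the last space before the limit.
// (Byte-index slicing: assumes ASCII input for simplicity.)
fn wrap(text: &str) -> String {
    let mut out = String::new();
    for line in text.lines() {
        let mut rest = line;
        while rest.len() > MAX_LINE {
            match rest[..MAX_LINE].rfind(' ') {
                Some(i) => {
                    out.push_str(&rest[..i]);
                    out.push('\n');
                    rest = &rest[i + 1..];
                }
                None => break, // unbreakable run: leave it overlong
            }
        }
        out.push_str(rest);
        out.push('\n');
    }
    out
}

fn format_file(file: &str) -> String {
    let mut new_file = file.to_string();
    let mut wrap_tries = 0;
    // The fix: test the *rewritten* buffer, not the original input.
    // Guarding on `file` would keep iterating up to MAX_WRAP_TRY times
    // whenever the input needed wrapping, even after it had been fixed.
    while needs_wrap(&new_file) && wrap_tries < MAX_WRAP_TRY {
        wrap_tries += 1;
        new_file = wrap(&new_file);
    }
    new_file
}
```

With the guard on `new_file`, the loop is a fixed-point iteration: it stops as soon as a pass produces no overlong lines, with `MAX_WRAP_TRY` as a safety cap against lines that can never be wrapped.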
115 changes: 56 additions & 59 deletions tests/phd_dissertation_in.tex
@@ -352,23 +352,23 @@ \chapter{Introduction}
 % nonparametric estimation is good
 The benefits of the nonparametric framework are clear: statistical procedures
 can be formulated in cases where the stringent assumptions of parametric models
-are untestable, demonstrably violated, or simply unreasonable.
-As a consequence,
-the resulting methods often inherit desirable robustness properties against
-various forms of misspecification or misuse. The class of problems that can be
-formulated is correspondingly larger: arbitrary distributions and
-relationships can be characterized and estimated in a principled manner.
+are untestable, demonstrably violated, or simply unreasonable. As a
+consequence, the resulting methods often inherit desirable robustness
+properties against various forms of misspecification or misuse. The class of
+problems that can be formulated is correspondingly larger: arbitrary
+distributions and relationships can be characterized and estimated in a
+principled manner.
 
 % nonparametric estimation is hard
 Nonetheless, these attractive properties do come at a price. In particular, as
-its name suggests, the nonparametric approach forgoes the ability to reduce
-a complex statistical problem to that of estimating a fixed, finite number of
-parameters. Rather, nonparametric procedures typically involve making inferences
-about a growing number of parameters simultaneously, as witnessed in
+its name suggests, the nonparametric approach forgoes the ability to reduce a
+complex statistical problem to that of estimating a fixed, finite number of
+parameters. Rather, nonparametric procedures typically involve making
+inferences about a growing number of parameters simultaneously, as witnessed in
 high-dimensional regimes, or even directly handling infinite-dimensional
 objects such as entire regression or density functions. As a consequence,
-nonparametric estimators are usually less efficient than their
-correctly specified parametric counterparts, when they are available; rates of
+nonparametric estimators are usually less efficient than their correctly
+specified parametric counterparts, when they are available; rates of
 convergence tend to be slower, and confidence sets more conservative. Another
 challenge is that theoretical mathematical analyses of nonparametric estimators
 are often significantly more demanding than those required for low-dimensional
@@ -386,27 +386,26 @@ \chapter{Introduction}
 ubiquitous component of modern data science tool kits. Valid uncertainty
 quantification is essential for hypothesis testing, error bar construction,
 assessing statistical significance, and performing power analyses. Inference is
-a central concept in classical statistics, and despite the rapid
-recent development of theory for modern nonparametric estimators, their
-applicability to statistical inference is in certain cases rather less well
-studied; theoretically sound and practically implementable inference procedures
-are sometimes absent in the literature.
+a central concept in classical statistics, and despite the rapid recent
+development of theory for modern nonparametric estimators, their applicability
+to statistical inference is in certain cases rather less well studied;
+theoretically sound and practically implementable inference procedures are
+sometimes absent in the literature.
 
 % complex data
 In any statistical modeling problem, the selection and application of an
 estimator must naturally be tailored to the available data. Today, much of the
 data produced and analyzed does not necessarily fit neatly into the classical
 framework of independent and identically distributed samples, and instead might
-consist of time series, stochastic processes, networks,
-or high-dimensional or functional data, to name just a few.
-Therefore, it is important to understand how nonparametric methods might be
-adapted to correctly handle these data types, maintaining fast estimation rates
-and valid techniques for statistical inference. The technical challenges
-associated with such an endeavor are non-trivial; many standard techniques are
-ineffective in the presence of dependent or infinite-dimensional data, for
-example. As such, the development of new mathematical results in probability
-theory plays an important role in the comprehensive treatment of nonparametric
-statistics with complex data.
+consist of time series, stochastic processes, networks, or high-dimensional or
+functional data, to name just a few. Therefore, it is important to understand
+how nonparametric methods might be adapted to correctly handle these data
+types, maintaining fast estimation rates and valid techniques for statistical
+inference. The technical challenges associated with such an endeavor are
+non-trivial; many standard techniques are ineffective in the presence of
+dependent or infinite-dimensional data, for example. As such, the development
+of new mathematical results in probability theory plays an important role in
+the comprehensive treatment of nonparametric statistics with complex data.
 
 \section*{Overview of the dissertation}
 
@@ -438,23 +437,22 @@ \section*{Overview of the dissertation}
 % mondrian random forests
 One interesting such example is that of the Mondrian random forest, in which
 the underlying partitions (or trees) are constructed independently of the data.
-Naturally, this restriction rules out many classical random forest models, which
-exhibit a complex and data-dependent partitioning scheme. Instead, trees are
-sampled from a canonical stochastic process known as the Mondrian process,
+Naturally, this restriction rules out many classical random forest models,
+which exhibit a complex and data-dependent partitioning scheme. Instead, trees
+are sampled from a canonical stochastic process known as the Mondrian process,
 which endows the resulting tree and forest estimators with various agreeable
 features.
 
 % what we do
-We study the estimation and inference properties of Mondrian
-random forests in the nonparametric regression setting. In particular, we
-establish a novel central limit theorem for the estimates made by a Mondrian
-random forest which, when combined with a characterization of the bias and a
-consistent variance estimator, allows one to perform asymptotically valid
-statistical inference, such as constructing confidence intervals, on the
-unknown regression function. We also provide a debiasing procedure for Mondrian
-random forests, which allows them to achieve minimax-optimal estimation rates
-with H{\"o}lder smooth regression functions, for any smoothness parameter and
-in arbitrary dimension.
+We study the estimation and inference properties of Mondrian random forests in
+the nonparametric regression setting. In particular, we establish a novel
+central limit theorem for the estimates made by a Mondrian random forest which,
+when combined with a characterization of the bias and a consistent variance
+estimator, allows one to perform asymptotically valid statistical inference,
+such as constructing confidence intervals, on the unknown regression function.
+We also provide a debiasing procedure for Mondrian random forests, which allows
+them to achieve minimax-optimal estimation rates with H{\"o}lder smooth
+regression functions, for any smoothness parameter and in arbitrary dimension.
 
 % kernel
 Chapter~\ref{ch:kernel}, titled ``Dyadic Kernel Density Estimators,'' is based
@@ -470,24 +468,24 @@ \section*{Overview of the dissertation}
 complex structure present in the network.
 
 % broad scope
-We focus on nonparametric estimation and inference with dyadic
-data, and in particular we seek methods that are robust in the sense that our
-results should hold uniformly across the support of the data. Such uniformity
-guarantees allow for statistical inference in a broader range of settings,
-including specification testing and distributional counterfactual analysis. We
-specifically consider the problem of uniformly estimating a dyadic
-density function, focusing on kernel estimators taking the form of dyadic
-empirical processes.
+We focus on nonparametric estimation and inference with dyadic data, and in
+particular we seek methods that are robust in the sense that our results should
+hold uniformly across the support of the data. Such uniformity guarantees allow
+for statistical inference in a broader range of settings, including
+specification testing and distributional counterfactual analysis. We
+specifically consider the problem of uniformly estimating a dyadic density
+function, focusing on kernel estimators taking the form of dyadic empirical
+processes.
 
 % main contributions
 Our main contributions include the minimax-optimal uniform convergence rate of
 the dyadic kernel density estimator, along with strong approximation results
 for the associated standardized and Studentized $t$-processes. A consistent
-variance estimator enables the construction of feasible uniform
-confidence bands for the unknown density function. We showcase the broad
-applicability of our results by developing novel counterfactual density
-estimation and inference methodology for dyadic data, which can be used for
-causal inference and program evaluation.
+variance estimator enables the construction of feasible uniform confidence
+bands for the unknown density function. We showcase the broad applicability of
+our results by developing novel counterfactual density estimation and inference
+methodology for dyadic data, which can be used for causal inference and program
+evaluation.
 % why it is difficult
 A crucial feature of dyadic distributions is that they may be ``degenerate'' at
 certain points in the support of the data, a property that makes our analysis
@@ -496,12 +494,11 @@ \section*{Overview of the dissertation}
 % applications
 For implementation purposes, we discuss inference procedures based on positive
 semi-definite covariance estimators, mean squared error optimal bandwidth
-selectors, and robust bias correction. We illustrate the empirical
-performance of our methods in simulations and with
-real-world trade data, for which we make comparisons between observed and
-counterfactual trade distributions in different years. Our technical results
-on strong approximations and maximal inequalities are of potential
-independent interest.
+selectors, and robust bias correction. We illustrate the empirical performance
+of our methods in simulations and with real-world trade data, for which we make
+comparisons between observed and counterfactual trade distributions in
+different years. Our technical results on strong approximations and maximal
+inequalities are of potential independent interest.
 
 % yurinskii
 Finally, Chapter~\ref{ch:yurinskii}, titled ``Yurinskii's Coupling for