diff --git a/DESCRIPTION b/DESCRIPTION index 24c741cc..4e2d1d04 100644 --- a/DESCRIPTION +++ b/DESCRIPTION @@ -1,6 +1,6 @@ Package: redist -Version: 4.2.0.9999 -Date: 2023-04-24 +Version: 4.2.0 +Date: 2024-01-11 Title: Simulation Methods for Legislative Redistricting Authors@R: c( person("Christopher T.", "Kenny", email = "christopherkenny@fas.harvard.edu", role = c("aut", "cre")), diff --git a/R/redist_ms.R b/R/redist_ms.R index d7b841db..26247f4b 100644 --- a/R/redist_ms.R +++ b/R/redist_ms.R @@ -77,7 +77,7 @@ #' #' McCartan, C., & Imai, K. (2023). Sequential Monte Carlo for Sampling #' Balanced and Compact Redistricting Plans. *Annals of Applied Statistics* 17(4). -#' Available at \url{http://dx.doi.org/10.1214/23-AOAS1763}. +#' Available at \doi{10.1214/23-AOAS1763}. #' #' DeFord, D., Duchin, M., and Solomon, J. (2019). Recombination: A family of #' Markov chains for redistricting. arXiv preprint arXiv:1911.05725. diff --git a/R/redist_smc.R b/R/redist_smc.R index 9336e985..6d2a20e2 100644 --- a/R/redist_smc.R +++ b/R/redist_smc.R @@ -111,7 +111,7 @@ #' @references #' McCartan, C., & Imai, K. (2023). Sequential Monte Carlo for Sampling #' Balanced and Compact Redistricting Plans. *Annals of Applied Statistics* 17(4). -#' Available at \url{http://dx.doi.org/10.1214/23-AOAS1763}. +#' Available at \doi{10.1214/23-AOAS1763}. 
#' #' @examples \donttest{ #' data(fl25) diff --git a/cran-comments.md b/cran-comments.md index debbc377..e74c3803 100644 --- a/cran-comments.md +++ b/cran-comments.md @@ -1,19 +1,22 @@ ## Test environments -* local R installation (macOS), R 4.2.3 -* local R installation (Windows), R 4.2.2 -* ubuntu 20.04 (on GitHub Actions), (devel) -* ubuntu 20.04 (on GitHub Actions), (release) -* ubuntu 20.04 (on GitHub Actions), (old release) +* local R installation (macOS), R 4.3.2 +* local R installation (Windows), R 4.3.2 +* ubuntu 22.04 (on GitHub Actions), (release) +* ubuntu 22.04 (on GitHub Actions), (old release) * macOS-latest (on GitHub Actions), (release) * Windows (on GitHub Actions), (release) +* Windows (on Winbuilder), (devel and release) + ## R CMD check results 0 errors | 0 warnings | 0 notes ## Reverse Dependencies + There are no reverse dependencies to check. ## Additional Notes -* Fixes undefinited behavior found in CRAN checks. +* Fixes itemize braces Rd note for `redist.calc.frontier.size.Rd`, `redist.crsg.Rd`, and `redist.rsg.Rd`. +* Fixes "memory not mapped" error in `persily` function, which was due to be removed from the package in this release. diff --git a/docs/404.html b/docs/404.html index dce7cc96..cf3ea5f4 100644 --- a/docs/404.html +++ b/docs/404.html @@ -39,7 +39,7 @@ Common Arguments to `redist` Functions
@@ -62,13 +62,10 @@If you develop a new program, and you want it to be of the greatest possible use to the public, the best way to achieve this is to make it free software which everyone can redistribute and change under these terms.
To do so, attach the following notices to the program. It is safest to attach them to the start of each source file to most effectively convey the exclusion of warranty; and each file should have at least the “copyright” line and a pointer to where the full notice is found.
-<one line to give the program's name and a brief idea of what it does.>
-Copyright (C) <year> <name of author>
-
-This program is free software; you can redistribute it and/or modify
-it under the terms of the GNU General Public License as published by
-the Free Software Foundation; either version 2 of the License, or
-(at your option) any later version.
-
-This program is distributed in the hope that it will be useful,
-but WITHOUT ANY WARRANTY; without even the implied warranty of
-MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
-GNU General Public License for more details.
-
-You should have received a copy of the GNU General Public License along
-with this program; if not, write to the Free Software Foundation, Inc.,
-51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA.
<one line to give the program's name and a brief idea of what it does.>
+Copyright (C) <year> <name of author>
+
+This program is free software; you can redistribute it and/or modify
+it under the terms of the GNU General Public License as published by
+the Free Software Foundation; either version 2 of the License, or
+(at your option) any later version.
+
+This program is distributed in the hope that it will be useful,
+but WITHOUT ANY WARRANTY; without even the implied warranty of
+MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+GNU General Public License for more details.
+
+You should have received a copy of the GNU General Public License along
+with this program; if not, write to the Free Software Foundation, Inc.,
+51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA.
Also add information on how to contact you by electronic and paper mail.
If the program is interactive, make it output a short notice like this when it starts in an interactive mode:
-69, Copyright (C) year name of author
- Gnomovision version for details type `show w'.
- Gnomovision comes with ABSOLUTELY NO WARRANTY; This is free software, and you are welcome to redistribute it
-under certain conditions; type `show c' for details.
Gnomovision version 69, Copyright (C) year name of author
+Gnomovision comes with ABSOLUTELY NO WARRANTY; for details type `show w'.
+This is free software, and you are welcome to redistribute it
+under certain conditions; type `show c' for details.
The hypothetical commands show w
and show c
should show the appropriate parts of the General Public License. Of course, the commands you use may be called something other than show w
and show c
; they could even be mouse-clicks or menu items–whatever suits your program.
You should also get your employer (if you work as a programmer) or your school, if any, to sign a “copyright disclaimer” for the program, if necessary. Here is a sample; alter the names:
-in the program
- Yoyodyne, Inc., hereby disclaims all copyright interest `Gnomovision' (which makes passes at compilers) written by James Hacker.
-
-<signature of Ty Coon>, 1 April 1989
-Ty Coon, President of Vice
Yoyodyne, Inc., hereby disclaims all copyright interest in the program
+`Gnomovision' (which makes passes at compilers) written by James Hacker.
+
+<signature of Ty Coon>, 1 April 1989
+Ty Coon, President of Vice
This General Public License does not permit incorporating your program into proprietary programs. If your program is a subroutine library, you may consider it more useful to permit linking proprietary applications with the library. If this is what you want to do, use the GNU Lesser General Public License instead of this License.
@@ -190,7 +187,7 @@
+library(redist)
+library(ggplot2)
+library(dplyr)
+library(patchwork)
+# set seed for reproducibility
+set.seed(1)
The redist
package is designed to allow for replicable
+redistricting simulations. This vignette covers the Flip Markov Chain
+Monte Carlo method discussed in: Automated
+Redistricting Simulation Using Markov Chain Monte Carlo.
data(fl25)
+data(fl25_enum)
+plan <- fl25_enum$plans[, 7241]
+fl25$plan <- plan
+fl_map <- redist_map(fl25, existing_plan = plan, pop_tol = 0.2, total_pop = pop)
+#> Projecting to CRS 3857
+constr <- redist_constr(fl_map) %>%
+ add_constr_edges_rem(0.02)
+set.seed(1)
+sims <- redist_flip(map = fl_map, nsims = 6, constraints = constr)
+#>
+#> ── redist_flip() ───────────────────────────────────────────────────────────────
+#>
+#> ── Automated Redistricting Simulation Using Markov Chain Monte Carlo ──
+#> ℹ Preprocessing data.
+#> ℹ Starting swMH().
+■■■■■■ 17% | ETA: 0s | MH Acceptance: 1.00
+■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■ 100% | ETA: 0s | MH Acceptance: 1.00
The flip
algorithm is one of the more straightforward
+redistricting algorithms. Beginning with an initial partition of a
+graph, it proposes flipping a node from one partition to an adjacent
+partition. By checking that the proposed flip meets basic constraints,
+such as keeping partitions contiguous and staying within a certain
+population parity, it ensures that all proposed new partitions are also
+valid partitions. The implementation within redist
is a bit
+more advanced than this, as it allows for multiple flips and rejecting
+valid partitions based on a Metropolis Hastings algorithm. The following
+walks through the basics of this algorithm to provide an introduction to
+using flip
correctly and efficiently.
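The accept-or-reject step described above can be illustrated with a minimal sketch. This is not the implementation inside `redist`: it assumes a Gibbs-type target proportional to `exp(-beta * score)` and a symmetric proposal, and the names `mh_accept`, `score_current`, `score_proposed`, and `beta` are hypothetical.

```r
# Toy Metropolis-Hastings acceptance step for a target proportional to
# exp(-beta * score). A lower score is treated as better (for example,
# fewer edges removed). All names are hypothetical.
mh_accept <- function(score_current, score_proposed, beta, u = runif(1)) {
  # Acceptance probability, assuming a symmetric proposal
  prob <- min(1, exp(-beta * (score_proposed - score_current)))
  u < prob
}

# A proposal that lowers the score is always accepted
better <- mh_accept(score_current = 40, score_proposed = 38, beta = 0.4, u = 0.999)
# A proposal that raises the score is accepted only part of the time
worse <- mh_accept(score_current = 38, score_proposed = 40, beta = 0.4, u = 0.5)
```

A larger `beta` (a stronger constraint weight) makes score-raising flips less likely to be accepted, which is why stronger constraints tend to lower the acceptance rate.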
Suppose we are redistricting this small map above on the left. To use
+the flip
algorithm, we need to consider the adjacency graph
+that underlies this map, which is above on the right. Each of the 25
+precincts on the left is displayed as a node on the right, connected if
+they are contiguous on the map. If we use the above district as an
+initial plan, we can then run flip
for a few steps.
While this map is extremely small, the five iterations give a basic +idea of what is going on behind the scenes. At each iteration, it +searches the boundary for possible swaps, selects one, and then accepts +or rejects the proposals. With very weak constraints, like those used to +create the above example, almost every swap is accepted. Even then, +though, it doesn’t guarantee that some iterations won’t repeat other +plans sampled. In fact, in the above, the second iteration is the same +plan as the initialization.
+This possibility is very important for ensuring that the sampled +plans are representative of the desired target distribution, which is +controlled by the constraints chosen. The possible constraints are +discussed below, as is information on setting up simulations and some +advice on ensuring that your simulations are efficient.
+ +Flip is incredibly powerful for local exploration. If you can make +large changes to a summary statistic of interest without making large +changes to the map itself, this may tell an important story of what went +into making the map.
+Flip is one of the easiest to understand algorithms and has +theoretical guarantees behind it. This can make it especially useful +when the audience of interest does not have an advanced background in +mathematics or statistics.
+Flip has the power to make less compact maps than many other +algorithms. This can be especially powerful when a blind allegiance to +compactness makes otherwise viable plans appear to be outliers.
+Our implementation of flip
has many more Gibbs
+constraints than our other implementations. This can allow you to
+consider different forms of partisan
and
+countysplit
constraints among others.
However, with these strengths do come weaknesses. Like most Markov
+Chain Monte Carlo methods, convergence can’t be shown, it can only be
+suggested. Diagnostics, like those in the section on
+diagnostic plots, can help ensure that convergence is likely, but
+can never show that it has indeed happened. Additionally,
+flip
makes relatively small moves per iteration, so many
+more iterations are needed to move around the space. If your map is
+particularly large, you may require several hundred iterations to make
+the map substantively different, which motivates thinning
+the chain, that is, keeping only every k-th iteration. However,
+thinning
doesn’t make the algorithm more efficient, so you
+still need to work through those plans, which comes with a time
+cost.
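As a rough illustration of thinning, the sketch below keeps every fifth column of a toy plans matrix. The matrix and the interval `thin` are made up for the example; with real output, the matrix would come from `get_plans_matrix()`.

```r
# Toy plans matrix: 5 precincts (rows) by 20 sampled plans (columns).
# Column j is filled with the value j so the kept columns are easy to spot.
plans <- matrix(rep(1:20, each = 5), nrow = 5)

thin <- 5                                  # hypothetical thinning interval
keep <- seq(thin, ncol(plans), by = thin)  # columns 5, 10, 15, 20
thinned <- plans[, keep, drop = FALSE]
dim(thinned)
```

All 20 plans still had to be simulated to obtain the 4 that are kept, which is the time cost noted above.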
One of the keys to ensuring good performance is the choice of
+initialization. In some cases, a starting point may be obvious, such as
+when you want to explore the local area around an existing map. If
+that’s the use case, then it is straightforward to use that plan as the
+starting point. However, if the goal is to understand the larger space
+of possibilities, then starting from just one map can be misleading.
+Why? Since constraint tuning is not a perfect science, you could be
+setting the constraints too strong and, if that map is very good on some
+dimension, the flip
algorithm may have difficulty getting
+away from that point without a very large number of iterations.
Our implementation defaults to using the Sequential Monte Carlo (SMC)
+algorithm via redist_smc()
to create an initial partition
+of the districts, if no initial plan is provided.
While the implementations of Random Seed and Grow (RSG
)
+and Compact Random Seed and Grow (CRSG
) via
+redist.rsg()
 and redist.crsg()
do not sample
+from a defined target distribution, they can serve as useful
+initializations for flip
as they help provide a more
+diverse set of starting states. SMC
is often faster and
+provides more theoretical guarantees, but tends to sample very compact
+districts, even when decreasing the compactness constraint. As such,
+when trying to decide if chains have likely converged or not, it can be
+misleading to only check chains that start from very compact states.
With the basics of what the flip
algorithm is doing
+down, we can proceed into how to use the algorithm.
To begin running the MCMC algorithm, we have to provide some basic
+information, typically beginning with a shapefile. The below loads an
+Iowa dataset included within the redist
package and plots
+the actual congressional districts from 2012-2021. (Iowa is a favorite
+choice for redistricting simulation examples, as it requires keeping
+counties together in plans which allows us to use the counties as the
+unit for redistricting, rather than thousands of precincts.)
+data(iowa)
+redist.plot.map(iowa, plan = cd_2010)
+map_ia <- redist_map(iowa, existing_plan = cd_2010, pop_tol = 0.05)
From there, we need to build an adjacency graph which identifies
+which counties are touching which other counties on a map. If you have
+an existing plan, it’s generally advised to supply this to the optional
+existing_plan
argument to ensure that the existing plan is a valid,
+connected plan. If you get a warning, the geomander R package
+can help solve potential issues.
In addition, we need population for each unit. We’ve included
+iowa$pop
as the total population as of the 2010 Census.
+From there, we have the basic information that we need to run our first
+simulation. The below indicates that we are simulating 100 plans (with
+nsims
) for the state of Iowa that have at most a population
+parity deviation of 0.05 (with pop_tol
).
sims <- redist_flip(map_ia, nsims = 100)
+#>
+#> ── redist_flip() ───────────────────────────────────────────────────────────────
+#>
+#> ── Automated Redistricting Simulation Using Markov Chain Monte Carlo ──
+#> ℹ Preprocessing data.
+#> ℹ Starting swMH().
+■ 1% | ETA: 0s
+■■■■■■■■■■■■■■■■■■■■■■■■■■ 84% | ETA: 0s | MH Acceptance: 0.62
+■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■ 100% | ETA: 0s | MH Acceptance: 0.65
The printed output can be silenced by setting
+verbose = FALSE
, but it displays very important
+information. First, it displays when preprocessing begins and when the
+algorithm actually starts. Each 10% of the way through the
+flip
algorithm, it outputs the current estimated Metropolis
+acceptance. Here, we’ve specified no Gibbs constraints, so the
+acceptance will always be near 100%.
The output is an object of class redist_plans
.
+class(sims)
+#> [1] "redist_plans" "tbl_df" "tbl" "data.frame"
The sims
object includes various pieces of information
+that were tracked while simulating, but we focus on
+get_plans_matrix(sims)
, which is a matrix that contains the
+plans.
+dim(get_plans_matrix(sims))
+#> [1] 99 101
Checking the dimensions shows that each plan is saved as a column, +where each row is a precinct. From this, we can extract a single plan as +we would from a normal matrix, like below, where we plot the final +simulated plan.
+
+redist.plot.map(shp = iowa, plan = get_plans_matrix(sims)[, 100])
Now, this plan is incredibly non-compact, which can be an issue.
+However, we should expect this type of outcome, as we didn’t include a
+compactness constraint while simulating. Thus, the only things checked
+were contiguity and that no plan would be outside of the
+pop_tol
set above. Since there are many more non-compact
+plans than compact plans in the space of all redistricting plans, we end
+up with highly non-compact districts. We can fix this by specifying a
+constraint, as below:
constr <- redist_constr(map_ia) %>% add_constr_edges_rem(0.4)
+
+sims_comp <- redist_flip(map_ia, nsims = 100, constraints = constr)
+#>
+#> ── redist_flip() ───────────────────────────────────────────────────────────────
+#>
+#> ── Automated Redistricting Simulation Using Markov Chain Monte Carlo ──
+#> ℹ Preprocessing data.
+#> ℹ Starting swMH().
+■ 1% | ETA: 0s
+■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■ 100% | ETA: 0s | MH Acceptance: 0.61
The first arguments are the same, but this adds three key arguments.
+First, setting constraint
to any combination of the nine
+implemented constraints allows us to specify the target distribution.
+Setting constraintweights = 0.4
means that we want to put a
+relatively weak weight on the compactness, though a weak constraint
+still does a lot of work. There are four compact
+constraints implemented currently. The recommended choice is
+edges-removed
because it can be calculated very
+quickly.
If we plot the final map sampled from the above code, we can see that +it is far more compact.
+
+redist.plot.map(shp = iowa, plan = get_plans_matrix(sims_comp)[, 100])
When running larger redistricting analyses, one important step is to +run multiple chains of the MCMC algorithm. This will also allow us to +diagnose convergence better, using the Gelman-Rubin plot, as seen in the +section on Diagnostic Plots.
+On Windows and in smaller capacities, it is useful to run the
+algorithm within an lapply
loop. First, we set up the seed
+for replicability and decide on the number of chains and
+simulations.
+set.seed(1)
+nchains <- 4
+nsims <- 100
Here, we opt to initialize using the SMC
algorithm. When
+we want to initialize without providing an initial partition, we need to
+specify the number of districts, ndists
.
+constr <- redist_constr(map_ia) %>% add_constr_edges_rem(0.4)
+map_ia <- redist_map(iowa, ndists = 4, pop_tol = 0.05)
+flip_chains <- lapply(1:nchains, function(x){
+ redist_flip(map_ia, nsims = nsims,
+ constraints = constr, verbose = FALSE)
+})
In Unix-based systems, this can be run considerably faster by running +this in parallel.
+
+mcmc_chains <- parallel::mclapply(1:nchains, function(x){
+ redist_flip(map_ia, nsims = nsims,
+ constraints = constr, verbose = FALSE)
+}, mc.set.seed = TRUE, mc.cores = parallel::detectCores())
redist_flip()
+The new, tidy interface to functions with redist
+introduces a pair of key objects, redist_map
and
+redist_plans
. The Get Started
+page goes into depth about these, but this shows the basics of how
+to work with the flip
algorithm within the newer
+interface.
As in the standard interface, we need a data set to work with. This +example will also follow with using the included Iowa data.
+
+data(iowa)
Rather than building the adjacency graph manually, here we can set
+this up using redist_map
 which will build it and add it as a
+column.
+iowa_map <- redist_map(iowa, existing_plan = cd_2010, pop_tol=0.01)
We set a population tolerance of 1%. While this is generally a good
+population parity tolerance for most simulations, be careful when using
+the default within flip
. If your starting partition sits
+outside of that population deviation, flip
may take a
+very, very long time to find a valid partition to
+flip.
Now, we can pass the redist_map
object to
+redist_flip
to begin simulating.
tidy_sims <- redist_flip(iowa_map, nsims = 100)
+#>
+#> ── redist_flip() ───────────────────────────────────────────────────────────────
+#>
+#> ── Automated Redistricting Simulation Using Markov Chain Monte Carlo ──
+#> ℹ Preprocessing data.
+#> ℹ Starting swMH().
+■ 1% | ETA: 0s
+■■■■■■■■■■■■■■■■■■■■■■■■ 77% | ETA: 0s | MH Acceptance: 0.79
+■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■ 100% | ETA: 0s | MH Acceptance: 0.80
redist_flip
’s constraint includes a relatively weak
+compactness constraint by default because simulating compact maps is far
+more efficient and completely non-compact maps are not super useful for
+most purposes.
You can override this by making a blank redist_constr
+object
+cons <- redist_constr(iowa_map)
Then, you can pass this to redist_flip
.
tidy_sims_no_comp <- redist_flip(iowa_map, nsims = 100, constraints = cons)
+#>
+#> ── redist_flip() ───────────────────────────────────────────────────────────────
+#>
+#> ── Automated Redistricting Simulation Using Markov Chain Monte Carlo ──
+#> ℹ Preprocessing data.
+#> ℹ Starting swMH().
+■■■■■■■■■■■ 34% | ETA: 0s | MH Acceptance: 1.00
+■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■ 100% | ETA: 0s | MH Acceptance: 0.97
redist_flip
outputs a redist_plans
+object.
+class(tidy_sims)
+#> [1] "redist_plans" "tbl_df" "tbl" "data.frame"
To extract the plans, use get_plans_matrix().
+plans <- get_plans_matrix(tidy_sims)
Alternatively, you can directly use functions on the
+redist_plans
object. For example, if we want to measure the
+competitiveness of each plan:
+tidy_sims <- tidy_sims %>%
+ mutate(competitiveness = compet_talisman(pl(), iowa_map, rvote = rep_08, dvote = dem_08))
+tidy_sims %>%
+ ggplot(aes(x = competitiveness)) +
+ geom_density() +
+ theme_bw()
For more information on using redist_plans
objects, see
+the Get Started page.
When using the MCMC algorithms, there are various useful diagnostic
+plots. The redist.diagplot
function creates familiar plots
+by converting numeric entries into mcmc
objects to use with
+coda
.
We use the dissimilarity index in Massey and Denton 1988 as a summary
+statistic for the following examples. This can be computed with
+seg_dissim
. In this case, we create a Republican
+dissimilarity index. We can work with two examples, the first is a
+single vector of the segregation index, while the second is a list of
+vectors, with one vector for each chain.
+seg <- by_plan(seg_dissim(tidy_sims, iowa_map, rep_08, pop))
The first three plots only need a single index.
+
+redist.diagplot(seg, plot = "autocorr")
+redist.diagplot(seg, plot = "densplot")
+redist.diagplot(seg, plot = "mean")
As examples for the next two plots, we can use the example above +which ran 4 chains. This is the same index, but computed for each +chain.
+
+seg_chains <- lapply(1:nchains, function(i) {
+ seg_dissim(flip_chains[[i]], iowa_map, rep_08, pop)
+})
+redist.diagplot(sumstat = seg_chains, plot = "trace")
+redist.diagplot(sumstat = seg_chains, plot = 'gelmanrubin')
When using the flip
algorithm, the most important and
+difficult step is setting the right constraint weights. While there may
+be some general pieces of advice for doing so, no advice can replace
+working with your data. The bottom line is that every data set is a bit
+different. What works for one state’s redistricting process, with the
+data specific to that state at that time may not transfer to another
+state or municipality or school district. The general process of finding
+what works might be very similar, but getting the right set of
+constraint weights and other parameters will vary immensely. Even
+starting from a different plan within the same time and place can change
+the weights that perform best. Like most things, the key to tuning
+flip
is patience. Going for a full scale simulation without
+testing some parameter configurations is likely an inefficient use of
+time and computing power.
The following highlights some advice on how to tune flip
+to make it work for your particular redistricting problem. For the
+advice, we’ll use the following example:
data(iowa)
+iowa_map <- redist_map(iowa, existing_plan = cd_2010, pop_tol = 0.02, total_pop = pop)
+
+cons <- redist_constr(iowa_map) %>%
+ add_constr_edges_rem(0.5) %>%
+ add_constr_pop_dev(100)
+
+sims <- redist_flip(map = iowa_map, nsims = 100)
+#>
+#> ── redist_flip() ───────────────────────────────────────────────────────────────
+#>
+#> ── Automated Redistricting Simulation Using Markov Chain Monte Carlo ──
+#> ℹ Preprocessing data.
+#> ℹ Starting swMH().
+■ 1% | ETA: 0s
+■■■■■■■■■■■■■■■■■■ 58% | ETA: 0s | MH Acceptance: 0.60
+■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■ 100% | ETA: 0s | MH Acceptance: 0.55
One of the first things to check when working with flip
+is the Metropolis Hastings ratio. It is printed to the console when
+verbose = TRUE
. If you have silenced printing or warnings,
+the output saves the Metropolis Hastings decisions. You can check the
+acceptance ratio in a redist_plans
object with
+mean(sims$mhdecisions, na.rm = TRUE)
+#> [1] 0.55
Reference plans included in the object will not have an
+mhdecision
, so you can remove them with
+na.rm = TRUE
.
The goal is to generally have the Metropolis Hastings ratio lie +between 20% and 40%. If simulating with only a single parameter, the +goal is generally to be near 40%, while with many parameters, you likely +want to be near 20%. If over the course of many simulations you find +yourself just above or just below, that probably isn’t a problem if the +simulations are in the right probability space.
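A small sketch of this check, using a made-up vector of accept/reject decisions rather than real `mhdecisions` output; the 20% to 40% band is the rule of thumb from the paragraph above.

```r
# Hypothetical Metropolis-Hastings decisions; NA marks a reference plan
mhdecisions <- c(NA, TRUE, FALSE, FALSE, TRUE, FALSE,
                 FALSE, TRUE, FALSE, FALSE, FALSE)

# Acceptance rate, ignoring the reference plan
acc <- mean(mhdecisions, na.rm = TRUE)

# Rule-of-thumb band from the text: between 20% and 40%
in_band <- acc >= 0.2 && acc <= 0.4
```

Here 3 of the 10 recorded decisions are acceptances, so `acc` is 0.3 and falls inside the band.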
+lambda
and eprob
+lambda
and eprob
both control the amount of
+movement within flip
. They can be very powerful things to
+increase. lambda
defaults to 0, while eprob
+defaults to 0.05. Each of these parameters leads to fairly small
+movements between sequential iterations of the algorithm.
+sims_new <- redist_flip(map = iowa_map, nsims = 100, constraints = cons,
+ eprob = 0.10, lambda = 2, verbose = FALSE)
+mean(sims_new$mhdecisions, na.rm = TRUE)
+#> [1] 0.46
In this example, we’ve increased each of these.
+lambda = 2
, up from its default of 0, while
+eprob = 0.10
, up from its default of 0.05
.
+What’s going on here can be characterized fairly well by the Hamming
+distance between sequential runs.
+dists <- redist.distances(plans = get_plans_matrix(sims))$Hamming
+dists_new <- redist.distances(plans = get_plans_matrix(sims_new))$Hamming
+adj_dists <- rep(NA_integer_, 100)
+adj_dists_new <- rep(NA_integer_, 100)
+for(i in 1:100){
+ adj_dists[i] <- dists[i, i + 1]
+ adj_dists_new[i] <- dists_new[i, i + 1]
+}
+tibble(Hamming = c(adj_dists, adj_dists_new),
+ `lambda/eprob` = c(rep('0/0.05', 100), rep('2/0.10', 100))) %>%
+ ggplot() +
+ geom_density(aes(x = Hamming, color = `lambda/eprob`)) +
+ theme_bw()
lambda
controls the number of components swapped between
+each iteration, while eprob
 controls the size of the
+swapped partitions. Increasing each of these values can be important for
+increasing the amount of movement between outputted plans. These can be
+adjusted automatically using adapt_lambda
 and
+adapt_eprob
when starting a simulation, though adjusting
+them manually to fit your problem is better practice, as it leads to
+more control over the process.
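To make the idea of movement between sequential plans concrete, here is a minimal base R sketch on a toy plans matrix. It counts, for each pair of consecutive plans, how many precincts changed district, which is how the Hamming distance is understood here; the matrix is invented for illustration.

```r
# Toy plans matrix: 6 precincts (rows) by 4 sequential plans (columns)
plans <- cbind(
  c(1, 1, 1, 2, 2, 2),
  c(1, 1, 2, 2, 2, 2),  # one precinct flipped
  c(1, 1, 2, 2, 2, 2),  # proposal rejected, so the plan repeats
  c(1, 2, 2, 2, 2, 1)   # two precincts changed
)

n <- ncol(plans)
# Compare each plan with the one before it, column by column
hamming_seq <- colSums(plans[, -1, drop = FALSE] != plans[, -n, drop = FALSE])
hamming_seq
```

Larger `lambda` and `eprob` values would be expected to shift these consecutive distances upward, at the price of a lower acceptance rate.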
pop_tol
+Sometimes a starting map sits in a neighborhood of maps that isn’t +very conducive to using it as a starting point. This is most often +characterized by running a single iteration that runs (seemingly) +forever. A typical fix for this is to weaken the population tolerance +and use a Gibbs constraint to pull the simulations back into the target +range. I’ve done this for the tuning example, even though it’s +unnecessary.
+After simulating, if there is a hard constraint to consider, we can +check the parities:
+
+sims <- sims %>% mutate(par = plan_parity(map = iowa_map))
And then we can subset to the correct space.
+ +With the right set of parameters, this will lead to a reasonable set +of simulations. In this case, we end up with about 10% of the +simulations when using a soft constraint, which is not uncommon. In +general, you want to aim for as low a hard population parity as +possible, while using a strong weight on the Gibbs population constraint when the +hard constraint is above what’s necessary. This helps maximize the +efficiency of your simulations, while allowing for additional movement +between neighborhoods of valid plans.
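The parity check and subsetting can be sketched in base R on toy data. This assumes parity means the maximum relative deviation of a district's population from the equal-population target; the text does not show how `plan_parity()` computes it, so treat the formula, the data, and the helper name `parity_of` as assumptions.

```r
# Toy data: 6 precinct populations and two candidate plans (as columns)
pop <- c(100, 120, 80, 110, 90, 100)
plans <- cbind(c(1, 1, 2, 2, 3, 3),   # district pops 220, 190, 190
               c(1, 2, 2, 3, 3, 3))   # district pops 100, 200, 300

# Maximum relative deviation from the equal-population target
parity_of <- function(plan, pop) {
  dist_pop <- tapply(pop, plan, sum)
  target <- sum(pop) / length(dist_pop)
  max(abs(dist_pop - target) / target)
}

par <- apply(plans, 2, parity_of, pop = pop)

# Keep only plans inside a hypothetical hard tolerance of 10%
pop_tol <- 0.1
valid <- plans[, par <= pop_tol + 1e-8, drop = FALSE]
```

The first plan has a parity of 0.1 and is kept; the second has a parity of 0.5 and is dropped.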
+More often than not, there are multiple constraints that are +important to a redistricting problem. There are two general paths to +success when working with more than one or two constraints.
+First, you might want to add one at a time, generally starting with
+the compactness constraint. If flip
doesn’t consider
+compactness at all, it has an unfortunate behavior of creating
+incredibly non-compact maps. However, with even a very weak compactness
+constraint, it performs very well in avoiding those maps that are so
+non-compact that they aren’t worthy of consideration. Then you can add
+the next constraints one at a time, weakening them a bit each time you
+add a new constraint. As above, you want to make sure that your
+acceptance rate is between 20% and 40%. If it’s too low, you won’t get
+sufficient movement around the probability space and if it’s too high,
+you likely aren’t characterizing the probability space you want to
+characterize.
The other way to tune is to run a simulation with a kitchen sink type +set up.
+
+cons <- redist_constr(iowa_map) %>%
+ add_constr_edges_rem(0.25) %>%
+ add_constr_pop_dev(50) %>%
+ add_constr_compet(10, rvote = rep_08, dvote = dem_08) %>%
+ add_constr_splits(10, admin = region)
Then we can run this for a relatively small number of iterations.
+sims <- redist_flip(iowa_map, 100, constraints = cons)
+#>
+#> ── redist_flip() ───────────────────────────────────────────────────────────────
+#>
+#> ── Automated Redistricting Simulation Using Markov Chain Monte Carlo ──
+#> ℹ Preprocessing data.
+#> ℹ Starting swMH().
+■ 1% | ETA: 0s
+■■■■■■■■■■■■■■■■■■■■■■■■■■■■■ 93% | ETA: 0s | MH Acceptance: 0.77
+■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■ 100% | ETA: 0s | MH Acceptance: 0.75
Now, the interesting thing here is that adding more constraints +actually increased the acceptance probability. This is because +correlated constraints can guide the algorithm towards high probability +neighborhoods where there are multiple maps which could be considered! +To address this, we might want to increase the constraint weights +slightly across the board. Had the acceptance rate been far too low, we might +lower them, particularly on constraints that we are not too worried +about.
+
+cons <- redist_constr(iowa_map) %>%
+ add_constr_edges_rem(1.5) %>%
+ add_constr_pop_dev(100) %>%
+ add_constr_compet(40, rvote = rep_08, dvote = dem_08) %>%
+ add_constr_splits(20, admin = region)
+
+sims <- redist_flip(iowa_map, 100, constraints = cons)
+#>
+#> ── redist_flip() ───────────────────────────────────────────────────────────────
+#>
+#> ── Automated Redistricting Simulation Using Markov Chain Monte Carlo ──
+#> ℹ Preprocessing data.
+#> ℹ Starting swMH().
For example, this new set of constraints might be a good place to +simulate at.
+Notably, the process of tuning should be guided by the constraint +outputs and their relative values. The average compactness value of +edges removed that we’re constraining on has a summary like the +following:
+
+summary(sims$constraint_edges_removed, na.rm = TRUE)
+#> Min. 1st Qu. Median Mean 3rd Qu. Max. NA's
+#> 33.00 36.00 36.50 38.18 41.00 46.00 4
The population constraint can be summarized as:
+
+summary(sims$constraint_pop_dev, na.rm = TRUE)
+#> Min. 1st Qu. Median Mean 3rd Qu. Max. NA's
+#> 0.000007 0.000064 0.000129 0.000168 0.000215 0.000633 4
These are measured on completely different scales, so it shouldn’t be +surprising that population has a much higher weight. This is a constant +difficulty in tuning, as the total number of edges on a graph or the +volatility of the population isn’t something that’s easily standardized +and transferred between maps, unfortunately.
+ +Redistricting simulation is very much statistics rather than hard
+science. When working with flip
, or any redistricting
+sampler, there will be a component that resembles art. Each important
+variable needs to be included, but getting every variable to the correct
+target space is not necessarily easy. In general, it may be best to
+start with one or two constraints and add more to the model gradually. This
+can help ensure that one single constraint doesn’t dominate the entire
+process.
When starting off, it’s never a bad idea to run a single simulation
+to make sure that everything works. If it doesn’t do what you’re
+expecting, that’s much better than waiting for 1,000,000 iterations to
+run. If that works, try 100 or 1000. Only once you’ve seen that it’s
+moving and appears to be moving in reasonable directions should you try
+for those large numbers of simulations. Remember that running 1,000,000
+steps of flip
with completely useless parameters is not a
+very good use of time or computing power.
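The "start small, then scale up" advice can be sketched as a small loop. `run_staged()`, `toy_sampler()`, and `looks_ok` are hypothetical names introduced here for illustration; they are not part of redist.

```r
# Hypothetical helper: run a sampler at increasing sizes and stop early
# if a user-supplied sanity check fails.
run_staged <- function(sampler, sizes = c(1, 100, 1000), looks_ok) {
    results <- list()
    for (n in sizes) {
        out <- sampler(n)
        results[[as.character(n)]] <- out
        if (!isTRUE(looks_ok(out))) {
            message("Stopping after ", n, " iterations: check failed")
            break
        }
    }
    results
}

# Toy stand-in for a sampler: returns n fake accept/reject indicators
toy_sampler <- function(n) rbinom(n, size = 1, prob = 0.75)

set.seed(1)
staged <- run_staged(toy_sampler, looks_ok = function(x) length(x) > 0)
names(staged)
```

With the objects defined earlier, the sampler argument could plausibly be `function(n) redist_flip(iowa_map, n, constraints = cons)`, with a check that inspects the acceptance rate or constraint summaries.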
And finally, when in doubt, it never hurts to run a few extra +simulations. Once you know that the code is working, it shouldn’t cost +much at all to run just a few extra iterations or a few simulations from +new starting points. If the results agree with your prior findings, +that’s more support for them. If they disagree, then you know that something could +be wrong and can run additional simulations to figure out +what’s right!
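One common way to check whether runs from different starting points agree is a basic (non-split) Gelman-Rubin R-hat on a summary statistic. The sketch below is a textbook version written in base R on simulated numbers; `basic_rhat()` is a name chosen here, and it is not claimed to match how redist computes its own diagnostics.

```r
# Basic (non-split) Gelman-Rubin R-hat for equal-length runs.
# `draws` is a matrix with one column per independent run.
basic_rhat <- function(draws) {
    n <- nrow(draws)
    chain_means <- colMeans(draws)
    W <- mean(apply(draws, 2, var))      # average within-run variance
    B <- n * var(chain_means)            # between-run variance
    var_plus <- (n - 1) / n * W + B / n  # pooled variance estimate
    sqrt(var_plus / W)
}

set.seed(42)
agree <- cbind(rnorm(500), rnorm(500))               # same distribution
disagree <- cbind(rnorm(500), rnorm(500, mean = 2))  # shifted second run

basic_rhat(agree)
basic_rhat(disagree)
```

Values close to 1 suggest the runs are exploring the same distribution; values well above 1 are a signal to investigate before trusting either run.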
+ +There are 39 incorporated cities in King County, which together cover 19% of the population and 19% of the area of the county. The remainder is “unincorporated King County”. The county contains a significant @@ -217,8 +214,8 @@
redist_map
obj
#> 8 330427 EXCALIBUR UNINCORP 7 660 501 402 164 151 0.00689
#> 9 333087 FED 30-3087 FED 7 820 645 453 223 124 0
#> 10 333238 FED 30-3238 FED 7 997 674 356 161 90 0
-#> # … with 2,552 more rows, and 2 more variables:
-#> # geometry <MULTIPOLYGON [US_survey_foot]>, adj <list>
+#> # ℹ 2,552 more rows
+#> # ℹ 2 more variables: geometry <MULTIPOLYGON [US_survey_foot]>, adj <list>
This redist_map
object contains an adjacency graph for
the county. We can explore this graph, and zoom in on the city of
Seattle, using plot()
.
Looking at the information in the header, and comparing it to the
original king
object, we see that the number of districts
has been updated from 9 to 3, and the population tolerances have been
@@ -321,19 +318,19 @@
Under the hood, merge_by()
does several things. First,
it groups the shapefile by the provided key or keys (here,
city
). By default it also groups by existing districts, so
@@ -385,7 +382,7 @@
-cat(redist.splits(king_land$distr, king_land$city), "split cities\n")
+cat(splits_admin(king_land$distr, king_land, city), "split cities\n")
#> 11 11 11 11 11 11 11 11 11 split cities
king_land %>%
@@ -414,7 +411,7 @@ Freezing
#> 8 1 101 KMR 0.0128 20460 15787 12954 7288 3476 <int [10]>
#> 9 1 101 LFP 0.0281 12598 9975 9465 6063 1992 <int [4]>
#> 10 1 101 SHL 0.00447 53007 42873 34346 20895 7184 <int [14]>
-#> # … with 1,958 more rows
The plot above shows which cities will be frozen together so that
they cannot be split. Notice that we merge by not just
unsplit_id
but also city
, so that adjacent
@@ -431,7 +428,7 @@
Notice how the Merged from another map...
line
@@ -554,7 +551,7 @@
Site built with pkgdown 2.0.6.
+Site built with pkgdown 2.0.7.
diff --git a/docs/articles/map-preproc_files/figure-html/city-distr-plot-1.png b/docs/articles/map-preproc_files/figure-html/city-distr-plot-1.png index 1390f5ef..7d2934d5 100644 Binary files a/docs/articles/map-preproc_files/figure-html/city-distr-plot-1.png and b/docs/articles/map-preproc_files/figure-html/city-distr-plot-1.png differ diff --git a/docs/articles/map-preproc_files/figure-html/core-plans-1.png b/docs/articles/map-preproc_files/figure-html/core-plans-1.png index 796ddde5..32e51751 100644 Binary files a/docs/articles/map-preproc_files/figure-html/core-plans-1.png and b/docs/articles/map-preproc_files/figure-html/core-plans-1.png differ diff --git a/docs/articles/map-preproc_files/figure-html/cores-1.png b/docs/articles/map-preproc_files/figure-html/cores-1.png index ed03a300..35a726c5 100644 Binary files a/docs/articles/map-preproc_files/figure-html/cores-1.png and b/docs/articles/map-preproc_files/figure-html/cores-1.png differ diff --git a/docs/articles/map-preproc_files/figure-html/king-land-1.png b/docs/articles/map-preproc_files/figure-html/king-land-1.png index 5c52ccf5..eab3d708 100644 Binary files a/docs/articles/map-preproc_files/figure-html/king-land-1.png and b/docs/articles/map-preproc_files/figure-html/king-land-1.png differ diff --git a/docs/articles/map-preproc_files/figure-html/king-water-1.png b/docs/articles/map-preproc_files/figure-html/king-water-1.png index 3319d5b0..6243474d 100644 Binary files a/docs/articles/map-preproc_files/figure-html/king-water-1.png and b/docs/articles/map-preproc_files/figure-html/king-water-1.png differ diff --git a/docs/articles/map-preproc_files/figure-html/unnamed-chunk-6-1.png b/docs/articles/map-preproc_files/figure-html/unnamed-chunk-6-1.png index ece5183b..29dc0760 100644 Binary files a/docs/articles/map-preproc_files/figure-html/unnamed-chunk-6-1.png and b/docs/articles/map-preproc_files/figure-html/unnamed-chunk-6-1.png differ diff --git 
a/docs/articles/map-preproc_files/figure-html/unsplit-plan-1.png b/docs/articles/map-preproc_files/figure-html/unsplit-plan-1.png index 309d6445..c4311678 100644 Binary files a/docs/articles/map-preproc_files/figure-html/unsplit-plan-1.png and b/docs/articles/map-preproc_files/figure-html/unsplit-plan-1.png differ diff --git a/docs/articles/map-preproc_files/figure-html/water-plot-1.png b/docs/articles/map-preproc_files/figure-html/water-plot-1.png index e9a4376b..51b89ead 100644 Binary files a/docs/articles/map-preproc_files/figure-html/water-plot-1.png and b/docs/articles/map-preproc_files/figure-html/water-plot-1.png differ diff --git a/docs/articles/redist.html b/docs/articles/redist.html index 292ed51c..16a1599a 100644 --- a/docs/articles/redist.html +++ b/docs/articles/redist.html @@ -40,7 +40,7 @@ @@ -63,13 +63,10 @@ Common Arguments to `redist` Functionsredist
#> 8 19015 Boone 4 26306 25194 202 505 20027 19448 103 260 13929
#> 9 19017 Brem… 1 24276 23459 186 239 18763 18242 155 137 12871
#> 10 19019 Buch… 1 20958 20344 59 243 15282 14979 32 128 10338
-#> # … with 89 more rows, and 5 more variables: dem_08 <dbl>, rep_08 <dbl>,
-#> # region <chr>, geometry <MULTIPOLYGON [US_survey_foot]>, adj <list>
+#> # ℹ 89 more rows
+#> # ℹ 5 more variables: dem_08 <dbl>, rep_08 <dbl>, region <chr>,
+#> # geometry <MULTIPOLYGON [US_survey_foot]>, adj <list>
This looks much the same as iowa
itself, but metadata
has been added, and there’s a new column, adj
.
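To illustrate what an adjacency list makes possible, here is a toy contiguity check written in base R. The four-unit graph and the `is_connected()` helper are made up for this sketch, and it uses 1-based indices for readability, so check the indexing convention of a real `adj` column before reusing the idea.

```r
# Toy adjacency list for four units arranged in a line: 1 - 2 - 3 - 4.
# ASSUMPTION: 1-based neighbor indices, chosen for readability only.
adj <- list(2L, c(1L, 3L), c(2L, 4L), 3L)

# Breadth-first search: can every unit in `members` be reached from the
# first one using only edges that stay inside `members`?
is_connected <- function(adj, members) {
    if (length(members) == 0) return(TRUE)
    seen <- members[1]
    queue <- members[1]
    while (length(queue) > 0) {
        v <- queue[1]
        queue <- queue[-1]
        nbrs <- intersect(adj[[v]], members)
        new <- setdiff(nbrs, seen)
        seen <- c(seen, new)
        queue <- c(queue, new)
    }
    setequal(seen, members)
}

is_connected(adj, c(1L, 2L))  # adjacent units
is_connected(adj, c(1L, 3L))  # unit 2 is missing between them
```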
-areas = as.numeric(units::set_units(sf::st_area(iowa_map$geometry), mi^2))
+areas = as.numeric(units::set_units(sf::st_area(iowa_map$geometry), mi^2))
plot(iowa_map, fill = pop / areas) +
scale_fill_viridis_c(name="Population density (people / sq. mi)",
trans="sqrt")
print(iowa_plans)
-#> A <redist_plans> containing 1000 sampled plans and 1 reference plan
+#> A <redist_plans> containing 1,000 sampled plans and 1 reference plan
#> Plans have 4 districts from a 99-unit map, and were drawn using Sequential
#> Monte Carlo.
#> With plans resampled from weights
-#> Plans matrix: int [1:99, 1:1001] 3 3 1 2 4 1 1 4 1 1 ...
+#> Plans matrix: int [1:99, 1:1001] 1 1 2 3 4 2 2 4 2 2 ...
#> # A tibble: 4,004 × 4
#> draw district total_pop chain
#> <fct> <int> <dbl> <int>
-#> 1 cd_2010 1 761548 NA
-#> 2 cd_2010 2 761624 NA
-#> 3 cd_2010 3 761612 NA
+#> 1 cd_2010 1 761612 NA
+#> 2 cd_2010 2 761548 NA
+#> 3 cd_2010 3 761624 NA
#> 4 cd_2010 4 761571 NA
-#> 5 1 1 765390 1
-#> 6 1 2 766481 1
-#> 7 1 3 760381 1
-#> 8 1 4 754103 1
-#> 9 2 1 758248 1
-#> 10 2 2 755611 1
-#> # … with 3,994 more rows
We can explore specific simulated plans with
redist.plot.plans()
.
@@ -504,7 +502,7 @@Analyzing the simulated plans
iowa_plans = match_numbers(iowa_plans, iowa_map$cd_2010)
print(iowa_plans)
-#> A <redist_plans> containing 1000 sampled plans and 1 reference plan
+#> A <redist_plans> containing 1,000 sampled plans and 1 reference plan
#> Plans have 4 districts from a 99-unit map, and were drawn using Sequential
#> Monte Carlo.
#> With plans resampled from weights
@@ -516,13 +514,13 @@ Analyzing the simulated plans
#> 2 cd_2010 2 761624 NA 1
#> 3 cd_2010 3 761612 NA 1
#> 4 cd_2010 4 761571 NA 1
-#> 5 1 1 766481 1 0.641
-#> 6 1 2 754103 1 0.641
-#> 7 1 3 760381 1 0.641
-#> 8 1 4 765390 1 0.641
-#> 9 2 1 758248 1 0.872
-#> 10 2 2 764095 1 0.872
-#> # … with 3,994 more rows
Then we can add summary statistics by district, using
redist
’s analysis functions. Here, we’ll compute the
population deviation, the perimeter-based compactness measure related to
@@ -533,7 +531,7 @@
Once summary statistics of interest have been calculated, it’s very important to check the algorithm’s diagnostics. As with any complex sampling algorithm, things can go wrong. Diagnostics, while not @@ -563,25 +561,25 @@
summary(iowa_plans)
#> SMC: 1,000 sampled plans of 4 districts on 99 units
-#> `adapt_k_thresh`=0.985 • `seq_alpha`=0.5
+#> `adapt_k_thresh`=0.99 • `seq_alpha`=0.5
#> `est_label_mult`=1 • `pop_temper`=0
-#> Plan diversity 80% range: 0.50 to 0.82
+#> Plan diversity 80% range: 0.45 to 0.81
#>
#> R-hat values for summary statistics:
#> pop_overlap pop_dev comp pct_min pct_dem
-#> 1.002 1.010 1.000 1.002 1.009
+#> 1.002 1.014 1.033 1.001 1.014
#> Sampling diagnostics for SMC run 1 of 2 (500 samples)
#> Eff. samples (%) Acc. rate Log wgt. sd Max. unique Est. k
-#> Split 1 492 (98.4%) 5.5% 0.26 322 (102%) 5
-#> Split 2 486 (97.3%) 10.1% 0.33 312 ( 99%) 3
-#> Split 3 480 (96.0%) 4.2% 0.42 275 ( 87%) 2
-#> Resample 416 (83.2%) NA% 0.38 425 (134%) NA
+#> Split 1 492 (98.4%) 5.6% 0.25 316 (100%) 5
+#> Split 2 484 (96.8%) 7.5% 0.36 304 ( 96%) 4
+#> Split 3 476 (95.1%) 3.0% 0.44 272 ( 86%) 3
+#> Resample 402 (80.5%) NA% 0.43 407 (129%) NA
#> Sampling diagnostics for SMC run 2 of 2 (500 samples)
#> Eff. samples (%) Acc. rate Log wgt. sd Max. unique Est. k
-#> Split 1 491 (98.3%) 5.7% 0.26 309 ( 98%) 5
-#> Split 2 483 (96.5%) 9.7% 0.39 309 ( 98%) 3
-#> Split 3 478 (95.7%) 4.4% 0.42 283 ( 90%) 2
-#> Resample 408 (81.5%) NA% 0.39 429 (136%) NA
+#> Split 1 491 (98.3%) 5.6% 0.26 309 ( 98%) 5
+#> Split 2 484 (96.8%) 6.8% 0.36 297 ( 94%) 4
+#> Split 3 480 (96.1%) 2.9% 0.42 264 ( 84%) 3
+#> Resample 425 (85.0%) NA% 0.39 424 (134%) NA
#> • Watch out for low effective samples, very low acceptance rates (less than
#> 1%), large std. devs. of the log weights (more than 3 or so), and low numbers
#> of unique plans. R-hat values for summary statistics should be between 1 and
@@ -609,7 +607,7 @@ Analyzing the simulated plans= max(pct_min),
dem_distr = sum(pct_dem > 0.5))
print(plan_sum)
-#> A <redist_plans> containing 1000 sampled plans and 1 reference plan
+#> A <redist_plans> containing 1,000 sampled plans and 1 reference plan
#> Plans have 4 districts from a 99-unit map, and were drawn using Sequential
#> Monte Carlo.
#> With plans resampled from weights
@@ -618,16 +616,16 @@ Analyzing the simulated plans
#> draw max_dev avg_comp max_pct_min dem_distr
#> <fct> <dbl> <dbl> <dbl> <int>
#> 1 cd_2010 0.0000535 0.428 0.114 3
-#> 2 1 0.00983 0.361 0.118 3
-#> 3 2 0.00894 0.438 0.115 3
-#> 4 3 0.00322 0.426 0.128 3
-#> 5 4 0.00322 0.426 0.128 3
-#> 6 5 0.00468 0.401 0.126 3
-#> 7 6 0.00608 0.433 0.109 3
-#> 8 7 0.00910 0.380 0.113 3
-#> 9 8 0.00740 0.430 0.114 3
-#> 10 9 0.00810 0.504 0.110 3
-#> # … with 991 more rows
+#> 2 1 0.00554 0.435 0.118 3
+#> 3 2 0.00725 0.413 0.113 3
+#> 4 3 0.00997 0.330 0.128 3
+#> 5 4 0.00457 0.365 0.121 3
+#> 6 5 0.00894 0.508 0.115 3
+#> 7 6 0.00889 0.406 0.119 3
+#> 8 7 0.00562 0.350 0.117 3
+#> 9 8 0.00921 0.407 0.110 3
+#> 10 9 0.00984 0.375 0.119 3
+#> # ℹ 991 more rows
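As a sanity check, the `max_dev` value in the `cd_2010` row can be reproduced by hand from the four district populations printed earlier. This sketch assumes the parity target is simply total population divided by the number of districts, and that the statistic is the largest relative deviation from that target.

```r
# District populations of the cd_2010 reference plan, as printed earlier
pop <- c(761612, 761548, 761624, 761571)

# ASSUMPTION: parity target is total population / number of districts
target <- sum(pop) / length(pop)

# Largest relative deviation from the target across districts
max_dev <- max(abs(pop - target) / target)
signif(max_dev, 3)
```

Under these assumptions the result rounds to 0.0000535, which matches the reference row in the table.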
These tables of statistics are easily plotted using existing
libraries like ggplot2
, but redist
provides a
number of helpful plotting functions that automate some common tasks,
@@ -665,7 +663,7 @@
-pal = scales::viridis_pal()(5)[-1]
+pal = scales::viridis_pal()(5)[-1]
redist.plot.scatter(iowa_plans, pct_min, pct_dem,
color=pal[subset_sampled(iowa_plans)$district]) +
scale_color_manual(values="black")
@@ -719,7 +717,7 @@ Site built with pkgdown 2.0.6.
+Site built with pkgdown 2.0.7.
diff --git a/docs/articles/redist_files/figure-html/dev-comp-plot-1.png b/docs/articles/redist_files/figure-html/dev-comp-plot-1.png index 8411e67a..33808d29 100644 Binary files a/docs/articles/redist_files/figure-html/dev-comp-plot-1.png and b/docs/articles/redist_files/figure-html/dev-comp-plot-1.png differ diff --git a/docs/articles/redist_files/figure-html/ia-sim-plans-1.png b/docs/articles/redist_files/figure-html/ia-sim-plans-1.png index b60af887..0dbb1ac0 100644 Binary files a/docs/articles/redist_files/figure-html/ia-sim-plans-1.png and b/docs/articles/redist_files/figure-html/ia-sim-plans-1.png differ diff --git a/docs/articles/redist_files/figure-html/iowa-adj-1.png b/docs/articles/redist_files/figure-html/iowa-adj-1.png index 0f153e5d..16bca13a 100644 Binary files a/docs/articles/redist_files/figure-html/iowa-adj-1.png and b/docs/articles/redist_files/figure-html/iowa-adj-1.png differ diff --git a/docs/articles/redist_files/figure-html/iowa-chloro-1.png b/docs/articles/redist_files/figure-html/iowa-chloro-1.png index f2c4b8f3..753316e4 100644 Binary files a/docs/articles/redist_files/figure-html/iowa-chloro-1.png and b/docs/articles/redist_files/figure-html/iowa-chloro-1.png differ diff --git a/docs/articles/redist_files/figure-html/iowa-chloro-2.png b/docs/articles/redist_files/figure-html/iowa-chloro-2.png index f53de927..30652936 100644 Binary files a/docs/articles/redist_files/figure-html/iowa-chloro-2.png and b/docs/articles/redist_files/figure-html/iowa-chloro-2.png differ diff --git a/docs/articles/redist_files/figure-html/iowa-chloro-3.png b/docs/articles/redist_files/figure-html/iowa-chloro-3.png index a41d7352..0710bd5f 100644 Binary files a/docs/articles/redist_files/figure-html/iowa-chloro-3.png and b/docs/articles/redist_files/figure-html/iowa-chloro-3.png differ diff --git a/docs/articles/redist_files/figure-html/scatter-1.png b/docs/articles/redist_files/figure-html/scatter-1.png index 645d8d3a..5ad9f63b 100644 Binary files 
a/docs/articles/redist_files/figure-html/scatter-1.png and b/docs/articles/redist_files/figure-html/scatter-1.png differ diff --git a/docs/articles/redist_files/figure-html/signature-1.png b/docs/articles/redist_files/figure-html/signature-1.png index 40882f8b..bc3301b6 100644 Binary files a/docs/articles/redist_files/figure-html/signature-1.png and b/docs/articles/redist_files/figure-html/signature-1.png differ diff --git a/docs/authors.html b/docs/authors.html index 6a4760fc..a05fdf21 100644 --- a/docs/authors.html +++ b/docs/authors.html @@ -17,7 +17,7 @@ @@ -38,13 +38,10 @@ Common Arguments to `redist` FunctionsMcCartan C, Imai K (2020). +
McCartan C, Imai K (2023).
“Sequential Monte Carlo for sampling balanced and compact redistricting plans.”
-arXiv preprint.
-https://arxiv.org/abs/2008.06131.
+Annals of Applied Statistics, 17(4).
+http://dx.doi.org/10.1214/23-AOAS1763.
@Article{mccartan2020,
  title = {Sequential Monte Carlo for sampling balanced and compact redistricting plans},
  author = {Cory McCartan and Kosuke Imai},
-  year = {2020},
-  journal = {arXiv preprint},
-  url = {https://arxiv.org/abs/2008.06131},
+  year = {2023},
+  volume = {17},
+  number = {4},
+  journal = {Annals of Applied Statistics},
+  url = {http://dx.doi.org/10.1214/23-AOAS1763},
}
Fifield B, Imai K, Kawahara J, Kenny C (2020). “The essential role of empirical validation in legislative redistricting simulation.” @@ -178,7 +177,7 @@
Papers:
redist
is available on CRAN and can be installed using:
install.packages("redist")
You can also install the most recent development version of redist
(which is usually quite stable) using the remotes
package.
if (!require(remotes)) install.packages("remotes")
remotes::install_github("alarm-redist/redist@dev", dependencies=TRUE)
After generating plans, you can use redist
’s plotting functions to study the geographic and partisan characteristics of the simulated ensemble.
library(ggplot2)
@@ -192,9 +186,10 @@ Getting started
iowa_plans = iowa_plans %>%
- mutate(Compactness = distr_compactness(iowa_map),
+ mutate(Compactness = comp_polsby(pl(), iowa_map),
`Population deviation` = plan_parity(iowa_map),
`Democratic vote` = group_frac(iowa_map, dem_08, tot_08))
+#> Linking to GEOS 3.11.0, GDAL 3.5.3, PROJ 9.1.0; sf_use_s2() is TRUE
hist(iowa_plans, `Population deviation`) + hist(iowa_plans, Compactness) +
plot_layout(guides="collect") +
@@ -256,7 +251,7 @@ Developers
@@ -272,7 +267,7 @@ Dev status
diff --git a/docs/news/index.html b/docs/news/index.html
index bfcb8fe9..bbd46121 100644
--- a/docs/news/index.html
+++ b/docs/news/index.html
@@ -17,7 +17,7 @@
NEWS.md
+
diff --git a/docs/pkgdown.yml b/docs/pkgdown.yml
index 262cb296..1f960df5 100644
--- a/docs/pkgdown.yml
+++ b/docs/pkgdown.yml
@@ -1,11 +1,10 @@
-pandoc: 2.19.2
-pkgdown: 2.0.6
+pandoc: 3.1.1
+pkgdown: 2.0.7
pkgdown_sha: ~
articles:
common_args: common_args.html
+ flip: flip.html
map-preproc: map-preproc.html
- mcmc: mcmc.html
- mpi-slurm: mpi-slurm.html
redist: redist.html
-last_built: 2023-03-20T17:52Z
+last_built: 2024-01-12T01:39Z
diff --git a/docs/reference/EPSG.html b/docs/reference/EPSG.html
index 08751382..0eae57a2 100644
--- a/docs/reference/EPSG.html
+++ b/docs/reference/EPSG.html
@@ -21,7 +21,7 @@
@@ -42,13 +42,10 @@
Common Arguments to `redist` Functions
R/tidy.R
+ Source: R/redist_plans.R
add_reference.Rd
A dataframe output from redist.prep.polsbypopper
A dataframe output from redistmetrics::prep_perims
SMC Redistricting Sampler (McCartan and Imai 2020)
SMC Redistricting Sampler (McCartan and Imai 2023)
Redistricting via Compact Random Seed and Grow Algorithm
(Deprecated) Flip MCMC Redistricting Simulator
(Deprecated) Flip MCMC Redistricting Simulator using Simulated Annealing
Run parameter testing for redist.flip
Run parameter testing for redist_flip
`*`(<redist_scorer>)
`+`(<redist_scorer>)
`-`(<redist_scorer>)
Scoring function arithmetic
Combine scoring functions
scorer_group_pct()
scorer_pop_dev()
scorer_splits()
scorer_multisplits()
scorer_frac_kept()
scorer_polsby_popper()
scorer_status_quo()
Functions that help setup outputs for easier use
(Deprecated) Combine successive runs of redist.flip
(Deprecated) redist.combine.anneal
Combine successive runs of redist.mcmc.mpi
Other functions
Simulation Methods for Legislative Redistricting
redist: Simulation Methods for Legislative Redistricting
Extract the sampling information from a redistricting simulation
Local Plan Optimization
Pick One Plan from Many Plans
Access the Current redist_plans()
Object
redist.dist.pop.overlap()
Compare the Population Overlap Across Plans at the District Level
Display an interactive map
redist.prec.pop.overlap()
Compare the Population Overlap Across Plans at the Precinct Level
Prep Polsby Popper Perimeter Dataframe
redist_map
object is contiguousR/tidy.R
+ Source: R/map_helpers.R
is_contiguous.Rd
R/splits.R
+ Source: R/plans_helpers.R
is_county_split.Rd
R/tidy.R
+ Source: R/plans_helpers.R
last_plan.Rd
R/tidy.R
+ Source: R/numbering.R
match_numbers.Rd
R/tidy.R
+ Source: R/numbering.R
number_by.Rd
Useful inside piped expressions and dplyr
functions.
pl()
A redist_plans
object, or NULL
if not called from inside a
+dplyr
function.
pl()
+#> NULL
+
+
redist_map
# S3 method for redist_map
-plot(x, fill = NULL, by_distr = FALSE, adj = FALSE, interactive = FALSE, ...)
by_distr
.
-if TRUE
, show an interactive map in the viewer
-rather than a static map. Ignores adj
and by_distr
.
passed on to redist.plot.map
(or
-redist.plot.adj
if adj=TRUE
, or
-redist.plot.interactive
if interactive=TRUE
).
+redist.plot.adj
if adj=TRUE
).
Useful parameters may include zoom_to
, boundaries
, and
title
.
R/tidy.R
+ Source: R/plans_helpers.R
prec_assignment.Rd
prec_assignment(prec, .data = cur_plans())
prec_assignment(prec, .data = pl())
R/tidy.R
+ Source: R/plans_helpers.R
prec_cooccurrence.Rd
Merging map units through merge_by
or summarize
+
Merging map units through merge_by
or summarize
changes the indexing of each unit. Use this function to take a set of
redistricting plans from a redist
algorithm and re-index them to
be compatible with the original set of units.
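The re-indexing idea can be sketched with plain matrix indexing. The toy data and the name `merge_idx` below are hypothetical and only illustrate the concept; they do not show the function's actual internals.

```r
# Six original units merged into three; merge_idx[i] is the merged unit
# that original unit i belongs to (hypothetical toy data).
merge_idx <- c(1L, 1L, 2L, 2L, 3L, 3L)

# Two plans on the merged map: rows are merged units, columns are plans
plans_merged <- cbind(c(1L, 1L, 2L), c(1L, 2L, 2L))

# Re-index so that rows line up with the original six units
plans_orig <- plans_merged[merge_idx, , drop = FALSE]
plans_orig
```

Each original unit simply inherits the district of the merged unit it was folded into, so the result has one row per original unit.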
R/redist-package.R
redist-package.Rd
Enables researchers to sample redistricting plans from a pre-specified target -distribution using Sequential Monte Carlo and Markov Chain Monte Carlo -algorithms. The package allows for the implementation of various constraints -in the redistricting process such as geographic compactness and population -parity requirements. Tools for analysis such as computation of various -summary statistics and plotting functionality are also included. The package -implements methods described in Fifield, Higgins, Imai and Tarr (2020) -doi:10.1080/10618600.2020.1739532 -, Fifield, Imai, Kawahara, and Kenny (2020) -doi:10.1080/2330443X.2020.1791773 -, and McCartan and Imai (2020) -arXiv:2008.06131.
+ +Enables researchers to sample redistricting plans from a pre-specified target distribution using Sequential Monte Carlo and Markov Chain Monte Carlo algorithms. The package allows for the implementation of various constraints in the redistricting process such as geographic compactness and population parity requirements. Tools for analysis such as computation of various summary statistics and plotting functionality are also included. The package implements the SMC algorithm of McCartan and Imai (2023) doi:10.1214/23-AOAS1763 +, the enumeration algorithm of Fifield, Imai, Kawahara, and Kenny (2020) doi:10.1080/2330443X.2020.1791773 +, the Flip MCMC algorithm of Fifield, Higgins, Imai and Tarr (2020) doi:10.1080/10618600.2020.1739532 +, the Merge-split/Recombination algorithms of Carter et al. (2019) arXiv:1911.01503 and DeFord et al. (2021) doi:10.1162/99608f92.eb30390f +, and the Short-burst optimization algorithm of Cannon et al. (2020) arXiv:2011.02288.
Barbu, Adrian and Song-Chun Zhu. (2005) "Generalizing Swendsen-Wang to -Sampling Arbitrary Posterior Probabilities." IEEE Transactions on -Pattern Analysis and Machine Intelligence.
-Fifield, Benjamin, Michael Higgins, Kosuke Imai and Alexander -Tarr. (2020) "Automated Redistricting Simulation Using Markov -Chain Monte Carlo." Available at -https://imai.fas.harvard.edu/research/files/redist.pdf.
-Swendsen, Robert and Jian-Sheng Wang. (1987) "Nonuniversal Critical -Dynamics in Monte Carlo Simulations." Physical Review Letters.
+Useful links:
List, four objects
maxnumeric, maximum frontier size
averagenumeric, average frontier size
average_sqnumeric, average((frontier size)^2)
sequencenumeric vector, lists out all sizes for every frontier
List, four objects
max
numeric, maximum frontier size
average
numeric, average frontier size
average_sq
numeric, average((frontier size)^2)
sequence
numeric vector, lists out all sizes for every frontier
R/tidy.R
, R/compactness.R
+ Source: R/tidy_deprecations.R
, R/deprecations.R
redist.compactness.Rd
it checks for an Rds, if no rds exists at the path,
it creates an rds with borders and saves it.
-This can be created in advance with redist.prep.polsbypopper
.
prep_perims()
.
A dataframe output from redist.prep.polsbypopper
A dataframe output from prep_perims()
.
R/tidy.R
, R/competitiveness.R
+ Source: R/tidy_deprecations.R
, R/deprecations.R
redist.competitiveness.Rd
list, containing three objects containing the completed redistricting -plan.
plan
A vector of length N, indicating the
+plan.
plan
: A vector of length N, indicating the
district membership of each precinct.
district_list
A list of length Ndistrict. Each list contains a
vector of the precincts in the respective district.
R/splits.R
+ Source: R/deprecations.R
redist.district.splits.Rd
data(iowa)
ia <- redist_map(iowa, existing_plan = cd_2010, total_pop = pop, pop_tol = 0.01)
plans <- redist_smc(ia, 50, silent = TRUE)
-splits <- redist.district.splits(plans, ia$region)
+# old: redist.district.splits(plans, ia$region)
+splits_count(plans, ia, region)
+#> [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10] [,11] [,12] [,13] [,14]
+#> [1,] 3 2 2 3 2 3 3 3 3 3 3 2 3 3
+#> [2,] 2 3 2 2 4 2 2 2 2 2 3 1 2 2
+#> [3,] 2 2 2 2 2 2 2 2 1 2 3 2 2 2
+#> [4,] 2 2 3 2 2 2 2 2 2 2 2 3 2 2
+#> [5,] 4 3 3 3 3 3 3 4 3 3 3 3 3 3
+#> [,15] [,16] [,17] [,18] [,19] [,20] [,21] [,22] [,23] [,24] [,25] [,26]
+#> [1,] 3 3 3 2 3 4 2 2 2 3 2 3
+#> [2,] 2 2 3 3 2 2 3 1 1 2 4 2
+#> [3,] 2 2 3 2 1 2 2 2 2 2 2 2
+#> [4,] 2 2 2 2 2 2 2 3 3 2 2 2
+#> [5,] 3 3 3 3 3 3 3 3 3 3 4 3
+#> [,27] [,28] [,29] [,30] [,31] [,32] [,33] [,34] [,35] [,36] [,37] [,38]
+#> [1,] 3 3 3 2 3 3 3 3 2 3 2 3
+#> [2,] 2 3 2 3 3 2 4 3 3 2 1 1
+#> [3,] 2 2 1 2 2 1 1 2 2 1 2 3
+#> [4,] 2 3 2 2 2 2 2 2 2 2 3 3
+#> [5,] 3 3 3 3 3 4 2 4 4 4 3 2
+#> [,39] [,40] [,41] [,42] [,43] [,44] [,45] [,46] [,47] [,48] [,49] [,50]
+#> [1,] 3 2 2 2 2 3 3 3 3 3 3 3
+#> [2,] 2 3 3 3 3 2 2 3 3 2 3 2
+#> [3,] 2 2 2 2 2 2 2 2 1 1 2 1
+#> [4,] 2 2 2 2 3 2 2 3 2 2 2 2
+#> [5,] 4 3 3 3 3 4 4 3 2 3 4 3
+#> [,51]
+#> [1,] 3
+#> [2,] 2
+#> [3,] 2
+#> [4,] 2
+#> [5,] 2
Given a percent goal for majority minority districts, this computes the average
value of minority in non-majority minority districts. This value is "tgt_other"
-in redist.flip
and redist_smc
.
redist_flip
and redist_smc
.
R/redist_flip_tidy.R
+ Source: R/redist_flip.R
redist_flip.Rd
redist_flip
provides a tidy interface to the methods in
-redist.flip
.
This function allows users to simulate redistricting plans +using a Markov Chain Monte Carlo algorithm (Fifield, Higgins, Imai, and Tarr 2020). Several +constraints corresponding to substantive requirements in the redistricting +process are implemented, including population parity and geographic +compactness. In addition, the function includes multiple-swap and simulated +tempering functionality to improve the mixing of the Markov Chain.
A redist_constr
object.
The amount by which to thin the Markov Chain. The
default is 1
.
Whether to print initialization statement. Default is TRUE
.
Deprecated. Use thin
.
This function allows users to simulate redistricting plans -using a Markov Chain Monte Carlo algorithm (Fifield, Higgins, Imai, and Tarr 2020). Several -constraints corresponding to substantive requirements in the redistricting -process are implemented, including population parity and geographic -compactness. In addition, the function includes multiple-swap and simulated -tempering functionality to improve the mixing of the Markov Chain.
-redist_flip
allows for Gibbs constraints to be supplied via a list object
-passed to constraints
. This is a change from the original redist.flip
-behavior to allow for a more straightforward function call when used within a pipe.
-A key difference between redist_flip
and redist.flip
is that
+
redist_flip
allows for Gibbs constraints to be supplied via a list object
+passed to constraints
.
redist_flip
uses a small compactness constraint by default, as this improves
the realism of the maps greatly and also leads to large speed improvements.
(One of the most time consuming aspects of the flip MCMC backend is checking for
@@ -322,7 +324,7 @@
R/redist_flip_tidy.R
+ Source: R/redist_flip.R
redist_flip_anneal.Rd
R/redist_smc.R
redist_smc.Rd
redist_smc
uses a Sequential Monte Carlo algorithm (McCartan and Imai 2020)
-to generate nearly independent congressional or legislative redistricting
-plans according to contiguity, population, compactness, and administrative
-boundary constraints.
redist_smc
uses a Sequential Monte Carlo algorithm (McCartan and Imai 2023)
+to generate representative samples of congressional or legislative
+redistricting plans according to contiguity, population, compactness, and
+administrative boundary constraints.
This function draws nearly-independent samples from a specific target measure,
-controlled by the map
, compactness
, and constraints
parameters.
This function draws samples from a specific target measure controlled by
+the map
, compactness
, and constraints
parameters.
Key to ensuring good performance is monitoring the efficiency of the resampling
process at each SMC stage. Unless silent=FALSE
, this function will print
out the effective sample size of each resampling step to allow the user to
@@ -280,8 +277,10 @@
McCartan, C., & Imai, K. (2020). Sequential Monte Carlo for Sampling Balanced and Compact Redistricting Plans. -Available at https://arxiv.org/abs/2008.06131.
+McCartan, C., & Imai, K. (2023). Sequential Monte Carlo for Sampling +Balanced and Compact Redistricting Plans. Annals of Applied Statistics 17(4). +Available at doi:10.1214/23-AOAS1763 +.
redist_scorer
functions may be combined together to optimize along multiple
+dimensions. Rather than linearly combining multiple scorers to form a single
+objective as with scorer-arith, these functions allow analysts to approximate
+the Pareto frontier for a set of scorers.
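For readers unfamiliar with the term, the following base-R sketch finds the Pareto frontier of a small score matrix. The scores and the `is_dominated()` helper are invented for illustration, with larger values taken to be better on both columns; this is not the package's optimization routine.

```r
# Rows are plans, columns are scores; here larger is better in both.
scores <- rbind(
    a = c(0.9, 0.2),
    b = c(0.5, 0.5),
    c = c(0.4, 0.4),  # dominated by b
    d = c(0.1, 0.8)
)

# A plan is dominated if another plan is at least as good on every
# score and strictly better on at least one.
is_dominated <- function(i, m) {
    any(vapply(seq_len(nrow(m)), function(j) {
        j != i && all(m[j, ] >= m[i, ]) && any(m[j, ] > m[i, ])
    }, logical(1)))
}

dominated <- vapply(seq_len(nrow(scores)), is_dominated, logical(1), m = scores)

# Plans on the (approximate) Pareto frontier
rownames(scores)[!dominated]
```

A linear combination of the two scores would pick a single winner, while the frontier keeps every plan that represents a distinct trade-off.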
combine_scorers(...)
+
+# S3 method for redist_scorer
+cbind(..., deparse.level = 1)
function of class redist_scorer. Will return a matrix with each +column containing every plan's scores for a particular scoring function.
+perimeter distance dataframe from redist.prep.polsbypopper
perimeter distance dataframe from prep_perims()
A scoring function of class redist_scorer
. single numeric value, where larger values are better for frac_kept
,
-group_pct
, and polsby_popper
and smaller values are better for splits
and pop_dev
.
A scoring function of class redist_scorer
which returns a single numeric value per plan.
+Larger values are generally better for frac_kept
, group_pct
, and polsby_popper
and smaller values are better for splits
and pop_dev
.
tally_var(map, x, .data = cur_plans())
tally_var(map, x, .data = pl())
a redist_plans
object
a redist_plans
object or matrix of plans