Commit

rebalance natural repairs
breandan committed Mar 19, 2024
1 parent c24b704 commit 384bf50
Showing 10 changed files with 2,442 additions and 2,381 deletions.
4 changes: 2 additions & 2 deletions latex/splash2024/experiments/timings.tex
@@ -385,7 +385,7 @@
23 618
15 532
};
\addplot [draw=orange, fill=orange, mark=*, only marks]
\addplot [draw=red, fill=red, mark=*, only marks]
table{%
x y
20 58108
@@ -848,7 +848,7 @@
15 6672
14 4076
};
\addplot [draw=red, fill=red, mark=*, only marks]
\addplot [draw=orange, fill=orange, mark=*, only marks]
table{%
x y
40 42016
20 changes: 20 additions & 0 deletions latex/splash2024/len_dist_bifi.tex
@@ -0,0 +1,20 @@
\begin{tikzpicture}
\begin{axis}[
xlabel={$|\sigma|$},
ylabel={Precision@1},
title={BIFI Repair Precision},
ybar,
axis lines*=left,
xtick={0, 10, 20, 30, 40, 50, 60, 70},
ytick={0, 0.1, 0.2, 0.3, 0.4},
ymax=0.4,
bar width=4pt,
]

\addplot[green, fill=green] coordinates {(0, 0.196013) (10, 0.326401) (20, 0.318538) (30, 0.272843) (40, 0.213894) (50, 0.206651) (60, 0.247525) (70, 0.179245)};
\addplot[blue, fill=blue] coordinates {(0, 0.174603) (10, 0.176651) (20, 0.209573) (30, 0.19195) (40, 0.18851) (50, 0.176166) (60, 0.110787) (70, 0.106383)};
\addplot[orange, fill=orange] coordinates {(0, 0.015873) (10, 0.021858) (20, 0.030435) (30, 0.02439) (40, 0.032922) (50, 0.045) (60, 0.027397) (70, 0.017094)};

\legend{Δ=1,Δ=2,Δ=3}
\end{axis}
\end{tikzpicture}
20 changes: 20 additions & 0 deletions latex/splash2024/len_dist_s2p.tex
@@ -0,0 +1,20 @@
\begin{tikzpicture}
\begin{axis}[
xlabel={$|\sigma|$},
ylabel={Precision@1},
title={Seq2Parse Repair Precision},
ybar,
axis lines*=left,
xtick={0, 10, 20, 30, 40, 50, 60, 70},
ytick={0, 0.1, 0.2, 0.3, 0.4},
ymax=0.4,
bar width=4pt,
]

\addplot[green, fill=green] coordinates {(0, 0.368078) (10, 0.416931) (20, 0.393548) (30, 0.387001) (40, 0.3125) (50, 0.289926) (60, 0.258278) (70, 0.198157)};
\addplot[blue, fill=blue] coordinates {(0, 0.127450) (10, 0.131627) (20, 0.133111) (30, 0.107287) (40, 0.099537) (50, 0.104729) (60, 0.100418) (70, 0.123762)};
\addplot[orange, fill=orange] coordinates {(0, 0.037735) (10, 0.072) (20, 0.086093) (30, 0.08125) (40, 0.088235) (50, 0.022727) (60, 0.054945) (70, 0.061538)};

\legend{Δ=1,Δ=2,Δ=3}
\end{axis}
\end{tikzpicture}
20 changes: 20 additions & 0 deletions latex/splash2024/len_dist_tidy.tex
@@ -0,0 +1,20 @@
\begin{tikzpicture}
\begin{axis}[
xlabel={$|\sigma|$},
ylabel={Precision@1},
title={BIFI Repair Precision},
ybar,
axis lines*=left,
xtick={0, 10, 20, 30, 40, 50, 60, 70},
ytick={0, 0.1, 0.2, 0.3, 0.4},
ymax=0.4,
bar width=4pt,
]

\addplot[green, fill=green] coordinates {(0, 0.196013) (10, 0.326401) (20, 0.318538) (30, 0.272843) (40, 0.213894) (50, 0.206651) (60, 0.247525) (70, 0.179245)};
\addplot[blue, fill=blue] coordinates {(0, 0.174603) (10, 0.176651) (20, 0.209573) (30, 0.19195) (40, 0.18851) (50, 0.176166) (60, 0.110787) (70, 0.106383)};
\addplot[orange, fill=orange] coordinates {(0, 0.015873) (10, 0.021858) (20, 0.030435) (30, 0.02439) (40, 0.032922) (50, 0.045) (60, 0.027397) (70, 0.017094)};

\legend{Δ=1,Δ=2,Δ=3}
\end{axis}
\end{tikzpicture}
3 changes: 2 additions & 1 deletion latex/splash2024/preamble.tex
@@ -614,4 +614,5 @@
\tikzstyle{arrow} = [->,thick]

%\usetikzlibrary{external}
%\tikzexternalize[prefix=figures/]
%\tikzexternalize[prefix=figures/]
\definecolor{green}{RGB}{0,128,0}
4,734 changes: 2,364 additions & 2,370 deletions latex/splash2024/sample_efficiency.tex

Large diffs are not rendered by default.

Binary file modified latex/splash2024/splash.pdf
Binary file not shown.
13 changes: 11 additions & 2 deletions latex/splash2024/splash.tex
@@ -224,7 +224,7 @@
\item $\mathcal{L}(G_\cap)$ is too large to enumerate, so we sample from the intersection grammar $G_\cap$. Sampling is necessary for $\sim20\%$ of the dataset.
\end{enumerate}

As long as we have done our job correctly, the intersection language should contain every repair within a certain Levenshtein distance, and no invalid repairs. This procedure is depicted in Fig.~\ref{fig:flowchart}. In the following section, we will describe how we construct the intersecection grammar (\S~\ref{sec:lev_nfa}, \ref{sec:lev_bh}), then, provide an explicit technique for extracting all repairs contained within it (\S~\ref{sec:ptree}). Finally, we use an n-gram model to rank and return the top-k results by likelihood (\S~\ref{sec:ranking}).
As long as we have done our job correctly, the intersection language should contain every repair within a certain Levenshtein distance, and no invalid repairs. This procedure is depicted in Fig.~\ref{fig:flowchart}. In the following section, we will describe how we construct the intersection grammar (\S~\ref{sec:lev_nfa}, \ref{sec:lev_bh}), then provide an explicit technique for extracting all repairs contained within its language (\S~\ref{sec:ptree}). Finally, we use an n-gram model to rank and return the top-k results by likelihood (\S~\ref{sec:ranking}).

\subsection{Preliminaries}

@@ -1011,10 +1011,19 @@
For our first experiment, we run the sampler until the human repair is detected, then measure the number of samples required to draw the exact human repair across varying Levenshtein radii.

\begin{figure}[h!]
\input{sample_efficiency}
\input{sample_efficiency}
\caption{Sample efficiency of LBH sampler at varying Levenshtein radii. After drawing up to $\sim10^5$ samples without replacement, we can usually saturate the admissible set for almost all repairs of fewer than four edits.}\label{fig:sample_efficiency}
\end{figure}

We can also plot precision for two other SoTA models, Seq2Parse (S2P) and BIFI, across length and Levenshtein distance.

\begin{figure}[h!]
\resizebox{.32\textwidth}{!}{\input{len_dist_s2p}}
\resizebox{.32\textwidth}{!}{\input{len_dist_s2p}}
\resizebox{.32\textwidth}{!}{\input{len_dist_bifi}}
\caption{Tidyparse, Seq2Parse and BIFI precision at various lengths and Levenshtein distances.}\label{fig:len_dist_prec}
\end{figure}

Next, we measure the precision at various ranking cutoffs for varying wall-clock timeouts. Here, P@\{k=1, 5, 10, All\} indicates the percentage of syntax errors with a human repair of $\Delta=\{1, 2, 3, 4\}$ edits found in $\leq p$ seconds that were matched within the top-k results, based on n-gram likelihood.
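
% Editorial sketch, not part of the original commit: one possible formalization of P@k.
% The symbols $D_\Delta$, $\sigma^*$ and $R^p_k(\sigma)$ are assumptions introduced for illustration only.
As one hedged reading of this definition, suppose $D_\Delta$ is the set of syntax errors $\sigma$ whose human repair $\sigma^*$ lies exactly $\Delta$ edits away, and $R^p_k(\sigma)$ is the set of the $k$ most likely repairs (by n-gram likelihood) among those found for $\sigma$ within $p$ seconds. These symbols are introduced here for illustration and do not appear elsewhere in the text. Under that assumption, the metric could be written as
\[
\mathrm{P@}k = \frac{1}{|D_\Delta|} \sum_{\sigma \in D_\Delta} \mathbf{1}\left[\sigma^* \in R^p_k(\sigma)\right],
\]
where $\mathbf{1}[\cdot]$ is $1$ when its argument holds and $0$ otherwise. In this reading, P@All corresponds to taking $k$ large enough to include every repair found before the timeout.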

\begin{figure}[h!]
7 changes: 2 additions & 5 deletions latex/splash2024/throughput.tex
@@ -1,8 +1,5 @@
% This file was created with tikzplotlib v0.10.1.
\begin{tikzpicture}

\definecolor{green}{RGB}{0,128,0}

\begin{axis}[
legend cell align={left},
legend style={fill opacity=0.8, draw opacity=1, text opacity=1, draw=lightgray204},
@@ -44,7 +41,7 @@
16 123.658064516129
17 36.6111111111111
};
\addplot [semithick, red, mark=*, mark size=3, mark options={solid}]
\addplot [semithick, orange, mark=*, mark size=3, mark options={solid}]
table {%
0 1506.8
1 2076.57142857143
@@ -86,7 +83,7 @@
16 12.1414141414141
17 4.1
};
\addplot [semithick, orange, mark=*, mark size=3, mark options={solid}]
\addplot [semithick, red, mark=*, mark size=3, mark options={solid}]
table {%
0 nan
1 73461
@@ -14,7 +14,7 @@ var CFG_THRESH = 20_000
var MAX_UNIQUE = 20_000 // Maximum number of unique samples to generate
var MAX_SAMPLE = 20 // Maximum number of repairs to sample
var MAX_TOKENS = 40 // Maximum number of tokens per repair
var MAX_RADIUS = 4
var MAX_RADIUS = 3
var TIMEOUT_MS = 90_000 // Timeout for each repair attempt (default, modify elsewhere)
var MAX_REPAIR = 2 // Maximum number of edits per repair
