Commit

add a glossary
breandan committed Nov 4, 2024
1 parent dae72cf commit e66214a
Showing 4 changed files with 18 additions and 1 deletion.
Binary file modified latex/thesis/Thesis.pdf
Binary file not shown.
3 changes: 3 additions & 0 deletions latex/thesis/Thesis.tex
@@ -65,13 +65,16 @@
\input{content/Acknowledgements}
\clearpage


% Next comes the GENERATED list of contents, figures and tables.
\tableofcontents
\listoffigures
\listoftables
\clearpage


\input{content/Terminology}
\clearpage

% Here comes the actual thesis content, we switch back to arabic page numbers.
\pagenumbering{arabic}
3 changes: 2 additions & 1 deletion latex/thesis/content/Ch2_Formal_Language_Theory.tex
@@ -3,12 +3,13 @@ \chapter{\rm\bfseries Formal Language Theory}

In computer science, it is common to conflate two distinct notions for a set. The first is a collection sitting on some storage device, e.g., a dataset. The second is a lazy construction: not an explicit collection of objects, but a representation that allows us to efficiently determine membership on demand. This lets us represent infinite sets without requiring an infinite amount of storage. Inclusion then, instead of being simply a lookup query, becomes a decision procedure. This is the basis of formal language theory.
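
As a minimal illustration (the set $E$ is a hypothetical example, introduced only here), consider the even natural numbers. Rather than storing its elements, we can store a rule that decides membership:
\[
E = \{\, n \mid n \bmod 2 = 0 \,\}, \qquad n \in E \iff n \bmod 2 = 0,
\]
where $n$ ranges over the natural numbers. Although $E$ is infinite, the rule has a finite description, and deciding whether a given $n$ belongs to $E$ takes a single division.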

The representation we are chiefly interested in are grammars, which are a common metanotation for specifying the syntactic constraints on programs, shared by nearly every programming language. Programming language grammars are overapproximations to the true language of interest, providing a fast procedure for rejecting invalid programs and parsing valid ones.
The representation we are chiefly interested in is the grammar, a common metanotation for specifying the syntactic constraints on programs, shared by nearly every programming language. Programming language grammars are overapproximations to the true language of interest, but provide a fast procedure for rejecting invalid programs and parsing valid ones.

Like all representations, grammars are a trade-off between expressiveness and efficiency. It is often possible to represent the same finite set with multiple representations of varying complexity. For example, the set of strings containing ten or fewer balanced parentheses can be expressed as a deterministic finite automaton containing millions of states, or as a simple conjunctive grammar containing a few productions.

Formal languages are arranged in a hierarchy of containment, where each language family strictly contains its predecessors. The lowest level is the set of finite languages. Type 3 contains the regular languages, which are generated by regular grammars and may be infinite. Type 2 contains the context-free languages, which admit parenthetical nesting. Supersets, such as the recursively enumerable languages, are also possible. There are other kinds of formal languages, such as logics and circuits, which are incomparable.

Like sets, languages can be combined abstractly by manipulating their grammars, mirroring the setwise operations of union, intersection, and difference over languages. These operations are convenient for combining, for example, syntactic and semantic constraints on programs. For instance, we might have two grammars, $G_a$ and $G_b$, representing two properties that are both necessary for a program to be considered valid. We can then treat the set of valid programs $P$ as a subset of the language intersection, $P \subseteq \mathcal{L}(G_a) \cap \mathcal{L}(G_b)$.
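
As a small worked example, suppose (purely for illustration; these particular languages are assumptions and are not drawn from any real programming language) that $\mathcal{L}(G_a) = \{a^n b^n \mid n \geq 0\}$ and that $\mathcal{L}(G_b)$ contains every string over $\{a, b\}$ of length at most four. Then
\[
\mathcal{L}(G_a) \cap \mathcal{L}(G_b) = \{\varepsilon,\ ab,\ aabb\},
\]
and membership in the intersection reduces to the conjunction of two decision procedures: $\sigma \in \mathcal{L}(G_a) \cap \mathcal{L}(G_b)$ if and only if $\sigma \in \mathcal{L}(G_a)$ and $\sigma \in \mathcal{L}(G_b)$.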


\clearpage
13 changes: 13 additions & 0 deletions latex/thesis/content/Terminology.tex
@@ -0,0 +1,13 @@
\chapter*{\rm\bfseries Terminology}
\label{ch:terminology}

Technical and vernacular collisions induce a strange semantic synesthesia, e.g., complete, consistent, kernel, reflexive, regression, regular, sound. The intension may be distantly related to standard English, but if one tries to interpret such jargon colloquially, there is no telling how far astray they will go. For this reason, we provide a glossary of terms to help the non-technical reader navigate the landscape of this thesis.

\begin{itemize}
\item \textbf{Automaton}: A mathematical model of computation that can occupy one of a finite number of states at any given time, and makes transitions between states according to a set of rules.
\item \textbf{Deterministic}: A property of a system that, given the same input, will always produce the same output.
\item \textbf{Grammar}: A set of rules that define the syntax of a language.
\item \textbf{Intersection}: The set of elements common to two or more sets.
\item \textbf{Probabilistic}: A property of a system that, given the same input, may produce different outputs.
\item \textbf{Theory}: A set of sentences in a formal language.
\end{itemize}
