Part 1: Language definition

To parse a formal language we must describe the language in terms of rules. First, we will explore the concept of the root rule in textX to help you avoid common mistakes. Then, we will write the rules for propositional logic step by step. Together, those rules form what we call a formal grammar.

Root rule

The root rule is the first rule of a textX grammar, it will be matched first and one time only. Consequently, the class of the model (i.e., the root node of the AST) will be defined by the root rule.

Each rule needs a name and an expression to match; so does the root rule. We can name the root rule as Model. Propositional logic is the language of well-formed formulas, so the Model rule will match a WFF rule (that we need to define).

The root rule can be defined as follows:

Model: WFF;

Rules of propositional logic

The rules of propositional logic will be defined next. Propositional logic formulas such as “A and B or C” can be ambiguous without clear reading conventions, because you can interpret it as either “(A and B) or C” or “A and (B or C)”, which are not equivalent. To avoid ambiguity, we need to introduce a mechanism to enforce only one interpretation. One way to achieve this is by using parentheses to explicitly specify the grouping of operations. Another method is to establish precedence rules, such as applying the “and” operator before the “or” operator. Precedence rules are slightly more difficult to implement, so we will implement the parenthesis method.

The grammar is made of rules, such as the rule for a general well-formed formula, which will be modeled as a WFF rule. The WFF rule leads to other rules using ordered choice expressions. In propositional logic, a WFF can be either an atom, a negation, a conjunction, a disjunction, an implication, an equivalence, or a combination of those. The grammar represents each type of WFF as a rule, and they are connected by the use of a pipe (|) to indicate an ordered choice, i.e., the first match that succeeds will be used. However, the order in which you connect the rules will not result in different outputs, because we will explicitly indicate the order of parsing by using parentheses.

Here’s a refresher for simple wffs:

  • Atom: $A,B,C,\dots$, and for our purposes, any string using alphanumeric characters or underscores.
  • Negation: $\neg A$
  • Conjunction: $A \land B$
  • Disjunction: $A \lor B$
  • Implication: $A \Rightarrow B$
  • Equivalence: $A \Leftrightarrow B$

Translating to our grammar:

Model: WFF;
WFF: Atom | Negation | Conjunction | Disjunction | Implication | Equivalence;

At this point, the grammar still does not distinguish an Atom from a Negation or other connectives. We need to define the expressions that the rules need to match:

  • The Atom rule is a terminal symbol that represents a propositional variable, which can be matched using a textX built-in rule named ID. This rule matches common identifiers made of letters, digits, and underscores.
  • The Negation rule follows the Atom rule. It matches the expression of a tilde preceding a WFF rule.
  • The other rules concern the binary connectives: Conjunction, Disjunction, Implication, and Equivalence. Those connectives are infix operations between two WFFs. To avoid ambiguity, expressions involving these connectives must be enclosed in parentheses.

Adding the above rules to our grammar gives us the following:

Model: WFF;
WFF: Atom | Negation | Conjunction | Disjunction | Implication | Equivalence;
Atom: ID;
Negation: '~' WFF;
Conjunction: '(' WFF '&' WFF ')';
Disjunction: '(' WFF '|' WFF ')';
Implication: '(' WFF '=>' WFF ')';
Equivalence: '(' WFF '<=>' WFF ')';

Note: The choice of symbols for the operators is completely arbitrary, but once chosen, they must be adhered to.

Next, we will delve into understanding what a textX model is and provide a step-by-step guide on how to define one.