$RULE
Design Influences: Grapa's $RULE type is inspired by Prolog's executable BNF and logic programming paradigm, enabling dynamic, mutable grammars and rule-based execution. This design empowers users to experiment with language syntax and parsing, making Grapa especially valuable for educators, researchers, and anyone interested in language design. The overarching goal is to make Grapa (and GrapaPy) as inviting and usable as possible for the broadest audience.
This datatype is the basis of the grapa language. The syntax of the language is implemented as a set of global rule variables that are accessible and changeable, making the grapa language syntax dynamically mutable - either globally, or within a specific function by creating local rule variables that override the global rules. Rule variables can also be defined to support parsing of a domain-specific language, or to define a data ETL task as a language: define the rules for the data and apply the data to the rules, in the same way a language would be defined.
The @ vs $ Symbols in Grapa
@ - Runtime Evaluation
The @ symbol is used for runtime evaluation and dereferencing. In rule tokens (like {@<assignappend,{$1,$4}>}), @ serves as a dereferencing operator that gets the actual function/operation at runtime.
Examples:
{@<assignappend,{$1,$4}>} /* @ dereferences to get the actual assignappend function */
{@<if,{$3,$5}>} /* @ dereferences to get the actual if function */
<@variable> /* @ dereferences to get the actual value of variable */
$ - Compile-time Lexical Processing
The $ symbol is used for compile-time lexical processing in the Grapa lexer. It acts as an escape character that modifies how tokens are processed during compilation.
Lexical Modifiers:
$ID /* Converts to $SYSID (system identifier) */
$STR /* Converts to $SYSSTR (system string) */
$INT /* Converts to $SYSINT (system integer) */
$[ /* Lexical flag for special processing */
$] /* Lexical flag for special processing */
$& /* Lexical flag for XML/HTML processing */
Why the distinction matters:
- @ operates at runtime - gets actual values and executes functions
- $ operates at compile-time - modifies how the lexer processes tokens
- @ is for evaluation, $ is for lexical transformation
There are three basic steps:
* Define the rules (rules may reference other rules) with code to execute for each rule option.
* Apply raw data to the rule to generate an execution plan.
* Execute the plan, with parameters if specified.
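A minimal sketch of these three steps, assuming a hypothetical rule r that sums two integers (the interactive multiply example below walks through the same flow):
/* Step 1: define the rule */
r = rule $INT $INT {op(a:$1,b:$2){a+b}};
/* Step 2: apply raw data to the rule to generate an execution plan */
plan = op()("3 4", r);
/* Step 3: execute the plan */
plan(); /* 7 */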
Example system rule:
> @$start
rule <$command_list> $SYSID("BE") | <$command_list> ';' | <$command_list>
The above is the entry point to the grapa language. This entry point is a global variable that can be updated...so take caution, as it is the default entry point for all scripts. It is also possible to navigate the language syntax by looking up each rule that is linked from any other rule.
Example of defining a custom rule, and applying the rule:
> x = rule $INT $INT {op(a:$1,b:$2){a*b}} | $INT {op(a:$1){a}}
> (op()("4",x))()
4
> (op()("4 3",x))()
12
> op()("4",x)
@<[op,@<[op,@<var,{a}>],{"a":4}>],{}>
> op()("4 3",x)
@<[op,@<[op,@<mul,{@<var,{a}>,@<var,{b}>}>],{"a":4,"b":3}>],{}>
To simplify creating rules that depend on matching against a predefined list of values, create a list and reference the list in the rules. If the rule matches, the value of the matched item is passed in as the token value.
> t = {x:"matched on x",y:5};
> r = rule $INT @t $INT {op(a:$2){a}};
> (op()("1 x 2",r))();
matched on x
> (op()("1 y 2",r))();
5
If the lookup needs to be against something other than a list (maybe checking a database or even a web service), an operation can be added to a rule token that will run at compile time if the rule matches. Note that this will be called during the planning phase (compile time) every time the rule is evaluated, so be cautious about attaching a heavy workload. The result of the operation is used as the result of the matched token.
> x = rule $STR {@<lit,{$1}>};
> r = rule $INT <x,op(a:$1){a.len()}> $INT {op(a:$2){a}};
> (op()("44 'x' 22",r))();
1
Rules can also be embedded. Unfortunately, the current grammar requires an operation on a rule option if that option is to return a value - it makes no assumptions. So, as of the current release, an embedded rule of (x|y) would match on x or y, but with no operation attached to x or y, the match would not produce an output. In some future version this will be addressed by returning a default; for now, an operation needs to be inserted if you want a value to be returned.
If the token handler returns an $ERR object, it will cause the rule to fail. So it not only adds logic for processing the token, it can also serve as an additional component of the planning engine by validating the token against the intent of the rule - something that could not be done at a later phase and can only be done during the planning/compile phase.
> r = rule $INT (x{@<lit,{"found x"}>}|y{@<lit,{"found y"}>}) $INT {op(a:$2){a}};
> (op()("44 y 22",r))();
found y
If there is a well known end sequence, and you want to collect all the tokens in between, you can insert an empty rule. All of the tokens, including any white space, will be passed on. The risk of using this in a rule is that if the end token does not exist, all tokens will be processed in the empty rule.
> r = rule $INT <> $INT {op(a:$2){a}};
> (op()("44 'x' 22",r))();
[" ","x"," "]
The stop rule for a catchall can be either a token, a rule reference, or an embedded rule.
> e = rule $INT {@<lit,{$1}>} | $ID {@<lit,{$1}>};
> r = rule $INT <> <e> {op(a:$3){a}};
> (op()("44 'x' 22",r))();
22
> (op()("44 'x' hi",r))();
hi
> r = rule $INT <> ($INT{@<lit,{$1}>}|$ID{@<lit,{$1}>}) {op(a:$3){a}};
> (op()("44 'x' 22",r))();
22
> (op()("44 'x' hi",r))();
hi
Start/Stop Token Sequences with Parentheses
When you need a sequence of tokens as the start or stop markers, use parentheses to define embedded rules. This allows you to specify multi-token sequences without defining separate rules.
Simple single tokens:
/* Single token start/stop */
catch_sq_bracket_rule = rule '[' <> ']' {@<lit,{$2}>};
Multi-token sequences with parentheses:
/* Multi-token sequence start/stop */
catch_sq_bracket_rule = rule ('[') <> (']') {@<lit,{$2}>};
/* This is equivalent to: rule '[' <> ']' {@<lit,{$2}>} */
/* More complex multi-token sequences */
xml_comment_rule = rule ('<' '!' '-' '-') <> ('-' '-' '>') {@<lit,{$2}>};
/* Captures everything between <!-- and --> */
Advanced XML parsing example from Grapa's grammar:
/* Sophisticated XML comment parsing with token joining */
$global["$elSkipValueY"]
= rule '-' <@<join,{$1}>> ('-' '-' '>') {@<lit,{$2}>}
;
$global["$elSkipValueX"]
= rule <$elSkipValueY> {@<createxml,{$1}>}
;
$global["$elSkipValue"]
= rule <$elSkipValueX> {@<createel,{{},$1}>}
;
$global["$tagHtmlValue"]
= rule '<' '!' '-' <$elSkipValue> {@<name,{"!--",$4}>}
;
Key Points:
- Parentheses (): Define embedded rules for multi-token sequences
- Single tokens: Can be used directly without parentheses
- Token joining: Use <@<join,{$1}>> to concatenate collected tokens
- Complex sequences: Parentheses allow complex start/stop patterns
An operation can also be associated with the empty rule above, which can process the tokens and either pass through the list or pass through some transformed version. The following is an example of transforming.
> r = rule $INT <op(a:$1){a.len().str()+" raw characters:"+a.join("")}> $INT {op(a:$2){a}};
> (op()("44 'x' 22",r))();
3 raw characters: x
If the token handler returns an $ERR object, the corresponding rule option will fail and the next rule option will be evaluated. The following is an example of causing the first rule option to fail.
> r = rule $INT <op(a:$1){$ERR()}> $INT {op(a:$2){a}} | $INT <> $INT {op(a:$2){a}};
> (op()("44 'x' 22",r))();
[" ","x"," "]
The catchall token can also be used to add support for comments in a grammar. The following is used in the grapa syntax. This rule is then inserted in places in the grammar where comments can be included.
$starcomment = rule ('/' '*') <> ('*' '/');
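As a hypothetical sketch of how such a comment rule might be referenced from another rule (the rule name r and its placement here are illustrative only; the real grammar wires $starcomment in at many points):
/* allow an optional leading comment before an integer */
r = rule <$starcomment> $INT {op(a:$2){a}} | $INT {op(a:$1){a}};
(op()("/* a comment */ 42", r))(); /* 42 */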
Rules can include both right recursion and left recursion. The following is an example of right recursion. Note that an internal function "prepend" is used to build the list. This is because grapa uses recursive descent, which results in the last match being processed first.
> r1 = rule $ID <r1> {@<prepend,{$2,$1}>} | $ID {@<createarray,{$1}>};
> (op()("a b c",r1))();
[a,b,c]
But right recursion may not be desirable in some cases. One example is the order of operations for addition and subtraction, where the evaluation needs to happen from left to right. Here is an example using right recursion.
r1 = rule
$INT '+' <r1> {@<add,{$1,$3}>}
| $INT '-' <r1> {@<sub,{$1,$3}>}
| $INT {@<lit,{$1}>}
;
> op()("5-3+2",r1);
@<[op,@<sub,{5,@<add,{3,2}>}>],{}>
> (op()("5-3+2",r1))();
0
Note that for "5-3+2", the additon of 3+2 happens first resulting in 5-(3+2), which produces the wrong answer.
Left recursion is required to address this.
r2 = rule
<r2> '+' $INT {@<add,{$1,$3}>}
| <r2> '-' $INT {@<sub,{$1,$3}>}
| $INT {@<lit,{$1}>}
;
> op()("5-3+2",r2);
@<[op,@<add,{@<sub,{5,3}>,2}>],{}>
> (op()("5-3+2",r2))();
4
Note that with left recursion, the subtraction happens first resulting in (5-3)+2, which produces the correct answer.
Rule Composition and Concatenation
Rules can be dynamically composed and extended using the ++= operator (see Assignment Operators), which allows you to concatenate additional rule alternatives to an existing rule. This enables dynamic grammar construction and runtime rule modification.
Appending to Existing Rules
Important: To add additional alternatives to an existing rule, use the ++= operator, not the | operator. The | operator only works within a single rule definition, while ++= allows you to append to an existing rule variable.
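A brief sketch of the difference, using hypothetical rule and token names (the identifier-token examples below expand on this pattern):
/* alternatives within a single rule definition use | */
cmd = rule test {op(){'matched first'}} | test2 {op(){'matched second'}};
/* appending a new alternative to an existing rule variable uses ++= */
cmd ++= rule test3 {op(){'matched third'}};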
Self-Modifying Grammar System
Grapa's grammar system is self-modifying - the BNF (Backus-Naur Form) in lib/grapa/$grapa.grc defines the rules for creating and modifying grammar rules themselves. This creates a powerful meta-programming capability where:
- The grammar defines how to define grammar rules
- Rules can be modified at runtime using the same grammar system
- The | operator creates alternatives within a single rule definition
- The ++= operator appends new alternatives to existing rule variables
Token Types and Syntax
Based on the Grapa grammar system in lib/grapa/$grapa.grc, there are important distinctions between token types:
Identifier Tokens (automatic conversion):
/* Define initial rule with identifier */
custom_command = rule test {op(){'matched first'}};
/* Append additional alternatives using ++= */
custom_command ++= rule test2 {op(){'matched second'}};
custom_command ++= rule test3 {op(){'matched third'}};
/* Test the rule - use identifiers directly */
test; /* Returns: 'matched first' */
test2; /* Returns: 'matched second' */
test3; /* Returns: 'matched third' */
String Tokens (explicit conversion required):
/* Define initial rule with string token */
custom_command = rule $STR('first') {op(){'matched first'}};
/* Append additional alternatives using ++= */
custom_command ++= rule $STR('second') {op(){'matched second'}};
custom_command ++= rule $STR('third') {op(){'matched third'}};
/* Test the rule - use string literals */
"first"; /* Returns: 'matched first' */
"second"; /* Returns: 'matched second' */
"third"; /* Returns: 'matched third' */
Key Points:
- ++= operator: Appends new rule alternatives to existing rule variables
- | operator: Only works within a single rule definition for alternatives
- $ID tokens: Automatically converted (e.g., test becomes an identifier)
- $STR tokens: Require explicit conversion (e.g., $STR('test') for strings)
- $INT tokens: Match integer literals (e.g., 42, 100, -5)
- $FLOAT tokens: Match floating-point literals (e.g., 3.14, -2.5); see the sketch after this list
- Dynamic grammar construction: Enables building complex grammars incrementally
- Runtime rule modification: Rules can be extended during program execution
- Self-modifying grammar: The BNF defines how to modify the BNF itself
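A minimal sketch of the numeric token types, assuming a hypothetical num_rule and that floating-point and integer literals lex as distinct $FLOAT and $INT tokens:
num_rule = rule $FLOAT {op(a:$1){a}} | $INT {op(a:$1){a}};
(op()("3.14", num_rule))(); /* 3.14 */
(op()("42", num_rule))();   /* 42 */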
Token Usage Guidelines:
- Function/predicate names: Use $ID for unquoted identifiers or $STR for quoted strings
- Arguments: Use $STR for quoted strings, $INT for integers, or $ID for unquoted identifiers
- Choice depends on calling syntax: $ID requires unquoted calls, $STR requires quoted calls
BNF Structure for Rule Definition
The grammar system uses this pattern for defining rules:
$global["$option_list"]
= rule <$option> '|' <$option_list> {@<prepend,{$3,$1}>}
| <$option> {@<createrule,{$1}>}
;
This shows:
- = rule: Defines a new rule
- |: Creates alternatives within the same rule definition
- {@<createrule,{$1}>}: Creates a rule from the parsed tokens
- {@<prepend,{$3,$1}>}: Prepends new alternatives to existing rules
Rule Alternative Ordering for Backtracking
Critical: Rule alternatives must be ordered from most specific to least specific (longest to shortest token sequences). This is essential for proper backtracking in the rule engine.
Correct Ordering Pattern:
$global["$list2"]
= rule ',' ',' <$list2> {@<prepend,{$3,null,null}>} /* 3 tokens */
| ',' <$list2> {@<prepend,{$2,null}>} /* 2 tokens */
| <$param2> ',' <$list2> {@<prepend,{$3,$1}>} /* 3 tokens */
| <$param2> ',' {@<createlist,{$1,null}>} /* 2 tokens */
| <$param2> {@<createlist,{$1}>} /* 1 token */
| ',' {@<createlist,{null,null}>} /* 1 token */
;
Why This Ordering Matters:
- Backtracking: The rule engine tries alternatives in order
- Greedy Matching: Longer patterns must come first to be matched
- Prevention of Short-Circuiting: If a simple rule comes first, it will always match and prevent longer, more specific rules from being reached
- Ambiguity Resolution: More specific rules should take precedence
Example of Incorrect Ordering:
/* WRONG - Simple rule first will prevent longer rules from matching */
custom_command = rule test {op(){'simple'}}; /* 1 token - too early! */
custom_command ++= rule test $STR {op(arg:$2){'with arg'}}; /* 2 tokens - never reached */
custom_command ++= rule test $STR $STR {op(arg1:$2,arg2:$3){'with 2 args'}}; /* 3 tokens - never reached */
Correct Ordering:
/* CORRECT - Most specific rules first */
custom_command = rule test $STR $STR {op(arg1:$2,arg2:$3){'with 2 args'}}; /* 3 tokens */
custom_command ++= rule test $STR {op(arg:$2){'with arg'}}; /* 2 tokens */
custom_command ++= rule test {op(){'simple'}}; /* 1 token - last */
Testing Custom Rules
Important: Custom rules must be tested using string execution since they cannot be used in the same script where they are defined (due to compilation timing).
/* Define custom rules */
custom_command = rule test {op(){'simple'}};
custom_command ++= rule test $STR {op(arg:$2){'with arg: ' + arg}};
/* Test using string execution */
op()("test")(); /* Returns: 'simple' */
op()("test hello")(); /* Returns: 'with arg: hello' */
/* Alternative testing method - more readable */
script1 = "test";
script2 = "test hello";
script1.exec(); /* Returns: 'simple' */
script2.exec(); /* Returns: 'with arg: hello' */
/* Both methods are equivalent:
op()("test")() == "test".exec()
op()("test hello")() == "test hello".exec()
Note: exec() cannot take parameters, but op()() can.
For most testing scenarios, exec() provides better readability.
*/
Note: Custom rules do not persist when the application exits. They must be redefined in each session.
Language Integration Example
This pattern is commonly used for implementing domain-specific languages and syntax extensions:
/* Initialize SQL syntax */
custom_command = rule select $STR from $STR {op(fields:$2,table:$4){
("SELECT " + fields + " FROM " + table).echo();
}};
/* Add INSERT syntax */
custom_command ++= rule insert into $STR values $STR ',' $INT ',' $STR {op(table:$3,name:$5,age:$7,city:$9){
("INSERT INTO " + table + " VALUES " + name + "," + age + "," + city).echo();
}};
/* Add UPDATE syntax */
custom_command ++= rule update $STR set $STR '=' $STR where $STR '=' $STR {op(table:$2,field:$4,value:$6,where_field:$8,where_value:$10){
("UPDATE " + table + " SET " + field + "=" + value + " WHERE " + where_field + "=" + where_value).echo();
}};
/* Add DELETE syntax */
custom_command ++= rule delete from $STR where $STR '=' $STR {op(table:$3,where_field:$5,where_value:$7){
("DELETE FROM " + table + " WHERE " + where_field + "=" + where_value).echo();
}};
This approach enables true native syntax implementation where multiple language constructs can coexist using Grapa's grammar system.
Basic Rule Concatenation
> x = rule $INT $INT {op(a:$1,b:$2){a*b}}
rule $INT $INT {@<[op,@<mul,{@<var,{a}>,@<var,{b}>}>],{"a":$1,"b":$2}>}
> x ++= rule $INT {op(a:$1){a}}
$INT $INT {@<[op,@<mul,{@<var,{a}>,@<var,{b}>}>],{"a":$1,"b":$2}>}| $INT {@<[op,@<var,{a}>],{"a":$1}>}
> (op()("4",x))()
4
> (op()("4 3",x))()
12
Dynamic Grammar Building
This feature is particularly powerful for building domain-specific languages incrementally:
> // Start with a basic rule
> parser = rule $ID {op(a:$1){a}};
> // Add support for numbers
> parser ++= rule $INT {op(a:$1){a}};
> // Add support for string literals
> parser ++= rule $STR {op(a:$1){a}};
> // Add support for function calls
> parser ++= rule $ID '(' $ID ')' {op(a:$1,b:$3){a+" calls "+b}};
> // Test the expanded parser
> (op()("hello",parser))();
hello
> (op()("42",parser))();
42
> (op()("func(arg)",parser))();
func calls arg
Use Cases
- Incremental Language Development: Build complex grammars step by step
- Plugin Systems: Allow extensions to add new syntax rules
- Dynamic Parsing: Modify parsing behavior based on runtime conditions
- Domain-Specific Languages: Create specialized parsers that can be extended
For those interested in how to modify a recursive descent parser to support left recursion, see the following on how this was addressed for Python:
https://medium.com/@gvanrossum_83706/left-recursive-peg-grammars-65dab3c580e1
A similar approach is used for Grapa, but with a few improvements to support more complex scenarios than what is required for the Python syntax alone.
Language Syntax Extension
Grapa's most powerful feature is its ability to dynamically extend the language syntax at runtime. This allows you to create domain-specific languages, add custom commands, and modify the grammar without restarting the application.
Custom Commands and Functions
The Grapa parser includes two special variables that allow you to inject custom syntax:
- custom_command: For domain-specific commands that perform actions but don't return values
- custom_function: For functions that return values
Basic Syntax Extension
// Define a custom function that returns a value
custom_function = rule select $INT {op(p:$2){p*5}};
// Now you can use it directly
select 4; // Returns 20
x = select 8; // x = 40
// Define a custom command that performs an action
custom_command = rule reset_data {op(){clear_database()}};
// Use the custom command
reset_data; // Executes clear_database()
How It Works
When you define custom_function or custom_command, you're essentially creating isolated rule execution for domain-specific processing. The parser:
- Falls through the built-in command patterns
- Matches your custom rule when the input fits the pattern
- Executes your code with the matched parameters
- Returns results (for functions) or performs actions (for commands)
Scoping and Namespace Control
Custom syntax can be defined at different scopes, giving you precise control over where extensions are available:
Local Scope (Function-Level)
function_with_custom_syntax() {
// Custom syntax only available within this function
custom_function = rule local_cmd $INT {op(n:$2){n*2}};
result = local_cmd 5; // Works here
// Function exits, local_cmd is gone
}
// Global scope - local_cmd is undefined
// local_cmd 5 would cause an error
Global Scope (Permanent)
function_with_permanent_syntax() {
// Promote to global namespace - available everywhere
$global["custom_function"] = rule global_cmd $INT {op(n:$2){n*3}};
result = global_cmd 5; // Works here
// Function exits, but global_cmd is still available globally
}
// Global scope - global_cmd is still available
result = global_cmd 10; // Works here too
Conditional Global Promotion
function_with_conditional_syntax() {
// Create local syntax first
custom_function = rule temp_cmd $INT {op(n:$2){n*4}};
// Test it locally
if (temp_cmd 5 == 20) {
// If it works well, promote to global
$global["custom_function"] = rule temp_cmd $INT {op(n:$2){n*4}};
echo "Syntax promoted to global";
}
// temp_cmd is now available globally
}
Dynamic Compilation with Custom Syntax
Important: Custom syntax changes don't affect already-compiled code. They only apply to new code compiled with op()():
Compilation vs. Runtime
// This script was compiled with the original grammar
function test() {
// Even if we change custom_function here, it won't affect
// the already-compiled code in this function
custom_function = rule new_syntax $INT {op(n:$2){n*10}};
// This line was compiled before the custom_function change
// so it still uses the old grammar
old_syntax 5; // Uses whatever was defined when script was compiled
}
Dynamic Compilation
function dynamic_compilation() {
// Set up custom syntax
custom_function = rule custom_cmd $INT {op(n:$2){n*5}};
// Compile and execute new code dynamically
script_text = "result = custom_cmd 10; echo result;";
op()(script_text)(); // Uses current custom syntax
// Or compile to a function for reuse
compiled_function = op()(script_text);
compiled_function(); // Executes with custom syntax
}
Real-World Use Cases
Domain-Specific Languages
function create_sql_dsl() {
// Define SQL-like syntax
custom_function = rule SELECT $STR FROM $STR {op(fields:$2,table:$4){
build_select_query(fields, table)
}};
custom_function ++= rule WHERE $STR {op(condition:$2){
build_where_clause(condition)
}};
// Now you can write SQL-like code
script = "SELECT name,age FROM users WHERE age > 18;";
query = op()(script)();
return query;
}
Configuration-Driven Scripts
function load_plugin(plugin_config) {
// Set up plugin-specific syntax
custom_function = rule plugin_cmd $STR {op(cmd:$2){execute_plugin(cmd)}};
// Compile plugin script with new syntax
plugin_script = plugin_config.script;
plugin_function = op()(plugin_script);
// Execute with plugin syntax available
plugin_function();
}
Thread-Safe Syntax Extensions
// Each thread can have its own syntax extensions
function thread_with_custom_syntax() {
// Local custom syntax for this thread
custom_function = rule thread_cmd $INT {op(n:$2){n*thread_id()}};
// This syntax is isolated to this thread
result = thread_cmd 5;
}
Best Practices
- Start Local: Define custom syntax locally first, then promote to global when ready
- Use Descriptive Names: Choose names that clearly indicate what your syntax does
- Document Patterns: Include examples of how to use your custom syntax
- Test Thoroughly: Custom syntax can affect parsing, so test edge cases
- Consider Scope: Use local scope for experimental features, global for stable ones
- Performance: Remember that custom syntax is evaluated at parse time
Limitations and Considerations
- Parse-Time Execution: Custom syntax rules are evaluated during parsing, not execution
- No Recursive Extensions: You can't extend the grammar to support extending itself
- Performance Impact: Complex custom syntax can slow down parsing
- Debugging: Custom syntax errors can be harder to debug than regular code errors
This syntax extension capability makes Grapa uniquely powerful for creating domain-specific languages, implementing custom DSLs, and adapting the language to specific use cases without modifying the core implementation.
Rule Organization Patterns
For complex parsing scenarios, organizing rules in data structures provides better maintainability and reusability.
Object-Based Rule Organization
Organize related rules in $LIST objects with a common parsing interface:
/* Define ETL rules in a $LIST object */
etl_rules = {
/* Common parsing interface */
parse = op(parse_data, parse_rule){op()(parse_data, parse_rule)()},
/* CSV parsing rule */
csv_parser = rule $ID ',' $INT ',' $ID {op(name:$1,age:$3,city:$5){
{name: name, age: age, city: city};
}},
/* Log parsing rules */
catch_sq_bracket_rule = rule '[' <> ']' {@<lit,{$2}>},
id_list_rule = rule $ID <id_list_rule> {@<prepend,{$2,$1}>} | $ID {@<createarray,{$1}>},
log_parser = rule <catch_sq_bracket_rule> <id_list_rule> {op(timestamp:$1,message:$2){
{timestamp: timestamp, message: message, level: "INFO"};
}},
/* Config parsing rule */
config_parser = rule $ID '=' $ID {op(key:$1,value:$3){
{key: key, value: value};
}}
};
/* Execute with custom rule set */
csv_result = etl_rules.parse("John,25,NY", etl_rules.csv_parser);
log_result = etl_rules.parse("[2024-01-15 10:30:00] User login successful", etl_rules.log_parser);
config_result = etl_rules.parse("server_name=myapp", etl_rules.config_parser);
Key Benefits:
- Organized Structure: Rules grouped logically in objects
- Common Interface: Shared parsing method across all rules
- Easy Extension: Add new rules to the object
- Clean Syntax: Simple method calls for rule execution
Class-Based Rule Organization
Use classes for more structured rule organization with methods:
/* Define rules in a class */
etl_rules_class = class {
/* Common parsing interface */
parse = op(parse_data, parse_rule){op()(parse_data, parse_rule)()};
/* CSV parsing rule */
csv_parser = rule $ID ',' $INT ',' $ID {op(name:$1,age:$3,city:$5){
{name: name, age: age, city: city};
}};
/* Log parsing rules */
catch_sq_bracket_rule = rule '[' <> ']' {@<lit,{$2}>};
id_list_rule = rule $ID <id_list_rule> {@<prepend,{$2,$1}>} | $ID {@<createarray,{$1}>};
log_parser = rule <catch_sq_bracket_rule> <id_list_rule> {op(timestamp:$1,message:$2){
{timestamp: timestamp, message: message, level: "INFO"};
}};
/* Config parsing rule */
config_parser = rule $ID '=' $ID {op(key:$1,value:$3){
{key: key, value: value};
}};
};
/* Create instance and execute */
etl_rules = etl_rules_class();
csv_result = etl_rules.parse("John,25,NY", etl_rules.csv_parser);
log_result = etl_rules.parse("[2024-01-15 10:30:00] User login successful", etl_rules.log_parser);
config_result = etl_rules.parse("server_name=myapp", etl_rules.config_parser);
Key Benefits:
- Object-Oriented: Use classes for better organization
- Instance Methods: Each instance has its own rule set
- Encapsulation: Rules are contained within the class
- Reusability: Create multiple instances for different contexts
Helper Rule Design Patterns
For complex parsing scenarios, break down parsing into modular helper rules that work together.
Modular Rule Composition
Design helper rules that can be reused across multiple main rules:
/* Helper rules for complex parsing */
catch_sq_bracket_rule = rule '[' <> ']' {@<lit,{$2}>};
id_list_rule = rule $ID <id_list_rule> {@<prepend,{$2,$1}>} | $ID {@<createarray,{$1}>};
/* Main rule using helper rules */
log_rule = rule <catch_sq_bracket_rule> <id_list_rule> {op(timestamp:$1,message:$2){
return {
timestamp: timestamp,
message: message,
level: "INFO"
};
}};
/* Another rule using the same helper */
config_rule = rule $ID '=' $ID {op(key:$1,value:$3){
return {key: key, value: value};
}};
/* Execute with helper rules */
log_result = op()("[2024-01-15 10:30:00] User login successful", log_rule)();
config_result = op()("server_name=myapp", config_rule)();
Design Principles:
- Single Responsibility: Each helper rule handles one parsing concern
- Reusability: Helper rules can be used by multiple main rules
- Composability: Main rules combine helper rules for complex parsing
- Maintainability: Changes to helper rules affect all dependent rules
Rule Dependency Management
Structure rules to show clear dependencies and relationships:
/* Base parsing rules */
field_rule = rule $ID {op(value:$1){value}};
separator_rule = rule ',' {op(){","}};
/* Composite rules using base rules */
csv_row_rule = rule <field_rule> (<separator_rule> <csv_row_rule> | ) {op(first:$1,rest:$2){
if (rest) {
return [first] + rest;
} else {
return [first];
};
}};
/* Complex rule using composite rules */
structured_data_rule = rule <csv_row_rule> '|' <csv_row_rule> {op(row1:$1,row2:$3){
return {
primary: row1,
secondary: row2,
combined: row1 + row2
};
}};
Dynamic Rule Management
Add rules to existing rule collections dynamically using different patterns.
Object Extension with += Operator
Add new rules to existing $LIST objects:
/* Start with base ETL rules object */
etl_rules = {
parse = op(parse_data, parse_rule){op()(parse_data, parse_rule)()},
};
/* Dynamically add new rules using += operator */
etl_rules += (csv_parser : rule $ID ',' $INT ',' $ID {op(name:$1,age:$3,city:$5){{name: name, age: age, city: city};}});
/* Execute with dynamically added rule */
csv_result = etl_rules.parse("John,25,NY", etl_rules.csv_parser);
/* Result: {name: "John", age: 25, city: "NY"} */
Key Features:
- Object Extension: Add rules to existing $LIST objects using the += operator
- Immediate Availability: New rules are immediately available for use
- Flexible Addition: Add rules based on runtime conditions
Class-Based Dynamic Rule Addition
Use class methods to add rules dynamically to $CLASS instances:
/* Define base ETL rules class with add_rule method */
etl_rules_class = class {
parse = op(parse_data, parse_rule){op()(parse_data, parse_rule)()};
add_rule = op(pName,pRule) {$this.@pName=pRule;};
};
/* Create instance and add rules dynamically */
etl_rules = etl_rules_class();
etl_rules.add_rule("csv_parser", rule $ID ',' $INT ',' $ID {op(name:$1,age:$3,city:$5){{name: name, age: age, city: city};}});
/* Execute with dynamically added rule */
csv_result = etl_rules.parse("John,25,NY", etl_rules.csv_parser);
/* Result: {name: "John", age: 25, city: "NY"} */
Key Features:
- Class Methods: Use class methods to add rules dynamically to $CLASS instances
- Encapsulated Management: Rule addition is controlled by class methods
- Instance Isolation: Each class instance has its own rule set
Comparison: += vs ++= vs add_rule()
Method | Use Case | Target | Syntax |
---|---|---|---|
++= | Rule concatenation | Rule variables | rule_var ++= rule ... |
+= | Object extension | $LIST objects | obj += (key : rule ...) |
add_rule() | Class method | $CLASS instances | instance.add_rule("name", rule ...) |
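A compact, hypothetical sketch contrasting the three approaches (names such as my_rule, my_rules, and my_class are illustrative, combining the patterns shown above):
/* ++= appends an alternative to an existing rule variable */
my_rule = rule $INT {op(a:$1){a}};
my_rule ++= rule $ID {op(a:$1){a}};
/* += adds a rule as a new key on an existing $LIST object */
my_rules = {parse = op(parse_data, parse_rule){op()(parse_data, parse_rule)()}};
my_rules += (int_rule : rule $INT {op(a:$1){a}});
/* add_rule() adds a rule to a $CLASS instance through a class method */
my_class = class {add_rule = op(pName,pRule){$this.@pName=pRule;};};
my_instance = my_class();
my_instance.add_rule("int_rule", rule $INT {op(a:$1){a}});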
Return Value Patterns
Rules can use different patterns for returning values, each with specific use cases.
Explicit Return Pattern
Use explicit return statements for clarity and control:
/* Explicit return pattern */
csv_rule = rule $ID ',' $INT ',' $ID {op(name:$1,age:$3,city:$5){
return {
name: name,
age: age,
city: city
};
}};
When to Use:
- Complex Logic: When the rule body has multiple statements
- Conditional Returns: When different conditions return different values
- Debugging: When you need to add logging or validation before returning
Implicit Return Pattern
Use implicit return (last expression) for simple, single-expression rules:
/* Implicit return pattern */
csv_parser = rule $ID ',' $INT ',' $ID {op(name:$1,age:$3,city:$5){
{name: name, age: age, city: city};
}};
When to Use:
- Simple Rules: When the rule body is a single expression
- Functional Style: When following functional programming patterns
- Conciseness: When you want more concise rule definitions
Mixed Return Patterns
Combine both patterns based on complexity:
/* Simple rules use implicit return */
field_rule = rule $ID {op(value:$1){value}};
/* Complex rules use explicit return */
complex_rule = rule $ID ',' $INT ',' $ID {op(name:$1,age:$3,city:$5){
/* Validation logic */
if (age < 0 || age > 150) {
return $err("Invalid age: " + age);
};
/* Return structured data */
return {
name: name,
age: age,
city: city,
valid: true
};
}};
Rule Testing and Debugging
Systematic testing and debugging techniques for rule definitions.
Testing Rule Definitions
Test rules systematically to ensure they work correctly:
/* Define test rules */
test_rules = {
parse = op(parse_data, parse_rule){op()(parse_data, parse_rule)()},
csv_parser = rule $ID ',' $INT ',' $ID {op(name:$1,age:$3,city:$5){
{name: name, age: age, city: city};
}},
log_parser = rule ('[') <> (']') $ID {op(timestamp:$1,message:$3){
{timestamp: timestamp, message: message, level: "INFO"};
}}
};
/* Test cases */
test_cases = [
("John,25,NY", test_rules.csv_parser, {name: "John", age: 25, city: "NY"}),
("[2024-01-15] User login", test_rules.log_parser, {timestamp: "2024-01-15", message: "User login", level: "INFO"})
];
/* Run tests */
i = 0;
while (i < test_cases.len()) {
test_case = test_cases[i];
input = test_case[0];
rule = test_case[1];
expected = test_case[2];
result = test_rules.parse(input, rule);
if (result == expected) {
("✓ Test " + (i + 1) + " passed").echo();
} else {
("✗ Test " + (i + 1) + " failed: expected " + expected + ", got " + result).echo();
};
i += 1;
};
Debugging Rule Execution
Use debugging techniques to understand rule behavior:
/* Debug rule execution */
debug_rule = rule $ID ',' $INT ',' $ID {op(name:$1,age:$3,city:$5){
("Debug: name=" + name + ", age=" + age + ", city=" + city).echo();
return {name: name, age: age, city: city};
}};
/* Test with debugging */
result = op()("John,25,NY", debug_rule)();
/* Output: Debug: name=John, age=25, city=NY */
Rule Validation Patterns
Validate rule definitions before use:
/* Rule validation function */
validate_rule = op(rule_name, rule_definition, test_input, expected_output) {
try {
result = op()(test_input, rule_definition)();
if (result == expected_output) {
return {valid: true, rule: rule_name, message: "Rule works correctly"};
} else {
return {valid: false, rule: rule_name, message: "Rule returned unexpected result", expected: expected_output, actual: result};
};
} catch (error) {
return {valid: false, rule: rule_name, message: "Rule execution failed", error: error};
};
};
/* Validate rules */
csv_rule = rule $ID ',' $INT ',' $ID {op(name:$1,age:$3,city:$5){
{name: name, age: age, city: city};
}};
validation_result = validate_rule("csv_parser", csv_rule, "John,25,NY", {name: "John", age: 25, city: "NY"});
validation_result.echo();
Best Practices for Rule Testing
- Test Edge Cases: Test with empty input, malformed input, and boundary conditions
- Validate Output: Ensure rules return expected data structures
- Test Rule Combinations: Test how rules work together in complex scenarios
- Performance Testing: Measure execution time for performance-critical rules
- Error Handling: Test how rules handle invalid input gracefully
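A minimal sketch of the first two practices, using hypothetical probes against the csv_rule defined above:
/* edge cases: empty and truncated input */
(op()("", csv_rule))().echo();
(op()("John,25", csv_rule))().echo();
/* validate output: confirm the structure of a successful parse */
ok = op()("John,25,NY", csv_rule)();
(ok == {name: "John", age: 25, city: "NY"}).echo();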