Language Design with Executable BNF
Overview
Grapa's $RULE system is an executable BNF (Backus-Naur Form) that goes beyond traditional grammar definitions: rules can run code while they parse. This guide explores how to use this system to create custom languages and domain-specific languages (DSLs), and to extend Grapa itself.
Key Concepts
Executable BNF: Beyond Traditional Parsing
Unlike traditional BNF systems that only define syntax, Grapa's $RULE system:
- Executes arbitrary code during parsing via action codes
- Builds execution trees that can be evaluated later
- Supports dynamic grammar mutation at runtime
- Handles complex token types with special behaviors
- Manages execution context and parameter binding
Three-Phase Processing Model
Input → Compilation → Execution Tree → Runtime Evaluation
- Compilation Phase: Raw input is parsed against BNF rules, building execution trees
- Tree Construction: Action codes create $OP and $CODE nodes with parameter binding
- Runtime Evaluation: Trees are evaluated with lazy parameter resolution
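The three phases can be sketched outside Grapa. What follows is a minimal analogy in Python, not Grapa's implementation: `compile_expr` and `evaluate` are hypothetical names, the tuple nodes merely stand in for $OP nodes, and the grammar is a single assumed rule of the form term ('+' expr)?.

```python
# Conceptual analogy only: parse -> execution tree -> evaluation.

def compile_expr(tokens):
    """Compilation phase: match `term ('+' expr)?` and build a tree."""
    if not tokens:
        raise SyntaxError("expected a term")
    left = ("lit", int(tokens[0]))
    if len(tokens) > 1:
        if tokens[1] != "+":
            raise SyntaxError(f"unexpected token: {tokens[1]!r}")
        # "Action code": build an operation node instead of computing now.
        return ("add", left, compile_expr(tokens[2:]))
    return left


def evaluate(node):
    """Runtime phase: walk the tree that compilation produced."""
    kind = node[0]
    if kind == "lit":
        return node[1]
    if kind == "add":
        return evaluate(node[1]) + evaluate(node[2])
    raise ValueError(f"unknown node type: {kind}")


tree = compile_expr("1 + 2 + 3".split())   # nothing is added yet
result = evaluate(tree)                    # the additions happen here
```

The point of the split is that `tree` can be stored, inspected, or evaluated several times without parsing the input again.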
Advanced Rule Token Capabilities
The rule token < > provides sophisticated reference and lookup capabilities:
Rule References
/* Reference another rule */
$global["$expression"] = rule <$term> '+' <$expression> {@<add,{$1,$3}>}
Variable References with Complex Lookups
/* Simple variable reference */
$global["$lookup"] = rule @variable {@<var,{$1}>}
/* Array/List indexing */
$global["$array_lookup"] = rule @tb["d"] {@<var,{$1}>}
$global["$list_lookup"] = rule @tb.d {@<var,{$1}>}
$global["$index_lookup"] = rule @tb[8] {@<var,{$1}>}
/* Database/File object references */
$global["$db_lookup"] = rule @{}.table("ROW") {@<var,{$1}>}
Post-Processing with Optional $OP
/* Rule with compile-time data transformation */
$global["$filtered_data"] = rule <$raw_data,op(a:$1){a.grep("pattern")}> {@<var,{$1}>}
Predefined Functions in Rule Tokens
Working Solution - Wrapper Function Pattern:
/* Define function using op(){} syntax */
my_func = op(p){p.len()};
/* Use wrapper function in rule token */
$global["$processed_data"] = rule <$raw_data,op(b:$1){my_func(b)}> {@<var,{$1}>}
Key Benefits:
- Explicit Parameter Passing: op(b:$1){my_func(b)} clearly shows parameter flow
- Works with Any Function: Can wrap any function definition
- No Grammar Changes: Uses existing $op syntax
- ETL-Friendly: Perfect for data transformation pipelines
- Reusable: Functions can be defined once and used in multiple rules
ETL Processing Example:
/* Define ETL processing functions */
validate_data = op(p){p.grep("valid")};
transform_data = op(p){p.upper()};
filter_data = op(p){p.len() > 10 ? p : null};
/* Use in ETL pipeline rules */
$global["$etl_pipeline"] = rule
    <$raw_data,op(b:$1){validate_data(b)}>
    <$validated,op(b:$1){transform_data(b)}>
    <$transformed,op(b:$1){filter_data(b)}>
    {@<var,{$1}>}
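The chaining idea behind this pipeline can be illustrated in Python. This is a sketch under assumptions rather than a translation: the three functions only approximate the Grapa versions above (in particular, `validate_data` is modeled as a simple substring check, which is an assumption about what the grep call is meant to achieve), and `run_pipeline` is a hypothetical helper.

```python
# Analogy only: each stage receives the previous stage's output.

def validate_data(p):
    # Keep the value only if it is marked as valid (assumed meaning).
    return p if p is not None and "valid" in p else None


def transform_data(p):
    return p.upper() if p is not None else None


def filter_data(p):
    # Mirrors `p.len() > 10 ? p : null` from the Grapa example.
    return p if p is not None and len(p) > 10 else None


def run_pipeline(value, steps):
    """Apply each step in order, stopping early once a step rejects the value."""
    for step in steps:
        value = step(value)
        if value is None:
            break
    return value


steps = [validate_data, transform_data, filter_data]
kept = run_pipeline("valid record 42", steps)
dropped_short = run_pipeline("valid", steps)
dropped_invalid = run_pipeline("bad row", steps)
```

Keeping each stage as a named function is what makes it reusable across several rules.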
Alternative Syntax (Future Enhancement):
/* Define predefined function using @<op,{parameters}> syntax */
filter_function = @<grep,{@<this>,@<lit,{"pattern"}>}>;
/* Use predefined function in rule token */
$global["$filtered_data"] = rule <$raw_data,filter_function> {@<var,{$1}>}
Key Features:
- Execution Tree Return: Always returns execution trees that must be evaluated
- Namespace Resolution: Follows Grapa's namespace hierarchy (local → function → global)
- Dynamic Lookup: Supports arrays, lists, objects, database objects
- Compile-Time Processing: Post-processing $OP runs during compilation phase
- Token Lookback: Access previous tokens via $1, $2, etc.
- Error Handling: Can return $ERR to cause rule failure
- Performance Note: Post-processing runs every time the rule is matched, so avoid heavy operations
- Predefined Functions: Functions can be defined once and reused across multiple rules
- Cleaner Syntax: Avoids embedding raw code in grammar rules
Language Extensibility Patterns
1. Syntax Extension Approaches
Primary Approach: Direct BNF Integration
Add syntax directly to existing BNF rules using the @<function_name,{parameters}> pattern:
/* Add to $command rule for control structures */
| for '(' <$comp> ';' <$comp> ';' <$comp> ')' <$command> {@<for,{$3,$5,$7,$9}>}
| for $ID in <$comp> <$command> {@<forin,{$2,$4,$5}>}
/* Add to $comp rule for expressions */
| '$' '{' <$comp> '}' {@<interpolate,{$3}>}
| range '(' <$comp> ')' {@<range,{$3}>}
Secondary Approach: Custom Command/Function Variables
Use custom_command/custom_function as variables that leverage existing grammar rules:
/* custom_command - leverages existing $comp and $command rules */
custom_command = rule for $ID from <$comp> to <$comp> <$command> {
    op(var:$2, start:$4, end:$6, body:$7){
        /* Uses existing $comp and $command rules */
        op()(var + " = " + start)();
        while (op()(var + " <= " + end)()) {
            op()(body)();
            op()(var + " = " + var + " + 1")();
        };
    }
};
/* custom_function - leverages existing $comp rules */
custom_function = rule <$comp> '*=' <$comp> {
    op(left:$1, right:$3){
        result = op()(left)() * op()(right)();
        op()(left + " = " + result)();
        result;
    }
};
/* Use directly like any other syntax */
for i from 1 to 5 { ("Count: " + i).echo(); };
x = 10; x *= 3; ("x = " + x).echo();
2. Dynamic Grammar Modification
/* Add new language constructs at runtime */
$global["$custom_loop"] = rule 'repeat' $INT 'times' '{' <$command_list> '}' {
    op(count:$2, body:$5) {
        i = 0;
        while (i < count) {
            body();
            i += 1;
        };
    }
};
/* Use the new syntax immediately */
op(parse)('repeat 5 times { "Hello".echo(); }')();
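Runtime grammar mutation amounts to a rule table that stays writable while the parser is in use. The Python sketch below shows only that idea; it is not how Grapa stores rules, and `rules`, `parse`, and `repeat_rule` are hypothetical names with a deliberately tiny word-based syntax.

```python
# Analogy only: the "grammar" is a dict that can grow while the program runs.

rules = {}


def parse(text):
    """Dispatch on the first word using whatever rules exist right now."""
    words = text.split()
    if not words:
        raise SyntaxError("empty input")
    handler = rules.get(words[0])
    if handler is None:
        raise SyntaxError(f"no rule for {words[0]!r}")
    return handler(words[1:])


rules["echo"] = lambda args: [" ".join(args)]


def repeat_rule(args):
    """Matches: repeat <count> times <command...>"""
    if len(args) < 3 or args[1] != "times":
        raise SyntaxError("expected: repeat <count> times <command>")
    output = []
    for _ in range(int(args[0])):
        output.extend(parse(" ".join(args[2:])))
    return output


try:
    parse("repeat 2 times echo hi")
    known_before = True
except SyntaxError:
    known_before = False          # the construct does not exist yet

rules["repeat"] = repeat_rule     # extend the grammar at runtime
output = parse("repeat 2 times echo hi")
```

The same input is rejected before the assignment and accepted after it, which is the behavior the Grapa example relies on.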
3. Protocol Parsing
/* Define HTTP request grammar */
$global["$http_request"] = rule <$http_method> ' ' <$http_path> ' ' <$http_version> '\r\n' <$http_headers> {
    op(method:$1, path:$3, version:$5, headers:$7) {
        return create_request(method, path, version, headers);
    }
};
/* Parse HTTP requests */
request = op(parse)('GET /api/users HTTP/1.1\r\nHost: example.com\r\n')();
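To make the shape of the parsed result concrete, here is a rough Python equivalent of what the rule above extracts from the sample request. It is an illustration only: `parse_http_request` is a hypothetical function, the dictionary layout is an assumption about what a helper such as create_request might return, and real HTTP parsing has many more cases.

```python
# Analogy only: split a request into the four parts the rule names.

def parse_http_request(raw):
    """Return method, path, version and headers from a raw request string."""
    request_line, _, header_block = raw.partition("\r\n")
    method, path, version = request_line.split(" ")
    headers = {}
    for line in header_block.split("\r\n"):
        if not line:
            continue  # skip the empty piece after the final CRLF
        name, _, value = line.partition(":")
        headers[name.strip()] = value.strip()
    return {
        "method": method,
        "path": path,
        "version": version,
        "headers": headers,
    }


request = parse_http_request("GET /api/users HTTP/1.1\r\nHost: example.com\r\n")
```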
Domain-Specific Language (DSL) Creation
1. Configuration DSL
/* Configuration language grammar */
$global["$config_entry"] = rule $ID '=' <$config_value> ';' {
    op(key:$1, value:$3) {
        set_config(key, value);
    }
};
$global["$config_value"] = rule $STR | $INT | $BOOL | '[' <$config_list> ']' | '{' <$config_object> '}';
/* Use the configuration DSL */
op(parse)('server_name = "myapp";')();
op(parse)('port = 8080;')();
op(parse)('debug = true;')();
op(parse)('allowed_hosts = ["localhost", "127.0.0.1"];')();
2. Data Processing DSL
/* Pipeline processing language */
$global["$pipeline"] = rule 'pipeline' '{' <$pipeline_steps> '}' {
    op(steps:$3) {
        result = null;
        i = 0;
        while (i < steps.len()) {
            step = steps[i];
            result = execute_pipeline_step(step, result);
            i += 1;
        };
        return result;
    }
};
$global["$pipeline_steps"] = rule <$pipeline_step> ('|' <$pipeline_steps> | );
/* Execute data processing pipeline */
result = op(parse)('pipeline {
load "data.csv" |
filter "age > 25" |
sort "name" |
output "results.json"
}')();
3. Validation DSL
/* Data validation language */
$global["$validation"] = rule 'validate' $STR '{' <$validation_rules> '}' {
    op(data:$2, rules:$4) {
        validation_result = true;
        i = 0;
        while (i < rules.len()) {
            /* Avoid naming a variable "rule", which is also the rule keyword */
            current_rule = rules[i];
            if (!apply_validation_rule(data, current_rule)) {
                validation_result = false;
            };
            i += 1;
        };
        return validation_result;
    }
};
$global["$validation_rules"] = rule <$validation_rule> ('|' <$validation_rules> | );
$global["$validation_rule"] = rule 'field' $STR 'required' | 'field' $STR 'type' $STR;
/* Execute validation */
is_valid = op(parse)('validate "user_data" {
field "name" required |
field "age" type "integer" |
field "email" type "email"
}')();
Advanced Language Design Patterns
1. Recursive Grammar Rules
/* CSV parser with recursive grammar */
$global["$csv_parser"] = rule <$csv_row> ('\n' <$csv_parser> | '\n' | ) {
    op(row:$1, rest:$3) {
        if (rest) {
            return [row] + rest;
        } else {
            return [row];
        };
    }
};
$global["$csv_row"] = rule <$csv_field> (',' <$csv_row> | );
/* Parse CSV data */
csv_data = "name,age,city\nJohn,25,NY\nJane,30,LA";
parsed = op(parse)(csv_data)();
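The recursion in these two rules maps directly onto two recursive functions. The Python sketch below mirrors the rule shapes for illustration; it is not Grapa's parser, `parse_row` and `parse_csv` are hypothetical names, and quoting or escaping in CSV fields is deliberately ignored.

```python
# Analogy only: one function per grammar rule, recursing where the rule does.

def parse_row(text):
    # csv_row := csv_field (',' csv_row | )
    field, comma, rest = text.partition(",")
    return [field] + (parse_row(rest) if comma else [])


def parse_csv(text):
    # csv_parser := csv_row (newline csv_parser | newline | )
    row, newline, rest = text.partition("\n")
    if newline and rest:
        return [parse_row(row)] + parse_csv(rest)
    return [parse_row(row)]  # last row, with or without a trailing newline


csv_data = "name,age,city\nJohn,25,NY\nJane,30,LA"
parsed = parse_csv(csv_data)
```

Each empty alternative in the grammar corresponds to a base case that ends the recursion.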
2. Context-Sensitive Parsing
/* Context-aware parsing with state */
$global["$context_parser"] = rule <$context_state> <$context_rule> {
    op(state:$1, ctx_rule:$2) {
        /* Apply context-specific parsing rules */
        return parse_with_context(state, ctx_rule);
    }
};
$global["$context_state"] = rule 'in' $STR '{' | 'out' $STR '}';
/* Use context-sensitive parsing */
result = op(parse)('in "sql" { SELECT * FROM users }')();
3. Dynamic Token Types
/* Define custom token types */
$global["$custom_token"] = rule $SYM("SQL_KEYWORD") | $SYM("JSON_PATH") | $SYM("XPATH_EXPR");
/* Use custom tokens in grammar */
$global["$sql_statement"] = rule $SYM("SQL_KEYWORD") <$sql_expression> {
    op(keyword:$1, expr:$2) {
        return execute_sql_statement(keyword, expr);
    }
};
Performance Optimization
1. Grammar Compilation Caching
/* Cache compiled grammar rules for performance */
grammar_cache = {};
compile_grammar = op(rule_name, rule_def) {
    if (grammar_cache[rule_name]) {
        return grammar_cache[rule_name];
    };
    compiled = op(parse)(rule_def);
    grammar_cache[rule_name] = compiled;
    return compiled;
};
/* Use cached grammar */
sql_grammar = compile_grammar("sql", "custom_command = rule select...");
json_grammar = compile_grammar("json", "custom_function = rule $STR->$STR...");
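The caching pattern itself is ordinary memoization keyed by rule name. A small Python sketch of the same idea follows; `expensive_compile` is a hypothetical stand-in for whatever compilation actually costs, and the `stats` counter exists only to show that the second call does no work.

```python
# Analogy only: compile once per name, then reuse the stored result.

grammar_cache = {}
stats = {"compiles": 0}


def expensive_compile(rule_def):
    """Stand-in for real compilation; counts how often it runs."""
    stats["compiles"] += 1
    return tuple(rule_def.split())


def compile_grammar(rule_name, rule_def):
    # Test membership rather than truthiness so a falsy compiled value
    # would still count as cached.
    if rule_name not in grammar_cache:
        grammar_cache[rule_name] = expensive_compile(rule_def)
    return grammar_cache[rule_name]


first = compile_grammar("sql", "select from where")
second = compile_grammar("sql", "select from where")
```

One design consequence worth noting: because the key is the name alone, passing a changed definition under an existing name returns the old compiled result until the cache entry is removed.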
2. Lazy Evaluation
/* Lazy evaluation of complex expressions */
$global["$lazy_expression"] = rule 'lazy' '{' <$expression> '}' {
    op(expr:$3) {
        /* Create lazy evaluation wrapper */
        return op() {
            return expr();
        };
    }
};
/* Execute lazy expression only when needed */
lazy_result = op(parse)('lazy { expensive_calculation() }')();
/* Expression not evaluated until lazy_result() is called */
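The deferred-call behavior can be checked with a small Python analogy. This sketch is not Grapa code: `lazy` is a hypothetical wrapper, and the `calls` list is there only to make visible when the expensive function actually runs.

```python
# Analogy only: wrapping defers the call; forcing performs it.

calls = []


def expensive_calculation():
    calls.append("ran")
    return 42


def lazy(fn):
    """Return a zero-argument wrapper that runs fn only when called."""
    def force():
        return fn()
    return force


lazy_result = lazy(expensive_calculation)
ran_before_force = len(calls)   # wrapper created, nothing executed
value = lazy_result()           # evaluation happens here
ran_after_force = len(calls)
```

Like the Grapa wrapper above, this version re-runs the expression on every call; storing the first result would be a separate design choice.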
Advanced Variable Manipulation
1. Indirect Variable Assignment (@@)
A key discovery in Grapa's rule system is the @@ syntax for indirect variable access and assignment:
/* Basic variable reference */
x = 1;
y = "x"; /* y contains the string "x" */
/* Variable reference */
@y; /* Returns "x" (the variable name) */
/* Indirect variable access */
@@y; /* Returns 1 (the value of variable x) */
/* Indirect variable assignment */
@@y = 10; /* Assigns 10 to variable x */
@@y += 1; /* Increments variable x */
/* In rule implementations */
custom_command = rule for $ID from <$comp> to <$comp> <$command> {
    op(var:$2, start:$4, end:$6, body:$7){
        /* Instead of complex string concatenation */
        /* op()(var + " = " + start())(); */
        /* Use indirect assignment */
        @@var = start();
        while (@@var <= end()) {
            body();
            @@var += 1;
        };
    }
};
Benefits:
- Cleaner Syntax: Eliminates complex string concatenation
- Better Performance: No string operations for variable assignment
- More Readable: Intent is much clearer
- Type Safety: Direct variable access without string manipulation
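Indirect access is easiest to picture as a lookup through a name stored in another variable. The Python sketch below models a namespace as a dictionary; it is an analogy for the @@ behavior described above, not Grapa semantics, and all function names are hypothetical.

```python
# Analogy only: a dict plays the role of the variable namespace.

namespace = {"x": 1, "y": "x"}   # y holds the *name* of another variable


def read_indirect(holder):
    """Like @@holder: follow the name stored in `holder` to its value."""
    return namespace[namespace[holder]]


def write_indirect(holder, value):
    """Like @@holder = value: assign to the variable named by `holder`."""
    namespace[namespace[holder]] = value


before = read_indirect("y")      # value of x
write_indirect("y", 10)          # assigns to x, leaves y unchanged


def count_loop(var_name, start, end, body):
    """A 'for var from start to end' loop driven by a variable name."""
    namespace["var"] = var_name
    write_indirect("var", start)
    while read_indirect("var") <= end:
        body()
        write_indirect("var", read_indirect("var") + 1)


seen = []
count_loop("i", 1, 3, lambda: seen.append(namespace["i"]))
```

No strings are assembled and re-parsed here, which is the advantage the benefits list attributes to @@.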
Historical Context: @ Symbol Evolution
The @ symbol in Grapa has evolved from the original design:
Original Design (Early Grapa):
/* Variables were literal by default */
x = 5; /* x was the literal string "x" */
@x; /* @x was needed to get the value 5 */
Current Design:
/* Automatic dereferencing for cleaner syntax */
x = 5; /* x automatically gets the value 5 */
@x; /* @x gets the variable name "x" (for rule tokens) */
@@x; /* @@x gets the value of the variable named by x */
Why This Matters:
- Cleaner Code: No need for @ symbols everywhere
- Rule Tokens: @ still needed in < > tokens for variable dereferencing
- Indirect Access: @@ provides one more level of indirection for dynamic variable names
Consistent @ Pattern:
The @ symbol serves the same purpose throughout Grapa: dereferencing.
/* In rule definitions */
{@<assignappend,{$1,$4}>} /* @ dereferences the rule token */
{@<if,{$3,$5}>} /* @ dereferences the rule token */
/* In variable references */
<@variable> /* @ dereferences the variable name */
<@tb["d"]> /* @ dereferences the array access */
/* In direct usage */
@x; /* @ dereferences the variable name */
@@x; /* @ dereferences twice */
Unified Concept: @ always means "get the actual value/operation, not the literal token"
System Namespace Protection: $ Prefix
Grapa uses the $ prefix as a system namespace protection mechanism:
/* User namespace (recommended) */
x = 5; /* User variable */
name = "hello"; /* User string */
/* System namespace (reserved) */
$x = 5; /* System variable */
$name = "hello"; /* System string */
Why This Matters:
- Everything in Grapa is a variable (language, classes, functions, etc.)
- Protection: Prevents accidental override of core system features
- Guideline: Use the $ prefix only when accessing the system namespace is necessary
- Interchangeable: Both work, but system namespace should be avoided in user code
Error Handling and Debugging
1. Grammar Error Recovery
/* Robust grammar with error recovery */
$global["$robust_rule"] = rule <$primary_rule> | <$fallback_rule> {
    op(primary:$1, fallback:$2) {
        if (primary.type() != $ERR) {
            return primary;
        } else {
            ("Recovering from error with fallback").echo();
            return fallback;
        };
    }
};
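The recovery strategy here is an ordered choice: try the primary alternative and fall back only if it fails. A Python sketch of that control flow follows, using exceptions where the Grapa example checks for $ERR; the parser functions and `first_match` are hypothetical.

```python
# Analogy only: try alternatives in order, keep the first success.

def parse_int(text):
    return int(text)  # raises ValueError on non-numeric input


def parse_word(text):
    if not text.isalpha():
        raise ValueError(f"not a word: {text!r}")
    return text


def first_match(text, alternatives):
    """Return the first successful result; report every failure otherwise."""
    errors = []
    for alternative in alternatives:
        try:
            return alternative(text)
        except ValueError as exc:
            errors.append(str(exc))
    raise ValueError("no alternative matched: " + "; ".join(errors))


primary_hit = first_match("42", [parse_int, parse_word])
fallback_hit = first_match("abc", [parse_int, parse_word])

try:
    first_match("4x", [parse_int, parse_word])
    all_failed = False
except ValueError:
    all_failed = True
```

Collecting the individual failure messages keeps the final error useful when every alternative is rejected.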
2. Debug Grammar Execution
/* Debug grammar execution */
$global["$debug_rule"] = rule 'debug' <$rule> {
    op(target:$2) {
        ("Debug: Executing rule").echo();
        result = target();
        ("Debug: Rule result: " + result).echo();
        return result;
    }
};
/* Use debug wrapper */
debug_result = op(parse)('debug { custom_rule() }')();
Real-World Applications
1. Configuration Management System
/* Complete configuration DSL */
$global["$config_system"] = rule 'config' '{' <$config_entries> '}' {
    op(entries:$3) {
        ("Loading configuration...").echo();
        i = 0;
        while (i < entries.len()) {
            entry = entries[i];
            apply_config_entry(entry);
            i += 1;
        };
        ("Configuration loaded successfully").echo();
    }
};
/* Each entry ends with ';' so the trailing semicolon in the example below is accepted */
$global["$config_entries"] = rule <$config_entry> ';' (<$config_entries> | );
$global["$config_entry"] = rule $ID '=' <$config_value> | 'include' $STR;
/* Load configuration */
op(parse)('config {
server_name = "myapp";
port = 8080;
debug = true;
include "local.conf";
}')();
2. Data Transformation Pipeline
/* ETL pipeline DSL */
$global["$etl_pipeline"] = rule 'etl' '{' <$etl_steps> '}' {
    op(steps:$3) {
        ("Starting ETL pipeline...").echo();
        result = null;
        i = 0;
        while (i < steps.len()) {
            step = steps[i];
            ("Executing step: " + step).echo();
            result = execute_etl_step(step, result);
            i += 1;
        };
        ("ETL pipeline completed").echo();
        return result;
    }
};
$global["$etl_steps"] = rule <$etl_step> ('|' <$etl_steps> | );
$global["$etl_step"] = rule 'extract' $STR | 'transform' $STR | 'load' $STR;
/* Execute ETL pipeline */
result = op(parse)('etl {
extract "source.csv" |
transform "clean_data" |
load "target.db"
}')();
3. API Definition Language
/* API definition DSL */
$global["$api_definition"] = rule 'api' $STR '{' <$api_endpoints> '}' {
    op(name:$2, endpoints:$4) {
        ("Defining API: " + name).echo();
        return create_api(name, endpoints);
    }
};
/* Endpoints are separated by '|' and the method is a bare identifier, matching the example below */
$global["$api_endpoints"] = rule <$api_endpoint> ('|' <$api_endpoints> | );
$global["$api_endpoint"] = rule $ID $STR '{' <$endpoint_body> '}' {
    op(method:$1, path:$2, body:$4) {
        return define_endpoint(method, path, body);
    }
};
/* Define API */
api = op(parse)('api "user_api" {
GET "/users" { return get_users(); } |
POST "/users" { return create_user(request.body); } |
PUT "/users/{id}" { return update_user(id, request.body); }
}')();
Best Practices
1. Grammar Design
- Start Simple: Begin with basic rules and gradually add complexity
- Use Clear Names: Name rules and tokens descriptively
- Handle Errors: Include error recovery and fallback mechanisms
- Document Grammar: Provide clear documentation for custom syntax
2. Performance
- Cache Compiled Rules: Reuse compiled grammar rules when possible
- Optimize Token Types: Use appropriate token types for efficiency
- Lazy Evaluation: Defer expensive operations until needed
- Profile Execution: Monitor performance of custom grammar rules
3. Integration
- Leverage Existing Libraries: Use Grapa's C++ libraries when possible
- Follow Patterns: Use established patterns: direct BNF integration for native features, and custom_command/custom_function as variables that leverage existing grammar rules
- Test Thoroughly: Validate grammar rules with comprehensive testing
- Version Control: Track grammar changes and maintain compatibility
Conclusion
Grapa's executable BNF system gives you direct control over language design and extension. By understanding and leveraging this system, you can:
- Create custom languages tailored to specific domains
- Build sophisticated DSLs for complex workflows
- Extend Grapa itself with new syntax and capabilities
- Implement protocol parsers for various data formats
- Design configuration languages for dynamic systems
The key is to start with simple patterns and gradually build complexity, always keeping in mind the three-phase processing model and the distinction between commands and functions.