Grammar Design Guide
Overview
Grapa was originally designed as a language for creating languages. The current Grapa syntax you're familiar with was actually just an example that evolved into something powerful! This guide shows you how to use Grapa's original purpose - creating your own grammars and languages.
Note: This guide uses Grapa syntax for action codes (e.g.,
{$1 + $3}
) which can contain any valid Grapa code. The action codes are processed through Grapa's full compilation and execution pipeline.
Parameter Persistence Best Practice
Important: The $1
, $2
, $3
parameters get reset as the code runs. For complex action codes, assign them to local variables first:
@global["$complex_rule"]
= rule <$value1> '+' <$value2> {
@local.p1 = $1; // Save $1 before it gets reset
@local.p2 = $3; // Save $3 before it gets reset
// Now use the saved values for complex operations
if (p1.type() == $STR && p2.type() == $STR) {
p1 + p2 // String concatenation
} else {
p1.int() + p2.int() // Numeric addition
}
}
;
Key Concepts
1. Left Recursion is Your Friend
Traditional parsers can't handle left recursion, but Grapa automatically detects and handles it using a sophisticated two-phase approach:
// ✅ GOOD: Natural left-associative expressions
@global["$additive"]
= rule <$additive> '+' <$multiplicative> {$1 + $3}
| <$additive> '-' <$multiplicative> {$1 - $3}
| <$multiplicative>
;
This enables natural mathematical expressions like a + b + c
without complex precedence tables.
2. Operator Precedence Through Grammar Structure
Precedence emerges from grammar hierarchy - it's not hardcoded:
// ✅ CORRECT: Precedence through grammar levels
@global["$expr_or"] // Lowest precedence
= rule <$expr_or> '||' <$expr_and> {$1 || $3}
| <$expr_and>
;
@global["$expr_and"] // Higher precedence
= rule <$expr_and> '&&' <$expr_equality> {$1 && $3}
| <$expr_equality>
;
@global["$expr_equality"] // Even higher precedence
= rule <$expr_equality> '==' <$expr_comparison> {$1 == $3}
| <$expr_equality> '!=' <$expr_comparison> {$1 != $3}
| <$expr_comparison>
;
Precedence Rules: - Higher precedence = Lower in grammar hierarchy - Left-associative = Left-recursive rules - Right-associative = Right-recursive rules (rare)
3. Action Codes Build Execution Trees
When a rule matches, Grapa creates execution tree nodes:
// Action code format: {grapa_expression}
@global["$addition"]
= rule <$term> '+' <$addition> {$1 + $3}
| <$term>
;
Parameter Access with $1
, $2
, $3
, etc.
Action codes can access matched parameters using $
followed by a number:
@global["$example_rule"]
= rule <$term> '+' <$expression> {$1 + $3}
| <$term>
;
Parameter Mapping: - $1
= First parameter (<$term>
) - $2
= Second parameter ('+'
) - $3
= Third parameter (<$expression>
)
Important Notes: - Indexing starts at $1
(not $0
) - $0
does not exist in Grapa's parameter system - Parameters are 0-based internally but 1-based in action codes - Any valid Grapa code can be used in action codes
How It Works: 1. During rule matching, tokens are collected into a parameter queue 2. When action codes execute, $1
, $2
, $3
map to pInput->Head(0)
, pInput->Head(1)
, pInput->Head(2)
3. Parameters are wrapped in GrapaLibraryParam
for safe access and evaluation
Action Code Best Practices:
// ✅ GOOD: Clear operation names
@global["$string_concat"]
= rule <$string> '+' <$string_concat> {$1 + $3}
| <$string>
;
// ✅ GOOD: Multiple parameters
@global["$function_call"]
= rule <$identifier> '(' <$argument_list> ')' {$1($3)}
| <$identifier> '(' ')' {$1()}
;
// ✅ GOOD: Conditional operations
@global["$conditional"]
= rule <$condition> '?' <$true_expr> ':' <$false_expr> {$1 ? $3 : $5}
| <$condition>
;
// ✅ GOOD: Complex parameter access
@global["$complex_operation"]
= rule <$value1> '+' <$value2> {
if ($1.type() == $STR && $3.type() == $STR) {
$1 + $3 // String concatenation
} else {
$1.int() + $3.int() // Numeric addition
}
}
| <$value1>
;
4. Parameter Types and Advanced Usage
Parameter Types
Parameters in action codes can be various types:
// Different parameter types
@global["$mixed_parameters"]
= rule <$identifier> '=' <$expression> ';' {
// $1 = identifier (string)
// $2 = '=' (literal)
// $3 = expression (could be any type)
// $4 = ';' (literal)
$1 + " = " + $3.str()
}
;
Parameter Type Handling:
// ✅ GOOD: Type-aware parameter processing
@global["$smart_concat"]
= rule <$value1> '+' <$value2> {
if ($1.type() == $STR) {
$1 + $3.str() // String concatenation
} else if ($1.type() == $INT) {
$1.int() + $3.int() // Numeric addition
} else {
$1.str() + $3.str() // Convert to strings
}
}
| <$value1>
;
Advanced Parameter Access
Skipping Parameters:
// ✅ GOOD: Skip literal tokens, focus on meaningful parameters
@global["$function_call"]
= rule <$identifier> '(' <$arguments> ')' {$1($3)}
| <$identifier> '(' ')' {$1()}
;
// $1 = function name, $3 = arguments (skipping '(' and ')')
Conditional Parameter Access:
// ✅ GOOD: Handle optional parameters
@global["$optional_param"]
= rule <$required> ',' <$optional> {$1 + $3}
| <$required> {$1}
;
5. Token Types and Matching
Grapa uses a sophisticated token type system:
// Built-in token types
@global["$identifier"]
= rule <$litname> // Matches literal names
@global["$number"]
= rule <$lit> // Matches literals (INT, FLOAT, STR)
@global["$operator"]
= rule '+' | '-' | '*' | '/' // Direct character matching
Custom Token Types:
// ✅ GOOD: Define custom tokens for clarity
@global["$custom_operator"]
= rule '++' | '--' | '**' | '*/' // Multi-character operators
@global["$keyword"]
= rule 'if' | 'while' | 'for' | 'return' // Reserved words
6. Error Handling and Recovery
Grapa's parser is designed for graceful error recovery:
// ✅ GOOD: Provide fallback alternatives
@global["$statement"]
= rule <$if_statement>
| <$while_statement>
| <$assignment>
| <$expression> ';' // Expression as statement
| ';' // Empty statement
;
Common Problems and Solutions
Problem 1: Parameter Access Issues
// ❌ BAD: Using $0 (doesn't exist)
@global["$bad_rule"]
= rule <$term> '+' <$term> {$0 + $2} // $0 is invalid!
// ✅ GOOD: Start with $1
@global["$good_rule"]
= rule <$term> '+' <$term> {$1 + $3} // $1 and $3 are valid
;
Parameter Troubleshooting: - $0
doesn't exist - indexing starts at $1
- Parameter count mismatch - ensure you have enough parameters - Type errors - use .str()
, .int()
, .type()
for type conversion
Problem 2: Infinite Recursion
// ❌ BAD: Direct left recursion without base case
@global["$bad_rule"]
= rule <$bad_rule> '+' <$term> {$1 + $3}
;
// ✅ GOOD: Include base case
@global["$good_rule"]
= rule <$good_rule> '+' <$term> {$1 + $3}
| <$term> // Base case
;
Problem 2: Ambiguous Grammar
// ❌ BAD: Ambiguous - both rules can match same input
@global["$ambiguous"]
= rule <$expr> '+' <$expr> {$1 + $3}
| <$expr> '*' <$expr> {$1 * $3}
;
// ✅ GOOD: Clear precedence hierarchy
@global["$unambiguous"]
= rule <$additive> '+' <$multiplicative> {$1 + $3}
| <$multiplicative>
;
Problem 3: Missing Error Recovery
// ❌ BAD: No error recovery
@global["$fragile"]
= rule <$perfect_match> // Fails on any error
// ✅ GOOD: Graceful error handling
@global["$robust"]
= rule <$perfect_match>
| <$error_recovery> // Handle errors gracefully
;
Performance Considerations
Grammar Optimization
// ✅ GOOD: Minimize backtracking
@global["$efficient_parsing"]
= rule <$unique_prefix> <$rest> // Unique first token
| <$another_unique> <$rest2>
;
// ❌ AVOID: Ambiguous prefixes
@global["$inefficient_parsing"]
= rule <$ambiguous> <$rest> // Multiple rules start same way
| <$ambiguous> <$rest2>
;
Testing Your Grammar
Test Cases to Include
// Test precedence
"a + b * c" // Should be: a + (b * c)
// Test associativity
"a - b - c" // Should be: (a - b) - c
// Test error recovery
"a + ; b" // Should recover and parse "b"
// Test edge cases
"" // Empty input
"a" // Single token
"a + b + c" // Multiple operations
Validation Checklist
- [ ] Precedence works correctly
- [ ] Associativity is as expected
- [ ] Error recovery handles bad input
- [ ] Performance is acceptable
- [ ] Edge cases are handled
Practical Example: Calculator Grammar
Here's a complete example of a simple calculator grammar:
// calculator.grc
@global["$expression"]
= rule <$expression> '+' <$term> {$1 + $3}
| <$expression> '-' <$term> {$1 - $3}
| <$term>
;
@global["$term"]
= rule <$term> '*' <$factor> {$1 * $3}
| <$term> '/' <$factor> {$1 / $3}
| <$factor>
;
@global["$factor"]
= rule '(' <$expression> ')' {$2}
| <$number>
;
@global["$number"]
= rule <$lit>
;
Integration with Grapa System
Loading Your Grammar
// Load grammar file
include "my_grammar.grc"
// Use your rules
result = parse_my_language("input text")
Extending Existing Grammar
// Extend existing rules
@global["$extended_expression"]
= rule <$existing_expression>
| <$my_new_operator> <$extended_expression>
;
Advanced Features
Complex Action Codes
Since action codes can contain any valid Grapa code, you can do complex operations:
@global["$complex_operation"]
= rule <$value> '+' <$value> {
// Complex Grapa code here
if ($1.type() == $STR && $3.type() == $STR) {
$1 + $3 // String concatenation
} else {
$1.int() + $3.int() // Numeric addition
}
}
| <$value>
;
Conditional Parsing
@global["$conditional_parse"]
= rule <$condition> '?' <$true_expr> ':' <$false_expr> {
$1 ? $3 : $5
}
| <$condition>
;
Conclusion
Key takeaways for grammar authors:
- Left recursion is powerful - use it for natural expression parsing
- Precedence emerges from structure - design your grammar hierarchy carefully
- Action codes can contain any Grapa code - they're processed through the full pipeline
- Error recovery is essential - make your grammar robust
- Test thoroughly - precedence and associativity are critical
- Performance matters - avoid ambiguous or inefficient patterns
The Grapa grammar system is surprisingly powerful and flexible, but understanding these principles will help you create effective, maintainable grammars.
Next Steps
- Try the calculator example above
- Experiment with your own grammars
- Check the API Reference for available functions
- Look at Examples for more complex patterns
- Join the community to share your grammar designs
Grapa: A language for creating languages, evolved into something much more.