From $OBJ
len()
Gets the length of the item.
"hi".len() -> 2
{1,2,3}.len() -> 3
left(count)
Gets the left bytes of an item.
"testing".left(2) -> "te"
Use a negative number to truncate right.
"testing".left(-2) -> "testi"
right(count)
Gets the right bytes of an item.
"testing".right(2) -> "ng"
Use a negative number to truncate left.
"testing".right(-2) -> "sting"
mid(start,len)
Gets the middle bytes of an item.
"testing".mid(2,3) -> "sti"
midtrim(items, offset, blocksize)
Extracts data from padded tables using references with optional follow-on Grapa lambda code execution.
Parameters:
items
- Array of extraction rules, each containing:
- [0] label (string) - Name for the extracted field
- [1] offset (integer) - Position within the block (0-based, negative values relative to end)
- [2] length (integer) - Number of characters to extract (negative values relative to end)
- [3] ltrim (optional, string) - Characters to trim from left side
- [4] rtrim (optional, string) - Characters to trim from right side
- [5] lambda (optional, function) - Follow-on Grapa code to execute on extracted value
offset
- Starting position in the source string
blocksize
- Size of the block to process
Behavior:
- Processes the source string in blocks of specified size
- For each extraction rule, extracts the specified substring
- Applies left/right trimming if specified
- Executes optional lambda function on the extracted value
- Returns a dictionary with labels as keys and extracted/processed values as values
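The behavior listed above can be modelled in Python for the simplest case. This is a sketch under stated assumptions, not Grapa's implementation: it handles a single block, only non-negative offsets and lengths, and treats ltrim/rtrim as sets of characters. The Python built-in `len` stands in for a Grapa lambda such as op(a){a.len();}.

```python
def midtrim(source, items, offset, blocksize):
    """Model of a single-block extraction (assumption: non-negative offsets only)."""
    block = source[offset:offset + blocksize]
    result = {}
    for rule in items:
        label, start, length = rule[0], rule[1], rule[2]
        ltrim = rule[3] if len(rule) > 3 else ""
        rtrim = rule[4] if len(rule) > 4 else ""
        func = rule[5] if len(rule) > 5 else None
        value = block[start:start + length]
        if ltrim:
            value = value.lstrip(ltrim)  # assumption: ltrim is a set of characters
        if rtrim:
            value = value.rstrip(rtrim)  # assumption: rtrim is a set of characters
        result[label] = func(value) if func else value
    return result


data = "this is a test to see"
rules = [["a", 2, 1, " ", " "], ["b", 10, 5, " ", " ", len]]
print(midtrim(data, rules, 1, 13))  # {'a': 's', 'b': 3}
```

With offset 1 and blocksize 13 the block is "his is a test", so rule "b" asks for 5 characters starting at position 10 but only 3 remain in the block.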
Examples:
/* Basic extraction from padded table */
data = "this is a test to see";
result = data.midtrim([["a",2,1," "," "],["b",10,5," "," ",op(a){a.len();}]],1,13);
/* Result: {"a":"s","b":3} */
/* Fixed-width record parsing */
record = "John Doe 25 Engineer";
fields = record.midtrim([
["first", 0, 8, " ", ""], /* Extract first name, trim spaces */
["last", 8, 8, " ", ""], /* Extract last name, trim spaces */
["age", 16, 2, " ", ""], /* Extract age */
["job", 18, 10, " ", ""] /* Extract job title, trim spaces */
], 0, 28);
/* Result: {"first":"John","last":"Doe","age":"25","job":"Engineer"} */
/* CSV-like parsing with data transformation */
csv_line = " apple, banana, cherry ";
parsed = csv_line.midtrim([
["fruit1", 0, 6, " ,", ""], /* Extract first fruit */
["fruit2", 7, 8, " ,", "", op(x){x.upper();}], /* Extract second fruit, convert to uppercase */
["fruit3", 16, 8, " ,", ""] /* Extract third fruit */
], 0, 24);
/* Result: {"fruit1":"apple","fruit2":"BANANA","fruit3":"cherry"} */
/* Log file parsing with numeric conversion */
log_entry = "2024-01-15 14:30:25 [INFO] User login successful";
log_data = log_entry.midtrim([
["date", 0, 10, "", ""], /* Extract date */
["time", 11, 8, "", ""], /* Extract time */
["level", 20, 5, "[", "]"], /* Extract log level */
["message", 26, 25, " ", ""], /* Extract message */
["word_count", 26, 25, " ", "", op(msg){msg.split(" ").len();}] /* Count words in message */
], 0, 51);
/* Result: {"date":"2024-01-15","time":"14:30:25","level":"INFO","message":"User login successful","word_count":3} */
Advanced Features:
- Relative Positioning: Use negative offsets/lengths for relative positioning from end of block
- Conditional Processing: Lambda functions can perform validation, transformation, or conditional logic
- Nested Processing: Lambda functions can call other Grapa methods on the extracted data
- Error Handling: Use .iferr() in lambda functions for robust error handling
Use Cases:
- Fixed-width file parsing (COBOL, legacy data formats)
- Log file analysis with structured extraction
- CSV/TSV processing with custom delimiters
- Database dump parsing with field extraction
- Network protocol parsing with structured message formats
- Document parsing with position-based extraction
rtrim([chars])
Trims characters from the right side of a string.
Parameters:
chars (optional) - Character(s) to trim. Can be:
- Single character: "x" - trims that specific character
- String: "xyz" - trims that specific string pattern
- Array: [" ", "\t", "\n", "\r"] - trims any of the characters in the array
- Omitted: defaults to space " "
Examples:
/* Default: trim spaces */
" testing ".rtrim() -> " testing"
/* Single character */
"bbbtestingbbb".rtrim("b") -> "bbbtesting"
/* Multiple whitespace characters */
" \t\n\rhello world ".rtrim([" ", "\t", "\n", "\r"]) -> " \t\n\rhello world"
/* String pattern */
"helloworldworld".rtrim("world") -> "hello"
ltrim([chars])
Trims characters from the left side of a string.
Parameters:
chars (optional) - Character(s) to trim. Can be:
- Single character: "x" - trims that specific character
- String: "xyz" - trims that specific string pattern
- Array: [" ", "\t", "\n", "\r"] - trims any of the characters in the array
- Omitted: defaults to space " "
Examples:
/* Default: trim spaces */
" testing ".ltrim() -> "testing "
/* Single character */
"bbbtestingbbb".ltrim("b") -> "testingbbb"
/* Multiple whitespace characters */
" \t\n\rhello world ".ltrim([" ", "\t", "\n", "\r"]) -> "hello world "
/* String pattern */
"worldworldhello".ltrim("world") -> "hello"
trim([chars])
Trims characters from both left and right sides of a string.
Parameters:
chars (optional) - Character(s) to trim. Can be:
- Single character: "x" - trims that specific character
- String: "xyz" - trims that specific string pattern
- Array: [" ", "\t", "\n", "\r"] - trims any of the characters in the array
- Omitted: defaults to space " "
Examples:
/* Default: trim spaces */
" testing ".trim() -> "testing"
/* Single character */
"bbbtestingbbb".trim("b") -> "testing"
/* Multiple whitespace characters */
" \t\n\rhello world ".trim([" ", "\t", "\n", "\r"]) -> "hello world"
/* String pattern */
"worldworldhelloworldworld".trim("world") -> "hello"
/* Common whitespace trimming */
whitespace = [" ", "\t", "\n", "\r"];
" \t\n\r hello world \t\n\r ".trim(whitespace) -> "hello world"
lpad(n,[str])
Pads left to bring the total size up to n characters. Defaults to pad with a space, but will use str for padding if provided.
Will left truncate input if length of input is greater than n.
"test".lpad(7,"X") -> "XXXtest"
rpad(n,[str])
Pads right to bring the total size up to n characters. Defaults to pad with a space, but will use str for padding if provided.
Will right truncate input if length of input is greater than n.
"test".rpad(7,"X") -> "testXXX"
lrot([n])
For $LIST, $ARRAY, $XML.
Moves n (default=1) items from the start of the list to the end of the list, 1 at a time.
["a","b","c","d","e"].lrot(2) -> ["c","d","e","a","b"]
rrot([n])
For $LIST, $ARRAY, $XML.
Moves n (default=1) items from the end of the list to the start of the list, 1 at a time.
["a","b","c","d","e"].rrot(2) -> ["d","e","a","b","c"]
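The "1 at a time" rotation described for lrot and rrot can be sketched in Python. This models the documented examples on plain lists only; it is not Grapa's implementation, and returning an empty list unchanged is an assumption.

```python
def lrot(items, n=1):
    """Move n items from the start of the list to the end, one at a time."""
    rotated = list(items)
    if not rotated:
        return rotated  # assumption: rotating an empty list is a no-op
    for _ in range(n):
        rotated.append(rotated.pop(0))
    return rotated


def rrot(items, n=1):
    """Move n items from the end of the list to the start, one at a time."""
    rotated = list(items)
    if not rotated:
        return rotated
    for _ in range(n):
        rotated.insert(0, rotated.pop())
    return rotated


letters = ["a", "b", "c", "d", "e"]
print(lrot(letters, 2))  # ['c', 'd', 'e', 'a', 'b']
print(rrot(letters, 2))  # ['d', 'e', 'a', 'b', 'c']
```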
reverse()
Reverses the order of a list.
{z:1,m:2,p:3,b:4}.reverse() -> {"b":4,"p":3,"m":2,"z":1}
"testing".reverse() -> "gnitset"
replace(old,new)
Replaces every occurrence of old with new.
"testing".replace("t","g") -> "gesging"
grep(pattern, options, delimiter, normalization, mode, num_workers)
Extracts matches from a string using PCRE2-powered regular expressions with full Unicode support. Returns an array of results or JSON format with named groups.
For comprehensive Unicode, advanced regex, diacritic-insensitive, and output option documentation, see Unicode Grep Documentation.
Parameters:
pattern - PCRE2 regular expression string with Unicode support, named groups, and advanced features.
options - Combination of the following flags:
Matching Options:
- a – All mode: treat the entire input as one block (no line splitting).
- i – Case-insensitive match with Unicode case folding.
- d – Diacritic-insensitive match (strip accents/diacritics from both input and pattern, robust Unicode-aware).
- v – Invert match (select non-matching lines or spans).
- x – Match entire line exactly (equivalent to anchoring with ^ and $).
- N – Normalize input and pattern to NFC Unicode form.
Output Options:
- o – Output only matched substrings.
- n – Prefix matches with line number.
- l – Return only matching line numbers.
- b – Prefix results with byte offset.
- j – JSON output format with named groups, offsets, and line numbers.
Processing Options:
- c – Return count of matches (or count of deduplicated matches if d is also set).
- d – Deduplicate results (line-level by default, or substring-level when combined with o, g, or b).
- g – Group matches per line.
Parallel Processing:
- num_workers – Number of worker threads: 0 for auto-detection, 1 for sequential, 2+ for parallel processing.
Unicode Support:
- Unicode categories: \p{L}, \p{N}, \p{Z}, \p{P}, \p{S}, \p{C}, \p{M}
- Unicode scripts: \p{sc=Latin}, \p{sc=Han}, etc.
- Unicode script extensions: \p{scx:Han}, etc.
- Unicode general categories: \p{Lu}, \p{Ll}, etc.
- Named groups: (?P<name>...)
- Atomic groups: (?>...)
- Lookaround assertions: (?=...), (?<=...), (?!...), (?<!...)
- Unicode grapheme clusters: \X
- Advanced Unicode properties: \p{Emoji}, \p{So}, etc.
- Possessive quantifiers: *+, ++, ?+, {n,m}+
- Conditional patterns: ?(condition)...
- Context lines: A, B, C options
Not Supported:
- Unicode blocks: \p{In_Basic_Latin}, etc.
- Unicode age properties: \p{Age=...}
- Unicode bidirectional classes: \p{Bidi_Class:...}
Examples:
/* Basic pattern matching */
"apple 123 pear 456\nbanana 789".grep("\\d+", "o");
/* → ["123", "456", "789"] */
/* With line numbers */
"apple 123 pear 456\nbanana 789".grep("\\d+", "on");
/* → ["1:123", "1:456", "2:789"] */
/* Unicode support */
"Hello 世界 123 €".grep("\\p{L}+", "o");
/* → ["Hello", "世界"] */
/* Named groups with JSON output */
"John Doe".grep("(?P<first>\\w+) (?P<last>\\w+)", "oj");
/* → [{"match":"John Doe","first":"John","last":"Doe","offset":0,"line":1}] */
/* Date parsing with JSON output */
"2023-04-27\n2022-12-31".grep("(?<year>\\d{4})-(?<month>\\d{2})-(?<day>\\d{2})", "oj");
/* → [
{"match":"2023-04-27","year":"2023","month":"04","day":"27","offset":0,"line":1},
{"match":"2022-12-31","year":"2022","month":"12","day":"31","offset":11,"line":2}
] */
/* Raw string literals for better readability */
"file.txt".grep(r"^[a-zA-Z0-9_]+\.txt$", "x");
/* → ["file.txt"] - No need to escape backslashes */
"user@domain.com".grep(r"^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}$", "x");
/* → ["user@domain.com"] - Much cleaner than escaped version */
/* Raw strings preserve literal escape sequences */
"\\x45".grep(r"\x45", "o");
/* → ["\\x45"] - Literal string, not character "E" */
/* Context lines */
"Line 1\nLine 2\nLine 3\nLine 4".grep("Line 2", "A1B1");
/* → ["Line 1", "Line 2", "Line 3"] */
/* Unicode normalization (NFC) */
"café".grep("cafe", "o", "", "NFC");
/* → ["café"] */
/* Binary mode for raw byte processing */
"\\x48\\x65\\x6c\\x6c\\x6f".grep("Hello", "o", "", "NONE", "BINARY");
/* → ["Hello"] */
/* Custom delimiter examples */
"apple|||pear|||banana".grep("\\w+", "o", "|||");
/* → ["apple", "pear", "banana"] */
"section1###section2###section3".grep("section\\d+", "o", "###");
/* → ["section1", "section2", "section3"] */
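The effect of the o and n output options can be approximated with Python's re module. This is only a model of the first two examples: Python's re is not PCRE2, the helper name grep_only_matching is hypothetical, and no other option, delimiter, or normalization setting is handled.

```python
import re


def grep_only_matching(text, pattern, numbered=False):
    """Return every match per line (like "o"); prefix the 1-based line number when numbered (like "on")."""
    regex = re.compile(pattern)
    results = []
    for lineno, line in enumerate(text.split("\n"), start=1):
        for match in regex.finditer(line):
            hit = match.group(0)
            results.append(f"{lineno}:{hit}" if numbered else hit)
    return results


text = "apple 123 pear 456\nbanana 789"
print(grep_only_matching(text, r"\d+"))                 # ['123', '456', '789']
print(grep_only_matching(text, r"\d+", numbered=True))  # ['1:123', '1:456', '2:789']
```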
📖 For comprehensive Unicode grep documentation including advanced features, named groups, JSON output, and Unicode properties, see Unicode Grep Documentation.
💡 Tip: Use raw string literals (prefix with r) for better regex pattern readability. For example, r"\w+" instead of "\\w+". Raw strings suppress all escape sequences except for escaping the quote character used to enclose the string.
Diacritic-Insensitive Matching (d option)
The d option enables diacritic-insensitive matching. When enabled, both the input and the pattern are:
1. Unicode normalized (NFC by default, or as specified)
2. Case folded (Unicode-aware, not just ASCII)
3. Diacritics/accents are stripped (works for Latin, Greek, Cyrillic, Turkish, Vietnamese, and more)
This allows matches like:
- "café".grep("cafe", "d") → ["café"]
- "CAFÉ".grep("cafe", "di") → ["CAFÉ"]
- "mañana".grep("manana", "d") → ["mañana"]
- "İstanbul".grep("istanbul", "di") → ["İstanbul"]
- "καφές".grep("καφες", "d") → ["καφές"]
- "кофе".grep("кофе", "di") → ["кофе"]
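The three steps above (normalize, case fold, strip diacritics) can be sketched with Python's unicodedata module. This is an approximation, not Grapa's mapping tables: it decomposes to NFD so accents become separate combining marks, it matches a literal substring rather than a regular expression, and Python's casefold() also maps ß to ss, which may differ from Grapa. The names fold and grep_d are hypothetical.

```python
import unicodedata


def fold(text):
    """Case fold, decompose to NFD, then drop combining marks (category Mn)."""
    decomposed = unicodedata.normalize("NFD", text.casefold())
    return "".join(ch for ch in decomposed if unicodedata.category(ch) != "Mn")


def grep_d(text, pattern):
    """Return the lines whose folded form contains the folded pattern (literal match only)."""
    needle = fold(pattern)
    return [line for line in text.split("\n") if line and needle in fold(line)]


print(grep_d("café\nCAFÉ\nCafe\nmañana", "cafe"))  # ['café', 'CAFÉ', 'Cafe']
print(fold("İstanbul"))                            # istanbul
```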
Special Capabilities
- Handles both precomposed (NFC) and decomposed (NFD) Unicode forms
- Supports diacritic-insensitive matching for Latin, Greek, Cyrillic, Turkish, Vietnamese, and more
- Works with case-insensitive (i) and normalization (N, or normalization parameter) options
- Robust for international text, including combining marks
Limitations
- Only covers scripts and diacritics explicitly mapped (Latin, Greek, Cyrillic, Turkish, Vietnamese, etc.)
- Does not transliterate between scripts (e.g., Greek to Latin)
- Does not remove all possible Unicode marks outside supported ranges (e.g., rare/archaic scripts)
- For full Unicode normalization, use with the normalization parameter (e.g., "NFC", "NFD")
- Does not perform locale-specific collation (e.g., German ß vs ss)
Example
input = "café\nCAFÉ\ncafe\u0301\nCafe\nCAFÉ\nmañana\nmañana\nİstanbul\nistanbul\nISTANBUL\nstraße\nSTRASSE\nStraße\nкофе\nКофе\nκαφές\nΚαφές\n";
result = input.grep(r"cafe", "di");
/* Result: ["café", "CAFÉ", "café", "Cafe", "CAFÉ"] */
split(sep, max, axis)
Splits into an array.
"one\ntwo\nthree".split("\n") -> ["one","two","three"]
"this is a test".split(" ") -> ["this","is","a","test"]
"this is a test split into parts".split(3) -> ["this is a t","est split i","nto parts"]
"this is a test split into parts".split(" ", 3) -> ["this is a test ","split into ","parts"]
join(item)
Joins what has been split.
["this is a test ","split into ","parts"].join("") -> "this is a test split into parts"
upper()
Converts to upper case.
"hi".upper() -> "HI"
lower()
Converts to lower case.
"HI".lower() -> "hi"
data = (stop).range(start,step)
(9).range(1,2);
[1,3,5,7]
sort(axis,order,kind)
argsort(axis,order,kind)
unique(op)
group(op1,op2,op3)
raw()
Converts a value into its raw bytes. The value is displayed in hex form but stored as raw bytes. Required in many cases as an intermediate form. For example, when converting from a $STR to an $INT, you have two choices.
> "34".int();
34
> "34".raw();
3334
> "34".raw().int();
13108
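The difference between the two conversions can be reproduced in Python: parsing the digits gives 34, while reading the string's bytes (0x33, 0x34) as one big-endian integer gives 13108. This is a model of the example above, not Grapa's implementation; the helper names are hypothetical.

```python
def parse_digits(text):
    """Interpret the characters as a decimal number, like "34".int()."""
    return int(text)


def bytes_as_int(text):
    """Interpret the string's raw bytes as a big-endian integer, like "34".raw().int()."""
    raw = text.encode("ascii")
    return int.from_bytes(raw, "big")


print(parse_digits("34"))           # 34
print("34".encode("ascii").hex())   # 3334
print(bytes_as_int("34"))           # 13108
```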
uraw()
Converts to an unsigned raw value. To avoid sign issues, a leading zero is added to raw/int. To remove it, use uraw and uint.
> (0xFF).raw();
0x0FF
> (0xFF).uraw();
FF
> (0xFF).raw().int();
255
> (0xFF).raw().uint();
255
> (0xFF).uraw().int();
-1
> (0xFF).uraw().uint();
255
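The sign behavior in the session above follows from two's-complement interpretation of the bytes, which Python's int.from_bytes can illustrate. The byte layouts used here (a leading zero byte for raw, a single FF byte for uraw) are an assumption based on the description, not a confirmed detail of Grapa's storage.

```python
raw_bytes = b"\x00\xff"   # assumed layout of (0xFF).raw(): leading zero keeps the value positive
uraw_bytes = b"\xff"      # assumed layout of (0xFF).uraw(): no leading zero


def as_int(data):
    """Signed big-endian interpretation, like .int()."""
    return int.from_bytes(data, "big", signed=True)


def as_uint(data):
    """Unsigned big-endian interpretation, like .uint()."""
    return int.from_bytes(data, "big", signed=False)


print(as_int(raw_bytes), as_uint(raw_bytes))    # 255 255
print(as_int(uraw_bytes), as_uint(uraw_bytes))  # -1 255
```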
bool()
Converts to what the system sees as true/false.
> "1".bool();
true
> (0).bool();
false
int()
Converts to $INT.
"44".int() -> 44
uint()
Converts to unsigned $INT.
See $INT.
float([bits [,extra]])
Converts to $FLOAT. Sets bit count for the entire number. Calculations are performed with "extra" bits and truncated for display.
"4.21".float() -> 4.21
"4.21".float(300,7) / "10412.42".float(300,7) -> 0.00040432483514879346011782083319727786624050893068085997299379010835137268761728
("4.21".float(300,7) / "10412.42".float(300,7)).float(50) -> 0.00040432483514879
fix([bits [,extra]])
Converts to fixed float. Sets bit count after the decimal to bits. Calculations are performed with "extra" bits and truncated for display.
setfloat([bits [,extra]])
Sets the default float type to float, and the default bits and extra.
setfix([bits [,extra]])
Sets the default float type to fix, and the default bits and extra.
str()
Converts to string.
(44).str() -> "44"
base(base)
Converts a number to the given base. Bases that are powers of 2 work well; other bases have not been fully tested. The conversion is performed as a series of mod and division operations, so it can become expensive on very large numbers. Consider splitting large numbers into parts before converting, but take care in choosing where to split.
(15).base(8) -> 17
(15).base(7) -> 21
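The "series of mods and divisions" approach can be sketched in Python. This is a generic repeated-division conversion that reproduces the two examples above; it is not Grapa's code, and the digit alphabet and the function name to_base are assumptions.

```python
DIGITS = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ"  # assumed digit alphabet


def to_base(value, base):
    """Convert an integer to a digit string by repeated divmod."""
    if not 2 <= base <= len(DIGITS):
        raise ValueError("base must be between 2 and 36")
    if value == 0:
        return "0"
    sign = "-" if value < 0 else ""
    value = abs(value)
    digits = []
    while value:
        value, remainder = divmod(value, base)  # one mod and one division per digit
        digits.append(DIGITS[remainder])
    return sign + "".join(reversed(digits))


print(to_base(15, 8))  # 17
print(to_base(15, 7))  # 21
```

Each output digit costs one division of the full number, which is why the cost grows quickly for very large values.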
hex()
Converts item to hex. The hex value is stored as an ascii representation of '0'-'F' characters.
bin()
Converts to binary.
(0xC).bin() -> 1100
setconst(truefalse)
Sets a bit that locks the variable against modification. A variable set as const is not locked when accessed. This is useful for global variables read by multiple threads: a non-const variable causes threads to block on access, which is unnecessary when the variable does not change.
Performance & Parallelism: All array/vector transformation methods (e.g., .map(), .filter(), .reduce()) are parallel by default, robust, and production-ready for ETL workloads. Grapa's parallelism is well tested for high-throughput data processing.