
$file()

Minimal Example

f = $file();
f.chd("C:/Users/user/Projects/MyProject");
f.cd("src");
f.ls();  /* List files in /src */
f.get("README.md");  /* Read a file */
f.set("test.txt", "Hello, World!");  /* Write a file */


Provides the ability to navigate either the file system or a database, querying and updating data. This class/library will be enhanced over time to support navigating data types beyond the file system and the Grapa database, such as JSON/XML and unstructured data where a mapping can be defined (possibly with a set of rules). With a few additional enhancements, this class/library will also enable extending the Grapa syntax to include SQL, with $file providing the underlying data.

Each example below assumes the following command has been issued:

f = $file();

This assigns f an instance of the $file class. The following operations can then be used on that instance.

The name parameter for the commands below can include a path relative to the "working directory" (see pwd()). If the "working directory" is an OS filesystem directory, the path must reference a file within the OS filesystem. If the "working directory" is a Grapa table, the path and data item are within the Grapa table. Referencing a Grapa table item when the "working directory" is not within a Grapa table is not currently supported.
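
For example, a file one level down can be read without changing the working directory; a short sketch with hypothetical file and folder names:

f = $file();
f.chd("C:/Users/user/Projects/MyProject");
readme = f.get("docs/README.md");  /* relative path from the working directory */
/* Equivalent to: f.cd("docs"); f.get("README.md"); f.cd(".."); */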

type()

Returns the type of the file object.

f.type();
$file

table()

The table function creates an in-memory database.

t = f.table();
t.type();
$TABLE

Database Types: The table can be configured as either:
- Row Store (RTABLE_TREE): Traditional row-oriented storage, optimized for transactional workloads
- Column Store (CTABLE_TREE): Column-oriented storage, optimized for analytical queries and aggregations

Note: Column store databases use fragmented data storage (FREC_DATA) for efficient handling of sparse data and dynamic growth.
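
The in-memory table can then be used like the database directories described below; a minimal sketch, assuming the returned $TABLE object accepts the same mkfield/set/get calls documented later in this page (hypothetical field and record names):

t = f.table();
t.mkfield("name", "STR", "VAR");     /* assumed: mkfield on $TABLE as in database directories */
t.set("user1", "John Doe", "name");  /* assumed: set with a field name */
name = t.get("user1", "name");       /* assumed: get with a field name */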

Directory Navigation

Grapa provides two levels of directory navigation: working directory (relative) and home directory (absolute). This dual-level system allows for flexible project management and navigation.

pwd() - Print Working Directory

Returns the current working directory, relative to the current home directory.

Purpose: Shows where you are within the current project context
Returns: Relative path (e.g., /lib, /docs)
Use Case: Navigation within a project or database

f.pwd();
/

f.cd("lib");
f.pwd();
/lib

f.cd("grapa");
f.pwd();
/lib/grapa

cd([name]) - Change Working Directory

Changes the current working directory, relative to the current home directory.

Parameters:
- name (optional): Directory name or path to navigate to
  - "..": Move up one level
  - "/": Move to root of current home directory
  - "path": Move to a specific subdirectory

f.cd("lib");
f.pwd();
/lib

f.cd("..");
f.pwd();
/

f.cd("/docs");
f.pwd();
/docs

f.cd("..");
f.pwd();
/

phd() - Print Home Directory

Returns the current home directory (absolute path).

Purpose: Shows the base directory that serves as the root for relative navigation
Returns: Absolute path (e.g., C:\Users\matichuk\Documents\GitHub\grapa)
Use Case: Project switching and absolute path reference

f.phd();
C:\Users\matichuk\Documents\GitHub\grapa

f.chd("C:/Users/matichuk/Documents/NewProject");
f.phd();
C:\Users\matichuk\Documents\NewProject

chd(filesystempath) - Change Home Directory

Changes the current home directory to a new absolute path.

Parameters:
- filesystempath: Absolute or relative path to set as new home directory

Note: This resets the working directory to the root (/) of the new home directory.

/* Change to absolute path */
f.chd("C:/Users/matichuk/Documents/NewProject");
f.phd();
C:\Users\matichuk\Documents\NewProject
f.pwd();
/

/* Change to relative path from current home */
f.chd("../sibling_project");
f.phd();
C:\Users\matichuk\Documents\sibling_project
f.pwd();
/

Directory Navigation Comparison

| Function | Purpose | Scope | Change Method | Example Output |
|----------|---------|-------|---------------|----------------|
| pwd() | Show current location | Relative to home | cd() | /lib/grapa |
| phd() | Show base directory | Absolute system | chd() | C:\Users\matichuk\Documents\GitHub\grapa |

Typical Workflow:
1. Use chd() to set your project's home directory
2. Use cd() to navigate within the project
3. Use pwd() to see your current location within the project
4. Use phd() to see the absolute project location
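
A sketch of this workflow (hypothetical project path and subdirectory):

f = $file();
f.chd("C:/Users/user/Projects/MyProject");  /* 1. set the project's home directory */
f.cd("src");                                /* 2. navigate within the project */
f.pwd();                                    /* 3. location within the project: /src */
f.phd();                                    /* 4. absolute project location */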

ls([name])

Retrieves a list of files/directories in the current working directory.

Return Format: Returns a list of objects with the following properties:
- $KEY: File or directory name
- $TYPE: Type ("FILE", "GROUP", etc.)
- $BYTES: File size in bytes (0 for directories)

Note:
- When navigating a traditional file system, folders/directories will be listed as $TYPE: "GROUP".
- In a database context, GROUP also refers to hierarchical/grouped database structures.

f.ls();
[
  {"$KEY":"docs","$TYPE":"GROUP","$BYTES":0},
  {"$KEY":"README.md","$TYPE":"FILE","$BYTES":4302}
]

/* Check type of a directory */
f.cd("docs");
f.type();
/* Returns: GROUP */

mk(name [,type])

Creates a directory or database at the current working directory.

Parameters:
- name: Directory name to create
- type (optional): Type of directory/database to create

Type Options:
- "" or "DIR": Creates a regular directory (default)
- "GROUP": Creates a database of GROUP type for hierarchical data, or a folder/directory in the file system
- "ROW": Creates a ROW store database optimized for transactional workloads
- "COL": Creates a COL store database optimized for analytical queries

Column Store (COL) Characteristics:
- Uses fragmented data storage for efficient sparse data handling
- Optimized for column-oriented queries and aggregations
- Better performance for analytical workloads
- Efficient storage of wide tables with many optional fields

Note:
- When using mk() in a file system context, GROUP is equivalent to creating a folder/directory.
- In a database context, GROUP creates a hierarchical/grouped database structure.

/* Create regular directory */
f.mk("test");
f.cd("test");
f.ls();
[]

/* Create database directory */
f.mk("testdb", "GROUP");
f.cd("testdb");
f.ls();
[]

/* Create column store database */
f.mk("analytics_db", "COL");
f.cd("analytics_db");

Type Table

| Type | Description/Use Case | Storage Model |
|------|----------------------|---------------|
| GROUP | Folder/directory in file system, or hierarchical/grouped database | GROUP_TREE |
| ROW | Transactional, record-based, OLTP, point queries | RTABLE_TREE, BYTE_DATA |
| COL | Analytical, column-based, sparse/large datasets | CTABLE_TREE, FREC_DATA |

rm(name)

Removes a directory or file.

f.rm("test");

Note: This will recursively remove directories and their contents.
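
A short sketch of recursive removal (hypothetical directory and file names):

f.mk("temp_dir");
f.cd("temp_dir");
f.set("scratch.txt", "temporary data");
f.cd("..");
f.rm("temp_dir");  /* removes temp_dir and scratch.txt */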

set(name, value [, field])

Creates or updates a file with the specified content.

Parameters:
- name: File name
- value: Content to write to the file
- field (optional): Field name (defaults to $VALUE)

f.set("test.txt", "Hello, World!");
f.set("config.json", '{"name": "test", "value": 123}');

get(name [, field])

Reads the content of a file.

Parameters:
- name: File name
- field (optional): Field name (defaults to $VALUE)

Return Format: Returns file content in hexadecimal format.

f.set("test.txt", "Hello, World!");
content = f.get("test.txt");
/* Returns: 0x48656C6C6F2C20576F726C6421 */

Note: File content is returned in hexadecimal format, not plain text. To convert to string, you may need to use additional processing.
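
If a plain string is needed, a conversion step is required. A minimal sketch, assuming a .str() conversion method is available on the returned value (not confirmed in this document):

raw = f.get("test.txt");
text = raw.str();  /* assumed conversion from the hex/raw result to a $STR */
text.echo();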

info(name)

Returns detailed metadata information about a file or directory.

Parameters: - name: File or directory name to inspect

Return Format: Returns an object with file metadata containing:
- $TYPE: Type of item ("FILE", "DIR", or "ERR" for errors/non-existent)
- $BYTES: Size in bytes (0 for directories, actual size for files)
- error: Error code (-1) if item doesn't exist or is inaccessible

Examples:

/* File information */
f.set("test.txt", "Hello, World!");
info = f.info("test.txt");
/* Returns: {"$TYPE":"FILE","$BYTES":13} */

/* Directory information */
f.mk("test_dir");
dir_info = f.info("test_dir");
/* Returns: {"$TYPE":"DIR","$BYTES":0} */

/* Non-existent item */
error_info = f.info("nonexistent.txt");
/* Returns: {"error":-1} */

Use Cases:

File Type Detection:

info = f.info("document.txt");
if (info["$TYPE"] == "FILE") {
    "This is a file\n".echo();
} else if (info["$TYPE"] == "DIR") {
    "This is a directory\n".echo();
} else {
    "Item doesn't exist\n".echo();
}

File Size Analysis:

info = f.info("large_file.txt");
if (info["$TYPE"] == "FILE") {
    size = info["$BYTES"];
    if (size > 1000000) {
        "File is larger than 1MB\n".echo();
    }
}

Batch File Processing:

files = ["file1.txt", "file2.txt", "file3.txt"];
total_size = 0;
i = 0;
while (i < files.length()) {
    info = f.info(files[i]);
    if (info["$TYPE"] == "FILE") {
        total_size = total_size + info["$BYTES"];
    }
    i = i + 1;
};
("Total size: " + total_size + " bytes\n").echo();

Key Benefits:
- Lightweight: No need to open/read files to get metadata
- Fast: Direct OS system calls for file system operations
- Unified Interface: Same function works for files and directories
- Cross-Platform: Works consistently across different operating systems
- Error Handling: Clear error responses for non-existent items

Implementation Notes:
- Uses stat64() on Unix/Linux systems
- Uses FindFirstFileA() on Windows systems
- Works in both file system and database contexts
- Essential for file management, storage monitoring, and data validation

split(parts, name, path, delim, option)

Splits a large file into multiple smaller, manageable parts for processing, storage, or transfer.

Parameters:
- parts: Number of files to split into (must be > 0)
- name: Input file name to split
- path: Output directory path for the split files (created if needed)
- delim: Delimiter to use for splitting (default: "\n")
- option: Special options
  - "csv": Copy header to each file (preserves CSV headers)
  - "start": Search backwards for delimiter on split (prevents content breaking)

Return Format: Returns an array of created file names.

result = f.split(4, "large_file.txt", "split_output", "\n", "");
/* Returns: ["1.large_file.txt","2.large_file.txt","3.large_file.txt","4.large_file.txt"] */

Examples:

Basic File Splitting:

/* Create a large file */
large_content = "";
i = 1;
while (i <= 100) {
    large_content = large_content + "Line " + i + "\n";
    i = i + 1;
};
f.set("large_file.txt", large_content);

/* Split into 4 parts */
result = f.split(4, "large_file.txt", "split_output", "\n", "");
/* Creates: 1.large_file.txt, 2.large_file.txt, 3.large_file.txt, 4.large_file.txt */

CSV File Splitting with Header Preservation:

/* Split CSV file while preserving headers in each part */
result = f.split(3, "data.csv", "csv_parts", "", "csv");
/* Each split file includes the original header row */

Custom Delimiter Splitting:

/* Split on pipe character instead of newlines */
result = f.split(2, "custom_data.txt", "output", "|", "");
/* Splits content at pipe boundaries */

Smart Boundary Detection:

/* Use start option to avoid breaking content arbitrarily */
result = f.split(2, "log_file.txt", "log_parts", "\n", "start");
/* Searches backwards for delimiter to maintain logical boundaries */

Use Cases:

Large File Management:

/* Split large database export for processing */
f.split(10, "database_export.csv", "exports", "", "csv");

Log File Processing:

/* Split large log files for parallel analysis */
f.split(5, "server.log", "log_chunks", "\n", "");

Data Pipeline Preparation:

/* Prepare data for distributed processing */
f.split(8, "dataset.txt", "chunks", "\n", "start");

Key Features:
- Automatic Naming: Files named as 1.filename, 2.filename, etc.
- Size Distribution: Calculates optimal part sizes based on total file size
- Memory Efficient: Processes files in chunks, not all at once
- Flexible Delimiters: Supports any character or string as delimiter
- Error Handling: Returns {"error":-1} for non-existent files, null for invalid parameters
- Cross-Platform: Works consistently across operating systems

Implementation Notes:
- Uses efficient block-based file I/O for memory management
- Automatically creates the output directory if it doesn't exist
- Handles remainder content appropriately when the file size doesn't divide evenly
- Supports both file system and database contexts
- Zero-padded numbering ensures proper file sorting
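
A sketch that verifies the split output by listing the generated directory (hypothetical file names):

result = f.split(4, "large_file.txt", "split_output", "\n", "");
f.cd("split_output");
f.ls();              /* should list the generated part files */
f.cd("..");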

mkfield(name [, fieldType [, storeType [, storeSize [, storeGrow]]]])

Creates a field within the current working directory (database context).

Parameters:
- name: Field name
- fieldType (optional): Type of field (default: "STR")
- storeType (optional): Storage type (default: "VAR")
- storeSize (optional): Size for fixed fields
- storeGrow (optional): Growth size for variable fields

Field Types:

| Type | Description |
|------|-------------|
| BOOL | Fixed size for $BOOL |
| TIME | Stores an $INT. Size depends on storeType and storeSize |
| INT | Stores an $INT. Size depends on storeType and storeSize |
| FLOAT | Stores a $FLOAT. Size depends on storeType and storeSize |
| STR | Stores a $STR. Size depends on storeType and storeSize |
| TABLE | Stores a $TABLE. Size depends on storeType and storeSize |
| RAW | Stores a $RAW. Size depends on storeType and storeSize |

Storage Types:

| Type | Description | Use Case |
|------|-------------|----------|
| FIX | Fixed field size, data embedded in row/col | Small, frequently accessed fields |
| VAR | Variable field size, uses extra reference | Medium-sized variable data |
| PAR | Partitioned field for large data updates | Large data requiring partial updates, COL store $TABLE types |

Important Notes:
- Column Store Fixed Fields: Use fragmented data storage (FREC_DATA) for efficient sparse data handling
- Growth Parameters: The storeGrow parameter is automatically set to storeSize for fixed fields if not specified
- Performance: Column store is optimized for analytical queries across columns

f.mkfield("test");
f.mkfield("age", "INT", "FIX", 4);
f.mkfield("name", "STR", "VAR");

rmfield(name)

Deletes a field within the current working directory (database context).

f.rmfield("test");

debug()

Used for debugging the database during development. Displays the BTree structure of the data dictionary, fields, and indexes for the current working directory when in a database (either in memory or on the file system).

f.debug();

Performance Considerations

Row Store vs Column Store

Row Store (ROW)
- Best for: Transactional workloads, frequent record updates, point queries
- Storage: Contiguous data blocks per record
- Performance: Fast record retrieval and updates

Column Store (COL)
- Best for: Analytical queries, column scans, aggregations, sparse data
- Storage: Fragmented data storage for efficient sparse data handling
- Performance: Fast column-oriented operations, better compression

Storage Type Performance

FIX (Fixed)
- Fastest access for small, frequently used fields
- Predictable storage requirements
- Best for primary keys, status flags, small integers

VAR (Variable)
- Flexible storage for variable-length data
- Good for medium-sized text fields
- Slight overhead for reference management

PAR (Partitioned)
- Best for large data requiring partial updates
- Efficient for very large fields
- Used automatically for COL store $TABLE types
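
A combined sketch matching storage types to field characteristics (hypothetical field names; field and storage types taken from the tables above):

f.mkfield("status", "INT", "FIX", 4);   /* FIX: small, frequently accessed value */
f.mkfield("comment", "STR", "VAR");     /* VAR: medium, variable-length text */
f.mkfield("payload", "RAW", "PAR");     /* PAR: large data needing partial updates */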

Troubleshooting

Common Issues

Column Store Performance
- Issue: Slow performance on small datasets
- Solution: Consider row store for small, transactional workloads

Field Creation Errors
- Issue: Fields not created properly
- Solution: Ensure proper field type and storage parameters are specified

Storage Efficiency
- Issue: High storage overhead
- Solution: Use appropriate storage types and monitor growth parameters

Debug Information

Use the debug() function to inspect database structure:

f.debug();

This provides detailed information about:
- Database type and structure
- Field definitions and storage types
- Data distribution and storage efficiency

Error Handling

When operations fail, the system returns error objects:

/* Non-existent file */
result = f.get("non_existent.txt");
/* Returns: {"error":-1} */

/* Non-existent directory */
result = f.cd("non_existent_dir");
/* Returns: {"error":-1} */
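
One way to guard against these errors is to check metadata with info() before reading; a sketch with a hypothetical file name:

info = f.info("maybe_missing.txt");
if (info["$TYPE"] == "FILE") {
    content = f.get("maybe_missing.txt");
} else {
    "File is missing or inaccessible\n".echo();
}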

Usage Examples

Basic File Operations

f = $file();

/* Create and write to a file */
f.set("test.txt", "Hello, World!");

/* Read file content */
content = f.get("test.txt");

/* List directory contents */
files = f.ls();

/* Navigate directories */
f.cd("docs");
f.pwd();  /* Returns: /docs */

Database Operations

/* Create a column store database */
f.mk("analytics_db", "COL");
f.cd("analytics_db");

/* Create fields */
f.mkfield("id", "INT", "FIX", 4);
f.mkfield("name", "STR", "VAR");
f.mkfield("age", "INT", "FIX", 4);

/* Add data */
f.set("user1", "John Doe", "name");
f.set("user1", 30, "age");

/* Retrieve data */
name = f.get("user1", "name");
age = f.get("user1", "age");

Row Store vs Column Store Example

/* Row store for transactional data */
f.mk("transaction_db", "ROW");
f.cd("transaction_db");
f.mkfield("order_id", "INT", "FIX", 4);
f.mkfield("customer_id", "INT", "FIX", 4);
f.mkfield("amount", "FLOAT", "FIX", 8);

/* Column store for analytical data */
f.mk("analytics_db", "COL");
f.cd("analytics_db");
f.mkfield("date", "TIME", "FIX", 8);
f.mkfield("product_id", "INT", "FIX", 4);
f.mkfield("sales_amount", "FLOAT", "FIX", 8);
f.mkfield("region", "STR", "VAR");

Python Integration

For Python developers, Grapa's file system and database capabilities can be leveraged through the Python integration. The unified API provides seamless access to both file systems and databases, making it ideal for data science, web development, and system administration workflows.

Key Python Use Cases

Data Science and Analytics:
- Column store databases for analytical workloads
- Large file management with built-in splitting capabilities
- Unified data access across different storage types

Web Application Development:
- Backend data management with row store for transactional data
- Content management with flexible field types
- API development with consistent data access patterns

System Administration:
- Log file management with automatic splitting for large files
- Configuration management with unified path navigation
- Data pipeline integration for ETL workflows

For detailed examples and best practices, see the Python Integration Guide.