$file()
Minimal Example
f = $file();
f.chd("C:/Users/user/Projects/MyProject");
f.cd("src");
f.ls(); /* List files in /src */
f.get("README.md"); /* Read a file */
f.set("test.txt", "Hello, World!"); /* Write a file */
Provides the ability to navigate either the file system or a database, querying and updating data. This class/library will be enhanced over time to support navigating data types beyond the file system and the grapa database, such as JSON/XML and unstructured data where a mapping can be defined (possibly with a set of rules). With a few additional enhancements, this class/library will also enable extending the grapa syntax to include SQL, with $file providing the underlying data.
Each example below assumes the following command has been issued:
f = $file();
This assigns f an instance of the $file class. The following operations can then be called on that instance.
The name parameter for these commands can include a path relative to the "working directory" (see pwd()). If the "working directory" is an OS filesystem directory, the path must reference a file within the OS filesystem. If the "working directory" is a grapa table, the path and data item are located within that grapa table. Referencing a grapa table item when the "working directory" is not within a grapa table is not currently supported.
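For example, a name can reach into a subdirectory of the working directory. A minimal sketch, assuming a docs subdirectory and the hypothetical file names shown:
f = $file();
f.cd("docs"); /* working directory is now /docs */
f.set("notes.txt", "draft"); /* writes /docs/notes.txt */
f.get("guide/intro.md"); /* relative path reaching into /docs/guide */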
type()
Returns the type of the file object.
f.type();
$file
table()
The table function creates an in-memory database.
t = f.table();
t.type();
$TABLE
Database Types: The table can be configured as either:
- Row Store (RTABLE_TREE): Traditional row-oriented storage, optimized for transactional workloads
- Column Store (CTABLE_TREE): Column-oriented storage, optimized for analytical queries and aggregations
Note: Column store databases use fragmented data storage (FREC_DATA) for efficient handling of sparse data and dynamic growth.
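A short sketch of putting the in-memory table to use. This assumes the returned $TABLE object supports the same mkfield(), set(), and get() operations documented below for $file; treat it as illustrative rather than definitive:
t = f.table();
t.mkfield("age", "INT", "FIX", 4); /* assumes $TABLE exposes mkfield() like $file */
t.set("user1", 30, "age");
t.get("user1", "age");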
Directory Navigation
Grapa provides two levels of directory navigation: working directory (relative) and home directory (absolute). This dual-level system allows for flexible project management and navigation.
pwd() - Print Working Directory
Returns the current working directory, relative to the current home directory.
Purpose: Shows where you are within the current project context
Returns: Relative path (e.g., /lib, /docs)
Use Case: Navigation within a project or database
f.pwd();
/
f.cd("lib");
f.pwd();
/lib
f.cd("grapa");
f.pwd();
/lib/grapa
cd([name]) - Change Working Directory
Changes the current working directory, relative to the current home directory.
Parameters:
- name (optional): Directory name or path to navigate to
  - "..": Move up one level
  - "/": Move to root of current home directory
  - "path": Move to a specific subdirectory
f.cd("lib");
f.pwd();
/lib
f.cd("..");
f.pwd();
/
f.cd("/docs");
f.pwd();
/docs
f.cd("..");
f.pwd();
/
phd() - Print Home Directory
Returns the current home directory (absolute path).
Purpose: Shows the base directory that serves as the root for relative navigation
Returns: Absolute path (e.g., C:\Users\matichuk\Documents\GitHub\grapa)
Use Case: Project switching and absolute path reference
f.phd();
C:\Users\matichuk\Documents\GitHub\grapa
f.chd("C:/Users/matichuk/Documents/NewProject");
f.phd();
C:\Users\matichuk\Documents\NewProject
chd(filesystempath) - Change Home Directory
Changes the current home directory to a new absolute path.
Parameters:
- filesystempath: Absolute or relative path to set as the new home directory
Note: This resets the working directory to the root (/) of the new home directory.
/* Change to absolute path */
f.chd("C:/Users/matichuk/Documents/NewProject");
f.phd();
C:\Users\matichuk\Documents\NewProject
f.pwd();
/
/* Change to relative path from current home */
f.chd("../sibling_project");
f.phd();
C:\Users\matichuk\Documents\sibling_project
f.pwd();
/
Directory Navigation Comparison
| Function | Purpose | Scope | Changed With | Example Output |
|---|---|---|---|---|
| pwd() | Show current location | Relative to home | cd() | /lib/grapa |
| phd() | Show base directory | Absolute system path | chd() | C:\Users\matichuk\Documents\GitHub\grapa |
Typical Workflow:
1. Use chd() to set your project's home directory
2. Use cd() to navigate within the project
3. Use pwd() to see your current location within the project
4. Use phd() to see the absolute project location (see the sketch below)
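A minimal sketch of that workflow, reusing the hypothetical project path from the Minimal Example above:
f = $file();
f.chd("C:/Users/user/Projects/MyProject"); /* 1. set the project home */
f.cd("src"); /* 2. navigate within the project */
f.pwd(); /* 3. returns /src */
f.phd(); /* 4. returns C:\Users\user\Projects\MyProject */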
ls([name])
Retrieves a list of files/directories in the current working directory.
Return Format: Returns a list of objects with the following properties:
- $KEY: File or directory name
- $TYPE: Type ("FILE", "GROUP", etc.)
- $BYTES: File size in bytes (0 for directories)
Note:
- When navigating a traditional file system, folders/directories will be listed as $TYPE: "GROUP".
- In a database context, GROUP also refers to hierarchical/grouped database structures.
f.ls();
[
{"$KEY":"docs","$TYPE":"GROUP","$BYTES":0},
{"$KEY":"README.md","$TYPE":"FILE","$BYTES":4302}
]
/* Check type of a directory */
f.cd("docs");
f.type();
/* Returns: GROUP */
mk(name [,type])
Creates a directory or database in the current working directory.
Parameters:
- name: Directory name to create
- type (optional): Type of directory/database to create
Type Options:
- "" or "DIR": Creates a regular directory (default)
- "GROUP": Creates a database of GROUP type for hierarchical data, or a folder/directory in the file system
- "ROW": Creates a ROW store database optimized for transactional workloads
- "COL": Creates a COL store database optimized for analytical queries
Column Store (COL) Characteristics:
- Uses fragmented data storage for efficient sparse data handling
- Optimized for column-oriented queries and aggregations
- Better performance for analytical workloads
- Efficient storage of wide tables with many optional fields
Note:
- When using mk() in a file system context, GROUP is equivalent to creating a folder/directory.
- In a database context, GROUP creates a hierarchical/grouped database structure.
/* Create regular directory */
f.mk("test");
f.cd("test");
f.ls();
[]
/* Create database directory */
f.mk("testdb", "GROUP");
f.cd("testdb");
f.ls();
[]
/* Create column store database */
f.mk("analytics_db", "COL");
f.cd("analytics_db");
Type Table
| Type | Description/Use Case | Storage Model |
|---|---|---|
| GROUP | Folder/directory in file system, or hierarchical/grouped database | GROUP_TREE |
| ROW | Transactional, record-based, OLTP, point queries | RTABLE_TREE, BYTE_DATA |
| COL | Analytical, column-based, sparse/large datasets | CTABLE_TREE, FREC_DATA |
rm(name)
Removes a directory or file.
f.rm("test");
Note: This will recursively remove directories and their contents.
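A small sketch of that recursive behavior, using hypothetical names:
f.mk("scratch");
f.set("scratch/note.txt", "temporary"); /* file inside the new directory */
f.rm("scratch"); /* removes scratch and note.txt together */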
set(name, value [, field])
Creates or updates a file with the specified content.
Parameters:
- name: File name
- value: Content to write to the file
- field (optional): Field name (defaults to $VALUE)
f.set("test.txt", "Hello, World!");
f.set("config.json", '{"name": "test", "value": 123}');
get(name [, field])
Reads the content of a file.
Parameters:
- name: File name
- field (optional): Field name (defaults to $VALUE)
Return Format: Returns file content in hexadecimal format.
f.set("test.txt", "Hello, World!");
content = f.get("test.txt");
/* Returns: 0x48656C6C6F2C20576F726C6421 */
Note: File content is returned in hexadecimal format, not plain text. To convert to string, you may need to use additional processing.
info(name)
Returns detailed metadata information about a file or directory.
Parameters:
- name: File or directory name to inspect
Return Format: Returns an object with file metadata containing:
- $TYPE: Type of item ("FILE", "DIR", or "ERR" for errors/non-existent)
- $BYTES: Size in bytes (0 for directories, actual size for files)
- error: Error code (-1) if item doesn't exist or is inaccessible
Examples:
/* File information */
f.set("test.txt", "Hello, World!");
info = f.info("test.txt");
/* Returns: {"$TYPE":"FILE","$BYTES":13} */
/* Directory information */
f.mk("test_dir");
dir_info = f.info("test_dir");
/* Returns: {"$TYPE":"DIR","$BYTES":0} */
/* Non-existent item */
error_info = f.info("nonexistent.txt");
/* Returns: {"error":-1} */
Use Cases:
File Type Detection:
info = f.info("document.txt");
if (info["$TYPE"] == "FILE") {
"This is a file\n".echo();
} else if (info["$TYPE"] == "DIR") {
"This is a directory\n".echo();
} else {
"Item doesn't exist\n".echo();
}
File Size Analysis:
info = f.info("large_file.txt");
if (info["$TYPE"] == "FILE") {
size = info["$BYTES"];
if (size > 1000000) {
"File is larger than 1MB\n".echo();
}
}
Batch File Processing:
files = ["file1.txt", "file2.txt", "file3.txt"];
total_size = 0;
i = 0;
while (i < files.length()) {
    info = f.info(files[i]);
    if (info["$TYPE"] == "FILE") {
        total_size = total_size + info["$BYTES"];
    }
    i = i + 1;
};
("Total size: " + total_size + " bytes\n").echo();
Key Benefits:
- Lightweight: No need to open/read files to get metadata
- Fast: Direct OS system calls for file system operations
- Unified Interface: Same function works for files and directories
- Cross-Platform: Works consistently across different operating systems
- Error Handling: Clear error responses for non-existent items
Implementation Notes:
- Uses stat64() on Unix/Linux systems
- Uses FindFirstFileA() on Windows systems
- Works in both file system and database contexts
- Essential for file management, storage monitoring, and data validation
split(parts, name, path, delim, option)
Splits a large file into multiple smaller, manageable parts for processing, storage, or transfer.
Parameters:
- parts: Number of files to split into (must be > 0)
- name: Input file name to split
- path: Output directory path for the split files (created if needed)
- delim: Delimiter to use for splitting (default: "\n")
- option: Special options
  - "csv": Copy the header to each file (preserves CSV headers)
  - "start": Search backwards for the delimiter at each split point (prevents breaking content mid-record)
Return Format: Returns an array of created file names.
result = f.split(4, "large_file.txt", "split_output", "\n", "");
/* Returns: ["1.large_file.txt","2.large_file.txt","3.large_file.txt","4.large_file.txt"] */
Examples:
Basic File Splitting:
/* Create a large file */
large_content = "";
i = 1;
while (i <= 100) {
    large_content = large_content + "Line " + i + "\n";
    i = i + 1;
};
f.set("large_file.txt", large_content);
/* Split into 4 parts */
result = f.split(4, "large_file.txt", "split_output", "\n", "");
/* Creates: 1.large_file.txt, 2.large_file.txt, 3.large_file.txt, 4.large_file.txt */
CSV File Splitting with Header Preservation:
/* Split CSV file while preserving headers in each part */
result = f.split(3, "data.csv", "csv_parts", "", "csv");
/* Each split file includes the original header row */
Custom Delimiter Splitting:
/* Split on pipe character instead of newlines */
result = f.split(2, "custom_data.txt", "output", "|", "");
/* Splits content at pipe boundaries */
Smart Boundary Detection:
/* Use start option to avoid breaking content arbitrarily */
result = f.split(2, "log_file.txt", "log_parts", "\n", "start");
/* Searches backwards for delimiter to maintain logical boundaries */
Use Cases:
Large File Management:
/* Split large database export for processing */
f.split(10, "database_export.csv", "exports", "", "csv");
Log File Processing:
/* Split large log files for parallel analysis */
f.split(5, "server.log", "log_chunks", "\n", "");
Data Pipeline Preparation:
/* Prepare data for distributed processing */
f.split(8, "dataset.txt", "chunks", "\n", "start");
Key Features:
- Automatic Naming: Files named as 1.filename, 2.filename, etc.
- Size Distribution: Calculates optimal part sizes based on total file size
- Memory Efficient: Processes files in chunks, not all at once
- Flexible Delimiters: Supports any character or string as delimiter
- Error Handling: Returns {"error":-1} for non-existent files, null for invalid parameters
- Cross-Platform: Works consistently across operating systems
Implementation Notes:
- Uses efficient block-based file I/O for memory management
- Automatically creates output directory if it doesn't exist
- Handles remainder content appropriately when file size doesn't divide evenly
- Supports both file system and database contexts
- Zero-padded numbering ensures proper file sorting
A sketch that iterates over the returned file names follows below.
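The sketch assumes a server.log file exists (hypothetical name) and relies on the returned array of file names plus the relative-path rules described earlier:
parts = f.split(3, "server.log", "log_chunks", "\n", "");
i = 0;
while (i < parts.length()) {
    chunk = f.get("log_chunks/" + parts[i]); /* read each part from the output directory */
    i = i + 1;
};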
mkfield(name [, fieldType [, storeType [, storeSize [, storeGrow]]]])
Creates a field within the current working directory (database context).
Parameters:
- name: Field name
- fieldType (optional): Type of field (default: "STR")
- storeType (optional): Storage type (default: "VAR")
- storeSize (optional): Size for fixed fields
- storeGrow (optional): Growth size for variable fields
Field Types:
| Type | Description |
|------|-------------|
| BOOL | Fixed size for $BOOL |
| TIME | Stores an $INT. Size depends on storeType and storeSize |
| INT | Stores an $INT. Size depends on storeType and storeSize |
| FLOAT | Stores a $FLOAT. Size depends on storeType and storeSize |
| STR | Stores a $STR. Size depends on storeType and storeSize |
| TABLE | Stores a $TABLE. Size depends on storeType and storeSize |
| RAW | Stores a $RAW. Size depends on storeType and storeSize |
Storage Types:
| Type | Description | Use Case |
|------|-------------|----------|
| FIX | Fixed field size, data embedded in row/col | Small, frequently accessed fields |
| VAR | Variable field size, uses extra reference | Medium-sized variable data |
| PAR | Partitioned field for large data updates | Large data requiring partial updates, COL store $TABLE types |
Important Notes:
- Column Store Fixed Fields: Use fragmented data storage (FREC_DATA) for efficient sparse data handling
- Growth Parameters: The storeGrow parameter is automatically set to storeSize for fixed fields if not specified
- Performance: Column store is optimized for analytical queries across columns
f.mkfield("test");
f.mkfield("age", "INT", "FIX", 4);
f.mkfield("name", "STR", "VAR");
rmfield(name)
Deletes a field within the current working directory (database context).
f.rmfield("test");
debug()
Used for debugging the database during development. Displays the BTree structure of the data dictionary, fields, and indexes for the current working directory when it is a database (either in memory or on the file system).
f.debug();
Performance Considerations
Row Store vs Column Store
Row Store (ROW)
- Best for: Transactional workloads, frequent record updates, point queries
- Storage: Contiguous data blocks per record
- Performance: Fast record retrieval and updates
Column Store (COL)
- Best for: Analytical queries, column scans, aggregations, sparse data
- Storage: Fragmented data storage for efficient sparse data handling
- Performance: Fast column-oriented operations, better compression
Storage Type Performance
FIX (Fixed)
- Fastest access for small, frequently used fields
- Predictable storage requirements
- Best for primary keys, status flags, small integers
VAR (Variable)
- Flexible storage for variable-length data
- Good for medium-sized text fields
- Slight overhead for reference management
PAR (Partitioned)
- Best for large data requiring partial updates
- Efficient for very large fields
- Used automatically for COL store $TABLE types
A sketch combining all three storage types follows below.
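The sketch uses a hypothetical database name (storage_demo) and hypothetical field names; the calls follow the mk() and mkfield() signatures documented above:
f.mk("storage_demo", "COL");
f.cd("storage_demo");
f.mkfield("status", "INT", "FIX", 4); /* FIX: small, frequently accessed value */
f.mkfield("title", "STR", "VAR"); /* VAR: medium, variable-length text */
f.mkfield("body", "STR", "PAR"); /* PAR: large data that may be partially updated */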
Troubleshooting
Common Issues
Column Store Performance
- Issue: Slow performance on small datasets
- Solution: Consider row store for small, transactional workloads
Field Creation Errors
- Issue: Fields not created properly
- Solution: Ensure proper field type and storage parameters are specified
Storage Efficiency
- Issue: High storage overhead
- Solution: Use appropriate storage types and monitor growth parameters
Debug Information
Use the debug() function to inspect database structure:
f.debug();
This provides detailed information about:
- Database type and structure
- Field definitions and storage types
- Data distribution and storage efficiency
Error Handling
When operations fail, the system returns error objects:
/* Non-existent file */
result = f.get("non_existent.txt");
/* Returns: {"error":-1} */
/* Non-existent directory */
result = f.cd("non_existent_dir");
/* Returns: {"error":-1} */
Usage Examples
Basic File Operations
f = $file();
/* Create and write to a file */
f.set("test.txt", "Hello, World!");
/* Read file content */
content = f.get("test.txt");
/* List directory contents */
files = f.ls();
/* Navigate directories */
f.cd("docs");
f.pwd(); /* Returns: /docs */
Database Operations
/* Create a column store database */
f.mk("analytics_db", "COL");
f.cd("analytics_db");
/* Create fields */
f.mkfield("id", "INT", "FIX", 4);
f.mkfield("name", "STR", "VAR");
f.mkfield("age", "INT", "FIX", 4);
/* Add data */
f.set("user1", "John Doe", "name");
f.set("user1", 30, "age");
/* Retrieve data */
name = f.get("user1", "name");
age = f.get("user1", "age");
Row Store vs Column Store Example
/* Row store for transactional data */
f.mk("transaction_db", "ROW");
f.cd("transaction_db");
f.mkfield("order_id", "INT", "FIX", 4);
f.mkfield("customer_id", "INT", "FIX", 4);
f.mkfield("amount", "FLOAT", "FIX", 8);
/* Column store for analytical data */
f.mk("analytics_db", "COL");
f.cd("analytics_db");
f.mkfield("date", "TIME", "FIX", 8);
f.mkfield("product_id", "INT", "FIX", 4);
f.mkfield("sales_amount", "FLOAT", "FIX", 8);
f.mkfield("region", "STR", "VAR");
Python Integration
For Python developers, Grapa's file system and database capabilities can be leveraged through the Python integration. The unified API provides seamless access to both file systems and databases, making it ideal for data science, web development, and system administration workflows.
Key Python Use Cases
Data Science and Analytics:
- Column store databases for analytical workloads
- Large file management with built-in splitting capabilities
- Unified data access across different storage types
Web Application Development:
- Backend data management with row store for transactional data
- Content management with flexible field types
- API development with consistent data access patterns
System Administration:
- Log file management with automatic splitting for large files
- Configuration management with unified path navigation
- Data pipeline integration for ETL workflows
For detailed examples and best practices, see the Python Integration Guide.