Database Storage Types
Overview
Grapa supports three distinct database storage types, each optimized for different workloads and use cases:
- COL (Column Store): Optimized for analytical workloads and large datasets
- ROW (Row Store): Optimized for transactional workloads and frequent updates
- GROUP: Optimized for grouped data operations and hierarchical structures
Storage Type Comparison
Aspect | Column Store (COL) | Row Store (ROW) | Group Store (GROUP) |
---|---|---|---|
Storage Pattern | Column-oriented | Row-oriented | Group-oriented |
Best For | Analytical queries | Transactional workloads | Hierarchical data |
Sparse Data | Very efficient | Less efficient | Moderate efficiency |
Point Queries | Slower | Faster | Moderate |
Column Scans | Very fast | Slower | Moderate |
Updates | Block-level | Record-level | Group-level |
Compression | Excellent | Good | Moderate |
Column Store (COL) - Innovative Implementation
Architecture
Column store databases in Grapa are optimized for analytical workloads and large datasets. Unlike row store databases that store data record-by-record, column stores organize data by columns, making them ideal for queries that access specific fields across many records.
Storage Model
- Fragmented Data Storage: Uses FREC_DATA for efficient handling of sparse data
- Column-Oriented: Each field is stored as a separate column
- Dynamic Growth: Columns can grow incrementally as new records are added
- Sparse Data Support: Only stores data where it actually exists
Key Components
- CTABLE_TREE: The main tree structure for column store databases
- FREC_DATA: Fragmented data storage for fixed fields
- SDATA_TREE: Tree structure for variable fields
- Growth Parameters: Configurable growth sizes for efficient storage management
Use Cases
Ideal For
- Analytical Queries: Aggregations, sums, averages across columns
- Data Warehousing: Large datasets with many columns
- Time-Series Data: Historical data analysis
- Sparse Data: Tables with many optional fields
- Column Scans: Queries that access specific fields across records
Not Ideal For
- Frequent Record Updates: Row store is better for transactional workloads
- Point Queries: Individual record lookups
- Small Datasets: Overhead may not be justified
Performance Optimizations
Storage Efficiency
- Sparse Data Handling: Only allocates space for actual data
- Compression: Similar data types can be compressed efficiently
- Block-Level Access: Loads only relevant data blocks into memory
Query Performance
- Column Scans: Fast access to all values in a column
- Aggregations: Efficient computation of sums, averages, etc.
- Cache Locality: Better memory cache utilization for column-oriented access
Field Types and Storage
Fixed Fields (STORE_FIX)
- Use FREC_DATA for fragmented storage
- Automatically set growth size to field size if not specified
- Efficient for small, frequently accessed fields
Variable Fields (STORE_VAR)
- Use SDATA_TREE for tree-based storage
- Good for medium-sized variable data
- Supports efficient updates and deletions
Partitioned Fields (STORE_PAR)
- Use FREC_DATA for large data requiring partial updates
- Ideal for very large data that needs incremental updates
- Used for COL store $TABLE types
Implementation Details
Fragmented Data Storage (FREC_DATA)
Column store fixed fields use fragmented data storage for several key reasons:
- Sparse Data Efficiency: Only creates data blocks when needed, avoiding wasted space for NULL values
- Dynamic Growth: Columns can grow incrementally without reallocating entire data structures
- Update Efficiency: Individual blocks can be updated without affecting the entire column
- Memory Efficiency: Only relevant blocks need to be loaded into memory for queries
Growth Parameter Management
The system automatically sets growth parameters for optimal performance: - Fixed Fields: storeGrow
defaults to storeSize
if not specified - Variable Fields: Uses configurable growth sizes for tree-based storage - Dynamic Adjustment: Growth parameters can be tuned based on usage patterns
Row Store (ROW)
Overview
Row store databases store data record-by-record, making them ideal for transactional workloads where you frequently access complete records.
Use Cases
- Transactional Applications: Banking, e-commerce, user management
- Frequent Updates: Applications with many INSERT/UPDATE operations
- Point Queries: Individual record lookups
- Small to Medium Datasets: Where analytical performance isn't critical
Performance Characteristics
- Fast Point Queries: Quick access to individual records
- Efficient Updates: Record-level updates without affecting other data
- Good for OLTP: Online transaction processing workloads
Group Store (GROUP)
Overview
Group store databases are optimized for hierarchical data structures and grouped operations.
Use Cases
- Hierarchical Data: Organizational charts, file systems, nested structures
- Grouped Operations: Data that naturally forms groups or categories
- Tree-like Structures: Data with parent-child relationships
Performance Characteristics
- Efficient Group Operations: Fast access to grouped data
- Hierarchical Queries: Good performance for tree-like data structures
- Moderate Flexibility: Balanced between row and column stores
Best Practices
Choosing the Right Storage Type
Use Column Store (COL) When:
- You have analytical workloads with aggregations
- Your data is sparse (many NULL values)
- You perform column scans frequently
- You have large datasets (>1M records)
- You need excellent compression
Use Row Store (ROW) When:
- You have transactional workloads
- You frequently update individual records
- You perform many point queries
- You have small to medium datasets
- You need fast individual record access
Use Group Store (GROUP) When:
- Your data has natural hierarchical structure
- You perform grouped operations frequently
- You have tree-like data relationships
- You need balanced performance characteristics
Schema Design
- Choose appropriate field types and storage options
- Consider data sparsity when designing schemas
- Use fixed fields for small, frequently accessed data
Query Optimization
- Design queries to match your storage type's strengths
- Use aggregations and column scans for column stores
- Use point queries for row stores
- Use grouped operations for group stores
Monitoring and Maintenance
Performance Monitoring
- Track access patterns for each storage type
- Monitor storage efficiency
- Analyze query performance
Storage Optimization
- Regular compression analysis for column stores
- Growth parameter tuning
- Data distribution optimization
Maintenance Tasks
- Block-level defragmentation for column stores
- Index maintenance
- Statistics updates
Future Enhancements
Planned Features
- Advanced compression algorithms for all storage types
- Cross-storage-type query optimization
- Materialized views
- Query result caching
Performance Improvements
- Vectorized query execution
- SIMD optimizations
- Parallel query processing
- Adaptive query optimization