Machine Learning in Grapa
Overview
Grapa provides comprehensive vector operations that enable the implementation of various machine learning algorithms. This guide covers the available capabilities, implementation approaches, and practical examples for machine learning tasks.
Available Capabilities
Core Mathematical Operations
Grapa's vector operations provide all the mathematical foundations needed for machine learning:
- Matrix Operations: Multiplication, inversion, transpose, determinant
- Statistical Functions: Mean, standard deviation, covariance, correlation
- Linear Algebra: Solving linear systems, eigenvalues, matrix decomposition
- Element-wise Operations: Addition, subtraction, multiplication, division, power
- Vector Operations: Dot product, norm calculation, element-wise functions
Supported Algorithms
✅ Linear Models
- Linear Regression (Normal Equation)
- Linear Regression (Gradient Descent)
- Ridge Regression (L2 Regularization)
- Lasso Regression (L1 Regularization) - via optimization
✅ Statistical Analysis
- Correlation Analysis
- Feature Scaling and Normalization
- Principal Component Analysis (PCA)
- Covariance Analysis
✅ Model Evaluation
- R-squared (Coefficient of Determination)
- Mean Squared Error (MSE)
- Root Mean Squared Error (RMSE)
- Mean Absolute Error (MAE)
Linear Regression
Overview
Linear regression is a fundamental machine learning algorithm that models the relationship between independent variables (features) and a dependent variable (target) using a linear function.
Mathematical Form: y = β₀ + β₁x₁ + β₂x₂ + ... + βₙxₙ + ε
Implementation Approaches
1. Normal Equation (Closed Form)
The normal equation provides the optimal least-squares solution directly, β = (XᵀX)⁻¹Xᵀy:
/* Linear regression using normal equation */
linear_regression_normal = op(X, y) {
/* Add bias term (intercept) */
X_with_bias = add_bias_column(X);
/* Calculate coefficients using normal equation */
XT = X_with_bias.t();
XTX = XT.dot(X_with_bias);
XTy = XT.dot(y);
/* Solve for coefficients */
coefficients = XTX.inv().dot(XTy);
coefficients;
};
Advantages:
- Exact solution (no iterations needed)
- Fast for small to medium datasets
- No hyperparameters to tune
Limitations:
- O(n³) complexity for matrix inversion
- Memory intensive for large datasets
- May fail for singular matrices
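The helper add_bias_column used in the code above is not defined elsewhere in this guide. The following is a minimal sketch assembled from the same loop, list-building, and .vector() patterns used throughout this document; treat it as illustrative rather than a built-in:
/* Prepend a column of 1s (the intercept term) to a feature matrix - illustrative sketch */
add_bias_column = op(X) {
n_rows = X.shape().get(0);
n_cols = X.shape().get(1);
X_bias = [];
i = 0;
while (i < n_rows) {
row = [];
row += 1.0; /* bias (intercept) entry */
j = 0;
while (j < n_cols) {
row += X.get(i).get(j);
j = j + 1;
}
X_bias += row;
i = i + 1;
}
X_bias.vector();
};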
2. Gradient Descent (Iterative)
Gradient descent iteratively updates the coefficients in the direction that reduces the cost function (mean squared error), β ← β − α∇J(β), where α is the learning rate:
/* Linear regression using gradient descent */
linear_regression_gradient = op(X, y, learning_rate, iterations) {
/* Initialize coefficients */
n_features = X.shape().get(1);
coefficients = [];
i = 0;
while (i < n_features) {
coefficients += 0.0;
i = i + 1;
}
coefficients = coefficients.vector();
/* Gradient descent */
i = 0;
while (i < iterations) {
/* Predictions */
predictions = X.dot(coefficients);
/* Calculate gradients */
errors = predictions - y; /* residuals; this sign makes the update below a descent (not ascent) step */
gradients = X.t().dot(errors) * (2.0 / X.shape().get(0));
/* Update coefficients */
coefficients = coefficients - (gradients * learning_rate);
i = i + 1;
}
coefficients;
};
Advantages:
- Works with large datasets
- Memory efficient
- Can handle non-linear cost functions
Limitations:
- Requires hyperparameter tuning (learning rate, iterations)
- May converge to local minima (not an issue for the convex linear-regression cost, but relevant for other cost functions)
- Slower for small datasets
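A minimal usage sketch for the function above (the data values are illustrative, and the bias column is included by hand so no helper is required):
/* Fit y = 1 + 2x on a tiny dataset */
X = [[1, 1], [1, 2], [1, 3], [1, 4]].vector(); /* first column is the bias term */
y = [3, 5, 7, 9].vector();
coefficients = linear_regression_gradient(X, y, 0.05, 500);
coefficients; /* should approach [1, 2] */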
3. Ridge Regression (L2 Regularization)
Ridge regression adds an L2 penalty to prevent overfitting; the closed-form solution becomes β = (XᵀX + λI)⁻¹Xᵀy, where the diagonal entry for the bias term is left at zero so the intercept is not penalized:
/* Ridge regression with regularization */
ridge_regression = op(X, y, lambda) {
/* Add bias term */
X_with_bias = add_bias_column(X);
/* Add regularization term */
n_features = X_with_bias.shape().get(1);
regularization_matrix = [];
i = 0;
while (i < n_features) {
row = [];
j = 0;
while (j < n_features) {
if (i == j && i > 0) { /* Don't regularize bias term */
row += lambda;
} else {
row += 0.0;
}
j = j + 1;
}
regularization_matrix += row;
i = i + 1;
}
regularization_matrix = regularization_matrix.vector();
/* Calculate coefficients */
XT = X_with_bias.t();
XTX = XT.dot(X_with_bias);
XTX_reg = XTX + regularization_matrix;
XTy = XT.dot(y);
coefficients = XTX_reg.inv().dot(XTy);
coefficients;
};
Advantages:
- Prevents overfitting
- Handles multicollinearity
- Numerically stable
Limitations:
- Requires tuning regularization parameter
- May underfit if lambda is too large
Data Preprocessing
Feature Scaling
Standardization (Z-score normalization) rescales each feature to zero mean and unit variance, z = (x − μ) / σ, and is crucial for gradient-based and regularized algorithms:
/* Standardization (Z-score normalization) */
standardize_features = op(X) {
means = X.mean(0); /* Mean of each feature */
stds = X.std(0); /* Standard deviation of each feature */
/* Standardize each feature */
X_standardized = [];
i = 0;
while (i < X.shape().get(0)) {
row = [];
j = 0;
while (j < X.shape().get(1)) {
standardized_val = (X.get(i).get(j) - means.get(j)) / stds.get(j);
row += standardized_val;
j = j + 1;
}
X_standardized += row;
i = i + 1;
}
X_standardized.vector();
};
Feature Normalization
Min-max normalization scales each feature to the [0, 1] range, x′ = (x − min) / (max − min):
/* Min-max normalization */
normalize_features = op(X) {
mins = X.min(0); /* Minimum of each feature */
maxs = X.max(0); /* Maximum of each feature */
/* Normalize each feature */
X_normalized = [];
i = 0;
while (i < X.shape().get(0)) {
row = [];
j = 0;
while (j < X.shape().get(1)) {
normalized_val = (X.get(i).get(j) - mins.get(j)) / (maxs.get(j) - mins.get(j));
row += normalized_val;
j = j + 1;
}
X_normalized += row;
i = i + 1;
}
X_normalized.vector();
};
Model Evaluation
Performance Metrics
/* Calculate R-squared (coefficient of determination) */
calculate_r_squared = op(y_true, y_pred) {
y_mean = y_true.mean();
ss_total = ((y_true - y_mean) * #[op(x){x**2}]#).sum();
ss_residual = ((y_true - y_pred) * #[op(x){x**2}]#).sum();
r_squared = 1.0 - (ss_residual.get(0) / ss_total.get(0));
r_squared;
};
/* Calculate Mean Squared Error */
calculate_mse = op(y_true, y_pred) {
mse = ((y_true - y_pred) * #[op(x){x**2}]#).mean().get(0);
mse;
};
/* Calculate Root Mean Squared Error */
calculate_rmse = op(y_true, y_pred) {
mse = calculate_mse(y_true, y_pred);
rmse = mse ** 0.5;
rmse;
};
/* Calculate Mean Absolute Error */
calculate_mae = op(y_true, y_pred) {
mae = ((y_true - y_pred) * #[op(x){x < 0 ? -x : x}]#).mean().get(0);
mae;
};
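A brief usage sketch for the metrics above (values are illustrative):
/* Evaluate a set of predictions against the true targets */
y_true = [3.0, 5.0, 7.0, 9.0].vector();
y_pred = [2.9, 5.1, 6.8, 9.3].vector();
r_squared = calculate_r_squared(y_true, y_pred);
rmse = calculate_rmse(y_true, y_pred);
mae = calculate_mae(y_true, y_pred);
("R-squared: " + r_squared).echo();
("RMSE: " + rmse).echo();
("MAE: " + mae).echo();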
Complete Linear Regression Implementation
Full Class Implementation
/* Complete linear regression implementation */
LinearRegression = {
/* Initialize model */
init: op() {
$this.coefficients = null;
$this.feature_means = null;
$this.feature_stds = null;
$this.is_fitted = false;
},
/* Fit the model */
fit: op(X, y, method = "normal", lambda = 0.0) {
/* Store feature statistics for scaling */
$this.feature_means = X.mean(0);
$this.feature_stds = X.std(0);
/* Scale features */
X_scaled = $this.standardize_features(X);
/* Add bias term */
X_with_bias = $this.add_bias_column(X_scaled);
if (method == "normal") {
/* Normal equation method */
$this.coefficients = $this.normal_equation(X_with_bias, y);
} else if (method == "gradient") {
/* Gradient descent method */
$this.coefficients = $this.gradient_descent(X_with_bias, y, 0.01, 1000);
} else if (method == "ridge") {
/* Ridge regression */
$this.coefficients = $this.ridge_regression(X_with_bias, y, lambda);
}
$this.is_fitted = true;
},
/* Make predictions */
predict: op(X) {
if (!$this.is_fitted) {
throw "Model not fitted. Call fit() first.";
}
/* Scale features */
X_scaled = $this.standardize_features(X);
/* Add bias term */
X_with_bias = $this.add_bias_column(X_scaled);
/* Make predictions */
predictions = X_with_bias.dot($this.coefficients);
predictions;
},
/* Calculate performance metrics */
score: op(X, y) {
predictions = $this.predict(X);
r_squared = $this.calculate_r_squared(y, predictions);
mse = $this.calculate_mse(y, predictions);
rmse = $this.calculate_rmse(y, predictions);
mae = $this.calculate_mae(y, predictions);
{
"r_squared": r_squared,
"mse": mse,
"rmse": rmse,
"mae": mae
};
},
/* Helper methods */
standardize_features: op(X) {
/* Implementation as shown above */
},
add_bias_column: op(X) {
/* Implementation as shown above */
},
normal_equation: op(X, y) {
/* Implementation as shown above */
},
gradient_descent: op(X, y, learning_rate, iterations) {
/* Implementation as shown above */
},
ridge_regression: op(X, y, lambda) {
/* Implementation as shown above */
},
calculate_r_squared: op(y_true, y_pred) {
/* Implementation as shown above */
},
calculate_mse: op(y_true, y_pred) {
/* Implementation as shown above */
},
calculate_rmse: op(y_true, y_pred) {
/* Implementation as shown above */
},
calculate_mae: op(y_true, y_pred) {
/* Implementation as shown above */
}
};
Usage Examples
Basic Linear Regression
/* Create and fit a linear regression model */
model = LinearRegression();
model.init();
/* Prepare data */
X = [[1, 2], [3, 4], [5, 6], [7, 8]].vector();
y = [10, 20, 30, 40].vector();
/* Fit the model */
model.fit(X, y, "normal");
/* Make predictions */
predictions = model.predict(X);
/* Evaluate performance */
metrics = model.score(X, y);
("R-squared: " + metrics.get("r_squared")).echo();
("RMSE: " + metrics.get("rmse")).echo();
Ridge Regression
/* Ridge regression with regularization */
model = LinearRegression();
model.init();
/* Fit with ridge regression */
model.fit(X, y, "ridge", 0.1);
/* Compare different lambda values */
lambda_values = [0.0, 0.1, 1.0, 10.0];
lambda_values.each(op(lambda) {
model.fit(X, y, "ridge", lambda);
metrics = model.score(X, y);
("Lambda=" + lambda + ": R²=" + metrics.get("r_squared")).echo();
});
Gradient Descent
/* Gradient descent for large datasets */
model = LinearRegression();
model.init();
/* Fit using gradient descent */
model.fit(X, y, "gradient");
/* Custom learning rate and iterations: the fit() above hard-codes 0.01 and 1000, so */
/* extend its signature first (e.g. fit(X, y, method, lambda, learning_rate, iterations)) */
model.fit(X, y, "gradient", 0.0, 0.001, 2000);
Performance Considerations
Algorithm Selection
| Dataset Size | Recommended Method | Performance Expectation |
|---|---|---|
| < 100 samples | Normal Equation | < 1ms |
| 100-1000 samples | Normal Equation | < 100ms |
| 1000-10000 samples | Gradient Descent | < 1min |
| > 10000 samples | Gradient Descent | Variable |
Memory Usage
| Dataset Size | Memory Usage | Considerations |
|---|---|---|
| < 100 samples | < 1KB | No concerns |
| 100-1000 samples | < 100KB | Efficient |
| 1000-10000 samples | < 1MB | Monitor usage |
| > 10000 samples | > 10MB | Consider chunking |
Parallel Model Training
Grapa's threading capabilities provide a significant advantage over Python for machine learning workflows that benefit from parallel computation:
Hyperparameter Tuning
/* Parallel hyperparameter search - all models train simultaneously */
lambda_values = [0.001, 0.01, 0.1, 1.0, 10.0, 100.0];
ridge_results = lambda_values.map(op(lambda_val) {
/* Each model trains in its own thread */
model = train_ridge_regression(X, y, lambda_val);
{lambda: lambda_val, r2: model.r2, rmse: model.rmse};
});
/* Results available immediately after all threads complete */
best_model = ridge_results.max(op(result) { result.r2; });
Cross-Validation
/* Parallel cross-validation folds */
cv_folds = [0, 1, 2, 3, 4];
cv_scores = cv_folds.map(op(fold) {
/* Each fold trains in parallel */
/* Note: Copy shared data for thread safety with vectors */
X_train, X_val, y_train, y_val = split_data(X.copy(), y.copy(), fold);
model = train_model(X_train, y_train);
model.score(X_val, y_val);
});
/* Calculate mean CV score */
mean_cv_score = cv_scores.mean();
Model Comparison
/* Compare multiple algorithms in parallel */
algorithms = ["linear", "ridge", "lasso"];
results = algorithms.map(op(algorithm) {
model = train_model(X, y, algorithm);
{algorithm: algorithm, score: model.score(X_test, y_test)};
});
Thread Safety with Vectors
Important: When using vectors in parallel operations, you may need to copy shared data for thread safety:
/* Thread-safe parallel hyperparameter tuning */
lambda_values = [0.001, 0.01, 0.1, 1.0];
ridge_results = lambda_values.map(op(lambda_val) {
/* Copy shared vectors for thread safety */
X_copy = X.copy();
y_copy = y.copy();
/* Each thread works with its own copy */
model = train_ridge_regression(X_copy, y_copy, lambda_val);
{lambda: lambda_val, r2: model.r2, rmse: model.rmse};
});
Why copying is needed:
- Vectors may have internal state that's not thread-safe
- Copying ensures each thread has independent data
- Prevents race conditions and data corruption
- Required for reliable parallel vector operations
Advantages over Python:
- No GIL limitations - true parallel CPU execution
- Shared memory - no data serialization overhead
- Simple syntax - automatic thread management with .map()
- Lightweight threads - minimal creation overhead
Performance Benefits:
- 4x speedup for 4 parallel models (vs sequential)
- Memory efficient - shared data across threads (with copying when needed)
- Scalable - easily parallelize any independent computations
Precision Optimization
For performance-critical machine learning applications, you can significantly improve training speed by reducing floating-point precision:
/* Set system precision for machine learning optimization */
// ✅ High performance - 32-bit precision (~2x faster)
32.setfloat(0);
model.fit(X, y, "normal"); // Much faster training
// ✅ Balanced - 64-bit precision (good speed/accuracy)
64.setfloat(0);
model.fit(X, y, "normal"); // Good performance
// ✅ High accuracy - 128-bit precision (default)
128.setfloat(0);
model.fit(X, y, "normal"); // Maximum accuracy
Performance Impact:
- 32-bit precision: ~2x faster than 128-bit precision
- 64-bit precision: ~1.1x faster than 128-bit precision
- 128-bit precision: Maximum accuracy (default)
Fixed-Point vs Floating-Point for Machine Learning
Grapa automatically switches between floating-point and fixed-point representations depending on the mathematical function, with internal optimizations in GrapaFloat.cpp that choose the best representation for each operation. For most machine learning applications, the system default choice is sufficient:
/* System automatically chooses optimal representation per operation */
32.setfloat(0); // Sets system preference, but Grapa optimizes internally
model.fit(X, y, "normal");
/* Alternative system preference */
32.setfix(0); // Sets system preference, but Grapa optimizes internally
model.fit(X, y, "normal");
For Machine Learning:
- System optimization: Grapa's internal functions automatically choose the best representation for each mathematical operation
- Minimal impact: For most ML applications, the choice between setfloat() and setfix() has minimal impact on results
- Focus on precision: The bit precision (32, 64, 128) has much more impact than the float/fix choice
- Default recommendation: Use setfloat() as the default unless you have specific requirements:
When to Use Lower Precision:
- ✅ Machine learning applications (32-bit often sufficient)
- ✅ Real-time inference requiring maximum speed
- ✅ Batch processing of large datasets
- ✅ When accuracy requirements allow for lower precision
- ✅ Prototyping and experimentation
When to Use Higher Precision:
- ✅ Financial modeling requiring exact precision
- ✅ Scientific computing with strict accuracy requirements
- ✅ Final model deployment where accuracy is critical
- ✅ When precision is more important than speed
Precision Performance Example:
/* Linear regression with different precision settings */
n_samples = 10000;
// 32-bit precision - Fast training (Grapa optimizes representation internally)
32.setfloat(0);
start_time = $TIME().utc();
model.fit(X, y, "normal");
training_time_32bit = ($TIME().utc() - start_time).ms(); // elapsed training time, ~277ms
// 128-bit precision - Accurate training (Grapa optimizes representation internally)
128.setfloat(0);
start_time = $TIME().utc();
model.fit(X, y, "normal");
training_time_128bit = ($TIME().utc() - start_time).ms(); // elapsed training time, ~529ms
// 32-bit is ~1.9x faster with minimal accuracy loss
Best Practices
- Always scale features before training
- Use ridge regression for numerical stability
- Monitor performance for large datasets
- Validate results with multiple metrics
- Handle edge cases (singular matrices, etc.)
- Leverage parallel execution for hyperparameter tuning and cross-validation
- Use .map() for independent computations - automatic parallelization
- Copy shared vectors in parallel operations - required for thread safety
- Consider precision optimization for performance-critical applications
- Use 32-bit precision for machine learning when speed is important
- Use 128-bit precision for final models when accuracy is critical
- Use setfloat() as default - Grapa optimizes representation internally
- Focus on bit precision rather than float/fix choice for ML applications
Advanced Topics
Principal Component Analysis (PCA)
/* Simple PCA implementation */
pca = op(X, n_components) {
/* Center the data */
means = X.mean(0);
X_centered = X - means;
/* Calculate covariance matrix */
cov_matrix = X_centered.cov();
/* Get eigenvalues and eigenvectors */
eigen_result = cov_matrix.eigh();
/* Sort by eigenvalues and select top components */
/* Implementation details depend on eigen_result structure */
/* Project data onto principal components */
/* X_pca = X_centered.dot(eigenvectors) */
};
Cross-Validation
/* K-fold cross-validation */
cross_validate = op(X, y, k_folds, model_type) {
n_samples = X.shape().get(0);
fold_size = n_samples / k_folds;
scores = [];
i = 0;
while (i < k_folds) {
/* Split data into train/test */
start_idx = i * fold_size;
end_idx = (i + 1) * fold_size;
/* Create train/test splits */
/* Implementation details for data splitting */
/* Train and evaluate model */
model = LinearRegression();
model.init();
model.fit(X_train, y_train, model_type);
metrics = model.score(X_test, y_test);
scores += metrics.get("r_squared");
i = i + 1;
}
/* Return average score */
scores.mean();
};
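The train/test split above is left as a placeholder. One possible contiguous-fold splitter is sketched below using the same loop and list-building patterns as earlier sections; the name split_fold and the returned keys are illustrative, not built-ins:
/* Rows in [start_idx, end_idx) become the validation fold; the rest become training data */
split_fold = op(X, y, start_idx, end_idx) {
X_train = [];
y_train = [];
X_test = [];
y_test = [];
r = 0;
n_samples = X.shape().get(0);
while (r < n_samples) {
if (r >= start_idx && r < end_idx) {
X_test += X.get(r);
y_test += y.get(r);
} else {
X_train += X.get(r);
y_train += y.get(r);
}
r = r + 1;
}
{
"X_train": X_train.vector(),
"y_train": y_train.vector(),
"X_test": X_test.vector(),
"y_test": y_test.vector()
};
};
Shuffling rows before splitting is usually advisable; this sketch keeps folds contiguous for simplicity.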
Examples and References
For complete working examples, see:
- Linear Regression Example - Complete implementation with data generation, training, and evaluation
- Vector Operations - Core vector operations used in machine learning
- Performance Examples - Performance optimization techniques
See Also
- $VECTOR object - Core vector operations
- Vector Performance Guide - Performance optimization
- Statistical Functions - Built-in statistical operations
- Mathematical Operations - Mathematical operators
Conclusion
Grapa's vector operations provide a comprehensive foundation for machine learning, particularly linear models. The mathematical capabilities are complete, performance is excellent for typical use cases, and the implementation is straightforward and practical.
Key strengths:
- Complete mathematical foundation for linear models
- Multiple implementation approaches (normal equation, gradient descent, regularization)
- Comprehensive evaluation metrics and statistical analysis
- Excellent performance for small to medium datasets
- True parallel execution for hyperparameter tuning and cross-validation
- Easy to implement and use with clear, readable code
Competitive Advantages over Python:
- No GIL limitations - true parallel CPU execution for model training
- Shared memory threading - no data serialization overhead
- Simple parallel syntax - automatic thread management with .map()
- Lightweight threads - minimal creation overhead
This makes Grapa a viable platform for implementing machine learning algorithms, particularly for educational purposes, prototyping, hyperparameter tuning, and small to medium-scale applications where parallel execution provides significant performance benefits.