Machine Learning in Grapa

Overview

Grapa provides comprehensive vector operations that enable the implementation of various machine learning algorithms. This guide covers the available capabilities, implementation approaches, and practical examples for machine learning tasks.

Available Capabilities

Core Mathematical Operations

Grapa's vector operations provide all the mathematical foundations needed for machine learning:

  • Matrix Operations: Multiplication, inversion, transpose, determinant
  • Statistical Functions: Mean, standard deviation, covariance, correlation
  • Linear Algebra: Solving linear systems, eigenvalues, matrix decomposition
  • Element-wise Operations: Addition, subtraction, multiplication, division, power
  • Vector Operations: Dot product, norm calculation, element-wise functions

Supported Algorithms

Linear Models

  • Linear Regression (Normal Equation)
  • Linear Regression (Gradient Descent)
  • Ridge Regression (L2 Regularization)
  • Lasso Regression (L1 Regularization) - via optimization

Statistical Analysis

  • Correlation Analysis
  • Feature Scaling and Normalization
  • Principal Component Analysis (PCA)
  • Covariance Analysis

Model Evaluation

  • R-squared (Coefficient of Determination)
  • Mean Squared Error (MSE)
  • Root Mean Squared Error (RMSE)
  • Mean Absolute Error (MAE)

Linear Regression

Overview

Linear regression is a fundamental machine learning algorithm that models the relationship between independent variables (features) and a dependent variable (target) using a linear function.

Mathematical Form: y = β₀ + β₁x₁ + β₂x₂ + ... + βₙxₙ + ε

Implementation Approaches

1. Normal Equation (Closed Form)

The normal equation computes the least-squares coefficients directly in closed form, β = (XᵀX)⁻¹Xᵀy:

/* Linear regression using normal equation */
linear_regression_normal = op(X, y) {
    /* Add bias term (intercept) */
    X_with_bias = add_bias_column(X);

    /* Calculate coefficients using normal equation */
    XT = X_with_bias.t();
    XTX = XT.dot(X_with_bias);
    XTy = XT.dot(y);

    /* Solve for coefficients */
    coefficients = XTX.inv().dot(XTy);
    coefficients;
};
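
The add_bias_column helper used above (and again in the ridge regression example) is not a built-in; a minimal sketch in the same loop style, prepending a column of 1s so that the first coefficient becomes the intercept, might look like this:

/* Prepend a column of 1s so the first coefficient acts as the intercept */
/* (illustrative helper, not a built-in Grapa function) */
add_bias_column = op(X) {
    X_with_bias = [];
    i = 0;
    while (i < X.shape().get(0)) {
        row = [1.0];
        j = 0;
        while (j < X.shape().get(1)) {
            row += X.get(i).get(j);
            j = j + 1;
        }
        X_with_bias += row;
        i = i + 1;
    }
    X_with_bias.vector();
};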

Advantages:

  • Exact solution (no iterations needed)
  • Fast for small to medium datasets
  • No hyperparameters to tune

Limitations:

  • O(n³) complexity for matrix inversion
  • Memory intensive for large datasets
  • May fail for singular matrices

2. Gradient Descent (Iterative)

Gradient descent iteratively updates coefficients to minimize the cost function:

/* Linear regression using gradient descent */
linear_regression_gradient = op(X, y, learning_rate, iterations) {
    /* Initialize coefficients */
    n_features = X.shape().get(1);
    coefficients = [];
    i = 0;
    while (i < n_features) {
        coefficients += 0.0;
        i = i + 1;
    }
    coefficients = coefficients.vector();

    /* Gradient descent */
    i = 0;
    while (i < iterations) {
        /* Predictions */
        predictions = X.dot(coefficients);

        /* Calculate MSE gradients: (2/n) * X^T * (predictions - y) */
        errors = predictions - y;
        gradients = X.t().dot(errors) * (2.0 / X.shape().get(0));

        /* Update coefficients */
        coefficients = coefficients - (gradients * learning_rate);

        i = i + 1;
    }
    coefficients;
};

Advantages:

  • Works with large datasets
  • Memory efficient
  • Can handle non-linear cost functions

Limitations:

  • Requires hyperparameter tuning (learning rate, iterations)
  • Sensitive to the learning rate; may converge slowly or diverge (the MSE cost for linear regression is convex, so local minima are not the concern here)
  • Slower for small datasets
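
As a quick illustration (values chosen arbitrarily, and assuming an op assigned to a variable can be called directly and that .echo() can display a vector), the function can be run on a small matrix. In practice, standardize the features first (see Data Preprocessing below) so a single learning rate works well across columns:

/* Illustrative call: 4 samples, 2 features, learning rate 0.01, 1000 iterations */
X = [[1, 2], [3, 4], [5, 6], [7, 8]].vector();
y = [10, 20, 30, 40].vector();
coefficients = linear_regression_gradient(X, y, 0.01, 1000);
coefficients.echo();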

3. Ridge Regression (L2 Regularization)

Ridge regression adds an L2 penalty on the coefficients to prevent overfitting. Its closed-form solution is β = (XᵀX + λI)⁻¹Xᵀy, where the diagonal entry for the bias term is left at zero so the intercept is not penalized:

/* Ridge regression with regularization */
ridge_regression = op(X, y, lambda) {
    /* Add bias term */
    X_with_bias = add_bias_column(X);

    /* Add regularization term */
    n_features = X_with_bias.shape().get(1);
    regularization_matrix = [];
    i = 0;
    while (i < n_features) {
        row = [];
        j = 0;
        while (j < n_features) {
            if (i == j && i > 0) {  /* Don't regularize bias term */
                row += lambda;
            } else {
                row += 0.0;
            }
            j = j + 1;
        }
        regularization_matrix += row;
        i = i + 1;
    }
    regularization_matrix = regularization_matrix.vector();

    /* Calculate coefficients */
    XT = X_with_bias.t();
    XTX = XT.dot(X_with_bias);
    XTX_reg = XTX + regularization_matrix;
    XTy = XT.dot(y);

    coefficients = XTX_reg.inv().dot(XTy);
    coefficients;
};

Advantages:

  • Prevents overfitting
  • Handles multicollinearity
  • Numerically stable

Limitations:

  • Requires tuning the regularization parameter
  • May underfit if lambda is too large

Data Preprocessing

Feature Scaling

Standardization (Z-score normalization) is crucial for many algorithms:

/* Standardization (Z-score normalization) */
standardize_features = op(X) {
    means = X.mean(0);                   /* Mean of each feature */
    stds = X.std(0);                     /* Standard deviation of each feature */

    /* Standardize each feature */
    X_standardized = [];
    i = 0;
    while (i < X.shape().get(0)) {
        row = [];
        j = 0;
        while (j < X.shape().get(1)) {
            standardized_val = (X.get(i).get(j) - means.get(j)) / stds.get(j);
            row += standardized_val;
            j = j + 1;
        }
        X_standardized += row;
        i = i + 1;
    }
    X_standardized.vector();
};
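
A short sanity check (illustrative values, assuming .echo() can display vectors): after standardization each column should have mean approximately 0 and standard deviation approximately 1:

/* Illustrative check of standardize_features */
X = [[1, 200], [2, 300], [3, 400]].vector();
X_std = standardize_features(X);
X_std.mean(0).echo();   /* each column ~0 */
X_std.std(0).echo();    /* each column ~1 */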

Feature Normalization

Min-max normalization scales features to a specific range:

/* Min-max normalization */
normalize_features = op(X) {
    mins = X.min(0);                     /* Minimum of each feature */
    maxs = X.max(0);                     /* Maximum of each feature */

    /* Normalize each feature */
    X_normalized = [];
    i = 0;
    while (i < X.shape().get(0)) {
        row = [];
        j = 0;
        while (j < X.shape().get(1)) {
            normalized_val = (X.get(i).get(j) - mins.get(j)) / (maxs.get(j) - mins.get(j));
            row += normalized_val;
            j = j + 1;
        }
        X_normalized += row;
        i = i + 1;
    }
    X_normalized.vector();
};
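
The same kind of check applies to min-max scaling. Note that a constant column (max equal to min) would cause a division by zero, so guard against that case in real data:

/* Illustrative check of normalize_features */
X = [[10, 0.5], [20, 1.5], [30, 2.5]].vector();
X_norm = normalize_features(X);
X_norm.min(0).echo();   /* each column 0 */
X_norm.max(0).echo();   /* each column 1 */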

Model Evaluation

Performance Metrics

/* Calculate R-squared (coefficient of determination) */
calculate_r_squared = op(y_true, y_pred) {
    y_mean = y_true.mean();
    ss_total = ((y_true - y_mean) * #[op(x){x**2}]#).sum();
    ss_residual = ((y_true - y_pred) * #[op(x){x**2}]#).sum();
    r_squared = 1.0 - (ss_residual.get(0) / ss_total.get(0));
    r_squared;
};

/* Calculate Mean Squared Error */
calculate_mse = op(y_true, y_pred) {
    mse = ((y_true - y_pred) * #[op(x){x**2}]#).mean().get(0);
    mse;
};

/* Calculate Root Mean Squared Error */
calculate_rmse = op(y_true, y_pred) {
    mse = calculate_mse(y_true, y_pred);
    rmse = mse ** 0.5;
    rmse;
};

/* Calculate Mean Absolute Error */
calculate_mae = op(y_true, y_pred) {
    mae = ((y_true - y_pred) * #[op(x){x < 0 ? -x : x}]#).mean().get(0);
    mae;
};
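
A toy example (hand-picked values) shows how the metrics relate: RMSE is the square root of MSE, and RMSE is always at least as large as MAE:

/* Illustrative evaluation on a small prediction */
y_true = [3.0, 5.0, 7.0, 9.0].vector();
y_pred = [2.8, 5.1, 7.3, 8.9].vector();
("R-squared: " + calculate_r_squared(y_true, y_pred)).echo();
("MSE: " + calculate_mse(y_true, y_pred)).echo();
("RMSE: " + calculate_rmse(y_true, y_pred)).echo();
("MAE: " + calculate_mae(y_true, y_pred)).echo();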

Complete Linear Regression Implementation

Full Class Implementation

/* Complete linear regression implementation */
LinearRegression = {
    /* Initialize model */
    init: op() {
        $this.coefficients = null;
        $this.feature_means = null;
        $this.feature_stds = null;
        $this.is_fitted = false;
    },

    /* Fit the model */
    fit: op(X, y, method = "normal", lambda = 0.0, learning_rate = 0.01, iterations = 1000) {
        /* Store feature statistics for scaling */
        $this.feature_means = X.mean(0);
        $this.feature_stds = X.std(0);

        /* Scale features */
        X_scaled = $this.standardize_features(X);

        /* Add bias term */
        X_with_bias = $this.add_bias_column(X_scaled);

        if (method == "normal") {
            /* Normal equation method */
            $this.coefficients = $this.normal_equation(X_with_bias, y);
        } else if (method == "gradient") {
            /* Gradient descent method */
            $this.coefficients = $this.gradient_descent(X_with_bias, y, learning_rate, iterations);
        } else if (method == "ridge") {
            /* Ridge regression */
            $this.coefficients = $this.ridge_regression(X_with_bias, y, lambda);
        }

        $this.is_fitted = true;
    },

    /* Make predictions */
    predict: op(X) {
        if (!$this.is_fitted) {
            throw "Model not fitted. Call fit() first.";
        }

        /* Scale features */
        X_scaled = $this.standardize_features(X);

        /* Add bias term */
        X_with_bias = $this.add_bias_column(X_scaled);

        /* Make predictions */
        predictions = X_with_bias.dot($this.coefficients);
        predictions;
    },

    /* Calculate performance metrics */
    score: op(X, y) {
        predictions = $this.predict(X);
        r_squared = $this.calculate_r_squared(y, predictions);
        mse = $this.calculate_mse(y, predictions);
        rmse = $this.calculate_rmse(y, predictions);
        mae = $this.calculate_mae(y, predictions);

        {
            "r_squared": r_squared,
            "mse": mse,
            "rmse": rmse,
            "mae": mae
        };
    },

    /* Helper methods */
    standardize_features: op(X) {
        /* Implementation as shown above */
    },

    add_bias_column: op(X) {
        /* Implementation as shown above */
    },

    normal_equation: op(X, y) {
        /* Implementation as shown above */
    },

    gradient_descent: op(X, y, learning_rate, iterations) {
        /* Implementation as shown above */
    },

    ridge_regression: op(X, y, lambda) {
        /* Implementation as shown above */
    },

    calculate_r_squared: op(y_true, y_pred) {
        /* Implementation as shown above */
    },

    calculate_mse: op(y_true, y_pred) {
        /* Implementation as shown above */
    },

    calculate_rmse: op(y_true, y_pred) {
        /* Implementation as shown above */
    },

    calculate_mae: op(y_true, y_pred) {
        /* Implementation as shown above */
    }
};

Usage Examples

Basic Linear Regression

/* Create and fit a linear regression model */
model = LinearRegression();
model.init();

/* Prepare data */
X = [[1, 2], [3, 4], [5, 6], [7, 8]].vector();
y = [10, 20, 30, 40].vector();

/* Fit the model */
model.fit(X, y, "normal");

/* Make predictions */
predictions = model.predict(X);

/* Evaluate performance */
metrics = model.score(X, y);
("R-squared: " + metrics.get("r_squared")).echo();
("RMSE: " + metrics.get("rmse")).echo();

Ridge Regression

/* Ridge regression with regularization */
model = LinearRegression();
model.init();

/* Fit with ridge regression */
model.fit(X, y, "ridge", 0.1);

/* Compare different lambda values */
lambda_values = [0.0, 0.1, 1.0, 10.0];
lambda_values.each(op(lambda) {
    model.fit(X, y, "ridge", lambda);
    metrics = model.score(X, y);
    ("Lambda=" + lambda + ": R²=" + metrics.get("r_squared")).echo();
});

Gradient Descent

/* Gradient descent for large datasets */
model = LinearRegression();
model.init();

/* Fit using gradient descent */
model.fit(X, y, "gradient");

/* Custom learning rate and iterations (lambda is unused for gradient descent) */
model.fit(X, y, "gradient", 0.0, 0.001, 2000);

Performance Considerations

Algorithm Selection

Dataset Size          Recommended Method    Performance Expectation
< 100 samples         Normal Equation       < 1ms
100-1000 samples      Normal Equation       < 100ms
1000-10000 samples    Gradient Descent      < 1min
> 10000 samples       Gradient Descent      Variable

Memory Usage

Dataset Size          Memory Usage    Considerations
< 100 samples         < 1KB           No concerns
100-1000 samples      < 100KB         Efficient
1000-10000 samples    < 1MB           Monitor usage
> 10000 samples       > 10MB          Consider chunking

Precision Optimization

For performance-critical machine learning applications, you can significantly improve training speed by reducing floating-point precision:

/* Set system precision for machine learning optimization */
// ✅ High performance - 32-bit precision (~2x faster)
32.setfloat(0);
model.fit(X, y, "normal");  // Much faster training

// ✅ Balanced - 64-bit precision (good speed/accuracy)
64.setfloat(0);
model.fit(X, y, "normal");  // Good performance

// ✅ High accuracy - 128-bit precision (default)
128.setfloat(0);
model.fit(X, y, "normal");  // Maximum accuracy

Performance Impact:

  • 32-bit precision: ~2x faster than 128-bit precision
  • 64-bit precision: ~1.1x faster than 128-bit precision
  • 128-bit precision: Maximum accuracy (default)

Fixed-Point vs Floating-Point for Machine Learning

Grapa automatically switches between floating-point and fixed-point representations depending on the mathematical function, with internal optimizations in GrapaFloat.cpp that choose the best representation for each operation. For most machine learning applications, the system default choice is sufficient:

/* System automatically chooses optimal representation per operation */
32.setfloat(0);  // Sets system preference, but Grapa optimizes internally
model.fit(X, y, "normal");

/* Alternative system preference */
32.setfix(0);    // Sets system preference, but Grapa optimizes internally  
model.fit(X, y, "normal");

For Machine Learning:

  • System optimization: Grapa's internal functions automatically choose the best representation for each mathematical operation
  • Minimal impact: For most ML applications, the choice between setfloat() and setfix() has minimal impact on results
  • Focus on precision: The bit precision (32, 64, 128) has much more impact than the float/fix choice
  • Default recommendation: Use setfloat() as the default unless you have specific requirements

When to Use Lower Precision:

  • ✅ Machine learning applications (32-bit often sufficient)
  • ✅ Real-time inference requiring maximum speed
  • ✅ Batch processing of large datasets
  • ✅ When accuracy requirements allow for lower precision
  • ✅ Prototyping and experimentation

When to Use Higher Precision:

  • ✅ Financial modeling requiring exact precision
  • ✅ Scientific computing with strict accuracy requirements
  • ✅ Final model deployment where accuracy is critical
  • ✅ When precision is more important than speed

Precision Performance Example:

/* Linear regression with different precision settings */
n_samples = 10000;

// 32-bit precision - Fast training (Grapa optimizes representation internally)
32.setfloat(0);
start_time = $TIME().utc();
model.fit(X, y, "normal");
training_time_32bit = start_time.ms();  // ~277ms

// 128-bit precision - Accurate training (Grapa optimizes representation internally)
128.setfloat(0);
start_time = $TIME().utc();
model.fit(X, y, "normal");
training_time_128bit = start_time.ms();  // ~529ms

// 32-bit is ~1.9x faster with minimal accuracy loss

Best Practices

  1. Always scale features before training
  2. Use ridge regression for numerical stability
  3. Monitor performance for large datasets
  4. Validate results with multiple metrics
  5. Handle edge cases (singular matrices, etc.)
  6. Consider precision optimization for performance-critical applications
  7. Use 32-bit precision for machine learning when speed is important
  8. Use 128-bit precision for final models when accuracy is critical
  9. Use setfloat() as default - Grapa optimizes representation internally
  10. Focus on bit precision rather than float/fix choice for ML applications

Advanced Topics

Principal Component Analysis (PCA)

/* Simple PCA implementation */
pca = op(X, n_components) {
    /* Center the data */
    means = X.mean(0);
    X_centered = X - means;

    /* Calculate covariance matrix */
    cov_matrix = X_centered.cov();

    /* Get eigenvalues and eigenvectors */
    eigen_result = cov_matrix.eigh();

    /* Sort by eigenvalues and select top components */
    /* Implementation details depend on eigen_result structure */

    /* Project data onto principal components */
    /* X_pca = X_centered.dot(eigenvectors) */
};

Cross-Validation

/* K-fold cross-validation */
cross_validate = op(X, y, k_folds, model_type) {
    n_samples = X.shape().get(0);
    fold_size = n_samples / k_folds;

    scores = [];

    i = 0;
    while (i < k_folds) {
        /* Split data into train/test */
        start_idx = i * fold_size;
        end_idx = (i + 1) * fold_size;

        /* Create train/test splits */
        /* Implementation details for data splitting */

        /* Train and evaluate model */
        model = LinearRegression();
        model.init();
        model.fit(X_train, y_train, model_type);
        metrics = model.score(X_test, y_test);
        scores += metrics.get("r_squared");

        i = i + 1;
    }

    /* Return average score */
    scores.mean();
};
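
The splitting step above is left as a stub. One possible helper, sketched in the same loop style as the other examples (it assumes .get(i) on a matrix returns the i-th row and that += appends it), copies rows in [start_idx, end_idx) into the test set and the remaining rows into the training set:

/* Hypothetical row-splitting helper for cross-validation (not a built-in) */
split_fold = op(X, start_idx, end_idx) {
    train = [];
    test = [];
    i = 0;
    while (i < X.shape().get(0)) {
        if (i >= start_idx && i < end_idx) {
            test += X.get(i);
        } else {
            train += X.get(i);
        }
        i = i + 1;
    }
    {"train": train.vector(), "test": test.vector()};
};

The same helper would be applied to y, and the resulting "train"/"test" entries passed to model.fit() and model.score().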

Conclusion

Grapa's vector operations provide a comprehensive foundation for machine learning, particularly linear models. The mathematical capabilities are complete, performance is excellent for typical use cases, and the implementation is straightforward and practical.

Key strengths:

  • Complete mathematical foundation for linear models
  • Multiple implementation approaches (normal equation, gradient descent, regularization)
  • Comprehensive evaluation metrics and statistical analysis
  • Excellent performance for small to medium datasets
  • Easy to implement and use with clear, readable code

This makes Grapa a viable platform for implementing machine learning algorithms, particularly for educational purposes, prototyping, and small to medium-scale applications.