Under the Hood

The Technology

A multi-model approach that combines the best of machine learning research. Different tools for different problems—stacked for maximum accuracy.

Not All Predictions Are Equal

MAPE = Mean Absolute Percentage Error (lower is better)
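MAPE is simple to compute: average the absolute errors as a percentage of the actual values. A minimal sketch with made-up project costs (not real benchmark data):

```python
# Illustrative MAPE calculation with toy numbers, not real benchmark data.
def mape(actuals, predictions):
    """Mean Absolute Percentage Error, in percent (lower is better)."""
    errors = [abs(a - p) / abs(a) for a, p in zip(actuals, predictions)]
    return 100 * sum(errors) / len(errors)

# Three projects: actual vs. predicted cost (in $M)
actual = [4.0, 5.0, 6.0]
predicted = [4.2, 4.8, 6.3]
print(round(mape(actual, predicted), 2))  # → 4.67
```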

Method                                  Typical Error   Best Case   Data Required
Traditional: Human Expert Estimation    10-25%          10%         Experience
ML: Linear Regression                   15-30%          12%         50+ projects
ML: Random Forest                       8-15%           6%          200+ projects
ML: XGBoost / Gradient Boosting         6-12%           5%          200+ projects
Deep: Neural Network (MLP)              8-15%           6%          500+ projects
Deep: LSTM (Time-Series)                5-10%           <5%         500+ projects with temporal data
Ensemble: Stacked Models                4-8%            3%          500+ projects
FSE's Approach: We don't pick one method—we use an ensemble that combines multiple models, letting each contribute where it's strongest. The meta-learner figures out which model to trust for each prediction.
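The core idea of stacking can be sketched in a few lines: base models make their predictions, and a simple linear meta-learner fits weights that say how much to trust each one. All numbers below are hypothetical, and the two-feature least-squares solve stands in for the full meta-learner:

```python
# Toy stacking sketch (hypothetical data, not FSE's actual pipeline):
# two base models each predict project cost; a linear meta-learner
# learns how much to trust each one via least squares.
def fit_meta_learner(preds_a, preds_b, actuals):
    # Normal equations for y ≈ w_a*a + w_b*b (2x2 system, Cramer's rule)
    saa = sum(a * a for a in preds_a)
    sbb = sum(b * b for b in preds_b)
    sab = sum(a * b for a, b in zip(preds_a, preds_b))
    say = sum(a * y for a, y in zip(preds_a, actuals))
    sby = sum(b * y for b, y in zip(preds_b, actuals))
    det = saa * sbb - sab * sab
    w_a = (say * sbb - sby * sab) / det
    w_b = (saa * sby - sab * say) / det
    return w_a, w_b

model_a = [4.3, 5.4, 6.1]   # e.g. tree-model predictions ($M)
model_b = [3.7, 4.8, 5.9]   # e.g. neural-net predictions ($M)
actual  = [4.0, 5.0, 6.0]
w_a, w_b = fit_meta_learner(model_a, model_b, actual)
blended = [w_a * a + w_b * b for a, b in zip(model_a, model_b)]
```

Because either base model on its own is one of the candidate blends (weight 1 on itself, 0 on the other), the fitted blend can never do worse than the better base model on the training data.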

Three Layers of Intelligence

From raw data to calibrated predictions

1

Structured Analysis

XGBoost + Random Forest

Tree-based models that excel at finding patterns in your project data: size, type, location, contract structure. These methods handle missing data naturally and tell us exactly which features matter most.

6-12% Typical MAPE
Minutes Training Time
High Interpretability
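A single decision stump shows the mechanism behind "handles missing data naturally": each split learns a default branch, so a project with an unknown value still gets routed to a prediction. This is the general idea behind XGBoost's sparsity-aware splits; the threshold and values below are purely illustrative:

```python
# Minimal decision-stump sketch with illustrative numbers: missing
# feature values follow a learned "default" branch instead of
# breaking the prediction (the idea behind sparsity-aware splits).
def stump_predict(floor_area, threshold=10_000,
                  left_value=3.2, right_value=7.8, default="right"):
    """Predict cost ($M) from floor area (sq ft); None = missing."""
    if floor_area is None:  # missing value: follow the default branch
        return right_value if default == "right" else left_value
    return left_value if floor_area < threshold else right_value

print(stump_predict(6_000))    # small project  → 3.2
print(stump_predict(25_000))   # large project  → 7.8
print(stump_predict(None))     # missing datum  → 7.8 (default branch)
```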
2

Temporal Patterns

LSTM Neural Networks

Construction projects unfold over time. LSTM networks "remember" patterns across sequences—seasonal cost variations, market trends, phase dependencies. They capture what static models miss.

<5% Best Case MAPE
Hours Training Time
Medium Interpretability
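The "memory" in an LSTM comes from its gates: at each time step, a forget gate decides how much old state to keep and an input gate decides how much new information to write. A heavily simplified single-unit cell in plain Python (arbitrary illustrative weights, shared across gates for brevity, not trained parameters):

```python
import math

# Simplified single-unit LSTM cell: shows the gating that carries
# memory across a sequence. Weights are arbitrary, not trained.
def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def lstm_step(x, h, c, w=0.5, u=0.5, b=0.0):
    f = sigmoid(w * x + u * h + b)           # forget gate
    i = sigmoid(w * x + u * h + b)           # input gate
    o = sigmoid(w * x + u * h + b)           # output gate
    c_tilde = math.tanh(w * x + u * h + b)   # candidate cell state
    c = f * c + i * c_tilde                  # blend old memory with new
    h = o * math.tanh(c)                     # expose gated memory
    return h, c

# Feed a sequence of normalized monthly cost-index values.
h, c = 0.0, 0.0
for x in [0.1, 0.3, 0.2, 0.4]:
    h, c = lstm_step(x, h, c)
```

In a real network each gate has its own weight matrix and the cell is vectorized across hundreds of units, but the update equations are exactly this shape.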

Beyond Point Estimates

Prediction Intervals

Instead of "$5M", you get: "10% chance below $4M, 50% around $5M, 10% above $7M." Quantile regression provides calibrated uncertainty bounds for risk-aware decisions.
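Why quantile regression yields those bounds: minimizing the pinball (quantile) loss over a constant prediction recovers the corresponding empirical quantile. A sketch with toy cost data:

```python
# Pinball (quantile) loss sketch with toy data: minimizing it over a
# constant prediction recovers the empirical quantile, which is how
# quantile regression produces P10/P50/P90 bounds.
def pinball_loss(q, prediction, actuals):
    total = 0.0
    for y in actuals:
        diff = y - prediction
        total += q * diff if diff >= 0 else (q - 1) * diff
    return total / len(actuals)

costs = [3.8, 4.2, 4.9, 5.1, 5.6, 6.4, 7.0]   # observed costs ($M)
candidates = [c / 10 for c in range(30, 80)]  # scan 3.0 .. 7.9
p50 = min(candidates, key=lambda p: pinball_loss(0.5, p, costs))
p90 = min(candidates, key=lambda p: pinball_loss(0.9, p, costs))
print(p50, p90)  # → 5.1 7.0 (the empirical median and 90th percentile)
```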

Joint Cost + Duration

Multi-task learning predicts cost and schedule together. The model learns their relationship: delays increase costs, scope changes affect both. One model, two outputs, shared intelligence.
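Structurally, multi-task prediction means one shared representation feeding two output heads. A stripped-down sketch with made-up linear weights (real heads sit on top of a learned neural representation):

```python
# Multi-task sketch (illustrative weights, not trained parameters):
# one shared feature vector feeds two linear "heads", one for cost
# and one for duration.
features = {"floor_area_k_sqft": 120, "floors": 10, "nyc": 1}

cost_head     = {"floor_area_k_sqft": 0.03, "floors": 0.1, "nyc": 0.8}  # → $M
duration_head = {"floor_area_k_sqft": 0.05, "floors": 0.4, "nyc": 1.0}  # → months

def head_predict(head, feats):
    return sum(head[k] * feats[k] for k in feats)

cost = head_predict(cost_head, features)          # shared inputs, cost output
duration = head_predict(duration_head, features)  # shared inputs, duration output
```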

SHAP Explainability

See exactly how each factor contributes: "Floor area added $1.2M, NYC location added $0.8M, fixed contract saved $0.3M." No black boxes—full transparency on every prediction.
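The key property behind this transparency is additivity: a SHAP breakdown always sums back to the prediction. Using toy numbers that mirror the example above:

```python
# SHAP-style additive breakdown (toy numbers mirroring the example
# above): the prediction equals a baseline plus per-feature
# contributions, so the explanation always sums back to the output.
baseline = 3.3  # illustrative average cost across training projects ($M)
contributions = {
    "floor_area":     +1.2,   # "Floor area added $1.2M"
    "nyc_location":   +0.8,   # "NYC location added $0.8M"
    "fixed_contract": -0.3,   # "fixed contract saved $0.3M"
}
prediction = baseline + sum(contributions.values())
print(round(prediction, 1))  # → 5.0
```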

Transfer Learning

Pre-train on broad construction data, fine-tune on your specific projects. Limited data? The model brings general construction knowledge, then adapts to your patterns.

Architecture Deep Dive

For the engineers who want specifics

Input Processing

Text Description: Fine-tuned BERT → Feature Vector
Structured Data: Size, type, location → Normalized
Time Series: Cost indices over time → LSTM Encoder
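The "Normalized" step for structured data is standard z-score scaling: each feature is rescaled to zero mean and unit variance so no single input dominates. A sketch with illustrative values:

```python
import statistics

# Z-score normalization sketch for a structured input (illustrative
# values): rescale to zero mean, unit variance.
floor_areas = [8_000, 12_000, 20_000, 40_000]  # sq ft, raw
mean = statistics.mean(floor_areas)
std = statistics.pstdev(floor_areas)
normalized = [(x - mean) / std for x in floor_areas]
```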

Ensemble Layer

XGBoost: 100-500 trees, 6 levels deep
Neural Network: 256→128→64 neurons, ReLU
LSTM: 256 units, temporal sequences

Meta Learner

Stacking: Ridge regression combines predictions
Multi-task: Joint cost + duration heads
Quantile: P10, P50, P90 outputs
Cost: $5.2M [±$0.8M at 80% CI]
Duration: 14 months [±2 months]
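The "±" band comes straight from the quantile outputs: P10 to P90 covers 80% of the predicted distribution, so the half-width is (P90 − P10) / 2. With illustrative quantile values:

```python
# Turning P10/P50/P90 quantile outputs into a "±" band: P10..P90
# spans an 80% interval, so the half-width is (P90 - P10) / 2.
# Numbers are illustrative.
p10, p50, p90 = 4.4, 5.2, 6.0   # $M
half_width = round((p90 - p10) / 2, 1)
print(f"Cost: ${p50}M [±${half_width}M at 80% CI]")
# → Cost: $5.2M [±$0.8M at 80% CI]
```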

Key Input Features

Size
  • Total floor area
  • Number of floors
  • Building height
Design
  • Compactness ratio
  • Percentage of openings
  • Structural system
Project
  • Contract type
  • Tendering method
  • Provisional sums
Market
  • Inflation rate
  • Material price indices
  • Labor market conditions
Location
  • Geographic region
  • Soil conditions
  • Local labor rates