Case Study • Logistics & Supply Chain MDM
Logistics & Supply Chain
MDM (Multi-Regional)
Transforming fragmented multi-system data into a trusted, governed operational estate: a 15+ year engagement across seven regional locations.
Executive summary
A two-phase programme: Phase 1 (2021–2023) delivered a unified enterprise data platform; Phase 2 (2025) deployed seven AI and optimisation engines on the trusted Gold layer. Data foundation first, AI second.
Month delivery
Q1 2021–Q3 2023 data platform engagement
Regional locations
Across two continents with unified governance
Duplicate rate
Entity duplicate rate eliminated at go-live
Documents ingested
OCR pipeline into governed data estate
Client overview
Rapid Link is a Miami-based logistics and warehouse management company and a recognized industry leader in end-to-end supply chain operations. The client manages international freight movements across multiple global trade lanes, warehousing of perishable and non-perishable goods, and serves a wide client base demanding speed, accuracy, and full supply chain visibility. EFutures has partnered with the client since 2007, a 15+ year engagement that began with a full platform replacement and has expanded to encompass advanced data intelligence capabilities.
The challenge
Fragmented systems, multi-regional data chaos, and zero operational visibility when the engagement began.
Fragmented Systems
Customer, supplier, and shipment data split across TMS, CRM, and warehouse platforms with no shared identity model and no single trusted view.
Document Records
14,000+ document-based records (scanned PDFs and Excel onboarding files) entirely outside any structured data pipeline.
Duplicate Entities
38% duplicate rate across partner and customer entities, silently corrupting operational reporting and completely unknown to the business.
Manual Reconciliation
Finance and operations teams spending significant capacity on manual cross-system reconciliation every reporting cycle.
Shadow Spreadsheets
Regional dashboards distrusted by managers, leading to offline shadow spreadsheets and unreliable shipment performance reporting.
No External Connectivity
No integration to carrier databases, port authority scheduling feeds, or real-time market pricing data for freight quotation.
The solution
EFutures designed and built a full Logistics and Warehouse Management platform from the ground up, replacing the legacy system with a scalable, modular solution that unified all operations onto a single platform. A foundational module within the platform is the Master Data Management engine that resolved the client's data quality crisis across all three source systems.
The intelligent freight quotation engine integrates six live data sources to identify the optimal carrier and generate a data-backed base freight estimate, replacing a slow, error-prone, manual quotation process. The Phase 1 core quotation engine is in production; the Phase 2 full landed-cost calculation suite is in development.
EFutures delivery approach
Five-step methodology: from data chaos to governed data estate.
Data Profiling
Profiled all three source systems, giving the first objective view of the 38% duplication rate, null rates, and format inconsistencies across the data estate.
Intelligent Ingestion
OCR extracted, classified, and confidence-scored 14,000+ scanned shipping forms and vendor onboarding documents for structured staging.
Entity Resolution
Fuzzy matching combined with ML attribute comparisons collapsed partner name variants into single trusted golden records automatically.
Stewardship Workflows
Remaining exceptions routed to operations and finance domain stewards for controlled manual review; human judgment applied on edge cases only.
Sustainable Governance
KPIs embedded in operational dashboards. Ownership transitioned to internal business teams at go-live. No EFutures dependency after six months.
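The entity resolution step above can be illustrated with a minimal Python sketch. It covers only the fuzzy string-matching part, not the ML attribute comparisons, and it is not the production PySpark implementation. The suffix list, the 0.85 threshold, and the partner names are all hypothetical values chosen for illustration.

```python
import re
from difflib import SequenceMatcher

# Hypothetical legal-suffix list, used only for this illustration.
LEGAL_SUFFIXES = {"inc", "llc", "ltd", "ltda", "corp", "co"}


def normalise(name: str) -> str:
    """Lower-case, strip punctuation, and drop common legal suffixes."""
    tokens = re.sub(r"[^a-z0-9 ]", " ", name.lower()).split()
    return " ".join(t for t in tokens if t not in LEGAL_SUFFIXES)


def similarity(a: str, b: str) -> float:
    """Similarity ratio in [0, 1] between two normalised names."""
    return SequenceMatcher(None, normalise(a), normalise(b)).ratio()


def cluster(names: list[str], threshold: float = 0.85) -> list[list[str]]:
    """Greedy clustering: each name joins the first cluster whose
    representative (its first member) matches at or above the threshold."""
    clusters: list[list[str]] = []
    for name in names:
        for members in clusters:
            if similarity(name, members[0]) >= threshold:
                members.append(name)
                break
        else:
            clusters.append([name])
    return clusters


# Made-up partner name variants, including one typo ("Frieght").
partners = [
    "Andes Freight Inc.",
    "ANDES FREIGHT",
    "Andes Frieght LLC",
    "Pacific Cargo",
    "Pacific Cargo Co.",
]
groups = cluster(partners)
for members in groups:
    print(members[0], "<-", members[1:])
```

Each resulting cluster corresponds to one candidate golden record. A real pipeline would also compare attributes such as address or tax ID before merging.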
Solution architecture
Hybrid cloud data platform: Azure primary (70%), with AWS for streaming, partner files, and legacy connectors (30%).
AWS: Streaming & Partner Files (30%)
- Amazon Kinesis: real-time freight event streaming from GPS/telematics vendors with native SDK integrations.
- Amazon S3: external partner file drop landing zone; ADF moves files to ADLS Gen2 Bronze within 15 minutes.
- AWS Lambda: legacy carrier API connectors forwarding normalised events to Azure Service Bus.
Azure: Primary Data Platform (70%)
- ADLS Gen2: Bronze, Silver, and Gold Medallion layers in Delta Lake format.
- Azure Databricks: primary Spark compute for ingestion, entity resolution, and ML training.
- Azure Synapse Analytics: Gold layer serving for Power BI DirectQuery.
- Microsoft Purview: governance, lineage, and sensitivity classification.
- Azure API Management: integration hub for all consuming systems and AI engines.
- AKS / Azure Container Apps: containerised AI microservices with auto-scaling.
Bronze: Raw Data Landing
Append-only immutable store. Provenance metadata at ingestion. OCR document pipeline for 14,000+ records. Real-time Kinesis events and batch ADF exports co-partitioned. Cross-border data segregation for LGPD compliance.
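As a rough illustration of an append-only landing store with provenance metadata, the sketch below uses an in-memory Python class in place of ADLS Gen2 and Delta Lake. The class names and the choice of fields (source system, region, UTC timestamp, SHA-256 of the payload) are assumptions for the example, not the platform's actual schema.

```python
import hashlib
import json
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)  # frozen: a landed record cannot be modified
class BronzeRecord:
    payload: str        # raw record serialised exactly once at landing
    source_system: str  # hypothetical field, e.g. "TMS" or "CRM"
    region: str         # hypothetical field, used for data segregation
    ingested_at: str    # UTC ISO-8601 timestamp
    sha256: str         # content hash for later integrity checks


class BronzeStore:
    """In-memory stand-in for an append-only landing zone."""

    def __init__(self) -> None:
        self._records: list[BronzeRecord] = []

    def append(self, payload: dict, source_system: str, region: str) -> BronzeRecord:
        raw = json.dumps(payload, sort_keys=True)
        record = BronzeRecord(
            payload=raw,
            source_system=source_system,
            region=region,
            ingested_at=datetime.now(timezone.utc).isoformat(),
            sha256=hashlib.sha256(raw.encode("utf-8")).hexdigest(),
        )
        self._records.append(record)
        return record

    def records(self) -> tuple[BronzeRecord, ...]:
        # Return a tuple so callers cannot mutate the underlying list.
        return tuple(self._records)


store = BronzeStore()
landed = store.append({"shipment_id": "S-1", "status": "GATE_IN"}, "TMS", "BR")
print(landed.source_system, landed.region, landed.sha256[:12])
```

The store exposes no update or delete method, which mirrors the append-only behaviour described above.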
Silver: Cleansing & Entity Resolution
PySpark entity resolution with deterministic and fuzzy matching. dbt models with 2,400+ quality tests. Confidence-tier routing: HIGH auto-promotes, MEDIUM to stewards, LOW blocked. Duplicate rate reduced from 38% to under 3%.
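The confidence-tier routing described for the Silver layer can be sketched as a small Python function. The tier names come from the text; the numeric cut-offs (0.95 and 0.75) are hypothetical, since the case study does not state them.

```python
from enum import Enum


class Tier(Enum):
    HIGH = "auto-promote"
    MEDIUM = "steward review"
    LOW = "blocked"


# Hypothetical cut-offs; the real thresholds are not given in the case study.
HIGH_CUTOFF = 0.95
MEDIUM_CUTOFF = 0.75


def route(confidence: float) -> Tier:
    """Map a match confidence in [0, 1] to a routing tier."""
    if not 0.0 <= confidence <= 1.0:
        raise ValueError("confidence must be between 0 and 1")
    if confidence >= HIGH_CUTOFF:
        return Tier.HIGH
    if confidence >= MEDIUM_CUTOFF:
        return Tier.MEDIUM
    return Tier.LOW


# Illustrative candidate pairs with made-up confidence scores.
candidates = {"pair-001": 0.98, "pair-002": 0.81, "pair-003": 0.40}
routing = {pair: route(score) for pair, score in candidates.items()}
for pair, tier in routing.items():
    print(pair, tier.name, tier.value)
```

Keeping the cut-offs as named constants makes it simple for stewards to tune them as match quality is measured over time.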
Gold: Governed Master Data
Seven master data domains with global MDM IDs. Delta Lake time-travel for audit and ML training. Stewardship-approved records only. Synapse serverless SQL and Snowflake Data Sharing for partner access.
Customer
Carrier
Supplier
Port
Route
Container
Document
| Stage | Sources / Inputs | Tools / Engines | Output |
|---|---|---|---|
| 01 | TMS APIs, regional ERP exports, GPS/telematics streams, carrier rate feeds, port APIs, FX feeds, 14,000+ scanned documents. | Amazon Kinesis, Azure Data Factory, AWS Lambda, Azure AI Document Intelligence, Apache Airflow. | ADLS Gen2 Delta Lake Bronze layer with provenance metadata. Append-only, immutable. |
| 02 | All ingested records from Stage 01. Real-time events and batch records co-partitioned. | ADLS Gen2, Microsoft Purview schema registry, Terraform, schema validation at landing. | Immutable raw record store. Failed records quarantined. Available for unlimited reprocessing. |
| 03 | Bronze incremental delta, reference data tables, steward decision feedback. | Azure Databricks PySpark entity resolution, dbt models, Delta Lake ACID writes. | Delta Lake Silver tables, entity candidates with confidence scores, stewardship exception queue. |
| 04 | HIGH-confidence candidates, steward-approved MEDIUM/LOW, aggregation rules per domain. | Azure Synapse Analytics, Snowflake, Azure Data Factory approval gates, dbt Gold models. | Golden records per domain, global MDM IDs, KPI tables, AI training datasets via time-travel. |
| 05 | Gold layer golden records, AI inference requests, operational system sync. | Power BI Premium, Azure API Management, AKS / Azure Container Apps, Azure Service Bus, Snowflake Data Sharing. | Executive dashboards, AI engine outputs, golden record sync to regional TMS, partner carrier feeds. |
Miami, Florida: Global HQ
Primary operations hub. International freight coordination, finance consolidation, and executive reporting. Azure East US 2 primary; Amazon Kinesis and S3 in us-east-1 for streaming and partner file drops.
California: West Coast Operations
Pacific trade lane management. APAC carrier relationships and Port of Los Angeles / Long Beach scheduling. High-frequency container throughput requiring real-time event streaming.
Texas: Inland Logistics Hub
Cross-continental road and rail freight coordination. Mexico border crossing logistics and customs brokerage.
Brazil: South America Anchor
Largest South American operation. LGPD compliance required. Processed in Azure Brazil South; Amazon Kinesis sa-east-1 for real-time freight events.
Chile & Colombia: Regional Hubs
Andean corridor freight management. Pacific South America port connectivity. Spanish-language document processing via OCR pipeline.
Partner Networks: Peru, Ecuador, Argentina
Operational presence through partner logistics networks. Data ingested via file-based adapters and API connectors to the central Bronze landing zone.
Intelligence & optimisation modules
All seven modules built exclusively on the cleansed Gold layer. Every output traceable via Microsoft Purview lineage to source golden records.
Freight Quotation
Replaces the 2–4 hour manual quotation process with a sub-60-second automated response. Integrates six live data sources against cleansed Carrier and Route golden records.
Core capabilities
- Optimal carrier identification across all active shipping lines.
- Routing cost from 15+ years of internal shipment history.
- Carrier reliability scoring from purchased shipping line databases.
- Port authority scheduling integration via Port golden records.
- Current carrier rate and surcharge feeds from 17 major shipping lines.
- Real-time FX rate feeds for multi-currency landed-cost calculation.
Gold layer consumed
Carrier, Route, and Port golden records; 15+ years shipment history for ML training.
ML / Analytics
Gradient boosting ensemble on historical landed-cost outcomes. Quarterly retraining on Gold time-travel snapshots. Response under 60 seconds.
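To make the idea of a base freight estimate concrete, here is a minimal Python sketch that converts carrier offers to a common currency and picks the cheapest one above a reliability floor. It is a plain rule-based stand-in, not the gradient boosting ensemble named above. The carrier names, rates, FX values, and the 0.90 reliability floor are invented for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CarrierOffer:
    carrier: str
    base_rate: float    # in the offer's own currency
    surcharges: float   # in the offer's own currency
    currency: str
    reliability: float  # 0..1, hypothetical score from carrier history


def estimate_usd(offer: CarrierOffer, fx_to_usd: dict[str, float]) -> float:
    """Base freight estimate converted to USD."""
    total = offer.base_rate + offer.surcharges
    return round(total * fx_to_usd[offer.currency], 2)


def best_offer(
    offers: list[CarrierOffer],
    fx_to_usd: dict[str, float],
    min_reliability: float = 0.90,  # hypothetical floor
) -> CarrierOffer:
    """Cheapest offer among carriers that meet the reliability floor."""
    eligible = [o for o in offers if o.reliability >= min_reliability]
    if not eligible:
        raise ValueError("no carrier meets the reliability floor")
    return min(eligible, key=lambda o: estimate_usd(o, fx_to_usd))


# Made-up offers and FX rates for illustration only.
fx = {"USD": 1.0, "BRL": 0.2}
offers = [
    CarrierOffer("Carrier A", 1000.0, 150.0, "USD", 0.95),
    CarrierOffer("Carrier B", 5000.0, 500.0, "BRL", 0.92),
    CarrierOffer("Carrier C", 900.0, 100.0, "USD", 0.80),
]
chosen = best_offer(offers, fx)
print(chosen.carrier, estimate_usd(chosen, fx))
```

Carrier C is the cheapest in raw terms but is excluded by the reliability floor, which shows why cost alone is not used to rank carriers.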
Container Optimisation
Maximises container space utilisation by computing the optimal cargo loading plan. Utilisation improved from 68% to 84% within six months of deployment.
Core capabilities
- Three-dimensional load plan generation per container and vessel.
- Weight distribution optimisation within structural limits.
- Fragility, hazmat separation, and temperature constraint enforcement.
- Multi-container load balancing across an entire shipment programme.
- Container type selection matched to cargo profile and carrier specifications.
Gold layer consumed
Container, Route, and Shipment golden records; historical load plan outcomes.
ML / Analytics
Constraint satisfaction with genetic algorithm. Evaluates 10,000+ candidate configurations per request. Runtime under 90 seconds.
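A heavily simplified Python sketch of a genetic algorithm under constraints follows. Instead of a full three-dimensional load plan, it only chooses which items to load so that volume utilisation is maximised without exceeding volume and weight limits. The item data, population size, generation count, and container limits are all hypothetical, and the function assumes at least two items.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class Item:
    name: str
    volume_m3: float
    weight_kg: float


def load_totals(mask, items):
    """Total volume and weight of the items selected by the boolean mask."""
    volume = sum(i.volume_m3 for i, keep in zip(items, mask) if keep)
    weight = sum(i.weight_kg for i, keep in zip(items, mask) if keep)
    return volume, weight


def fitness(mask, items, max_volume, max_weight):
    """Volume utilisation in [0, 1]; infeasible plans score -1."""
    volume, weight = load_totals(mask, items)
    if volume > max_volume or weight > max_weight:
        return -1.0
    return volume / max_volume


def optimise(items, max_volume, max_weight, generations=200, pop_size=40, seed=0):
    rng = random.Random(seed)  # seeded so repeated runs give the same plan
    n = len(items)

    def score(mask):
        return fitness(mask, items, max_volume, max_weight)

    # Random starting plans plus the empty plan, which is always feasible.
    population = [[rng.random() < 0.5 for _ in range(n)] for _ in range(pop_size - 1)]
    population.append([False] * n)
    for _ in range(generations):
        population.sort(key=score, reverse=True)
        parents = population[: pop_size // 2]  # elitist selection
        children = []
        while len(parents) + len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            cut = rng.randrange(1, n)          # one-point crossover
            child = a[:cut] + b[cut:]
            flip = rng.randrange(n)            # single-bit mutation
            child[flip] = not child[flip]
            children.append(child)
        population = parents + children
    best = max(population, key=score)
    return best, score(best)


# Made-up cargo and container limits for illustration.
cargo = [
    Item("pallet-A", 10.0, 3000.0),
    Item("pallet-B", 12.0, 4000.0),
    Item("pallet-C", 8.0, 2500.0),
    Item("pallet-D", 15.0, 6000.0),
    Item("pallet-E", 5.0, 1500.0),
    Item("pallet-F", 9.0, 3500.0),
]
plan, utilisation = optimise(cargo, max_volume=33.0, max_weight=20000.0)
print([i.name for i, keep in zip(cargo, plan) if keep], round(utilisation, 3))
```

Because the best half of each generation is carried forward unchanged, the best feasible plan found so far is never lost. Fragility, hazmat separation, and temperature rules would enter as further checks inside `fitness`.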
Route Optimisation
Identifies lowest-cost, lowest-risk routing across ocean, air, road, rail, and intermodal options with full cost, transit time, reliability, and carbon transparency.
Core capabilities
- Multi-modal route enumeration across all feasible paths.
- Carrier reliability and OTP scoring per route segment.
- Port congestion signal integration from Port golden records.
- Cost-per-TEU optimisation across all route segments.
- Carbon emissions scoring per route option (CSRD scope 3 readiness).
Gold layer consumed
Route, Port, and Carrier golden records with 15+ years of completed shipment history.
ML / Analytics
Multi-objective optimisation (Pareto across cost, time, reliability, carbon). Updated weekly from Gold layer performance data.
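The Pareto idea behind the multi-objective optimisation can be shown with a short Python sketch: a route stays on the front unless another route is at least as good on every objective and strictly better on one. The route options and their figures are invented for illustration, and a production optimiser would search a far larger space.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RouteOption:
    name: str
    cost_usd: float
    transit_days: float
    reliability: float  # higher is better
    co2e_kg: float


def objectives(r: RouteOption) -> tuple[float, float, float, float]:
    """All objectives expressed so that lower is better."""
    return (r.cost_usd, r.transit_days, -r.reliability, r.co2e_kg)


def dominates(a: RouteOption, b: RouteOption) -> bool:
    """True if a is no worse than b everywhere and strictly better somewhere."""
    oa, ob = objectives(a), objectives(b)
    no_worse = all(x <= y for x, y in zip(oa, ob))
    strictly_better = any(x < y for x, y in zip(oa, ob))
    return no_worse and strictly_better


def pareto_front(options: list[RouteOption]) -> list[RouteOption]:
    """Options that no other option dominates."""
    return [o for o in options if not any(dominates(p, o) for p in options)]


# Made-up route options for illustration only.
options = [
    RouteOption("ocean-direct", 2000.0, 30.0, 0.90, 500.0),
    RouteOption("air", 8000.0, 3.0, 0.97, 4000.0),
    RouteOption("rail-ocean", 2500.0, 32.0, 0.85, 600.0),
]
front = pareto_front(options)
print([o.name for o in front])
```

Here the rail-ocean option is worse than ocean-direct on every objective, so it drops out, while air remains because it is faster and more reliable despite its cost and carbon figures.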
Predictive ETA
Per-shipment ETA with confidence interval, updated in real time as GPS events and port status updates arrive. ETA accuracy improved 34% vs prior carrier-provided ETAs.
Core capabilities
- Real-time ETA per active shipment updated on every status event.
- Confidence interval reporting (80% and 95% bands per prediction).
- Multi-segment ETA: origin port, transhipment, destination, final delivery.
- Disruption scenario modelling (port congestion, weather, vessel delays).
- Customer-facing ETA API for integration with client tracking portals.
Gold layer consumed
Shipment and Route golden records; Kinesis real-time stream for GPS and carrier milestones.
ML / Analytics
LSTM time-series trained on 15+ years of completed shipment event sequences. Monthly retraining on new completed shipments.
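The 80% and 95% bands mentioned above can be illustrated without a neural network. The Python sketch below derives bands from empirical quantiles of past prediction errors, which is a simpler technique swapped in for the LSTM. The error history and the point ETA are invented numbers.

```python
from statistics import quantiles


def eta_bands(point_eta_hours: float, past_errors_hours: list[float]) -> dict:
    """Attach 80% and 95% bands to a point ETA.

    past_errors_hours holds (actual - predicted) for completed shipments.
    With n=40, cuts[k - 1] is the k/40 quantile, so cuts[3] is the 10th
    percentile and cuts[38] is the 97.5th percentile.
    """
    cuts = quantiles(past_errors_hours, n=40, method="inclusive")
    return {
        "eta": point_eta_hours,
        "band_80": (point_eta_hours + cuts[3], point_eta_hours + cuts[35]),
        "band_95": (point_eta_hours + cuts[0], point_eta_hours + cuts[38]),
    }


# Made-up error history: evenly spread from 20 hours early to 20 hours late.
history = [float(h) for h in range(-20, 21)]
prediction = eta_bands(100.0, history)
print(prediction)
```

The bands widen or narrow automatically as the error history changes, so a lane with erratic past performance reports a wider interval for the same point ETA.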
Dynamic Pricing
Recommends optimal quoted price balancing margin, competitive positioning, and capacity utilisation. Trained on 15+ years of quotation win/loss outcomes.
Core capabilities
- Win probability estimation at any proposed price point.
- Price band recommendation with expected margin at each point.
- Competitor pricing signal integration from nightly market rate feeds.
- Capacity utilisation-aware pricing when trade lane capacity is tight.
- Margin floor enforcement: no recommendation below configured minimum.
Gold layer consumed
Quotation history, Client, and Route golden records spanning 15+ years.
ML / Analytics
Reinforcement learning agent (Proximal Policy Optimisation). Reward: margin per won quote. Continuous retraining on new outcomes.
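The interplay of win probability, expected margin, and the margin floor can be sketched in Python as follows. This is a static illustration using a logistic win curve, not the reinforcement learning agent described above. The curve shape, its sensitivity parameter, and all prices are assumptions.

```python
import math


def win_probability(price: float, reference_price: float, sensitivity: float = 8.0) -> float:
    """Hypothetical logistic curve: 50% at the reference (market) price,
    falling as the quoted price rises above it."""
    return 1.0 / (1.0 + math.exp(sensitivity * (price / reference_price - 1.0)))


def expected_margin(price: float, cost: float, reference_price: float) -> float:
    """Margin weighted by the chance of winning the quote."""
    return win_probability(price, reference_price) * (price - cost)


def recommend_price(
    cost: float,
    reference_price: float,
    min_margin_pct: float,
    candidates: list[float],
) -> float:
    """Best expected margin among candidates at or above the margin floor."""
    floor = cost * (1.0 + min_margin_pct)
    allowed = [p for p in candidates if p >= floor]
    if not allowed:
        raise ValueError("no candidate price meets the margin floor")
    return max(allowed, key=lambda p: expected_margin(p, cost, reference_price))


# Made-up figures: cost 800, market reference 1000, 10% minimum margin.
price_points = [float(p) for p in range(800, 1301, 50)]
recommended = recommend_price(800.0, 1000.0, 0.10, price_points)
print(recommended, round(win_probability(recommended, 1000.0), 3))
```

Filtering by the floor before ranking guarantees that no recommendation falls below the configured minimum, whatever the win curve suggests.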
Risk Scoring
Scores every active shipment continuously across carrier reliability, customs clearance, route disruption, and weather/geopolitical risk. Enables proactive intervention before escalation.
Core capabilities
- Composite risk score (0–100) per active shipment updated every 4 hours.
- Four independent dimension scores with alert when composite exceeds 70/100.
- Proactive customer notification workflow for high-risk shipments.
- Rising score escalates even when below threshold.
Gold layer consumed
Shipment, Carrier, and Port golden records; Kinesis GPS stream; weather and geopolitical feeds.
ML / Analytics
XGBoost classifier trained on completed shipments labelled by delay outcome.
Fuel Analytics
Analyses fuel consumption patterns across road and sea freight to identify cost reduction opportunities and BAF/CAF surcharge overcharging at route, carrier, and vessel level.
Core capabilities
- Lane-level fuel cost benchmark based on vessel efficiency and bunker price.
- Carrier fuel efficiency ranking by trade lane.
- BAF/CAF surcharge validation: actual charge vs bunker index benchmark.
- Road freight fuel consumption tracking by driver, route, and vehicle.
- Carbon intensity reporting per trade lane (CSRD scope 3 readiness).
Gold layer consumed
Carrier, Route, Bunker fuel price, and Shipment golden records.
ML / Analytics
Regression on fuel cost drivers. Power BI dashboard updated daily with carrier efficiency league table.
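Surcharge validation against a bunker benchmark can be illustrated with the Python sketch below. The benchmark formula (bunker price multiplied by a per-lane fuel factor), the 10% tolerance, and all figures are assumptions made for the example; the case study does not describe the actual benchmark calculation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BafCharge:
    carrier: str
    lane: str
    charged_usd_per_teu: float


def benchmark_baf(bunker_usd_per_tonne: float, fuel_tonnes_per_teu: float) -> float:
    """Hypothetical benchmark: bunker price times fuel burned per TEU."""
    return bunker_usd_per_tonne * fuel_tonnes_per_teu


def overcharges(
    charges: list[BafCharge],
    bunker_usd_per_tonne: float,
    lane_fuel_factor: dict[str, float],
    tolerance: float = 0.10,  # hypothetical: allow 10% above benchmark
) -> list[tuple[BafCharge, float]]:
    """Charges exceeding the benchmark by more than the tolerance,
    paired with the excess in USD per TEU."""
    flagged = []
    for charge in charges:
        expected = benchmark_baf(bunker_usd_per_tonne, lane_fuel_factor[charge.lane])
        excess = charge.charged_usd_per_teu - expected
        if excess > expected * tolerance:
            flagged.append((charge, round(excess, 2)))
    return flagged


# Made-up lane factor, bunker price, and invoices.
factors = {"LANE-A": 0.5}
invoices = [
    BafCharge("Carrier A", "LANE-A", 360.0),
    BafCharge("Carrier B", "LANE-A", 310.0),
]
flagged = overcharges(invoices, bunker_usd_per_tonne=600.0, lane_fuel_factor=factors)
for charge, excess in flagged:
    print(charge.carrier, charge.lane, excess)
```

With a benchmark of 300 USD per TEU, only the 360 USD charge exceeds the tolerance band and is flagged.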
Shared module platform
All modules access Gold layer via Azure API Management REST endpoints. Each module issued a Managed Identity with minimum required read permission scoped to its Gold layer domains. Model retraining via Azure Synapse on Gold time-travel snapshots. All seven modules containerised on Azure AKS with Helm charts. Module outputs written to Gold Intelligence golden records via stewardship workflow, never directly by the module.
Technology stack
Data platform (2021–2023) and AI layer (2025): a deliberate hybrid architecture with no vendor lock-in post go-live.
Data Ingestion
OCR pipeline for 14,000+ scanned documents; AI attribute extraction and confidence scoring via Azure AI Document Intelligence.
Entity Resolution
Fuzzy matching + ML deduplication; automated golden record creation with stable global identifiers.
Freight Engine
Multi-source integration: internal history, carrier rate feeds, port authority data, FX APIs, market trend data.
Governance
Business-owned stewardship workflows; KPI-embedded operational dashboards; no vendor dependency post go-live.
2025 AI Layer
Azure AI Foundry, Azure OpenAI (GPT-4o), Azure Container Apps, Azure Machine Learning Model Registry, Microsoft Entra ID.
Infrastructure
Terraform IaC, GitHub Actions CI/CD, dbt on Databricks, Apache Airflow on AKS, Delta Lake time-travel.
Security, governance & compliance
Phase 1 design constraints shaping every architectural decision across seven countries.
Identity & access
- Microsoft Entra ID for all platform access with Managed Identities for service-to-service auth.
- AWS IAM federated to Entra ID via SAML 2.0; no static IAM keys.
- No service has standing write access to Gold layer ADLS Gen2.
Data residency & compliance
- Brazil LGPD: customer data processed in Azure Brazil South.
- Colombian Ley 1581 with LGPD-equivalent controls.
- US CTPAT requirements for customs clearance data.
- OFAC sanctions screening with restricted named-individual access.
Immutable audit trail
- Azure Monitor Log Analytics with 7-year WORM retention.
- Microsoft Purview lineage graph: source-to-Gold provenance.
- Microsoft Sentinel SIEM for anomaly detection.
Data quality governance
- 2,400+ automated dbt quality tests across all seven domains.
- Data Ownership Register signed by domain leads before go-live.
- Governance ownership transitioned to business teams; zero EFutures dependency after 12 months.
Results & outcomes
Quantified impact across data quality, operations, and commercial accuracy, measured at 6 and 12 months post go-live.
Freight quotation
vs 2–4 hour manual process
Container utilisation
Within 6 months of deployment
ETA accuracy gain
vs prior carrier-provided ETAs
Reconciliation eliminated
Finance manual cross-system effort
Business-owned governance
Independent operation 12+ months post close
AI engines deployed
All consuming Gold layer golden records only
Project timeline
| Phase | Period | Key deliverables | Status |
|---|---|---|---|
| Phase 0: Foundation | Q1 2021 (12 weeks) | Architecture design, hybrid cloud blueprint, Terraform IaC, data profiling sprint, 38% duplicate rate confirmed, governance operating model, CI/CD established. | Completed |
| Phase 1: Bronze & Ingestion | Q2–Q3 2021 (20 weeks) | ADLS Gen2 Bronze live across 7 regions. Kinesis real-time pipeline. ADF batch ETL. OCR pipeline for 14,000+ documents. Purview schema registry populated. | Completed |
| Phase 2: Silver & MDM | Q4 2021–Q2 2022 (28 weeks) | Entity resolution engine, dbt Silver models, 7 master data domains, stewardship queue live. Duplicate rate: 38% → 8% → 3% after steward resolution. | Completed |
| Phase 3: Gold & Analytics | Q3 2022–Q1 2023 (28 weeks) | Synapse Gold warehouse, Snowflake secondary surface, Power BI dashboards, RLS per region, governance ownership transition to RapidLink operations. | Completed |
| Phase 5: AI Sub-Modules | Q1–Q3 2025 | Seven AI engines on trusted Gold layer: Freight Quotation, Container Optimisation, Route Optimisation, Predictive ETA, Dynamic Pricing, Risk Scoring, Fuel Analytics. | Active |
Future scalability roadmap
- New region onboarding (Mexico, Argentina, Peru): Terraform module and connector playbook, no core architecture change.
- EUDR and ESG data layer on Supplier and Route golden records.
- Real-time container track and trace B2B portal on existing Kinesis infrastructure.
- Autonomous customs pre-clearance agent from Shipment and Document golden records.
- Partner data exchange network via Snowflake Data Sharing pattern.
RapidLink owns all source code, infrastructure configuration, data in open-format storage, and AI model training code. The platform can be extended, scaled, or migrated by any competent data engineering team without EFutures involvement.