EFUTURESCFO AI Engineering

Case Study • Logistics & Supply Chain MDM

Logistics & Supply Chain MDM (Multi-Regional)

Transforming fragmented multi-system data into a trusted, governed operational estate: a 15+ year engagement across seven regional locations.

7 AI & Optimisation Engines
Client: Rapid Link
Industry: Logistics & Supply Chain
Location: Miami • California • Texas • Brazil • Chile • Colombia
Engagement: Data Platform 2021–2023 • AI Sub-Modules 2025 • Hybrid Cloud Azure + AWS
Confidential

Executive summary

A two-phase programme: Phase 1 (2021–2023) delivered a unified enterprise data platform; Phase 2 (2025) deployed seven AI and optimisation engines on the trusted Gold layer. Data foundation first, AI second.

27

Month delivery

Q1 2021–Q3 2023 data platform engagement

7

Regional locations

Across two continents with unified governance

38→3%

Duplicate rate

Entity duplicate rate reduction at go-live

14K+

Documents ingested

OCR pipeline into governed data estate

Client overview

Rapid Link is a Miami-based logistics and warehouse management company and a recognised industry leader in end-to-end supply chain operations. The client manages international freight movements across multiple global trade lanes and the warehousing of perishable and non-perishable goods, serving a wide client base that demands speed, accuracy, and full supply chain visibility. EFutures has partnered with the client since 2007, in a 15+ year engagement that began with a full platform replacement and has since expanded to advanced data intelligence capabilities.

The challenge

Fragmented systems, multi-regional data chaos, and zero operational visibility when the engagement began.

Fragmented Systems

Customer, supplier, and shipment data split across TMS, CRM, and warehouse platforms with no shared identity model and no single trusted view.

Document Records

14,000+ document-based records (scanned PDFs and Excel onboarding files) sitting entirely outside any structured data pipeline.

Duplicate Entities

38% duplicate rate across partner and customer entities, silently corrupting operational reporting and completely unknown to the business.

Manual Reconciliation

Finance and operations teams spending significant capacity on manual cross-system reconciliation every reporting cycle.

Shadow Spreadsheets

Regional dashboards distrusted by managers, leading to offline shadow spreadsheets and unreliable shipment performance reporting.

No External Connectivity

No integration to carrier databases, port authority scheduling feeds, or real-time market pricing data for freight quotation.

The solution

EFutures designed and built a full Logistics and Warehouse Management platform from the ground up, replacing the legacy system with a scalable, modular solution that unified all operations onto a single platform. A foundational module within the platform is the Master Data Management engine that resolved the client's data quality crisis across all three source systems.

The intelligent freight quotation engine integrates six live data sources to identify the optimal carrier and generate a data-backed base freight estimate, replacing a slow, error-prone, manual quotation process. The Phase 1 core quotation engine is in production; the Phase 2 full landed-cost calculation suite is in development.

Before → after
Duplicate rate: 38% → <3%
Documents in trusted pipeline: 0 → 14,000+
Manual reconciliation: High → Eliminated

EFutures delivery approach

Five-step methodology: from data chaos to governed data estate.

01

Data Profiling

Profiled all three source systems, giving the first objective view of the 38% duplicate rate, null rates, and format inconsistencies across the data estate.

02

Intelligent Ingestion

An OCR pipeline extracted, classified, and confidence-scored 14,000+ scanned shipping forms and vendor onboarding documents for structured staging.

03

Entity Resolution

Fuzzy matching combined with ML attribute comparisons collapsed partner name variants into single trusted golden records automatically.

04

Stewardship Workflows

Remaining exceptions routed to operations and finance domain stewards for controlled manual review, so human judgment is applied to edge cases only.

05

Sustainable Governance

KPIs embedded in operational dashboards. Ownership transitioned to internal business teams at go-live. No EFutures dependency after six months.
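The profiling and entity-resolution steps above can be sketched in a few lines of Python. This is a minimal illustration, not the engine EFutures built: the name normalisation rules, the 0.85 similarity threshold, the same-country blocking rule, and the definition of duplicate rate (the share of records that are redundant copies of an entity already counted) are all assumptions, and the ML attribute comparisons are omitted.

```python
import re
import unicodedata
from difflib import SequenceMatcher

# Hypothetical legal-form tokens ignored when comparing names.
LEGAL_SUFFIXES = {"inc", "llc", "ltd", "ltda", "sa", "sas", "corp", "co"}


def normalise(name: str) -> str:
    """Strip accents, punctuation, case, and legal-form suffixes from a name."""
    ascii_name = unicodedata.normalize("NFKD", name).encode("ascii", "ignore").decode()
    cleaned = re.sub(r"[^a-z0-9\s]", "", ascii_name.lower())
    return " ".join(t for t in cleaned.split() if t not in LEGAL_SUFFIXES)


def is_match(a: dict, b: dict, threshold: float = 0.85) -> bool:
    """Fuzzy name match, compared only within the same country (a blocking rule)."""
    if a["country"] != b["country"]:
        return False
    ratio = SequenceMatcher(None, normalise(a["name"]), normalise(b["name"])).ratio()
    return ratio >= threshold


def resolve(records: list) -> list:
    """Cluster matching records with union-find; returns one cluster id per record."""
    parent = list(range(len(records)))

    def find(i: int) -> int:
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving keeps trees shallow
            i = parent[i]
        return i

    for i in range(len(records)):
        for j in range(i + 1, len(records)):
            if is_match(records[i], records[j]):
                parent[find(i)] = find(j)
    return [find(i) for i in range(len(records))]


def duplicate_rate(cluster_ids: list) -> float:
    """Share of records that are redundant copies of an entity already counted."""
    return 1 - len(set(cluster_ids)) / len(cluster_ids)


# Illustrative partner records (invented sample data).
SAMPLE_PARTNERS = [
    {"name": "Andes Freight S.A.", "country": "CL"},
    {"name": "ANDES FREIGHT SA", "country": "CL"},
    {"name": "Andes Frieght", "country": "CL"},
    {"name": "Pacific Cargo LLC", "country": "US"},
]
```

Here the three spelling variants of the Chilean partner collapse into one cluster, so profiling the sample would report a 50% duplicate rate. Blocking on country keeps the pairwise comparison tractable; at production scale the comparison would be distributed (for example in PySpark) and blocked on more attributes.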

Solution architecture

Hybrid cloud data platform: Azure as primary (70%), with AWS for streaming, partner files, and legacy connectors (30%).

AWS: Streaming & Partner Files (30%)

  • Amazon Kinesis: real-time freight event streaming from GPS/telematics vendors with native SDK integrations.
  • Amazon S3: landing zone for external partner file drops; ADF moves files to ADLS Gen2 Bronze within 15 minutes.
  • AWS Lambda: legacy carrier API connectors forwarding normalised events to Azure Service Bus.
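A legacy-carrier connector of the kind described above might look like the following sketch. The `lambda_handler(event, context)` entry point is the standard Python Lambda signature; everything else (the legacy field names `shpRef`, `evt`, `ts_epoch`, and `carrier_code`, the status-code mapping, and the canonical schema) is hypothetical, and the Azure Service Bus send is left out so the example runs without credentials.

```python
import json
from datetime import datetime, timezone

# Hypothetical mapping from legacy carrier status codes to canonical event types.
EVENT_TYPES = {"DEP": "DEPARTED", "ARR": "ARRIVED", "GIN": "GATE_IN", "GOUT": "GATE_OUT"}


def normalise_event(raw: dict) -> dict:
    """Map one legacy carrier payload onto a canonical freight event."""
    return {
        "shipment_id": raw["shpRef"].strip().upper(),
        "event_type": EVENT_TYPES.get(raw["evt"], "UNKNOWN"),
        # Epoch seconds -> ISO-8601 UTC so every region shares one time format.
        "occurred_at": datetime.fromtimestamp(raw["ts_epoch"], tz=timezone.utc).isoformat(),
        "carrier_code": raw["carrier_code"],
        "source": "aws-lambda-legacy-connector",
    }


def lambda_handler(event, context):
    """Python Lambda entry point; expects a JSON body holding a list of raw events."""
    payloads = json.loads(event["body"])
    normalised = [normalise_event(p) for p in payloads]
    # A production connector would forward `normalised` to Azure Service Bus here.
    return {"statusCode": 200, "body": json.dumps(normalised)}
```

Keeping the mapping in a pure function (`normalise_event`) separates it from the transport, so the same logic can be unit-tested without invoking Lambda.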

Azure: Primary Data Platform (70%)

  • ADLS Gen2: Bronze, Silver, and Gold Medallion layers in Delta Lake format.
  • Azure Databricks: primary Spark compute for ingestion, entity resolution, and ML training.
  • Azure Synapse Analytics: Gold layer serving for Power BI DirectQuery.
  • Microsoft Purview: governance, lineage, and sensitivity classification.
  • Azure API Management: integration hub for all consuming systems and AI engines.
  • AKS / Azure Container Apps: containerised AI microservices with auto-scaling.

Bronze: Raw Data Landing

Append-only immutable store. Provenance metadata at ingestion. OCR document pipeline for 14,000+ records. Real-time Kinesis events and batch ADF exports co-partitioned. Cross-border data segregation for LGPD compliance.
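The Bronze contract (append-only, provenance captured at ingestion, region tagged for segregation) can be illustrated with an in-memory stand-in. This is a sketch under stated assumptions: the `BronzeRecord` fields and the SHA-256 content fingerprint are illustrative, and in the platform the equivalent writes are Delta Lake appends to ADLS Gen2.

```python
import hashlib
import json
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Optional, Tuple


@dataclass(frozen=True)
class BronzeRecord:
    payload: str          # raw payload, serialised once with sorted keys, never modified
    source_system: str    # e.g. "TMS", "ERP", "OCR"
    region: str           # country code used for cross-border segregation
    ingested_at: str      # ISO-8601 UTC ingestion timestamp
    content_sha256: str   # fingerprint supporting lineage and replay checks


class BronzeStore:
    """Append-only: records can be added and read, but never updated or deleted."""

    def __init__(self) -> None:
        self._records = []

    def append(self, payload: dict, source_system: str, region: str) -> BronzeRecord:
        raw = json.dumps(payload, sort_keys=True)
        record = BronzeRecord(
            payload=raw,
            source_system=source_system,
            region=region,
            ingested_at=datetime.now(timezone.utc).isoformat(),
            content_sha256=hashlib.sha256(raw.encode("utf-8")).hexdigest(),
        )
        self._records.append(record)
        return record

    def records(self, region: Optional[str] = None) -> Tuple[BronzeRecord, ...]:
        """Return an immutable snapshot, optionally limited to one region."""
        return tuple(r for r in self._records if region is None or r.region == region)
```

Freezing the dataclass and exposing only tuples means callers cannot mutate landed data, which mirrors the immutability the Bronze layer relies on for reprocessing.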

Silver: Cleansing & Entity Resolution

PySpark entity resolution with deterministic and fuzzy matching. dbt models with 2,400+ quality tests. Confidence-tier routing: HIGH auto-promotes, MEDIUM to stewards, LOW blocked. Duplicate rate reduced from 38% to under 3%.
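The confidence-tier routing rule reduces to a small, testable function. The HIGH / MEDIUM / LOW tiers come from the case study; the 0.95 and 0.75 cut-offs and the `MatchCandidate` shape are hypothetical placeholders.

```python
from dataclasses import dataclass
from enum import Enum


class Tier(Enum):
    HIGH = "auto_promote"      # promoted without review
    MEDIUM = "steward_queue"   # routed to a domain steward
    LOW = "blocked"            # held back from promotion


@dataclass
class MatchCandidate:
    left_id: str
    right_id: str
    score: float  # combined fuzzy + ML attribute score in [0, 1]


def tier_for(score: float, high: float = 0.95, medium: float = 0.75) -> Tier:
    """Map a match score to a tier using hypothetical cut-offs."""
    if not 0.0 <= score <= 1.0:
        raise ValueError(f"score out of range: {score}")
    if score >= high:
        return Tier.HIGH
    if score >= medium:
        return Tier.MEDIUM
    return Tier.LOW


def route(candidates: list) -> dict:
    """Group candidates by tier so each queue can be processed independently."""
    queues = {tier: [] for tier in Tier}
    for candidate in candidates:
        queues[tier_for(candidate.score)].append(candidate)
    return queues
```

Keeping the thresholds as parameters lets stewards tune them per domain without touching the routing logic.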

Gold: Governed Master Data

Seven master data domains with global MDM IDs. Delta Lake time-travel for audit and ML training. Stewardship-approved records only. Synapse serverless SQL and Snowflake Data Sharing for partner access.

Customer

Carrier

Supplier

Port

Route

Container

Document
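One way to mint stable global MDM IDs for these seven domains is to derive them deterministically, so that rerunning the pipeline never reissues an identifier. The sketch below assumes UUIDv5 over a domain name plus a canonical business key; the case study does not say how its IDs are generated, and the namespace URL and three-letter domain prefix are hypothetical.

```python
import uuid

DOMAINS = ("Customer", "Carrier", "Supplier", "Port", "Route", "Container", "Document")

# Hypothetical fixed namespace; it must never change once IDs have been issued.
MDM_NAMESPACE = uuid.uuid5(uuid.NAMESPACE_URL, "https://mdm.example.invalid/golden-records")


def global_mdm_id(domain: str, canonical_key: str) -> str:
    """Same domain + key always yields the same ID, so reruns never re-key records."""
    if domain not in DOMAINS:
        raise ValueError(f"unknown domain: {domain}")
    key = f"{domain}:{canonical_key.strip().upper()}"
    return f"{domain[:3].upper()}-{uuid.uuid5(MDM_NAMESPACE, key)}"
```

The trade-off is that a deterministic ID is only as stable as its key: if a record's canonical key changes, a survivorship process must map the old ID forward, which is why many MDM hubs persist issued IDs rather than recomputing them.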

Stage 01
Sources / inputs: TMS APIs, regional ERP exports, GPS/telematics streams, carrier rate feeds, port APIs, FX feeds, 14,000+ scanned documents.
Tools / engines: Amazon Kinesis, Azure Data Factory, AWS Lambda, Azure AI Document Intelligence, Apache Airflow.
Output: ADLS Gen2 Delta Lake Bronze layer with provenance metadata. Append-only, immutable.

Stage 02
Sources / inputs: All ingested records from Stage 01. Real-time events and batch records co-partitioned.
Tools / engines: ADLS Gen2, Microsoft Purview schema registry, Terraform, schema validation at landing.
Output: Immutable raw record store. Failed records quarantined. Available for unlimited reprocessing.

Stage 03
Sources / inputs: Bronze incremental delta, reference data tables, steward decision feedback.
Tools / engines: Azure Databricks PySpark entity resolution, dbt models, Delta Lake ACID writes.
Output: Delta Lake Silver tables, entity candidates with confidence scores, stewardship exception queue.

Stage 04
Sources / inputs: HIGH-confidence candidates, steward-approved MEDIUM/LOW, aggregation rules per domain.
Tools / engines: Azure Synapse Analytics, Snowflake, Azure Data Factory approval gates, dbt Gold models.
Output: Golden records per domain, global MDM IDs, KPI tables, AI training datasets via time-travel.

Stage 05
Sources / inputs: Gold layer golden records, AI inference requests, operational system sync.
Tools / engines: Power BI Premium, Azure API Management, AKS / Azure Container Apps, Azure Service Bus, Snowflake Data Sharing.
Output: Executive dashboards, AI engine outputs, golden record sync to regional TMS, partner carrier feeds.
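Stage 02's schema validation at landing, with failed records quarantined rather than dropped, can be sketched as follows. The `SHIPMENT_SCHEMA` fields are hypothetical; in the platform this role sits with the Purview schema registry and validation at landing, not hand-written Python.

```python
# Hypothetical landing schema: field name -> accepted Python type(s).
SHIPMENT_SCHEMA = {
    "shipment_id": str,
    "origin_port": str,
    "weight_kg": (int, float),
    "booked_at": str,
}


def validate(record: dict, schema: dict) -> list:
    """Return a list of violations; an empty list means the record is valid."""
    errors = []
    for field, expected in schema.items():
        if field not in record or record[field] is None:
            errors.append(f"missing {field}")
        elif not isinstance(record[field], expected):
            errors.append(f"{field}: wrong type {type(record[field]).__name__}")
    return errors


def land(batch: list, schema: dict) -> tuple:
    """Split a batch into accepted records and quarantined records with reasons."""
    accepted, quarantined = [], []
    for record in batch:
        errors = validate(record, schema)
        if errors:
            # Quarantined records keep their reasons so they can be fixed and replayed.
            quarantined.append({"record": record, "errors": errors})
        else:
            accepted.append(record)
    return accepted, quarantined
```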

Miami, Florida: Global HQ

Primary operations hub. International freight coordination, finance consolidation, and executive reporting. Azure East US 2 primary; Amazon Kinesis and S3 in us-east-1 for streaming and partner file drops.

California: West Coast Operations

Pacific trade lane management. APAC carrier relationships and Port of Los Angeles / Long Beach scheduling. High-frequency container throughput requiring real-time event streaming.

Texas: Inland Logistics Hub

Cross-continental road and rail freight coordination. Mexico border crossing logistics and customs brokerage.

Brazil: South America Anchor

Largest South American operation. LGPD compliance required. Processed in Azure Brazil South; Amazon Kinesis sa-east-1 for real-time freight events.

Chile & Colombia: Regional Hubs

Andean corridor freight management. Pacific South America port connectivity. Spanish-language document processing via OCR pipeline.

Partner Networks: Peru, Ecuador, Argentina

Operational presence through partner logistics networks. Data ingested via file-based adapters and API connectors to the central Bronze landing zone.

Intelligence & optimisation modules

All seven modules built exclusively on the cleansed Gold layer. Every output traceable via Microsoft Purview lineage to source golden records.

Freight Quotation

Replaces the 2–4 hour manual quotation process with a sub-60-second automated response, integrating six live data sources against cleansed Carrier and Route golden records.

Core capabilities

  • Optimal carrier identification across all active shipping lines.
  • Routing cost from 15+ years of internal shipment history.
  • Carrier reliability scoring from purchased shipping line databases.
  • Port authority scheduling integration via Port golden records.
  • Current carrier rate and surcharge feeds from 17 major shipping lines.
  • Real-time FX rate feeds for multi-currency landed-cost calculation.
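A simplified version of optimal-carrier selection combines the rate, surcharge, FX, and reliability inputs listed above. This is a sketch only: the `CarrierQuote` fields, the 0.85 on-time floor, and the rule "cheapest carrier that clears the reliability floor" are assumptions, whereas the production engine feeds these inputs into an ML cost model.

```python
from dataclasses import dataclass
from decimal import Decimal, ROUND_HALF_UP


@dataclass(frozen=True)
class CarrierQuote:
    carrier: str
    base_rate: Decimal    # from the carrier rate feed, in the quote currency
    surcharges: Decimal   # fuel, peak-season, etc., same currency
    currency: str         # ISO 4217 code
    on_time_rate: float   # reliability score from shipping line databases


def to_usd(amount: Decimal, currency: str, fx_to_usd: dict) -> Decimal:
    """Convert using a live FX rate and round to cents."""
    return (amount * fx_to_usd[currency]).quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)


def best_carrier(quotes: list, fx_to_usd: dict, min_on_time: float = 0.85) -> tuple:
    """Return (quote, usd_cost) for the cheapest carrier meeting the reliability floor."""
    eligible = [q for q in quotes if q.on_time_rate >= min_on_time]
    if not eligible:
        raise ValueError("no carrier meets the reliability threshold")
    costed = [(q, to_usd(q.base_rate + q.surcharges, q.currency, fx_to_usd)) for q in eligible]
    return min(costed, key=lambda pair: pair[1])
```

`Decimal` is used so that currency arithmetic does not pick up binary floating-point rounding errors.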

Gold layer consumed

Carrier, Route, and Port golden records; 15+ years of shipment history for ML training.

ML / Analytics

Gradient boosting ensemble on historical landed-cost outcomes. Quarterly retraining on Gold time-travel snapshots. Response under 60 seconds.
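To show what a gradient boosting ensemble does, here is a from-scratch toy with squared loss and depth-one regression stumps: each round fits a stump to the current residuals (the negative gradient of squared loss) and adds a damped copy of it to the ensemble. It is illustrative only; the features, learning rate, and round count are assumptions, and the case study does not state which library or settings the production model uses.

```python
def fit_stump(X, residuals):
    """Find the (feature, threshold) split that best fits residuals in squared error."""
    best = None
    for f in range(len(X[0])):
        for threshold in sorted({row[f] for row in X}):
            left = [r for row, r in zip(X, residuals) if row[f] <= threshold]
            right = [r for row, r in zip(X, residuals) if row[f] > threshold]
            if not left or not right:
                continue
            lmean, rmean = sum(left) / len(left), sum(right) / len(right)
            sse = sum((r - lmean) ** 2 for r in left) + sum((r - rmean) ** 2 for r in right)
            if best is None or sse < best[0]:
                best = (sse, f, threshold, lmean, rmean)
    if best is None:  # no valid split: predict the mean residual everywhere
        mean = sum(residuals) / len(residuals)
        return (0, float("inf"), mean, mean)
    return best[1:]


def predict_stump(stump, row):
    f, threshold, lmean, rmean = stump
    return lmean if row[f] <= threshold else rmean


def fit_gbm(X, y, rounds=50, learning_rate=0.1):
    """Start from the mean, then repeatedly fit stumps to the remaining residuals."""
    base = sum(y) / len(y)
    preds = [base] * len(y)
    stumps = []
    for _ in range(rounds):
        residuals = [yi - pi for yi, pi in zip(y, preds)]
        stump = fit_stump(X, residuals)
        stumps.append(stump)
        preds = [p + learning_rate * predict_stump(stump, row) for p, row in zip(preds, X)]
    return base, learning_rate, stumps


def predict_gbm(model, row):
    base, learning_rate, stumps = model
    return base + learning_rate * sum(predict_stump(s, row) for s in stumps)
```

The small learning rate trades more rounds for a smoother fit; library implementations add deeper trees, regularisation, and subsampling on top of this same loop.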

Shared module platform

All modules access the Gold layer via Azure API Management REST endpoints. Each module is issued a Managed Identity with the minimum read permissions it requires, scoped to its Gold layer domains. Model retraining runs via Azure Synapse on Gold time-travel snapshots. All seven modules are containerised on AKS with Helm charts. Module outputs are written to Gold Intelligence golden records through the stewardship workflow, never directly by the module.
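The access pattern described above (read-only grants scoped per module, no direct Gold writes, outputs routed to stewardship) can be expressed as a policy check. This is a hypothetical sketch: in the platform these are Managed Identity role assignments rather than Python, only the Freight Quotation grant (Carrier, Route, Port) mirrors the case study, and the Predictive ETA grant is invented for illustration.

```python
from enum import Enum


class Action(Enum):
    READ = "read"
    WRITE = "write"


# Hypothetical read grants per module identity.
READ_GRANTS = {
    "freight-quotation": {"Carrier", "Route", "Port"},
    "predictive-eta": {"Route", "Port", "Container"},
}


def is_allowed(module: str, domain: str, action: Action) -> bool:
    """Modules may only read their granted domains; direct Gold writes are always denied."""
    if action is Action.WRITE:
        return False  # outputs go through the stewardship workflow instead
    return domain in READ_GRANTS.get(module, set())


def submit_output(module: str, record: dict, steward_queue: list) -> None:
    """Queue a module output for stewardship review rather than writing it to Gold."""
    steward_queue.append({"submitted_by": module, "record": record, "status": "PENDING_REVIEW"})
```

Unknown modules fall through to an empty grant set, so the default is deny.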

Technology stack

Data platform (2021–2023) and AI layer (2025): a deliberate hybrid architecture with no vendor lock-in after go-live.

Data Ingestion

OCR pipeline for 14,000+ scanned documents; AI attribute extraction and confidence scoring via Azure AI Document Intelligence.

Entity Resolution

Fuzzy matching + ML deduplication; automated golden record creation with stable global identifiers.

Freight Engine

Multi-source integration: internal history, carrier rate feeds, port authority data, FX APIs, market trend data.

Governance

Business-owned stewardship workflows; KPI-embedded operational dashboards; no vendor dependency post go-live.

2025 AI Layer

Azure AI Foundry, Azure OpenAI (GPT-4o), Azure Container Apps, Azure Machine Learning Model Registry, Microsoft Entra ID.

Infrastructure

Terraform IaC, GitHub Actions CI/CD, dbt on Databricks, Apache Airflow on AKS, Delta Lake time-travel.

Security, governance & compliance

Phase 1 design constraints shaping every architectural decision across seven countries.

Identity & access

  • Microsoft Entra ID for all platform access with Managed Identities for service-to-service auth.
  • AWS IAM federated to Entra ID via SAML 2.0, with no static IAM keys.
  • No service has standing write access to Gold layer ADLS Gen2.

Data residency & compliance

  • Brazil LGPD: customer data processed in Azure Brazil South.
  • Colombian Ley 1581 with LGPD-equivalent controls.
  • US CTPAT requirements for customs clearance data.
  • OFAC sanctions screening with restricted named-individual access.
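Residency-aware routing can be sketched as a lookup applied before any write. Only the Brazil rule (Azure Brazil South) and the East US 2 primary region come from the case study; the record field `customer_country` and the fallback for all other countries are assumptions, and real LGPD or Ley 1581 compliance involves far more than region selection.

```python
# Country code -> Azure region that must process that country's customer data.
RESIDENCY_RULES = {
    "BR": "brazilsouth",  # LGPD: Brazilian customer data processed in Brazil
}
DEFAULT_REGION = "eastus2"  # primary platform region (Miami HQ); assumed fallback


def processing_region(record: dict) -> str:
    """Pick the Azure region that must process a customer record."""
    country = record.get("customer_country", "").upper()
    return RESIDENCY_RULES.get(country, DEFAULT_REGION)


def partition_by_region(records: list) -> dict:
    """Group a batch so each group is written only to its permitted region."""
    groups = {}
    for record in records:
        groups.setdefault(processing_region(record), []).append(record)
    return groups
```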

Immutable audit trail

  • Azure Monitor Log Analytics with 7-year WORM retention.
  • Microsoft Purview lineage graph β€” source to Gold provenance.
  • Microsoft Sentinel SIEM for anomaly detection.

Data quality governance

  • 2,400+ automated dbt quality tests across all seven domains.
  • Data Ownership Register signed by domain leads before go-live.
  • Governance ownership transitioned to business teams, with zero EFutures dependency after 12 months.

Results & outcomes

Quantified impact across data quality, operations, and commercial accuracy, measured at 6 and 12 months after go-live.

<60s

Freight quotation

vs 2–4 hours manual process

68→84%

Container utilisation

Within 6 months of deployment

34%

ETA accuracy gain

vs prior carrier-provided ETAs

100%

Reconciliation eliminated

Finance manual cross-system effort

100%

Business-owned governance

Independent operation 12+ months post close

7

AI engines deployed

All consuming Gold layer golden records only

Project timeline

Phase 0: Foundation
Period: Q1 2021 (12 weeks)
Key deliverables: Architecture design, hybrid cloud blueprint, Terraform IaC, data profiling sprint, 38% duplicate rate confirmed, governance operating model, CI/CD established.
Status: Completed

Phase 1: Bronze & Ingestion
Period: Q2–Q3 2021 (20 weeks)
Key deliverables: ADLS Gen2 Bronze live across 7 regions. Kinesis real-time pipeline. ADF batch ETL. OCR pipeline for 14,000+ documents. Purview schema registry populated.
Status: Completed

Phase 2: Silver & MDM
Period: Q4 2021–Q2 2022 (28 weeks)
Key deliverables: Entity resolution engine, dbt Silver models, 7 master data domains, stewardship queue live. Duplicate rate: 38% → 8% → 3% after steward resolution.
Status: Completed

Phase 3: Gold & Analytics
Period: Q3 2022–Q1 2023 (28 weeks)
Key deliverables: Synapse Gold warehouse, Snowflake secondary surface, Power BI dashboards, RLS per region, governance ownership transition to RapidLink operations.
Status: Completed

Phase 5: AI Sub-Modules
Period: Q1–Q3 2025
Key deliverables: Seven AI engines on the trusted Gold layer (Freight Quotation, Container Optimisation, Route Optimisation, Predictive ETA, Dynamic Pricing, Risk Scoring, Fuel Analytics).
Status: Active

Future scalability roadmap

  • New region onboarding (Mexico, Argentina, Peru) via a Terraform module and connector playbook, with no core architecture change.
  • EUDR and ESG data layer on Supplier and Route golden records.
  • Real-time container track and trace B2B portal on existing Kinesis infrastructure.
  • Autonomous customs pre-clearance agent from Shipment and Document golden records.
  • Partner data exchange network via Snowflake Data Sharing pattern.

RapidLink owns all source code, infrastructure configuration, data in open-format storage, and AI model training code. The platform can be extended, scaled, or migrated by any competent data engineering team without EFutures involvement.

Build your AI-ready data foundation

EFutures designs governed data platforms first β€” then activates AI and optimisation capabilities on trusted golden records. Discuss your enterprise data and AI engineering programme.
