Matt Coburn

AI startup founder and builder


I design and implement document → ontology → knowledge-graph systems with strict grounding, provenance, and evaluation invariants, and take them from design partners to revenue as a hands-on player-coach (70–80% coding).

Track record: Built AI platforms processing millions of documents for Fortune 500 clients. Led teams of up to ~15 engineers and data scientists. Generated ~$2M in enterprise revenue as technical co-founder. Shipped production ML to 2 of the top 3 global automakers.

Aristotle — Founding Engineer & Tech Lead (80% hands-on / 20% management)

Los Angeles, CA | June 2024 – Present

Owned technical direction end-to-end and shipped a zero-to-one enterprise document intelligence platform into production for customers including 2 of the top 3 global automakers. Built the core backend, set engineering standards, and hired/led 4 engineers while staying deeply hands-on.

Core invariant: no model output can change system state—or reach a user—unless it is schema-valid, auditable, and traceable to source documents.

Key contributions:

  • Ontology & entity system: Deterministic core entities (People, Orgs, Events, Relationships) in Pydantic + Postgres with strict identity semantics; extensible customer-defined schemas validated at ingest
  • Document → fact → provenance pipeline: Structured extraction with click-to-source spans/pages; ambiguity forced into explicit review flows
  • Grounded LLM UX: Streaming answers with verified vs pending states, citations for assertions/quotes, and HITL review tools
  • Agent runtime: Tool-using agents implemented as explicit state machines with boundary validation and retry/escalation semantics
  • Retrieval + eval gates: Hybrid BM25 + dense retrieval tuned on Precision@K / Recall@K; regression harnesses used as release gates
  • Engineering leadership (“golden path”): Established CI/CD, code review standards, release gates, observability, and production-readiness rituals. Personally built foundational scaffolding (auth, service patterns, shared model contracts), then used it to unblock the team and speed up delivery.

Impact: Enabled enterprise workflows that reduced manual document review, enforced grounding guarantees, and unblocked adoption in regulated environments.


WorkFusion — VP of Data Science (Applied AI & Document Intelligence)

New York, NY | 2023

Joined as VP to ship applied AI in regulated financial environments, leading ~15 data scientists while staying hands-on in system design and core implementation.

  • Shipped KYC/AML systems (transaction monitoring, risk scoring, case prioritization) for major institutions; improved detection outcomes by ~35%.
  • Built LLM-based document extraction & auto-labeling for high-volume IDP workflows with strict governance
  • Designed security-first MLOps, evaluation, and audit practices for regulated production

Tangible Intelligence — Founder / CEO / Chief Data Scientist

Dallas, TX | Jan 2020 – Apr 2023

Technical co-founder and CEO of a document-intelligence startup; generated ~$2M enterprise revenue over two years.

  • Built a no-code HITL extraction platform with ontology-driven schemas, deterministic entity resolution, and click-to-source provenance
  • Built SafeScan, a real-time sensitive-data and anomaly detection system combining ML, rules, and search
  • Led SOC 2–aligned deployments; scaled and managed an 8-person engineering/data team

M Science (Jefferies subsidiary) — Principal Data Scientist, Founding Lead (Ontology Platform)

New York, NY | May 2018 – Dec 2019

Conceived, architected, and shipped an ontology-driven knowledge platform as an internal startup. Hired and led an independent team, shipped to production, and supported clients including BlackRock, Two Sigma, and Citadel.

  • Built an Ontology-as-a-Service layer modeling companies, securities, events, and relationships across public + proprietary data.
  • Designed entity resolution rules and relationship semantics prioritizing deterministic identity and reproducibility over heuristic joins.
  • Built an application layer that let clients apply shared ontology semantics to private datasets with isolation and fine-grained access control.

Expedia Group (Hotels.com) — Data Scientist (Search & Ads Optimization)

Dallas, TX | Oct 2016 – May 2018

Worked on large-scale search/ads systems with models and pipelines influencing $200M+ annual spend.

  • Built models estimating expected value of clicks across millions of keywords, enabling daily automated bid optimization.
  • Identified systemic underperformance across publisher inventory; reallocated spend, yielding ~240% ROI improvement on targeted segments.
  • Built revenue-critical production pipelines with monitoring and safeguards.

Education

University of Texas at Dallas — B.S. Electrical Engineering, 2013


Technical Notes

Core Stack: Python, SQL, TypeScript, C
LLMs & NLP: PyTorch, HuggingFace, OpenAI/Anthropic APIs, RAG architectures, hybrid retrieval (BM25 + embeddings)
Backend: FastAPI, Pydantic, PostgreSQL, Redis, Docker, Kubernetes, AWS
Data & Infra: Pandas, NumPy, Spark, ETL/streaming pipelines, vector DBs
Specialties: Document intelligence, structured extraction, ontology design, AI UX, evaluation frameworks, production ML governance


Let's Build Something Extraordinary

I focus on making AI work reliably in production with deterministic guarantees. I'm looking for Staff/Principal Engineering or Applied AI leadership roles where I can apply my experience building enterprise-grade ML systems.