Richard Luo

Machine Learning Engineer | Agentic AI

About Me


Hi! I'm Richard, an ML Engineer at T-Mobile on the Innovation Pod for the IntentCX program. I help translate cutting-edge research into production-ready ML systems, and I work directly with OpenAI engineers to build large-scale agentic AI systems. I hold an MS in Computer Science from Georgia Tech with a concentration in Machine Learning. Outside of work, I'm really into music, basketball, and figure skating! If you're interested in working with me, feel free to shoot me a message!

For fun, check out my photography insta here!

Work Experience

T-Mobile

Machine Learning Engineer

IntentCX – Innovation Pod

October 2025 - Present

TikTok

Software Engineer

Data Trust and Safety - Video Safety

June 2024 - October 2025

Amazon Web Services (AWS)

Software Engineer Intern

CloudFront Platforms Team

May 2023 - August 2023

  • Redesigned CloudFront's worldwide host/server inventory system to be more resilient, performant, and scalable across tens of thousands of locations worldwide.
  • Improved a critical endpoint's speed by 47% by introducing concurrency, eliminating timeout issues and improving service availability and host recovery (a minimal concurrency sketch follows this list).
  • Eliminated throttling issues by reducing AWS API calls by 50%.
  • Improved service visibility for operators by creating a dashboard that monitors error rates, CloudWatch alarms, and other key metrics.
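As a hedged illustration of the concurrency idea above (not CloudFront's actual code): fanning per-host checks out in parallel keeps total latency near the slowest single check rather than the sum of all checks. The host list, health endpoint, and check_host helper are hypothetical.

```python
# Hypothetical sketch: run per-host status checks concurrently so one slow host
# no longer pushes the whole request past its timeout. Names and endpoints are
# illustrative, not CloudFront internals.
from concurrent.futures import ThreadPoolExecutor, as_completed
import urllib.request

HOSTS = ["host-a.example.com", "host-b.example.com", "host-c.example.com"]

def check_host(host: str, timeout: float = 2.0) -> tuple[str, bool]:
    """Return (host, is_healthy) using a bounded per-host timeout."""
    try:
        with urllib.request.urlopen(f"https://{host}/health", timeout=timeout):
            return host, True
    except OSError:
        return host, False

def check_inventory(hosts: list[str]) -> dict[str, bool]:
    # Total latency tracks the slowest single check, not the sum of all checks.
    with ThreadPoolExecutor(max_workers=32) as pool:
        futures = [pool.submit(check_host, h) for h in hosts]
        return dict(f.result() for f in as_completed(futures))

if __name__ == "__main__":
    print(check_inventory(HOSTS))
```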

Projects

Publications

Shotluck Holmes video captioning model architecture diagram
Paper · ACM Multimedia — 2025

Shotluck Holmes

Outperformed prior Shot2Story captioning/summarization baselines with a smaller, compute-efficient shot-aware LLVM. ACM Multimedia 2025.

Multimodal LLMs · Video Captioning · Video Summarization · Pretraining · Data Curation
  • Shot-level modeling improves narrative coherence across long videos (see the pipeline sketch after this list).
  • Improved pretraining + data collection to extend small LLVMs from images to frame sequences.
  • Strong Shot2Story results without scaling to massive models.
  • Focuses on shot-by-shot semantics that many video approaches overlook.
  • Benchmarked on Shot2Story to test linking events across shots into a coherent story.
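A minimal, hypothetical sketch of the shot-aware pipeline described above (not the paper's released code): segment the video into shots, sample a few frames per shot, caption each shot, then fuse the shot captions into one story-level summary. detect_shots, caption_frames, and summarize are illustrative stand-ins for the real shot detector and LLVM.

```python
# Illustrative shot-aware captioning pipeline; the model calls are stubbed out.
from dataclasses import dataclass

@dataclass
class Shot:
    start_frame: int
    end_frame: int

def detect_shots(num_frames: int, shot_len: int = 90) -> list[Shot]:
    # Placeholder fixed-length segmentation; a real pipeline would use a
    # learned or histogram-based shot-boundary detector.
    return [Shot(s, min(s + shot_len, num_frames)) for s in range(0, num_frames, shot_len)]

def sample_frames(shot: Shot, k: int = 4) -> list[int]:
    # Uniformly sample up to k frame indices from the shot.
    step = max(1, (shot.end_frame - shot.start_frame) // k)
    return list(range(shot.start_frame, shot.end_frame, step))[:k]

def caption_frames(frame_ids: list[int]) -> str:
    # Stand-in for a small multimodal LLM conditioned on the sampled frames.
    return f"caption for frames {frame_ids}"

def summarize(shot_captions: list[str]) -> str:
    # Stand-in for the text-level step that links events across shots.
    return " ".join(shot_captions)

if __name__ == "__main__":
    shots = detect_shots(num_frames=450)
    captions = [caption_frames(sample_frames(s)) for s in shots]
    print(summarize(captions))
```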
Joint Moment Retrieval transformer architecture diagram
Paper · arXiv — 2023

Joint Moment Retrieval & Highlight Detection

A single multimodal transformer that retrieves the exact moments matching a natural-language query and detects the best highlights: search and summarization in one pass. arXiv 2023.

Multimodal Transformers · Video Retrieval · Highlight Detection · Audio-Visual · NLP
  • Joint formulation for query-based moment retrieval + highlight detection.
  • Uses both visual and audio cues to align relevance and “interestingness” with user intent.
  • Transformer-style encoder–decoder design inspired by ViT techniques (a skeleton sketch follows this list).
  • Targets long-form videos where both precision (timestamps) and highlights matter.
  • Outputs both retrieved moments and a compact highlight set for quick viewing.
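A rough, hypothetical PyTorch skeleton of the joint formulation (dimensions, fusion, and heads are illustrative, not the paper's exact architecture): fuse visual and audio features per clip, cross-attend against the text query, and predict span offsets plus a per-clip saliency score.

```python
# Illustrative joint moment-retrieval / highlight-detection skeleton in PyTorch.
import torch
import torch.nn as nn

class JointMomentModel(nn.Module):
    def __init__(self, d_model: int = 256, nhead: int = 4, num_layers: int = 2):
        super().__init__()
        self.vid_proj = nn.Linear(512, d_model)   # per-clip visual features
        self.aud_proj = nn.Linear(128, d_model)   # aligned audio features
        self.txt_proj = nn.Linear(300, d_model)   # query token embeddings
        self.encoder = nn.TransformerEncoder(
            nn.TransformerEncoderLayer(d_model, nhead, batch_first=True), num_layers)
        self.decoder = nn.TransformerDecoder(
            nn.TransformerDecoderLayer(d_model, nhead, batch_first=True), num_layers)
        self.span_head = nn.Linear(d_model, 2)      # (start, end) offsets per clip
        self.saliency_head = nn.Linear(d_model, 1)  # highlight score per clip

    def forward(self, vid, aud, txt):
        # Sum the projected modalities per clip, then let the encoder mix them.
        clips = self.encoder(self.vid_proj(vid) + self.aud_proj(aud))
        # Clip features cross-attend to the text query in the decoder.
        fused = self.decoder(clips, self.txt_proj(txt))
        return self.span_head(fused), self.saliency_head(fused).squeeze(-1)

if __name__ == "__main__":
    model = JointMomentModel()
    vid = torch.randn(1, 75, 512)   # 75 clips of visual features
    aud = torch.randn(1, 75, 128)
    txt = torch.randn(1, 12, 300)   # 12 query tokens
    spans, saliency = model(vid, aud, txt)
    print(spans.shape, saliency.shape)  # (1, 75, 2) and (1, 75)
```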

Builds

HokusAI AI-generated artwork example
Hackathon · Horizons Hackathon (1st Place) — 2021

HokusAI

1st-place hackathon winner: a WebXR text-to-art experience that turns prompts into generated artwork using CLIP + VQGAN. Deployed end-to-end with Firebase and a Colab GPU backend.

React · WebXR · Firebase · CLIP · VQGAN · Google Colab · Heroku
  • React + WebXR frontend for VR, hosted on Heroku.
  • Firebase Auth + Firestore for accounts and a request queue.
  • Colab GPU workers poll requests, run generation, and upload results (see the worker-loop sketch after this list).
  • Built under hackathon constraints (no cloud GPU credits) with a scrappy but reliable pipeline.
  • Split into separate web and server repos to move fast and ship.
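A minimal sketch of the Colab-side worker loop, assuming a Firestore collection named requests with status/prompt fields and a generate_image helper that wraps the CLIP + VQGAN step; the collection, fields, and helper names are illustrative, not the project's exact schema.

```python
# Hypothetical Colab GPU worker: poll Firestore for pending requests, generate,
# then write the result URL back. Collection and field names are illustrative.
import time

from google.cloud import firestore

db = firestore.Client()

def generate_image(prompt: str) -> str:
    """Stand-in for the CLIP + VQGAN loop; returns an uploaded image URL."""
    return f"https://storage.example.com/{abs(hash(prompt))}.png"

def worker_loop(poll_seconds: float = 5.0) -> None:
    while True:
        # Claim a pending request, if any.
        pending = (db.collection("requests")
                     .where("status", "==", "pending")
                     .limit(1)
                     .stream())
        for doc in pending:
            prompt = doc.to_dict()["prompt"]
            doc.reference.update({"status": "processing"})
            url = generate_image(prompt)
            doc.reference.update({"status": "done", "image_url": url})
        time.sleep(poll_seconds)

if __name__ == "__main__":
    worker_loop()
```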
Sketch2Drawings before and after colorization comparison
Project — 2021

Sketch2Drawings

Transforms black-and-white sketches into colorized drawings with paired image-to-image translation using cGANs. Canny-edge preprocessing creates clean training pairs for consistent outputs.

cGAN · Image-to-Image · Canny Edge · Computer Vision
  • Paired cGAN translation from sketch edges to colorized drawings.
  • Canny edge detection to generate aligned sketch/color training pairs (see the pairing sketch after this list).
  • Plausible, consistent colorization from rough inputs.
  • Conditional generation preserves structure while hallucinating textures/colors.
  • Compact pipeline: preprocess → train cGAN → infer colorized outputs.
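A hedged sketch of the pairing step (paths and thresholds are illustrative, not the project's exact settings): run Canny edge detection on each color drawing to synthesize its sketch input, yielding aligned (edge, color) pairs for the cGAN.

```python
# Illustrative Canny-based data pairing with OpenCV.
from pathlib import Path

import cv2

def make_pair(color_path: Path, out_dir: Path, low: int = 100, high: int = 200) -> None:
    color = cv2.imread(str(color_path))          # target: the colorized drawing
    if color is None:
        return                                   # skip unreadable files
    gray = cv2.cvtColor(color, cv2.COLOR_BGR2GRAY)
    edges = cv2.Canny(gray, low, high)           # input: synthetic sketch
    sketch = cv2.bitwise_not(edges)              # dark lines on white, like a real sketch
    cv2.imwrite(str(out_dir / f"{color_path.stem}_sketch.png"), sketch)
    cv2.imwrite(str(out_dir / f"{color_path.stem}_color.png"), color)

if __name__ == "__main__":
    out = Path("pairs")
    out.mkdir(exist_ok=True)
    for p in Path("drawings").glob("*.png"):
        make_pair(p, out)
```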