Shotluck Holmes
Outperformed prior Shot2Story captioning and summarization baselines with a smaller, compute-efficient, shot-aware large language vision model (LLVM). ACM Multimedia 2025.
- Shot-level modeling improves narrative coherence across long videos.
- Improved pretraining and data collection to extend small LLVMs from single images to frame sequences.
- Strong Shot2Story results without scaling to massive models.
- Focuses on shot-by-shot semantics that many video approaches overlook.
- Benchmarked on Shot2Story to test linking events across shots into a coherent story.
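The shot-aware pipeline described above can be illustrated with a minimal sketch: segment a video into shots, caption each shot, then join the per-shot captions into one story. All function names and the toy captioner here are hypothetical placeholders, not the paper's actual API.

```python
# Conceptual sketch of shot-aware video summarization (hypothetical names,
# not the paper's implementation).

def segment_into_shots(frames, shot_len=4):
    """Split a frame sequence into fixed-length shots; a toy stand-in
    for a real shot-boundary detector."""
    return [frames[i:i + shot_len] for i in range(0, len(frames), shot_len)]

def caption_shot(shot):
    """Placeholder for a small LLVM captioning a single shot."""
    return f"shot of {len(shot)} frames"

def summarize(frames):
    """Caption each shot, then link the per-shot captions into a story."""
    shots = segment_into_shots(frames)
    return " ".join(caption_shot(s) for s in shots)

print(summarize(list(range(10))))
```

The point of the sketch is the structure, not the components: shot-level captions are produced independently and only then composed, which is what lets a small model handle long videos shot by shot.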