I am a Machine Learning Engineer at TikTok, focusing on large-scale video understanding and live recommendation systems [View my CV].
Previously, I earned my Ph.D. in Electrical and Computer Engineering from the Institute for Robotics and Intelligent Machines at Georgia Tech, under the guidance of Patricio A. Vela. I completed my B.E. in Automation at Southeast University in 2018, where I was advised by Wenze Shao and Yangang Wang.
In 2017, I worked as a research intern at the Applied Nonlinear Control Lab, University of Alberta, advised by Alan Lynch. Later, I joined the NVIDIA Learning and Perception Research group as a research intern from May 2020 to May 2021 and again from May 2022 to December 2022, under the mentorship of Stan Birchfield, collaborating closely with Jonathan Tremblay, Stephen Tyree, and Bowen Wen. Most recently, I was a research intern with the Meta FAIR Accel Ego-HowTo team from May 2023 to November 2023, advised by Kevin Liang and Matt Feiszli. My past research has focused on deep learning, computer vision, and robotics, with a particular emphasis on 3D perception and robotic system design.
Machine Learning Engineer, 08/2024 - Present
TikTok
Research Intern in FAIR Accel, 05/2023 - 11/2023
Meta
Research Intern in Learning and Perception Research Group, 05/2022 - 12/2022 & 05/2020 - 05/2021
NVIDIA
Ph.D. in Electrical and Computer Engineering, 2024
Georgia Institute of Technology
M.S. in Electrical and Computer Engineering, 2020
Georgia Institute of Technology
B.E. in Automation, 2018
Southeast University
A parallelized optimization method based on fast Neural Radiance Fields (NeRF) for estimating 6-DoF target poses.
A single-stage, category-level 6-DoF pose estimation algorithm that simultaneously detects and tracks instances of objects within a known category.
A single-stage, keypoint-based approach for category-level object pose estimation that operates on unknown object instances within a known category using a single RGB image as input.
A system for multi-level scene awareness for robotic manipulation, combining three types of information: 1) a point cloud representation of all surfaces in the scene, for obstacle avoidance; 2) the rough poses of unknown objects from categories corresponding to primitive shapes (e.g., cuboids and cylinders); and 3) the full 6-DoF poses of known objects.