WorldString IEI Lab

Actionable World Representation

Kunqi Xu^♦¹, Jitao Li³, Jianglong Ye², Tianshu Tang¹, Isabella Liu², Sifei Liu⁴, Xueyan Zou^♦¹

¹Tsinghua University — IEI Lab
²UC San Diego
³Caltech
⁴NVIDIA

^♦ Equal contribution

Building Neural Digital Twins for Real-World Objects

Real-world capture in the lab: RGB-D sensing and object-centric processing toward keypoint-conditioned WorldString training data.

WorldString Interactive visualization

Each row shows three panels, from left to right: the input keypoints, the learned token-colored shape, and an error map. Drag to rotate and scroll to zoom; panning is disabled, and the camera stays centered on the point cloud or keypoints. Use "Sync other panels to this view" in the keypoint panel to copy its camera, frame, and play/pause state to the other two panels in the same row.

Error map: green = true positive (TP), red = false positive (FP), blue = false negative (FN). Red and blue are softened toward greener tones in the panel for readability. Keypoints are colored by joint index (HSV) and drawn as shaded spheres; they use the same frame index as the adjacent shape panels.
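The page does not state how points are assigned to TP, FP, or FN, so the following is only a minimal sketch under an assumed distance-threshold rule: a predicted point counts as TP if some ground-truth point lies within `tau`, otherwise FP, and a ground-truth point with no predicted point within `tau` counts as FN. The names `joint_colors`, `error_labels`, `tau`, and `LEGEND` are hypothetical and not taken from the WorldString code. The sketch also shows one plausible way to color keypoints by joint index, spacing hues evenly around the HSV wheel.

```python
import colorsys
import math


def joint_colors(num_joints, saturation=1.0, value=1.0):
    """One RGB color per joint index, with hues evenly spaced in HSV (assumed scheme)."""
    return [colorsys.hsv_to_rgb(j / num_joints, saturation, value)
            for j in range(num_joints)]


def error_labels(pred, gt, tau):
    """Label points with an assumed distance-threshold rule.

    Returns (pred_labels, fn_points): a "TP" or "FP" label for each predicted
    point, plus the ground-truth points that no predicted point covers (FN).
    """
    def near(p, cloud):
        # Brute-force search; a KD-tree would be preferable for large clouds.
        return any(math.dist(p, q) <= tau for q in cloud)

    pred_labels = ["TP" if near(p, gt) else "FP" for p in pred]
    fn_points = [g for g in gt if not near(g, pred)]
    return pred_labels, fn_points


# Base colors from the legend: green = TP, red = FP, blue = FN.
LEGEND = {"TP": (0.0, 1.0, 0.0), "FP": (1.0, 0.0, 0.0), "FN": (0.0, 0.0, 1.0)}

# Toy example: two predictions land near the ground truth, one is far off,
# and one ground-truth point is left uncovered.
gt = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (2.0, 0.0, 0.0)]
pred = [(0.0, 0.05, 0.0), (1.0, 0.0, 0.02), (5.0, 0.0, 0.0)]
labels, missed = error_labels(pred, gt, tau=0.1)
pred_colors = [LEGEND[label] for label in labels]
```

The threshold `tau` controls how strict the map is: a smaller value turns near misses into FP/FN pairs, while a larger value hides fine geometric error. The softening of red and blue toward greener tones mentioned above is a display choice and is not reproduced here.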

Robot hand (Articulated objects)

Panels: Keypoints (input) | Learned token assignment | Error map vs. ground truth

SMPL football motion (Skinning objects)

Panels: Keypoints (input) | Learned token assignment | Error map vs. ground truth

Earphone (Soft objects)

Panels: Keypoints (input) | Learned token assignment | Error map vs. ground truth

Double stretch (soft / deformable)

Panels: Keypoints (input) | Learned token assignment | Error map vs. ground truth

Unitree Go2

Panels: Keypoints (input) | Learned token assignment | Error map vs. ground truth

Training Process Visualization

Training progress for the Go2 and H1 models. The saved checkpoints are tested on the same keypoint states.

Go2 training process (point cloud)

Go2 multi-pose

H1 training process (point cloud)

H1 multi-pose

Local visualization based on checkpoints

Code here

3-minute video intro to WorldString