WorldString IEI Lab

Actionable World Representation

Kunqi Xu1, Jitao Li3, Jianglong Ye2, Tianshu Tang1, Isabella Liu2, Sifei Liu4, Xueyan Zou1

Equal contribution

Building Neural Digital Twins for Real-World Objects

Real-world capture in the lab: RGB-D sensing and object-centric processing toward keypoint-conditioned WorldString training data.

WorldString Interactive Visualization

Each row shows three panels, from left to right: the input keypoints, the learned shape colored by token assignment, and the error map. Drag to rotate and scroll to zoom; panning is disabled. The camera stays centered on the point cloud or keypoints. Use "Sync other panels to this view" in the keypoint panel to copy its camera, frame, and play/pause state to the other two panels in the same row.

Error map: green marks true positives (TP), red marks false positives (FP), and blue marks false negatives (FN); red and blue are softened toward greener tones in the panel for readability. Keypoints are colored by joint index (HSV) and drawn as shaded spheres; they use the same frame index as the adjacent shape panels.
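The color conventions above can be illustrated with a short sketch. The page does not say how the viewer computes either mapping, so everything here is an assumption: `joint_colors` spaces hues evenly over the joint index at full saturation and value, and `error_labels` compares two boolean occupancy lists defined over the same cells. Both function names and the pure green/red/blue palette are hypothetical, and the softening of red and blue toward green is not modeled.

```python
import colorsys

# Assumed palette: unsoftened green/red/blue as RGB triples in [0, 1].
ERROR_COLORS = {
    "TP": (0.0, 1.0, 0.0),
    "FP": (1.0, 0.0, 0.0),
    "FN": (0.0, 0.0, 1.0),
}


def joint_colors(num_joints):
    """Return one RGB triple per joint, with hue spaced evenly by joint index.

    Assumption: hue = index / num_joints, saturation = value = 1.
    """
    return [colorsys.hsv_to_rgb(i / num_joints, 1.0, 1.0) for i in range(num_joints)]


def error_labels(predicted, ground_truth):
    """Label each cell as TP, FP, or FN; cells empty in both inputs get None.

    Assumption: both inputs are boolean occupancy flags over the same cells.
    """
    labels = []
    for pred, truth in zip(predicted, ground_truth):
        if pred and truth:
            labels.append("TP")   # predicted and present in ground truth
        elif pred:
            labels.append("FP")   # predicted but absent from ground truth
        elif truth:
            labels.append("FN")   # present in ground truth but missed
        else:
            labels.append(None)   # empty in both, nothing to draw
    return labels
```

A renderer would then look up `ERROR_COLORS[label]` for each non-empty cell, and assign `joint_colors(n)[j]` to the sphere drawn for joint `j`.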

Robot hand (Articulated objects)

Keypoints (input)

Learned token assignment

Error map vs. ground truth

SMPL football motion (Skinning objects)

Keypoints (input)

Learned token assignment

Error map vs. ground truth

Earphone (Soft objects)

Keypoints (input)

Learned token assignment

Error map vs. ground truth

Training Process Visualization

Training process visualization for the Go2 and H1 models. The saved checkpoints are tested on the same keypoint states.
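Holding the keypoint states fixed keeps results comparable across checkpoints. A minimal sketch of that evaluation loop follows; it is not the project's code. `evaluate_checkpoints`, `predict`, and `score` are hypothetical names, and the checkpoint loading and the metric are left to the caller.

```python
def evaluate_checkpoints(checkpoints, keypoint_states, predict, score):
    """Score every checkpoint on the same fixed list of keypoint states.

    checkpoints: dict mapping a checkpoint name to a loaded model (hypothetical).
    keypoint_states: non-empty list of keypoint inputs shared by all checkpoints.
    predict(model, keypoints): hypothetical callable returning a predicted shape.
    score(prediction, index): hypothetical callable returning a float for state `index`.
    """
    results = {}
    for name, model in checkpoints.items():
        # Every checkpoint sees the identical inputs, in the identical order.
        scores = [
            score(predict(model, keypoints), index)
            for index, keypoints in enumerate(keypoint_states)
        ]
        results[name] = sum(scores) / len(scores)
    return results
```

Because the inputs never change between checkpoints, any difference in the returned averages reflects the training progress rather than the choice of test poses.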

Go2 training process (point cloud)

Go2 multi-pose

H1 training process (point cloud)

H1 multi-pose

Local visualization based on checkpoints

Code here

3-Minute Video Intro to WorldString