College
College of Engineering and Polymer Science
Date of Last Revision
2026-04-28 12:34:10
Major
Mechanical Engineering
Honors Course
MECE 471:003
Number of Credits
2
Degree Name
Bachelor of Science
Date of Expected Graduation
Spring 2026
Abstract
This report presents the design, development, and deployment of a teleoperated robotic arm system capable of vision-based imitation learning. The system uses a leader-follower architecture built on the myArm C650 and myArm M750 robots manufactured by Elephant Robotics. The operator controls the movement of the leader arm, while the follower arm executes actions learned from demonstrations. Data collection included synchronized joint-angle recordings and multi-camera video streams capturing human demonstrations.
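As a rough illustration of the teleoperation scheme described above, the sketch below mirrors the leader arm's joint angles onto the follower using pymycobot. The serial ports, control rate, and the specific MyArm class and get_angles/send_angles methods are assumptions about the interface, not the exact code used in this project.

    # Minimal leader-follower mirroring sketch using pymycobot (assumed interface).
    # Ports, class, and method names are illustrative assumptions.
    import time
    from pymycobot import MyArm

    leader = MyArm("/dev/ttyAMA0")    # myArm C650 acting as the leader (hypothetical port)
    follower = MyArm("/dev/ttyAMA1")  # myArm M750 acting as the follower (hypothetical port)

    while True:
        angles = leader.get_angles()          # operator-driven joint angles, in degrees
        if angles:
            follower.send_angles(angles, 50)  # mirror them on the follower at 50% speed
        time.sleep(0.02)                      # ~50 Hz control loop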
These demonstrations were processed with a deep imitation learning algorithm called Action Chunking with Transformers (ACT). Its key capability is predicting multi-step future action trajectories from visual and proprioceptive data alone, using a Conditional Variational Autoencoder to represent the distribution over chunks of future actions.
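To illustrate the chunking idea only (this is not the published ACT architecture), the following PyTorch sketch shows a decoder that maps a latent sample, proprioceptive state, and image features to a block of future actions in a single forward pass; all dimensions and module names are illustrative.

    # Toy sketch of action chunking: predict a block of k future actions at once.
    import torch
    import torch.nn as nn

    class ChunkedActionDecoder(nn.Module):
        def __init__(self, state_dim=7, img_feat_dim=512, latent_dim=32,
                     chunk_size=100, action_dim=7):
            super().__init__()
            self.chunk_size = chunk_size
            self.action_dim = action_dim
            self.net = nn.Sequential(
                nn.Linear(state_dim + img_feat_dim + latent_dim, 512),
                nn.ReLU(),
                nn.Linear(512, chunk_size * action_dim),
            )

        def forward(self, state, img_feat, z):
            # Concatenate proprioception, visual features, and the CVAE latent,
            # then emit chunk_size future actions in one forward pass.
            x = torch.cat([state, img_feat, z], dim=-1)
            return self.net(x).view(-1, self.chunk_size, self.action_dim)

    # Example: 7 joint angles, 512-d image features, 32-d latent sample.
    decoder = ChunkedActionDecoder()
    actions = decoder(torch.zeros(1, 7), torch.zeros(1, 512), torch.zeros(1, 32))
    print(actions.shape)  # torch.Size([1, 100, 7])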
The first stage of implementation involved adapting the original ALOHA implementation released by Stanford University researchers. Although training converged successfully, deploying the action-generation inference code proved infeasible because of compatibility problems between the dataset format and the format required for robot interaction via pymycobot. After analyzing this issue, the implementation pipeline was switched to the official LeRobot implementation from the Hugging Face framework, which guarantees temporal alignment of the dataset and provides a consistent interface to the model at all stages of the workflow.
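For reference, LeRobot exposes recorded demonstrations as a PyTorch-style dataset. The import path, dataset identifier, and feature keys in the sketch below are assumptions about a typical configuration and may not match the exact setup used in this project.

    # Sketch of loading a LeRobot-format dataset for training (assumed interface/keys).
    import torch
    from lerobot.common.datasets.lerobot_dataset import LeRobotDataset

    # Hypothetical dataset identifier; the project's actual repo/path will differ.
    dataset = LeRobotDataset("local/peg_in_hole_demos")
    loader = torch.utils.data.DataLoader(dataset, batch_size=8, shuffle=True)
    batch = next(iter(loader))

    # Typical LeRobot feature keys (names depend on how recording was configured):
    # batch["observation.state"]       -> joint angles at each frame
    # batch["observation.images.top"]  -> synchronized camera frames
    # batch["action"]                  -> commanded follower joint angles
    print({k: getattr(v, "shape", None) for k, v in batch.items()})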
The system was developed for two tasks. The first was a straightforward pick-and-place task, used to validate the pipeline, trained for 80,000 steps on 55 demonstrations. The second task, and the main application of the Senior Design project, was peg-and-hole insertion, trained for 100,000 steps on 76 demonstrations. During training, the total loss decreased from 4.5 to 0.04 over 100,000 steps; in particular, the L1 reconstruction loss decreased from 0.75 to 0.025, and no signs of overfitting were detected in our experiments. The autonomous policy for Task 2 achieved a success rate of 85% (26 out of 30 trials) with an average completion time of 19 seconds per trial. A particularly notable finding concerned how the model represented the workspace. Every training demonstration was performed in a blank environment containing only the peg and the hole, yet when random distractor objects were introduced during testing, the robot self-corrected its motion, avoiding the incorrect objects and finding the right one.
The robot thus learned to recognize different objects rather than memorizing their coordinates. These results demonstrate that vision-based imitation learning is feasible even on relatively affordable, single-arm hardware, and that ACT policies can generalize in ways that were not explicitly targeted during training.
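An autonomous trial of the kind reported above can be pictured as the following inference loop; the checkpoint path, the select_action call, camera index, and pymycobot methods are all illustrative assumptions rather than the project's actual deployment code.

    # Sketch of an autonomous inference loop: camera + joint state in, action out.
    # Policy checkpoint, observation keys, and robot interface are hypothetical.
    import time
    import cv2
    import torch
    from pymycobot import MyArm

    robot = MyArm("/dev/ttyAMA1")         # follower arm (hypothetical port)
    camera = cv2.VideoCapture(0)          # one of the workspace cameras
    policy = torch.load("act_policy.pt")  # hypothetical trained ACT policy object
    policy.eval()

    with torch.no_grad():
        for _ in range(1000):             # roughly a 20 s episode at ~50 Hz
            ok, frame = camera.read()
            state = robot.get_angles()
            if not ok or not state:
                continue
            obs = {
                "observation.state": torch.tensor(state, dtype=torch.float32).unsqueeze(0),
                "observation.images.top": torch.from_numpy(frame).permute(2, 0, 1)
                                               .unsqueeze(0).float() / 255.0,
            }
            action = policy.select_action(obs)  # first action of the predicted chunk
            robot.send_angles(action.squeeze(0).tolist(), 50)
            time.sleep(0.02)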
Research Sponsor
Yalin Dong
First Reader
Jutta Luettmer-Strathmann
Second Reader
Manigandan Kannan
Honors Faculty Advisor
Scott Sawyer
Proprietary and/or Confidential Information
No
Community Engaged Scholarship
No
Recommended Citation
Emmanuel, Favour E. and Donahue, Hunter J., "Teleoperated Robotic Arm with Vision-Based Imitation Learning" (2026). Williams Honors College, Honors Research Projects. 2136.
https://ideaexchange.uakron.edu/honors_research_projects/2136
Included in
Acoustics, Dynamics, and Controls Commons, Computer-Aided Engineering and Design Commons, Other Mechanical Engineering Commons