Cobot user frame calibration: a practical guide

Have you ever wondered why your cobot doesn’t accurately reach the positions it should, for example during a palletizing operation?

In this study, conducted in collaboration with the Applied Mechanics group of the University of Brescia, we evaluated the repeatability of the vision-based RPS user frame calibration system of the Sawyer cobot and compared it with the repeatability obtained after calibrating the robot using traditional 3- and 5-point methods involving rigid markers.
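
For readers unfamiliar with the traditional 3-point method mentioned above, the sketch below shows one common way such a user frame is built: an origin, a point on the +x axis and a point on the +y side of the xy-plane. This convention, the function name and the use of NumPy are assumptions for illustration; they are not taken from the paper or from the Sawyer software.

```python
import numpy as np

def three_point_frame(origin, x_point, xy_point):
    """Hypothetical helper: 4x4 transform from the user frame to the robot base
    frame, built from three taught points (origin, +x axis point, +y side point)."""
    o = np.asarray(origin, dtype=float)
    x_axis = np.asarray(x_point, dtype=float) - o
    x_axis /= np.linalg.norm(x_axis)
    # The third point fixes the xy-plane; crossing x with it gives the z-axis
    z_axis = np.cross(x_axis, np.asarray(xy_point, dtype=float) - o)
    z_axis /= np.linalg.norm(z_axis)
    # Complete a right-handed orthonormal basis
    y_axis = np.cross(z_axis, x_axis)
    T = np.eye(4)
    T[:3, :3] = np.column_stack((x_axis, y_axis, z_axis))
    T[:3, 3] = o
    return T
```

A point p given in the user frame is then expressed in the robot base frame as `(T @ np.append(p, 1.0))[:3]`.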

The analysis showed that the RPS system is currently not robust enough to guarantee acceptable performance when the robot moves far from the landmark position, because it computes local planes instead of a single plane fitted to several calibration points.
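
The contrast between local planes and a single plane estimated from several calibration points can be illustrated with a least-squares plane fit. The sketch below uses a standard SVD-based fit over N ≥ 3 points and returns each point's signed distance from the fitted plane; it is not a description of how RPS or the paper computes its planes, and the function name is hypothetical.

```python
import numpy as np

def fit_plane(points):
    """Least-squares plane through N >= 3 points.

    Returns (centroid, unit normal, signed distance of each point from the plane)."""
    P = np.asarray(points, dtype=float)
    centroid = P.mean(axis=0)
    # The right singular vector with the smallest singular value is the
    # direction of least variance of the centered points: the plane normal
    _, _, vt = np.linalg.svd(P - centroid)
    normal = vt[-1]
    residuals = (P - centroid) @ normal
    return centroid, normal, residuals
```

The spread of the residuals gives a quick indication of how well one plane describes the whole set of calibration points.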

Check out the paper to find out more about this study!

Hands-Free v2 to teleoperate robotic manipulators: three axis precise positioning study


The idea of this thesis project is to improve the teleoperation system previously presented at Ubiquitous Robotics by adding z-axis control. The original system only performed xy teleoperation, allowing users to move the end-effector of the robot to the position indicated by the index finger keypoint extracted by OpenPose. A more interesting application, however, also integrates precise z-axis control and a trajectory planner, which are the key improvements of this version, as shown in Fig. 1.

Fig. 1 - Concept of Hands-Free v2. By analyzing the hand skeleton using OpenPose it is possible to extract the index finger position over time to build a complete trajectory. The points are interpolated by the ad-hoc interpolator and the final trajectory is sent to the ROS node of the Sawyer robot.


Due to the COVID-19 pandemic restrictions, this project was carried out in ROS Gazebo, reproducing the laboratory set-up already used for Hands-Free v1. The camera is a consumer-grade RGB camera calibrated following the standard procedure described in the Hands-Free v1 paper. The robot calibration was performed similarly, by setting up a simulated environment as shown in Fig. 2.

Fig. 2 - Example of the robot calibration procedure carried out in the simulated Gazebo environment.


The trajectory builder extends the capabilities of the previous version of the software; hence, it only works in 2D, considering the user frame (xy plane) and the vertical robot frame (zy plane). The procedure to use this application is the following, and it is also shown in the video below:

  1. Users place their open hand on the user frame so that the system detects the “hand-open” gesture. This resets the variables and moves the robot to its home pose (Fig. 3).
  2. After the initialization phase, users move their hand around the user frame while performing the “index” gesture (with both thumb and index finger extended). The index finger position is extracted from keypoint 8. Only positions that differ from the preceding one by at least 5 px are retained. Moreover, each position is computed as the mean position detected over N consecutive frames; here N = 3 (Fig. 4).
  3. The detected trajectory points are filtered and interpolated by the ad-hoc interpolator. The resulting trajectory is sent to the robot when the “move” gesture is performed (with index and middle fingers extended, see Fig. 5).
Fig. 3 - Example of the initialization phase performed by using the "hand-open" gesture.
Fig. 4 - Example of the definition of a trajectory by performing the "index" gesture. The trajectory points are saved and filtered according to their position with respect to the preceding point.
Fig. 5 - Example of the launch of the interpolated trajectory, performed by using the "move" gesture.
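
The point acquisition rule of step 2 can be sketched as follows: each candidate point is the mean of keypoint 8 over N = 3 frames, and it is kept only if it lies at least 5 px from the last retained point. The class and method names, the use of non-overlapping groups of frames and the Euclidean distance are assumptions; the gesture classifier and the ad-hoc interpolator are not reproduced.

```python
import math

class IndexTrajectoryBuilder:
    """Hypothetical accumulator for index fingertip (OpenPose keypoint 8) positions."""

    def __init__(self, n_frames=3, min_step_px=5.0):
        self.n_frames = n_frames
        self.min_step_px = min_step_px
        self._buffer = []      # raw (x, y) detections waiting to be averaged
        self.trajectory = []   # retained, averaged points in pixels

    def reset(self):
        # Called when the "hand-open" gesture is detected
        self._buffer.clear()
        self.trajectory.clear()

    def add_detection(self, x, y):
        # Called on every frame classified as the "index" gesture
        self._buffer.append((x, y))
        if len(self._buffer) < self.n_frames:
            return
        mx = sum(p[0] for p in self._buffer) / self.n_frames
        my = sum(p[1] for p in self._buffer) / self.n_frames
        self._buffer.clear()
        if self.trajectory:
            px, py = self.trajectory[-1]
            if math.hypot(mx - px, my - py) < self.min_step_px:
                return  # too close to the preceding point: discard it
        self.trajectory.append((mx, my))
```

When the “move” gesture is detected, the points in `trajectory` would be handed to the interpolator.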


To control the manipulator along the z-axis (which in this set-up corresponds to the robot’s x-axis) three different modalities have been studied and implemented. For now, however, the depth control is still separate from the trajectory planner.


In the keyboard modality, the robot can be moved home (h), forward (w), or backward (s) depending on the key pressed. The step size of each movement is fixed. Pressing Ctrl+C or Esc exits the modality and closes the communication with the robot.
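
A minimal sketch of this key-to-command mapping is given below. The step value, the sign convention (forward = +x in the robot frame) and the function name are hypothetical, and sending the new target to the robot is left out.

```python
STEP_M = 0.01  # hypothetical fixed step along the robot x-axis, in meters

def keyboard_command(key, current_x, home_x):
    """Return the new target x (robot frame) for a key press, or None to quit."""
    if key in ("\x03", "\x1b"):  # Ctrl+C (in raw terminal mode) or Esc
        return None
    if key == "h":
        return home_x
    if key == "w":
        return current_x + STEP_M
    if key == "s":
        return current_x - STEP_M
    return current_x  # any other key: stay still
```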


In the gesture modality, the core functions of Hands-Free are retained to detect the hand gestures; however, only the “hand-open” and “move” gestures are detected. The distance between the two fingers of the “move” gesture determines whether the robot should move forward (fingers close together, distance down to zero) or backward (fingers clearly apart, as in the original “move” gesture). An example of this modality is shown in the video below.
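
The distance rule can be sketched as a single threshold on the gap between the index and middle fingertips (keypoints 8 and 12 in the OpenPose hand model). The sketch assumes the keypoints are already expressed in millimeters; the 30 mm threshold and the function name are hypothetical.

```python
import math

INDEX_TIP, MIDDLE_TIP = 8, 12   # OpenPose hand keypoint indices
CLOSE_THRESHOLD_MM = 30.0       # hypothetical boundary between forward and backward

def depth_command(keypoints):
    """keypoints: mapping index -> (x, y) in mm, for a detected "move" gesture."""
    (x1, y1), (x2, y2) = keypoints[INDEX_TIP], keypoints[MIDDLE_TIP]
    gap = math.hypot(x2 - x1, y2 - y1)
    # Fingers together -> forward; fingers apart -> backward
    return "forward" if gap < CLOSE_THRESHOLD_MM else "backward"
```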


The last modality implements depth control using the Vicara Kai wearable sensor. The sensor is worn on the hand, and the detected orientation of the open hand determines whether the robot should stay still (hand parallel to the ground), move forward (hand tilted down), or move backward (hand tilted up).
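
The orientation rule can be sketched as a dead band around the horizontal hand pose. The sketch assumes the hand pitch is already available in degrees (positive when the hand tilts up); reading it from the Vicara Kai SDK is not shown, and the ±15° dead band is a hypothetical value.

```python
DEAD_BAND_DEG = 15.0  # hypothetical tolerance around "hand parallel to the ground"

def tilt_command(pitch_deg):
    """Map hand pitch (degrees, positive = tilted up) to a depth command."""
    if pitch_deg > DEAD_BAND_DEG:
        return "backward"  # hand tilted up
    if pitch_deg < -DEAD_BAND_DEG:
        return "forward"   # hand tilted down
    return "stay"          # hand roughly parallel to the ground
```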


C. Nuzzi, S. Ghidini, R. Pagani, S. Pasinetti, G. Coffetti and G. Sansoni, “Hands-Free: a robot augmented reality teleoperation system,” 2020 17th International Conference on Ubiquitous Robots (UR), Kyoto, Japan, 2020, pp. 617-624, doi: 10.1109/UR49135.2020.9144841.

Analysis and development of a teleoperation system for cobots based on vision systems and hand-gesture recognition


The idea behind this thesis work is to take a first step towards a vision-based system to teleoperate cobots in real time using the user’s fingertip.

This was done experimentally by developing a ROS-based program that simultaneously (i) analyzes the hand gesture performed by the user using the OpenPose skeletonization algorithm and (ii) moves the robot accordingly.


After the Kinect v2 sensor has been intrinsically calibrated to align the depth data with the RGB images, the workspace must be calibrated, that is, a user frame reference system must be established to convert from image pixels to meters and vice versa.
This was done with the OpenCV vision library, whose functions were used to automatically detect the markers of the calibration master and to assign each of them its coordinates in the user reference system. Given the pairs of corresponding points in the user frame and in the camera frame, the calibration matrix M was then estimated by solving the resulting linear system with the least squares method.
Fig. 1 shows the experimental set-up of the camera-to-user-frame part of the project.
Fig. 1 - Experimental set-up. The horizontal user frame is viewed by a Kinect v2 camera mounted at fixed height.
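
The text does not state which model M follows. As an illustration, the sketch below assumes a 2D affine model between the pixel coordinates (u, v) of the detected markers and their user frame coordinates (X, Y), and estimates the 2 x 3 matrix with NumPy’s least-squares solver. A homography would be the usual alternative if the camera were noticeably tilted; the function names are hypothetical.

```python
import numpy as np

def fit_pixel_to_user(pixels, user_xy):
    """Least-squares affine M (2x3) such that [X, Y] ~= M @ [u, v, 1].

    pixels: N x 2 marker centers in the image; user_xy: N x 2 user frame
    coordinates in meters; N >= 3 non-collinear markers."""
    uv = np.asarray(pixels, dtype=float)
    A = np.hstack([uv, np.ones((len(uv), 1))])            # N x 3 design matrix
    B = np.asarray(user_xy, dtype=float)                   # N x 2 targets
    M_T, *_ = np.linalg.lstsq(A, B, rcond=None)            # solves A @ M.T ~= B
    return M_T.T

def pixel_to_user(M, u, v):
    """Convert one pixel to user frame coordinates (meters)."""
    return M @ np.array([u, v, 1.0])
```

For the meters-to-pixels direction, the affine map can be inverted by appending the row [0, 0, 1] to M and inverting the resulting 3 x 3 matrix.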


First test

In this test, a rectangular object measuring 88 x 41 x 25 mm was placed at each of the 8 measurement points of the set-up shown in Fig. 2, with its bottom-left corner on the measurement point.

The measured position of the object at each point was calculated by applying the pixel-to-meter conversion developed before. The positional error was then estimated as the difference between the measured and the real position of the bottom-left corner of the object in each pose.
The analysis showed an average displacement of 6.0 mm along the x-axis and 5.8 mm along the y-axis. These errors are probably caused by lens distortion and by the scene illumination. Since the calibration performed did not model lens distortion, its effect degrades the measurements, especially near the corners of the image. Moreover, the height of the object casts shadows on the set-up that introduce errors in the corner detection, depending on the relative position of the light and the object.
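
The error metric can be written in a few lines. The sketch takes the average displacement to be the mean absolute difference per axis between measured and reference corner positions; the text does not say whether signed or absolute differences were averaged, so this is an assumption, and the function name is hypothetical.

```python
def mean_displacement(measured, reference):
    """Mean absolute x and y error over all poses (same units as the inputs, e.g. mm)."""
    n = len(measured)
    ex = sum(abs(m[0] - r[0]) for m, r in zip(measured, reference)) / n
    ey = sum(abs(m[1] - r[1]) for m, r in zip(measured, reference)) / n
    return ex, ey
```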


Second test

Similarly to the first test, in this one a rectangular object measuring 73 x 48 x 0.5 mm was placed at each of the 8 measurement points.

In this case, the average displacement was 6.4 mm along the x-axis and 6.5 mm along the y-axis. This highlights how strongly lens distortion affects the measurements: since the object is too thin to cast shadows on the plane, these errors can only be due to lens distortion.

Fig. 2 - Measure points of the calibration master adopted. The reference system is centered around the bottom-left marker (marker 0) of the target.


Since the system uses OpenPose to extract the hand skeleton in real time, three hand gestures were defined, each recognized from the positions of the keypoints (see Fig. 3, Fig. 4 and Fig. 5).

However, since OpenPose estimates the hand keypoints even when they are not visible, a filtering procedure was needed to determine the output gesture based on the following geometric constraints:

  • the thumb must be present to correctly assign the keypoint numbers
  • the distance between the start and end keypoint of the thumb must be between 20 and 50 mm
  • the angle between the thumb and the x-axis must be > 90°
  • the distance between the start and end keypoints of index, middle and ring fingers must be between 20 and 100 mm
  • the distance between the start and end keypoint of the pinky finger must be between 20 and 70 mm
In addition, the acquisition window was set to 3 s to account for the time intervals between changes of hand gesture and hand movements.
Fig. 3 - Hand-gesture of the first sub-task named "positioning".
Fig. 4 - Hand-gesture of the second sub-task named "picking".
Fig. 5 - Hand-gesture of the third sub-task named "release".
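
The geometric constraints listed above can be sketched as a validity filter that runs before gesture classification. Several details are assumptions: keypoints are already converted to millimeters in the user frame, a finger’s start and end keypoints are its first and last OpenPose keypoints (1 and 4 for the thumb, 5 and 8 for the index, and so on), the thumb angle is the unsigned angle between its start-to-end vector and the x-axis, and the confidence threshold is hypothetical. Mapping the valid fingers to the three gestures, and the 3 s acquisition window, are not shown.

```python
import math

# OpenPose hand model: 0 = wrist, then 4 keypoints per finger (base -> tip)
FINGERS = {"thumb": (1, 4), "index": (5, 8), "middle": (9, 12),
           "ring": (13, 16), "pinky": (17, 20)}
# Allowed start-to-end distance per finger in mm, from the constraints above
LENGTH_RANGE_MM = {"thumb": (20, 50), "index": (20, 100), "middle": (20, 100),
                   "ring": (20, 100), "pinky": (20, 70)}
MIN_CONFIDENCE = 0.1  # hypothetical threshold for "keypoint present"

def valid_fingers(kp):
    """kp: dict index -> (x_mm, y_mm, confidence).

    Returns the set of fingers passing the checks, or an empty set when the
    thumb check fails (the keypoint numbering is then unreliable)."""
    def present(i):
        return i in kp and kp[i][2] >= MIN_CONFIDENCE

    def length_and_angle(start, end):
        dx, dy = kp[end][0] - kp[start][0], kp[end][1] - kp[start][1]
        length = math.hypot(dx, dy)
        if length == 0:
            return 0.0, 0.0
        # Unsigned angle with the x-axis; clamp guards against rounding
        cos_a = max(-1.0, min(1.0, dx / length))
        return length, math.degrees(math.acos(cos_a))

    s, e = FINGERS["thumb"]
    if not (present(s) and present(e)):
        return set()
    length, angle = length_and_angle(s, e)
    lo, hi = LENGTH_RANGE_MM["thumb"]
    if not (lo <= length <= hi and angle > 90.0):
        return set()

    ok = {"thumb"}
    for name in ("index", "middle", "ring", "pinky"):
        s, e = FINGERS[name]
        if present(s) and present(e):
            lo, hi = LENGTH_RANGE_MM[name]
            if lo <= length_and_angle(s, e)[0] <= hi:
                ok.add(name)
    return ok
```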


The proposed gestures were performed by three male students with pale skin at different times of the day (morning, afternoon, late afternoon). The purpose of this test was to determine whether the system could robustly detect the gestures under varying scene illumination.

The students moved their hand around the user frame and performed the gestures one at a time. On average, the proposed gestures were recognized 90% of the time.

It is worth noting that the “positioning” gesture was defined so as to reduce the keypoint misclassifications that can occur when only one finger (the index) is shown. In fact, it was observed that increasing the number of clearly visible fingers in the scene also increased the recognition accuracy of the gesture. This is probably because the thumb must be present to avoid confusing it with the index finger. However, even though the “positioning” gesture uses three fingers, only the position of the index fingertip (keypoint 8) is used to estimate the position the user is pointing at.


The complete system is composed of (i) the gesture recognition ROS node that detects the gesture and (ii) a robot node that moves the Sawyer cobot accordingly. Since the robot workspace is vertical (as shown in Fig. 6), the vertical workspace had to be calibrated with respect to the robot user frame. This was done with a centering tool, building the calibration matrix from pairs of corresponding points (robot frame coordinates and vertical workspace coordinates).

Fig. 6 - Complete set-up developed, showing the two workspaces (horizontal and vertical).
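
One way to build such a calibration matrix is a least-squares fit from 2D vertical workspace coordinates to the 3D robot positions reached with the centering tool. The affine model, the function names and the use of NumPy are assumptions for illustration, not details taken from the thesis code.

```python
import numpy as np

def fit_workspace_to_robot(ws_xy, robot_xyz):
    """Least-squares affine map T (3x3) with robot [X, Y, Z] ~= T @ [x, y, 1].

    ws_xy: N x 2 points on the vertical workspace (m); robot_xyz: N x 3
    positions reached with the centering tool (m); N >= 3, not collinear."""
    A = np.hstack([np.asarray(ws_xy, dtype=float), np.ones((len(ws_xy), 1))])
    T_t, *_ = np.linalg.lstsq(A, np.asarray(robot_xyz, dtype=float), rcond=None)
    return T_t.T

def workspace_to_robot(T, x, y):
    """Map one workspace point to robot frame coordinates (meters)."""
    return T @ np.array([x, y, 1.0])
```

With more than three point pairs, the least-squares fit averages out small errors in positioning the centering tool.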

Project Hands-Free presented at Ubiquitous Robotics 2020

Hands-Free is a ROS-based software to teleoperate a robot with the user’s hand. The hand skeleton is extracted using OpenPose, and the position of the user’s index finger in the user workspace is mapped to the corresponding robot position in the robot workspace.

The project is available on GitHub and the paper has been published in the Ubiquitous Robotics 2020 virtual conference proceedings.

Check out the presentation video below!