Real-time robot command system based on hand gesture recognition

With the Industry 4.0 paradigm, the industrial world has undergone a technological revolution. Manufacturing environments in particular are required to be smart and to integrate automated processes and robots into the production plant. Achieving smart manufacturing requires rethinking the production process so as to create true collaboration between human operators and robots. Robotic cells usually have safety cages to protect operators from the harm that direct contact with the robot can cause, which limits the interaction between the two. Only collaborative robots, thanks to their specific design, can share the same workspace with humans without risk. They pose another problem, though: to guarantee human safety, they must operate at low speeds and forces, so their operations are slow and comparable to those of a human operator. In practice, collaborative robots hardly have a place in real industrial environments with high production rates.

In this context, this thesis presents an innovative command system for use in a collaborative workstation, allowing operators to work alongside robots in a more natural and straightforward way and reducing the time needed to command the robot on the fly. Recent techniques of Computer Vision, Image Processing and Deep Learning provide the intelligence behind the system, which is in charge of recognizing the gestures performed by the operator in real time.

Step 1: Creation of the gesture recognition system

Several suitable algorithms and models are available in the literature for this purpose. The chosen approach is an Object Detector called “Faster Region-based Convolutional Neural Network”, or Faster R-CNN, implemented in MATLAB.

Object Detectors are especially suited to gesture recognition because they are capable of (i) locating the objects in the image and (ii) classifying them, thus recognizing which objects they are. Figure 1 illustrates this concept: the algorithm has to find the object “number three” shown in the figure.

Fig. 1 - The process performed by Object Detectors in general. Two networks process the image in different steps: first the region proposals, i.e. the candidate positions of the objects of interest, are extracted. Then the proposals are evaluated by the classification network, which outputs both the position of the object (the bounding box) and the name of the object class.
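Both stages in Fig. 1 compare boxes by how much they overlap. In the original Faster R-CNN formulation, overlapping region proposals are pruned with greedy non-maximum suppression (NMS), which relies on the Intersection over Union (IoU) between boxes. The sketch below shows these two operations in plain Python; it is illustrative only and is not the MATLAB implementation used in this project. The box format and the 0.7 threshold (the proposal-NMS value from the original Faster R-CNN paper) are assumptions here.

```python
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # assumed format: (x_min, y_min, x_max, y_max) in pixels


def iou(a: Box, b: Box) -> float:
    """Intersection over Union of two axis-aligned boxes."""
    ix = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))  # width of the overlap
    iy = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))  # height of the overlap
    inter = ix * iy
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def nms(boxes: List[Box], scores: List[float], iou_threshold: float = 0.7) -> List[int]:
    """Greedy non-maximum suppression: indices of the boxes to keep, best score first."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep: List[int] = []
    while order:
        best = order.pop(0)
        keep.append(best)
        # Discard every remaining box that overlaps the kept one too much.
        order = [i for i in order if iou(boxes[best], boxes[i]) < iou_threshold]
    return keep
```

For example, with the proposals `[(10, 10, 110, 110), (15, 15, 115, 115), (200, 50, 260, 140)]` and scores `[0.9, 0.8, 0.75]`, the second box overlaps the first with an IoU of about 0.82 and is suppressed, so `nms` keeps indices 0 and 2.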

The gestures were first carefully selected and acquired with different mobile phones, and a preliminary study was carried out to verify whether the model could distinguish between the left and right hand and, at the same time, between the palm and the back of the hand. The final gestures and their meaning in the command system are shown in Fig. 2.

Fig. 2 - Definitive gesture commands used in the command system.

Step 2: Creation of the command system

The proposed command system is structured as in Fig. 3: the images are acquired in real time by a Kinect v2 camera connected to the master PC and processed in MATLAB to obtain the gesture commands frame by frame. The commands are then sent to the ROS node in charge of translating each numerical command into a robot operation. This ROS node, by means of a driver developed specifically for the robot in use, sends the movement positions to the robot controller. Finally, the robot receives the ROS packets of the desired trajectory and executes the movements. Fig. 4 shows how the data are sent to the robot.
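Because the commands are extracted frame by frame, a single misdetection could otherwise trigger an unwanted robot action. The sketch below shows one possible way to turn per-frame detections into a numerical command: keep the most confident known gesture above a threshold and emit its code only after it has been seen in several consecutive frames. This debouncing rule, the class names, the command codes and both thresholds are hypothetical; the source does not describe how the MATLAB side filters detections.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical mapping from detector class names to numerical command codes;
# the real gesture set is the one shown in Fig. 2.
COMMAND_CODES: Dict[str, int] = {"gesture_a": 1, "gesture_b": 2, "gesture_c": 3}


class CommandFilter:
    """Turns per-frame detections (label, score) into a stable numerical command.

    A code is emitted once, on the frame in which the same gesture has been the
    best detection for `min_frames` consecutive frames (assumed debouncing rule).
    """

    def __init__(self, min_score: float = 0.8, min_frames: int = 5) -> None:
        self.min_score = min_score
        self.min_frames = min_frames
        self._last: Optional[str] = None
        self._count = 0

    def update(self, detections: List[Tuple[str, float]]) -> Optional[int]:
        # Keep only known gestures whose confidence passes the threshold.
        valid = [d for d in detections if d[0] in COMMAND_CODES and d[1] >= self.min_score]
        if not valid:
            self._last, self._count = None, 0  # no gesture: reset the streak
            return None
        label = max(valid, key=lambda d: d[1])[0]
        if label == self._last:
            self._count += 1
        else:
            self._last, self._count = label, 1
        if self._count == self.min_frames:
            return COMMAND_CODES[label]
        return None
```

With `CommandFilter(min_frames=3)`, four consecutive frames containing `("gesture_a", 0.9)` produce `None`, `None`, `1`, `None`: the command fires once and is not repeated while the gesture is held.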

Fig. 3 - Overview of the complete system, composed of the acquisition system, the processing system and the actuator system.
Fig. 4 - The data are published on the "PUB_Joint" ROS topic, processed by the Robox driver, which relies on ROS-Industrial, and finally sent to the controller to move the robot.
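As an illustration of how the ROS node could translate a numerical command into joint positions for the "PUB_Joint" topic, the sketch below computes a jog target: one joint is moved by a fixed step and clamped to its limits. The six-joint layout, the joint limits and the 2° step are hypothetical values, not parameters of the robot used in the project, and the ROS publishing step itself is omitted because the message type is not stated in the source.

```python
import math
from typing import List, Tuple

# Hypothetical robot description: six revolute joints with symmetric limits (rad).
JOINT_LIMITS: List[Tuple[float, float]] = [(-math.pi, math.pi)] * 6
JOG_STEP = math.radians(2.0)  # assumed increment applied by one jog command


def jog_target(current: List[float], joint: int, direction: int) -> List[float]:
    """Return a new joint-position vector with one joint moved by one step.

    The moved joint is clamped to its limits; all other joints are unchanged.
    """
    if direction not in (-1, 1):
        raise ValueError("direction must be -1 or +1")
    target = list(current)
    lo, hi = JOINT_LIMITS[joint]
    target[joint] = min(hi, max(lo, current[joint] + direction * JOG_STEP))
    return target
```

Clamping in the node, rather than relying on the controller alone, keeps an invalid target from ever being published; whether the actual driver does this is not stated in the source.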

Four operating modes have been developed for the interface, implemented as a state machine in MATLAB:

  1. Points definition state
  2. Collaborative operation state
  3. Loop operation state
  4. Jog state
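
The four modes can be pictured as a small state machine driven by the recognized gestures. The Python sketch below is hypothetical: the command names, the initial mode and the behaviour inside each mode are assumptions made for illustration (the actual gesture assignment is the one in Fig. 2, and the real state machine is written in MATLAB). Here "points definition" stores the current robot pose on confirmation and "loop operation" cycles through the stored points; the collaborative and jog behaviours are left out.

```python
from enum import Enum, auto
from typing import Dict, List, Tuple


class Mode(Enum):
    POINTS_DEFINITION = auto()
    COLLABORATIVE = auto()
    LOOP = auto()
    JOG = auto()


# Hypothetical commands that switch mode; the real assignment follows Fig. 2.
TRANSITIONS: Dict[str, Mode] = {
    "select_points": Mode.POINTS_DEFINITION,
    "select_collab": Mode.COLLABORATIVE,
    "select_loop": Mode.LOOP,
    "select_jog": Mode.JOG,
}


class CommandStateMachine:
    def __init__(self) -> None:
        self.mode = Mode.JOG  # assumed initial mode
        self.points: List[Tuple[float, ...]] = []
        self._loop_index = 0

    def handle(self, command: str, robot_pose: Tuple[float, ...]) -> str:
        """Process one recognized command and describe the resulting action."""
        if command in TRANSITIONS:
            self.mode = TRANSITIONS[command]
            self._loop_index = 0
            return f"mode -> {self.mode.name}"
        if self.mode is Mode.POINTS_DEFINITION and command == "confirm":
            self.points.append(robot_pose)  # store the current pose as a new point
            return f"stored point {len(self.points)}"
        if self.mode is Mode.LOOP and command == "confirm" and self.points:
            # Cycle through the stored points in the order they were defined.
            target = self.points[self._loop_index % len(self.points)]
            self._loop_index += 1
            return f"move to {target}"
        return "ignored"
```

Keeping mode changes in a single lookup table makes it easy to check that every mode is reachable and that no gesture has two meanings.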

Before use, the system is initialized to correctly account for the lighting conditions of the working area and to identify the areas where the hands are most likely to be found, by means of a barycenter calibration performed during the initialization procedure.
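The barycenter calibration can be sketched as follows: during initialization, the hand bounding boxes detected over a number of frames are collected, the barycenter of their centers is computed, and a search region is built around it. The region size (k standard deviations of the centers plus half the mean box size), the box format and the function name are assumptions for illustration, not the procedure documented in the thesis; the lighting part of the initialization is not covered.

```python
from statistics import mean, pstdev
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # assumed format: (x_min, y_min, x_max, y_max)


def hand_roi(boxes: List[Box], width: int, height: int,
             k: float = 2.0) -> Tuple[Tuple[float, float], Box]:
    """Barycenter of the hand-box centers and a search region around it.

    The region extends k population standard deviations of the centers plus
    half the mean box size in each direction, clipped to the image bounds.
    """
    cx = [(b[0] + b[2]) / 2 for b in boxes]
    cy = [(b[1] + b[3]) / 2 for b in boxes]
    bx, by = mean(cx), mean(cy)
    half_w = k * pstdev(cx) + mean(b[2] - b[0] for b in boxes) / 2
    half_h = k * pstdev(cy) + mean(b[3] - b[1] for b in boxes) / 2
    roi = (max(0.0, bx - half_w), max(0.0, by - half_h),
           min(float(width), bx + half_w), min(float(height), by + half_h))
    return (bx, by), roi
```

For two calibration boxes `(100, 100, 200, 200)` and `(120, 100, 220, 200)` in a 1920×1080 Kinect v2 color frame, the barycenter is `(160.0, 150.0)` and the search region is `(90.0, 100.0, 230.0, 200.0)`.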
 
The thesis document is available on request.

Related Publications

Nuzzi, C.; Pasinetti, S.; Lancini, M.; Docchio, F.; Sansoni, G. “Deep Learning based Machine Vision: first steps towards a hand gesture recognition set up for Collaborative Robots”, Workshop on Metrology for Industry 4.0 and IoT, pp. 28-33, 2018.

Nuzzi, C.; Pasinetti, S.; Lancini, M.; Docchio, F.; Sansoni, G. “Deep learning-based hand gesture recognition for collaborative robots”, IEEE Instrumentation & Measurement Magazine 22 (2), pp. 44-51, 2019.