Vis4Mechs at the Smart Vision Forum 2019

The Vis4Mechs group was invited to the first Smart Vision Forum, held in Bologna on June 25th, 2019. The event was half conference, half industrial showcase, bringing together the vision and automation industries, high school and university students, and academic researchers.

Simone, Cristina and a small group of students from the 3D Vision course and other PhD programs, as well as Massimiliano and Luca from the Mechanical and Thermal Measurements group, attended the event.

We also met Giordano and Rossano, two former students of the group who graduated in October 2018!

Student Projects 2018/2019

The projects presented by the students to complete the exam are described here. Each group was composed of two randomly selected members.

Group 1: Extrinsic calibration of 2 Kinects using both a sphere-based calibration and a skeletonization-based calibration

The group's task was to compare the extrinsic calibration results obtained from (i) a calibration based on a green sphere and a custom algorithm developed by the University of Trento and (ii) a calibration based on a skeletonization algorithm developed by our group.

Group 2: Intrinsic calibration evaluation

The group's task was to empirically determine the best way to perform an intrinsic calibration of Kinect v2 cameras using a chessboard target (i.e. how many acquisitions? At what distances? With which inclinations?). They then calibrated four different Kinect v2 cameras and analyzed the calibration results for each camera.

Group 3: Evaluation of a trabecular structure point cloud acquisition (1)

The group's task was to compare two different acquisitions of a small trabecular structure 3D printed in titanium, obtained (i) from the 3D digitizer Vivid-910 and (ii) from the 2D/3D profile sensor Wenglor MLWL132.

Group 4: People tracking system evaluation

The group's task was to create a simple people tracking algorithm based on the 3D point cloud acquired from a RealSense D435 camera mounted on the ceiling. The performance evaluation focused on how well the developed algorithm was able to track the path of a person compared to the theoretical path.

Group 5: Extrinsic calibration of 3 Kinects using a skeletonization algorithm

The group's task was to analyze the point cloud alignments obtained from the extrinsic calibrations performed. The group tested different configurations with 2 and 3 Kinects in different positions, used the skeletonization algorithm to obtain the rototranslation matrices, and finally analyzed the resulting alignments in PolyWorks.

Group 6: Evaluation of a trabecular structure point cloud acquisition (2)

The group's task was to compare two different acquisitions of a small trabecular structure 3D printed in titanium, obtained (i) from the 3D digitizer Vivid-910 and (ii) from the 2D/3D profile sensor Wenglor MLWL132. The trabecular structure used was different from the one used by Group 3.

Group 7: Instrumented crutches for gait analysis results evaluation

The group's task was to acquire data from a person walking with a pair of our instrumented crutches in different outdoor set-ups (uphill, downhill, level ground). The acquisitions were processed by our software to analyze gait phases, and the group had to choose the best set-up conditions and filtering options according to the results of the algorithm.

The observatory on food consumption and gastronomic professions: one year of activities

The Vis4Mechs staff is participating in the Observatory with the ambitious Robo-Chef project, an automatic recipe rewriting system based on the acquisition of data related to the preparation of recipes through vision sensors (and not only). The recipes are carried out by the cook and processed by a “smart” system that recognizes ingredients and actions. The collaboration with the staff of CAST Alimenti has been fundamental, in particular with Dario Mariotti, who has warmly supported the project, and Nicola Michieletto, whose experience and skills in the kitchen are of fundamental importance for the technological transfer the project aims at.

University, companies, and a school to create companies

The C-Lab of the University of Brescia has been officially launched by those in charge and is now taking its first steps. The first task: letting people know what it is, what it will do, with whom, and with what means and objectives. This is the Contamination Lab, a structure created by the University of Brescia, which provides its initial funding and keeps it under its wing.

Watch the video and read the full article here!

Extrinsic Calibration procedure using human skeletonization

Metrology techniques based on Industrial Vision are increasingly used both in research and in industry. These are contactless techniques characterized by the use of optical devices, such as RGB and depth cameras. In particular, multi-camera systems, i.e. systems composed of several cameras, are used to carry out three-dimensional measurements of various types. In addition to mechanical measurements, these systems are often used to monitor the movements of human subjects within a given space, as well as for the reconstruction of three-dimensional objects and shapes.

To perform an accurate measurement, multi-camera systems must be carefully calibrated. The calibration process is a fundamental concept in Computer Vision applications that involve image-based measurements. In fact, vision system calibration is a key factor in obtaining reliable performances, as it makes it possible to find the correspondence between the workspace and the points present on the images acquired by the cameras.

A multi-camera system calibration is the process of obtaining the geometric parameters related to the distortions, positions and orientations of each camera that constitutes the system. It is therefore necessary to identify a mathematical model that takes into account both the internal functioning of the device and the relationship between the camera and the external world.

The calibration process consists of two parts:

  1. The intrinsic calibration, necessary to model the individual devices in terms of focal length, coordinates of the principal point and distortion coefficients of the acquired image;
  2. The extrinsic calibration, necessary to determine the position and orientation of each device with respect to an absolute reference system. By means of the extrinsic calibration it is possible to obtain the pose of the camera reference system with respect to the absolute reference system in terms of rotation and translation.

Traditional calibration methods are based on the use of a calibration target, i.e. a known object, usually a planar one, on which there are calibration points that are unambiguously recognizable and have known coordinates. These calibration targets are automatically recognized by the system in order to calculate the position of the object with respect to the camera and thus obtain the rotation and translation parameters of the camera with respect to the global reference system (Fig. 1).
Fig. 1 - Examples of traditional calibration masters.
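
As a rough illustration of how such target-based calibration works in practice, the sketch below uses OpenCV to detect a chessboard and recover the rotation and translation of the camera with respect to the target. This is a generic example and not the specific procedure used by the group; the intrinsic parameters, board size, square size and image name are placeholders.

```python
# Generic target-based extrinsic calibration sketch (not the group's procedure).
# Assumes the camera intrinsics K and distortion coefficients are already known
# from an intrinsic calibration; all numeric values are placeholders.
import cv2
import numpy as np

K = np.array([[1060.0, 0.0, 960.0],
              [0.0, 1060.0, 540.0],
              [0.0, 0.0, 1.0]])          # example intrinsic matrix
dist = np.zeros(5)                        # example distortion coefficients

board_size = (9, 6)                       # inner corners of the chessboard
square = 0.025                            # square side in meters

# 3D coordinates of the corners in the target reference system (Z = 0 plane)
obj_pts = np.zeros((board_size[0] * board_size[1], 3), np.float32)
obj_pts[:, :2] = np.mgrid[0:board_size[0], 0:board_size[1]].T.reshape(-1, 2) * square

img = cv2.imread("calibration_view.png")
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
found, corners = cv2.findChessboardCorners(gray, board_size)

if found:
    corners = cv2.cornerSubPix(
        gray, corners, (11, 11), (-1, -1),
        (cv2.TERM_CRITERIA_EPS + cv2.TERM_CRITERIA_MAX_ITER, 30, 1e-3))
    # Rotation and translation of the target with respect to the camera
    ok, rvec, tvec = cv2.solvePnP(obj_pts, corners, K, dist)
    R, _ = cv2.Rodrigues(rvec)
    # 4x4 rototranslation matrix from target to camera coordinates
    T = np.eye(4)
    T[:3, :3], T[:3, 3] = R, tvec.ravel()
    print(T)
```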

Methods based on the recognition of three-dimensional shapes rely on the geometric coherence of a 3D object positioned in the field of view (FoV) of the various cameras. Each device records only a part of the target object; then, by combining each view with the portion actually recorded by the camera, the relative displacement of the sensors with respect to the object is evaluated, thus obtaining the rotation and translation of the acquisition device itself (Fig. 2).

Fig. 2 - Example of a known object used as the calibration master, in this case a green sphere of a known diameter.

However, both traditional calibration methods and those based on the recognition of three-dimensional shapes have the disadvantage of being complex and computationally expensive. The former exploit a generally flat calibration target, which limits the positioning of the target itself because, under certain conditions and positions, it is not possible to obtain simultaneous views. The latter also need to converge to a solution, hence a good initialization of the 3D recognition parameters by the operator is needed, resulting in low reliability and limited automation.

Finally, methods based on the recognition of the human skeleton use as targets the skeleton joints of a human positioned within the FoV of the cameras. Skeleton-based methods therefore represent an evolution of the 3D shape matching methods, since the human is considered as the target object (Fig. 3).

In this thesis work, we exploited skeleton-based methods to create a new calibration method which is (i) easier to use compared to known methods, since users only need to stand in front of the cameras and the system takes care of everything, and (ii) faster and computationally inexpensive.

Fig. 3 - Example of a skeleton obtained by connecting the joints acquired from a Kinect camera (orange dots) and from a calibration set up based on multiple cameras (red dots).

STEP 1: Measurement set-up

The proposed set-up is very simple, as it is composed of a pair of Kinect v2 cameras intrinsically calibrated using known procedures, a process required to correctly align the color information on top of the depth information acquired by the camera.

The cameras are placed so as to acquire the measurement area in a suitable way. After the calibration images have been taken, they are evaluated by a skeletonization algorithm based on OpenPose, which also takes into account the depth information to correctly place the skeleton in the 3D world [1]. Joint coordinates are therefore extracted for every pair of color-depth images correctly aligned both spatially and temporally.
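
As a simplified illustration of this step, the snippet below shows how a 2D joint detected on the color image, together with its aligned depth value, can be back-projected to a 3D point using the pinhole model. The intrinsic parameters and the joint pixel are placeholders; the actual pipeline based on OpenPose [1] is more elaborate.

```python
# Minimal pinhole back-projection sketch (placeholder intrinsics and pixel values).
import numpy as np

def deproject(u, v, depth_m, fx, fy, cx, cy):
    """Back-project a pixel (u, v) with depth in meters to camera coordinates."""
    x = (u - cx) * depth_m / fx
    y = (v - cy) * depth_m / fy
    return np.array([x, y, depth_m])

# Example: a joint detected at pixel (412, 300) with an aligned depth of 2.35 m,
# using illustrative Kinect-like intrinsic parameters.
joint_3d = deproject(412, 300, 2.35, fx=365.0, fy=365.0, cx=256.0, cy=212.0)
print(joint_3d)
```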

To obtain accurate joint positions, we then perform an optimization procedure to correctly place the joints on the human figure, according to an error minimization procedure written in MATLAB.
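
For reference, the rototranslation between two cameras can be estimated in closed form from corresponding 3D joints seen by both devices; the sketch below uses the standard SVD-based (Kabsch) least-squares fit on synthetic joints. This is only a generic stand-in for illustration, not the MATLAB minimization actually used in the thesis.

```python
# Generic SVD-based (Kabsch) rigid alignment between corresponding 3D joint sets.
# joints_a and joints_b are Nx3 arrays of the SAME joints seen by camera A and B.
import numpy as np

def rigid_transform(joints_a, joints_b):
    """Return the 4x4 matrix T that maps joints_a onto joints_b in a least-squares sense."""
    ca, cb = joints_a.mean(axis=0), joints_b.mean(axis=0)
    H = (joints_a - ca).T @ (joints_b - cb)                       # cross-covariance
    U, _, Vt = np.linalg.svd(H)
    D = np.diag([1.0, 1.0, np.sign(np.linalg.det(Vt.T @ U.T))])   # avoid reflections
    R = Vt.T @ D @ U.T
    t = cb - R @ ca
    T = np.eye(4)
    T[:3, :3], T[:3, 3] = R, t
    return T

# Example with synthetic joints: camera B sees the same skeleton rotated and shifted.
rng = np.random.default_rng(0)
joints_a = rng.uniform(-1.0, 1.0, size=(15, 3))
true_R = np.array([[0.0, -1.0, 0.0], [1.0, 0.0, 0.0], [0.0, 0.0, 1.0]])
joints_b = joints_a @ true_R.T + np.array([0.5, 0.2, 2.0])
print(rigid_transform(joints_a, joints_b))
```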

Fig. 4 shows the abovementioned steps, while Fig. 5 shows a detail of the skeletonization algorithm used.

 

Fig. 4 - Steps required for the project. First, we acquire both RGB and depth frames from every Kinect; second, we perform the skeletonization of the human using the skeletonization algorithm of choice; finally, we use it to obtain the extrinsic calibration matrix for each Kinect, in order to reproject them to a known reference system.
Fig. 5 - Skeletonization procedure used by the algorithm.

STEP 2: Validation of the proposed procedure

To validate the system, we used a mannequin placed still at 3 different distances (2 m, 3.5 m, 5 m), showing both its front and its back to the cameras. In fact, skeletonization procedures are usually more robust when humans are in a frontal position, since the face keypoints are also visible. The validation positions are shown in Fig. 6, while the joint positions calculated by the procedure are shown in Fig. 7 and 8.

Fig. 6 - Validation positions of the mannequin. Only a single camera was used in this phase.
Fig. 7 - Joints calculated by the system when the mannequin is positioned in frontal position at (a) 2 m, (b) 3.5 m and (c) 5 m.
Fig. 8 - Joints calculated by the system when the mannequin is positioned in back position at (a) 2 m, (b) 3.5 m and (c) 5 m.

STEP 3: Calibration experiments

We tested the system in the 3 configurations shown in Fig. 9: (a) the two cameras placed in front of the operator, one next to the other; (b) one camera positioned in front of the operator and the other positioned laterally, with an angle of 90° between them; (c) the two cameras positioned with an angle of 180° between them (one in front of the other, with the operator in the middle).

We first obtained the calibration matrix using our procedure, hence the target used was the human subject placed in the middle of the scene (at position zero). Then, we compared the calibration matrix obtained in this way to the calibration matrix obtained from another algorithm developed by the University of Trento [2]. This algorithm is based on the recognition of a known 3D object, in this case the green sphere shown in Fig. 2.

The calibration obtained from both methods was evaluated using a 3D object, a cylinder of known shape, which was placed in 7 different positions in the scene, as shown in Fig. 10. The exact same positions were used for each configuration.

Fig. 9 - Configurations used for the calibration experiments.
Fig. 10 - Cylinder positions in the scene.
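
To give an idea of how such an evaluation can be carried out numerically, the sketch below applies two extrinsic matrices to the cylinder point clouds acquired by each camera and measures how well the aligned clouds overlap. The analysis in this work was done in PolyWorks, so this is only an illustrative alternative with placeholder file names.

```python
# Illustrative alignment check: transform both clouds into the common reference
# system and compute the nearest-neighbour distances between them.
# File names and the stored 4x4 matrices are placeholders.
import numpy as np
from scipy.spatial import cKDTree

def transform(cloud, T):
    """Apply a 4x4 rototranslation to an Nx3 point cloud."""
    return cloud @ T[:3, :3].T + T[:3, 3]

cloud1 = np.loadtxt("cylinder_cam1.xyz")      # Nx3 points from camera 1
cloud2 = np.loadtxt("cylinder_cam2.xyz")      # Mx3 points from camera 2
T_cam1 = np.loadtxt("extrinsics_cam1.txt")    # 4x4 matrix from the calibration
T_cam2 = np.loadtxt("extrinsics_cam2.txt")

aligned1 = transform(cloud1, T_cam1)
aligned2 = transform(cloud2, T_cam2)

# For each point of cloud 1, distance to the closest point of cloud 2
dists, _ = cKDTree(aligned2).query(aligned1)
print(f"mean = {dists.mean()*1000:.2f} mm, RMS = {np.sqrt((dists**2).mean())*1000:.2f} mm")
```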

Finally, we compared the results by aligning the point clouds using both calibration matrices. These results were processed in PolyWorks for better visualization, and are shown in the presentation below. Feel free to download it to learn more about the project and to view the results!

Gait Phases analysis by means of wireless instrumented crutches and Decision Trees

In recent years there has been a significant increase in the development of walking aid systems and in particular of robotic exoskeletons for the lower limbs, used to make up for the loss of walking ability following injuries to the spine (Fig. 1).

However, in the initial training phase the patient is still forced to use the exoskeleton in specialized laboratories for assisted walking, usually equipped with vision systems, accelerometers, force platforms and other measurement systems to monitor the patient's training (Fig. 2).

Fig. 1 - Example of a patient wearing the ReWalk exoskeleton. The crutches are needed as an aid for the patients, since they tend to tilt forward when using the exoskeleton.

To overcome the limitations posed by both the measuring environment and the instruments themselves, the University of Brescia joined forces with other research laboratories and created a pair of instrumented wireless crutches able to assess the loads on the upper limbs through a suitable biomechanical model that measures the load exchanged between the ground and the crutch.

In addition to helping the physiotherapist assess the quality of walking and reduce the risk of injury to the upper limbs due to the use of the exoskeleton, the real advantage of the developed device lies in the fact that it is totally wireless, allowing its use in outdoor environments where the user can feel more comfortable. It is in fact known in the literature that the behavior of a subject during a walk varies depending on the environment in which it is performed, both because of the space limitations and because of the range of the measurement instrumentation, which constrains the trajectory the patient performs in a laboratory compared to the one performed outdoors without strict limitations.

It is also important to relate and evaluate the measured quantities by averaging them on a percentage basis associated with the phases of the gait cycle instead of time, in order to compare them with the physiological behavior of the subject, which is usually related to the gait phases (swing and stance, Fig. 3).

Fig. 3 - Detail of a gait cycle: the stance phase is when the foot is placed forward, the swing phase is when the foot kicks off the ground moving the body forward.

STEP 1: Instrumented wireless crutches design

A first prototype of instrumented crutches had already been developed to measure the upper limb forces. In this project, however, we wanted to create a new prototype able to measure and predict the gait phases without the need for an indoor laboratory. To do this, we mounted two Raspberry Pi boards on the crutches, together with two PicoFlexx Time of Flight cameras that see the feet while walking (Fig. 4 and 5).
Fig. 4 - Detail of the proposed instrumented crutches. The two Raspberry Pi boards send the acquired images wirelessly to an Ubuntu PC.
Fig. 5 - (1) PicoFlexx ToF cameras, (2) Raspberry Pi boards, (3) powerbanks for the Raspberry Pi boards.
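
To give an idea of the wireless acquisition, the sketch below shows one minimal way a Raspberry Pi could stream depth frames to the Ubuntu PC over a TCP socket. The host address, port, frame source and protocol are hypothetical; the actual transfer mechanism used on the crutches may differ.

```python
# Minimal frame-streaming sketch from the Raspberry Pi side (hypothetical protocol).
# HOST, PORT and the grab_depth_frame() source are placeholders.
import socket
import struct
import numpy as np

HOST, PORT = "192.168.1.10", 5005   # hypothetical address of the Ubuntu PC

def grab_depth_frame():
    """Placeholder for the PicoFlexx acquisition (here: a random 171x224 frame)."""
    return np.random.rand(171, 224).astype(np.float32)

with socket.create_connection((HOST, PORT)) as sock:
    for _ in range(100):                       # send 100 frames
        frame = grab_depth_frame()
        payload = frame.tobytes()
        # Simple header: payload length, rows, cols (big-endian unsigned ints)
        sock.sendall(struct.pack(">III", len(payload), *frame.shape))
        sock.sendall(payload)
```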

STEP 2: Supervised Machine Learning algorithm for gait phases prediction

We chose a supervised algorithm (Decision Trees) to predict the gait phases according to the depth images of the feet acquired during walking.
Each image was processed offline to (i) perform a distance filtering, (ii) calculate and fit a ground plane according to the measured one and (iii) calculate the distance between the ground plane and the feet. The latter values are used as features for the algorithm (Fig. 6 and 7).
Fig. 6 - Detail of the supervised algorithm of choice: a set of features is selected and used for the training of the model, which is then used to predict the gait phases.
Fig. 7 - Detail of the processing of the images needed to extract the features used by the model.
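
A minimal sketch of this processing chain is shown below: it fits a ground plane to the filtered points with a least-squares fit, uses the foot-to-plane distance as a feature and trains a scikit-learn decision tree. It is only an outline under simplified assumptions (synthetic data, a single feature), not the full offline pipeline developed for the thesis.

```python
# Simplified sketch of the feature extraction + Decision Tree training pipeline.
# Synthetic data stand in for the real depth frames and gait-phase labels.
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def fit_ground_plane(points):
    """Least-squares plane z = a*x + b*y + c fitted to Nx3 points (meters)."""
    A = np.c_[points[:, 0], points[:, 1], np.ones(len(points))]
    (a, b, c), *_ = np.linalg.lstsq(A, points[:, 2], rcond=None)
    return a, b, c

def foot_height(foot_points, plane):
    """Mean distance of the foot points from the fitted ground plane."""
    a, b, c = plane
    d = foot_points[:, 2] - (a * foot_points[:, 0] + b * foot_points[:, 1] + c)
    return float(np.mean(np.abs(d)) / np.sqrt(a**2 + b**2 + 1.0))

def synthetic_frame(lift, rng):
    """Synthetic stand-in for a distance-filtered frame: ground points + one foot."""
    ground = np.c_[rng.uniform(-1, 1, (300, 2)), rng.normal(0.0, 0.003, 300)]
    foot = np.c_[rng.uniform(-0.1, 0.1, (50, 2)), rng.normal(lift, 0.005, 50)]
    return ground, foot

rng = np.random.default_rng(1)
features, labels = [], []
for label, lift in [(0, 0.01), (1, 0.15)]:        # 0 = stance, 1 = swing
    for _ in range(100):
        ground, foot = synthetic_frame(lift, rng)
        plane = fit_ground_plane(ground)
        features.append([foot_height(foot, plane)])
        labels.append(label)

clf = DecisionTreeClassifier(max_depth=3).fit(features, labels)
print(clf.predict([[0.005], [0.12]]))              # expected: [0 1]
```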

STEP 3: Experiments

The proposed system was tested on 3 healthy subjects. Each subject was tested on an indoor walk and on an outdoor walk, different for every subject.

For each subject we trained a person-specific model. This choice was driven by the fact that every person walks in their own way, so it is best to capture their specific style and use it to monitor their performance while walking.

To learn more about the project and to view the results, feel free to download the presentation below!

Performance Benchmark between an Embedded GPU and an FPGA

Nowadays it is impossible to think of a device without considering it “intelligent” to some extent. If ten years ago “intelligent” systems were carefully designed and could only be used in specific fields (i.e. industry, defense or research), today smart sensors as big as a button are everywhere, from cellphones to refrigerators, from vacuum cleaners to industrial machines.
If you think that’s it, you’re wrong: the future is tiny.

Embedded systems are more and more needed in industry as well, because they’re capable of performing complex processing on board, sometimes comparable to that carried out by standard PCs. Small, portable, flexible and smart: it is not hard to understand why they’re more and more used!

A plethora of embedded systems is available on the market, according to the needs of the client. One important characteristic to check is the capability of the embedded platform of choice to be “smart”, which nowadays means whether it is able to run a Deep Learning model with the same performances obtained when running it on a PC. The reason is that Deep Learning models require a lot of resources to perform well, and running them on CPUs usually means losing accuracy and speed compared to running them on GPUs.

To solve this issue, some companies started to produce embedded platforms with GPUs on board. While the architecture of these systems is still different from the architecture of PCs, they are a quite good improvement on the matter. Another type of embedded system is the FPGA: these platforms led the market for a while before embedded GPUs became common; they are programmed at a low level and, because of that, are usually high performing.

In this thesis work, conducted in collaboration with Tattile, we performed a benchmark between the Nvidia Jetson TX2 embedded platform and the Xilinx FPGA Evaluation Board ZCU104.

STEP 1: Determine the BASELINE model

To perform our benchmark we selected an example model. We kept it simple by choosing the well-known VGG Net model, which we trained on a host machine equipped with a GPU on the standard CIFAR-10 dataset. This dataset is composed of 10 classes of different objects (dogs, cats, airplanes, cars…) with a standard image size of 32×32 px.

The network was trained in Caffe for 40000 iterations, reaching an average accuracy of 86.1%. Note that the model obtained after this step is represented in 32-bit floating point (FP32), which allows a refined and accurate representation of weights and activations.

Fig. 3 - CIFAR-10 dataset examples for each class.

STEP 2: Performances on Nvidia Jetson TX2

This board is equipped with a GPU, thus allowing a floating point representation. Even though FP32 is supported, we chose to perform a quantization procedure to reduce the representation complexity of the trained model to FP16. This choice was driven by the fact that the performances obtained with this representation are considered the best ones in the literature.

We used TensorRT, which is natively installed on the board, to perform the quantization procedure. After this process the model obtained an average accuracy of 85.8%.
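
To give an intuition of what this change of representation means for the weights, the snippet below casts FP32 values to FP16 with NumPy and measures the rounding error. It only illustrates the numeric effect of the reduced precision, not the actual TensorRT optimization pipeline used on the board.

```python
# Illustration of the FP32 -> FP16 representation change (not the TensorRT pipeline).
import numpy as np

weights_fp32 = np.random.default_rng(0).normal(0.0, 0.05, size=100000).astype(np.float32)
weights_fp16 = weights_fp32.astype(np.float16)

abs_err = np.abs(weights_fp32 - weights_fp16.astype(np.float32))
print(f"max abs rounding error:  {abs_err.max():.2e}")
print(f"mean abs rounding error: {abs_err.mean():.2e}")
```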

STEP 3: Performances on Xilinx FPGA Zynq ZCU104

The FPGA does not support floating point representations. The toolbox used to modify the original model and adapt it to the board is a proprietary one, called DNNDK.

Two configurations were tested. In the first, the original model was quantized from FP32 to INT8, thus losing the floating point representation and drastically reducing the size of the network to a few MB. The average accuracy obtained in this case is 86.6%, slightly better than the baseline, probably because of the large representation gap, which in some cases turned borderline predictions into correct ones.

The second configuration applies a pruning process after the quantization procedure, thus deleting useless layers. In this case the average accuracy reached is 84.2%, as expected after the combination of both processes.
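
As a rough intuition of the INT8 representation used on the FPGA, the snippet below applies a simple per-tensor symmetric quantization to a set of weights and measures the storage saving and reconstruction error. The actual DNNDK quantization is calibration-based and more sophisticated, so this is only an illustration.

```python
# Minimal per-tensor symmetric INT8 quantization illustration (not the DNNDK flow).
import numpy as np

def quantize_int8(w):
    """Quantize a float tensor to INT8 with a single symmetric scale factor."""
    scale = np.abs(w).max() / 127.0
    q = np.clip(np.round(w / scale), -128, 127).astype(np.int8)
    return q, scale

weights = np.random.default_rng(0).normal(0.0, 0.05, size=100000).astype(np.float32)
q, scale = quantize_int8(weights)
dequantized = q.astype(np.float32) * scale

print(f"storage: {weights.nbytes} bytes (FP32) -> {q.nbytes} bytes (INT8)")
print(f"mean abs quantization error: {np.abs(weights - dequantized).mean():.2e}")
```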

STEP 4: Performance benchmark of the boards

Finally, we compared the performances obtained by the two boards when running inference in real time. The results can be found in the presentation below: if you’re interested, feel free to download it!
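
For context, a real-time benchmark of this kind typically measures per-frame latency and throughput after a warm-up phase. The sketch below shows such a timing loop around a hypothetical run_inference call, since the actual TensorRT and DNNDK runtimes used on the two boards are board-specific.

```python
# Generic inference benchmarking loop. run_inference is a hypothetical placeholder
# for the board-specific TensorRT / DNNDK runtime call.
import time
import numpy as np

def run_inference(batch):
    """Placeholder: simulate an inference call on a 32x32 RGB batch."""
    time.sleep(0.004)                       # pretend the model takes ~4 ms
    return np.zeros((batch.shape[0], 10))   # 10 CIFAR-10 class scores

batch = np.random.rand(1, 3, 32, 32).astype(np.float32)

for _ in range(20):                         # warm-up, excluded from the statistics
    run_inference(batch)

latencies = []
for _ in range(200):
    start = time.perf_counter()
    run_inference(batch)
    latencies.append(time.perf_counter() - start)

latencies = np.array(latencies) * 1000.0    # ms
print(f"mean latency: {latencies.mean():.2f} ms, "
      f"p95: {np.percentile(latencies, 95):.2f} ms, "
      f"throughput: {1000.0 / latencies.mean():.1f} FPS")
```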

The thesis document is also available on request.

Real-Time robot command system based on hand gesture recognition

With the Industry 4.0 paradigm, the industrial world has faced a technological revolution. Manufacturing environments in particular are required to be smart and to integrate automatic processes and robots in the production plant. To achieve this smart manufacturing it is necessary to re-think the production process in order to create a true collaboration between human operators and robots. Robotic cells usually have safety cages to protect the operators from any harm that direct contact could produce, thus limiting the interaction between the two. Only collaborative robots can really collaborate in the same workspace as humans without risks, thanks to their specific design. They pose another problem, though: in order not to harm human safety, they must operate at low velocities and forces, hence their operations are slow and roughly comparable to the ones a human operator performs. In practice, collaborative robots hardly have a place in a real industrial environment with high production rates.

In this context, this thesis work presents an innovative command system to be used in a collaborative workstation, in order to work alongside robots in a more natural and straightforward way for humans, thus reducing the time needed to properly command the robot on the fly. Recent techniques of Computer Vision, Image Processing and Deep Learning are used to create the intelligence behind the system, which is in charge of properly recognizing the gestures performed by the operator in real-time.

Step 1: Creation of the gesture recognition system

A number of suitable algorithms and models are available in the literature for this purpose. An Object Detector in particular has been chosen for the job, called “Faster Region Proposal Convolutional Neural Network”, or Faster R-CNN, developed in MATLAB.

Object Detectors are especially suited for the task of gesture recognition because they are capable of (i) finding the objects in the image and (ii) classifying them, thus recognizing which objects they are. Figure 1 shows this concept: the object “number three” is shown in the figure, and the algorithm has to find it.

Fig. 1 - The process undergone by Object Detectors in general. Two networks elaborate the image in different steps: first, the region proposals are extracted, which are the positions of the objects of interest found in the image. Then, the proposals are evaluated by the classification network, which finally outputs both the position of the object (the bounding box) and the name of the object class.
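
As a small illustration of how the detector output can be turned into a command, the sketch below filters the detections by confidence, keeps the highest-scoring gesture and maps its class to a numerical command. The class names, threshold, command mapping and detection structure are hypothetical, and the thesis implementation is in MATLAB rather than Python.

```python
# Illustrative post-processing of detector output into a gesture command.
# The detection structure, class names and command mapping are hypothetical.

GESTURE_TO_COMMAND = {"one": 1, "two": 2, "three": 3, "open_palm": 10, "fist": 11}
SCORE_THRESHOLD = 0.8

def gesture_command(detections):
    """detections: list of dicts {'label': str, 'score': float, 'box': [x, y, w, h]}."""
    valid = [d for d in detections
             if d["score"] >= SCORE_THRESHOLD and d["label"] in GESTURE_TO_COMMAND]
    if not valid:
        return None                       # no reliable gesture in this frame
    best = max(valid, key=lambda d: d["score"])
    return GESTURE_TO_COMMAND[best["label"]]

# Example frame: two candidate detections, only one above threshold
frame_detections = [
    {"label": "three", "score": 0.93, "box": [120, 80, 60, 90]},
    {"label": "fist", "score": 0.41, "box": [300, 150, 55, 70]},
]
print(gesture_command(frame_detections))   # -> 3
```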

After a careful selection of gestures, purposely acquired by means of different mobile phones, and a preliminary study to understand whether the model was able to differentiate between the left and right hand and, at the same time, between the palm and the back of the hand, the final proposed gestures and their meaning in the control system are shown in Fig. 2.

Fig. 2 - Definitive gesture commands used in the command system.

Step 2: Creation of the command system

The proposed command system is structured as in Fig. 3: the images are acquired in real-time by a Kinect v2 camera connected to the master PC and processed in MATLAB in order to obtain the gesture commands frame by frame. The commands are then sent to the ROS node in charge of translating the numerical command into an operation for the robot. It is the ROS node, by means of a purposely developed driver for the robot used, that sends the movement positions to the robot controller. Finally, the robot receives the ROS packets of the desired trajectory and executes the movements. Fig. 4 shows how the data are sent to the robot.

Fig. 3 - Overview of the complete system, composed of the acquisition system, the elaboration system and the actuator system.
Fig. 4 - The data are sent to the "PUB_Joint" ROS topic, processed by the Robox Driver (which uses ROS Industrial) and finally sent to the controller to move the robot.
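
To sketch how a numerical command can be turned into ROS messages for the driver, the example below publishes a joint target on the "PUB_Joint" topic with rospy. The message type (trajectory_msgs/JointTrajectory), joint names and target positions are assumptions made for illustration, as the actual Robox driver interface is not detailed here.

```python
# Hedged sketch of a ROS node publishing a joint target on the "PUB_Joint" topic.
# The message type, joint names and positions are assumptions for illustration;
# the real Robox driver interface may differ.
import rospy
from trajectory_msgs.msg import JointTrajectory, JointTrajectoryPoint

def publish_target(pub, positions):
    msg = JointTrajectory()
    msg.joint_names = ["joint_" + str(i) for i in range(len(positions))]  # placeholders
    point = JointTrajectoryPoint()
    point.positions = positions
    point.time_from_start = rospy.Duration(2.0)   # reach the target in 2 seconds
    msg.points = [point]
    pub.publish(msg)

if __name__ == "__main__":
    rospy.init_node("gesture_command_bridge")
    pub = rospy.Publisher("PUB_Joint", JointTrajectory, queue_size=10)
    rospy.sleep(1.0)                              # give the publisher time to connect
    publish_target(pub, [0.0, -1.2, 1.0, 0.0, 0.5, 0.0])
```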

Four modalities have been developed for the interface, by means of a State Machine developed in MATLAB:

  1. Points definition state
  2. Collaborative operation state
  3. Loop operation state
  4. Jog state

Below you can see the initialization of the system, needed to correctly address the light conditions of the working area and the areas where the hands will most probably be found, according to the barycenter calibration performed by the initialization procedure.
 
If you are interested in the project, download the presentation by clicking the button below. The thesis document is also available on request.

Related Publications

Nuzzi, C.; Pasinetti, S.; Lancini, M.; Docchio, F.; Sansoni, G. “Deep Learning based Machine Vision: first steps towards a hand gesture recognition set up for Collaborative Robots”, Workshop on Metrology for Industry 4.0 and IoT, pp. 28-33, 2018.

Nuzzi, C.; Pasinetti, S.; Lancini, M.; Docchio, F.; Sansoni, G. “Deep learning-based hand gesture recognition for collaborative robots”, IEEE Instrumentation & Measurement Magazine 22 (2), pp. 44-51, 2019.