OpenPTrack software metrological evaluation

Smart tracking systems are nowadays a necessity in many fields, especially industry. A very interesting and successful open-source software package called OpenPTrack has been developed by the University of Padua. The software, based on ROS (Robot Operating System), is capable of tracking humans in the scene, leveraging well-known tracking algorithms that use 3D point cloud information, and also of tracking objects, leveraging colour information as well.

Impressed by the capabilities of the software, we decided to study its performance further. This is the aim of this thesis project: to carefully characterize the measurement performance of OpenPTrack on both humans and objects, using a set of Kinect v2 sensors.

Step 1: Calibration of the sensors

It is of utmost importance to correctly calibrate the sensors when performing a multi-sensor acquisition. 

Two types of calibration are necessary: (i) the intrinsic calibration, to align the colour (or grayscale/IR, as in the case of OpenPTrack) information with the depth information (Fig. 1), and (ii) the extrinsic calibration, to align the views obtained by the different cameras to a common reference system (Fig. 2).
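As a minimal sketch of what the extrinsic calibration produces (with hypothetical numeric poses, not values from the actual setup), the pose of the second camera K2 can be referred to the first camera K1 and then to the world frame Wd by composing 4x4 homogeneous transforms:

```python
import numpy as np

def homogeneous(R, t):
    """Build a 4x4 homogeneous transform from a 3x3 rotation and a translation."""
    T = np.eye(4)
    T[:3, :3] = R
    T[:3, 3] = t
    return T

# Hypothetical poses: K2 expressed in K1's frame, K1 expressed in the world frame Wd
T_k1_k2 = homogeneous(np.eye(3), np.array([2.0, 0.0, 0.0]))  # K2 -> K1
T_w_k1 = homogeneous(np.eye(3), np.array([0.0, 1.0, 0.0]))   # K1 -> Wd

# Chaining the two transforms maps a point seen by K2 into the world frame
T_w_k2 = T_w_k1 @ T_k1_k2
p_k2 = np.array([0.0, 0.0, 1.0, 1.0])  # homogeneous point in K2's frame
p_world = T_w_k2 @ p_k2
```

Once every camera is referred to Wd in this way, point clouds from all sensors live in the same reference system and can be fused.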

The software provides suitable tools to perform these steps, and also a tool to further refine the extrinsic calibration obtained (Fig. 3). In this case, a human operator has to walk around the scene: their trajectory is acquired by every sensor, and at the end of this registration the procedure aligns the trajectories more precisely.

Each of these calibration processes is completely automatic and performed by the software.
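The idea behind the trajectory-based refinement can be sketched as a rigid alignment problem: given the same trajectory as seen by two cameras, find the rotation and translation that best superimpose them. The sketch below uses the classical Kabsch (SVD) solution on synthetic data; it is an illustration of the principle, not OpenPTrack's actual implementation.

```python
import numpy as np

def rigid_align(A, B):
    """Find rotation R and translation t minimising ||(A @ R.T + t) - B|| (Kabsch)."""
    cA, cB = A.mean(axis=0), B.mean(axis=0)
    H = (A - cA).T @ (B - cB)
    U, _, Vt = np.linalg.svd(H)
    d = np.sign(np.linalg.det(Vt.T @ U.T))  # guard against reflections
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    t = cB - R @ cA
    return R, t

# Synthetic trajectory seen by camera 1, and the same trajectory as seen by
# camera 2 (offset by a hypothetical residual rotation and translation)
rng = np.random.default_rng(0)
traj1 = rng.random((50, 3))
theta = 0.1
Rz = np.array([[np.cos(theta), -np.sin(theta), 0.0],
               [np.sin(theta),  np.cos(theta), 0.0],
               [0.0, 0.0, 1.0]])
traj2 = traj1 @ Rz.T + np.array([0.05, -0.02, 0.0])

R, t = rigid_align(traj1, traj2)
aligned = traj1 @ R.T + t  # traj1 re-expressed in camera 2's frame
```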

Fig. 1 - Examples of intrinsic calibration images. (a) RGB HD image, (b) IR image, (c) synchronized calibration of the RGB and IR streams.
Fig. 2 - Scheme of an extrinsic calibration procedure. The second camera K2 must be referred to the first one, K1; finally, both must be referred to an absolute reference system called Wd.
Fig. 3 - Examples of the calibration refinement. (a) Trajectories obtained by two Kinects before refinement, (b) trajectories correctly aligned after the refinement procedure.

Step 2: Definition of the measurement area

Two Kinect v2 were used for the project, mounted on tripods and placed so as to acquire the largest FoV possible (Fig. 4). A total of 31 positions were defined in the area: these are the spots where the targets to be measured were placed in the two experiments, in order to cover the whole available FoV. Note that not every spot lies in a region acquired by both Kinects, and that three performance regions are highlighted in the figure: the overall best-performing one (light green) and the single-camera best-performing ones, where only one camera (the closer one) sees the target with good performance.

Fig. 4 - FoV acquired by the two Kinects. The numbers represent the different acquisition positions (31 in total) where the targets were placed in order to perform a stable acquisition and characterization of the measurement.

Step 3: Evaluation of Human Detection Algorithms

To evaluate the detection algorithms of OpenPTrack, a mannequin placed firmly on the different spots was used as the measurement target. Its orientation was changed for every acquisition (N, S, W, E) in order to better understand whether the algorithm is able to correctly detect the barycenter of the mannequin even when it is rotated (Fig. 5).
 

The performance was evaluated using four parameters:

  • MOTA (Multiple Object Tracking Accuracy), to measure whether the algorithm was able to detect the human in the scene;
  • MOTP (Multiple Object Tracking Precision), to measure the precision of the barycenter estimation relative to the human figure;
  • (Ex, Ey, Ez), the mean error between the estimated barycenter position and the known reference barycenter position, for each spatial dimension (x, y, z);
  • (Sx, Sy, Sz), the error variability, to measure the repeatability of the measurements for each spatial dimension (x, y, z).
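The last two families of parameters can be sketched as follows, using entirely hypothetical barycenter estimates for a single spot: the mean error per axis gives (Ex, Ey, Ez), and the sample standard deviation per axis gives (Sx, Sy, Sz).

```python
import numpy as np

# Hypothetical repeated barycenter estimates (metres) for one spot,
# and the known reference barycenter position for that same spot
estimates = np.array([
    [1.02, 2.01, 0.98],
    [0.99, 2.03, 1.01],
    [1.01, 1.99, 1.00],
    [0.98, 2.02, 0.99],
])
reference = np.array([1.00, 2.00, 1.00])

errors = estimates - reference
E = errors.mean(axis=0)          # (Ex, Ey, Ez): mean error per axis
S = errors.std(axis=0, ddof=1)   # (Sx, Sy, Sz): error variability per axis
```

A small E indicates low systematic bias on that axis, while a small S indicates repeatable measurements.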
Fig. 5 - Different orientations of the mannequin in the same spot.

Step 4: Evaluation of Object Detection Algorithms

A different target, shown in Fig. 6, was used to evaluate the performance of the object detection algorithms: it is a structure on which three spheres are positioned on top of three rigid arms. The spheres are of different colours (R, G, B), to estimate how colour-dependent the algorithm is, and of different dimensions (200 mm, 160 mm, 100 mm), to estimate how dimension-dependent it is. In this case, the relative position between the spheres was used as the reference measure to estimate the performance of the algorithm (Fig. 7).
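Using the relative positions as reference can be sketched like this (with hypothetical sphere centres, not the real target's geometry): since the target is rigid, the pairwise distances between sphere centres are fixed and independent of the camera reference frame, so they make a convenient ground truth to compare detections against.

```python
import numpy as np

# Hypothetical detected sphere centres (metres): red, green, blue
centres = {
    "R": np.array([0.00, 0.00, 1.50]),
    "G": np.array([0.40, 0.00, 1.50]),
    "B": np.array([0.20, 0.30, 1.50]),
}

def pairwise_distances(c):
    """Distance between every pair of sphere centres, keyed by colour pair."""
    keys = sorted(c)
    return {(a, b): float(np.linalg.norm(c[a] - c[b]))
            for i, a in enumerate(keys) for b in keys[i + 1:]}

dists = pairwise_distances(centres)
```

Comparing these detected distances against the known inter-sphere distances of the rigid target yields the error parameters, without needing the target's absolute pose.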
 

The performance was evaluated using the same parameters used for the human detection algorithms, but referred to the tracked object instead.

Fig. 6 - The different targets: in the first three images the spheres are of different colours but of the same dimension (200 mm, 160 mm, and 100 mm, respectively), while in the last figure the spheres were all of the same colour (green) but of different dimensions.
Fig. 7 - Example of the reference positions of the spheres used.

If you want to know more about the project and the results obtained, please download the master's thesis below.