AI evaluation campaigns during robotics competitions: the METRICS paradigm

Guillaume Avrin, Virginie Barbosa, Agnes Delaborde

The H2020 METRICS project (2020-2023) organizes competitions in four application areas (Healthcare, Infrastructure Inspection and Maintenance, Agri-Food, and Agile Production), relying on both physical testing facilities (field evaluation campaigns) and virtual testing facilities (data-based evaluation campaigns) to mobilize the artificial intelligence community in addition to the European robotics community. This article presents this approach and paves the way for a new robotics and artificial intelligence competition paradigm.

ACRE: Quantitative Benchmarking in Agricultural Robotics

Riccardo Bertoglio, Giulio Fontana, Matteo Matteucci, Davide Facchinetti and Stefano Santoro

The aim of ACRE (Agri-food Competition for Robot Evaluation) is to provide a set of benchmarks for agricultural robots and smart implements. While involving capabilities of general application, ACRE puts a special focus on weeding, identified as one of the tasks where it is easier for robotics to demonstrate its potential. ACRE, like the other three robot competitions being organised by the European project METRICS, is built on the established idea of benchmarking through competitions. In this paper we present the framework of ACRE and examples of its benchmarks.


Activity Recognition for Ambient Assisted Living with Videos, Inertial Units and Ambient Sensors

Ranieri, C. M., MacLeod, S., Dragone, M., Vargas, P. A., & Romero, R. A. F.

Worldwide demographic projections point to a progressively older population. This fact has fostered research on Ambient Assisted Living, which includes developments on smart homes and social robots. To endow such environments with truly autonomous behaviours, algorithms must extract semantically meaningful information from whichever sensor data is available. Human activity recognition is one of the most active fields of research within this context. Proposed approaches vary according to the input modality and the environments considered.


Automatic Dataset Generation From CAD for Vision-Based Grasping

Saad Ahmad, Kulunu Samarawickrama, Esa Rahtu and Roel Pieters

Published in: 20th International Conference on Advanced Robotics

Recent developments in robotics and deep learning enable the training of models for a wide variety of tasks from large amounts of collected data. Visual and robotic tasks, such as pose estimation or grasping, are trained from image data (RGB-D) or point clouds that need to be representative of the actual objects in order to achieve accurate and robust results. This implies either generalized object models or large training datasets that include all object and environment variability. However, data collection is often a bottleneck in the fast development of learning-based models. In fact, data collection might be impossible or even undesirable, as physical objects are unavailable or the physical recording of data is too time-consuming and expensive, for example when building a data recording setup with cameras and robotic hardware. CAD tools, in combination with robot simulation, offer a solution for the generation of training data that can be easily automated and that can be just as realistic as real-world data. In this work, we propose a data generation pipeline that takes as input a CAD model of an object and automatically generates the required training data for object pose estimation and object grasp detection. The object data generated are: RGB and depth images, object binary mask, class label, and ground truth pose in the camera and world frames. We demonstrate the dataset generation of several sets of industrial object assemblies and evaluate the trained models on state-of-the-art pose estimation and grasp detection approaches. Code and video are available at:


Domestic service robots are becoming more ubiquitous and can perform various assistive tasks such as fetching items or helping with medicine intake to support humans with impairments of varying severity. However, the development of robots taking care of humans should not only be focused on developing advanced functionalities, but should also be accompanied by the definition of benchmarking protocols enabling the rigorous and reproducible evaluation of robots and their functionalities. Of particular importance is the assessment of robots’ ability to deal with failures and unexpected events which occur when they interact with humans in real-world scenarios. For example, a person might drop an object during a robot-to-human handover due to its weight. However, the systematic investigation of hazardous situations remains challenging, as failures (i) are difficult to reproduce and (ii) may affect the health of humans. Therefore, we propose in this paper to employ the concept of scientific robotic competitions as a benchmarking protocol for assessing care robots and to collect datasets of human-robot interactions covering a large variety of failures which are present in real-world domestic environments. We demonstrate the process of defining the benchmarking procedure with the human-to-robot and robot-to-human handover functionalities, and execute a dry-run of the benchmarks while inducing several failure modes such as dropping objects, ignoring the robot, and not releasing objects. A dataset comprising colour and depth images, as well as readings from a wrist force-torque sensor and other internal sensors of the robot, was collected during the dry-run. In addition, we discuss the relation between benchmarking protocols and standards that exist or need to be extended with regard to the test procedures required for verifying and validating conformance to standards.


From ERL to RAMI: Expanding Marine Robotics Competitions Through Virtual Events

G. Ferri, F. Ferreira, A. Faggiani, T. Fabbri

To be published soon

On the Design of the Agri-Food Competition for Robot Evaluation (ACRE)

Riccardo Bertoglio, Giulio Fontana, Matteo Matteucci, Davide Facchinetti, Michel Berducat, Daniel Boffety

Published in: 2021 IEEE International Conference on Autonomous Robot Systems and Competitions (ICARSC)

The Agri-Food Competition for Robot Evaluation (ACRE) is a novel competition for autonomous robots and smart implements. It is focused on agricultural tasks such as removing weeds or mapping/surveying crops down to single-plant resolution. Such abilities are crucial for the transition to so-called “Agriculture 4.0”, i.e., precision agriculture supported by ICT, Artificial Intelligence, and Robotics. ACRE is a benchmarking competition, i.e., the activities that participants are required to execute are structured as performance benchmarks. The benchmarks are grounded on the key scientific concepts of objective evaluation, repeatability, and reproducibility. Transferring such concepts to the agricultural context, where large parts of the test environment are not fully controllable, is one of the challenges tackled by ACRE. The ACRE competition involves both physical Field Campaigns and data-based Cascade Campaigns. In this paper, we present the benchmarks designed for both kinds of Campaigns and report the outcome of the ACRE dry-runs that took place in 2020.


"Validation of methodologies for evaluating stand-alone weeding solutions, within the framework of the Challenge ROSE and METRICS projects"

The ROSE Challenge is the first global robotics and artificial intelligence competition to implement a third-party evaluation of the performance of robotized intra-row weed control in real and reproducible conditions, to ensure a credible and objective assessment of its effectiveness. This paper reports on the design and validation of test facilities for this competition, which presents a particular complexity: the evaluations take place in real conditions on crop plots and target living organisms (crops and weeds). Moreover, the experimental conditions need to be reproducible to allow for comparison of evaluation results and for fair treatment of different participants. The article also discusses the opportunity this challenge offers to define, in a consensual manner, the means and methods for characterizing these intelligent systems. The tools developed in the framework of this challenge establish the necessary references for future research in the field of agricultural robotics: the annotated images will be particularly useful to the community, and the evaluation protocol will make it possible to define harmonized methodologies beyond the ROSE Challenge. After presenting the objectives of the challenge, the article presents the methodology and tools developed and used to allow an objective and comparable evaluation of the performance of the systems and solutions developed. Finally, the article illustrates this potential for harmonization and sharing of references through the European competition ACRE of the European H2020 project METRICS.


Learning-enabled components in robots must be assessed with respect to non-functional requirements (NFRs) such as reliability, fault tolerance, and adaptability to ease the acceptance of responsible robots into human-centered environments. While many factors impact NFRs, in this paper we focus on the datasets used to train the learning models that are applied in robots. We describe desirable characteristics for robotics datasets and identify the associated NFRs they affect. The characteristics are described in relation to the variability of the instances in the dataset, out-of-distribution data, the spatio-temporal embodiment of robots, interaction failures, and lifelong learning. We emphasize the need to include out-of-distribution and failure data in the datasets, both to improve the performance of learning models and to allow the assessment of robots in unexpected situations. We also stress the importance of continually updating the datasets throughout the lifetime of the robot, and of documenting the datasets accordingly for improved transparency and traceability.