Data-efficient machine learning: the ability to learn in complex domains without requiring big data

Recent efforts in machine learning have addressed the problem of learning from massive amounts of data. We now have highly scalable solutions for problems in object detection and recognition, machine translation, text-to-speech, recommender systems, and information retrieval, all of which attain state-of-the-art performance when trained with large amounts of data. But in other domains, the challenge we now face is how to learn efficiently with the same performance in less time and with less data.
Surrogate modelling, or data-efficient machine learning, is an optimization method in which a predictive model guides the optimization process. Like any traditional machine learning model, it is trained on data from historical records, experiments, or simulations, and can be queried to return a predicted outcome for a given set of input parameters. In addition, it can be queried for the input parameter set that, once data about it is obtained, is most likely to yield the largest improvement in model accuracy. That parameter set can then be used for the next experiment or simulation, creating the additional training data that increases accuracy the most.
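The loop described above can be sketched in a few lines of Python. This is a minimal illustration, not the SUMO implementation: it assumes a one-dimensional input, a small Gaussian process surrogate with a fixed RBF kernel, and a "most uncertain point next" rule. The function expensive_simulation and all parameter values are hypothetical stand-ins.

```python
import math

def rbf(a, b, length=0.3):
    """RBF kernel with an assumed, fixed length scale."""
    return math.exp(-0.5 * ((a - b) / length) ** 2)

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - tail) / M[i][i]
    return x

def gp_predict(xs, ys, x_new, noise=1e-6):
    """Posterior mean and variance of the surrogate at x_new."""
    K = [[rbf(a, b) + (noise if i == j else 0.0) for j, b in enumerate(xs)]
         for i, a in enumerate(xs)]
    k = [rbf(a, x_new) for a in xs]
    alpha = solve(K, ys)
    v = solve(K, k)
    mean = sum(ki * ai for ki, ai in zip(k, alpha))
    var = max(rbf(x_new, x_new) - sum(ki * vi for ki, vi in zip(k, v)), 0.0)
    return mean, var

def expensive_simulation(x):
    """Hypothetical stand-in for a costly experiment or simulation."""
    return math.sin(6 * x) + 0.5 * x

def next_experiment(xs, ys, candidates):
    """Pick the candidate where the surrogate is least certain."""
    return max(candidates, key=lambda c: gp_predict(xs, ys, c)[1])

# Start from two known points and let the model choose four more.
xs = [0.0, 1.0]
ys = [expensive_simulation(x) for x in xs]
candidates = [i / 50 for i in range(51)]
for _ in range(4):
    x_next = next_experiment(xs, ys, candidates)
    xs.append(x_next)
    ys.append(expensive_simulation(x_next))
    print(f"sampled x = {x_next:.2f}")
```

Each pass refits the surrogate on everything measured so far, so every new experiment is placed where it is expected to teach the model the most.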

These techniques have been the subject of research for many years, especially in the Surrogate Modeling lab (SUMO Lab) of imec/UGent, where the founders of ML2Grow performed their PhD research and where ML2Grow was founded. The SUMO framework was released by SUMO Lab in 2008, has been adopted by thousands of engineers worldwide, and has been used in several ML2Grow projects.

This approach requires a complete shift in mindset. In a typical workflow, costly experiments and simulations are run in a less structured way, and optima are sought with traditional grid search algorithms. This is sub-optimal: with complex parameter combinations, each simulation is too computationally expensive to allow sufficient exploration of the parameter space.
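To see why grid search stops being practical, it helps to count runs. The sketch below assumes 10 levels per parameter and 10 minutes per simulation; both numbers are purely illustrative and not taken from any project.

```python
def grid_search_runs(levels_per_parameter, n_parameters):
    """Number of simulations a full grid needs: levels ** parameters."""
    return levels_per_parameter ** n_parameters

MINUTES_PER_RUN = 10  # assumed cost of a single simulation

for n_parameters in (2, 4, 6):
    runs = grid_search_runs(10, n_parameters)
    days = runs * MINUTES_PER_RUN / (60 * 24)
    print(f"{n_parameters} parameters: {runs:>9,} runs, about {days:,.1f} days")
```

Under these assumptions, six parameters already require a million runs, or roughly 19 years of simulation time.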

Multi-objective Bayesian Optimization

Data-efficient machine learning can speed up such simulation campaigns by up to 100x, because the model learns the parameter space and can ‘intelligently’ suggest the next points to simulate.
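One common way to make such a suggestion in Bayesian optimization is the expected improvement criterion, which trades off a high predicted value against high uncertainty. The sketch below is a generic textbook version for maximisation, not necessarily the criterion used in SUMO or in ML2Grow projects. The candidate names and their predicted means and standard deviations are made-up numbers.

```python
import math

def expected_improvement(mean, std, best_so_far):
    """Expected improvement over best_so_far for a maximisation problem.

    mean and std are the surrogate model's prediction at one candidate point.
    """
    gain = mean - best_so_far
    if std <= 0.0:
        # No uncertainty left: the improvement is known exactly.
        return max(gain, 0.0)
    z = gain / std
    pdf = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)
    cdf = 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))
    return gain * cdf + std * pdf

# Hypothetical surrogate predictions: name -> (mean, std).
predictions = {"A": (0.9, 0.05), "B": (0.7, 0.40)}
best_so_far = 1.0

best_candidate = max(
    predictions,
    key=lambda name: expected_improvement(*predictions[name], best_so_far),
)
print("simulate next:", best_candidate)
```

Candidate B has the lower predicted value, but its much larger uncertainty gives it the higher expected improvement, so it is the point worth simulating next.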

This is commercially available as a service offered by several companies, but with limited usability. Most commercially offered services lack support for multiple objectives, multi-fidelity, reweighting of objective functions, and a kickstart optimization procedure. In addition, complex use cases require a lot of specialized knowledge to set up such services, and tailor-made implementations are always at least part of the solution. ML2Grow offers this knowledge to its customers and delivers complete integrated solutions.
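To make "multiple objectives" and "reweighting" concrete, here is a small sketch with two objectives that are both minimised. The designs and their (cost, defect rate) values are invented for illustration, and the weighted sum is only one simple way to turn a trade-off into a single choice.

```python
def dominates(a, b):
    """True if a is no worse than b on every objective and better on one."""
    no_worse = all(x <= y for x, y in zip(a, b))
    better = any(x < y for x, y in zip(a, b))
    return no_worse and better

def pareto_front(points):
    """Keep only the points that no other point dominates."""
    return [p for p in points if not any(dominates(q, p) for q in points)]

def weighted_sum(point, weights):
    """Collapse several objectives into one score (lower is better)."""
    return sum(w * v for w, v in zip(weights, point))

# Hypothetical designs as (cost, defect_rate); both are minimised.
designs = [(10, 0.05), (12, 0.02), (11, 0.06), (15, 0.02), (9, 0.09)]
front = pareto_front(designs)
print("Pareto front:", front)

# Reweighting the objectives changes which trade-off is preferred.
quality_first = min(front, key=lambda p: weighted_sum(p, (1, 100)))
cost_first = min(front, key=lambda p: weighted_sum(p, (1, 10)))
print("quality-focused choice:", quality_first)
print("cost-focused choice:", cost_first)
```

The Pareto front keeps every design that represents a genuine trade-off, so the weights can be changed later without rerunning any simulation.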

Reference implementations

Surrogate Modeling is mainly used in environments where experiments or simulations are expensive. In their previous careers at UGent/imec, ML2Grow staff used SUMO to speed up parameter space exploration in finite element calculations in material engineering, to optimise the Dyson vacuum cleaner. The same method is also used intensively in aircraft wing design and simulation. SUMO also creates very interesting opportunities in processes where each experiment is costly in money and/or time, such as biotech or agriculture, where crop growth takes a long time, or EMC testing, where every experiment destroys a chip. ML2Grow staff have explored these opportunities in previous research work. Some other examples created by ML2Grow staff are given below.

Search flight optimization and hotspot detection

To find a hotspot in a 2D space (e.g. the source of air pollution, radiation, or contamination), one needs to collect samples. Traditionally this is done by scanning the whole space with the same intensity, which introduces unnecessary costs such as travel time. SUMO makes it possible to do optimized line-based sampling that incorporates the additional constraints of real-life movement through the sample space (e.g. a fixed-wing drone, for which curves are penalized). This leads to an optimal path that quickly identifies the hotspot or local optimum. (sample source: research by Delanghe et al., Ghent University)

In these implementations, the model suggests the location of the next sampling point (or the direction the drone should head) to efficiently find the optimum, or in this case the source of radiation.
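As a rough illustration of how a movement constraint can enter that suggestion, the sketch below scores each candidate waypoint by how promising the model finds it, minus a penalty for the turn needed to reach it. It is a hypothetical simplification, not the method from the cited research: the acquisition function, the penalty weight, and the coordinates are all assumptions.

```python
import math

def turn_angle(position, heading, target):
    """Absolute heading change (radians) needed to fly towards target."""
    desired = math.atan2(target[1] - position[1], target[0] - position[0])
    diff = desired - heading
    # Wrap the difference into [-pi, pi] before taking its size.
    return abs(math.atan2(math.sin(diff), math.cos(diff)))

def next_waypoint(position, heading, candidates, acquisition, turn_penalty=0.5):
    """Pick the candidate with the best value after penalising sharp turns."""
    def score(candidate):
        turn = turn_angle(position, heading, candidate)
        return acquisition(candidate) - turn_penalty * turn
    return max(candidates, key=score)

# Drone at the origin flying east (heading 0 rad); two candidate waypoints.
ahead, behind = (1.0, 0.0), (-1.0, 0.0)

# Equal interest in both points: keep flying straight.
same = {ahead: 1.0, behind: 1.0}
print(next_waypoint((0.0, 0.0), 0.0, [ahead, behind], same.get))

# Much higher interest behind the drone: the U-turn becomes worthwhile.
hot = {ahead: 1.0, behind: 5.0}
print(next_waypoint((0.0, 0.0), 0.0, [ahead, behind], hot.get))
```

Raising turn_penalty makes the path smoother at the cost of reaching promising regions later, so in practice it would be tuned to the vehicle.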

Case study: Oqton

Oqton is a Belgian-American AI-powered platform for 3D printing created by tech entrepreneur Ben Schrauwen. In 2021, the company raised an additional 40 million euro in venture capital. It is an end-to-end software platform that combines different silos, from manufacturing to logistics, optimizing and automating both.

For the optimization of 3D printing jobs, ML2Grow provided Oqton with Bayesian optimization-based software that suggests the optimal placement of pieces on the additive manufacturing base. The placement of the pieces in a 3D printer affects both price and quality. It can be optimized to reduce the movements of the 3D printing head (and thus speed up production), to build as many pieces as possible simultaneously, and to improve the achieved quality.

Because calculating the estimated production time and quality takes some time, and because the variety of possible positions is huge, it is impossible to simulate all possible positions. Surrogate Modeling provides the solution in this situation, speeding up decision time by reducing the required number of simulations by 10-100x.

Case study: Space Bakery

The Space Bakery consortium has the ambition to create the next generation of bread products, high in nutritive value and low in resource demand, that could support future space missions. The ultimate goal is to make bread on Mars. Within this research project, the consortium will address concerns relevant to domains such as circular food production, bread products focusing on health and well-being, urban farming, robotic pollination, and efficient water and nutrient use for agriculture, and in doing so gain insights that could ultimately bring benefits to Earth.

Among the seven partners are two ML2Grow customers: Urban Crop Solutions (UCS) and Magics. In the project, we developed a framework for optimizing growth conditions, based on several newly developed sensors and on data-efficient machine learning, to reduce the number of experiments and the time required. The machine learning software conducts several experiments.