Data pooling and foundation models as a cost-effective alternative for developing safe and robust AI applications based on high-quality data

Cost pressure and performance are often the decisive factors in the development of autonomous driving functions. This applies especially to the off-highway sector, where the variety of use cases and vehicle configurations makes building up a large data collection fleet and producing the required high-quality data annotations time-consuming and cost-intensive. The costs involved can amount to several million euros per machine type and therefore represent a significant barrier to entry into the development of autonomous driving AI models for many companies.

To tackle this challenge, driveblocks, a specialist in commercial vehicle autonomy and AI model development, is deeply integrating the b-plus data processing and quality management solution into its core product, the Mapless Autonomy Platform. This partnership will allow customers to build cost-efficiently upon foundation models pre-trained on high-quality data and will facilitate their entry into AI-based autonomous driving function development.

driveblocks therefore offers its neural networks pre-trained on a data pool drawn from various industry sectors, sensor types and operating conditions. Users of these pre-trained models can decide to contribute data recorded with their respective setups in exchange for substantial discounts. The b-plus annotation and data management solutions ensure that the training and validation data in the data pool is annotated and curated to the highest quality standards and is therefore suitable for future production projects. In addition, the partners handle the complete data infrastructure and process flow, ranging from specification creation, data collection, selection and labeling to quality assurance and testing, training and validation of the autonomous driving AI models. As a final step, customer-specific data is used to fine-tune the foundation model pre-trained on the data pool. Throughout the process, data privacy (users only get access to the trained model and not the data pool itself) and GDPR compliance are maintained.

"We are in the process of implementing a data pooling approach that will produce an incredibly robust foundation model. This approach will allow us to develop AI applications for our customers not only faster, but also extremely cost-efficiently," explains Alexander Wischnewski, Managing Director of driveblocks. "With a constantly growing overall pool of data, we can access reliable test data at any time. While we are still in the development phase, current predictions are that customers could save around 90% of cost when building upon the pre-trained foundation models."

"Our contribution is to ensure that ML applications are only trained with the most valuable data. We take great care to ensure that only relevant data is included in the data pool," adds Marius Reuther, Managing Director of b-plus automotive. "We achieve this through our extensive quality assurance processes established in the OEM environment, which guarantee a verifiable data quality of 98%! Even though we are still in the concept phase, the initial results are extremely promising."

The joint work of b-plus and driveblocks is revolutionizing the development of safe, robust and high-performance autonomous driving AI models. It reduces the amount of data required, the effort and costs caused by inadequate data quality, the need to set up in-house databases and infrastructure, and the risk of project delays. The first results of this groundbreaking approach will be published in summer 2025.

Until then, find out more about the underlying solutions: b-plus’ CONiX.dpc and driveblocks’ Mapless Autonomy Platform.