Appen’s Annual State of AI Report Finds 10 Percentage Point Increase in Bottlenecks Related to Sourcing, Cleaning and Labeling Data

Appen AI, Inc.

Report Also Finds That Data Accuracy Has Dropped Nearly 9 Percentage Points Since 2021

Bottleneck Trend

KIRKLAND, Wash., Oct. 22, 2024 (GLOBE NEWSWIRE) -- Appen Limited (ASX: APX), a leading provider of high-quality data for the AI lifecycle, today released the findings of its 2024 State of AI report, which surveyed more than 500 IT decision-makers across a variety of U.S. industries. While growing, the adoption of AI-powered technologies such as machine learning (ML) and generative AI (GenAI) is hindered by a lack of accurate, high-quality data. The report found a 10 percentage point year-over-year increase in bottlenecks related to sourcing, cleaning, and labeling data.

"Enthusiasm around GenAI and other AI-powered tech remains high, as indicated by the rapid growth, but users are quickly finding that the promise of these tools is matched by an equally daunting challenge," said Si Chen, VP of Strategy at Appen. "The success of AI initiatives relies heavily on high-quality data, and this is becoming more difficult as AI use cases increase in complexity and become more specialized. This is reflected by the fact that high-quality annotations, meaning high consistency and accuracy, are the top features companies seek in a data annotation solution. Those building the AI tools and models of tomorrow value strategic data partnerships now more than ever."

The report found that the use of GenAI continues to grow at a healthy pace, with adoption up 17 percentage points in 2024 versus the previous year. At the same time, 86% of respondents retrain or update their ML models at least once every quarter, indicating a critical need for fresh, relevant and high-quality data. Meanwhile, data accuracy has decreased by 9 percentage points since 2021. As models are iterated more frequently, data remains the most significant challenge, especially where accuracy and availability are concerned.

To compile the 2024 State of AI report, Appen commissioned Harris Poll to conduct an online survey of U.S. information technology decision-makers from April 18 to May 9, 2024. Respondents included business leaders and managers, data scientists, data engineers and developers working at companies with 100-plus employees. Key findings include:

  • Enterprise deployments and ROI are down: the mean percentage of AI projects making it to deployment has dropped by 8 percentage points since 2021, while the mean percentage of deployed AI projects that have shown meaningful ROI has dropped by 9.4 percentage points.

  • Nearly 90% of respondents rely on external data providers for AI model training and/or annotation, underscoring the essential role external data plays in sustaining AI systems.

  • Despite GenAI’s rapid growth, managing bias and ensuring fairness in GenAI model training remains a key challenge. Custom data collection is the primary method for sourcing GenAI training data.

  • Humans are more vital to AI than most might expect: 80% of respondents highlight the importance of human-in-the-loop ML, validating that human insight is key to refining AI systems.

  • Data diversity is the most valued feature in AI, closely followed by efforts to reduce bias and ensure scalability. 97% of respondents agree these elements are vital for building AI models.

  • Data management has emerged as the leading obstacle in AI projects for technologists, with a reported 10 percentage point increase in bottlenecks related to sourcing, cleaning, and labeling data since 2023. Data accuracy has declined by 9 percentage points, likely driven by the increasing complexity of AI and the corresponding data requirements. The lack of data availability has also risen by 7 percentage points during the same period.