As we wrap up 2022, many teams are reflecting on challenges and accomplishments and thinking ahead to new projects, better ways of working, and how to allocate resources in the coming year. The recent Label Studio Community survey reveals trends, challenges and shifting investments for data science teams that will resonate as you step into 2023.
In September, we asked the global Label Studio open source community to tell us about their data labeling operations and how they’ve integrated them into their machine learning workflows. Label Studio is the most popular open source data labeling platform, with more than 150,000 users worldwide, 100,000,000+ annotations created and over 11,000 stars on GitHub. Community members from more than 40 countries participated in the survey. Of the respondents, 75% currently have ML/AI models in production, and another 15% plan to have models in production soon.
Key findings from the survey include:
73% of respondents said their organizations will increase investment in their ML/AI initiatives in the coming year.
80% of respondents cited accurately labeled data as one of the biggest challenges to getting ML/AI models into production (the top response), while 46% cited lack of data (the second most popular response).
72% of respondents reported spending 50% or more of their time on data preparation, iteration and management, and more than one-third (34%) said they spend 75% or more of their time on these data tasks.
While most respondents hold the traditional roles of data scientist or data engineer, responsibility for data labeling is broad, requiring engagement across the organization from interns to executives and business leaders. Notably, 20% reported that a mix of roles shares responsibility for data prep, including subject matter experts, who accounted for 5% of responses, and business analysts, who accounted for 3%.