
Customer Story

How Outreach Achieved Best-in-Class Training Data Quality and ML Performance

In Conversation With

Andrew Herington

Data Annotation Program Manager

Pavel Dmitriev

Vice President of Data Science

25% reduction in development time for new labeling tasks
15-20% overall increase in the quality of labeled data
6x increase in the number of concurrent projects the labeling team can run in a quarter

About Outreach

Sales leaders and their teams often need more insight into their full sales cycle, which is particularly difficult when it's stretched over multiple tools. With Outreach, sales teams can quickly automate funnel building, strategically boost pipeline conversion rates, and use forecasts informed by real activity to predictably close deals. Using machine learning, Outreach uncovers insights that assist the entire team in achieving greater sales results by tying together and analyzing data from every engagement, applying their unique expertise and data around sales execution to make smart recommendations.

Because market trends change often, Outreach is constantly ingesting new data to improve its models and develop new product capabilities. This necessitates a robust, high-quality pipeline of labeled datasets. However, as they scaled their labeling operations, Outreach ran into limitations with their labeling process and tooling. After initiating the search for a new labeling platform, and trying out an open-source version of Label Studio, they landed on Label Studio Enterprise as the ideal solution.

Now, Outreach has reduced development time for labeling tasks by at least 25% while simultaneously achieving a 15%-20% increase in data quality over their previous labeling solution. This has allowed them to increase the throughput of their data labeling operations and embark on new data-hungry projects that take advantage of the latest trends in machine learning technology.

Outreach helps sales teams perform better with unique insights

Outreach is in the business of helping sales teams make smarter decisions that lead to more revenue, and using the analytical and generative properties of machine learning is a critical piece of their program. With proprietary access to millions of pieces of sales-related data, Outreach uses its unique knowledge and experience to train models that then generate unique insights for their customers that they can’t get anywhere else.

According to Pavel Dmitriev, Vice President of Data Science, "The reason why we started doing annotation is that it became evident early on that to make Outreach succeed, we have to understand the communication that's happening between sellers and prospects at a very deep, very granular level, not just at the metadata level." This understanding is what powers many of Outreach’s key features, including its sales engagement and conversation intelligence tools.

However, Outreach was running into difficulties with its data labeling platform. Pavel notes, “The direction of natural language understanding science has in recent years moved into large-scale pre-trained models. These models are already trained on a lot of data and just need to be fine-tuned for a specific domain on a relatively small amount of data, which has to be very high quality. This is a shift from a decade ago when the volume of data was more important than the quality.”

Their existing platform wasn’t well-suited for ensuring that Outreach could produce high-quality training data sets efficiently. The challenges the team faced included:

  • The annotation team relied heavily on data scientists to scrape the data and configure the tooling they needed.
  • The annotation tool itself was rigid and very limited in capabilities. Labeling decisions often had to be based on what the tool could do rather than on what the data needed. In addition, any minor change, such as restarting a single job, had to be handled directly by a data scientist.
  • Because the data scientists were effectively “tool operators,” the annotation team and the data scientists had to communicate almost daily as challenges came up. This pulled data scientists away from their actual jobs and reduced the efficiency, quality, and productivity of the annotation team. It was not sustainable for either group, nor was it meeting the needs of the business.
  • It was difficult to monitor the quality of the labeled data in real time or on an annotator-by-annotator basis; quality could only be assessed after an entire job was done.

So they began looking for alternatives.

The Solution

Label Studio Enters the Chat

Outreach was running into limitations with their old data annotation tool that were preventing them from getting the model performance they needed to achieve their business goals. As they went through their evaluation process with a new set of requirements, they appreciated the ability to try Label Studio’s open source version to get a feel for the product before making a purchase. In the end, they found that Label Studio Enterprise provided the flexibility, insight, and structure they needed to make their annotation program successful.

Easily Customizable UI Unlocks Independence

One of Outreach’s goals in choosing a new labeling platform was to reduce the annotation team’s reliance on the data science team. Label Studio allows Outreach’s data annotation team to be more responsive and frees up data science resources for the actual modeling work, instead of developing and building initial data sets. According to Andrew Herington, Outreach’s Data Annotation Program Manager, “One of the things that impressed us the most about Label Studio was the ability to craft UIs on the fly. Before Label Studio, we had too much dependency on our data scientists. We were wasting their time asking for help just to make simple changes to our labeling tooling. Label Studio allows the annotation team to be more self-serve. Now, we get a request, do the configuration ourselves, and get a data set without needing to involve the data science team. It’s much more efficient.”

In fact, Andrew estimates that Label Studio has reduced development time for new tasks by at least 25%.

Enabling Annotator Training

Outreach’s philosophy is that the most important factor in getting high-quality labeled data is having properly trained, experienced annotators. Andrew notes that “human annotation as a discipline is difficult and especially NLP annotation.” For that reason, Outreach places a premium on having a well-trained annotation team. “Training the annotators is what matters. The better they’re trained, the higher the quality of the data and the faster we can produce the work,” states Andrew.

They found an excellent fit in Label Studio. Using features like the ability to create ground truth data sets based on annotator agreement matrices, strong feedback communication loops, and the ability to queue and assign labeling tasks to specific annotators and reviewers, Outreach was able to implement a world-class annotator training program that enables them to tackle multiple new projects simultaneously with far less management overhead. “Label Studio is not just an annotation tool, it’s an annotator training tool!”

A Real-Time View Into Metrics

Outreach organizes its annotation team into three-person teams, or “pods,” each consisting of two junior annotators and a more experienced annotator who acts as a reviewer. The two junior annotators annotate the same item. Items with 100% agreement are usually passed through to the final data set, while items with disagreements are reviewed by the more experienced annotator. This allows each pod to iterate very quickly through labeling projects while producing high-quality data. The structure is also versatile, enabling the annotation team to tackle several projects at the same time. When a project is bigger than one pod can handle, Outreach assigns more pods to it to increase capacity. Pods can also specialize in certain types of tasks, such as different languages or particular annotation modalities like classification or NER.
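The pod workflow described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not Outreach's implementation and not the Label Studio API: the `Item` class, the `route_items` and `agreement_rate` helpers, and the example labels are all hypothetical. The sketch also auto-accepts every item with full agreement, whereas the text says such items are "usually" passed through.

```python
from dataclasses import dataclass


@dataclass
class Item:
    # Hypothetical record: one item labeled independently by both junior annotators.
    item_id: str
    label_a: str  # label from the first junior annotator
    label_b: str  # label from the second junior annotator


def route_items(items):
    """Split items into auto-accepted labels and a queue for the senior reviewer."""
    accepted = {}      # item_id -> agreed label, goes to the final data set
    needs_review = []  # item_ids where the two junior annotators disagree
    for item in items:
        if item.label_a == item.label_b:
            accepted[item.item_id] = item.label_a
        else:
            needs_review.append(item.item_id)
    return accepted, needs_review


def agreement_rate(items):
    """Fraction of items on which both junior annotators chose the same label."""
    if not items:
        return 0.0
    agreed = sum(1 for item in items if item.label_a == item.label_b)
    return agreed / len(items)


# Made-up example batch for one pod.
batch = [
    Item("email-1", "positive", "positive"),
    Item("email-2", "objection", "positive"),
    Item("email-3", "unsubscribe", "unsubscribe"),
    Item("email-4", "objection", "objection"),
]

accepted, needs_review = route_items(batch)
print(accepted)               # agreed items and their labels
print(needs_review)           # items the reviewer must resolve
print(agreement_rate(batch))  # per-pod agreement for this batch
```

Tracking a figure like `agreement_rate` per pod is one simple way to compare pods, which is the kind of question the analytics discussed next are meant to answer.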

However, for this structure to work well, Outreach needed highly granular analytics on annotator performance, agreement, and labeling quality. “One of the big reasons Label Studio is helpful is the people management piece - an easy way to control who does what and how well they do it. For example, how does the quality compare between different pods? Streamlining people management was a key reason we were looking for a new tool,” says Pavel.

Andrew adds, “Label Studio has been useful in giving us real-time metrics to allow us to react quickly to dips in quality or productivity.” Because Outreach can now course-correct quickly, the overall quality of its labeled data has increased by 15-20%.

Conclusion

What’s Next For Outreach?

Outreach continues to increase the scope of their machine learning program as they expand into additional languages and new types of data. Recent innovations in generative AI have introduced a new project for Outreach: training a large language model to help salespeople compose relevant and effective sales emails. “We can generate a lot of replies but how good are they, and how do we even define what ‘good’ means?” says Pavel. “This is the job of the annotation team, which is pretty central in building this feature. Anyone can call an API of ChatGPT, pass it an email prompt, and get something back. But how do we make it more relevant and personalized, and ensure it is consistently high quality?”

With the flexibility and workflows available with Label Studio Enterprise, Outreach is well-positioned to take advantage of emerging machine learning technologies that will allow them to continue to differentiate their offerings with internally-labeled data and build a thriving and successful business.