
Managing A Data Annotation Team: 5 Key Takeaways From Yext

We recently held a webinar with Dr. Vera Dvorak, Machine Learning Operations Manager at Yext. In her role, Dr. Dvorak oversees the data annotation teams, ensuring that the data science team has all the labeled data they need to complete their projects. The webinar is full of insight and helpful information (you can check it out here), but we’ve pulled out a few key takeaways for you.

What a Machine Learning Operations Manager Does

“My role is to be a link between the Data Science team on one hand and our annotation teams on the other. When it comes to the data science team, I am responsible for responding to all their labeled data needs, like the data they need for both training and retraining language models. And this can be both for long-term projects…and also short-term projects, when you just need a limited set of labeled data to test some idea and then maybe discard it or go on further with it and add more labeled data. So that's one part of my work: to respond to all of that, to be able to brainstorm, and to supply what is needed.

“At the same time, I also need to manage annotators in the sense of making sure that they have enough data to label, that they understand the guidelines, that they know how to use the labeling tool, and also, very importantly, that their questions are answered very swiftly so they don't have any blockers which would prevent them from labeling correctly, efficiently, and at a steady pace.”

How Yext Uses Label Studio Comments

“Label Studio provides very clear annotator feedback loops which are very transparent. They allow people both to leave their own comments and to mark things for escalation and for discussion. We use comments on three different levels:

  1. On a personal level - “People can mark things they want to do later or they have some insight they want to attach to it. I myself worked a lot as an annotator and I know sometimes you want to mark things you want to go back to later on after you do some research.”
  2. For discussion - “People mark things for discussion using certain keywords that you can search on using filters. We meet as a team, typically once a week (or even more often if there are many things for discussion), and multiple people contribute and find a solution.” (This keyword convention is sketched in code after the list.)
  3. For management - “As a manager I need to have a way to mark things for escalation - things I need to discuss, maybe with a product team or with someone from data science who created the project and knows what they want. So I use this field to mark things that I need to go back to.”
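The keyword convention for discussion items lends itself to light automation. Below is a minimal sketch of how such flagged comments could be collected into a weekly meeting agenda, assuming the Python label-studio-sdk Client and Label Studio's comments API; the DISCUSS keyword, project ID, URL, and API key are illustrative placeholders (not Yext's actual setup), and the response field names should be verified against your Label Studio version.

```python
from collections import defaultdict

from label_studio_sdk import Client

# Placeholder connection details -- substitute your own instance and key.
ls = Client(url="http://localhost:8080", api_key="YOUR_API_KEY")
PROJECT_ID = 1          # hypothetical project
KEYWORD = "DISCUSS"     # hypothetical team convention for flagging items

# Fetch this project's comments via the comments API; the field names
# used below (text, task, is_resolved) may vary by Label Studio version.
resp = ls.make_request("GET", "/api/comments", params={"project": PROJECT_ID})

agenda = defaultdict(list)
for comment in resp.json():
    if KEYWORD in comment.get("text", "") and not comment.get("is_resolved"):
        agenda[comment["task"]].append(comment["text"])

# Print a simple agenda for the weekly discussion meeting.
for task_id, notes in sorted(agenda.items()):
    print(f"Task {task_id}: {len(notes)} open comment(s)")
    for note in notes:
        print(f"  - {note}")
```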

The Benefits of Using Customizable Interfaces

“[I like] the flexibility of the labeling interfaces themselves. You don't need to be a programmer to add or remove labels, to play with how things are. I'm very picky about how annotators see things. I want to make things very condensed, so even if something is imported and I'm not happy with how it looks, I can play with it. Also, based on the feedback I get from the annotators, I can make fonts larger or smaller, for example. This is very easy to do and I really appreciate that.”
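For readers who haven't seen it, Label Studio interfaces are defined in an XML-like labeling config, so changing labels or styling is a plain-text edit rather than a code change. Here is a minimal sketch of that loop using the Python SDK; the project title, label set, and font sizes are illustrative assumptions, not the configuration Yext uses.

```python
from label_studio_sdk import Client

# Placeholder connection details -- substitute your own instance and key.
ls = Client(url="http://localhost:8080", api_key="YOUR_API_KEY")

# Labels are plain tags, so adding or removing one is a one-line edit,
# and a <Style> block adjusts fonts without any programming.
LABEL_CONFIG = """
<View className="root">
  <Style>.root { font-size: 16px; line-height: 1.4; }</Style>
  <Labels name="label" toName="text">
    <Label value="Person" background="#ffa39e"/>
    <Label value="Location" background="#91d5ff"/>
  </Labels>
  <Text name="text" value="$text"/>
</View>
"""

project = ls.start_project(title="NER demo", label_config=LABEL_CONFIG)

# Annotators find the font too small? Iterate with the same edit-and-update loop.
project.set_params(label_config=LABEL_CONFIG.replace("16px", "18px"))
```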

Measuring Annotator Performance

“When people start they usually need some time to get used to a task, to maybe even research and go back to the guidelines. But then over time they should improve, and then we compare them [with the other annotators]. But it doesn't mean that the fastest annotator is always the best if they are not as precise, so you also need to take that into account. I always tell them it needs to be this balance between quantity and quality. You don't want somebody who is super slow and doesn't do much, but you also don't want someone who is super fast and then overlooks things. As you know, ‘garbage in, garbage out.’ I always say it's not lots of labeled data that helps you, it's lots of high-quality labeled data that really gets you somewhere.”
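The quantity-versus-quality balance she describes can be made concrete with a toy calculation. The record format, names, and the review-agreement measure below are illustrative assumptions, not Yext's actual metrics:

```python
from statistics import mean

# Toy records: (annotator, seconds spent on task, agreed with reviewer?)
records = [
    ("alice", 40, True), ("alice", 35, True), ("alice", 38, False),
    ("bob", 12, True), ("bob", 10, False), ("bob", 11, False),
]

by_annotator: dict[str, list[tuple[int, bool]]] = {}
for name, seconds, agreed in records:
    by_annotator.setdefault(name, []).append((seconds, agreed))

# Report both sides of the balance: speed alone would rank bob first,
# but his low agreement rate is the "garbage in, garbage out" risk.
for name, items in by_annotator.items():
    speed = mean(s for s, _ in items)      # quantity: average seconds per task
    quality = mean(a for _, a in items)    # quality: share matching the reviewer
    print(f"{name}: {speed:.0f}s/task, {quality:.0%} agreement")
```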

Empowering Your Annotation Team

“You always get comments asking for clarification, but I think the strategy that I apply could be compared to a funnel. At the beginning, there are many questions about different things, but one by one people start labeling and they mark things they want to discuss that they don't understand - and it could be 50% of your data at the beginning, or even more - and you go over them, but then you see it's repetitive. There is a pattern. You narrow that down more and more, and in a good labeling project, when the task is clear, you see this funnel effect where there are fewer and fewer questions that you have problems with, and in the end there are basically no questions…If that's the evolution you have, I would say that means your labeling is going well.”

This is just a small sample of what’s available in the whole webinar. You can watch the recording here.
