Pinterest this morning pulled back the curtain on the AI and machine learning technologies it’s using to combat harmful content on its platform. Using algorithms that automatically detect adult content, hateful activities, medical misinformation, drugs, graphic violence, and more before it’s reported, the company says that user reports of policy-violating content per impression have declined by 52% since fall 2019, when the technologies were first introduced. Reports of self-harm content have decreased by 80% since April 2019.
One of the challenges in building multi-category machine learning models for content safety is the scarcity of labeled data, which forces engineers to use simpler models that can’t be extended to multi-modal inputs. Pinterest solves this problem with a system trained on millions of human-reviewed Pins drawn from both user reports and proactive model-based sampling; its Trust and Safety operations team assigns categories to these Pins and takes action on violating content. The company also employs a Pin model trained using a mathematical, model-friendly representation of Pins based on their keywords and images, aggregated with another model to generate scores that indicate which Pinterest boards might be in violation.
“We’ve made improvements to the information derived by optical character recognition on images and have deployed an online, near-real-time, version of our system. Also new is the scoring of boards and not just Pins,” Vishwakarma Singh, head of Pinterest’s trust and safety machine learning team, told VentureBeat via email. “An impactful multi-category [model] using multi-modal inputs — embeddings and text — for content safety is a valuable insight for decision makers … We use a combination of offline and online models to get both performance and speed, providing a system design that’s a nice learning for others and generally applicable.”
In production, Pinterest employs a family of models to proactively detect policy-violating Pins. When enforcing policies across Pins, the platform groups together Pins with similar images and identifies them by a unique hash called “image-signature.” Models generate scores for each image-signature, and based on these scores, the same content moderation decision is applied to all Pins with the same image-signature.
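The grouping step described above can be illustrated with a minimal sketch. It assumes each Pin arrives as a (Pin ID, image-signature) pair and that a model has already produced one score per image-signature. The function name, the 0.9 threshold, and the "block"/"allow" labels are hypothetical and do not come from Pinterest.

```python
from collections import defaultdict

# Hypothetical cutoff; the article does not state the thresholds Pinterest uses.
BLOCK_THRESHOLD = 0.9


def moderate_by_signature(pins, signature_scores, threshold=BLOCK_THRESHOLD):
    """Apply one moderation decision to every Pin sharing an image-signature.

    pins: iterable of (pin_id, image_signature) pairs.
    signature_scores: dict mapping image_signature to a model score in [0, 1].
    Returns a dict mapping pin_id to "block" or "allow".
    """
    # Group Pin IDs by the signature of their image.
    groups = defaultdict(list)
    for pin_id, signature in pins:
        groups[signature].append(pin_id)

    decisions = {}
    for signature, pin_ids in groups.items():
        # Signatures without a score are treated as safe in this sketch.
        score = signature_scores.get(signature, 0.0)
        action = "block" if score >= threshold else "allow"
        # The same decision is copied to every Pin in the group.
        for pin_id in pin_ids:
            decisions[pin_id] = action
    return decisions
```

Because the decision is keyed on the signature rather than the individual Pin, a model only has to score each distinct image once, however many times it has been saved.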
For example, one of Pinterest’s models identifies Pins that it believes violate the platform’s policy on health misinformation. Trained using labels from Pinterest, the model finds keywords or text associated with misinformation and blocks Pins with that language while at the same time identifying visual representations associated with medical misinformation. It accounts for factors like image and URL and blocks matching images across Pinterest search, the home feed, and related Pins, according to Singh.
Since users usually save thematically related Pins together as a collection on boards around topics like recipes, Pinterest deployed a machine learning model to produce scores for boards and enforce board-level moderation. A Pin model trained using only embeddings — i.e., representations — generates content safety scores for each Pinterest board. An embedding for the boards is constructed by aggregating the embeddings of the most recent Pins saved to them. When fed into the Pin model, these embeddings produce a content safety score for each board, allowing Pinterest to identify policy-violating boards without training a model for boards.
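As a rough illustration of the board-scoring idea, the sketch below averages the embeddings of a board's most recent Pins and passes the result to a Pin-level scorer. The article says only that the embeddings are "aggregated", so the use of a mean, the `max_recent` cutoff, and the toy logistic scorer standing in for the Pin model are all assumptions.

```python
import math


def board_embedding(pin_embeddings, max_recent=50):
    """Average the embeddings of the most recently saved Pins.

    pin_embeddings: list of equal-length float lists, ordered oldest to newest.
    max_recent: hypothetical cap on how many recent Pins are aggregated.
    """
    if max_recent < 1:
        raise ValueError("max_recent must be at least 1")
    recent = pin_embeddings[-max_recent:]
    if not recent:
        raise ValueError("board has no Pins")
    dim = len(recent[0])
    # Element-wise mean; the real aggregation method is not described.
    return [sum(vec[i] for vec in recent) / len(recent) for i in range(dim)]


def toy_pin_model(embedding, weights, bias=0.0):
    """Stand-in for the Pin model: a logistic score over an embedding."""
    z = sum(w * x for w, x in zip(weights, embedding)) + bias
    return 1.0 / (1.0 + math.exp(-z))


def score_board(pin_embeddings, pin_model, max_recent=50):
    """Score a board by feeding its aggregated embedding to a Pin-level model."""
    return pin_model(board_embedding(pin_embeddings, max_recent))
```

Since the board embedding lives in the same vector space as Pin embeddings, the Pin-level scorer can be reused unchanged, which is what makes a separate board model unnecessary.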
“These technologies, along with an algorithm that rewards positive content, and policy and product updates such as blocking anti-vaccination content, prohibiting culturally insensitive ads, prohibiting political ads, and launching compassionate search for mental wellness, are the foundation for making Pinterest an inspiring place online,” Singh said. “Our work has demonstrated the impact graph convolutional methods can have in production recommender systems, as well as other graph representation learning problems at large scale, including knowledge graph reasoning and graph clustering.”
Article: Pinterest details the AI that powers its content moderation