Facebook has quietly acquired AI.Reverie, a New York-based startup creating synthetic data to train machine learning models, VentureBeat has learned. In an apparent nod to the HBO show Westworld, where visitors to a theme park encounter hordes of artificially intelligent robots, the purchase was made through a holding company called Dolores Acquisition Sub, Inc., after a character in the show.
A Facebook spokesperson confirmed the acquisition when contacted for comment.
AI.Reverie was launched in 2017 by a founding team that included Daeil Kim, Joey Tran, and Paul Walborsky. Kim was formerly a data scientist at The New York Times, where he spearheaded NYT Español’s audience acquisition strategy by developing AI solutions to optimize the brand’s acquisition funnel. Walborsky, previously president and CEO of tech media brand Gigaom, was SVP at The Times, responsible for a team leading the publisher’s international expansion.
AI.Reverie offered APIs and a platform that procedurally generated fully annotated synthetic videos and images for AI systems. Synthetic data, which is often used in tandem with real-world data to develop and test AI algorithms, has come into vogue as companies embrace digital transformation during the pandemic. In a recent survey of executives, 89% of respondents said synthetic data will be essential to staying competitive. And according to Gartner, by 2030, synthetic data will overshadow real data in AI models.
“Facebook acquiring AI.Reverie is a clear win for the synthetic data industry. We’re noticing more companies, large and small are rapidly adopting synthetic data as a key pillar to a robust AI strategy,” Arjan Wijnveen, CEO of synthetic data startup Cvedia, told VentureBeat via email. “This acquisition also highlights that even a company like Facebook, which is known for its vast data, still has gaps in their ability to collect the data needed to train AI.”
While synthetic data closely mirrors real-world data, mathematically or statistically, the jury is still out on its efficacy. A paper published by researchers at Carnegie Mellon outlines the challenges with simulation that impede real-world development, including reproducibility issues and the so-called “reality gap,” where simulated environments don’t adequately represent reality.
Other research, however, suggests that synthetic data can be as good for training a model as data based on actual events or people. For example, Nvidia researchers have demonstrated a way to use data created in a virtual environment to train robots to pick up objects like cans of soup, a mustard bottle, and a box of Cheez-Its in the real world.
In one study published by AI.Reverie in 2019, the company claimed that fine-tuning a model trained on synthetic data with just 10% of real-world data achieved performance on par with a model trained entirely with real-world data. “We … empower great minds everywhere to test the value of synthetic data for themselves,” Kim said in a previous statement.
AI.Reverie, which competed with startups like Tonic, Delphix, Mostly AI, Hazy, Gretel.ai, and Cvedia, among others, had a long history of military and defense contracts.
In 2019, the company announced a strategic alliance with Booz Allen Hamilton with the introduction of Modzy at Nvidia’s GTC DC conference. Through Modzy — a platform for managing and deploying AI models — AI.Reverie launched a weapons detection model that ostensibly could spot ammunition, explosives, artillery, firearms, missiles, and blades from “multiple perspectives.”
In 2020, AI.Reverie was awarded a $1.5 million research grant by AFWERX, a tech incubation arm of the U.S. Air Force, to build AI algorithms for the 7th Bomb Wing at Dyess Air Force Base. In a statement, Kim said that AI.Reverie would create synthetic images to train computer vision algorithms for navigation, which would normally require hand-labeled images.
The company further described the first phase of its work in a press release: “The Defense Department looks to AI.Reverie to accelerate reconnaissance to the speed required in a contingency environment. The computer vision models that power intelligence-gathering must be trained on data from classified locations and hard-to-reach places … AI.Reverie’s synthetic data platform … [generates] millions of fully annotated, richly diverse images — quickly and at a low cost. AI.Reverie aims to generate images across the electromagnetic spectrum that will empower soldiers to more accurately identify objects and make life-saving decisions.”
The contract closely followed AI.Reverie’s work with CosmiQ Works to release RarePlanes, a dataset containing tens of thousands of real and synthetic satellite scenes and annotations of different aircraft types. CosmiQ Works, which focuses on creating AI technologies for geospatial applications, was founded in 2015 within In-Q-Tel, an investment firm that connects tech companies with the U.S. intelligence community.
In 2021, AI.Reverie received a contract for the U.S. Air Force Advanced Battle Management System (ABMS), whose goal is to create a network for the military that would provide the technical infrastructure to connect various platforms and sensors. ABMS also aims to apply AI to data from the network to help parse information and aid in decision making.
“We are honored that the Air Force selected AI.Reverie to support its Advance Battle Management System,” Kim said at the time. “We believe that in partnership with AI.Reverie, the Air Force will have a significant opportunity to improve mission critical vision algorithms that ensure military advantage and keep our troops safe.”
Investing in synthetic data
Prior to the acquisition, AI.Reverie, which had attracted $10 million in funding from Compound, In-Q-Tel, Resolute Ventures, SGInnovate, TechNexus, and Triphammer Ventures, claimed to count government agencies and Fortune 500 companies among its customers in retail, smart cities, industry, and agriculture, with use cases including airport simulation, weapons detection, cashier-less shopping, and delivery bots. But Facebook’s play appears to be for the company’s synthetic data generation technology rather than its customer base.
While Facebook hasn’t revealed in detail how, or whether, it uses synthetic data for computer vision, researchers at the company have leveraged synthetic data to train models like M2M-100, which can translate between any pair of 100 languages without relying on English data. Synthetic data could be used to improve the performance of computer vision algorithms on Facebook’s platform that detect hate speech, or to develop intelligent assistants in virtual reality (VR) and augmented reality (AR) environments like the social network’s Horizon Worlds.
As the pandemic accelerates the trend toward stricter data privacy regulation and governance, synthetic data offers Facebook another advantage: compliance. The company has historically trained computer vision algorithms on videos and images from its products (e.g., Instagram) and other sources, but synthetic data technologies like AI.Reverie’s could lessen Facebook’s reliance on real-world user and third-party data.
In 2020, a Lithuanian company called Planner 5D sued Facebook for allegedly stealing thousands of files from Planner 5D’s software, which were made available through a partnership with Princeton to contestants of Facebook’s 2019 Scene Understanding and Modeling challenge for computer vision researchers. Planner 5D claimed that Princeton, Facebook, and Oculus, Facebook’s VR-focused hardware and software division, could have benefited from the training data that was taken from it.
More recently, a federal judge approved a $650 million class action privacy settlement over Facebook’s use of facial recognition tagging. The lawsuit alleged that the company’s Tag Suggestions tool, which scanned faces in photos and offered suggestions about who people might be, stored biometric data without users’ consent, in violation of Illinois law.
Fifty-one percent of consumers surveyed aren’t comfortable sharing their personal information, according to a Privitar survey. And in a Veritas report, 53% of respondents say they would spend more money with trusted organizations, with 22% saying they would spend up to 25% more with a business that takes data protection seriously.