From academia to acquisition – the journey of computer vision startup GrokStyle
- Computer vision has become one of the core tenets of AI
- It’s now used across industry, from production lines and RPA to e-commerce and facial recognition
- TechHQ speaks to a computer science professor and the creator of GrokStyle, a computer vision startup acquired by Facebook Marketplace
Today, we are building artificial intelligence systems that can understand the visual components of the world around us. This is computer vision, and while it started off as an MIT summer project more than fifty years ago, it’s now playing a central role in advanced robotics systems and automation technology.
Over the last decade, computer vision has become an incredibly hot area of development and growth among AI researchers, and is now one of AI’s main tenets alongside machine learning and natural language processing.
Use cases of computer vision are both exciting and expansive. The technology is used to check for compromised goods on production lines; it can work in concert with Robotic Process Automation to filter information from large volumes of hand-written text; and it serves as the ‘occipital lobe’ of autonomous vehicles. In perhaps its most topical manifestation, it underpins advanced facial recognition systems, whose use by law enforcement has raised serious concern.
Few individuals are better versed in the subject of computer vision than Kavita Bala, a professor and chair of the Cornell Computer Science department, and co-creator of GrokStyle, a product-recognition system recently acquired by Facebook that can identify attributes across billions of photographs in dozens of categories, including fashion and home decor.
In use by Facebook Marketplace today, the technology packaged in GrokStyle emerged from academic research into fine-grained visual recognition. While a computer vision system can be trained to recognize that there’s a table, a chair and a couch in the room, fine-grained recognition goes further and identifies the specific brand and type of each object. It can tell users “that’s the Eames chair or the IKEA Mammut table or a particular brand and type of lamp that is visible in the image,” explained Bala.
Bala and her team realized that such advanced image recognition could answer the demand for granular and precise information about the objects people see in retail and online shopping. Ultimately, it could both appeal to the way that consumers shop online today, across various sites and social media, and play a transformative role in the development of e-commerce and marketing. That’s certainly how Facebook saw it.
The Facebook system, now known as GrokNet, builds off the GrokStyle technology and automatically suggests attributes such as colors and materials when sellers upload photos of products for sale, according to the social networking giant. Powering Facebook Marketplace today using a combination of deep learning and a vast database of images, it identifies items by predicting color, style and material attributes, and by matching uploaded photos to clean catalog images.
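The matching step described above, comparing an uploaded photo against clean catalog images, can be illustrated with a minimal sketch. This is not Facebook's implementation: it assumes each image has already been turned into a numeric embedding vector by some deep learning model, and it simply picks the catalog entry whose embedding is most similar. The catalog IDs, vectors, attribute values, and function names are all invented for illustration. As a further simplification, the sketch copies attributes from the matched catalog entry instead of predicting them with a model.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

def match_to_catalog(query_embedding, catalog):
    """Return (product_id, score) for the catalog entry most similar to the query."""
    best_id, best_score = None, -1.0
    for product_id, entry in catalog.items():
        score = cosine_similarity(query_embedding, entry["embedding"])
        if score > best_score:
            best_id, best_score = product_id, score
    return best_id, best_score

# Hypothetical catalog: embeddings and attributes are placeholders, not real data.
CATALOG = {
    "catalog-chair": {
        "embedding": [0.9, 0.1, 0.0],
        "attributes": {"color": "black", "material": "leather"},
    },
    "catalog-table": {
        "embedding": [0.1, 0.9, 0.2],
        "attributes": {"color": "red", "material": "plastic"},
    },
}

# Hypothetical embedding of a seller's uploaded photo.
query = [0.8, 0.2, 0.1]
product_id, score = match_to_catalog(query, CATALOG)

# Suggest the matched entry's attributes to the seller.
suggested = CATALOG[product_id]["attributes"]
print(product_id, suggested)
```

A linear scan like this would not cope with billions of images; a system at that scale would presumably rely on some form of approximate nearest-neighbor index, though the article does not say which approach Facebook uses.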
The result is a system that is twice as accurate as Facebook’s prior systems at recognizing products, the company said, and that improved coverage for its Home and Garden category from 33% to 90%. Trained on a diverse data set, it even works for items that may look different depending on what part of the world you’re in.
“Facebook, as you can imagine, has an immense amount of visual data because people are uploading images, whether it’s Instagram or regular Facebook posts, or Facebook Marketplace […] they wanted technology that would be able to do recognition in the context of this immense quantity of data that they have,” Bala explained.
Prior to GrokStyle’s acquisition by Facebook for Marketplace, Bala had long eyed the potential of computer vision in transforming the shopping experience online.
“There are people who pose questions like, ‘Gee, I see a picture of somebody carrying a handbag, I wonder which handbag it is?’ Or they’re on a home remodelling site, and would like to see what that countertop is,” said Bala; “We realized that there was a real need for a solution to this problem.”
Fine-grained recognition technology can provide the on-demand product information for consumers: “If you had fine-grained recognition on your phone or on your laptop, then you effectively get expert-level knowledge at your fingertips in understanding images.”
Building on their ongoing research into computer vision’s applications in a retail setting, and the warm reception of their paper published in 2015, Bala’s team started GrokStyle a year later and landed Swedish furniture maker IKEA – a brand known for pushing the envelope with technology and user experience – as one of their earliest partners.
Bala told TechHQ that one of the challenges IKEA shoppers face is difficulty in visualizing how a specific piece of furniture would look in their homes: “As a human being, you could imagine if you walk into a room, you can very rapidly recognize if you’re seeing a table or a chair,” said Bala. “But unless you’re a furniture expert, you will not be able to recognize the exact brand of furniture or the type of furniture it is, particularly for exotic categories.”
When browsing furniture online, customers also face a challenge in envisioning space and understanding how a piece of furniture might occupy a room in real life.
Recognizing the struggle shoppers face, IKEA launched an augmented reality (AR) app where people can visualize furniture in place. Wrapping in GrokStyle’s image recognition meant the app could also recognize items in a room and suggest complementary items from the furniture retailer’s catalogue.
“Augmented reality is a particularly compelling experience in the context of furniture, shopping, and interior design. And so rolling out with IKEA made a lot of sense for us, and indeed, they felt that we did for them too,” said Bala.
The research, development and subsequent sale of GrokStyle demonstrate the appetite in industry for technology that can enhance user experience and ultimately drive engagement in new or augmented applications. Optimizing the buying and selling experience of Facebook Marketplace is just one example of computer vision’s potential, and there’s a long way to go yet.
The successful journey from academic research to acquisition by one of the world’s biggest tech companies represents a major feat in itself, and it’s not all that common: “This is a very interesting journey, actually, academic work when you start off, you solve the problem, but you don’t look at all of the complexity of the mess of the real world,” Bala told us.
“To go from there all the way to a product that actually gets used at the Marketplace scale with the billions of images that are out there is a journey. It requires a lot of innovation, a lot of dealing with and wrangling with things that you don’t quite expect, such as privacy claims on different data or exotic items that you might not solve doing the bulk of an academic paper.
“Going all the way from an academic solution to something that you deploy at scale is a very exciting journey – and yeah, I think that’s pretty rare to see.”
24 March 2023