My neighbor has the most beautiful garden ever.
Season after season, she grows the most exotic, gorgeous plants that I could never find in any local nursery. Slightly green with envy over her green thumb, I discovered a glimmer of hope.
There are apps that will identify any plant you take a photo of. Problem solved. Now the rest of the neighborhood is getting prettied up as several houses, including mine, have sprouted exotic new blooms easily ordered online.
Take a photo, get an answer. The most basic form of visual search.
Visual search addresses both convenience and curiosity. If we want to learn more about what we’re looking at, we can simply upload a photo instead of trying to come up with words to describe it.
This isn’t new. Google Visual Search was demoed back in 2009. CamFind rolled out its visual search app in 2013, following similar technology that powered Google Glass.
What’s new is that a storm of visual-centric technologies is coming together, pointing to a future of search that makes the keyword less…key.
Artificial intelligence and machine learning are the critical new components in the visual game. Let’s focus on what this means and how it’s going to impact your marketing game.
How many kinds of reality do we actually need?
The first thing we think about with the future of visual is virtual reality or augmented reality.
There’s also a third one: mixed reality. So what’s the difference between them and how many kinds of reality can we handle?
Virtual reality (VR) is full immersion in another universe – when you have the VR headset on, you cannot see your actual reality. Virtual reality is a closed environment, meaning that you can only experience what’s been programmed into it. Oculus Rift is an example of virtual reality.
Augmented reality (AR) uses your real environment, but enhances it with the addition of a computer-generated element, like sound or graphics. Pokémon Go is a great example of this, where you still see the world around you but the Pokémon-related graphics – as well as sounds – are added to what you see.
Mixed reality (MR) is an offshoot of augmented reality, with the added element of augmented virtuality. It merges your virtual world with your real world and lets you interact with both through gestures and voice commands. HoloLens from Microsoft (my employer) is an example of mixed reality – this headset can be programmed to layer any kind of interactive environment over your reality.
The difference is a big fat deal – because an open environment, like HoloLens, becomes a fantastic tool for marketers and consumers.
Let me show you what I mean.
Pretty cool, right? Just think of the commercial implications.
Virtual and augmented reality will reshape retail. This is because they solve a problem – for the consumer.
Online shopping has become a driving force, and we already know what its limitations are: not being able to try clothing on, feel the fabric on the couch or get a sense of the heft of a stool. All of these are obstacles to the online shopper.
According to the Harvard Business Review, augmented reality will eliminate pain points that are specific to every kind of retail shopping – not just finding the right size, but also envisioning how big a two-man tent actually is. With augmented reality, you can climb inside it!
If you have any doubt that augmented reality is coming, and coming fast, look no further than Pokémon Go’s recent conquest. We couldn’t get enough.
Some projections put investment in AR technology at close to $30 billion by 2020 – that’s in the next three years. HoloLens is already showing early signs of being a game-changer for advertisers.
For example, if I’m shopping for a kitchen stool, I can not only look at it on the website but also see what it would look like in my home:
It’s all about being able to get a better feel for how things will look.
Fashion is one industry that has tried to find ways to solve for this and is increasingly embracing augmented reality.
Rebecca Minkoff debuted the use of augmented reality in her New York Fashion Week show this September. Women could use the AR app Zeekit – live during the show – to see how the clothes would look on their own bodies.
Why did they do this? To fix a very real problem in retail.
According to Uri Minkoff, who is a partner in his sister’s clothing company, 20 to 40 percent of purchases in retail get returned – that’s the industry standard.
If a virtual try-on can eliminate the hassle of the wrong fit, the wrong size, the wrong everything, then they will have solved a business problem while also making their customers super happy.
This trend caught on and at London Fashion Week a few weeks later there were a host of other designers following suit.
Let’s get real about reality
Let’s bring our leap into the visual back down to earth just a bit – because very few of us will be augmenting our reality today.
What’s keeping AR and VR from taking over the world just yet is slow market penetration. Both are relatively expensive and require entirely new hardware.
On the other hand, something like voice search – another aspect of multi-sensory search – is becoming widely adopted because it relies on a piece of hardware most of us already carry with us at all times: our mobile phone.
The future of visual intelligence relies on tying it to a platform that is already commonly used.
Imagine this. You’re reading a magazine and you like something a model is wearing.
Your phone is never more than three feet from you, so you pick it up, snap a photo of the dress, and the artificial intelligence (AI) – via your digital personal assistant – uses image search to find out where to buy it, no keywords necessary at all.
Take a look at how it could work:
Talk about a multi-sensory search experience, right?
Voice search and conversation as a platform are combined with image search to transact right within the existing platform of your digital personal assistant – which is already used by 66% of 18- to 26-year-olds and 59% of 27- to 35-year-olds, according to Forrester Research.
As digital personal assistants rise, so will the prevalence of visual intelligence.
Digital personal assistants, with their embedded artificial intelligence, are the key to the future of visual intelligence in everybody’s hands.
What’s already happening with visual intelligence?
One of the most common uses exists right within the Amazon app. Here, the app gives you the option to find a product simply by taking a photo of something or of the bar code:
The app CamFind can identify the content of pictures you’ve taken and offer links to places you could shop for it. Their website touts the fact that users can get “fast, accurate results with no typing necessary.”
For example, I took a photo of my (very dusty) mouse and it not only recognized it, but also gave me links to places I could buy it or learn more about it.
Pinterest already has a handy visual search tool for “visually similar results,” which returns results from other pins that are a mix of commerce and community posts. This is a huge benefit for retailers to take advantage of.
For example, if you were looking for pumpkin soup recipe ideas and came across a kitchen towel you liked within the Pin, you could select the part of the image you wanted to find visually similar results for.
Google’s purchase of Moodstocks is also very interesting to watch. Moodstocks is a startup that has developed machine learning technology to boost image recognition for the cameras on smartphones.
For example, you see something you like. Maybe it’s a pair of shoes a stranger is wearing on the subway, and you take a picture of it. The image recognition software identifies the make and model of the shoe, tells you where you can buy it and how much it costs.
Microsoft has developed an app that describes what it sees in images. It understands thousands of objects as well as the relationship between them. That last bit is key – and is the “AI” part.
Captionbot.ai was created to showcase some of the intelligence capabilities of Microsoft Cognitive Services, such as Computer Vision, Emotion API, and Natural Language. It’s all built on machine learning, which means it will get smarter over time.
You know what else is going to make it smarter over time? It’s integrated into Skype now. This gives it a huge practice field – exactly what all machine learning technology craves.
As I said at the start, something as simple as plant identification is leading us directly to a future in which visual search puts your product into the hands of consumers who are dying to buy it.
What should I do?
Let’s make our marketing more visual.
We saw the signs with rich SERP results – we went from text only to images, videos and more. We’re seeing pictures everywhere in a land that used to be limited to plain text.
Images are the most important deciding factor when making a purchase, according to research by Pixel Road Designs. They also found that consumers are 80% more willing to engage with content that includes relevant images. Think about your own purchase behavior – we all do this.
This is also why all the virtual reality shenanigans are going to take root.
Up the visual appeal
Without the keyword, the image is now the star of the show. It’s almost as if the understudy suddenly got thrust into the spotlight. Are they ready? Will they succeed?
To get ready for keywordless searches, start by reviewing the images on your site. The goal here is to ensure they’re fully optimized and still recognizable without the surrounding text.
First and foremost, we want to look at the quality of the image and answer yes to as many of the following questions as possible:
- Does it clearly showcase the product?
- Is it high-resolution?
- Is the lighting natural with no distortive filters applied?
- Is it easily recognizable as being that product?
Next, we want to tell the search engines as much about the image as we can, so they can best understand it. For the same reasons that SEOs can benefit by using Schema mark-up, we want to ensure the images tell as much of a story as they can.
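One way to spot images that tell search engines too little is to scan a page’s HTML for image tags with no alt text or with camera-default file names. The following is only a minimal sketch using Python’s standard library. The `ImageAudit` class, the `audit_images` helper, and the pattern for what counts as a “non-descriptive” name are all illustrative assumptions, not an established rule.

```python
import posixpath
import re
from html.parser import HTMLParser
from urllib.parse import urlparse

# Assumption: stems such as "IMG_1234", "DSC0001", "image" or "photo-2"
# say nothing about the product and count as non-descriptive.
GENERIC_NAME = re.compile(
    r"^(img|dsc|image|photo|untitled)?[_\-\s]*\d*$", re.IGNORECASE
)


class ImageAudit(HTMLParser):
    """Collects (src, problem) pairs for every <img> tag it sees."""

    def __init__(self):
        super().__init__()
        self.issues = []

    def handle_starttag(self, tag, attrs):
        if tag != "img":
            return
        attributes = dict(attrs)
        src = attributes.get("src") or ""
        alt = (attributes.get("alt") or "").strip()
        # File name without directory or extension, e.g. "oak-kitchen-stool".
        stem = posixpath.splitext(posixpath.basename(urlparse(src).path))[0]
        # Note: purely decorative images may legitimately have empty alt text,
        # so treat this as a prompt for review rather than a hard error.
        if not alt:
            self.issues.append((src, "missing alt text"))
        if GENERIC_NAME.match(stem):
            self.issues.append((src, "non-descriptive file name"))


def audit_images(html):
    """Return a list of (src, problem) tuples found in an HTML string."""
    parser = ImageAudit()
    parser.feed(html)
    return parser.issues
```

Feeding it the HTML of a product page gives you a quick to-do list. An image saved as `IMG_1234.jpg` with no alt text would be flagged twice, while `oak-kitchen-stool.jpg` with a descriptive alt attribute would pass.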
The wonderfully brilliant Ronell Smith touched upon this subject in his recent Moz post, and the Yoast blog offers some in-depth image SEO tips as well. To summarize a few of their key points:
- Make sure file names are descriptive
- Provide all the information: titles, captions, alt attributes, descriptions
- Create an image XML sitemap
- Optimize file size for loading speed
Fairly simple to do, right? This primes us for the next step.
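As a rough illustration of the image XML sitemap item, here is a minimal sketch that builds one with Python’s standard library. It uses the standard sitemap namespace plus Google’s image sitemap extension. The `build_image_sitemap` function and the example.com URLs are placeholders invented for this example, so swap in your own pages and images.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
IMAGE_NS = "http://www.google.com/schemas/sitemap-image/1.1"


def build_image_sitemap(pages):
    """Build sitemap XML from a dict of page URL -> list of image URLs."""
    # Register prefixes so the output uses xmlns and xmlns:image.
    ET.register_namespace("", SITEMAP_NS)
    ET.register_namespace("image", IMAGE_NS)

    urlset = ET.Element(f"{{{SITEMAP_NS}}}urlset")
    for page_url, image_urls in pages.items():
        url_el = ET.SubElement(urlset, f"{{{SITEMAP_NS}}}url")
        ET.SubElement(url_el, f"{{{SITEMAP_NS}}}loc").text = page_url
        # One <image:image> entry per image that appears on the page.
        for image_url in image_urls:
            image_el = ET.SubElement(url_el, f"{{{IMAGE_NS}}}image")
            ET.SubElement(image_el, f"{{{IMAGE_NS}}}loc").text = image_url
    return ET.tostring(urlset, encoding="unicode")


if __name__ == "__main__":
    # Hypothetical page and image URLs, for illustration only.
    print(build_image_sitemap({
        "https://example.com/kitchen-stools": [
            "https://example.com/img/oak-kitchen-stool.jpg",
            "https://example.com/img/oak-kitchen-stool-side-view.jpg",
        ],
    }))
```

Note that the descriptive file names do double duty here: the same URLs that end up in the sitemap also tell search engines what the picture shows.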
Take action now by taking advantage of existing technology:
On Pinterest, optimize your product images for clean matches from lifestyle photos. You can reverse-engineer searches to your products via the “visually similar results” tool by posting pins of lifestyle shots (always more compelling than a white background product shot) that feature your products, in various relevant categories.
In August, Pinterest added video to its visual search machine learning functionality. This tool is still working out the kinks, but keep your eye on it so you can create relevant content with a commerce view.
For example, a crafting video about jewelry might be tagged with places to buy the tools and materials in it.
Integrate Slyce’s astounding tool, which gives your customer’s camera a “buy” button. Using image recognition technology, the Slyce tool activates visual product recognition.
Does it work? There are certainly several compelling case studies from the likes of Urban Outfitters and Neiman Marcus on their site.
Snap your way to your customer, using Snapchat’s soon-to-come object recognition ad platform. This lets you deliver an ad to a Snapchatter by recognizing objects in the pictures they’ve just taken.
The Verge shared images from the patent Snapchat had applied for, such as:
For example, someone who snaps a pic of a woman in a cocktail dress could get an ad for cocktail dresses. Mind-blowing.
The Blippar app is practically a two-for-one in the world of visual intelligence, offering both AR as well as visual discovery options.
They’ve helped brands pave the way to AR by turning their static content into interactive AR content. A past example is Domino’s Pizza in the UK, which allowed users of the Blippar app to interact with its static posters and take actions such as downloading deals for their local store.
Now the company has expanded into visual discovery. When a user “Blipps” an item, the app will show a series of interrelated bubbles, each related to the original item. For example, “Blipping” a can of soda could result in information about the manufacturer, latest news, offers, and more.
Empowerment via inclusivity
Just in case you imagine all the developments are here to serve commerce, I wanted to share two examples of how visual intelligence can help with accessibility for the visually impaired.
From the creators of CamFind, TapTapSee is an app specifically designed for the blind and visually impaired.
It recognizes the objects photographed and identifies them out loud for the user. All the user needs to do to take a photo is double-tap the device’s screen.
The Seeing AI
Created by a Microsoft engineer, the Seeing AI project combines artificial intelligence and image recognition with a pair of smart glasses to help a visually impaired person better understand who and what is around them.
Take a look at them in action:
While wearing the glasses, the user simply swipes the touch panel on the eyewear to take a photo. The AI will then interpret the scene and describe it back out loud, using natural language.
It can describe what people are doing, how old they are, what emotion they’re expressing, and it can even read out text (such as a restaurant menu or newspaper) to the user.
Innovations like this are what make search even more inclusive.
Keep Calm and Visualize On
We are visual creatures. We eat first with our eyes, we love with our eyes, we become curious with our eyes.
The camera as the new search box is brilliant. It removes obstacles to search and helps us get answers in a more intuitive way. Our technology is adapting to us, to our very human drive to see everything.
And that is why the future of search is visual.