While it’s helpful to think of AI in terms of doing things differently within your existing business processes, it’s also true that AI can help you explore and benefit from entirely new approaches and redefined processes. Gaining efficiency is of course a big, obvious win, but today’s new breed of intelligent algorithms enable us to engage customers and prospects in new ways, create entirely new experiences, and even create new business opportunities we’ve never considered before — if you can successfully get your company behind your AI-powered vision of the future.
When we were getting started with AI in the realm of marketing automation, we found it helpful to think in terms of a framework for building an “intelligent machine”. The basic elements of this framework include:
An understanding of your customer journey, experience, and lifecycle.
Creatives or content designed to impact the customer journey at different stages in their lifecycle — the more variety, the better. Be sure to map multiple creatives, offers, etc. to different stages of the lifecycle for optimal results.
Data for understanding the customer journey and breaking behavioral cohorts into targeting segments
API-based access to your preferred marketing channels — Facebook, Instagram, Google, Snapchat, your email service provider, etc. to allow for campaign creation and orchestration
A working feedback loop for reporting and ongoing optimization, likely running through a third-party attribution provider or coming directly from the platforms themselves
An approach to optimization, which can be turned into an algorithm or algorithms used to drive performance based on the feedback your system is getting.
This framework could be universally applied to any company considering building their own AI-powered intelligent machine to fully automate their customer acquisition efforts. It can also be used to help evaluate vendors to ensure they offer a comprehensive approach to autonomous marketing. Now, let’s get a bit smarter about the types of artificial intelligence you can explore and apply to your specific business case or needs.
It also helps to remember that — for the most part — all of these functions are simply core marketing activities that are enabled in today’s digitized world. This is about doing these things better, faster, smarter and sometimes in entirely new ways.
Amazon.com founder and CEO Jeff Bezos often gets asked to predict what the future will be like in ten years. While he’s happy to indulge audiences with his thoughts and readily admits he really can’t predict the future, Bezos often thinks about the things that won’t change in the next ten years, and how that can impact business decisions. That way, “you can work on those things with the confidence to know that all the energy you put into them today is still going to be paying you dividends 10 years from now,” he said.
That’s good news when it comes to applying AI and machine learning to marketing. These business activities, functions and business processes aren’t going away anytime soon, but the days are numbered for the old ways of dealing with this familiar workload. Think about it this way: the work you do today to make your intelligent machine a reality will still be paying off dividends in ten years.
While the study of machine learning and artificial intelligence is quite broad, there are a handful of approaches to developing effective “learning” algorithms. Consider this a primer to get you started with a good understanding of current approaches and how they apply to your AI-based marketing efforts:
Supervised Learning algorithms use properly labeled data sets to train an algorithm to make predictions. They’re great for classification, or labeling, new data through a mapping function from input variables to discrete output variables. A classic use of supervised learning we can all relate to is classifying emails as “spam” or “not spam” based on input variables that lead to one of those two discrete outputs. A second major use of supervised learning is regression analysis, in which the training data is used to map input variables to a continuous output variable (a real number, often a quantity like an amount or a size) within error boundaries that indicate the accuracy of the prediction.
Unsupervised Learning algorithms take unlabeled sets of data with no known outcomes or results and are useful for discovering the underlying structure of the data. It’s typically used for clustering data within sets, detecting anomalies (like fraudulent transactions), mining for associations (understanding what types of goods are on a typical retail receipt to better merchandise products on retail shelves), or reducing the number of features in a given data set or breaking out a data set into multiple, smaller sets for further analysis.
Semi-Supervised Learning algorithms use a combination of labeled and unlabeled data for training. Generally speaking, they rely on a smaller labeled dataset that includes outcome information, used in conjunction with a much larger unlabeled dataset. This approach is useful when you don’t have enough labeled data to produce an accurate model on its own, since it effectively increases the size of your training data.
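To make the idea concrete, here’s a minimal, hypothetical sketch of one semi-supervised technique, self-training: the unlabeled points close enough to an already-labeled point inherit that point’s label, and the process repeats with the grown labeled set. The data points, labels, and distance threshold are all invented for illustration.

```python
import math

def nearest_label(point, labeled):
    """Label of the closest labeled point (1-nearest-neighbor)."""
    return min(labeled, key=lambda item: math.dist(point, item[0]))[1]

def self_train(labeled, unlabeled, threshold):
    """Self-training: pseudo-label any unlabeled point whose nearest
    labeled neighbor lies within `threshold`, then repeat with the
    grown labeled set until nothing more can be labeled."""
    labeled = list(labeled)
    pool = list(unlabeled)
    changed = True
    while changed and pool:
        changed, remaining = False, []
        for point in pool:
            if min(math.dist(point, p) for p, _ in labeled) <= threshold:
                labeled.append((point, nearest_label(point, labeled)))
                changed = True
            else:
                remaining.append(point)
        pool = remaining
    return labeled, pool
```

Points too far from anything labeled stay in the pool rather than receiving a low-confidence guess, which is exactly the trade-off this family of algorithms manages.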
Reinforcement Learning is one of the newest approaches to machine learning. As the name indicates, a reinforcement algorithm learns by trial and error to achieve its objective. The machine tries out lots of different things and gets rewarded or penalized based on the outcomes of its behaviors and how well they help or hinder it from reaching its objective. Google’s AlphaGo used reinforcement learning algorithms to beat the best players in the world in the complex strategy game, “Go.”
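A marketing-flavored illustration of reinforcement learning is the multi-armed bandit. The sketch below, with made-up creatives and click-through rates, uses a simple epsilon-greedy policy: mostly show the creative with the best observed CTR, occasionally explore the others. (AlphaGo’s algorithms are vastly more sophisticated; this only demonstrates the trial-and-error, reward-driven principle.)

```python
import random

def epsilon_greedy(creatives, true_ctr, rounds=5000, epsilon=0.1, seed=42):
    """Epsilon-greedy bandit: explore a random creative with probability
    epsilon, otherwise exploit the creative with the best observed CTR.
    `true_ctr` simulates the (unknown) click-through rate per creative."""
    rng = random.Random(seed)
    shows = {c: 0 for c in creatives}
    clicks = {c: 0 for c in creatives}
    for _ in range(rounds):
        if rng.random() < epsilon:
            choice = rng.choice(creatives)  # explore
        else:  # exploit the current best estimate
            choice = max(creatives,
                         key=lambda c: clicks[c] / shows[c] if shows[c] else 0.0)
        shows[choice] += 1
        if rng.random() < true_ctr[choice]:  # simulated reward (a click)
            clicks[choice] += 1
    return shows, clicks
```

Run against three hypothetical creatives with true CTRs of 2%, 5%, and 11%, the policy quickly concentrates impressions on the strongest creative while still spending a small exploration budget on the rest.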
Deep Learning architectures represent another, perhaps even more revolutionary, approach to AI-based application development, a field of data science that has accelerated in recent years. They’re based on “artificial neural networks” or ANNs and support a variety of learning approaches, including supervised, unsupervised and semi-supervised. “Deep” refers to the number of layers through which data is transformed; in certain models the number of transformations is fixed, while in others, known as recurrent neural networks, the number of transformations is potentially unbounded. These architectures are among the most exciting in data science today and have been applied successfully to drug design, bioinformatics, social network filtering, speech recognition, computer vision, natural language processing, machine translation, audio recognition, material inspection, medical image analysis, and board game programs. In many cases, they’ve produced results comparable to or even superior to human subject-matter experts.
As you can see in Figure 4-1 below, many of the different approaches to machine learning described above can be applied to different applications related to marketing automation. Each application represents an order of magnitude leap from conventional, manual approaches of days gone by thanks to the availability and sheer volume of data that we can now feed into these algorithms.
The “x” factor is deep learning, which cuts across the four major approaches to machine learning. This approach can be incorporated where the benefits of this more computationally intensive method outweigh the more lightweight methods of classical machine learning approaches.
As you can see in our handy chart, supervised learning algorithms can be useful in a marketing context for predicting things like ad prices in an auction, along with associated predicted yield. There are several approaches to these types of supervised learning algorithms and plenty more available online if you want to dive deeper into the math. Here’s a primer on the four major types:
Regression analysis shows the relationship between inputs and outputs in a given system. Linear regression is one of the most common types of regression analysis; in its simplest form, it uses a linear relationship to predict the value of Y for a given value of X using a straight (regression) line. It can allow us to see what factors in our marketing efforts relate to others. Exploring these relationships can help us with testing the actual cause or “causality” of certain outcomes -- like what factors indicate or predict a higher probability of a click or a conversion.
One important note: linear regression requires some careful tuning to get reliable outcomes, so be careful when applying it to your modeling. You’ll need to find a fairly strong correlation between X and Y to have confidence in this approach. If the data doesn’t form a natural line, there’s no point trying to fit a line to it and make predictions.
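As a sanity check you can run yourself, here’s a from-scratch sketch of both steps: computing the Pearson correlation between X and Y, then fitting an ordinary least-squares line. The ad-spend and conversion numbers are invented for illustration.

```python
def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5

def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
             / sum((x - mx) ** 2 for x in xs))
    return slope, my - slope * mx
```

The workflow the text describes is: check `pearson_r` first, and only trust the fitted line (and its predictions) if the correlation is strong.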
Logistic regression is used primarily for classification: labeling or sorting data sets. Although similar to linear regression, logistic regression is aimed at understanding the relationship between a dependent variable and one or more independent variables when the dependent variable is binary, or “dichotomous”.
Apart from that similarity, logistic regression differs from linear regression in that it doesn’t use ordinary least squares to fit a line of best fit for predicting the dependent variable from the independent variables. In logistic regression, the value of y takes on 0 or 1 rather than being distributed along a line of best fit as in linear regression.
Logistic regression is an important market research tool that can be used to predict a customer’s response to a product based on certain factors. For instance, it can be used to predict whether a customer will purchase a product if, say, we know their health status.
Please note that logistic regression may require a minimum number of observations per independent variable before the results are reliable. Many practitioners find interpreting logistic regressions time-consuming and somewhat confusing, so it can help to use statistical analysis tools like Intellectus, which runs the analysis and interprets the results in plain English.
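If you’d like to peek under the hood without a statistics package, here’s a minimal logistic regression fit by gradient descent on a single invented feature (say, sessions in a user’s first week) predicting a binary purchase outcome:

```python
import math

def sigmoid(z):
    """Squash a real number into a probability between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-z))

def fit_logistic(xs, ys, lr=0.1, epochs=2000):
    """Fit P(y=1|x) = sigmoid(w*x + b) by gradient descent on log loss.
    xs: single-feature values, ys: 0/1 labels."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        gw = gb = 0.0
        for x, y in zip(xs, ys):
            err = sigmoid(w * x + b) - y  # prediction error for this point
            gw += err * x
            gb += err
        w -= lr * gw / n
        b -= lr * gb / n
    return w, b

def predict(w, b, x):
    """Return 1 if the predicted purchase probability exceeds 0.5."""
    return 1 if sigmoid(w * x + b) > 0.5 else 0
```

Unlike a fitted line, the output here is a probability, which is what makes logistic regression the natural fit for yes/no outcomes like click or purchase.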
Often regarded as a non-parametric technique, k-Nearest Neighbors (kNN) is a simple algorithm that stores all available cases and predicts a numerical target or class for a new case based on a similarity measure (e.g., distance). In other words, it predicts the outcome for a new data point based on how similar cases behaved in the past. For instance, you can estimate which product a customer is likely to buy by looking at what their nearest neighbors bought.
Growth teams often use this method to determine the most effective customer acquisition strategy and to anticipate how customers are likely to respond when approached with a new product, simply by considering what their nearest neighbors purchased. It also creates a chance to compare the behavior of existing and new customers under study.
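A bare-bones version of kNN fits in a few lines. The customer features (age, monthly spend) and product names below are invented for illustration:

```python
import math
from collections import Counter

def knn_predict(query, neighbors, k=3):
    """Predict which product a customer is likely to buy by majority
    vote among the k nearest customers in feature space.
    neighbors: list of (feature_vector, product) pairs."""
    ranked = sorted(neighbors, key=lambda nb: math.dist(query, nb[0]))
    votes = Counter(product for _, product in ranked[:k])
    return votes.most_common(1)[0][0]
```

In practice you would scale the features first (age and dollars live on very different ranges), but the core idea really is this simple: similar customers, similar purchases.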
Most growth marketing processes and challenges require making the right predictions about a future state or system, and this is where the Support Vector Machine (SVM) comes into play. An SVM is an algorithm that finds a hyperplane in an N-dimensional space (where N is the number of features) that distinctly classifies the data points. SVMs are widely useful in predictive settings such as data mining, intelligent software agents, mass-produced models, and automated modeling.
There’s only one major variety of unsupervised learning algorithm that we’re going to address here: k-Means.
Simply put, this type of unsupervised learning algorithm is useful whenever you need to divide n observations into k clusters. Each observation is assigned to the cluster with the nearest mean, which serves as the cluster’s prototype, partitioning the data space into Voronoi cells. To cluster is to divide data points into a number of groups such that points in the same group are similar to one another and different from points outside the group. The goal of the algorithm is to locate groups in the data, with the number of groups represented by the variable k; it assigns each data point to one of the k groups based on the features provided.
The k-means algorithm has been successfully applied to document classification, delivery route optimization for stores, identifying crime localities, customer segmentation, statistical analysis of fantasy leagues, insurance fraud detection, rideshare data analysis, cyber-profiling criminals, call record analysis, and automatic clustering of IT alerts.
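Here’s a compact sketch of the classic Lloyd’s algorithm for k-means, using made-up two-dimensional customer data (say, sessions per week versus spend). Initial centroids are passed in for determinism; real implementations usually seed them randomly (e.g., with k-means++).

```python
import math

def kmeans(points, centroids, iterations=20):
    """Lloyd's algorithm: repeatedly assign each point to its nearest
    centroid, then move each centroid to the mean of its cluster."""
    clusters = [[] for _ in centroids]
    for _ in range(iterations):
        # assignment step: nearest centroid wins the point
        clusters = [[] for _ in centroids]
        for p in points:
            idx = min(range(len(centroids)),
                      key=lambda i: math.dist(p, centroids[i]))
            clusters[idx].append(p)
        # update step: each centroid moves to its cluster's mean
        centroids = [
            tuple(sum(dim) / len(cluster) for dim in zip(*cluster))
            if cluster else centroids[i]
            for i, cluster in enumerate(clusters)
        ]
    return centroids, clusters
```

With two obvious groups in the toy data below, the algorithm converges in a couple of iterations, and the two returned clusters line up with the two customer segments.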
Just as the name implies, a decision tree is a support tool that uses a tree-like model to reach decisions and determine their possible consequences, such as outcomes, utility, and resource costs. It serves as one of the most effective ways of displaying algorithms built from conditional control statements. A decision tree shares features with a flowchart: each internal (non-leaf) node represents a test on a parameter, branches represent the test outcomes, and leaf (terminal) nodes serve as class labels. These trees are an integral part of decision analysis in operations research and play an important role in identifying the strategy most likely to reach a goal.
There is no denying that businesses have to deal with lots of data obtained from market, competitive, and customer analysis. Processing that data to reach the right conclusion can take longer than expected, causing delays. Since identifying and solving business problems can be time-consuming for executives and managers, decision trees give them a simple, abbreviated view of the predicted outcome along each branch. A decision tree is a visual representation of the decision-making process and can be used to simplify problems as different as credit card attrition and currency exchange rates.
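The heart of decision tree learning is choosing the split that best purifies the branches. Here’s a minimal sketch of that single step, a “decision stump”, which scores candidate thresholds on one feature by weighted Gini impurity. The feature (days since install) and labels are invented.

```python
def gini(labels):
    """Gini impurity of a list of class labels (0.0 = perfectly pure)."""
    n = len(labels)
    if n == 0:
        return 0.0
    counts = {}
    for lab in labels:
        counts[lab] = counts.get(lab, 0) + 1
    return 1.0 - sum((c / n) ** 2 for c in counts.values())

def best_split(values, labels):
    """Find the threshold on a single feature that minimizes the
    weighted Gini impurity of the two resulting branches."""
    best_t, best_score = None, float("inf")
    n = len(labels)
    for t in sorted(set(values)):
        left = [lab for v, lab in zip(values, labels) if v <= t]
        right = [lab for v, lab in zip(values, labels) if v > t]
        score = len(left) / n * gini(left) + len(right) / n * gini(right)
        if score < best_score:
            best_t, best_score = t, score
    return best_t, best_score
```

A full tree simply applies this search recursively, on every feature, within each branch; a weighted-impurity score of 0.0 means the split separates the classes perfectly.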
Naïve Bayes can be described in a number of ways depending on the area of use. In machine learning, it is a family of simple probabilistic classifiers based on applying Bayes’ theorem with naïve (strong) independence assumptions between the features. It is one of the most popular methods of text categorization and has been in use since the 1960s. Text categorization is the problem of judging whether a document belongs to one category or another (for instance, legitimate or spam, politics or sports, old or new). Naïve Bayes typically uses word frequencies as its features.
In a learning problem, Naïve Bayes classifiers require a number of parameters linear in the number of variables (predictors or features). Naïve Bayes classifiers can also be used to classify customers based on parameters such as age, gender, nationality, and occupation, giving the company detailed information about its consumers.
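Here’s a toy word-frequency Naïve Bayes classifier with add-one (Laplace) smoothing, applied to the spam/not-spam example from above. The tiny training corpus is invented:

```python
import math
from collections import Counter

def train_nb(docs):
    """docs: list of (word_list, label). Returns per-label word counts,
    label counts, and the vocabulary."""
    word_counts, label_counts, vocab = {}, Counter(), set()
    for words, label in docs:
        label_counts[label] += 1
        wc = word_counts.setdefault(label, Counter())
        for w in words:
            wc[w] += 1
            vocab.add(w)
    return word_counts, label_counts, vocab

def classify_nb(words, word_counts, label_counts, vocab):
    """Pick the label maximizing log P(label) + sum of log P(word|label),
    with add-one (Laplace) smoothing for unseen words."""
    total_docs = sum(label_counts.values())
    best, best_lp = None, float("-inf")
    for label, lcount in label_counts.items():
        lp = math.log(lcount / total_docs)          # prior
        total_words = sum(word_counts[label].values())
        for w in words:                              # likelihood per word
            lp += math.log((word_counts[label][w] + 1)
                           / (total_words + len(vocab)))
        if lp > best_lp:
            best, best_lp = label, lp
    return best
```

Despite the “naïve” independence assumption between words, this simple counting scheme is surprisingly effective, which is why it has endured as a text-classification baseline.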
Using several decision trees to reach a final prediction can work better than using a single decision tree. A single tree cannot make a forest, and that’s why Random Forest (the random decision forest algorithm) is a preferred method for regression, classification and other tasks: it constructs many decision trees at training time and outputs the mode of the classes (for classification) or the mean prediction (for regression) of the individual trees.
Random forests are often used by companies to make machine learning predictions. They use multiple decision trees to produce a more holistic analysis of a given data set. The random forest builds on the decision tree model while making the overall analysis more robust; researchers have described the underlying approach in terms of “stochastic discrimination.”
The random forest algorithm can be used to test the quality of a customer growth strategy or a product; for example, companies have used it to grade wine quality from parameters like alcohol content, sulphur dioxide level, pH, acidity and sugar content. It can also take various product properties and variables to indicate the interests of customers and how best to meet their needs.
In most cases, the random forest algorithm considers a random subset of features for each tree and then averages, or takes a vote across, the trees’ predictions to produce the final result.
Please note: before putting a random forest model into production, it is advisable to hold out part of your data for validation, training the model on the remaining training set and checking its predictions against the held-out data.
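To show bagging and random feature selection without a real library, here’s a deliberately simplified sketch: each “tree” is a one-split stump trained on a bootstrap sample using one randomly chosen feature, and the forest predicts by majority vote. The customer features (average session minutes, purchases) and labels are invented; a production system would use a library such as scikit-learn instead.

```python
import random
from collections import Counter

def train_stump(rows, feature):
    """Find the threshold on one feature that classifies the most rows
    correctly, labeling each side with its majority class.
    rows: list of (feature_tuple, label)."""
    best, best_correct = None, -1
    for t in sorted(set(f[feature] for f, _ in rows)):
        left = [lab for f, lab in rows if f[feature] <= t]
        right = [lab for f, lab in rows if f[feature] > t]
        left_lab = Counter(left).most_common(1)[0][0]
        right_lab = Counter(right).most_common(1)[0][0] if right else left_lab
        correct = (sum(lab == left_lab for lab in left)
                   + sum(lab == right_lab for lab in right))
        if correct > best_correct:
            best, best_correct = (feature, t, left_lab, right_lab), correct
    return best

def train_forest(rows, n_trees=51, seed=7):
    """Each 'tree' (a one-split stump, to keep this sketch short) is
    trained on a bootstrap sample with one randomly chosen feature."""
    rng = random.Random(seed)
    n_features = len(rows[0][0])
    return [train_stump([rng.choice(rows) for _ in rows],  # bootstrap
                        rng.randrange(n_features))          # random feature
            for _ in range(n_trees)]

def predict_forest(forest, features):
    """Majority vote across all stumps in the forest."""
    votes = Counter(left_lab if features[feat] <= t else right_lab
                    for feat, t, left_lab, right_lab in forest)
    return votes.most_common(1)[0][0]
```

The ensemble idea is visible even at this scale: individual stumps trained on quirky bootstrap samples can be wrong, but the vote across many of them is far more stable than any single tree.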
The most important rule to remember is that data is what powers algorithms -- it’s the fuel that fires up the AI machine. So what happens when the data used to train machines is flawed? Many data scientists and others building next-generation AI solutions actually spend a good amount of time scrubbing the data, cleaning it up, and putting it into a format computers can actually use. If you put garbage in, then you’ll get garbage out.
The best way to get clean data is to set up the right API1 connections from all of your key data sources and pipe them into your AI intelligent machine. At IMVU, we had the following data sources connected via API to our AI intelligent machine, called Athena Prime:
Appsflyer is our mobile attribution and marketing analytics platform. This is our source of truth for measuring success in all our mobile user acquisition campaigns across partners like Google, Facebook, Snapchat, Apple Search, Instagram, Liftoff, InMobi and many more. Appsflyer, like most other attribution solutions in the space, is integrated with all the major mobile ad networks and partners, so it’s seamless to track all these campaigns in one place. We also pass back all our key CRM downstream event data like new payers, revenue and engagement. This enables Appsflyer to pass all this valuable data back to our advertising partners so they can build lookalike user segments for our systems to target on those different partner networks.
Leanplum is our marketing automation and CRM platform. We pass our cross-platform data into Leanplum from our backend data warehouse to help us create customized on-boarding, retargeting and re-engagement campaigns to help us better engage, retain and monetize our users. Our goal is to ensure all new users follow similar user journeys to replicate our best Lifetime value customers by influencing them to take the same user actions and behaviors that lead to our best customers.
All the paid user acquisition channels: Facebook, Google, Snapchat, Apple Search, Liftoff, InMobi, and many more are all connected to pass their data back to enable us to control the key optimization levers like budgets, bids and goals.
Creative assets are developed by the marketing team in-house at IMVU, based on an ongoing analysis of what’s working and what’s creating the most value within any given marketing segment. Creatives are added to each marketing channel for automated distribution.
These elements are all feeders into Athena Prime; we can turn over customer segments for targeting, channels for promotion, and creative elements to Athena to drive our desired marketing outcomes with maximum efficiency.
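As a hedged illustration of what “feeding” such a system involves, the sketch below normalizes per-channel performance records into one common schema so they can be compared and optimized side by side. The field names per channel are hypothetical, and this is not Athena Prime’s actual code; every real channel API uses its own field names and authentication.

```python
def normalize_feeds(raw_feeds):
    """Map per-channel performance records into one common schema.
    raw_feeds: {channel_name: [record_dict, ...]}."""
    # Hypothetical per-channel field mappings; real APIs differ.
    mappings = {
        "facebook": {"spend": "spend", "impressions": "impressions",
                     "installs": "mobile_app_installs"},
        "google":   {"spend": "cost", "impressions": "impr",
                     "installs": "conversions"},
        "snapchat": {"spend": "amount_spent", "impressions": "paid_impressions",
                     "installs": "app_installs"},
    }
    rows = []
    for channel, records in raw_feeds.items():
        fields = mappings[channel]
        for rec in records:
            spend = float(rec[fields["spend"]])
            installs = int(rec[fields["installs"]])
            rows.append({
                "channel": channel,
                "spend": spend,
                "impressions": int(rec[fields["impressions"]]),
                "installs": installs,
                # cost per install: a common cross-channel currency
                "cpi": spend / installs if installs else None,
            })
    return rows
```

Once every channel reports in the same shape, downstream optimization logic can treat Facebook, Google and Snapchat spend interchangeably.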
Athena Prime starts with our base business needs for any given campaign, such as marketing objective, creatives to be used within a campaign, and any additional constraints (budget, campaign dates, etc.). These parameters are fed into Athena using the user interface or through programmatic API calls, providing the instructions or marching orders for the system.
As a Software-as-a-Service (SaaS) platform, Athena Prime abstracts key components of digital media campaigns:
Business Needs (campaign configuration)
Audience Selection
Message Placement and Performance Optimization (i.e., campaign orchestration, contained in the large light blue box below)
Reporting on Business Outcomes (meeting business goals, performance insights on aggregate audiences included in a campaign, and any insights into how content or ads performed).
In a generic framework sense, you can view an intelligent marketing machine below in figure 4-2, with the classes of machine learning algorithms highlighted in red:
In terms of Audience Selection, Athena Prime uses natural language processing, neural processing and deep learning models to analyze ad copy and landing pages to extract interest-based targeting parameters, to the extent they are made available by participating channels, to improve ad relevance and performance. This feature, dubbed “Athena Sense”, does not allow for targeting on an individual level, but instead adds a contextual or interest-based component to campaign targeting without any human intervention. Athena Sense may add hundreds of additional interest-based targeting parameters to any given targeting set — something that would require an inordinate amount of work for employees or agencies to get into that level of targeting detail.
These interest-based targeting parameters are provided by platform/channel partners via programmatic APIs. These interest-based, behavioral or demographic targeting parameters are identical to those available directly within the corresponding platform’s user interface.
For example, Athena Sense may identify that a particular ad in a campaign makes reference to “Game of Thrones”. Here’s how this targeting parameter is made available in the Facebook user interface:
Instead of an agency or team member working through the UI of five or more marketing channels, Athena enhances targeting intelligently and automatically. This saves time, improves performance, and generates higher return on our marketing spend.
It’s also interesting to note that features like Athena Sense -- which extracts meaning and context from creatives and landing pages -- add valuable targeting insights that are used by the marketing platform AI on the other side of the handshake. These clues are so valuable that platforms like Facebook and Google reward advertisers that add detailed targeting enhancements to their campaigns, further boosting performance and ROI.
First party or customer relationship management (CRM) data is that which is collected by the Advertiser through its own data-gathering systems; in our case, we used Leanplum to organize all this data into targeting segments. These audiences are uploaded periodically to various marketing platforms/channels, where they are then made available to systems like Athena Prime for inclusion or exclusion in campaign targeting, but only in aggregate and in ways that prevent marketing automation providers (like Nectar9) from “reverse engineering” audiences to reveal any personally identifiable information (PII).
Custom Audiences can be created on various platforms based on a variety of methods. They are typically generated as “lookalike audiences” based on First Party or CRM data uploaded to platforms (although they can also include demographic, behavioral or interest-based parameters as well). Custom Audiences are created directly on platform/channel providers through their user interfaces or in some cases via API calls. Regardless, Custom Audiences are aggregated in such a way as to prevent marketing automation providers (like Nectar9) from “reverse engineering” audiences to reveal any personally identifiable information (PII).
This functionality includes features that take ad creatives or “content” and configure them for use on various channels; as such, it doesn’t take audience targeting into account at any granular level beyond placement availability or relevance.
The final component of Athena Prime is an optimization engine that makes campaign adjustments based on performance. Athena Prime observes results being reported programmatically across channels to shift budgets to ensure maximum efficiency. This may involve shifting budgets away from certain audience segments or increasing budgets against others. It may also shift budget between different channels based on current performance against goals. But in no case does this optimization engine deal with performance on an individual user basis.
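A drastically simplified sketch of that budget-shifting idea: allocate each channel’s share of the total budget in proportion to observed conversions per dollar, with a small floor so underperforming channels keep receiving exploratory spend. The numbers and the rule itself are illustrative, not Athena Prime’s actual algorithm.

```python
def reallocate_budget(total_budget, channel_stats, floor=0.05):
    """Shift budget toward channels converting most efficiently.
    channel_stats: {channel: {"spend": float, "conversions": int}}.
    Each channel is guaranteed `floor` of the total as exploratory spend;
    the remainder is split in proportion to conversions per dollar."""
    efficiency = {
        ch: (s["conversions"] / s["spend"]) if s["spend"] else 0.0
        for ch, s in channel_stats.items()
    }
    n = len(channel_stats)
    reserve = total_budget * floor * n           # guaranteed exploratory spend
    remaining = total_budget - reserve
    total_eff = sum(efficiency.values()) or 1.0  # avoid division by zero
    return {
        ch: total_budget * floor + remaining * (eff / total_eff)
        for ch, eff in efficiency.items()
    }
```

The floor matters: without it, a channel that starts slowly would get zero budget and never have a chance to prove itself, which is the same explore/exploit tension the bandit example illustrated earlier in the chapter.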
IMVU analyzed user journeys with our data team to explore what organic behaviors resulted in the most valuable users or purchasers. One key insight became very clear: if we can get someone to make an in-app purchase within the first seven days, this is a significant indicator of higher lifetime value. In addition, users interacting with different features and exhibiting certain behaviors within IMVU proved to be good indicators of what leads to a purchase. Our goal was to find ways to increase customer lifetime value (CLV, which we will address further in Chapter 6) and create incremental lift over organic purchases.
To put the insights from this study to work, we segmented our customers into three primary groupings:
People who installed the app, but didn’t register
People who were on a “First Seven Days” journey
People who were lapsed purchasers
Taking the insights from our user journey study, the marketing team then created a host of creatives for each segment grouped by where they appeared to be in their user journey. Here are the “winners” for us at each stage in the sequence in figure 4-3 below:
We leveraged an intelligent AI autonomous marketing platform from Nectar9 called Athena Prime to orchestrate and automate the delivery of sequenced ads on multiple channels in a synchronized way to get optimal results.
Typically, executing this type of sophisticated campaign with a complex array of audiences, channels, creatives and dynamic sequencing using manual processes is challenging to say the least. But artificial intelligence is making it possible to identify the right sequencing for different cohorts of people at different stages of their lifecycle.
The application of AI has allowed us to run full lifecycle user acquisition and revenue generating campaigns benefiting from thousands of experiments across these cohorts. The reward of taking this approach has been an incredible 3.5X improvement in CAC and ROI.
The massive scale with which we can experiment, learn and optimize messaging throughout the user journey simply isn’t possible (or worth the time and effort) without an autonomous artificial intelligence marketing engine. We can test, learn and iterate at a much faster pace to quickly identify what works and what doesn’t across creatives, audiences, messaging and more. It allows us to better target people with the right ads and messages based on where they are in the IMVU lifecycle, encouraging them to take actions that naturally lead to higher lifetime value.
Specifically, all of this orchestration and automated learning drove 46% lift compared to the control group when driving in-app purchases.
Let’s review the business processes at play and how the application of AI drives meaningful optimizations and outcomes through large scale experimentation.
Starting with our overall strategy, we set our objectives (desired outcomes), creatives, and any other campaign constraints. We then get segmentation data from our data warehouse and CRM sources, along with custom audiences we’ve developed over time. AI automates the blending of segmentation models with cross-channel message placement, automatically exploring, observing and optimizing for the right business outcomes. From there, we seek further audience or creative insights, update our approach, and the cycle goes on.
What’s happening behind the scenes as artificial intelligence orchestrates cross-channel experimentation? You can think of it like split tests of different variables across multiple digital channels -- but on steroids. You’re intelligently running rapid-fire content and audience experiments, learning in a way that uncovers new opportunities to present the right content to the right people at the right time, and taking action in an instant.
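Underneath any split test is a statistical check that an observed lift isn’t just noise. Here’s a standard two-proportion z-test you can run on any pair of variants; the conversion counts below are invented for illustration:

```python
import math

def two_proportion_z(conv_a, n_a, conv_b, n_b):
    """Two-proportion z-test: is variant B's conversion rate different
    from variant A's beyond what chance alone would explain?
    Returns (z statistic, two-sided p-value)."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    pooled = (conv_a + conv_b) / (n_a + n_b)     # pooled conversion rate
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    # two-sided p-value from the standard normal CDF (via erf)
    p_value = 2 * (1 - 0.5 * (1 + math.erf(abs(z) / math.sqrt(2))))
    return z, p_value
```

An autonomous system is effectively running thousands of checks like this concurrently, promoting winners and killing losers as soon as the evidence clears a significance bar.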
What did we take away from all this? Besides dramatically improved performance and efficiency, we gained insights into the best performing creatives and segments.
For our lapsed purchasers segment -- defined as anyone who made an In-App Purchase in the last 180 days but not in the last 30 days -- we learned that there were two kinds of content that worked best:
Highlighting content from IMVU’s most influential creators, essentially creating a showcase of products from the best of IMVU’s creators.
Weekly contests, in which users can participate to win free credits as you see in figure 4-5 below, proved very popular and did a great job of attracting lapsed users back both to participate and purchase again.
We also learned that our Day 1 Users were motivated by a simple message: reminding them that they can redeem free credits to get started, as you can see in figure 4-6 below. This engaged them in the app and encouraged them into the flow of becoming a high Lifetime Value customer.
Scaling growth doesn’t come easy. Let this be your roadmap to maximize your customer lifetime value (CLV) by always running sequential tests for different cohorts at different stages throughout the entire user journey. To turbocharge your performance, consider working with, or building, an intelligent AI machine to help you automate the key levers like blending segmentation models with cross channel creative placements, achieving data-driven results far beyond manual capabilities.
Now that we’ve got the basics of AI as they’re applied to marketing, and an understanding of customer lifecycle marketing, we can explore options around building or buying a solution to help you turbocharge your startup’s growth.
1 An application programming interface (API) is a set of routines, protocols, and tools for building software applications. Basically, an API specifies how software components should interact. Additionally, APIs are used when programming graphical user interface (GUI) components. A good API makes it easier to develop a program by providing all the building blocks. A programmer then puts the blocks together.