It seems like you can’t look at a press release these days without finding a mention of a company’s latest “AI-driven” or “powered by Machine Learning” solution. The marketing buzz around these near-omnipresent terms is so strong that it’s easy to become numb to them. But as we’ve established at Gestalt IT, words mean things (sometimes). So we decided to ask an expert to help sort through the cruft.
I had the privilege to talk to Dr. Rachel Traylor on the subject. During Storage Field Day earlier this month, she raised a number of interesting questions around what companies consider to be AI and ML. She also recently started a dialogue on The Math Citadel around a similarly widely used phrase, predictive analytics.
Rich Stroffolino: We’ve heard many enterprise companies boast about including Machine Learning and AI in their products. Are these the same things? If not, how are they different? (I know this sounds like a 2nd grade essay prompt)
Dr. Rachel Traylor: Machine Learning and AI are used interchangeably in the marketing sphere. AI is just a buzzword meant to conjure images from our favorite Hollywood sci-fi movies. Allowing the image of a friendly NS4 (I, Robot) or J.A.R.V.I.S. (MCU/Iron Man) to enter a customer’s head allows a company to pass off basic rule-based decision making as better tech than it really is. It generates a glitzier image for people to get excited about. Machine Learning these days generally refers to some sort of feedback loop, like a neural network, that attempts to “learn” something about the data being fed into the system. That work is generally classed into supervised and unsupervised learning.
Supervised learning has labels, like what we did in school. The teacher knows the answer and gives feedback when a student attempts a problem: say, identifying the horses in a picture of animals. The teacher takes the student’s answers and marks them right or wrong, and the student uses that feedback the next time he is given a different set of pictures and asked to identify horses. In theory, the student eventually figures out a set of characteristics that allows him to identify a horse in any context by sight.
Unsupervised learning is like releasing a small child into the world to see what he picks up. In this type of learning, we don’t know what’s right or wrong in advance. In this case, the teacher (us) doesn’t know what a horse looks like in general. Self-organizing maps are one type of unsupervised learning.
Unsupervised learning can be good for finding potential patterns in large amounts of data. It can also be good for identifying potential features for model selection. In general, I don’t like the use of unsupervised learning beyond an exploratory or qualitative study. Supervised learning examples include classification problems (given a set of characteristics, is the target over or under 18 years old?), decision trees, and linear regression; clustering algorithms, by contrast, belong to the unsupervised side. These are the most common types of things I have employed when I’ve been a statistician, and they are used for developing models of behavior or for predictive purposes.
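To make the supervised case concrete, here is a minimal sketch in Python using scikit-learn (an assumption on our part; any similar library would do), with a small invented dataset standing in for the “over or under 18” classification problem:

# A minimal supervised-learning sketch: a decision tree trained on
# labeled examples. The data below is invented purely for illustration.
from sklearn.tree import DecisionTreeClassifier

# Each row is (height in cm, gray hair: 1/0); the labels say whether
# the person is over 18. The labels are the "teacher" giving feedback.
X = [[120, 0], [135, 0], [170, 0], [180, 1], [165, 1], [110, 0]]
y = [0, 0, 1, 1, 1, 0]

model = DecisionTreeClassifier(max_depth=2)
model.fit(X, y)                    # learn a rule from the labeled examples
print(model.predict([[175, 0]]))   # classify a new, unseen case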
RS: Does “having an algorithm” mean that a company is using ML?
DRT: Absolutely not. An algorithm is just a procedure. Take the Division Algorithm—one of the most famous in mathematics, used in Euclid’s Elements to compute the greatest common divisor of two numbers.
The Division Algorithm states that for two integers M and N, with N > 0, there are unique integers Q and R, with 0 ≤ R < N, such that M = N*Q + R (Q is the quotient, and R is the remainder). Repeated applications of this allow us to divide two numbers (with remainder).
Let’s do an example and divide 101 by 3:
101 = 3*(33) + 2
Here Q = 33, and R = 2. We can use this repeatedly to find the greatest common divisor of two numbers.
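As a sketch of how little “learning” is involved, here is that repeated division written out in Python, which is exactly the Euclidean algorithm for the greatest common divisor:

# Repeated application of the Division Algorithm (the Euclidean
# algorithm): at each step M = N*Q + R, and we continue with N and R
# until the remainder is zero. A terminating procedure, nothing more.
def gcd(m, n):
    while n != 0:
        m, n = n, m % n
    return m

print(gcd(101, 3))  # prints 1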
No one reasonable would call the Division Algorithm “Machine Learning”. It’s just a procedure. Mathematics has many examples, including Gram-Schmidt (a way to transform a set of linearly independent vectors into an orthogonal basis), long division (to use an elementary example), or even generating random numbers. There’s no learning involved here, and the perversion of the word “algorithm” into a marketing buzzword does concern me. When I use the word “algorithm”, I mean it exactly as it is defined. I’ll quote Wolfram’s definition: “An algorithm is a specific set of instructions for carrying out a procedure or solving a problem, usually with the requirement that the procedure terminate at some point. Specific algorithms sometimes also go by the name method, procedure, or technique.”
RS: What are some of the warning signs that claims of ML and AI are marketing bluster?
DRT: The most obvious one is a refusal to discuss details. When pressed on the details of their “ML/AI/predictive analytics”, the claimant should be able to give at least high-level answers. For example, if the claim is that their storage system detects anomalies, the claimant should first be able to define an anomaly (trick question…you can’t. See my article here), then state how they classify it. Do they smooth a time series using exponential smoothing, then take any point outside 3 standard deviations and call it an anomaly? Do they use windowing? These are all questions that can be answered at a high level without “giving away” any “secret sauce” or proprietary information. Hiding behind a “secret sauce” curtain is a good way to spot bluster.
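For concreteness, here is one hedged sketch (in Python, assuming numpy) of the kind of rule just described: exponential smoothing followed by a standard-deviation cutoff. The smoothing constant and the cutoff are arbitrary illustrative choices, and, per the point above, calling the flagged points “anomalies” is itself a definitional choice:

import numpy as np

def flag_outliers(series, alpha=0.3, k=3.0):
    # Exponentially smooth the series, then flag points whose residual
    # from the smoothed value exceeds k standard deviations.
    x = np.asarray(series, dtype=float)
    smoothed = np.empty_like(x)
    smoothed[0] = x[0]
    for t in range(1, len(x)):
        smoothed[t] = alpha * x[t] + (1 - alpha) * smoothed[t - 1]
    residuals = x - smoothed
    return np.abs(residuals) > k * residuals.std()  # boolean mask

print(flag_outliers([10, 11, 10, 12, 11, 50, 10, 11]))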
Another (sillier) one is that they talk about it in their slide decks using the same stock images in the black-and-blue color scheme with a brain, some “graphs”, and translucent binary bits. You know the ones I’m talking about. To me, this is a sign that you’re not really doing anything under the hood. Why not show results, or a demo, or even a dashboard that gives some set of summary statistics? Anything that shows your actual product.
Talk to me about your actual implementations, your actual data, your actual model performance. A counter-argument to this is “customers won’t understand” or some variant thereof. I disagree. Customers are smarter than enterprises give them credit for. They may not know all the statistical terminology, but they can tell when they’re being fooled. People who have really built something they’re confident in enjoy showing it off and openly welcome questions. Customers will notice that, even if they’re not well-versed in the specifics of what’s going on under the hood. They’re glad someone was at least willing to open the hood and let them see the engine.
RS: Is ML synonymous with automation?
DRT: No. Let me give a quick example. Let’s suppose I want to put a system in place that will proactively re-route traffic. Maybe it’s a network that re-routes traffic to another server when it can tell the current server is getting close to overloaded. What do we mean by “close to overloaded”? We do some kind of study and devise a threshold: if the queue length gets above some specific number, we re-route all incoming traffic somewhere else until the queue length dies down. Once this rule is put in place, the traffic routing is automated. The decision rule (threshold) is static. There’s no “algorithm”, no machine learning, yet the traffic is routed automatically according to that rule.
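To make that concrete, here is a minimal Python sketch of such a static rule; the threshold value is hypothetical, standing in for whatever number the study produced:

QUEUE_THRESHOLD = 100  # fixed value from some prior study (hypothetical)

def route(queue_length, primary, fallback):
    # Automated, but nothing is learned: the rule never changes on its own.
    return fallback if queue_length > QUEUE_THRESHOLD else primary

print(route(150, "server-a", "server-b"))  # re-routes: queue is too long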
We could layer some kind of neural network on top of this automation process if we wanted that threshold to be able to change without an entirely new study, or without human intervention. But ultimately, automation encompasses far more than just machine learning. Machine learning can be used to draw a static conclusion, or it can be used as part of an automatic feedback loop that performs an action. Automation implies an action being performed without human intervention. Machine Learning is just a tool.
RS: What are some legitimate uses of ML and AI in the enterprise you’ve seen?
DRT: Self-organizing maps are neat. I’ve seen that used to discover patterns of behavior in storage capacity and storage purchasing. Classification algorithms such as decision trees can be used to predict target attributes. The output of a decision tree is a very nice, easily understood diagram that maps nicely to our own decision making processes. Other clustering algorithms can help you see groups of particular types of customers. Perhaps a clustering algorithm can help you see qualitatively that your large finance customers tend to make purchases at the end of a fiscal year, whereas your customers in the agricultural space tend to make purchases at the beginning.
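As one hedged illustration of that clustering use case (in Python with scikit-learn, on invented data), k-means can surface exactly this kind of qualitative grouping:

# Each row is a customer: (fiscal quarter of typical purchase, average
# order size in $k). The numbers are made up purely for illustration.
from sklearn.cluster import KMeans

customers = [[4, 500], [4, 450], [1, 30], [1, 25], [4, 480], [1, 40]]
labels = KMeans(n_clusters=2, n_init=10).fit_predict(customers)
print(labels)  # group assignments to explore qualitatively, not hard conclusions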
RS: Any concluding remarks?
DRT: Machine Learning tools are valuable in many contexts. Decision making doesn’t always require a fully quantitative and exhaustive study, particularly in business settings. Having a qualitative understanding and revealing patterns or clusters can be valuable information, and many good decisions are made qualitatively. Predictive analytics can also provide valuable insight into planning, but one should be careful to note that these predictions carry variance, and that the model may simply not be accurate. Alongside the legitimate uses of machine learning comes a need to recognize that underneath the hood, way deep in the engine, there are actually very strict mathematical assumptions in place. Sometimes the data doesn’t meet these assumptions, which renders the intended use of that particular machine learning tool inappropriate.
Sometimes the assumptions are so strict that the machine learning tool doesn’t mirror reality (for instance, assuming the data points are mutually independent, which is rarely true). In these cases, building applications on top of these tools and forcing data through them (or transforming the data to “fit” the assumptions) yields fundamentally flawed applications and unstable algorithms. We’ve seen plenty of these “AI gone wrong” instances. Some are funny, and some are sinister. The answer here isn’t more complexity and data transformations, and it’s certainly not some variant of “shove more data through.” The answer is to return to those unmet assumptions and formally generalize the mathematics to encompass, in a provable way, the more realistic behavior of your phenomena.
We realized Newtonian physics couldn’t explain quantum behavior accurately, so we went back to the drawing board and mathematically created quantum physics. You don’t try to force Newtonian methods on quantum problems. Why would you force inappropriate machine learning or statistical methods on data they don’t apply to? Go back to the drawing board, examine the assumptions, and be willing to do the hard work of creating something new. It really is worth the investment.
You can check out more of Dr. Traylor’s work at The Math Citadel.