Application Aware Auto-Scaling for Kubernetes

The cloud community widely recognizes that effective resource allocation is a powerful cost optimizer, yet few have been able to tap into it. Cost remains a major pain point in the cloud, and for a majority of organizations on Kubernetes, aligning cloud resources tightly to actual usage is an ongoing struggle. Manual application tuning looked like a possibility for a hot second, but it requires a nuanced understanding of an application’s architectural needs that operators rarely have. The only safe option left is to overprovision, and overprovision they do. Every year, companies sustain heavy spending on wasted capacity and wind up paying for far more than they actually use. This raises the question: is there really a way to break out of this cycle? Can organizations tune their applications to attain optimal resource utilization on Kubernetes?

StormForge has answers worth paying attention to. At KubeCon 2022 in Detroit, Michigan, we talked to Yasmin Rajabi, VP of Product Management at StormForge, about the inherent problems of Kubernetes’ default auto-scaling and learned how StormForge’s ML-based bi-directional auto-scaling helps right-size applications and optimize cloud cost.

Kubernetes Has a Problem

When it comes to pod auto-scaling, Kubernetes offers horizontal and vertical scaling out of the box, but there’s a caveat. The Horizontal Pod Autoscaler (HPA) and the Vertical Pod Autoscaler (VPA) should not both act on the same workload using the same CPU or memory metrics, because they essentially cancel each other out. One attempts to reduce utilization by adding replicas, while the other resizes requests to maximize utilization and reduce waste, and that creates a thrashing action. And when users try to tune the settings manually, they find themselves looking at a vast amount of data from which tracing usage patterns and deriving the right settings is not humanly possible.
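To make the HPA and VPA interaction concrete, here is a minimal Python sketch. The replica formula follows the rule documented for the Kubernetes HPA (desired replicas = ceil(current replicas × current metric / target metric), with a default tolerance of 0.1). Everything else is an assumption for illustration: the `simulate` function, the fixed total load, and especially `vpa_margin`, which is a hypothetical stand-in for a vertical autoscaler that simply resets the CPU request to observed usage times a margin. The real VPA recommender works differently.

```python
import math

def hpa_desired_replicas(current_replicas, utilization, target, tolerance=0.1):
    """Replica count from the HPA rule documented by Kubernetes:
    desired = ceil(current * currentMetric / targetMetric).
    No change is made while the ratio stays within the tolerance."""
    ratio = utilization / target
    if abs(ratio - 1.0) <= tolerance:
        return current_replicas
    return math.ceil(current_replicas * ratio)

def simulate(total_load_m, replicas, request_m, target, vpa_margin=None, rounds=4):
    """Return the replica count after each round.

    vpa_margin is a hypothetical stand-in for a vertical autoscaler: after
    every round it resets the per-pod CPU request (in millicores) to the
    observed usage times this margin. None means the request never changes.
    """
    history = []
    for _ in range(rounds):
        usage_per_pod = total_load_m / replicas
        utilization = usage_per_pod / request_m  # what the HPA measures
        replicas = hpa_desired_replicas(replicas, utilization, target)
        if vpa_margin is not None:
            # Vertical step: move the request so that it tracks current usage.
            request_m = (total_load_m / replicas) * vpa_margin
        history.append(replicas)
    return history

if __name__ == "__main__":
    print(simulate(2000, 4, 500, target=0.5))                   # request held fixed
    print(simulate(2000, 4, 500, target=0.5, vpa_margin=1.15))  # request tracks usage
```

With the request held fixed, the replica count settles at 8 and stays there. With the hypothetical vertical step enabled, the count climbs through 8, 14, 25 and 44, because resizing the request pins utilization near 1/1.15 (about 87%), which can never meet the 50% horizontal target. In this toy model the two controllers never converge, which is one way the conflict described above can play out.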

Risk Vs Reward

One thing to keep in mind when taking corrective action on something like resource allocation is that the gains come with risks. Reckless cost cutting can lead to very adverse outcomes. Even when there are potential savings on the horizon, one must understand that certain tradeoffs have to be made.

If an organization decides to spend less on compute resources, the savings can come at the expense of degraded performance. For that reason, clipping resources is a risky business, and it demands an abundance of caution, proactive risk assessment, and predictive utilization analytics.

StormForge’s Two-Way Approach

StormForge tackles the beast in two ways. The first is in the pre-production stage, where it lets users preemptively load test applications and work out the optimal configuration for deployment in production. With a configuration that has the fewest tradeoffs and the best cost-to-performance numbers, users can improve the resource efficiency of their applications.

The second is in production. Here, the StormForge platform uses its embedded ML algorithm to scan telemetry data such as usage and resource requests and limits, and to analyze headroom and the possibility of clipping. The recommendations engine then produces the best set of recommendations based on the risk profile and the potential for savings. Rajabi says, “Additionally, we set the target utilization of the HPA so that when you are autoscaling horizontally, you’re doing it in the most resource-efficient manner.”
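As a rough illustration of how a risk profile can shape a right-sizing recommendation, the Python sketch below derives a CPU request from usage samples. It is not StormForge’s algorithm, which is ML-based and not described in detail here. The `RISK_PROFILES` table, its percentile and headroom values, and the `recommend` function are all hypothetical, chosen only to show the tradeoff between savings and the chance of clipping.

```python
from dataclasses import dataclass

# Hypothetical profiles: (percentile of observed usage to cover, headroom in %).
RISK_PROFILES = {
    "savings": (90, 5),
    "balanced": (95, 15),
    "reliability": (99, 30),
}

@dataclass
class Recommendation:
    request_m: int        # recommended CPU request in millicores
    clipped_samples: int  # observed samples that exceed the recommended request

def ceil_div(numerator, denominator):
    """Integer ceiling division, avoiding floating-point rounding."""
    return -(-numerator // denominator)

def percentile(samples, pct):
    """Nearest-rank percentile over integer samples."""
    ordered = sorted(samples)
    rank = max(1, ceil_div(pct * len(ordered), 100))
    return ordered[rank - 1]

def recommend(cpu_usage_m, profile="balanced"):
    """Recommend a request that covers the chosen percentile plus headroom."""
    pct, headroom = RISK_PROFILES[profile]
    base = percentile(cpu_usage_m, pct)
    request = ceil_div(base * (100 + headroom), 100)
    clipped = sum(1 for usage in cpu_usage_m if usage > request)
    return Recommendation(request_m=request, clipped_samples=clipped)

if __name__ == "__main__":
    # Made-up telemetry: mostly idle, some busy periods, one spike.
    usage = [100] * 90 + [200] * 9 + [400]
    for name in RISK_PROFILES:
        print(name, recommend(usage, name))
```

On this made-up sample, the "savings" profile recommends 105m and leaves 10 samples above the request, while "balanced" recommends 230m and "reliability" 260m, each leaving only the single spike uncovered. A tighter request saves more but raises the risk of clipping, which is the tradeoff the risk profile is meant to capture.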

Utilization Peaks and Troughs

Stephen Foskett brings up the topic of utilization varying with business seasons. Predicting utilization is precarious because usage patterns aren’t always even. The practice of spinning up and spinning down during certain hours of the day does not fly when instances max out in peak business periods like holidays and sale events. If you’re not resourced right in those times, you can wind up with serious performance problems and errors.

Rajabi notes that the strength of the StormForge ML algorithm lies in looking at data and recognizing patterns beyond the usual ones, patterns that are imperceptible to humans. As the platform evolves, StormForge plans to expand the scope of its data collection, and with new generations of the algorithm, it expects to make more precise predictions, even for unplanned spikes.

Trying Recommendations One at a Time

As good as ML-based auto-scaling sounds, there’s still some hesitation around it. Enterprises are not overly eager to rely on software or a service that may or may not mess up their implementation.

Rajabi says that is why StormForge offers users the option to test the recommendations for as long as they want before putting them on auto-pilot. Users can choose to deploy the StormForge recommendations as often as every hour or set them to weekly. Rajabi noted that StormForge has customers who deploy their recommendations hourly so that they can match traffic patterns as closely as possible.

StormForge works with cloud monitoring solutions like Datadog and Prometheus and gets all of its data from them instead of using separate software. StormForge also integrates with CI/CD systems to automate the deployment of its recommendations.

Real Numbers from Real World Applications

In terms of real numbers, Rajabi reports that for some customers, StormForge has seen savings as high as 50%, which goes to show the startling amount of wasted capacity. Out of the box, the StormForge platform comes with a dashboard that visualizes utilization metrics. Users get itemized information on excess capacity, predictions, recommendations, and potential savings based on their preferences.
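The arithmetic behind an itemized savings view is simple, and the short Python sketch below works through it. The workload names, replica counts, and request values are invented for illustration, and the `potential_savings` function is hypothetical rather than part of any StormForge API. It only shows how per-workload excess capacity rolls up into a single savings percentage.

```python
def potential_savings(workloads):
    """Compute excess capacity per workload and the overall savings percentage.

    workloads: list of (name, replicas, current_request_m, recommended_request_m),
    with CPU requests given in millicores.
    """
    excess_by_workload = []
    total_current = 0
    total_recommended = 0
    for name, replicas, current_m, recommended_m in workloads:
        current = replicas * current_m
        recommended = replicas * recommended_m
        excess_by_workload.append((name, current - recommended))
        total_current += current
        total_recommended += recommended
    saved_pct = 100 * (total_current - total_recommended) / total_current
    return excess_by_workload, saved_pct

if __name__ == "__main__":
    # Invented numbers for illustration only.
    workloads = [
        ("api", 4, 1000, 400),
        ("worker", 2, 2000, 1200),
    ]
    excess, saved_pct = potential_savings(workloads)
    print(excess)     # excess millicores per workload
    print(saved_pct)  # share of requested CPU that could be reclaimed
```

In this invented case the two workloads request 8000m in total and are recommended 4000m, so half of the requested CPU could be reclaimed.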

Rajabi emphasizes that StormForge does not swamp users with data and no actions. Oftentimes, when analytics are presented to operators, the question becomes how to resolve an overallocation problem without bringing down the entire system. StormForge’s recommendations are curated to reduce risk, first and foremost, and to show the potential for resizing at minimum sacrifice.

You can download the StormForge platform from StormForge’s website or get a trial free of cost. Datadog users can purchase it on the Datadog Marketplace.

Watch the full interview above or check out StormForge’s presentations at the recent Cloud Field Day event for a technical demo of the StormForge platform. Come back for more interesting interviews from KubeCon 2022 arriving next week.

Guest

Yasmin Rajabi, VP of Product Management, StormForge

Moderator

Stephen Foskett

Connect with Yasmin Rajabi on her LinkedIn Page.

Twitter

@SFoskett

About the author

Sulagna Saha
