Machine learning is a hot topic across the technology spectrum today. From self-driving cars, to catching nefarious content in the fight against terrorism, to apps that automatically retouch photos before you even take them, it is popping up just about everywhere. Each innovation creates a new wave of business opportunity while simplifying and automating tasks that involve more data than human beings can process at once, or even in a lifetime.
While machine learning might seem a newly emerging trend – which it most certainly is – it is also a breakthrough that has been a long time coming. Back in 1959, computer science and gaming pioneer Arthur Samuel defined machine learning as giving “computers the ability to learn without being explicitly programmed.”
With so much to gain from computers helping us with front-end processing in apps and services, it’s no surprise that machine learning is rapidly moving to the backroom of data centers. With cyberattacks on the rise, researchers are examining how machine learning could improve data center security. Machine learning controls inspired by Internet of Things (IoT) connectivity are already managing power and cooling to help data centers become much more energy-conscious. Any gains in efficiency are most welcome: NRDC estimates that by 2020, the electricity bill for keeping data centers running in the United States alone will reach $13 billion.
Machine smarts beyond the switch
But there is far more efficiency ahead beyond our much-needed digital guard dogs and smart switches. Today, despite rapid innovation in flash and other non-volatile memory (NVM) technologies, entire storage systems are used inefficiently. This is because there has not been a way to know which data is hot and needs high performance, and which data is inactive and can move to a less expensive storage tier, such as an offsite cloud. Since the dawn of computing, storage has been blind to applications, and applications blind to the storage systems serving their data. As a result, to meet service-level agreements, IT will over-provision an application’s storage, leaving cold files sitting on expensive storage or capacity that is allocated but unused. The good news is that the insight needed to deliver machine-learning automation is already sitting latent in the data itself.
Metadata – the data about your data – can be used to determine when a file was last accessed, who opened it, what changed, and many more attributes that help reveal the current business value of a given dataset. Metadata engine software can virtualize data and examine how the data aligns with the storage resources available in your ecosystem. With that intelligence, it can then move data at any granularity (LUNs, volumes, directories, sub-directories or a single file) to the right tier to align with IT’s objectives. The longer it operates, the more patterns the metadata engine software can collect and analyze, and the better it can recommend how to optimize resources to maximize both performance for the data that needs it and savings for the data that doesn’t. Importantly, you should still be able to take manual control and define data that needs performance, since it’s likely that the CEO will want even her 18-month-old emails accessible in seconds.
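To make the idea concrete, here is a minimal sketch of a metadata-driven tiering decision. It uses a fixed-threshold rule rather than a learned model, and everything in it is an assumption for illustration: the `FileMeta` record, the `recommend_tier` function, the tier names, and the 30-day and 180-day windows are hypothetical and do not describe any particular product.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class FileMeta:
    """Hypothetical metadata record for one file."""
    path: str
    last_accessed: datetime
    pinned: bool = False  # manual override: always keep on the fast tier


def recommend_tier(meta: FileMeta,
                   now: datetime,
                   hot_window: timedelta = timedelta(days=30),
                   warm_window: timedelta = timedelta(days=180)) -> str:
    """Suggest a storage tier from last-access age (assumed thresholds)."""
    if meta.pinned:
        # An administrator's choice wins over the age-based rule.
        return "performance"
    age = now - meta.last_accessed
    if age <= hot_window:
        return "performance"
    if age <= warm_window:
        return "capacity"
    return "cloud-archive"


now = datetime(2024, 1, 1)
files = [
    FileMeta("reports/q4.xlsx", now - timedelta(days=2)),
    FileMeta("projects/2023/plan.docx", now - timedelta(days=90)),
    FileMeta("mail/ceo/archive.pst", now - timedelta(days=540), pinned=True),
    FileMeta("scratch/old_build.tar", now - timedelta(days=540)),
]
for f in files:
    print(f.path, recommend_tier(f, now))
```

A real metadata engine would presumably weigh many more attributes (who opened the file, what changed, application behavior) and adjust its thresholds from observed patterns. The `pinned` flag stands in for the manual control described above, such as keeping the CEO's old emails on fast storage.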
Alternatively, IT can keep managing data manually in each of the storage silos in its data center. But as enterprises begin to adopt more efficient alternatives, sticking with the status quo may turn out to be a much greater business risk than being first to automate data management across on-premises IT infrastructure and the cloud.
Machine learning is about understanding what to do better in the future based on what was experienced in the past. Perhaps this is why so many of us react with a little fear when we hear about growth in machine learning. Machines have far less emotional investment in the past than we do, but luckily, with a little human wisdom, we can put their abilities to work where our businesses need them most – in managing our data.