At first glance, IBM’s surprise announcement that it plans to buy Red Hat for $34 billion looks like a victory for open source, which has been a huge driver of innovation in the IT market for decades. However, recent moves by prominent commercial open source companies to restrict licenses show that open source’s future could be cloudier than many realize.
IBM’s blockbuster announcement on Sunday that it will buy Red Hat in what will be the third-largest technology acquisition in history was cast as a conquest for open source software.
“Open source is the default choice for modern IT solutions,” said Red Hat CEO and President Jim Whitehurst. “Joining forces with IBM will provide us with a greater level of scale, resources and capabilities to accelerate the impact of open source as the basis for digital transformation and bring Red Hat to an even wider audience.”
The acquisition gives Big Blue more ammo to fight the cloud wars, according to Ginni Rometty, who is CEO, president, and chairman of IBM. “The acquisition of Red Hat is a game-changer. It changes everything about the cloud market,” Rometty said. “IBM will become the world’s number one hybrid cloud provider…”
The deal highlights the importance of open source in the tech sector, according to Jim Zemlin, executive director of The Linux Foundation. “IBM has been a major open source advocate for over two decades: promoting and defending Linux and open source at critical points in its history,” he says.
It’s hard to overstate the impact that open source has had on the IT industry over the past two decades. Since Linux emerged as a legitimate enterprise option for server operating systems near the turn of the millennium, momentum behind the open source development model has been practically unstoppable as it changed the course of IT history.
- Linux today is the dominant server operating system, accounting for about 96% of the world’s websites by one count;
- Linux accounts for 100% of the supercomputers in the TOP500 list as of November 2017;
- Linux accounts for 92% of Amazon EC2 instances by one measure, and even Microsoft admitted recently that Linux instances now exceed Windows instances on the Azure cloud.
Microsoft even joined the Open Invention Network (OIN) recently and offered its entire patent portfolio to OIN members, which is quite the change of course for a company whose CEO called open source a “cancer” on innovation in 2001.
Open Sourcing Big Data
The trajectory of open source projects has been even more impressive in the big data ecosystem. A handful of influential Apache Software Foundation projects like Hadoop, Spark, Cassandra, Kafka, and Flink have all pushed the envelope in distributed computing in some way. (The fact that nearly all of these frameworks assume a Linux operating system at the bottom of the stack also shows how the open source community builds off previous successes.)
But open source projects sponsored by ASF are just the start of open source. There are influential non-Apache projects in the big data world, like TensorFlow, Presto, and Kubernetes, that use the Apache 2 license, while other projects, such as Scikit-learn, NumPy, and Pandas, use a BSD license. MySQL, KNIME, and others use GPL, Citus uses the AGPL, while Postgres uses the PostgreSQL license.
Regardless of the specific licenses used, open source software in general has been a boon to big data. According to Brian Gracely, the director of product strategy at Red Hat, there has been a shift in the location where innovation occurs: from proprietary products into the open source realm.
“Our perspective is…that most of the innovation we’re seeing happening nowadays is happening in the open source community, it’s happening in open source,” Gracely told Datanami earlier this month. “We continue to see more and more VC money flow into companies around open source. We’re seeing more of the large companies contributing to it. We’re seeing Red Hat obviously, but we’re also seeing Microsoft, Google, [and other] larger companies contributing to that space.”
While companies may equate “open source” with “innovation,” they’re also looking for some help in pulling open source projects together, including the complex work of stitching multiple frameworks together to work as a cohesive whole. That’s partly what’s powering the commercial open source business model, where companies buy subscriptions to enterprise versions of products from a Red Hat, a Cloudera, or an Elastic.
“That’s what we’re all trying to do: Make it as simple as possible, to take all the friction away,” Gracely said. “I have an idea, I have a business problem, let me just start working on it. The technology will be there somewhere. It will be delivered by your IT group locally or it will be delivered out of a public cloud service in one way, shape, or form.”
Cloud to the Rescue?
There’s no doubt that many have struggled to get open source frameworks like Hadoop, Spark, and Hive to function smoothly and deliver the big data goods as advertised. Many companies have spent years and millions of dollars to build applications atop open source Hadoop distributions that may have dozens of open source sub-components working underneath. Some have come out of this with production clusters, while many others are still in development mode.
In fact, the difficulty in getting value out of all this open source innovation has driven many companies to the cloud, where Amazon, Google, and Microsoft offer hosted versions of many of the same open source products that are pre-configured and pre-integrated. The hosted data warehousing company Snowflake, which recently brought in $450 million in venture funding, is enjoying a bit of success at the moment at the expense of failed Hadoop implementations.
The obvious momentum of cloud providers was also evident in the recent news that Cloudera and Hortonworks are joining forces to create a single company with about $720 million in annual revenue and 2,500 customers. The companies realized that, instead of fighting each other for share of the market for on-premise Hadoop distributions, their biggest competitor is the public cloud.
The new Cloudera is hatching a plan to counteract the momentum of AWS, Google Cloud, and Microsoft Azure clouds by building a hybrid platform that runs atop Kubernetes containerization infrastructure. That capability to span from on-premise to the cloud, which the public cloud providers cannot do, appears to be one of the main drivers of IBM’s interest in Red Hat, too.
Freeloading In the Cloud
There’s no doubt that open source is responsible for a ton of innovation in the technology field. It’s also unleashed market forces that are still playing out today. Some of those forces are having positive impacts, but there are also some unexpected consequences that are starting to emerge.
In August, Redis Labs moved its Redis Modules from an AGPL license to a new license that combines Apache v2.0 with Commons Clause, which restricts the sale of covered software. The net impact is that Redis Modules, including RediSearch, Redis Graph, ReJSON, ReBloom and Redis-ML are no longer open source software, even while the core Redis database continues with a BSD license.
In an August 22 blog post, Redis Labs cofounder and CTO Yiftach Shoolman put the blame squarely on the cloud.
“Cloud providers have been taking advantage of the open source community for years by selling (for hundreds of millions of dollars) cloud services based on open source code they didn’t develop (e.g. Docker, Spark, Hadoop, Redis, Elasticsearch and others),” Shoolman writes. “This discourages the community from investing in developing open source code, because any potential benefit goes to cloud providers rather than the code developer or their sponsor.”
Under the new license, Redis Labs customers can still build and sell products atop the modules; they just can’t sell the unmodified modules. “We believe this licensing supports the open and free use of modules, while still maintaining our rights over commercializing our assets,” Shoolman writes.
NoSQL database MongoDB is making a similar change to its open source license for its community edition, for similar reasons. On October 16, the company announced that it would use a new Server Side Public License (SSPL) for MongoDB Community Server. The new license would retain all of the same freedoms that the open source community had with MongoDB under the AGPL license, but require “that any organization attempting to exploit MongoDB as a service must open source the software that it uses to offer such service.”
“The market is increasingly consuming software as a service, creating an incredible opportunity to foster a new wave of great open source server-side software,” MongoDB CTO and co-founder Eliot Horowitz stated in a press release. “Unfortunately, once an open source project becomes interesting, it is too easy for cloud vendors who have not developed the software to capture all of the value while contributing little back to the community.”
The change in sentiment around open source caught the eye of David Flower, the CEO of VoltDB, which develops an in-memory distributed relational database available as open source with an AGPL license.
“A few years back, true open source data platforms were extremely popular, as they enabled enterprises to cost-effectively get a handle on ‘big data,’” Flower tells Datanami via email. “However, today’s digital transformation strategies have brought that trend to a halt, as platforms such as Redis and MongoDB are reversing open source strategies and charging unhappy customers to keep pace with the scale of data growth.”
It’s doubtful that the big data community will go back to developing proprietary software at this point. However, it’s clear that existing commercial open source strategies are not panning out as purveyors expected, which will likely lead to more restrictive licenses in the future. With the cloud poised to continue gobbling up more workloads, organizations may need to pay more attention to licenses behind the technology that’s unlocking all the data innovation.