Hadoop fails to live up to the promise and the hype

Opinion
Mar 28, 2017 | 3 mins
Big Data, Open Source

The cost and complexity of Hadoop keep most users from getting real value out of it, according to big data experts

Hadoop, the open source big data framework first developed at Yahoo for analyzing large data sets, is a total failure that costs too much and is too much of a headache to implement, say people in the field. 

In a lengthy and in-depth piece on Datanami, big data experts describe Hadoop as too primitive for complex processing work or interactive, user-facing applications. At best, it’s a batch processing system, which is how Hadoop started out, and it doesn’t seem to have grown beyond that.
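To give a sense of the ceremony involved even for a simple batch job, here is a minimal sketch of the canonical MapReduce word count, close to the version in the official Hadoop tutorial. The input and output paths come in as command-line arguments, and all of this has to be compiled and shipped to the cluster as a JAR before anything runs.

    import java.io.IOException;
    import java.util.StringTokenizer;

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.Mapper;
    import org.apache.hadoop.mapreduce.Reducer;
    import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
    import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

    public class WordCount {

      // Map step: emit (word, 1) for every token in the input split.
      public static class TokenizerMapper
          extends Mapper<Object, Text, Text, IntWritable> {
        private static final IntWritable ONE = new IntWritable(1);
        private final Text word = new Text();

        @Override
        public void map(Object key, Text value, Context context)
            throws IOException, InterruptedException {
          StringTokenizer itr = new StringTokenizer(value.toString());
          while (itr.hasMoreTokens()) {
            word.set(itr.nextToken());
            context.write(word, ONE);
          }
        }
      }

      // Reduce step: sum the counts collected for each word.
      public static class IntSumReducer
          extends Reducer<Text, IntWritable, Text, IntWritable> {
        private final IntWritable result = new IntWritable();

        @Override
        public void reduce(Text key, Iterable<IntWritable> values, Context context)
            throws IOException, InterruptedException {
          int sum = 0;
          for (IntWritable val : values) {
            sum += val.get();
          }
          result.set(sum);
          context.write(key, result);
        }
      }

      // Job setup and submission: the whole thing runs as a one-shot batch job.
      public static void main(String[] args) throws Exception {
        Job job = Job.getInstance(new Configuration(), "word count");
        job.setJarByClass(WordCount.class);
        job.setMapperClass(TokenizerMapper.class);
        job.setCombinerClass(IntSumReducer.class);
        job.setReducerClass(IntSumReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
      }
    }

Even this trivial job is strictly batch: it reads its input, shuffles intermediate data through disk, writes its output, and exits, which is part of why interactive, user-facing workloads are such a poor fit.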

“I can’t find a happy Hadoop customer. It’s sort of as simple as that,” Bob Muglia, CEO of Snowflake Computing, told Datanami. Snowflake develops and runs a cloud-based relational data warehouse product. “It’s very clear to me, technologically, that it’s not the technology base the world will be built on going forward.” 

Hadoop isn’t going away overnight

Hadoop is pretty widely used, so it won’t go away overnight, but it is unlikely to see many new deployments. Muglia says newer approaches, such as Amazon S3 for storage and Apache Spark for real-time in-memory processing, will relegate Hadoop to niche and legacy status going forward.
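For contrast, here is a sketch of the same word count written as a Spark job reading from and writing to S3. This assumes Spark 2.x with the s3a connector and credentials already configured, and the bucket paths are hypothetical; the point is that the work happens in memory on the executors, the storage layer is just an object store, and there is no HDFS or MapReduce cluster to stand up and operate.

    import java.util.Arrays;

    import org.apache.spark.SparkConf;
    import org.apache.spark.api.java.JavaPairRDD;
    import org.apache.spark.api.java.JavaRDD;
    import org.apache.spark.api.java.JavaSparkContext;
    import scala.Tuple2;

    public class SparkWordCount {
      public static void main(String[] args) {
        SparkConf conf = new SparkConf().setAppName("spark word count");
        JavaSparkContext sc = new JavaSparkContext(conf);

        // Read straight from the object store; bucket and paths are hypothetical.
        JavaRDD<String> lines = sc.textFile("s3a://example-bucket/input/");

        // The same map/shuffle/reduce logic as the Hadoop job, in a few lines,
        // with intermediate data kept in memory on the executors.
        JavaPairRDD<String, Integer> counts = lines
            .flatMap(line -> Arrays.asList(line.split("\\s+")).iterator())
            .mapToPair(word -> new Tuple2<>(word, 1))
            .reduceByKey((a, b) -> a + b);

        counts.saveAsTextFile("s3a://example-bucket/output/");
        sc.stop();
      }
    }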

“The number of customers who have actually successfully tamed Hadoop is probably less than 20, and it might be less than 10,” Muglia told Datanami. “That’s just nuts given how long that product, that technology has been in the market and how much general industry energy has gone into it.” 

The article points out that Facebook had one of the bigger Hadoop deployments, but the company no longer uses it because “there’s a bunch of things that people have been trying to do with it for a long time that it’s just not well suited for,” according to an ex-Facebook staffer.

Muglia says the problem is that the community around Hadoop is small and never made much of an effort to grow the product. Alternatives have emerged, not the least of which is the public cloud. We’ve even seen the rise of big data as a service.

So the Hadoop community has to make a decision. Does it do the heavy lifting required to bring the software up to speed, making it easier to use and suitable for a wider range of workloads, which were the two big knocks in the Datanami piece, or does it simply maintain what it has and leave that work to newer technologies?