And you thought having enough storage was going to be the main problem?
It turns out that having enough words to describe the massive amount of stuff being stored is actually a big problem too.
I like hellabytes and whateverbytes — but I also like dogbytes, babybytes, mosquitobytes, and soundbytes.
Yeah, yeah, I’ll stick to the day job, I know. But do you think Tosh could get some legs out of my material?
Brian Wood, VP Marketing
Extreme ‘big’ data: after zettabytes and yottabytes comes…
While you and I have previously spent a good bit of time discussing how size doesn’t matter when it comes to using tools labeled as big data, there actually is such a thing as “big” data in this world.
Really big, enormous data does exist, and it’s growing even bigger as we speak. It’s so big that it measures beyond zettabytes and yottabytes, and we don’t even have a word for it yet. But we need to name it soon, because this size of data is in the future for more companies than most realize now. Welcome to extreme big data, and here’s what that means.
“The challenge is only partly one of coming to agreement on the right words to describe what lies beyond a yottabyte, which is a septillion bytes,” writes John Foley, director of strategic communications at Oracle, in his Forbes post.
“Oracle big data strategist Paul Sonderegger says the ‘whateverbyte problem’ is a symptom of a larger, even more important business issue. ‘Not only do we lack a name for that volume of data, we don’t know how to talk about its consequences,’ Sonderegger says.”
A yottabyte is 1,000,000,000,000,000,000,000,000 bytes. It’s hard to imagine something bigger. But the inability to imagine won’t prevent it from occurring in an increasing number of datacenters in the not-so-distant future.
“Few database managers or storage architects are thinking beyond that [yottabyte] because their IT environments aren’t that enormous yet, but the time for such planning is coming faster than most of us realize,” writes Foley.
Indeed it is. But it’s not only the business implications that must be planned for now, but the societal and individual privacy implications as well. Unfortunately, no one is quite sure how to do that yet.
Extreme Big Data: Beyond Zettabytes And Yottabytes
If we’re going to talk about big data—and it’s one of the most important discussions in business today—we better all speak the same language. In the hierarchy of big data, there are petabytes, exabytes, zettabytes, and yottabytes. After that, things get murky.
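To make the ladder concrete: each rung in that hierarchy multiplies the previous one by 1,000, and the official SI prefixes run out at yotta. A minimal sketch (the `describe` helper is hypothetical, written just for illustration) shows where the naming gap appears:

```python
# Official SI prefixes, each a factor of 1,000 larger than the last
# (using the decimal convention). The ladder ends at yotta (10**24).
SI_PREFIXES = ["", "kilo", "mega", "giga", "tera", "peta", "exa", "zetta", "yotta"]

def describe(num_bytes):
    """Return a human-readable size using decimal SI prefixes.

    Anything beyond 10**24 bytes is still expressed in yottabytes,
    because no larger prefix is officially defined.
    """
    value, prefix = float(num_bytes), SI_PREFIXES[0]
    for name in SI_PREFIXES[1:]:
        if value < 1000:
            break
        value /= 1000
        prefix = name
    return f"{value:g} {prefix}byte(s)"

print(describe(10**15))  # → 1 petabyte(s)
print(describe(10**24))  # → 1 yottabyte(s)
print(describe(10**27))  # → 1000 yottabyte(s) -- the "whateverbyte" gap
```

The last call is the whole problem in miniature: past 10**24, the function has no prefix left to reach for.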
The challenge is only partly one of coming to agreement on the right words to describe what lies beyond a yottabyte, which is a septillion bytes. Oracle big data strategist Paul Sonderegger says the “whateverbyte problem” is a symptom of a larger, even more important business issue.
“Not only do we lack a name for that volume of data, we don’t know how to talk about its consequences,” Sonderegger says.
What happens, he asks, when large investment banks have perfect information about small markets? Who will be on the other side of the trade? And what happens to consumer protections when banks decide to turn purchase histories into a product for retailers, insurance companies, and other banks to buy?
The time has come for the technology industry to sharpen its language around data sets that are thousands or millions of yottabytes in size. The sooner we do this, the better equipped we will be to answer the kinds of tough questions posed by Sonderegger, which are surely coming.
A yottabyte is a mind-boggling 1,000,000,000,000,000,000,000,000 bytes. Few database managers or storage architects are thinking beyond that because their IT environments aren’t that enormous yet, but the time for such planning is coming faster than most of us realize. See my earlier column, “As Big Data Explodes, Are You Ready For Yottabytes?”
It’s time to start thinking about what comes next. In August, big data expert Andrew McAfee, principal research scientist at Massachusetts Institute of Technology’s Center for Digital Business, warned that the industry is moving so fast from petabytes to exabytes to zettabytes that “we’re literally about to run out of metrics for this stuff.”
McAfee was speaking in New York at Oracle’s Big Data At Work event. “We don’t have many prefixes left,” he said, in reference to exa, zetta, and yotta. For a recap of that event and McAfee’s presentation, see “Big Data At Work: Decline Of The HiPPO.”
He’s right. The tech industry has been circling around the terms brontobyte (a thousand yottabytes) and geopbyte (a thousand brontobytes) as the next levels in the big data hierarchy, but those are de facto terms. Merriam-Webster.com has no listing for either of those words.
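The arithmetic behind those proposed names is simple: each is another factor of 1,000 stacked on the last official prefix. A quick sketch (just the powers-of-a-thousand ladder, nothing more):

```python
# Unofficial, de facto extensions of the SI ladder:
# each proposed term is 1,000 times the one before it.
YOTTA = 10 ** 24               # yottabyte: last official SI prefix

brontobyte = 1000 * YOTTA      # proposed: 10**27 bytes
geopbyte = 1000 * brontobyte   # proposed: 10**30 bytes

print(f"brontobyte = 10**{len(str(brontobyte)) - 1} bytes")
print(f"geopbyte   = 10**{len(str(geopbyte)) - 1} bytes")
```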
And neither brontobyte nor geopbyte has its own entry in Wikipedia, which otherwise provides extensive background on metric, binary, and other unit prefixes defined by the International System of Units. That brings us to yet another nuance in this whole discussion. A megabyte is officially one million bytes under the SI definition, but the term is also used, as a matter of convenience, for the binary quantity of 1,048,576 bytes (2^20), which the IEC calls a mebibyte.
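That decimal-versus-binary gap is small at the megabyte level but widens at every rung of the ladder; a quick sketch of the arithmetic (standard SI and IEC conventions, nothing assumed beyond them):

```python
# Decimal (SI) prefixes step by 1,000; binary (IEC) prefixes by 1,024.
# The mismatch between the two readings grows with each prefix.
prefixes = ["kilo", "mega", "giga", "tera", "peta", "exa", "zetta", "yotta"]

for power, name in enumerate(prefixes, start=1):
    decimal = 1000 ** power   # e.g. megabyte (MB)
    binary = 1024 ** power    # e.g. mebibyte (MiB)
    overshoot = (binary / decimal - 1) * 100
    print(f"{name}byte: binary reading is {overshoot:.1f}% larger")
```

By the time you reach yotta, the binary reading overshoots the decimal one by more than 20 percent, which is exactly why precise terminology starts to matter at these scales.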
Wikipedia has a whole section on unofficial prefixes, including hellabyte (as in a hell of a lot of bytes) as another way of saying a thousand yottabytes. There was actually a movement to legitimize hellabyte, but it fizzled out. One slap-happy Wikipedia contributor referred to brontobyte as the mark left by a hungry brontosaurus.
Such punch lines are a distraction from the very important conversations that must take place in businesses, government agencies, universities, and research organizations about the rapid accumulation of data and new data types being generated from many more sources. On a more serious note, other terms that have been proposed include ninabytes (a thousand yottabytes), followed by tenabytes.
As you can see, the lingua franca of big data gets complicated beyond yotta, and even more so when people spell the same word differently. Tech blogger Sharon Fisher has observed that geobyte, gegobyte, and geopbyte are variants of the same thing. “Geo-, gego-, or geop-? It kind of doesn’t matter,” Fisher wrote, “because it’s all unofficial anyway, but somebody might want to figure it out at some point.”
Exactly! Big data strategy discussions will be more difficult if we’re not using the same terms. Brontobytes, hellabytes, or ninabytes? Geopbytes, gegobytes, or tenabytes? It’s data storage alphabet soup.
The tech industry has been mulling the question of life beyond yottabytes for a few years now, but the market is catching up to our ruminations as petabytes enter the mainstream and as we see more examples of exabyte and zettabyte computing environments.
The argument could be made that we’re simply not ready for all of this yet—that yottabytes are the outer edge of today’s most data-intensive applications and projects and that it’s merely theoretical to contemplate data stores that are a thousand and a million times bigger.
Yet, if we’ve learned anything about data science, it’s that the growth curve points forever upward. And high-performance computing and very large databases have always been about what’s next as much as they are about what’s already here. So let’s find the right words to have an intelligent conversation about where we’re headed because the theoretical will be here before we know it.