Dark Data: Time to Take Action Now

Over the next two years, nearly 60% of professionals are making analytic strategies and tools top priorities according to Forrester Research’s “The State of Digital Customer Experience.” And that’s not surprising considering organizations across all industries now want to make sure they are extracting maximum value from their data. But in order to get the maximum benefit from analytics, organizations must finally address their “dark data”. So what exactly is dark data? At the AIIM Roadshow in London, 451Group research director Alan Pelz-Sharpe defined dark data “as large quantities of information that an organization accumulates over years but that ends up being redundant or unusable, largely because there is too much, it is being produced too quickly, and it lacks structure.” Gartner adds in its definition, “Similar to dark matter in physics, dark data often comprises most organizations’ universe of information assets. Thus, organizations often retain dark data for compliance purposes only. Storing and securing data typically incurs more expense (and sometimes greater risk) than value.”

When I first heard the term “dark data” being used to describe the amorphous mass of continually growing digital content it immediately brought to mind “dark fiber” from my days in the telco industry. Dark fiber is seen as a valuable asset to the telcos. It’s simply dormant strands of fiber just waiting to be lit up to provide a customer data/voice bandwidth (mostly data today), i.e. revenue. Dark data should be seen in similar light; there’s hidden value amongst the digital debris. It simply requires evaluating it, identifying the set of data elements that still has value to the business, protecting those and disposing of the rest, which has been estimated as being up to 70% or more of what’s currently being stored.
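As a rough illustration of that evaluate, protect and dispose idea, here is a minimal Python sketch. Everything in it is an assumption made for the example rather than a method from any particular product: the `Item` record, the `on_legal_hold` flag, the sample file names and the three-year idle threshold are all hypothetical, and a real program would weigh far more than the date of last use.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class Item:
    """Hypothetical inventory record for one stored data element."""
    name: str
    last_used: date
    on_legal_hold: bool = False  # assumed flag: must be retained regardless of age


def triage(items, today, max_idle_days=3 * 365):
    """Split items into (keep, dispose) lists.

    An item is kept if it is on legal hold or was used within
    max_idle_days; everything else becomes a disposal candidate.
    The threshold is an arbitrary example value.
    """
    keep, dispose = [], []
    for item in items:
        idle_days = (today - item.last_used).days
        if item.on_legal_hold or idle_days <= max_idle_days:
            keep.append(item)
        else:
            dispose.append(item)
    return keep, dispose


# Made-up sample inventory
items = [
    Item("2009_campaign_drafts.zip", date(2009, 6, 1)),
    Item("customer_contracts.db", date(2014, 1, 15)),
    Item("old_hr_records.csv", date(2008, 3, 9), on_legal_hold=True),
]
keep, dispose = triage(items, today=date(2014, 6, 1))
print("keep:", [i.name for i in keep])
print("dispose:", [i.name for i in dispose])
```

The point of the sketch is only that the decision rule is explicit and repeatable; deciding what the rule should be is the real governance work.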

Information Tai Chi

One of the objectives of tai chi as a martial art for self-defense is to gently absorb the force of an aggressor and guide the energy away or turn it against him. It could also be stated as using that energy to benefit the tai chi master. The massive number of constant data streams that attack our senses every day (individually and organizationally), which I described in a previous post (Driven to Distraction: Insatiable Information Input), demonstrates the need for a similar skill: the ability to effortlessly absorb the information and determine what should be guided away and what has value (energy) and can be used for our benefit.

The place where yin and yang are balanced is called tai chi by the Chinese, and in the realm of information and data, balance comes from understanding what is of value to us and what is superfluous. This is one key capability for achieving success with Big Data analytics: discovering the information gems amongst the noise and leveraging them to benefit the organization. An effective information governance/management program is an important pillar for the data-driven organization. It enables the organization to analyze, identify and protect valuable information assets, and to dispose of much of what’s no longer needed. You may find this too much of an oversimplification, but that is not at all the intent.

Tai chi is an art form involving the whole body through movements that exercise the muscles, joints and bones, which improves the functioning of the organs. Mastering information tai chi requires the support of the whole organization (starting at the executive suite) along with cross-functional cooperation to implement a solution (movements) that improves decision making and operational performance. Tai chi is also a lifelong endeavor (or practice) that even those achieving the rank of master say they’ve not mastered yet. It is important for the organization to understand that becoming an information tai chi master (or improving information governance) is a core strategic initiative requiring a long-term commitment.

There is a significant drive by organizations today to become more data-driven or information-centric in how they approach customers, competitors and operations, to improve their opportunity for success. Emerging from this endeavor is the recognition that a new, executive-level role is needed to create a holistic understanding of an organization’s data assets and guide their use to maximize their value (see here & here). Enter the Chief Data Officer, or Information Tai Chi Master.

Driven to Distraction: Insatiable Information Input

Perhaps the FIFO (first-in-first-out) method applies to one’s ability to retain information, or to lose it under the constant bombardment of ever more coming in all the time. Maybe the sitcom “Married With Children” was prescient in the episode where Al fills Kelly’s head with sports facts and the old trivial information that was in there is jettisoned, i.e. first out.

In all seriousness though, there was a recent report on one of the cable news networks about the increase in ADHD diagnosed in children over the past ten years. It referred to an article in the New York Times indicating that a CDC study showed the rate of kids between the ages of 4 and 17 being diagnosed with ADHD has jumped 41% over the past ten years. This started me wondering what the impact of the always-on, always-connected, content-everywhere lifestyle might be as a contributing factor. There is a near-infinite number of information streams flowing continuously 24 hours a day.

Waves Required for Surfing the Web

In an earlier post I made the statement that the Web is just one massive content storage repository and described how this aligns with the concept of the cloud. The intent here is to expand on that concept further with respect to the file system, the dominant paradigm over the past 30-plus years for organizing, structuring and managing computer-generated information artifacts stored digitally as files. Content or unstructured data in the form of blogs, tweets, social networking and pinteresting things has provided users new and more dynamic methods of storing, managing and sharing stuff. Surfing the Web is the new common paradigm and you can’t surf without waves (or WAVS).

The file system is the most familiar interface to storage for end users, applications and developers. The introduction of NAS for presenting a network-accessible file system for shared storage extended the capabilities of file system connectivity. However, we’re seeing these technologies reaching their breaking point as the number of files and amount of capacity are exceeding the expectations of the original architecture.

Example Web Map

Cloud continues to occupy a significant amount of mind share these days as it promises to simplify storage in the enterprise, improve the cost model and streamline content access and distribution. The Web has done the same for applications, business transactions, etc., so why not storage? One could posit that the World Wide Web is simply a massive storage repository providing ubiquitous access to content over the Internet using HTTP. The Web browser is the primary user interface, presenting rich, dynamic HTML pages containing hyperlinks to other pages and various content types, e.g. text, documents, images, video, etc. Object storage is the optimal infrastructure for browser access to content clouds.
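To make the object storage idea a little more concrete, here is a toy Python sketch of the access model: a flat namespace in which each object is stored and retrieved by an identifier, along with its own metadata, much as a browser fetches a resource by URL. The `ObjectStore` class and its `put`/`get` methods are hypothetical names invented for this illustration, and deriving the identifier from a SHA-256 hash of the content is just one possible design choice, not a description of any specific product.

```python
import hashlib


class ObjectStore:
    """Toy in-memory object store: flat namespace, no directories."""

    def __init__(self):
        self._objects = {}  # object ID -> (data, metadata)

    def put(self, data: bytes, **metadata) -> str:
        # Assumption for this sketch: the ID is the SHA-256 hex digest
        # of the content, so identical content maps to the same ID.
        object_id = hashlib.sha256(data).hexdigest()
        self._objects[object_id] = (data, dict(metadata))
        return object_id

    def get(self, object_id: str):
        # Retrieval needs only the ID, loosely analogous to an HTTP GET.
        return self._objects[object_id]


store = ObjectStore()
oid = store.put(b"<html>hello</html>", content_type="text/html")
data, meta = store.get(oid)
print(oid[:12], meta)
```

Because the caller never supplies a path, there is no directory tree to outgrow; the metadata travels with the object rather than living in a separate file system structure.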

