Dark data: shedding light on a growing problem for businesses
Mika Javanainen – Senior Director, Product Management at M-Files
While somewhat ominous, the term “dark data” effectively describes an important class of information that exists within nearly every company. It refers to all of the fragments and files that have been forgotten or lost within an organisation’s digital repositories.
What is dark data?
Many define dark data as information assets that are created and used only once, such as a .zip file that is received, stored, unzipped and then typically ignored forever. Even content that is actively used for a period of time can turn into dark data when organisational and project priorities change. Active information that becomes inactive is typically left where it was and is easily forgotten. To make matters worse, employees often recreate data when they can’t quickly find their copy. Duplication and recreation multiply the volume of data that subsequently goes dark.
Dark data – A fast-growing burden on the business
Massive volumes of neglected, unused information can grow just as fast as relevant, mission-critical data.
Dark data can also significantly increase expenses for eDiscovery and forensic data analysis. Any organisation that anticipates, or has to plan for, possible litigation will be hampered by huge amounts of dark data.
Dark data also impedes analytical efforts that involve culling data repositories. Results can be skewed by essentially bogus or out-of-date information, and at a minimum, time will be wasted filtering out dark data.
Rewiring information with metadata
Before dark data can see the light, it has to be identified. After that, enterprise content management (ECM) solutions that leverage metadata can inject a layer of intelligence capable of eradicating the dark data.
By attaching attributes or tags to content assets, a metadata-driven ECM solution can instantly identify the information assets that are related or relevant to other unstructured content assets, as well as to structured data objects. For example, a sales proposal (an unstructured content asset) can be tagged with a metadata attribute for “Customer A.” That proposal then becomes visible from within the CRM system, and can be linked to the CRM account for Customer A (a structured data object). In this way, metadata shines the light on previously dark data. All of the information assets related to Customer A can be displayed to decision makers in context with other related information.
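The Customer A example can be sketched in a few lines of Python. This is only an illustration of the idea, not how any particular ECM or CRM product works: the class names `ContentAsset` and `CrmAccount`, the helper functions, the file names and the `customer` attribute are all hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class ContentAsset:
    """An unstructured content asset (hypothetical), e.g. a document."""
    name: str
    metadata: dict = field(default_factory=dict)


@dataclass
class CrmAccount:
    """A structured data object (hypothetical), standing in for a CRM record."""
    customer: str


def build_index(assets, attribute):
    """Group asset names by the value of one metadata attribute."""
    index = defaultdict(list)
    for asset in assets:
        value = asset.metadata.get(attribute)
        if value is not None:  # untagged assets are skipped
            index[value].append(asset.name)
    return index


def related_assets(account, index):
    """Return the names of assets tagged with the account's customer."""
    return index.get(account.customer, [])


assets = [
    ContentAsset("proposal.docx", {"customer": "Customer A", "type": "sales proposal"}),
    ContentAsset("contract.pdf", {"customer": "Customer A", "type": "contract"}),
    ContentAsset("old_export.zip"),  # no metadata, so it stays "dark"
]

index = build_index(assets, "customer")
account_a = CrmAccount("Customer A")
print(related_assets(account_a, index))  # ['proposal.docx', 'contract.pdf']
```

The untagged .zip file never appears in any lookup, which is the point of the example: content only becomes visible alongside the CRM account once it carries the matching metadata attribute.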
Learning to see in the dark
With the ability to see and harness dark data, companies can make better use of all of their data. At a time when the volume of information continues to skyrocket, it makes sense to pay attention to dark data. Since a portion of dark data can still provide value, there are positive incentives for making it broadly visible.