Big data has become one of the more valuable assets held by enterprises, and virtually every large organization is making investments in big data initiatives.
That’s not an overstatement. A 2021 survey by NewVantage Partners found that 99% of senior C-level executives at Fortune 1000 companies said they’re pursuing a big data program. Perhaps even more significant, 96% reported that their companies have had success with their big data and artificial intelligence programs, 92% said the pace of their investments in these areas is accelerating and 81% voiced optimism about the future of big data and AI in their organizations.
Big data encompasses structured, semi-structured and unstructured data generated by people and computers. Big data’s value lies not in its quantity but in its role in informing decisions, generating insights and supporting automation, all critical to business success in the 21st century.
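To make the difference between the three forms concrete, here is a minimal Python sketch (not from the original article) in which the same hypothetical order appears as a CSV row, a JSON document and a free-text note. The data, function names and regular expression are illustrative assumptions; real unstructured data usually calls for text analytics rather than a single pattern.

```python
import csv
import io
import json
import re
from typing import Optional

# Hypothetical example: the same fact ("order 1001 cost 25.50 USD") captured three ways.
structured = "order_id,amount,currency\n1001,25.50,USD\n"  # fixed schema, like a table export
semi_structured = '{"order": {"id": 1001, "total": {"value": 25.5, "currency": "USD"}}}'  # self-describing, nested
unstructured = "Customer wrote: my order #1001 was charged $25.50, please confirm."  # free text


def amount_from_structured(text: str) -> float:
    # Columns are known in advance, so a CSV reader can map them by header name.
    row = next(csv.DictReader(io.StringIO(text)))
    return float(row["amount"])


def amount_from_semi_structured(text: str) -> float:
    # Keys describe the data, but the nesting must be navigated explicitly.
    doc = json.loads(text)
    return float(doc["order"]["total"]["value"])


def amount_from_unstructured(text: str) -> Optional[float]:
    # No schema at all: a fragile pattern match stands in for real text analytics.
    match = re.search(r"\$(\d+(?:\.\d+)?)", text)
    return float(match.group(1)) if match else None


for name, fn, data in [
    ("structured", amount_from_structured, structured),
    ("semi-structured", amount_from_semi_structured, semi_structured),
    ("unstructured", amount_from_unstructured, unstructured),
]:
    print(f"{name}: {fn(data)}")  # each line ends in 25.5
```

The effort needed to pull out the same value grows from a column lookup to tree navigation to pattern guessing, which is one reason variety complicates collection.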
“Companies need to invest in what the data can do for their business,” said Christophe Antoine, vice president of global solutions engineering at data integration platform provider Talend. But organizations that want to reap the benefits of big data must first effectively collect it — not so easy a feat given the volume, variety and velocity of data today.
Data collection is far from new, of course, since information gathering has been an ingrained practice for millennia. Moreover, researchers for centuries have been confounded in their attempts to manage and analyze overwhelming amounts of data.
Today, the volume, variety and velocity of data are so much greater that they warrant the term big data. By one widely cited estimate, the world now generates about 2.5 quintillion bytes of data every day. This data comes in three forms: structured, semi-structured and unstructured.
In big data collection, a company needs to identify the full range of sources generating its data. Common sources include the following:
No enterprise can collect and use all the data being created. So, business leaders need to build a big data collection program that identifies the data they need for their existing and future business use cases. Some experts believe enterprises should collect as much data as they can acquire to pilot innovative use cases, while others advise organizations to be more selective to avoid running up costs, complexity and compliance issues without getting any business value in return.
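For organizations that take the more selective approach, one simple way to express it is to map each approved use case to the fields it needs and drop everything else at collection time. The Python sketch below uses hypothetical use-case and field names; an organization favoring broad collection would simply skip the filtering step.

```python
# Hypothetical use cases and the fields each one needs; names are illustrative only.
USE_CASES = {
    "churn_model": {"customer_id", "tenure_months", "support_tickets"},
    "sales_dashboard": {"customer_id", "region", "order_total"},
}


def fields_to_collect(use_cases, selected):
    """Union of the fields required by the selected use cases."""
    needed = set()
    for name in selected:
        needed |= use_cases[name]
    return needed


def minimize(record, allowed):
    """Keep only the fields that some approved use case actually needs."""
    return {k: v for k, v in record.items() if k in allowed}


raw = {"customer_id": 42, "region": "EMEA", "order_total": 99.0,
       "birth_date": "1990-01-01", "browser": "Firefox"}
allowed = fields_to_collect(USE_CASES, ["sales_dashboard"])
print(minimize(raw, allowed))
# {'customer_id': 42, 'region': 'EMEA', 'order_total': 99.0}
```

Adding a future use case then becomes an explicit decision to widen the field list, which keeps cost and compliance exposure visible.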
Identifying useful data sources is just the start of the big data collection process. From there, an organization must build a pipeline that moves data from where it is generated to the enterprise locations where it will be stored for organizational use. Most commonly, this data ingestion process involves three overarching steps: extract, transform and load (ETL).
Data management teams face additional considerations and requirements at each of these steps, such as how to ensure the data they’ve identified for use is reliable and how to prepare it for use.
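The three ETL steps, together with the reliability and preparation checks mentioned above, can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not a production pipeline: the CSV export, column names and validation rule are hypothetical, and an in-memory SQLite database stands in for the enterprise data store.

```python
import csv
import io
import sqlite3

# Hypothetical raw export; in practice this would come from an application, log or API.
RAW_CSV = """order_id,customer,amount
1001,alice,25.50
1002,bob,
1003,carol,not-a-number
1004,dave,12.00
"""


def extract(text):
    """Extract: read raw rows from the source."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(rows):
    """Transform: validate and normalize rows, setting aside those that fail checks."""
    clean, rejected = [], []
    for row in rows:
        try:
            amount = float(row["amount"])
        except (TypeError, ValueError):
            rejected.append(row)  # keep failures for later inspection
            continue
        clean.append((int(row["order_id"]), row["customer"].strip().title(), amount))
    return clean, rejected


def load(conn, rows):
    """Load: write validated rows to the target store."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS orders "
        "(order_id INTEGER PRIMARY KEY, customer TEXT, amount REAL)"
    )
    conn.executemany("INSERT INTO orders VALUES (?, ?, ?)", rows)
    conn.commit()


conn = sqlite3.connect(":memory:")
clean, rejected = transform(extract(RAW_CSV))
load(conn, clean)
print(conn.execute("SELECT COUNT(*), SUM(amount) FROM orders").fetchone())  # (2, 37.5)
print(f"rejected {len(rejected)} rows")  # rejected 2 rows
```

Rejected rows are kept rather than silently dropped so that a data management team can trace quality problems back to their source.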
“Data determines the uses you can have, and desired applications determine the data you will need,” said David Belanger, senior research fellow at the Stevens Institute of Technology School of Business and retired chief scientist at AT&T Labs. “Once you know the sources, there are a number of questions to be answered: Where can I get the data I need? Is the source reliable? What are its properties, for example, velocity, stream, transaction, purchased? What is its quality? Is it internally or externally sourced? etc.”
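Several of Belanger's questions (Is the source reliable? What are its properties? What is its quality? Is it internally or externally sourced?) can be recorded and partly checked in code. The Python sketch below is a hypothetical illustration: the `SourceProfile` fields, the completeness metric and the 95% threshold are assumptions chosen for the example, not an industry standard.

```python
from dataclasses import dataclass


@dataclass
class SourceProfile:
    """Hypothetical record of the answers to the sourcing questions for one source."""
    name: str
    origin: str           # "internal" or "external"
    delivery: str         # e.g. "stream", "transaction", "batch", "purchased"
    owner_verified: bool  # has someone vouched for the source's reliability?


def completeness(records, fields):
    """Share of non-empty values per field across a sample of records."""
    if not records:
        return {f: 0.0 for f in fields}
    total = len(records)
    return {f: sum(1 for r in records if r.get(f) not in (None, "")) / total for f in fields}


def assess(profile, records, fields, min_completeness=0.95):
    """List issues with a source; the 95% default is an arbitrary example threshold."""
    issues = []
    if not profile.owner_verified:
        issues.append("reliability not verified")
    for field, share in completeness(records, fields).items():
        if share < min_completeness:
            issues.append(f"{field} only {share:.0%} complete")
    return issues


sample = [{"id": 1, "email": "a@example.com"}, {"id": 2, "email": ""},
          {"id": 3, "email": "c@example.com"}, {"id": 4, "email": "d@example.com"}]
profile = SourceProfile("crm_export", "internal", "batch", owner_verified=True)
print(assess(profile, sample, ["id", "email"]))  # ['email only 75% complete']
```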
Not surprisingly, many businesses struggle with these questions. “There are all kinds of challenges — technical challenges, organizational and sometimes compliance challenges,” said Max Martynov, CTO at digital transformation service provider Grid Dynamics. These challenges can include the following:
Such challenges within the data collection process mirror the challenges that executives cite as barriers to developing their big data initiatives overall. The NewVantage study, for example, found that 92% of respondents identified culture — people, business processes, change management — as the biggest challenge to becoming a data-driven organization, while just 8% identified technology limitations as the leading barrier.
Experts advise business leaders to develop a strong data governance program to help address those challenges, particularly security- and privacy-related challenges. “You don’t want to hurt access, but you do need to put the right governance in place to protect your data,” Talend’s Antoine noted.
A good governance program should establish the processes needed to dictate how the data is collected, stored and used and ensure that the organization does the following:
Such steps help secure and protect data to ensure regulatory compliance. Moreover, experts said these measures help the business to trust its data — an important part of becoming a data-driven organization.
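As one concrete example of a governance control, direct identifiers can be pseudonymized before data is stored. The Python sketch below uses keyed hashing (HMAC-SHA256 from the standard library); the field list and key handling are hypothetical. Pseudonymized data is not anonymous, since anyone holding the key can re-link records, so this complements rather than replaces access controls.

```python
import hashlib
import hmac

# Hypothetical policy: fields treated as direct identifiers, never stored in clear text.
PII_FIELDS = {"email", "phone"}
# Assumption: in production the key would come from a secrets manager, not source code.
SECRET_KEY = b"replace-with-a-managed-secret"


def pseudonymize(value: str, key: bytes = SECRET_KEY) -> str:
    """Replace an identifier with a keyed HMAC-SHA256 digest (stable, so joins still work)."""
    return hmac.new(key, value.encode("utf-8"), hashlib.sha256).hexdigest()


def apply_policy(record: dict) -> dict:
    """Pseudonymize PII fields before the record is stored; pass other fields through."""
    return {
        k: pseudonymize(str(v)) if k in PII_FIELDS and v is not None else v
        for k, v in record.items()
    }


record = {"customer_id": 42, "email": "alice@example.com", "region": "EMEA"}
protected = apply_policy(record)
print(protected["region"], len(protected["email"]))  # EMEA 64
```

Because the digest is deterministic for a given key, analysts can still count or join on customers without seeing their email addresses.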
To build a successful, secure process for big data collection, experts offered the following best practices:
Taken directly from: https://searchdatamanagement.techtarget.com/feature/Big-data-collection-processes-challenges-and-best-practices?utm_campaign=20210903_ERU+Transmission+for+09%2F03%2F2021+%28UserUniverse%3A+328901%29&utm_medium=EM&utm_source=ERU&src=8192149&asrc=EM_ERU_178630315&utm_content=eru-rd2-rcpG