In many of the analytic projects I have been involved in, whether big or small, providing guidance to those using the data increases the adoption and long-term value of the solution. The following three-part guide and resource list cover the minimum topics you should consider as people in your organization face these issues.
Data governance initiatives face an immediate requirement: determine how to secure data usage, manage activity, and gain visibility into and control of one of your most important assets.
Whether on-premises or in the cloud, more employees are using and sharing an ever-increasing amount of data throughout your organization. The key to this process is that while you need governance, you cannot slow down your organization’s ability to innovate and create impactful solutions.
One of the keys to success is to have a set of Data Governance Principles so everyone understands how to safely use the data. This will help your team not to reinvent the wheel every time they start a project.
Part 1: Why do we Need Data Governance?
In the simplest terms, a data governance process is about managing data as a strategic asset rather than just collecting it. It involves ensuring that there are controls in place around data: its content, structure, use, and safety. The most prominent example is how you use and track any personally identifying information.
As modern business data usage evolves, it embraces advanced analytics, artificial intelligence, and machine learning, which drive the amount, velocity, and variety of data in play. With all that data comes a wealth of new possibilities and a new set of challenges. Our main outcome, and this is important, is to optimize the management and governance of this ever-greater amount of data.
Data Breach Statistics: How big of a problem?
Hundreds of millions of people are at risk of identity theft or other harm due to recent large-scale data breaches of public and private entities. An important factor turned out to be that some of these companies had poor hygiene on information they were keeping. The need for a set of Data Governance policies, procedures, and technologies could not be more important.
Jennifer Kurtz, in the National Institute of Standards and Technology (NIST) article 20 Cybersecurity Statistics Manufacturers Can’t Ignore (Feb 2020), cited several important data breach statistics that drive the danger home.
- An estimated 74% of companies have more than 1,000 stale sensitive files. (Varonis)
- An estimated 41% of companies have more than 1,000 sensitive files including credit card numbers and health records left unprotected. (Varonis)
- An estimated 21% of all files are not protected in any way. (Varonis)
The Danger of Dark Data?
In some cases, your organization may not even be aware of some data being collected. Wikipedia defines dark data as data that is acquired but never really used to drive insights or decision-making. The real issue is that Dark Data flies under the radar in most organizations. This can cause issues later on, not only in the cost of storage and processing but also as an unneeded risk.
Need: Ability to discover and catalog data in your various environments.
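As a rough illustration of that discovery need, the sketch below walks a folder and lists files that have not been modified for a long time. It is only a starting point under stated assumptions: the function name find_stale_files, the 365-day threshold, and the use of modification time as a stand-in for "stale" are all hypothetical choices, not a standard definition of dark data.

```python
import os
import time
from pathlib import Path


def find_stale_files(root, max_age_days=365, now=None):
    """Return sorted (path, age_in_days) pairs for files older than max_age_days.

    Age is measured from the last modification time, which is an
    illustrative assumption rather than a formal definition of "stale".
    """
    now = time.time() if now is None else now
    cutoff_seconds = max_age_days * 24 * 60 * 60
    stale = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = Path(dirpath) / name
            age_seconds = now - path.stat().st_mtime
            if age_seconds > cutoff_seconds:
                stale.append((str(path), int(age_seconds // 86400)))
    return sorted(stale)
```

Calling find_stale_files on a shared drive path would give a first list of candidates to review with the data owners before anything is archived or deleted.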
Penalties if you do not get Data Governance correct – GDPR
The most dangerous aspect of all of your data being shared around your organization is that you are only an Excel data pull away from a data breach. This is especially true with regulations such as GDPR, where mistakes can be costly not only in reputation but also financially.
A breach is a danger not only to the data but also to your business and reputation. An IT Information Week survey reviewing cybercrime showed that 10% of breached small businesses shut down in 2019. (National Cybersecurity Alliance)
A Data Governance program helps minimize this danger by putting controls in place to help manage your data estate without blocking data solutions that will help you gain a competitive advantage in today’s business.
Data Governance Case Study
Two examples from my consulting days help illustrate what you are up against and define the scope of the problem.
Danger of what Data Might be Surfaced
I was helping a client enable search tools over SAN storage. There were rules that we used, as consultants, that helped make sure a client knew what they were getting into. With so much data, if you just throw it open to search, you may not realize what you will bring to the surface. Security by obscurity never really works.
We had a couple of gotcha searches that we would take management aside to review, such as executive salary, layoffs, popular movies, images, and content from various file types. I always found something that shocked them.
Do I have control of how my Customer’s Data is used?
In a more recent example on the data side, clients can be surprised by how many copies of their customer information they have, how out of date it can be, and what personal customer data is shared around the company. The number of databases holding this data is almost always a surprise.
Lesson Learned: Without a plan, you invite issues. This was usually the best way to start the governance discussion.
What is a data governance strategy?
A data governance strategy is an integrated approach to managing confidential business information that involves applying policies and procedures to your organization’s various data activities. It’s based on the belief that a company’s data should be treated as a critical asset and used to help improve business operations. Data governance is not so much about restricting access to information but making sure that your organization has the right policies and procedures in place to protect data that may contain sensitive or confidential information.
Part 2: Creating a Data Governance Framework
The important point to know about starting any data governance project is that you should not start from scratch. Just like any project, there are resources you can leverage to get you started. This section provides various subject areas you can look at when creating your own framework.
The following are some key points I like to keep in mind for most of my projects.
- “Don’t try and boil the ocean” is important here. The easiest thing to do is start with something small and grow into it. You have to have little victories in a project to build momentum.
- You need executive buy-in as this project will cross many departments and functions. You will find that you need executive level support to help push things along.
- You will not be successful unless you know who has ownership and can be held accountable for their portion of the data estate. You need to define the organizational roles and responsibilities of the various team members you need.
- Balance the focus on the process and the tools. Having the best governance plan will not succeed if your users cannot find the information or find the process or tools too onerous to use.
- As you move through the project, always keep in mind the end goals for why you are doing this, such as:
- To improve the data quality
- To improve data management
- To make finding and using data easier
- To improve data security and compliance
- (Add in any of your goals to the list)
How can a General-Purpose Data Governance Framework Help?
If you are just starting out, the best place to begin is with a general-purpose framework, while keeping the keys to success from above in mind.
A general-purpose data governance framework is a set of policies and processes that can be applied to most organizations. It may not be tailored to your organization’s individual needs, but it may be easier to start with and implement because it doesn’t require any significant changes to your systems or infrastructure.
Some companies use a combination of these frameworks alongside the more individualized approaches. However, because it is nearly impossible for a framework to address every need, you may still need a flexible approach to achieve the full benefits of data governance in your situation.
The Data Governance Institute (DGI) (LinkedIn) is an organization that provides vendor-neutral data governance guidance. They have published the resource Data Governance Framework & Components, which provides a good overview and includes the following two whitepapers:
- In Depth: The DGI Data Governance Framework
- How to Use The DGI Data Governance Framework to Configure Your Program
What is a Data Governance Framework?
A data governance framework is a great place to start when designing a set of policies and processes. This will provide guidance and be used to improve data quality, security, privacy, and compliance. Companies with poor data quality or security can suffer from many problems, including low customer trust in their company to handle personal information securely.
Data governance frameworks can be complex, but it’s worth the investment because they will strengthen your company’s security for data collection, storage, and usage from both internal and external threats. This does not mean that you cannot start small and build into your plan.
Your plan should take the following items into consideration, which are covered in the next few sections.
Set Up a Data Governance Center of Excellence (COE)
In many of the analytic projects I have been involved in, whether big or small, you have to guide those using the data solution you create.
A center of excellence centralizes resources, guidance, and up-to-date assistance for those using data in your organization. The main outcomes are to maintain consistency in delivering high-quality data solutions and to make sure time is not wasted reinventing the wheel with each project.
Resource: Establish a Center of Excellence – Power BI – This link provides more detail on a Center of Excellence example from the Power BI side of the business.
The following 7 items should be key tenets for your Center of Excellence.
1. Provide Data Governance Principles
This documents your organization’s overall approach to data: how you collect it, what you should collect, how you store it, how long you keep it, and who should have access to it. Having these principles front and center also serves as a reminder to the teams.
2. All Actions and Data Must be Auditable
You need to be able to report and monitor your progress by tracking various metrics, covering not only the current status of your data but also how it changes over time. For example, you need to be able to audit based on items in your governance program, such as access to and usage of various data sources, who is seeing various fields, and how reports are shared.
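One way to picture an auditable trail is a log of access events that can be rolled up into a usage report. The sketch below is a minimal illustration only: the AccessEvent fields, the action names, and the sample events are hypothetical, and a real program would read these from whatever audit logs your platforms already produce.

```python
from collections import Counter
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class AccessEvent:
    """One audited action against a data source (hypothetical schema)."""
    timestamp: datetime
    user: str
    data_source: str
    action: str  # for example "read", "export", or "share"


def usage_by_source(events, action=None):
    """Count audited events per data source, optionally for a single action."""
    return Counter(
        e.data_source for e in events if action is None or e.action == action
    )


# Made-up sample events for illustration.
events = [
    AccessEvent(datetime(2024, 1, 5, 9, 0), "ana", "sales_db", "read"),
    AccessEvent(datetime(2024, 1, 5, 9, 30), "ben", "sales_db", "export"),
    AccessEvent(datetime(2024, 1, 6, 14, 0), "ana", "hr_db", "read"),
]

print(usage_by_source(events))
print(usage_by_source(events, action="export"))
```

Keeping the timestamp on every event is what makes it possible to track the same report over time rather than only as a current snapshot.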
3. People must have Accountability
Someone must own and be responsible for the data. You have to make sure that, for every report or visual, the user knows who to contact with questions or to report any issues.
In order for data consumers to have faith in and trust the data, they need to know who the Subject Matter Experts are and who is available to respond to questions. There has to be a culture of responsibility, because once users lose faith in the validity of the data, it can be over.
4. Data Formula and Calculations
I have been involved in projects where each department had a different way of calculating certain metrics. This can lead to everyone saying their number is correct or coming up with their own calculations.
For example, Margin% is one of those calculations that seems simple: divide Gross Profit by Revenue. But what do you include to calculate Gross Profit? Some were using Operating Profit, some Net Profit.
Each calculation could be correct for how that department looks at their results, but on reports what does a 30% Gross Profit mean? Having a clearinghouse of data formulas and calculations allows different groups to see what is behind a number and what calculations should be on a specific report.
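To make the Margin% problem concrete, here is a small worked example. The figures are made up for illustration, and the three profit definitions follow the usual accounting order (gross, then operating, then net). The same revenue produces three different "margin" numbers depending on which profit a department plugs in.

```python
def margin_pct(profit, revenue):
    """Return profit as a percentage of revenue."""
    if revenue == 0:
        raise ValueError("revenue must be non-zero")
    return 100.0 * profit / revenue


# Hypothetical figures for one period.
revenue = 1000.0
cost_of_goods_sold = 700.0
operating_expenses = 150.0
interest_and_taxes = 50.0

gross_profit = revenue - cost_of_goods_sold
operating_profit = gross_profit - operating_expenses
net_profit = operating_profit - interest_and_taxes

margins = {
    "gross": margin_pct(gross_profit, revenue),
    "operating": margin_pct(operating_profit, revenue),
    "net": margin_pct(net_profit, revenue),
}

for name, value in margins.items():
    print(f"{name} margin: {value:.1f}%")
```

With these numbers the three departments would report 30%, 15%, and 10% for what each of them calls margin, which is exactly the kind of disagreement a shared formula definition is meant to settle.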
5. Data Architectures
A data architecture combines the data flow models, security methods, and various integration patterns that have been tried and tested. Product evaluations and decisions require an extended process and many different departments. How a data solution is architected, including the approved products that can be used, should come from and reference the COE.
6. Data Security & Privacy Policies
Not only do you need to protect corporate data but also data collected from your customers. You need to define who should have access to what data. Clear security principles need to be front and center.
This is very important to your customer success. With security breaches, if people and organizations do not trust how you handle their data, you lose them as customers.
7. Governance Review Before Development is Productionalized
Ensure that every data project has a data governance document review and sign-off before production. The development team needs to have reviewed the governance requirements during the planning phase but, most importantly, needs to present its solution for review before release to production. It is much easier to do this before issues turn up in production.
What is a Data Inventory?
Taking an inventory of your data can be a complicated process, but it gives you the knowledge you need to build a strong foundation for your company’s future growth. Your inventory can also help you identify areas where new policies or procedures need to be implemented to protect sensitive information or improve the quality of your data. In addition, a well-designed data inventory can also serve as an excellent reference tool for training purposes.
A data inventory or data catalog of your organization’s data would first and foremost include information on where it’s stored and what’s contained within it. It will also provide you with information on how your business uses that data and who has access to it.
Things to keep in mind:
- Who is the Data Owner
- Who is the Subject Matter Expert (SME)
- Documentation of formula and calculations
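The items above can be sketched as a single inventory record. This is a minimal illustration, not a catalog product's schema: the CatalogEntry fields and the missing_governance_fields helper are hypothetical names chosen to mirror the points in this section (where the data lives, what it contains, who owns it, who the SME is, who has access, and which formulas are documented).

```python
from dataclasses import dataclass, field


@dataclass
class CatalogEntry:
    """One data asset in the inventory (hypothetical schema)."""
    name: str
    location: str
    description: str
    owner: str
    sme: str
    users_with_access: list = field(default_factory=list)
    formulas: dict = field(default_factory=dict)


def missing_governance_fields(entry):
    """List the governance fields that are still blank on an entry."""
    required = ("owner", "sme", "description")
    return [name for name in required if not getattr(entry, name).strip()]


# Made-up example entry with no owner assigned yet.
entry = CatalogEntry(
    name="customer_master",
    location="warehouse.crm.customers",
    description="One row per customer account",
    owner="",
    sme="Sales Operations analyst",
    formulas={"Margin%": "Gross Profit / Revenue"},
)

print(missing_governance_fields(entry))
```

A check like this can be run across the whole inventory to find assets that nobody has claimed yet.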
What is Data Classification?
Data classification involves sorting information into categories according to the types of details it contains. This helps you to identify the data types that are most important and need to be protected while identifying information that can be shared more freely. A simple way to classify information is to assign each piece a unique label based on its value or function. However, you could use many other classification methods as well, such as by using different colors, dimensions, and even numbers to help identify different pieces of information.
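As a simple illustration of label-based classification, the sketch below assigns a sensitivity label to a column from its name and a few sample values. The labels ("restricted", "confidential", "internal"), the name hints, and the email pattern are hypothetical and deliberately crude; a real program would use the categories from your own governance principles and a proper scanning tool.

```python
import re

# Very rough email check, good enough for a demonstration only.
EMAIL_PATTERN = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")

# Column-name fragments that suggest highly sensitive content (assumed list).
SENSITIVE_NAME_HINTS = ("ssn", "salary", "card", "birth")


def classify_column(name, sample_values):
    """Return a sensitivity label for a column based on name and samples."""
    lowered = name.lower()
    if any(hint in lowered for hint in SENSITIVE_NAME_HINTS):
        return "restricted"
    if any(EMAIL_PATTERN.match(str(value)) for value in sample_values):
        return "confidential"
    return "internal"


print(classify_column("card_number", ["4111"]))
print(classify_column("contact", ["a@example.com"]))
print(classify_column("region", ["West"]))
```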
What is Data Discovery?
Data discovery is a process of searching for and extracting information. Data discovery tools can help you find specific pieces of information by searching for specific keywords, tags, or unique characteristics. Data discovery can be used to find large amounts of data to help you identify trends or patterns. Knowing how to use these tools effectively can help you build an actionable view of your company’s data and result in increased employee productivity and cost savings when looking for solutions that will solve your business problems.
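The keyword and tag search described above can be sketched over a small in-memory catalog. The catalog contents and the search_catalog function are hypothetical and stand in for whatever discovery tool you adopt.

```python
def search_catalog(catalog, keyword=None, tag=None):
    """Return names of entries matching the keyword and carrying the tag."""
    matches = []
    for entry in catalog:
        text = f"{entry['name']} {entry['description']}".lower()
        if keyword is not None and keyword.lower() not in text:
            continue
        if tag is not None and tag not in entry["tags"]:
            continue
        matches.append(entry["name"])
    return matches


# Made-up catalog entries for illustration.
CATALOG = [
    {"name": "customers", "description": "Customer master records", "tags": {"pii", "crm"}},
    {"name": "orders", "description": "Order lines by customer", "tags": {"sales"}},
    {"name": "stores", "description": "Store reference list", "tags": {"reference"}},
]

print(search_catalog(CATALOG, keyword="customer"))
print(search_catalog(CATALOG, tag="pii"))
```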
What is Data Mapping & Lineage?
Data mapping shows the relationships between different types of data and the dependencies between them. It looks at the various sources your data comes from and how that information is used. Mapping takes into account the movement and matching of fields from one database to another.
Data mapping can help you identify any common or repeating patterns, which are often a sign that your business should track that type of data more closely or implement new rules about how it’s accessed and stored. It can also show which systems in your company use certain types of data more than others. This can also help you to build a more thorough understanding of your organization’s needs and capabilities.
Data lineage is the process of understanding and showing the full context of your data. Think of it as the visualization of the workflow path and transformations your data goes through: mapping the fields from one source to another and showing the transformation that occurs as the data moves between parts of your information architecture.
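Lineage can be thought of as a graph in which each dataset points to the datasets it is built from. The sketch below is a minimal illustration with made-up dataset names; upstream_sources is a hypothetical helper that walks the graph breadth-first to answer "where did this report's data come from?".

```python
from collections import deque

# Each key is a dataset; its value lists the datasets it is built from.
LINEAGE = {
    "sales_report": ["sales_mart"],
    "sales_mart": ["orders_clean", "customers_clean"],
    "orders_clean": ["orders_raw"],
    "customers_clean": ["crm_export"],
}


def upstream_sources(dataset, lineage):
    """Return every dataset that feeds `dataset`, directly or indirectly."""
    seen = set()
    queue = deque(lineage.get(dataset, []))
    while queue:
        current = queue.popleft()
        if current in seen:
            continue  # already visited; also guards against cycles
        seen.add(current)
        queue.extend(lineage.get(current, []))
    return seen


print(sorted(upstream_sources("sales_report", LINEAGE)))
```

The same structure, traversed in the other direction, would show which reports are affected when a source system changes.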
The following screen capture is an example from Azure Purview, Microsoft’s data governance application currently in preview. Through scanning data sources in your organization, the sources can be mapped together to form a lineage. This is a high-level image but the application allows the objects to be drilled into for more detail.
Part 3: Adding in Data Quality
A Data Quality Framework is a set of policies and procedures used to ensure that your company’s data is accurate, complete, and current. It can be fairly complex but should provide you with the tools you need to help meet your team’s needs and reduce the risk of losing significant quantities of important information.
A standard data quality framework is one in which standards or guidelines have been established for different types of data. These standards typically cover how the information should be stored, the format it should be presented in, and the type of analysis that should be performed on it before it’s made available to users.
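Two of the qualities named above, complete and current, can be measured with very little code. The sketch below is an illustration under assumptions: quality_report, its parameters, and the 90-day freshness window in the usage example are hypothetical, and accuracy is left out because it needs a trusted reference to compare against.

```python
from datetime import date


def quality_report(rows, required_fields, date_field, as_of, max_age_days):
    """Return completeness and currency ratios for a list of dict records.

    completeness: share of rows where every required field is filled in.
    currency: share of rows updated within max_age_days of as_of.
    """
    total = len(rows)
    if total == 0:
        return {"completeness": None, "currency": None}
    complete = sum(
        all(row.get(f) not in (None, "") for f in required_fields)
        for row in rows
    )
    current = sum(
        row.get(date_field) is not None
        and (as_of - row[date_field]).days <= max_age_days
        for row in rows
    )
    return {"completeness": complete / total, "currency": current / total}
```

For example, quality_report(rows, ["id", "email"], "updated", date(2024, 1, 31), 90) would report what fraction of customer rows have both an id and an email and what fraction were touched in the last 90 days.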
The following sections provide an overview of the different areas of data quality.
Metadata Management is Part of Data Quality
Metadata is the information about the data contained in your data estate. This can be descriptive, administrative, reference, or other information you want to provide to people using your data. The main benefit is that it helps you maintain order and can provide solution developers with approved and validated data for their projects.
A great example is being able to have various reference data available for data consumers to build from. Rather than having each project create its own store listing or even a date table, a central location that each project can use will cut down development time and improve the interoperability of various solutions.
What is real-time data quality?
A real-time data quality framework is one in which standards or guidelines for different types of data are determined and followed throughout the day. Different systems could operate on their own schedules, depending on the kind of information they’re monitoring. A real-time framework can be implemented so that your people and systems can keep up with current information trends, making sure that you know about any new issues or changes before they become significant.
Whether on-premises or in the cloud, more employees are using and sharing an ever-increasing amount of data throughout your organization. Data Governance provides guidance on how to use, maintain and secure your data. With the increase in data breaches, it is more important than ever to get a handle on and secure your data while not putting roadblocks on your analysts who need this data to make decisions and grow your business.