The purpose of this post is to provide a high-level plan for implementing the DataOps Suite.
Evaluate Project and System Requirements
- Understand project testing requirements for Data-in-Motion (data comparison, data quality) and Data-at-Rest (data quality)
- Understand the current testing process
- Prepare a list of Data Sources for testing: Relational, Flat Files, Big Data, Cloud, NoSQL, etc.
- Understand network and system access requirements for the above Data Sources
- Estimate data volumes for testing
- Estimate the number and types of test cases for the initial project delivery
- Estimate the number of users (QA, Development)
- Determine DevOps strategy for CI/CD of ETL and test cases
- Evaluate Application Lifecycle Management and reporting requirements for test results
Evaluate Security Requirements
- Understand the team and project structure
- Categorize teams/projects into groups so that their tests can be kept separate in DataOps
- Identify the DataOps engine (Standalone/EMR/Databricks) based on the project structure and availability of resources
- Evaluate team access requirements for creating tests and administering DataOps
- Understand data security requirements between teams
- Understand DataOps deployment requirements across environments, time zones, and regions
Installation & Setup
- Estimate DataOps Server and DataOps Engine hardware sizing based on data volumes and the number of users/tests
- Estimate hardware for DataOps Servers based on the number of environments/regions
- Procure and set up hardware for the DataOps Server and engine in a network location close to the Data Sources
- Configure network access from the server and engine machines to the Data Sources (open the required ports)
- Verify access to the Data Sources from the server and engine machines (see the connectivity sketch after this list)
- Install the DataOps Server and Repository Database (PostgreSQL or Oracle)
- Install the engine (Standalone/EMR/Databricks) based on project requirements and configure it in DataOps
- Create Data Source connections and test them
- Set up the DataOps CLI tool for CI/CD integration
- Set up a backup/recovery process for the DataOps Repository (see the backup sketch after this list)
- Configure SMTP settings for email notifications (see the SMTP test after this list)
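Before installing anything, it helps to confirm that the planned server and engine machines can actually reach each Data Source over the network. The sketch below is a minimal connectivity check, assuming hypothetical hostnames and default ports; replace them with your own endpoints.

```python
import socket

# Hypothetical Data Source endpoints; replace with your actual hosts and ports
DATA_SOURCES = {
    "warehouse-db": ("warehouse.example.internal", 1521),   # Oracle default port
    "postgres-repo": ("repo-db.example.internal", 5432),    # PostgreSQL default port
    "hive-server": ("hive.example.internal", 10000),        # HiveServer2 default port
}

def check_port(host: str, port: int, timeout: float = 5.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

if __name__ == "__main__":
    for name, (host, port) in DATA_SOURCES.items():
        status = "reachable" if check_port(host, port) else "UNREACHABLE"
        print(f"{name:15s} {host}:{port} -> {status}")
```

Run this from both the server and the engine machines; any endpoint reported as unreachable points to a firewall rule or port that still needs to be opened.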
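For the backup/recovery process, a scheduled dump of the Repository database is usually sufficient. The following is a sketch assuming a PostgreSQL-based DataOps Repository with hypothetical host, database, and user names (credentials resolved via ~/.pgpass); an Oracle repository would use Data Pump or RMAN instead.

```python
import datetime
import subprocess
from pathlib import Path

# Hypothetical connection details for a PostgreSQL-based DataOps Repository
REPO_HOST = "repo-db.example.internal"
REPO_PORT = "5432"
REPO_DB = "dataops_repo"
REPO_USER = "dataops_backup"          # password supplied via ~/.pgpass
BACKUP_DIR = Path("/backups/dataops")

def backup_repository() -> Path:
    """Write a compressed, custom-format dump of the Repository database."""
    BACKUP_DIR.mkdir(parents=True, exist_ok=True)
    stamp = datetime.datetime.now().strftime("%Y%m%d_%H%M%S")
    target = BACKUP_DIR / f"{REPO_DB}_{stamp}.dump"
    subprocess.run(
        ["pg_dump", "-h", REPO_HOST, "-p", REPO_PORT, "-U", REPO_USER,
         "-F", "c", "-f", str(target), REPO_DB],
        check=True,
    )
    return target

if __name__ == "__main__":
    print(f"Backup written to {backup_repository()}")
```

Schedule this (for example via cron) and periodically test a restore with pg_restore so the recovery side of the process is proven, not just the backup side.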
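A quick way to validate the SMTP settings before entering them in DataOps is to send a test message from the server machine. A minimal sketch, assuming a hypothetical relay host and addresses:

```python
import smtplib
from email.message import EmailMessage

# Hypothetical SMTP settings; use the same values you plan to configure in DataOps
SMTP_HOST = "smtp.example.internal"
SMTP_PORT = 587
SENDER = "dataops-notifications@example.com"
RECIPIENT = "qa-team@example.com"

msg = EmailMessage()
msg["Subject"] = "DataOps SMTP configuration test"
msg["From"] = SENDER
msg["To"] = RECIPIENT
msg.set_content("If you received this message, the SMTP settings are working.")

with smtplib.SMTP(SMTP_HOST, SMTP_PORT) as server:
    server.starttls()                  # drop this line if the relay does not support TLS
    # server.login("user", "password") # uncomment if the relay requires authentication
    server.send_message(msg)

print("Test notification sent")
```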
Provision & Train Users
- Set up Containers based on team/project requirements
- Set up Data Source connections per team/project requirements
- Provision users to the appropriate DataOps Containers
- Train users on DataOps Data Flow, Data Quality, and TDM: videos, use cases, instructor-led sessions
Plan, Create and Execute Tests
- Understand the different types of tests in DataOps and put together a plan for creating them
- Create data flows (test cases) based on use cases and user stories as part of your project plan
- Use parameters to reduce the changes needed to test cases (for example, when connection or filter values differ between runs)
- Group tests into pipelines so they can be executed together
- Set up notifications for pipelines
- Schedule pipeline execution
- Automate data flow and pipeline runs with the DataOps CLI tool or REST API as part of your CI/CD process (see the sketch after this list)
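How the automation step looks depends on your DataOps version and the CLI/REST API it exposes, so consult the product documentation for the exact commands and endpoints. The sketch below only illustrates the overall pattern (trigger a pipeline run from a CI/CD job, poll until it finishes, and fail the build on test failures); the URLs, paths, field names, and status values are hypothetical placeholders.

```python
import os
import time

import requests

# All URLs, paths, field names, and statuses below are illustrative placeholders;
# substitute the actual endpoints documented for your DataOps Suite version.
BASE_URL = os.environ.get("DATAOPS_URL", "https://dataops.example.internal")
API_TOKEN = os.environ["DATAOPS_TOKEN"]        # injected by the CI/CD system
PIPELINE_ID = os.environ["PIPELINE_ID"]

headers = {"Authorization": f"Bearer {API_TOKEN}"}

# Trigger a pipeline run (hypothetical endpoint and response field)
run = requests.post(f"{BASE_URL}/api/pipelines/{PIPELINE_ID}/run",
                    headers=headers, timeout=30)
run.raise_for_status()
run_id = run.json()["runId"]

# Poll for completion (hypothetical status values)
while True:
    status = requests.get(f"{BASE_URL}/api/runs/{run_id}",
                          headers=headers, timeout=30).json()["status"]
    if status in ("PASSED", "FAILED", "ERROR"):
        break
    time.sleep(30)

print(f"Pipeline run {run_id} finished with status {status}")
if status != "PASSED":
    raise SystemExit(1)   # non-zero exit marks the CI/CD stage as failed
```

The same pattern applies if your CI/CD job wraps the DataOps CLI tool instead of calling the REST API directly: trigger the run, wait for completion, and surface the result as the job's exit status.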