The purpose of this post is to provide a high-level plan for implementing the DataOps suite.

Evaluate Project and System Requirements

  •     Understand project testing requirements for Data-in-Motion (data comparison, data quality) and Data-at-Rest (data quality)
  •     Understand the current testing process
  •     Prepare a list of Data Sources for testing: Relational, Flat Files, Big Data, Cloud, NoSQL, etc.
  •     Understand network and system access requirements for the above Data Sources 
  •     Estimate data volumes for testing
  •     Estimate the number and types of test cases for the initial project delivery
  •     Estimate the number of users (QA, Development)
  •     Determine DevOps strategy for CI/CD of ETL and test cases
  •     Evaluate Application Lifecycle Management and reporting requirements for test results

Evaluate Security Requirements

  •     Understand the team and project structure
  •     Categorize teams/projects into groups to keep their tests separate in DataOps
  •     Identify the DataOps engine (Standalone/EMR/Databricks) based on the project structure and available resources
  •     Evaluate team access requirements for creating tests and administering DataOps
  •     Understand data security requirements between teams
  •     Understand DataOps deployment requirements for different environments/timezones/regions

Installation & Setup

  •     Estimate DataOps Server and DataOps Engine hardware sizing based on data volumes and the number of users/tests
  •     Estimate hardware for DataOps Servers based on the number of environments/regions
  •     Procure and set up hardware for the DataOps Server and Engine in a network location close to the data sources
  •     Configure network access for server and engine machines to Data Sources (open ports)
  •     Verify access to Data Sources from the server and engine machines 
  •     Install the DataOps Server and Repository Database (PostgreSQL or Oracle)
  •     Install the engine (Standalone/EMR/Databricks) based on user requirements and configure it in DataOps
  •     Create Data Source connections and test them
  •     Set up the DataOps CLI tool for CI/CD integration
  •     Set up a backup/recovery process for the DataOps Repository
  •     Configure SMTP settings for email notifications
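
The "verify access" step above can be partly scripted before any connections are created in the tool. The following is a minimal sketch, assuming plain TCP reachability is a useful first check; it uses only the Python standard library and is not part of the DataOps suite. The function names are hypothetical, and a successful connection only shows that the port is open, not that credentials or drivers work.

```python
import socket


def can_connect(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name resolution failures.
        return False


def check_sources(sources: dict) -> dict:
    """Map each data source name to True/False reachability.

    `sources` maps a label to a (host, port) tuple, for example
    {"repository-db": ("db.example.internal", 5432)} (hypothetical host).
    """
    return {name: can_connect(host, port) for name, (host, port) in sources.items()}
```

Run it from both the server and the engine machines with your own host/port list; any source reported as `False` points to a firewall rule or routing issue to resolve before installation continues.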

Provision & Train Users

  •     Set up containers based on the team/project requirements
  •     Set up Data Source connections as per the team/project requirements
  •     Provision users to the appropriate DataOps containers
  •     Train users on DataOps data flows, data quality, and TDM through videos, use cases, and instructor-led sessions

Plan, Create and Execute Tests

  •     Understand the different types of tests in DataOps and plan which tests to create
  •     Create data flows (test cases) based on use cases and user stories as part of your project plan
  •     Use parameters to reduce the changes needed in test cases
  •     Group tests into pipelines so they can be executed together
  •     Set up notifications for pipelines
  •     Schedule pipeline execution
  •     Automate data flow and pipeline runs using the DataOps CLI tool or REST API as part of your CI/CD process
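
To illustrate why parameters reduce changes to test cases, here is a generic sketch in Python. It does not show how the DataOps suite implements parameters; the query, schema names, and environment labels are all hypothetical. The idea is that the test query is written once and only the parameter values differ between environments.

```python
from string import Template

# Hypothetical row-count check, written once with placeholders.
ROW_COUNT_QUERY = Template(
    "SELECT COUNT(*) FROM ${schema}.${table} WHERE load_date = '${load_date}'"
)

# Hypothetical per-environment parameter values.
ENVIRONMENTS = {
    "dev": {"schema": "dev_sales"},
    "qa": {"schema": "qa_sales"},
}


def render_query(env: str, table: str, load_date: str) -> str:
    """Fill the query template with the values for one environment."""
    params = {
        "schema": ENVIRONMENTS[env]["schema"],
        "table": table,
        "load_date": load_date,
    }
    # substitute() raises KeyError if a placeholder has no value,
    # which surfaces missing parameters early.
    return ROW_COUNT_QUERY.substitute(params)
```

Moving the same test from dev to QA then means changing one parameter value rather than editing the query. Plain string substitution is used here only for illustration with trusted values; it is not a safe way to handle user-supplied input in SQL.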