File Type/Data Compare | DataOpsEngine | Time |
AWS S3 CSV dataset: 1 billion rows × 23 columns, 1,628 partitions, 200 GB | AWS EMR cluster: 11 nodes; type: r6g.4xlarge; CPU: 16 cores; RAM: 122 GB; EBS: 1 TB | 4 min 15 s |
AWS S3 Parquet dataset: 1 billion rows × 23 columns, 624 partitions, 78 GB | AWS EMR cluster: 11 nodes; type: r6g.4xlarge; CPU: 16 cores; RAM: 122 GB; EBS: 1 TB | 1 min 4 s |
Data compare: CSV (1 billion rows × 23 columns) vs. Parquet (1 billion rows × 23 columns) | AWS EMR cluster: 11 nodes; type: r6g.4xlarge; CPU: 16 cores; RAM: 122 GB; EBS: 1 TB | 51 min 35 s |

Note: The DataOps suite application is horizontally scalable, and performance can be further improved with compute-optimized and memory-optimized node types.
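
To make the rows easier to compare, the reported times can be converted into rough throughput figures. The sketch below is illustrative only: the `Run` class, `to_seconds` helper, and `RUNS` list are hypothetical names, not part of the DataOps suite, and the numbers are copied from the table. It assumes the Time column covers a full pass over the stated dataset, and it treats the compare row as 1 billion row pairs; no size is given for that row, so only a row rate is derived for it.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Run:
    """One benchmark row; all values are taken from the table above."""
    label: str
    rows: int                        # rows (or row pairs, for the compare run)
    seconds: int                     # wall-clock time reported in the table
    size_gb: Optional[float] = None  # dataset size, when the table states one

    @property
    def rows_per_second(self) -> float:
        return self.rows / self.seconds

    @property
    def gb_per_second(self) -> Optional[float]:
        # Only defined when the table reports a dataset size for the row.
        return None if self.size_gb is None else self.size_gb / self.seconds


def to_seconds(minutes: int, seconds: int) -> int:
    """Convert a 'M min S s' duration into seconds."""
    return minutes * 60 + seconds


RUNS = [
    Run("CSV", 1_000_000_000, to_seconds(4, 15), 200.0),
    Run("Parquet", 1_000_000_000, to_seconds(1, 4), 78.0),
    # Assumption: the compare run is counted as 1 billion row pairs.
    Run("CSV vs. Parquet compare", 1_000_000_000, to_seconds(51, 35)),
]


def report(runs) -> None:
    """Print the derived rates for each run."""
    for run in runs:
        line = f"{run.label}: {run.rows_per_second:,.0f} rows/s"
        if run.gb_per_second is not None:
            line += f", {run.gb_per_second:.2f} GB/s"
        print(line)


if __name__ == "__main__":
    report(RUNS)
```

Under these assumptions the Parquet run works out to about 15.6 million rows per second against about 3.9 million for CSV, roughly a fourfold difference on the same cluster.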