| File Type / Data Compare | DataOpsEngine | Time |
| --- | --- | --- |
| AWS S3 CSV. Dataset size: 1 billion records (1B rows, 23 columns) in 1628 partitions. Size: 200 GB | AWS EMR. Cluster: 11 nodes. Type: r6g.4xlarge. CPU: 16 cores. RAM: 122 GB. EBS: 1 TB | 4 min 15 s |
| AWS S3 Parquet. Dataset size: 1 billion records (1B rows, 23 columns) in 624 partitions. Size: 78 GB | AWS EMR. Cluster: 11 nodes. Type: r6g.4xlarge. CPU: 16 cores. RAM: 122 GB. EBS: 1 TB | 1 min 4 s |
| Data compare between the 1 billion record CSV dataset (1B rows, 23 columns) and the 1 billion record Parquet dataset (1B rows, 23 columns) | AWS EMR. Cluster: 11 nodes. Type: r6g.4xlarge. CPU: 16 cores. RAM: 122 GB. EBS: 1 TB | 51 min 35 s |

Note: The DataOps suite application is horizontally scalable, and performance can be further improved with compute-optimized and memory-optimized node types.
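
To put the timings above in perspective, the reported figures can be converted into approximate throughput. The following is a minimal Python sketch, not part of the DataOps suite: the `Run` class and the helper names are hypothetical, and it assumes the reported times are end-to-end wall-clock durations for the whole 11-node cluster and that the reported sizes are the full dataset sizes read from S3.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Run:
    """One benchmark row; field names are illustrative, not a DataOps API."""
    name: str
    rows: int
    size_gb: float
    seconds: int


def to_seconds(minutes: int, seconds: int) -> int:
    """Convert a 'M min S s' timing into total seconds."""
    return minutes * 60 + seconds


def rows_per_second(run: Run) -> float:
    """Average number of rows processed per second over the whole run."""
    return run.rows / run.seconds


def gb_per_second(run: Run) -> float:
    """Average data volume processed per second, in the units reported (GB)."""
    return run.size_gb / run.seconds


# Figures copied from the benchmark results above.
RUNS = [
    Run("AWS S3 CSV", 1_000_000_000, 200.0, to_seconds(4, 15)),
    Run("AWS S3 Parquet", 1_000_000_000, 78.0, to_seconds(1, 4)),
]

if __name__ == "__main__":
    for run in RUNS:
        print(
            f"{run.name}: {run.seconds} s, "
            f"{rows_per_second(run) / 1e6:.1f} M rows/s, "
            f"{gb_per_second(run):.2f} GB/s"
        )
    # How much faster the Parquet run finished than the CSV run.
    print(f"Parquet vs CSV speed-up: {RUNS[0].seconds / RUNS[1].seconds:.1f}x")
```

These are cluster-wide averages derived only from the numbers reported here; they say nothing about per-node behavior or about where the time was spent.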