Dataflow

DataOps Performance Stats
File Type/Data | Compare | DataOpsEngine Time
AWS S3 CSV — Dataset size: 1 billion records (1B rows, 23 columns) with 1,628 partitions; size: 200 GB ...
DataOpsEMR Pricing
Service | Monthly (all prices ...) | Configuration summary
EMR Master (Software) — $15.33/month — number of master EMR nodes (1), EC2 instance (r6g.xlarge), Uti...
Code to read a CSV file with a pipe delimiter, skipping the top and bottom rows, and replacing special characters with underscores.
import pandas as pd import databricks.koalas as ks df = pd.read_csv('/$[ReconcileDate]/Files/Sample.csv', engine='python', sep='|'...
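A fuller sketch of this pattern, since the snippet above is truncated: skipfooter requires the Python parser engine, and the path below is a hypothetical stand-in for the $[ReconcileDate]-based location.

    import re
    import pandas as pd
    import databricks.koalas as ks

    # Pipe-delimited file; skip the first row and the trailing footer row.
    df = pd.read_csv('/data/Sample.csv',   # hypothetical path
                     engine='python',      # needed for skipfooter
                     sep='|',
                     skiprows=[0],
                     skipfooter=1)

    # Replace every run of non-alphanumeric characters in the column
    # names with a single underscore.
    df.columns = [re.sub(r'[^0-9a-zA-Z]+', '_', str(c)) for c in df.columns]

    # Optionally promote to a distributed Koalas frame, as the original
    # snippet's koalas import suggests.
    kdf = ks.from_pandas(df)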
Code to skip the last record of the dataset.
df = spark.sql(f"select * from DependentDataset") ds = df.toPandas().iloc[:-1] spark.createDataFrame(ds.astype(str)).createOrReplaceTempView(...
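Because the snippet is cut off, here is a runnable reconstruction; "TrimmedDataset" is a hypothetical view name. Note that toPandas() collects the whole dataset to the driver, so this approach only suits data that fits in driver memory.

    # Pull the dependent dataset, drop its final row, and re-register it.
    df = spark.sql("select * from DependentDataset")
    ds = df.toPandas().iloc[:-1]   # drop the trailing record (e.g. a footer)
    spark.createDataFrame(ds.astype(str)) \
         .createOrReplaceTempView("TrimmedDataset")  # hypothetical name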
Read an .xls file that contains HTML data.
import pandas as pd df = pd.read_html('/$[ReconcileDate]/Detail_$[ReconcileDate].xls', header=0) dff = pd.concat(df) dff.columns = dff.columns.st...
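The file carries an .xls extension but actually holds HTML tables, so pd.read_html (which needs lxml or html5lib installed) does the parsing and returns one DataFrame per table. The final .str call is truncated above, so .str.strip() below is an assumed completion, and the path stands in for the $[ReconcileDate] one.

    import pandas as pd

    frames = pd.read_html('/data/Detail.xls', header=0)  # hypothetical path
    dff = pd.concat(frames, ignore_index=True)
    dff.columns = dff.columns.str.strip()  # assumed completion of the truncated call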
Export data-compare datasets to Excel
import pandas as pd sheet_names = ["DuplicatesInBase", "DuplicatesInRun", "OnlyInBase", "OnlyInRun", "Differen...
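A sketch of the export, assuming each compare result set is registered as a temp view named after its sheet; "Differences" is an assumed completion of the truncated list, and the output path is hypothetical.

    import pandas as pd

    sheet_names = ["DuplicatesInBase", "DuplicatesInRun",
                   "OnlyInBase", "OnlyInRun", "Differences"]

    # One worksheet per result set, all in a single workbook.
    with pd.ExcelWriter('/tmp/datacompare.xlsx') as writer:
        for name in sheet_names:
            spark.sql(f"select * from {name}").toPandas() \
                 .to_excel(writer, sheet_name=name, index=False)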
Read a CSV file with Koalas
import pandas as pd import databricks.koalas as ks df = pd.read_csv('/sample.csv', engine='python', sep='|', skiprows=[0], skipfoo...
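Spark (and therefore Koalas) cannot skip a footer row, so the pattern is to parse with pandas first and then hand the frame to Koalas. The skipfooter value and view name below are assumptions where the snippet is truncated.

    import pandas as pd
    import databricks.koalas as ks

    pdf = pd.read_csv('/sample.csv', engine='python', sep='|',
                      skiprows=[0], skipfooter=1)  # skipfooter value assumed
    kdf = ks.from_pandas(pdf)                      # distributed Koalas frame
    kdf.to_spark().createOrReplaceTempView('SampleView')  # hypothetical name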
Code to read CSV data with explicit column names, skipping the top and bottom rows
import pandas as pd df = pd.read_csv('/sample.txt', header=None, skiprows=[0],skipfooter=1, delimiter = '|', names=["COL1","...
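The same call filled out, with engine='python' added (pandas requires it for skipfooter and otherwise falls back with a warning); the names list is truncated above, so COL1..COL3 are placeholders.

    import pandas as pd

    df = pd.read_csv('/sample.txt',
                     header=None,            # file has no header row
                     engine='python',        # required by skipfooter
                     skiprows=[0],
                     skipfooter=1,
                     delimiter='|',
                     names=["COL1", "COL2", "COL3"])  # placeholder names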
Other similar code snippets
import pandas as pd df = pd.read_csv('/BMO_REPORTS_REGRESSION/V16_Upgrade/UAT/$[ReconcileDate]/base2/Calypso_EOD_MPE_Deal_Level_North_America_ALL$[Reco...
Error while reading an Excel file: "Tried to read data but the maximum length for this record type is 100,000,000"
Tried to read data but the maximum length for this record type is 100,000,000. If the file is not corrupt and not large, please open an issue on bugzilla to ...
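This message is raised by Apache POI, which caps how many bytes a single record may occupy. If the Excel reader in use is POI-based, one possible workaround, offered here as an assumption rather than a documented DataOps setting, is to raise POI's limit through the Spark JVM gateway before reading:

    # Raise POI's per-record byte cap above the 100,000,000 default.
    # Requires Apache POI on the driver classpath.
    spark._jvm.org.apache.poi.util.IOUtils.setByteArrayMaxOverride(200000000)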