Dataflow

DataOps Performance Stats
File Type / Data Compare / DataOpsEngine Time — AWS S3 CSV. Dataset size: 1 billion records (1B rows, 23 columns) with 1628 partitions; size: 200 GB ...
Tue, 7 Sep, 2021 at 3:02 AM
DataOpsEMR Pricing
Service / Monthly (all prices) / Configuration summary — EMR Master (Software): $15.33; number of master EMR nodes (1), EC2 instance (r6g.xlarge), Uti...
Thu, 2 Sep, 2021 at 8:25 AM
Code to read a pipe-delimited CSV file, skipping the top and bottom rows and replacing special characters in column names with underscores.
import pandas as pd import databricks.koalas as ks df = pd.read_csv('/$[ReconcileDate]/Files/Sample.csv', engine='python', sep='|'...
Mon, 20 Dec, 2021 at 12:23 AM
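The snippet above is truncated, but it appears to use pandas' Python engine with skiprows/skipfooter and a pipe separator. A minimal self-contained sketch of the same idea (the sample data and column names below are invented for illustration):

```python
import io
import re
import pandas as pd

# Sample pipe-delimited data with a banner line on top and a trailer line at the bottom.
raw = (
    "BANNER LINE TO SKIP\n"
    "Col A|Col-B|Col.C\n"
    "1|2|3\n"
    "4|5|6\n"
    "TRAILER LINE TO SKIP"
)

# engine='python' is required for skipfooter; skiprows=[0] drops the banner line,
# skipfooter=1 drops the trailer line.
df = pd.read_csv(io.StringIO(raw), sep='|', engine='python',
                 skiprows=[0], skipfooter=1)

# Replace any non-alphanumeric character in the column names with an underscore.
df.columns = [re.sub(r'[^0-9A-Za-z]', '_', c) for c in df.columns]
```

In the original snippet the path contains a `$[ReconcileDate]` placeholder, which presumably the DataOps engine substitutes at run time.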
Code to skip the last record of a dataset.
df = spark.sql(f"select * from DependentDataset") ds = df.toPandas().iloc[:-1] spark.createDataFrame(ds.astype(str)).createOrReplaceTempView(...
Wed, 15 Dec, 2021 at 5:06 AM
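The Spark snippet above converts the DataFrame to pandas and drops the final row with `iloc[:-1]`. A pandas-only sketch of that core step (the data is invented; the Spark session and temp-view registration are omitted):

```python
import pandas as pd

# Hypothetical dataset whose final row is a trailer/summary record.
df = pd.DataFrame({'id': [1, 2, 3], 'val': ['a', 'b', 'TRAILER']})

# iloc[:-1] keeps every row except the last one.
trimmed = df.iloc[:-1]
```

Note that in the original snippet `toPandas()` collects the whole dataset to the driver, so this approach only suits datasets that fit in driver memory.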
Read an .xls file that actually contains HTML data.
import pandas as pd df = pd.read_html('/$[ReconcileDate]/Detail_$[ReconcileDate].xls', header=0) dff = pd.concat(df) dff.columns = dff.columns.st...
Wed, 15 Dec, 2021 at 5:09 AM
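Some feeds ship ".xls" files that are really HTML tables, which is why the snippet uses `pd.read_html` rather than `pd.read_excel`. A self-contained sketch with an invented table (requires an HTML parser backend such as lxml):

```python
import io
import pandas as pd

# A fake ".xls" payload that is actually an HTML table.
html = """<table>
<tr><th>Trade Id</th><th>Amount</th></tr>
<tr><td>T1</td><td>100</td></tr>
<tr><td>T2</td><td>200</td></tr>
</table>"""

# read_html returns a list of DataFrames, one per <table>; header=0 uses the first row.
tables = pd.read_html(io.StringIO(html), header=0)
dff = pd.concat(tables)

# Normalize column names, e.g. strip stray whitespace.
dff.columns = dff.columns.str.strip()
```

The `pd.concat` call matters because `read_html` always returns a list, even when the file holds a single table.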
Export datacompare datasets to Excel
import pandas as pd sheet_names = ["DuplicatesInBase", "DuplicatesInRun", "OnlyInBase", "OnlyInRun", "Differen...
Wed, 15 Dec, 2021 at 5:14 AM
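The snippet above defines one sheet name per data-compare result set; presumably each result set is written to its own sheet of a single workbook via `pd.ExcelWriter`. A sketch with invented frames (the real result sets would come from the compare run; requires openpyxl):

```python
import os
import tempfile
import pandas as pd

# Hypothetical data-compare result sets, one DataFrame per sheet.
sheet_names = ["DuplicatesInBase", "DuplicatesInRun", "OnlyInBase", "OnlyInRun"]
frames = {name: pd.DataFrame({'key': [1, 2], 'source': [name] * 2})
          for name in sheet_names}

# Write each result set to its own sheet in a single workbook.
fd, out_path = tempfile.mkstemp(suffix='.xlsx')
os.close(fd)
with pd.ExcelWriter(out_path) as writer:
    for name, frame in frames.items():
        frame.to_excel(writer, sheet_name=name, index=False)
```

Using one `ExcelWriter` context for all sheets avoids reopening (and truncating) the workbook on every write.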
Read csv with Koalas
import pandas as pd import databricks.koalas as ks df = pd.read_csv('/sample.csv', engine='python', sep='|', skiprows=[0], skipfoo...
Wed, 15 Dec, 2021 at 5:17 AM
Code to read CSV data with named columns, skipping the top and bottom rows.
import pandas as pd df = pd.read_csv('/sample.txt', header=None, skiprows=[0],skipfooter=1, delimiter = '|', names=["COL1","...
Wed, 15 Dec, 2021 at 5:30 AM
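The distinguishing pieces in this snippet are `header=None` plus `names=[...]`, which assign column names to a file that has no header row. A self-contained sketch with invented data (only `COL1` through `COL3` are shown; the original list is truncated):

```python
import io
import pandas as pd

# Pipe-delimited file with no header row, plus a banner line and a trailer line.
raw = (
    "FILE BANNER\n"
    "101|Alice|NY\n"
    "102|Bob|TX\n"
    "EOF TRAILER"
)

# header=None tells pandas the first data line is not a header; names supplies
# the column names. skipfooter again requires engine='python'.
df = pd.read_csv(io.StringIO(raw), header=None, delimiter='|',
                 skiprows=[0], skipfooter=1, engine='python',
                 names=["COL1", "COL2", "COL3"])
```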
Other similar code samples
import pandas as pd df = pd.read_csv('/BMO_REPORTS_REGRESSION/V16_Upgrade/UAT/$[ReconcileDate]/base2/Calypso_EOD_MPE_Deal_Level_North_America_ALL$[Reco...
Wed, 15 Dec, 2021 at 7:01 AM