#1258 : Pipeline related info in postgresSQL :

Query By Customer:

Which tables store the information we see in JSON when pipeline is executed and results are retrieved using job id.

Which tables stores data flow ids and its status for the job id in the pipeline i.e. which table has these fields as in the results heighted below in JSON,

Solution:

Table "job": There will be a new record on each run, with a job id for the pipeline that also has run status in this table.

Table "task_execution": There will be a new record for every task within the job id and the task information will be in the serialaized_execution column as JSON.

example of task information:

{"id": "2708c5a1a6b0448190700b4f4022bb7f", "dfId": "7492f69e-de0c-4fef-9e74-bbf4aef3c4c7", "name": "Dataflow_1783", "type": "runDataFlow", "jobId": "38c82230390c4385b381a8b1bfd3c920", "label": "Data", "output": "{¥"status¥":¥"Completed¥",¥"message¥":[{¥"name¥":¥"startComponent¥",¥"status¥":¥"Completed¥",¥"message¥":¥"Component executed successfully¥"},{¥"name¥":¥"JDBC 1¥",¥"status¥":¥"Completed¥",¥"message¥":¥"JDBC 1 executed successfully and processed 100 rows¥"}],¥"runId¥":24543,¥"tag¥":null}", "status": "COMPLETED", "userId": 3, "endTime": "2023-04-10T14:05:19.623+0530", "parentId": "d5f06a9a4fb1437dbb510fd1ffd2a8b1", "priority": 0, "progress": 100, "startTime": "2023-04-10T14:03:15.975+0530", "createTime": "2023-04-10T14:03:15.895+0530", "livyServer": 122, "parameters": "{¥"limit_rows¥":¥"limit 1000¥"}", "executionTime": 123647, "stopOnTaskFailure": "N","stopDependentTasks": "N"}*

by default, will have one start task for each job_id, which we can identify with the parent_id column as null.

If we want to skip that start component the query will be like

"select * from task_execution where job_Id='38c82230390c4385b381a8b1bfd3c920' and parent_id is not null"

How can we help you today?

#1258 : Pipeline related info in postgresSQL Print

How can we help you today?

#1258 : Pipeline related info in postgresSQL Print

Related Articles