DataStage Interview Questions and Answers

Updated Feb 19, 2019

What is DataStage?

DataStage is a widely used ETL tool developed by IBM. It extracts, transforms, and loads data, moving it from a source system to a target system. Companies use it to support business intelligence work.

What is DataStage Architecture?

The DataStage architecture is divided into two parts, client and server.

Client components

  • DataStage Administrator – It sets environment variables and is responsible for creating and deleting projects.
  • DataStage Designer – It is responsible for designing jobs.
  • DataStage Director – It is responsible for running, scheduling, and monitoring jobs.
  • DataStage Manager – It is responsible for importing and exporting projects.

Server components

  • DataStage Server – It is responsible for executing the tasks related to the server.
  • DataStage Package Installer – It is responsible for installing DataStage jobs.
  • Project – It is the data project that has all the important information.

What are DataStage operators?

The DataStage operators are as follows:

  • String operators
  • Assignment operators
  • Logical operators
  • Arithmetic operators
  • If operator
  • Pattern matching operators

How to remove duplicates in DataStage?

The Remove Duplicates stage is used to remove duplicates in DataStage. It keeps one row for each distinct value of the key columns and drops the rest.
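The idea of key-based de-duplication can be sketched in plain Python. This is only an illustration of the concept, not DataStage code; the function name `remove_duplicates` and its `retain` parameter are hypothetical names chosen for this example.

```python
def remove_duplicates(rows, key_columns, retain="first"):
    """Keep one row per distinct key value (illustrative sketch only)."""
    kept = {}
    for row in rows:
        key = tuple(row[col] for col in key_columns)
        # retain="first" keeps the first row seen for a key;
        # retain="last" overwrites it with each later row.
        if retain == "last" or key not in kept:
            kept[key] = row
    return list(kept.values())


rows = [
    {"id": 1, "name": "Ann"},
    {"id": 2, "name": "Bob"},
    {"id": 1, "name": "Ann B."},
]
unique_rows = remove_duplicates(rows, ["id"])
```

Here `unique_rows` holds one row for `id` 1 and one for `id` 2; passing `retain="last"` would keep the later "Ann B." row instead.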

What is version control in DataStage?

Version Control is a tool that ships with DataStage 7.5 as a separately installable component. It keeps backups of previous versions of DataStage jobs and is used to track the changes made to a particular job.

How to delete dataset in DataStage?

In DataStage Manager, open Data Set Management from the Tools menu, select the data set that needs to be removed, and delete it.

How to generate surrogate key in DataStage?

Surrogate keys can be generated in the Transformer stage by using a stage variable. In this stage:

  • Click on stage properties and click on the variable define option.
  • After that modify the datatype and initial values.
  • Hit OK.
  • In the output link, we can create the surrogate keys using the stage variable we had created earlier.
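The steps above amount to keeping a counter that starts at an initial value and increases by one for every row. A minimal Python sketch of that logic follows; it is not DataStage code, and the names `add_surrogate_keys`, `initial_value`, and the `sk` column are assumptions made for this example.

```python
def add_surrogate_keys(rows, initial_value=0):
    """Attach an incrementing surrogate key to each row (illustrative sketch)."""
    counter = initial_value  # plays the role of the stage variable
    output = []
    for row in rows:
        counter += 1  # incremented once per input row
        output.append({"sk": counter, **row})
    return output


customers = [{"code": "C-77"}, {"code": "C-12"}]
keyed = add_surrogate_keys(customers)
```

Each output row carries a key that is independent of the business key (`code` here), which is the point of a surrogate key.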

What is hashed file in DataStage?

The primary use of a hashed file in DataStage is as a lookup table. The most commonly used type is the dynamic hashed file. A hashed file distributes its rows throughout the space allocated to the file.

Every row has a key value that determines its position inside the hashed file. A hashing algorithm applied to the key decides where each row is stored. A hashed file can be loaded with data pulled from a remote system so that later lookups are done locally.
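To make the role of the key concrete, here is a small Python sketch of a hash-based lookup table. It is a simplified model, not the DataStage file format: the class `HashedLookup`, the fixed `group_count`, and the use of CRC32 as the hashing algorithm are all assumptions made for illustration.

```python
import zlib


class HashedLookup:
    """Toy hashed lookup table: the key decides which group holds the row."""

    def __init__(self, group_count=8):
        self.groups = [[] for _ in range(group_count)]

    def _group_for(self, key):
        # Hash the key and map it onto one of the groups.
        return zlib.crc32(str(key).encode("utf-8")) % len(self.groups)

    def write(self, key, row):
        group = self.groups[self._group_for(key)]
        for i, (existing_key, _) in enumerate(group):
            if existing_key == key:
                group[i] = (key, row)  # same key: overwrite the row
                return
        group.append((key, row))

    def read(self, key):
        # Only the one group selected by the hash is searched.
        for existing_key, row in self.groups[self._group_for(key)]:
            if existing_key == key:
                return row
        return None


lookup = HashedLookup()
lookup.write("US", {"country": "United States"})
lookup.write("DE", {"country": "Germany"})
found = lookup.read("DE")
missing = lookup.read("FR")
```

A read only has to search the group the key hashes to, which is why this kind of structure suits reference lookups.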

What is the aggregator stage in DataStage?

The Aggregator stage is a processing stage. It classifies the rows from its input link into groups and, for each group, computes totals or other aggregate values. These per-group results are the output of the stage.
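The grouping-and-totalling behaviour can be sketched in Python as follows. This illustrates the concept only; `aggregate_sum` and the column names are hypothetical, and a sum stands in for whichever aggregate function is configured.

```python
from collections import defaultdict


def aggregate_sum(rows, group_column, value_column):
    """Group rows by one column and total another (illustrative sketch)."""
    totals = defaultdict(int)
    for row in rows:
        totals[row[group_column]] += row[value_column]
    # One output row per group, in order of first appearance.
    return [{group_column: group, "total": total} for group, total in totals.items()]


sales = [
    {"region": "East", "amount": 10},
    {"region": "West", "amount": 5},
    {"region": "East", "amount": 7},
]
region_totals = aggregate_sum(sales, "region", "amount")
```

The three input rows collapse into two output rows, one per region.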

What is a quality stage in DataStage?

QualityStage is part of IBM Information Server. It is client/server software used for cleansing data and improving its quality, and it gives developers an environment for building data cleansing tasks.

It also supports data analysis that can be used to improve business intelligence. Insights gathered from the data help in developing marketing strategies.

How to zip a file in DataStage?

In DataStage, data can be compressed in the Compress stage. This stage uses the UNIX gzip or compress utility to compress a data set. Compression converts the data set from a sequence of records into a stream of raw binary data.
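The effect of gzip compression on records can be illustrated with Python's standard `gzip` module. This sketch is not how the stage is invoked; the sample `records` and the comma-separated encoding are assumptions made for this example.

```python
import gzip

records = ["1,Ann", "2,Bob"]

# Serialize the records, then compress them into raw binary data.
raw = "\n".join(records).encode("utf-8")
compressed = gzip.compress(raw)

# The compressed bytes are no longer readable as records until expanded.
restored = gzip.decompress(compressed).decode("utf-8").split("\n")
```

After compression the data is an opaque byte stream; it has to be expanded again before the individual records can be read.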

How to check DataStage server is running?

The serverStatus command in the DataStage Application Server will show the current status of the server, whether it is running or not.

What is data cleansing in DataStage?

Data cleansing in DataStage is the process of cleaning and maintaining data to improve its quality. Data is of good quality when it is accurate and lets the user easily draw insights from it for analysis.

The data must not have duplicate records. The data cleaning tasks are performed in the Quality Stage.

How to remove empty tags in XML in DataStage?

To handle empty and null elements, DataStage offers two options in the Transformation tab:

  • Replace NULLs with empty values – This removes all null values and replaces them with empty values.
  • Replace empty values with NULLs – This replaces all empty values with NULLS.

If both options are selected, only the empty XML elements will be considered Null.

What is join in DataStage?

A join combines rows from two tables that share the same key values. The Join stage is a processing stage that takes two or more data sets as input and outputs the joined data set. It is used for joining large data sets, for combining different tables that have the same keys, and for doing outer joins.
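The difference between an inner join and an outer join can be sketched in Python. This is a conceptual illustration using an in-memory index, not the algorithm the Join stage uses; the function `join` and its `how` parameter are hypothetical names.

```python
def join(left_rows, right_rows, key, how="inner"):
    """Join two lists of rows on a shared key column (illustrative sketch)."""
    # Index the right-hand rows by key value.
    right_index = {}
    for row in right_rows:
        right_index.setdefault(row[key], []).append(row)

    joined = []
    for left in left_rows:
        matches = right_index.get(left[key], [])
        for right in matches:
            joined.append({**left, **right})
        # A left outer join also keeps left rows that found no match;
        # in this sketch such rows simply lack the right-hand columns.
        if not matches and how == "left":
            joined.append(dict(left))
    return joined


orders = [{"cust_id": 1, "amount": 50}, {"cust_id": 3, "amount": 20}]
customers = [{"cust_id": 1, "name": "Ann"}, {"cust_id": 2, "name": "Bob"}]

inner_rows = join(orders, customers, "cust_id")
left_rows = join(orders, customers, "cust_id", how="left")
```

The inner join returns only the order for customer 1, while the left outer join also keeps the order for customer 3, which has no matching customer row.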

What is audit table in DataStage?

The audit table in DataStage records information about job runs, including the jobs that are executing at present. For each executing or completed job, details such as status, rows processed, and the last date of execution can be obtained. The audit report presents this information.

What is DataStage ETL tool?

DataStage is one of the most popular extract, transform, and load (ETL) tools, developed by IBM. It helps companies in the field of business intelligence by moving data from a source to a target destination.

It has a simple interface for accessing and integrating files from anywhere. It supports both Linux and Windows platforms and supports parallel processing. It has 3 security levels, private, collaborative and shared.

Difference between an Operational Database and a Data Warehouse.

The differences between an operational database and a data warehouse are listed below:

| Data Warehouse | Operational Database |
| --- | --- |
| It provides a combined set of data with different views. | It is a database that undergoes changes every day, such as updating, adding, and deleting data. |
| OLAP (Online Analytical Processing) is used to handle the different data operations, typically by the senior officials of an organization. | OLTP (Online Transaction Processing) is used to perform the daily transactions, typically by the junior-level professionals of an organization. |
| It consists of historical information. | It has data that is being processed at present. |
| The view of the data is multidimensional. | It provides a relational view of the data being processed. |
| It is based on the Constellation, Snowflake, and Star schemas. | It follows the Entity-Relationship model. |

How to call Routine in DataStage?

The routines can be called in DataStage by the following steps:

  • Right-click the mapping field in the Transformer stage.
  • Click the DS Routines option and provide the business logic.
  • Select either a before or an after subroutine.

