Sample Questions and Answers
What is the best practice for handling sensitive data in Azure Data Factory?
A) Use Azure Key Vault to store credentials and access secrets dynamically
B) Hardcode passwords in pipeline parameters
C) Use plain text in pipeline code
D) Store credentials in datasets
Answer: A) Use Azure Key Vault to store credentials and access secrets dynamically
Explanation:
Storing credentials in Azure Key Vault keeps them out of pipeline definitions and source control; Data Factory retrieves the secrets at runtime through a Key Vault linked service.
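For illustration, here is a minimal sketch of fetching a secret at runtime with the Azure Key Vault Python SDK; the vault URL and secret name are hypothetical placeholders.

```python
# Minimal sketch: reading a secret dynamically with the Key Vault SDK.
# Vault URL and secret name ("sqlConnStr") are placeholders for illustration.
from azure.identity import DefaultAzureCredential
from azure.keyvault.secrets import SecretClient

credential = DefaultAzureCredential()  # managed identity, env vars, or az login
client = SecretClient(
    vault_url="https://my-vault.vault.azure.net", credential=credential
)

secret = client.get_secret("sqlConnStr")  # fetched at runtime, never hardcoded
print(secret.name)                        # avoid logging secret.value in real code
```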
Which type of Azure Data Factory pipeline activity executes a stored procedure?
A) Stored Procedure Activity
B) Lookup Activity
C) Copy Activity
D) Web Activity
Answer: A) Stored Procedure Activity
Explanation:
This activity runs stored procedures in relational databases.
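As a rough Python analogue of what this activity executes, the sketch below calls a stored procedure over ODBC; the server, database, and procedure names are hypothetical.

```python
# What the Stored Procedure activity does, approximated in Python over ODBC;
# server, database, and procedure names are hypothetical placeholders.
import pyodbc

conn = pyodbc.connect(
    "DRIVER={ODBC Driver 18 for SQL Server};"
    "SERVER=myserver.database.windows.net;DATABASE=mydb;"
    "UID=user;PWD=password"  # in practice, pull credentials from Key Vault
)
cursor = conn.cursor()
# Invoke a stored procedure with one input parameter
cursor.execute("{CALL dbo.UpdateDailySales (?)}", "2024-01-31")
conn.commit()
conn.close()
```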
What is “PolyBase” primarily used for in Azure Synapse?
A) Querying external data directly using T-SQL
B) Performing machine learning
C) Building dashboards
D) Data ingestion automation
Answer: A) Querying external data directly using T-SQL
Explanation:
PolyBase integrates external data with Synapse SQL pools.
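The sketch below shows the PolyBase pattern: define an external table over data-lake files, then query it with plain T-SQL. Every object name, the storage path, and the connection DSN are assumptions for illustration; the external data source is assumed to exist already.

```python
# Sketch of the PolyBase pattern submitted over ODBC; all names are hypothetical.
import pyodbc

ddl = """
CREATE EXTERNAL FILE FORMAT ParquetFormat
    WITH (FORMAT_TYPE = PARQUET);

CREATE EXTERNAL TABLE dbo.SalesExternal (
    SaleId INT,
    Amount DECIMAL(10, 2)
)
WITH (
    LOCATION    = '/sales/2024/',   -- folder within the external data source
    DATA_SOURCE = MyDataLake,       -- assumed pre-created external data source
    FILE_FORMAT = ParquetFormat
);
"""
conn = pyodbc.connect("DSN=SynapseDW")  # placeholder connection
cursor = conn.cursor()
cursor.execute(ddl)
# The external files can now be queried like an ordinary table:
cursor.execute("SELECT TOP 10 * FROM dbo.SalesExternal;")
```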
Which storage type does Azure Synapse Analytics’ Serverless SQL Pool query?
A) Azure Data Lake Storage and Blob Storage
B) Azure SQL Database only
C) Azure Cosmos DB only
D) Azure Table Storage
Answer: A) Azure Data Lake Storage and Blob Storage
Explanation:
Serverless SQL Pools query data directly from files in storage accounts.
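A minimal sketch of such a query, sent to a serverless SQL endpoint over ODBC; the storage URL and DSN are placeholders.

```python
# Querying Parquet files in a data lake directly with serverless SQL;
# the storage URL and ODBC DSN are placeholders.
import pyodbc

query = """
SELECT TOP 10 *
FROM OPENROWSET(
    BULK 'https://mylake.dfs.core.windows.net/raw/sales/*.parquet',
    FORMAT = 'PARQUET'
) AS rows;
"""
conn = pyodbc.connect("DSN=SynapseServerless")
for row in conn.cursor().execute(query):
    print(row)
```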
What is a key benefit of Azure Data Factory’s Mapping Data Flows?
A) No infrastructure management needed for ETL
B) Only batch data processing supported
C) Requires manual Spark cluster management
D) Limited support for transformations
Answer: A) No infrastructure management needed for ETL
Explanation:
Mapping Data Flows execute on Spark clusters that Azure provisions, manages, and scales automatically, so users design transformations without touching infrastructure.
How do you define a parameter in Azure Data Factory?
A) As a variable that can accept dynamic values in pipelines or datasets
B) As a fixed constant inside activities
C) As an output of copy activity only
D) As a data source type
Answer: A) As a variable that can accept dynamic values in pipelines or datasets
Explanation:
Parameters make pipelines reusable and dynamic.
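As a sketch, the azure-mgmt-datafactory SDK can start a pipeline run and supply parameter values at execution time; all resource names and the parameters shown are hypothetical.

```python
# Hedged sketch: triggering a parameterized pipeline run with the
# azure-mgmt-datafactory SDK; every resource name is a placeholder.
from azure.identity import DefaultAzureCredential
from azure.mgmt.datafactory import DataFactoryManagementClient

adf = DataFactoryManagementClient(DefaultAzureCredential(), "<subscription-id>")

run = adf.pipelines.create_run(
    resource_group_name="my-rg",
    factory_name="my-adf",
    pipeline_name="CopySales",
    parameters={"sourceFolder": "sales/2024", "targetTable": "dbo.Sales"},
)
print(run.run_id)  # keep this to monitor the run later
```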
Which Azure Synapse component is optimized for time-series and log analytics data?
A) Azure Data Explorer Pool
B) Dedicated SQL Pool
C) Serverless SQL Pool
D) Spark Pool
Answer: A) Azure Data Explorer Pool
Explanation:
Data Explorer specializes in telemetry and log data.
What is the recommended method to manage schema drift in Mapping Data Flows?
A) Use schema drift settings and wildcards in source transformation
B) Avoid schema changes
C) Manually edit JSON definitions each time
D) Stop pipeline execution when schema changes
Answer: A) Use schema drift settings and wildcards in source transformation
Explanation:
ADF supports schema drift to handle evolving data structures.
What is an advantage of using Azure Data Factory Self-hosted Integration Runtime?
A) Secure data movement between on-premises and cloud sources
B) Only works with Azure data stores
C) Cannot be scaled out
D) Requires no setup
Answer: A) Secure data movement between on-premises and cloud sources
Explanation:
A self-hosted integration runtime is installed inside your own network, so it can reach on-premises and private data stores and move their data to the cloud securely.
Which Azure Synapse Analytics pool is serverless and charges based on usage?
A) Serverless SQL Pool
B) Dedicated SQL Pool
C) Spark Pool
D) Data Explorer Pool
Answer: A) Serverless SQL Pool
Explanation:
Serverless pools provide pay-per-query model without cluster provisioning.
What type of data flow activity is used for data transformations in Azure Data Factory?
A) Mapping Data Flow
B) Lookup Activity
C) Web Activity
D) Copy Activity
Answer: A) Mapping Data Flow
Explanation:
Mapping Data Flow supports visually designed ETL transformations.
How do you pass parameters from a pipeline to a Mapping Data Flow?
A) Define parameters in the data flow and pass them during pipeline execution
B) Use only fixed values in data flows
C) Parameters cannot be passed to data flows
D) Pass parameters via Azure Functions
Answer: A) Define parameters in the data flow and pass them during pipeline execution
Explanation:
This enables dynamic, reusable data flows.
What is the primary use case for Azure Data Factory’s Lookup activity?
A) Retrieve a single row or small dataset for decision making in pipelines
B) Move large datasets between storage accounts
C) Trigger pipelines on schedule
D) Execute stored procedures in SQL databases
Answer: A) Retrieve a single row or small dataset for decision making in pipelines
Explanation:
Lookup activity fetches data for conditional logic or parameterization.
Which Azure Data Factory feature enables you to retry failed pipeline activities automatically?
A) Retry policy configured on activities
B) Using Logic Apps
C) Creating duplicate pipelines
D) Event Grid triggers
Answer: A) Retry policy configured on activities
Explanation:
A retry policy set on an activity specifies the number of retry attempts and the interval between them.
In Azure Synapse Analytics, what does the term “Dedicated SQL Pool” refer to?
A) A provisioned data warehouse with fixed compute resources
B) A serverless on-demand query engine
C) A Spark cluster for big data workloads
D) A managed NoSQL database
Answer: A) A provisioned data warehouse with fixed compute resources
Explanation:
Dedicated SQL Pools allocate fixed DWUs for predictable performance.
Which Azure Data Factory component connects to data stores and computes to perform data movement?
A) Integration Runtime (IR)
B) Linked Service
C) Dataset
D) Pipeline
Answer: A) Integration Runtime (IR)
Explanation:
IR provides the compute environment for data movement and transformation.
What is the main benefit of using serverless SQL pools in Azure Synapse?
A) Pay only for query execution without managing infrastructure
B) Provides dedicated resources for high-performance workloads
C) Requires cluster provisioning
D) Only supports structured data
Answer: A) Pay only for query execution without managing infrastructure
Explanation:
Serverless pools enable querying data in storage without cluster setup.
When would you choose PolyBase over Bulk Insert in Azure Synapse?
A) When querying external data sources without moving data
B) When loading small datasets only
C) When performing complex joins inside Spark
D) When exporting data to external storage
Answer: A) When querying external data sources without moving data
Explanation:
PolyBase lets you query external files as if they were tables.
What file format is optimal for big data analytic workloads on Azure Synapse?
A) Parquet
B) CSV
C) TXT
D) JSON
Answer: A) Parquet
Explanation:
Parquet is columnar, compressed, and efficient for analytic queries.
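A small illustration of the difference, assuming pandas with the pyarrow engine installed; file names are arbitrary.

```python
# Columnar Parquet vs. row-oriented text formats; requires pandas + pyarrow.
import pandas as pd

df = pd.DataFrame({"sale_id": range(1_000_000), "amount": 19.99})

df.to_csv("sales.csv", index=False)  # row-oriented, uncompressed text
df.to_parquet("sales.parquet")       # columnar, compressed by default

# Analytic engines can read only the columns a query needs:
amounts = pd.read_parquet("sales.parquet", columns=["amount"])
```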
What is the role of Linked Services in Azure Data Factory?
A) Define connection information to external data sources
B) Store data schema metadata
C) Orchestrate pipeline execution
D) Transform data during ETL
Answer: A) Define connection information to external data sources
Explanation:
Linked Services specify connection strings and credentials.
How can you optimize Azure Synapse Dedicated SQL Pool for large table scans?
A) Use partitioning and distribution strategies on tables
B) Avoid indexing
C) Use serverless SQL pools only
D) Store data in JSON format
Answer: A) Use partitioning and distribution strategies on tables
Explanation:
Hash distribution spreads rows evenly across compute for parallel scans, while partitioning lets queries skip data outside the requested range.
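A sketch of such a table definition in dedicated SQL pool DDL; the table, columns, and partition boundaries are hypothetical, and the statement would be submitted over ODBC as in the earlier sketches.

```python
# Hash-distributed, partitioned fact table for a dedicated SQL pool;
# all names and boundary values are illustrative.
ddl = """
CREATE TABLE dbo.FactSales
(
    SaleId     INT  NOT NULL,
    CustomerId INT  NOT NULL,
    SaleDate   DATE NOT NULL,
    Amount     DECIMAL(10, 2)
)
WITH
(
    DISTRIBUTION = HASH(CustomerId),  -- spread rows across the 60 distributions
    CLUSTERED COLUMNSTORE INDEX,      -- default analytic storage format
    PARTITION (SaleDate RANGE RIGHT FOR VALUES
               ('2024-01-01', '2024-02-01', '2024-03-01'))
);
"""
```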
Which Azure service is best suited for ingesting real-time streaming data?
A) Azure Stream Analytics
B) Azure Data Factory
C) Azure Blob Storage
D) Azure SQL Database
Answer: A) Azure Stream Analytics
Explanation:
Stream Analytics processes and analyzes streaming data in real time.
Which Azure Synapse component is ideal for exploratory data analysis using notebooks?
A) Spark Pools
B) Dedicated SQL Pools
C) Serverless SQL Pools
D) Data Explorer Pools
Answer: A) Spark Pools
Explanation:
Spark pools support notebooks and data science workloads.
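A typical exploratory snippet of the kind run in a Synapse Spark notebook; in Synapse the `spark` session is pre-created, and the storage path and column names here are placeholders.

```python
# Exploratory analysis in PySpark; building a session keeps the sketch
# self-contained (Synapse notebooks provide `spark` automatically).
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.getOrCreate()

df = spark.read.parquet("abfss://raw@mylake.dfs.core.windows.net/sales/")
(df.groupBy("region")
   .agg(F.sum("amount").alias("total_sales"))
   .orderBy(F.desc("total_sales"))
   .show(10))
```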
What feature in Azure Data Factory supports incremental data loading?
A) Watermark columns and parameters
B) Copy Activity only
C) Lookup Activity only
D) Linked Services
Answer: A) Watermark columns and parameters
Explanation:
Watermarks track the last processed record to load only new data.
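A hedged sketch of the high-water-mark pattern, expressed directly in Python rather than as ADF Lookup and Copy activities; table and column names are illustrative.

```python
# High-water-mark incremental load, approximated in Python over ODBC;
# table and column names are hypothetical.
import pyodbc

conn = pyodbc.connect("DSN=SourceDb")
cursor = conn.cursor()

# 1. Read the watermark saved by the previous run
cursor.execute(
    "SELECT WatermarkValue FROM dbo.WatermarkTable WHERE TableName = 'Sales'"
)
last_watermark = cursor.fetchone()[0]

# 2. Load only rows modified since that watermark
cursor.execute("SELECT * FROM dbo.Sales WHERE LastModified > ?", last_watermark)
new_rows = cursor.fetchall()

# 3. After a successful load, advance the watermark
new_mark = max(r.LastModified for r in new_rows) if new_rows else last_watermark
cursor.execute(
    "UPDATE dbo.WatermarkTable SET WatermarkValue = ? WHERE TableName = 'Sales'",
    new_mark,
)
conn.commit()
```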
How does Azure Data Factory handle sensitive information such as passwords?
A) Integrates with Azure Key Vault for secure secret management
B) Stores them in plain text within pipelines
C) Requires hardcoding in JSON definitions
D) Does not support secure storage
Answer: A) Integrates with Azure Key Vault for secure secret management
Explanation:
Key Vault keeps credentials secure and separate from code.
What is the main purpose of the Azure Data Factory Trigger?
A) Automate pipeline execution based on schedule or events
B) Transform data inside pipelines
C) Define data schema
D) Monitor pipeline runs
Answer: A) Automate pipeline execution based on schedule or events
Explanation:
Triggers start pipelines automatically when conditions are met.
Which Azure Synapse Analytics feature supports querying semi-structured JSON data?
A) OPENROWSET with Serverless SQL Pool
B) Dedicated SQL Pool only
C) Spark Pools only
D) PolyBase
Answer: A) OPENROWSET with Serverless SQL Pool
Explanation:
Serverless SQL can parse JSON files stored in data lakes.
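A hedged example of the commonly documented pattern for reading line-delimited JSON with serverless SQL: each line is ingested as a single text field, then parsed with JSON_VALUE. The storage URL and JSON property names are assumptions; the query would be submitted as in the earlier OPENROWSET sketch.

```python
# Serverless SQL over JSON lines: read each line as one CSV field
# (vertical-tab delimiters), then extract properties with JSON_VALUE.
query = """
SELECT
    JSON_VALUE(doc, '$.deviceId')    AS device_id,
    JSON_VALUE(doc, '$.temperature') AS temperature
FROM OPENROWSET(
    BULK 'https://mylake.dfs.core.windows.net/telemetry/*.json',
    FORMAT = 'CSV',
    FIELDTERMINATOR = '0x0b',
    FIELDQUOTE = '0x0b'
) WITH (doc NVARCHAR(MAX)) AS rows;
"""
```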
What is the function of a Dataset in Azure Data Factory?
A) Represents data structures within linked services for use in activities
B) Executes code
C) Manages secrets
D) Defines triggers
Answer: A) Represents data structures within linked services for use in activities
Explanation:
Datasets define the shape and location of data.
Which Azure Data Factory activity would you use to run custom code or REST API calls?
A) Web Activity
B) Copy Activity
C) Lookup Activity
D) Execute Pipeline Activity
Answer: A) Web Activity
Explanation:
Web Activity lets you invoke REST endpoints.
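A Python analogue of what a Web Activity does at runtime: call a REST endpoint and use the response downstream. The URL and payload are hypothetical.

```python
# Rough analogue of a Web Activity: invoke a REST endpoint and read the response.
import requests

resp = requests.post(
    "https://api.example.com/v1/notify",          # placeholder endpoint
    json={"pipeline": "CopySales", "status": "succeeded"},
    timeout=30,
)
resp.raise_for_status()
print(resp.json())
```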
How can you improve performance of large data copies in Azure Data Factory?
A) Enable parallel copy with multiple threads
B) Copy data row by row
C) Disable compression
D) Use a single pipeline activity only
Answer: A) Enable parallel copy with multiple threads
Explanation:
The Copy activity's parallel copy setting reads and writes multiple partitions concurrently, which speeds up bulk data movement.
What is a tumbling window trigger in Azure Data Factory?
A) A trigger that runs pipelines at fixed time intervals with retry and backfill support
B) A trigger that runs only once
C) A manual trigger
D) An event-based trigger
Answer: A) A trigger that runs pipelines at fixed time intervals with retry and backfill support
Explanation:
Tumbling window triggers are used for periodic, windowed execution.
Which Azure Synapse feature enables you to use SQL for big data stored in data lakes?
A) Serverless SQL Pools
B) Dedicated SQL Pools
C) Spark Pools
D) Data Explorer Pools
Answer: A) Serverless SQL Pools
Explanation:
Serverless SQL pools query files in data lakes directly with T-SQL.
How does Azure Synapse ensure security at the data level?
A) Through role-based access control (RBAC) and data masking
B) By encrypting only the storage account
C) Using open access to all users
D) No built-in security features
Answer: A) Through role-based access control (RBAC) and data masking
Explanation:
RBAC controls which operations each user can perform, while dynamic data masking obfuscates sensitive column values for non-privileged users.
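For the masking half, a sketch of dynamic data masking DDL; the table and column names are hypothetical, and the statements would be run against the SQL pool as in the earlier sketches.

```python
# Dynamic data masking: non-privileged users see masked values such as
# 'aXXX@XXXX.com'. Table and column names are illustrative.
ddl = """
ALTER TABLE dbo.Customers
    ALTER COLUMN Email ADD MASKED WITH (FUNCTION = 'email()');

ALTER TABLE dbo.Customers
    ALTER COLUMN Phone ADD MASKED WITH (FUNCTION = 'partial(0, "XXX-XXX-", 4)');
"""
```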
What is a key advantage of using Azure Data Factory’s pipeline parameters?
A) Reusability and dynamic pipeline behavior
B) Fixed hardcoded values only
C) Only usable in datasets
D) Not supported in ADF
Answer: A) Reusability and dynamic pipeline behavior
Explanation:
Parameters enable passing different values for flexible pipelines.
What is the main difference between Azure Blob Storage and Azure Data Lake Storage Gen2?
A) ADLS Gen2 adds hierarchical namespace and is optimized for big data analytics
B) Blob Storage supports hierarchical namespace
C) ADLS Gen2 does not support analytics workloads
D) Blob Storage is only for unstructured data
Answer: A) ADLS Gen2 adds hierarchical namespace and is optimized for big data analytics
Explanation:
Hierarchical namespace enables directory structure and faster analytics.
How do you monitor pipeline performance and troubleshoot failures in Azure Data Factory?
A) Using the Monitor tab and activity run logs
B) Using Azure DevOps only
C) By exporting logs manually every time
D) No monitoring is available
Answer: A) Using the Monitor tab and activity run logs
Explanation:
The Monitor tab provides detailed execution history and error info.
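A hedged sketch of doing the same programmatically with the azure-mgmt-datafactory SDK; the resource names and run ID are placeholders.

```python
# Programmatic equivalent of the Monitor tab: check a pipeline run's status
# and drill into activity runs for error details. All names are placeholders.
from datetime import datetime, timedelta
from azure.identity import DefaultAzureCredential
from azure.mgmt.datafactory import DataFactoryManagementClient
from azure.mgmt.datafactory.models import RunFilterParameters

adf = DataFactoryManagementClient(DefaultAzureCredential(), "<subscription-id>")

run = adf.pipeline_runs.get("my-rg", "my-adf", "<run-id>")
print(run.status)  # e.g. InProgress, Succeeded, Failed

filters = RunFilterParameters(
    last_updated_after=datetime.utcnow() - timedelta(days=1),
    last_updated_before=datetime.utcnow(),
)
for act in adf.activity_runs.query_by_pipeline_run(
    "my-rg", "my-adf", run.run_id, filters
).value:
    print(act.activity_name, act.status, act.error)
```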
What does “sharding” mean in Azure Synapse Analytics?
A) Distributing data across multiple nodes for parallel processing
B) Compressing data files
C) Encrypting data at rest
D) Archiving old data
Answer: A) Distributing data across multiple nodes for parallel processing
Explanation:
In Synapse, table data is sharded across 60 distributions so that scans and joins run in parallel on multiple compute nodes, improving scalability and query speed.