Sample Questions and Answers
In reinforcement learning, what does the “policy” represent?
A. A mapping from states to actions
B. The reward function
C. The environment model
D. The discount factor
Answer: A. A mapping from states to actions
Explanation: The policy determines which action the agent takes in each state; it can be deterministic or stochastic.
What is the role of the “discount factor” (γ) in reinforcement learning?
A. It balances the importance of immediate versus future rewards
B. It controls exploration rate
C. It normalizes rewards
D. It sets learning rate
Answer: A. It balances the importance of immediate versus future rewards
Explanation: A lower γ weights immediate rewards more heavily; a γ closer to 1 gives future rewards more weight.
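The effect of γ can be seen in a small worked example. The sketch below is illustrative only: the function name `discounted_return` and the reward sequence are assumptions, not part of the original question.

```python
def discounted_return(rewards, gamma):
    """Sum of rewards, each weighted by gamma**t for its time step t."""
    total = 0.0
    # Work backwards so each step computes G_t = r_t + gamma * G_{t+1}.
    for reward in reversed(rewards):
        total = reward + gamma * total
    return total

rewards = [1.0, 1.0, 1.0]  # hypothetical reward sequence
low = discounted_return(rewards, gamma=0.1)   # later rewards barely count
high = discounted_return(rewards, gamma=0.9)  # later rewards count almost fully
```

With the same rewards, the lower γ yields a return of about 1.11 and the higher γ about 2.71, because the second and third rewards are discounted far less.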
Which activation function is often preferred in hidden layers of deep neural networks?
A. ReLU (Rectified Linear Unit)
B. Sigmoid
C. Tanh
D. Linear
Answer: A. ReLU (Rectified Linear Unit)
Explanation: ReLU reduces vanishing gradient problems and is computationally efficient.
What is “dropout” used for in training neural networks?
A. Preventing overfitting by randomly deactivating neurons during training
B. Speeding up convergence
C. Initializing weights
D. Normalizing input data
Answer: A. Preventing overfitting by randomly deactivating neurons during training
Explanation: Dropout reduces co-adaptation of neurons.
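A minimal sketch of dropout on a list of activations, assuming the common "inverted dropout" variant in which surviving values are rescaled by 1 / (1 - p). The function name, the drop probability, and the sample activations are hypothetical, and p is assumed to be below 1.

```python
import random

def dropout(activations, p, training=True, rng=random):
    """Inverted dropout: zero each value with probability p, rescale the rest."""
    if not training or p == 0.0:
        # At evaluation time the layer passes values through unchanged.
        return list(activations)
    keep = 1.0 - p
    return [a / keep if rng.random() < keep else 0.0 for a in activations]

rng = random.Random(0)  # fixed seed so the sketch is repeatable
hidden = [0.5, 1.0, 1.5, 2.0]
train_out = dropout(hidden, p=0.5, rng=rng)
eval_out = dropout(hidden, p=0.5, training=False)
```

Rescaling during training keeps the expected activation the same in both modes, so no adjustment is needed at evaluation time.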
What is the primary purpose of “early stopping”?
A. To stop training when validation loss stops improving to prevent overfitting
B. To stop training after fixed epochs
C. To speed up training by skipping some batches
D. To reduce dataset size
Answer: A. To stop training when validation loss stops improving to prevent overfitting
Explanation: Early stopping halts training before the model overfits.
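One common way to implement early stopping is a "patience" counter. The sketch below assumes that rule; the function name, the patience value, and the loss values are made up for illustration.

```python
def early_stopping_epoch(val_losses, patience=2):
    """Return the index of the epoch at which training would stop."""
    best = float("inf")
    epochs_without_improvement = 0
    for epoch, loss in enumerate(val_losses):
        if loss < best:
            best = loss
            epochs_without_improvement = 0
        else:
            epochs_without_improvement += 1
            if epochs_without_improvement >= patience:
                return epoch
    # Validation loss kept improving, so training runs to the last epoch.
    return len(val_losses) - 1

losses = [0.90, 0.70, 0.60, 0.62, 0.65, 0.70]  # hypothetical validation losses
stop_epoch = early_stopping_epoch(losses, patience=2)
```

Here the best loss occurs at epoch 2 (zero-based) and training stops at epoch 4, after two epochs without improvement. In practice the weights from the best epoch are usually restored.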
What is the “softmax” function commonly used for?
A. Converting logits to a probability distribution over classes
B. Normalizing inputs
C. Regularizing weights
D. Data augmentation
Answer: A. Converting logits to a probability distribution over classes
Explanation: Softmax outputs class probabilities in classification tasks.
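A small sketch of softmax in plain Python. Subtracting the maximum logit before exponentiating is a standard numerical-stability step that does not change the result; the example logits are arbitrary.

```python
import math

def softmax(logits):
    """Map raw scores to probabilities that sum to one."""
    shift = max(logits)  # avoids overflow in exp for large logits
    exps = [math.exp(z - shift) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

probs = softmax([2.0, 1.0, 0.1])
```

The largest logit receives the largest probability, and the outputs always sum to one.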
What does “transfer learning” typically involve?
A. Using a model pretrained on a related task and adapting it to a new task
B. Training a model from scratch on a large dataset
C. Increasing dataset size artificially
D. Reducing model size
Answer: A. Using a model pretrained on a related task and adapting it to a new task
Explanation: Transfer learning reuses features learned on the source task, which typically reduces the data and training time needed for the new task.
What is the primary challenge addressed by “unsupervised learning”?
A. Finding patterns in data without labeled outputs
B. Predicting outcomes from labeled data
C. Reinforcement learning with reward signals
D. Reducing dataset size
Answer: A. Finding patterns in data without labeled outputs
Explanation: Unsupervised learning discovers structure without labels.
What is the difference between “precision” and “recall”?
A. Precision measures accuracy of positive predictions; recall measures coverage of actual positives
B. Precision measures recall rate; recall measures prediction accuracy
C. Both measure the same
D. Precision is for regression; recall is for classification
Answer: A. Precision measures accuracy of positive predictions; recall measures coverage of actual positives
Explanation: Precision = TP / (TP + FP); Recall = TP / (TP + FN).
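The two formulas above can be checked with a few lines of code. The counts below are hypothetical, and returning 0.0 when a denominator is zero is an assumed convention rather than a universal one.

```python
def precision_recall(tp, fp, fn):
    """Precision = TP / (TP + FP); Recall = TP / (TP + FN)."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall

# Hypothetical counts: 10 predicted positives, 16 actual positives.
precision, recall = precision_recall(tp=8, fp=2, fn=8)
```

With these counts, precision is 0.8 (most positive predictions are right) while recall is 0.5 (half of the actual positives are missed).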
Which technique helps reduce overfitting by penalizing large weights?
A. Regularization
B. Dropout
C. Data augmentation
D. Batch normalization
Answer: A. Regularization
Explanation: Regularization methods like L1 and L2 add penalties to large weights.
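As a rough illustration, L1 and L2 penalties can be written as extra terms added to the data loss. The weights, the base loss, and the strength `lam` are arbitrary values chosen for the sketch.

```python
def l1_penalty(weights, lam):
    """L1: lam times the sum of absolute weights."""
    return lam * sum(abs(w) for w in weights)

def l2_penalty(weights, lam):
    """L2: lam times the sum of squared weights."""
    return lam * sum(w * w for w in weights)

weights = [0.5, -2.0, 3.0]  # hypothetical model weights
data_loss = 1.0             # hypothetical loss before regularization
loss_l1 = data_loss + l1_penalty(weights, lam=0.01)
loss_l2 = data_loss + l2_penalty(weights, lam=0.01)
```

The L2 term grows quadratically, so the weight 3.0 dominates it, whereas the L1 term charges every unit of magnitude equally.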
What does “k-fold cross-validation” do?
A. Splits data into k subsets and iteratively trains and validates on different splits
B. Splits data into training and test once
C. Randomly shuffles data before training
D. Reduces dataset size
Answer: A. Splits data into k subsets and iteratively trains and validates on different splits
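Explanation: Cross-validation gives a more reliable estimate of generalization than a single train/test split, because every sample is used for validation exactly once.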
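A minimal sketch of how the k splits can be generated from sample indices, assuming contiguous folds with no shuffling. The function name `k_fold_indices` is hypothetical.

```python
def k_fold_indices(n_samples, k):
    """Yield (train_indices, val_indices) for each of the k folds."""
    indices = list(range(n_samples))
    # Spread any remainder over the first folds so sizes differ by at most one.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        val = indices[start:start + size]
        train = indices[:start] + indices[start + size:]
        yield train, val
        start += size

folds = list(k_fold_indices(n_samples=10, k=3))
```

Each sample appears in exactly one validation fold and in the training set of the other folds; the model's scores on the k validation folds are then averaged.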
What is the purpose of the “embedding layer” in an NLP model?
A. To convert discrete tokens into dense vector representations
B. To normalize input data
C. To generate output predictions
D. To reduce vocabulary size
Answer: A. To convert discrete tokens into dense vector representations
Explanation: Embeddings capture semantic meaning of words.
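Conceptually, an embedding layer is a lookup table indexed by token ID. The sketch below assumes a toy three-word vocabulary and random initial values; real embeddings acquire semantic meaning only through training.

```python
import random

def make_embedding_table(vocab_size, embedding_dim, seed=0):
    """One dense vector per token ID, randomly initialised."""
    rng = random.Random(seed)
    return [[rng.uniform(-0.1, 0.1) for _ in range(embedding_dim)]
            for _ in range(vocab_size)]

vocab = {"the": 0, "cat": 1, "sat": 2}  # hypothetical vocabulary
table = make_embedding_table(vocab_size=len(vocab), embedding_dim=4)
token_ids = [vocab[word] for word in ["the", "cat", "sat"]]
vectors = [table[i] for i in token_ids]  # the lookup itself is plain indexing
```

Each token maps to a vector of length 4 here; that length is the embedding dimension.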
Which of the following is a disadvantage of decision trees?
A. Prone to overfitting if not properly pruned
B. Difficult to interpret
C. Not applicable to classification problems
D. Require large amounts of data
Answer: A. Prone to overfitting if not properly pruned
Explanation: Decision trees can fit noise if allowed to grow deep.
What is “ensemble learning”?
A. Combining predictions from multiple models to improve overall performance
B. Training one large model
C. Reducing dataset size
D. Data normalization
Answer: A. Combining predictions from multiple models to improve overall performance
Explanation: Examples include bagging and boosting.
What is the main advantage of using convolutional layers in image tasks?
A. They capture spatial hierarchies and patterns efficiently
B. They reduce dataset size
C. They are faster than fully connected layers for all tasks
D. They do not require training
Answer: A. They capture spatial hierarchies and patterns efficiently
Explanation: Convolutions exploit local connectivity.
What are “exploding gradients”?
A. When gradients become too large, causing unstable updates during training
B. When gradients vanish and become too small
C. When model size grows exponentially
D. When dataset size increases exponentially
Answer: A. When gradients become too large, causing unstable updates during training
Explanation: Exploding gradients can be controlled with clipping.
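One widely used form of clipping rescales the whole gradient vector when its L2 norm exceeds a threshold. The sketch below assumes that variant; the gradient values and threshold are arbitrary.

```python
import math

def clip_by_norm(grads, max_norm):
    """Rescale grads so their L2 norm is at most max_norm."""
    norm = math.sqrt(sum(g * g for g in grads))
    if norm <= max_norm:
        return list(grads)
    scale = max_norm / norm
    # Scaling every component equally preserves the gradient's direction.
    return [g * scale for g in grads]

clipped = clip_by_norm([30.0, 40.0], max_norm=5.0)
```

The input has norm 50, so it is scaled down to roughly [3.0, 4.0], which has norm 5 and points in the same direction.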
What is “dimensionality reduction”?
A. Reducing the number of features while preserving essential information
B. Increasing dataset size
C. Normalizing features
D. Increasing number of parameters
Answer: A. Reducing the number of features while preserving essential information
Explanation: Techniques include PCA and t-SNE.
Which of the following is an unsupervised learning algorithm?
A. K-means clustering
B. Support vector machine
C. Linear regression
D. Decision tree
Answer: A. K-means clustering
Explanation: K-means groups data without labels.
What is the “exploration-exploitation” tradeoff in reinforcement learning?
A. Balancing trying new actions versus using known rewarding actions
B. Choosing between supervised and unsupervised learning
C. Selecting between model architectures
D. Adjusting learning rates
Answer: A. Balancing trying new actions versus using known rewarding actions
Explanation: Effective RL agents manage this tradeoff to maximize rewards.
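Epsilon-greedy action selection is one simple way to manage this tradeoff (it is not the only one). The sketch below assumes a list of estimated action values; the values and the epsilon setting are made up.

```python
import random

def epsilon_greedy(q_values, epsilon, rng=random):
    """With probability epsilon pick a random action, otherwise the best known one."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))  # explore
    return max(range(len(q_values)), key=lambda a: q_values[a])  # exploit

q_values = [0.2, 0.8, 0.5]  # hypothetical value estimates for three actions
greedy_action = epsilon_greedy(q_values, epsilon=0.0)

rng = random.Random(0)  # fixed seed so the sketch is repeatable
actions = [epsilon_greedy(q_values, epsilon=0.3, rng=rng) for _ in range(1000)]
```

With epsilon set to 0 the agent always exploits and picks action 1; with epsilon at 0.3 it still picks action 1 most of the time but occasionally samples the others.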
What does a “precision-recall curve” help visualize?
A. The tradeoff between precision and recall for different thresholds
B. Loss vs. epochs
C. Accuracy over time
D. Confusion matrix
Answer: A. The tradeoff between precision and recall for different thresholds
Explanation: Useful for imbalanced classification tasks.
What is the role of the “embedding dimension” in NLP?
A. It determines the size of the vector used to represent each token
B. It sets the maximum sentence length
C. It controls vocabulary size
D. It normalizes token frequencies
Answer: A. It determines the size of the vector used to represent each token
Explanation: Larger dimensions can capture more information but risk overfitting.
What is the typical output of a binary classification model?
A. A probability score or class label (0 or 1)
B. Continuous values from -1 to 1
C. Multiple class labels
D. Clusters of data points
Answer: A. A probability score or class label (0 or 1)
Explanation: Binary classifiers output the estimated probability of the positive class, which is thresholded to produce a class label.
Which loss function is commonly used for multi-class classification?
A. Categorical cross-entropy
B. Mean squared error
C. Hinge loss
D. Mean absolute error
Answer: A. Categorical cross-entropy
Explanation: It compares predicted and true class probability distributions.
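A short sketch of categorical cross-entropy for a single example, assuming a one-hot true distribution. The small `eps` floor that guards against log(0) is an implementation assumption, and the predicted distributions are invented.

```python
import math

def categorical_cross_entropy(true_dist, pred_dist, eps=1e-12):
    """Negative sum of true_prob * log(pred_prob) over the classes."""
    return -sum(t * math.log(max(p, eps)) for t, p in zip(true_dist, pred_dist))

one_hot = [0.0, 1.0, 0.0]  # the true class is index 1
confident = categorical_cross_entropy(one_hot, [0.05, 0.90, 0.05])
unsure = categorical_cross_entropy(one_hot, [0.40, 0.30, 0.30])
```

The loss is small when the model puts high probability on the true class and grows as that probability shrinks.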
What is the “batch size” tradeoff?
A. Larger batches give more stable gradients but require more memory; smaller batches give noisier updates but need less memory and compute per step
B. Larger batches always improve accuracy
C. Smaller batches reduce overfitting always
D. Batch size has no effect on training
Answer: A. Larger batches give more stable gradients but require more memory; smaller batches give noisier updates but need less memory and compute per step
Explanation: Selecting batch size depends on hardware and optimization goals.
What is the purpose of “weight decay”?
A. Regularizing by adding a penalty on weight magnitude to the loss
B. Reducing learning rate
C. Increasing batch size
D. Initializing weights
Answer: A. Regularizing by adding a penalty on weight magnitude to the loss
Explanation: Helps prevent overfitting by penalizing large weights.
What does “data drift” mean in deployed machine learning models?
A. Change in data distribution over time, possibly degrading model performance
B. Increasing dataset size
C. Adding new features
D. Retraining model
Answer: A. Change in data distribution over time, possibly degrading model performance
Explanation: Models must be monitored and updated as data evolves.
What is a “confusion matrix”?
A. A table showing counts of true positives, false positives, true negatives, and false negatives
B. A method for dimensionality reduction
C. A type of loss function
D. A regularization technique
Answer: A. A table showing counts of true positives, false positives, true negatives, and false negatives
Explanation: Useful for evaluating classification models.
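The four cells of a binary confusion matrix can be tallied directly from label pairs. The sketch assumes labels are encoded as 0 and 1, with 1 as the positive class; the example labels are invented.

```python
def confusion_counts(y_true, y_pred):
    """Count true/false positives and negatives for 0/1 labels."""
    counts = {"tp": 0, "fp": 0, "tn": 0, "fn": 0}
    for truth, pred in zip(y_true, y_pred):
        if pred == 1:
            counts["tp" if truth == 1 else "fp"] += 1
        else:
            counts["fn" if truth == 1 else "tn"] += 1
    return counts

y_true = [1, 0, 1, 1, 0, 0, 1, 0]  # hypothetical ground truth
y_pred = [1, 0, 0, 1, 1, 0, 1, 0]  # hypothetical model predictions
counts = confusion_counts(y_true, y_pred)
```

For these labels the tally is 3 true positives, 1 false positive, 3 true negatives, and 1 false negative. Metrics such as precision and recall are computed from these counts.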
What does the “softmax” layer do in a neural network?
A. Converts raw scores into probabilities summing to one
B. Computes the loss function
C. Normalizes inputs
D. Regularizes outputs
Answer: A. Converts raw scores into probabilities summing to one
Explanation: Used for multi-class classification.
What is “hyperparameter tuning”?
A. Selecting optimal settings that are fixed before training, such as learning rate and batch size
B. Adjusting weights during backpropagation
C. Normalizing data
D. Data augmentation
Answer: A. Selecting optimal settings that are fixed before training, such as learning rate and batch size
Explanation: Hyperparameters control training behavior and model capacity.