Interview Question And Answer - Machine Learning


Beginner - Level

Answer:- Machine Learning is a branch of AI that enables systems to learn and make predictions from data without being explicitly programmed. It uses algorithms to find patterns and make decisions.

Answer:-

  • Supervised Learning (e.g., Linear Regression, Decision Trees)
  • Unsupervised Learning (e.g., Clustering, PCA)
  • Reinforcement Learning (e.g., Q-learning, Deep Q Networks)

Answer:- 

  • Supervised Learning: Uses labeled data for training. Example: Spam Detection.
  • Unsupervised Learning: Uses unlabeled data to find patterns. Example: Customer Segmentation.

Answer:-

  • Overfitting: Model learns noise instead of the actual pattern (high variance, low bias).
  • Underfitting: Model is too simple to capture the underlying trend (high bias, low variance).

Answer:-

  • Use more training data
  • Reduce model complexity
  • Use Regularization (L1, L2)
  • Use Cross-validation

Answer:- Cross-validation is a technique to evaluate models by training on different subsets of data and testing on the remaining set.

Answer:-

  • Bias: Error due to simplistic model assumptions.
  • Variance: Error due to model sensitivity to small changes in data.
  • Tradeoff: Aim for a balance to minimize total error.

Answer:-

  • Regression: Predicts continuous values (e.g., house price).
  • Classification: Predicts categorical values (e.g., spam or not spam).

Answer:-

  • Filter Methods (Correlation, Chi-square)
  • Wrapper Methods (Recursive Feature Elimination)
  • Embedded Methods (Lasso Regression)

Answer:- PCA is a dimensionality reduction technique that transforms features into fewer uncorrelated variables (principal components).

Answer:- A table used to evaluate classification models, containing True Positives, False Positives, True Negatives, and False Negatives.

Answer:

  • Precision = TP / (TP + FP)
  • Recall = TP / (TP + FN)
  • F1-score = 2 × (Precision × Recall) / (Precision + Recall)
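A minimal sketch computing these metrics with scikit-learn (assumes y_test and y_pred already exist):

from sklearn.metrics import precision_score, recall_score, f1_score

precision = precision_score(y_test, y_pred)
recall = recall_score(y_test, y_pred)
f1 = f1_score(y_test, y_pred)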

Answer:- The Receiver Operating Characteristic (ROC) curve plots True Positive Rate (TPR) vs. False Positive Rate (FPR) for different thresholds.
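A small sketch producing the ROC values with scikit-learn (assumes a fitted binary classifier with predict_proba and held-out X_test, y_test):

from sklearn.metrics import roc_curve, roc_auc_score

y_score = model.predict_proba(X_test)[:, 1]          # probability of the positive class
fpr, tpr, thresholds = roc_curve(y_test, y_score)    # one (FPR, TPR) pair per threshold
auc = roc_auc_score(y_test, y_score)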

Answer:- An activation function introduces non-linearity into the model. Examples:

  • Sigmoid
  • ReLU
  • Tanh

Answer:- Gradient Descent is an optimization algorithm that minimizes the cost function by iteratively updating model parameters.
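A toy NumPy sketch of gradient descent for simple linear regression (assumes 1-D NumPy arrays x and y; the learning rate and iteration count are illustrative):

import numpy as np

w, b, lr = 0.0, 0.0, 0.01
for _ in range(1000):
    error = (w * x + b) - y
    w -= lr * 2 * np.mean(error * x)   # gradient of MSE with respect to w
    b -= lr * 2 * np.mean(error)       # gradient of MSE with respect to b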

Answer:- Parameters set before training that affect model performance (e.g., learning rate, number of layers in a neural network).

Answer:- A technique to prevent overfitting by adding a penalty to the loss function (e.g., L1, L2 regularization).

Answer:-

  • L1 (Lasso): Shrinks coefficients to zero (feature selection).
  • L2 (Ridge): Shrinks coefficients but doesn’t make them zero.
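A minimal sketch contrasting the two in scikit-learn (assumes X_train and y_train exist; alpha controls the regularization strength):

from sklearn.linear_model import Lasso, Ridge

lasso = Lasso(alpha=0.1).fit(X_train, y_train)   # L1: some coefficients become exactly zero
ridge = Ridge(alpha=0.1).fit(X_train, y_train)   # L2: coefficients shrink but stay non-zero
print(lasso.coef_, ridge.coef_)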

Answer:- KNN is a non-parametric classification algorithm that assigns labels based on the majority vote of k-nearest data points.

Answer:- A tree-like model that splits data based on feature conditions, using criteria such as the Gini Index or Information Gain.

Answer:- An ensemble method combining multiple Decision Trees for better accuracy and robustness.

Answer:- A classification algorithm that finds the optimal hyperplane separating data points.

Answer:- A probabilistic classifier based on Bayes’ Theorem, assuming feature independence.

Answer:- As dimensionality increases, data becomes sparse, making it harder for models to find patterns.

Answer:- Unsupervised technique to group similar data points (e.g., K-Means, Hierarchical Clustering).

Answer:-  A clustering algorithm that partitions data into k clusters by minimizing intra-cluster variance.

Answer:- A density-based clustering algorithm that groups points based on density connectivity.

Answer:-  A learning approach where an agent interacts with the environment and learns via rewards and punishments.

Answer:- A technique where a pre-trained model is adapted to a new but related task.

Answer:- Predicting future values based on past time-dependent data (e.g., ARIMA, LSTM).
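A minimal ARIMA sketch with statsmodels (assumes a time-indexed pandas Series named series; the (p, d, q) order shown is illustrative):

from statsmodels.tsa.arima.model import ARIMA

model = ARIMA(series, order=(1, 1, 1))
fitted = model.fit()
forecast = fitted.forecast(steps=5)   # predict the next 5 time periods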

Answer:- import pandas as pd

df = pd.read_csv("data.csv")

df.head()

Answer:- from sklearn.preprocessing import MinMaxScaler

scaler = MinMaxScaler()

df_scaled = scaler.fit_transform(df)

Answer:- from sklearn.linear_model import LinearRegression

model = LinearRegression()

model.fit(X_train, y_train)

Answer:- from sklearn.linear_model import LogisticRegression

model = LogisticRegression()

model.fit(X_train, y_train)

Answer:- from sklearn.tree import DecisionTreeClassifier

model = DecisionTreeClassifier()

model.fit(X_train, y_train)

Answer:-

from sklearn.ensemble import RandomForestClassifier

model = RandomForestClassifier(n_estimators=100, random_state=42)

model.fit(X_train, y_train)

y_pred = model.predict(X_test)

Answer:- 

from sklearn.neighbors import KNeighborsClassifier

model = KNeighborsClassifier(n_neighbors=5)

model.fit(X_train, y_train)

y_pred = model.predict(X_test)

Answer:-

from sklearn.svm import SVC

model = SVC(kernel='linear')

model.fit(X_train, y_train)

y_pred = model.predict(X_test)

Answer:-

from sklearn.decomposition import PCA

pca = PCA(n_components=2)

X_pca = pca.fit_transform(X)

Answer:-

from sklearn.cluster import KMeans

kmeans = KMeans(n_clusters=3, random_state=42)

clusters = kmeans.fit_predict(X)

Answer:-

from sklearn.naive_bayes import GaussianNB

model = GaussianNB()

model.fit(X_train, y_train)

y_pred = model.predict(X_test)

Answer:-

from sklearn.feature_selection import SelectKBest, chi2

X_new = SelectKBest(score_func=chi2, k=5).fit_transform(X, y)

Answer:-

from sklearn.impute import SimpleImputer

imputer = SimpleImputer(strategy='mean')

X_imputed = imputer.fit_transform(X)

Answer:-

from sklearn.model_selection import StratifiedKFold, cross_val_score

cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)

scores = cross_val_score(model, X, y, cv=cv)

Answer:-

from sklearn.model_selection import GridSearchCV

from sklearn.neighbors import KNeighborsClassifier

param_grid = {'n_neighbors': [3, 5, 7]}

grid_search = GridSearchCV(KNeighborsClassifier(), param_grid, cv=5)

grid_search.fit(X_train, y_train)

Answer:-

from sklearn.model_selection import RandomizedSearchCV

from sklearn.ensemble import RandomForestClassifier

param_dist = {'n_estimators': [50, 100, 200], 'max_depth': [5, 10, 20]}

random_search = RandomizedSearchCV(RandomForestClassifier(), param_dist, cv=5, n_iter=5)

random_search.fit(X_train, y_train)

Answer:-

from sklearn.metrics import classification_report

print(classification_report(y_test, y_pred))

Answer:-

from tensorflow.keras.models import Sequential

from tensorflow.keras.layers import Dense

model = Sequential([
    Dense(32, activation='relu', input_shape=(X_train.shape[1],)),
    Dense(16, activation='relu'),
    Dense(1, activation='sigmoid')
])

model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])

model.fit(X_train, y_train, epochs=10, batch_size=32)

Answer:-

import joblib

# Save the model

joblib.dump(model, "model.pkl")

# Load the model

loaded_model = joblib.load("model.pkl")

Answer:-

from sklearn.pipeline import Pipeline

from sklearn.preprocessing import StandardScaler

from sklearn.linear_model import LogisticRegression

pipeline = Pipeline([
    ('scaler', StandardScaler()),
    ('classifier', LogisticRegression())
])

pipeline.fit(X_train, y_train)

Intermediate - Level

Answer:-

  • L1 Regularization (Lasso): Adds absolute values of weights to the loss function (λΣ|w|). It performs feature selection by shrinking some coefficients to zero.
  • L2 Regularization (Ridge): Adds squared values of weights to the loss function (λΣw²). It reduces model complexity but does not eliminate features.

Answer:-

  • High Bias (Underfitting): The model is too simple, leading to poor performance on both training and test data.
  • High Variance (Overfitting): The model is too complex, fitting noise in the training data but failing on test data.
  • The best model balances bias and variance to achieve optimal generalization.

Answer:-

  • As the number of features increases, the data becomes sparse, making distance-based algorithms ineffective.
  • Solutions:
    • Dimensionality Reduction (PCA, t-SNE, Autoencoders)
    • Feature Selection (Lasso, Mutual Information)
    • Regularization to prevent overfitting

Answer:-

  • Precision = TP / (TP + FP) → Focuses on minimizing false positives.
  • Recall = TP / (TP + FN) → Focuses on minimizing false negatives.
  • F1-Score = Harmonic mean of precision and recall.
  • When to prioritize?
    • High Precision: Fraud detection, spam filtering (False positives are costly).
    • High Recall: Medical diagnosis, cancer detection (False negatives are costly).

Answer:-

  • Bagging (Bootstrap Aggregating)
    • Multiple models trained on different bootstrapped samples.
    • Reduces variance (e.g., Random Forest).
  • Boosting
    • Models are trained sequentially, correcting errors of the previous model.
    • Reduces bias and variance (e.g., AdaBoost, Gradient Boosting).

Answer:- A Confusion Matrix is a table that summarizes classification performance.

Actual \ Predicted   | Positive (P)         | Negative (N)
Positive (P)         | True Positive (TP)   | False Negative (FN)
Negative (N)         | False Positive (FP)  | True Negative (TN)

  • Accuracy = (TP + TN) / (TP + TN + FP + FN)
  • Precision = TP / (TP + FP)
  • Recall = TP / (TP + FN)
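A quick way to obtain these counts with scikit-learn (assumes binary labels in y_test and y_pred):

from sklearn.metrics import confusion_matrix

tn, fp, fn, tp = confusion_matrix(y_test, y_pred).ravel()
accuracy = (tp + tn) / (tp + tn + fp + fn)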

Answer:- Activation functions introduce non-linearity, enabling neural networks to learn complex patterns.

  • ReLU (Rectified Linear Unit): Popular in deep learning, avoids vanishing gradients.
  • Sigmoid: Used in binary classification, but suffers from vanishing gradient.
  • Tanh: Centered around zero but still suffers from vanishing gradient.
  • Softmax: Used in multi-class classification.

Answer:-

  • Batch Gradient Descent (BGD): Computes gradients using the entire dataset (slower but more stable).
  • Stochastic Gradient Descent (SGD): Updates parameters for each training example (faster but noisy).
  • Mini-Batch Gradient Descent: Uses small random batches, balancing speed and stability.

Answer:-

  • Too high: The model may diverge and fail to converge.
  • Too low: The model takes too long to converge.
  • Adaptive Learning Rate Methods (e.g., Adam, RMSProp) adjust the learning rate dynamically.

Answer:- Cross-validation is used to evaluate model performance by splitting data into multiple training and test sets.

  • K-Fold Cross-Validation: Splits data into K folds and trains K models.
  • Stratified K-Fold: Ensures class distribution is preserved in each fold (useful for imbalanced datasets).
  • Leave-One-Out Cross-Validation (LOOCV): Uses only one sample for testing at a time (computationally expensive).

Answer:-

  • ROC Curve plots True Positive Rate (TPR) vs. False Positive Rate (FPR).
  • AUC-ROC (Area Under the Curve) measures the classifier’s ability to distinguish between classes.
  • Higher AUC-ROC values indicate better performance.

Answer:-

  • Resampling Techniques:
    • Oversampling (SMOTE, ADASYN)
    • Undersampling (Random, Tomek Links)
  • Algorithmic Approaches:
    • Change class weights (e.g., class_weight='balanced' in Scikit-Learn).
    • Use ensemble methods like Boosting.
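A minimal example of the class-weight approach (assumes an imbalanced binary dataset in X_train, y_train):

from sklearn.linear_model import LogisticRegression

# 'balanced' weights each class inversely to its frequency in the training data
model = LogisticRegression(class_weight='balanced')
model.fit(X_train, y_train)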

Answer:- Feature engineering involves creating new meaningful features from raw data. Examples:

  • Scaling: Standardization, Min-Max Scaling.
  • Encoding: One-Hot Encoding, Label Encoding.
  • Feature Extraction: TF-IDF for text data, PCA for dimensionality reduction.
  • Feature Transformation: Log transformation, Polynomial Features.

Answer:- ✅ Easy to interpret and visualize.
✅ Handles both numerical and categorical data.
✅ No need for feature scaling.
✅ Can handle missing values.
❌ Prone to overfitting (solved using pruning or ensemble methods).

Answer:-

  • Gini Impurity: Measures the probability of misclassifying a randomly chosen sample; it is computationally faster. Gini = 1 − Σ pᵢ²
  • Entropy: Measures the randomness (impurity) in the data; it is more computationally expensive. Entropy = −Σ pᵢ log₂ pᵢ
  • When to use? Gini is preferred for speed; Entropy is preferred when information gain is crucial. (A small sketch computing both follows.)
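A small NumPy sketch of both impurity measures (p is an array of class probabilities summing to 1):

import numpy as np

def gini(p):
    p = np.asarray(p)
    return 1 - np.sum(p ** 2)

def entropy(p):
    p = np.asarray(p)
    p = p[p > 0]                      # skip zero probabilities to avoid log2(0)
    return -np.sum(p * np.log2(p))

print(gini([0.5, 0.5]), entropy([0.5, 0.5]))   # 0.5 and 1.0 for a perfectly mixed node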

Answer:-

  • Parametric Models: Assume a fixed number of parameters (e.g., Logistic Regression, Linear Regression). They are computationally efficient but may not capture complex patterns.
  • Non-Parametric Models: Do not assume a fixed number of parameters (e.g., k-NN, Decision Trees, Random Forest). They can model complex data but require more data to generalize well.

Answer:- A cost function measures the difference between predicted and actual values. The model optimizes this function to minimize errors. Examples:

  • MSE (Mean Squared Error) → Regression
  • Log Loss (Cross-Entropy Loss) → Classification

Answer:-

  • Gradient Boosting: Sequentially builds trees, reducing errors of previous models.
  • XGBoost: An optimized version of Gradient Boosting that is faster and uses regularization (L1 & L2).

Answer:- Outliers are data points that significantly differ from others.

  • Detection Methods:
    • Z-score (>3 or <-3)
    • IQR (Interquartile Range)
    • Box Plot, Isolation Forest
  • Handling Methods:
    • Remove extreme outliers if they are errors.
    • Transform data (log transformation).
    • Use robust models (Tree-based models).
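An illustrative IQR check in NumPy (assumes values is a 1-D NumPy array; the 1.5 multiplier is the conventional cutoff):

import numpy as np

q1, q3 = np.percentile(values, [25, 75])
iqr = q3 - q1
lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr
outliers = values[(values < lower) | (values > upper)]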

Answer:

Reinforcement Learning (RL) is a trial-and-error learning process where an agent interacts with the environment to maximize cumulative rewards.

  • Example: Self-driving cars use RL to learn optimal driving policies.

Answer:- A Markov Chain is a stochastic process where the next state depends only on the current state, not past states. It is used in RL and Hidden Markov Models (HMM).

Answer:- PCA reduces dimensionality by transforming features into principal components that capture the most variance.

  • Steps:
    1. Compute covariance matrix.
    2. Calculate eigenvectors & eigenvalues.
    3. Select the top k principal components.
    4. Project the data onto these components.

Answer:- Overfitting occurs when a model learns noise instead of patterns, leading to poor generalization.

  • Prevention Techniques:
    • Regularization (L1/L2)
    • Cross-validation
    • More data
    • Dropout (for neural networks)
    • Pruning (for decision trees)

Answer:- Kernels transform data into a higher-dimensional space to make it linearly separable.

  • Linear Kernel → For linearly separable data
  • Polynomial Kernel → For non-linear data
  • RBF Kernel (Gaussian) → For complex decision boundaries

Answer:-

  • Bagging: Trains multiple weak models independently and averages their predictions (e.g., Random Forest).
  • Stacking: Trains multiple models and combines their predictions using a meta-model.

Answer:-

  • Sigmoid → Outputs values between 0 and 1, used for binary classification.
  • Softmax → Outputs probabilities that sum to 1, used for multi-class classification.
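A small NumPy sketch of both functions (the input vector z is illustrative):

import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

def softmax(z):
    e = np.exp(z - np.max(z))   # subtract the max for numerical stability
    return e / e.sum()

print(sigmoid(0.0))                         # 0.5
print(softmax(np.array([2.0, 1.0, 0.1])))   # probabilities summing to 1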

Answer:- Autoencoders are neural networks used for unsupervised learning to encode and reconstruct data. They are useful for anomaly detection and dimensionality reduction.

Answer:- LSTMs are specialized RNNs designed to handle long-term dependencies by using memory cells and gates (input, forget, output gates).

Answer:- 

  • Word2Vec: Learns word embeddings based on context (continuous representation).
  • TF-IDF: Measures term importance in a document (sparse representation).
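A minimal TF-IDF sketch with scikit-learn (the two example documents are illustrative):

from sklearn.feature_extraction.text import TfidfVectorizer

docs = ["the cat sat on the mat", "the dog barked at the cat"]
vectorizer = TfidfVectorizer()
X_tfidf = vectorizer.fit_transform(docs)   # sparse document-term matrix of TF-IDF weights
print(X_tfidf.shape)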

Answer:- Hyperparameter tuning finds the best parameters (e.g., learning rate, tree depth) for optimal model performance.

  • Methods: Grid Search, Random Search, Bayesian Optimization

 

Answer:- A/B testing compares two versions of a model (A & B) to determine which performs better using statistical significance.

Answer:-

  • K-Means: Requires specifying k clusters; sensitive to outliers.
  • DBSCAN: Detects clusters based on density; does not require specifying k.
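A short DBSCAN sketch with scikit-learn (assumes a numeric feature array X; eps and min_samples are illustrative):

from sklearn.cluster import DBSCAN

labels = DBSCAN(eps=0.5, min_samples=5).fit_predict(X)   # label -1 marks noise/outlier points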

Answer:- VIF measures multicollinearity between features.

  • VIF > 5 indicates high multicollinearity; such features should be considered for removal (a small check with statsmodels is sketched below).
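A small VIF check using statsmodels (assumes X is a pandas DataFrame of numeric features):

from statsmodels.stats.outliers_influence import variance_inflation_factor

vif = {col: variance_inflation_factor(X.values, i) for i, col in enumerate(X.columns)}
print(vif)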

Answer:- A perceptron is a simple neural network unit used for binary classification with a weighted sum and activation function.

Answer:- Model drift occurs when model performance degrades over time due to changing data patterns.

Answer:- This occurs when new users or items lack interaction history, making recommendations difficult.

Answer:-

  • Batch Learning: The model is trained in large chunks.
  • Online Learning: Updates in real time as new data arrives.

Answer:- An ensemble combines multiple models to improve accuracy (e.g., Random Forest, Gradient Boosting).

Answer:- Hinge loss is used in SVMs to penalize misclassified points that violate the margin.

Answer:- Uses self-attention mechanisms to capture long-range dependencies (used in NLP models like BERT).

Answer:- A neural network that learns similarity between pairs of inputs, used in facial recognition.

Answer:- Focuses on important parts of the input sequence, improving translation and NLP models.

Answer:- A model that learns the probability distribution of data to generate new samples (e.g., GANs, VAEs).

Answer:- A deep learning model designed to process graph-structured data.

Answer:-

  • CTR (Click-Through Rate): Measures ad effectiveness.
  • LTR (Learning to Rank): Optimizes search rankings.

Answer:- An algorithm used by Google to rank web pages based on link popularity.

Answer:- A mix of labeled and unlabeled data is used for training models.

Answer:- A probabilistic neural network used for feature learning and recommendation.

Answer:- A neural network that captures spatial relationships between features.

Answer:- A technique where a model learns how to learn, used in few-shot learning.

Advanced - Level

Answer:-

  • Supervised Learning: Uses labeled data (e.g., Classification, Regression).
  • Unsupervised Learning: Uses unlabeled data (e.g., Clustering, PCA).
  • Semi-Supervised Learning: A mix of labeled and unlabeled data to improve learning.

Answer:-

  • Bias: Error from incorrect assumptions, leading to underfitting.
  • Variance: Sensitivity to small fluctuations in data, causing overfitting.
  • Tradeoff: A balance must be maintained between bias and variance for optimal model performance.

Answer:- High-dimensional data can degrade model performance by increasing sparsity.
Solutions:

  • Dimensionality Reduction (PCA, LDA, t-SNE)
  • Feature Selection (Lasso Regression, Mutual Information)

Answer:-

  • Ridge Regression (L2 Regularization): Shrinks coefficients but never sets them to zero.
  • Lasso Regression (L1 Regularization): Can set coefficients to zero, performing feature selection.

Answer:- Elastic Net combines L1 (Lasso) and L2 (Ridge) regularization, handling multicollinearity better than Lasso alone.

Answer:- Uses the sigmoid function to map inputs to probabilities for binary classification.

Answer:- A flexible extension of linear regression where response variables follow different distributions (e.g., Poisson, Binomial).

Answer:- AIC measures the goodness of fit of a model while penalizing complexity to prevent overfitting. Lower AIC is better.

Answer:-

  • MLE: Estimates parameters that maximize the likelihood function.
  • MAP: Incorporates prior probabilities into estimation, improving results in small datasets.

Answer:- The kernel trick maps data into a higher-dimensional space where it becomes linearly separable.

Answer:- A combination of MSE (Mean Squared Error) and MAE (Mean Absolute Error), useful for handling outliers.

Answer:- SVR finds a hyperplane with a margin of tolerance (epsilon) around actual values and ignores errors within this margin.

Answer:- A model that identifies cause-and-effect relationships rather than just correlations.

Answer:- Feature scaling ensures that features have a similar range, improving performance for algorithms like SVM, k-NN, and Gradient Descent.

Answer:-

  • Standardization: Converts data to zero mean and unit variance ((X - mean) / std).
  • Normalization: Scales data between 0 and 1 ((X - min) / (max - min)).
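A minimal scikit-learn sketch of both (assumes a numeric 2-D array or DataFrame X):

from sklearn.preprocessing import StandardScaler, MinMaxScaler

X_std = StandardScaler().fit_transform(X)    # standardization: zero mean, unit variance
X_norm = MinMaxScaler().fit_transform(X)     # normalization: values scaled to [0, 1]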

Answer:- A distance metric that accounts for correlations between features, useful in anomaly detection.

Answer:- A matrix representing pairwise distances or similarities between data points in clustering algorithms.

Answer:- A clustering algorithm that builds a tree-like structure (dendrogram) to form clusters.

Answer:-  DBSCAN (Density-Based Spatial Clustering) groups points based on density. It classifies outliers as noise points.

 

Answer:- A metric that measures how well each point fits within its cluster, ranging from -1 to 1.
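A quick silhouette check with scikit-learn (assumes X and cluster labels, e.g., the clusters from the earlier K-Means snippet):

from sklearn.metrics import silhouette_score

score = silhouette_score(X, clusters)   # closer to 1 means better-separated clusters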

Answer:- HMM is a probabilistic model where states are hidden, and transitions follow probabilities (used in NLP).

Answer:- A technique using random sampling to approximate solutions for probabilistic problems.
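An illustrative Monte Carlo estimate of π by random sampling in the unit square:

import numpy as np

rng = np.random.default_rng(42)
points = rng.random((1_000_000, 2))
inside = (points ** 2).sum(axis=1) <= 1.0   # fraction landing inside the quarter circle
print(4 * inside.mean())                    # approaches pi as the sample grows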

Answer:- A metric that detects multicollinearity in regression.

Answer:- Measures how a small change in training data affects model predictions.

Answer:- A performance metric that measures the tradeoff between true positives and false positives.

Answer:- A variant of SVM used for anomaly detection by identifying deviations from normal patterns.

Answer:- A Bayesian non-parametric method for clustering.

Answer:-  A tradeoff between exploring new options and exploiting known good ones to maximize rewards.

Answer:- A probability distribution commonly used in Bayesian inference.

Answer:- Uses Bayes’ Theorem to classify new instances based on prior probabilities.

Answer:- Balances training error and model complexity to improve generalization.

Answer:- A measure of distance between probability distributions.

Answer:- SGD updates weights incrementally, reducing computational cost and improving convergence speed.

Answer:- The CLT states that the distribution of sample means approaches normality as sample size increases.

Answer:- A technique in Naïve Bayes to prevent zero probabilities by adding small values to all counts.
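In scikit-learn this corresponds to the alpha parameter of the Naïve Bayes estimators (a minimal sketch, assuming count features in X_train):

from sklearn.naive_bayes import MultinomialNB

model = MultinomialNB(alpha=1.0)   # alpha=1.0 is classic add-one (Laplace) smoothing
model.fit(X_train, y_train)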

Answer:- A hidden variable (also called a latent variable) is a variable that is not directly observed but influences observed variables in a dataset. Hidden variables are used in probabilistic models like Hidden Markov Models (HMM) and Gaussian Mixture Models (GMM).

Example: In a medical dataset, “health status” may be a hidden variable that affects observed symptoms but is not directly recorded.

Answer:- The Pareto Principle (80/20 rule) states that 80% of the outcomes result from 20% of the causes. In ML, it applies in various ways:

  • Feature Selection: 20% of features often contribute to 80% of model accuracy.
  • Data Cleaning: 80% of errors are often caused by 20% of the data.
  • Optimization: Focusing on the most important 20% of hyperparameters can yield 80% of model improvements.

Answer:-  A Gaussian Mixture Model (GMM) is a probabilistic clustering algorithm that models data as a combination of multiple Gaussian distributions.

Steps:

  1. Assume that data points belong to different Gaussian distributions.
  2. Use the Expectation-Maximization (EM) algorithm to estimate the parameters (mean, variance, and mixing coefficients).
  3. Assign probabilities to each data point belonging to a cluster instead of hard clustering.

Use Case: GMM is useful when clusters have overlapping boundaries (unlike K-Means which assumes spherical clusters).
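A short GMM sketch with scikit-learn (assumes a numeric feature array X; the number of components is illustrative):

from sklearn.mixture import GaussianMixture

gmm = GaussianMixture(n_components=3, random_state=42)
gmm.fit(X)                      # parameters are estimated with the EM algorithm
probs = gmm.predict_proba(X)    # soft cluster-membership probabilities per point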

Answer:-

  • Parametric Models:
    • Assume a fixed number of parameters (e.g., Linear Regression, Logistic Regression).
    • Computationally efficient and interpretable.
    • Work well when the data distribution follows assumptions.
  • Non-Parametric Models:
    • Do not assume a fixed number of parameters (e.g., k-NN, Decision Trees, Random Forest).
    • Can adapt to data complexity but require more data to avoid overfitting.
    • More flexible but computationally expensive.

 

Answer:- The Theil Index is a statistical measure of inequality used in economics and machine learning to evaluate distribution fairness.

  • A Theil Index of 0 means perfect equality (all predictions or values are equal).
  • A higher Theil Index indicates greater inequality or imbalance in the predictions.

Use Case: Used in model fairness assessment, income inequality studies, and resource allocation problems.

Answer:-  The Jensen-Shannon Divergence (JSD) is a measure of similarity between two probability distributions. It is a symmetric and smoothed version of Kullback-Leibler (KL) divergence.

Formula:

JSD(P || Q) = ½ KL(P || M) + ½ KL(Q || M)

where M = (P + Q) / 2.

Use Case: Used in NLP (word embedding comparison), generative models, and clustering validation.
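A quick check with SciPy (assumes two discrete distributions p and q over the same support; note that SciPy returns the Jensen-Shannon distance, whose square is the divergence):

import numpy as np
from scipy.spatial.distance import jensenshannon

p = np.array([0.1, 0.4, 0.5])
q = np.array([0.2, 0.3, 0.5])
jsd = jensenshannon(p, q, base=2) ** 2   # squaring the distance gives the divergence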

Answer:- A stationary process is a time series whose statistical properties (mean, variance, autocorrelation) remain constant over time.

Types:

  1. Strictly Stationary: The joint probability distribution is time-invariant.
  2. Weakly Stationary: Only mean and variance remain constant over time.

Use Case: In ML, stationary processes are important in time series forecasting (ARIMA models require stationarity).

Answer:- Wasserstein Distance (Earth Mover’s Distance – EMD) measures the minimum cost to transform one probability distribution into another.

Use Case:

  • Used in Optimal Transport problems.
  • Helps compare distributions in Generative Models (WGAN – Wasserstein GANs).
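A one-dimensional example with SciPy (the two sample arrays are illustrative):

from scipy.stats import wasserstein_distance

d = wasserstein_distance([0.0, 1.0, 3.0], [5.0, 6.0, 8.0])   # cost of moving one empirical distribution onto the other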

Answer:- The Expectation-Maximization (EM) algorithm is an iterative method for estimating parameters in models with latent variables.

Steps:

  1. Expectation Step (E-Step): Estimate missing/latent variables using current parameter estimates.
  2. Maximization Step (M-Step): Update model parameters to maximize likelihood.
  3. Repeat until convergence.

Use Case: Used in GMM, HMM, topic modeling (LDA).

Answer:-

MRR is a metric for evaluating ranking models. It measures how soon the first relevant result appears in a ranked list.

Formula: MRR = (1/N) Σ (1 / rank_i)

where rank_i is the position of the first relevant item for the i-th query.

Use Case: Used in search engines, recommendation systems, and question-answering models.
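A tiny illustrative implementation (ranks holds the 1-based position of the first relevant result for each query):

def mean_reciprocal_rank(ranks):
    return sum(1.0 / r for r in ranks) / len(ranks)

print(mean_reciprocal_rank([1, 3, 2]))   # (1 + 1/3 + 1/2) / 3 ≈ 0.611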

Answer:- F-Measure (F1-Score) balances Precision and Recall using their harmonic mean.

F1 = 2 × (Precision × Recall) / (Precision + Recall)

Why is it used?

  • Helps when class distribution is imbalanced.
  • Useful when both False Positives and False Negatives are costly.

Answer:-

  • Markov Model: The system’s state is directly observable.
  • Hidden Markov Model (HMM): The true state is hidden, and we only observe emitted signals.

Example:

  • Markov Model: Weather transition (Sunny → Rainy).
  • HMM: Speech recognition (we hear sounds but don’t directly observe phonemes).

Answer:- A Markov Blanket for a node in a Bayesian Network consists of its parents, children, and children’s other parents. It defines all the variables needed to predict the node while ignoring the rest.

Use Case: Feature selection in Bayesian Networks.


Answer:- The Shapley Value is used in Explainable AI (XAI) to fairly distribute credit among features in a prediction.

Use Case:

  • Used in SHAP (SHapley Additive exPlanations) to interpret feature importance.
  • Helps understand model decisions in fields like healthcare and finance.
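A minimal SHAP sketch (requires the shap package and assumes a fitted tree-based model such as the earlier RandomForestClassifier):

import shap

explainer = shap.TreeExplainer(model)
shap_values = explainer.shap_values(X_test)   # per-feature Shapley contributions for each prediction
shap.summary_plot(shap_values, X_test)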
