Answer:- Machine Learning is a branch of AI that enables systems to learn and make predictions from data without being explicitly programmed. It uses algorithms to find patterns and make decisions.
Answer:- Cross-validation is a technique to evaluate models by training on different subsets of data and testing on the remaining set.
Answer:- PCA is a dimensionality reduction technique that transforms features into fewer uncorrelated variables (principal components).
Answer:- A confusion matrix is a table used to evaluate classification models, containing True Positives, False Positives, True Negatives, and False Negatives.
Answer:- The Receiver Operating Characteristic (ROC) curve plots True Positive Rate (TPR) vs. False Positive Rate (FPR) for different thresholds.
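A minimal scikit-learn sketch of plotting an ROC curve; y_test and y_score (predicted positive-class probabilities from a fitted model) are assumed to exist:
from sklearn.metrics import roc_curve, roc_auc_score
import matplotlib.pyplot as plt
# y_score: predicted probabilities for the positive class (assumed available)
fpr, tpr, thresholds = roc_curve(y_test, y_score)
auc = roc_auc_score(y_test, y_score)
plt.plot(fpr, tpr, label=f"AUC = {auc:.2f}")
plt.xlabel("False Positive Rate")
plt.ylabel("True Positive Rate")
plt.legend()
plt.show()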
Answer:- An activation function introduces non-linearity into the model. Examples: ReLU, Sigmoid, Tanh, and Softmax.
Answer:- Gradient Descent is an optimization algorithm that minimizes the cost function by iteratively updating model parameters.
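A minimal NumPy sketch of gradient descent fitting a simple linear model on toy data (the data, learning rate, and iteration count are illustrative choices):
import numpy as np
X = np.array([1.0, 2.0, 3.0, 4.0])  # toy feature values
y = np.array([3.0, 5.0, 7.0, 9.0])  # toy targets (y = 2x + 1)
w, b, lr = 0.0, 0.0, 0.01           # initial parameters and learning rate
for _ in range(2000):
    y_pred = w * X + b
    # Gradients of the mean squared error with respect to w and b
    dw = (2 / len(X)) * np.sum((y_pred - y) * X)
    db = (2 / len(X)) * np.sum(y_pred - y)
    w -= lr * dw
    b -= lr * db
print(w, b)  # w approaches 2 and b approaches 1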
Answer:- Hyperparameters are parameters set before training that affect model performance (e.g., learning rate, number of layers in a neural network).
Answer:- Regularization is a technique to prevent overfitting by adding a penalty to the loss function (e.g., L1, L2 regularization).
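A brief scikit-learn sketch contrasting the two penalties; alpha (the penalty strength) is an illustrative value, and X_train/y_train are assumed to exist:
from sklearn.linear_model import Ridge, Lasso
ridge = Ridge(alpha=1.0)  # L2 penalty: shrinks coefficients toward zero
lasso = Lasso(alpha=0.1)  # L1 penalty: can set coefficients exactly to zero
ridge.fit(X_train, y_train)
lasso.fit(X_train, y_train)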
Answer:- KNN is a non-parametric classification algorithm that assigns labels based on the majority vote of k-nearest data points.
Answer:- A Decision Tree is a tree-like model where decisions are made based on feature conditions (split criteria include the Gini Index and Information Gain).
Answer:- Random Forest is an ensemble method combining multiple Decision Trees for better accuracy and robustness.
Answer:- A Support Vector Machine (SVM) is a classification algorithm that finds the optimal hyperplane separating data points.
Answer:- Naive Bayes is a probabilistic classifier based on Bayes’ Theorem, assuming feature independence.
Answer:- As dimensionality increases, data becomes sparse, making it harder for models to find patterns.
Answer:- Clustering is an unsupervised technique to group similar data points (e.g., K-Means, Hierarchical Clustering).
Answer:- K-Means is a clustering algorithm that partitions data into k clusters by minimizing intra-cluster variance.
Answer:- DBSCAN is a density-based clustering algorithm that groups points based on density connectivity.
Answer:- Reinforcement Learning is a learning approach where an agent interacts with the environment and learns via rewards and punishments.
Answer:- Transfer learning is a technique where a pre-trained model is adapted to a new but related task.
Answer:- Time series forecasting predicts future values based on past time-dependent data (e.g., ARIMA, LSTM).
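A minimal statsmodels sketch of an ARIMA forecast; series is assumed to be a univariate pandas Series, and the (1, 1, 1) order is an arbitrary illustrative choice:
from statsmodels.tsa.arima.model import ARIMA
model = ARIMA(series, order=(1, 1, 1))  # (p, d, q) chosen for illustration only
fitted = model.fit()
forecast = fitted.forecast(steps=5)     # predict the next 5 time steps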
Answer:- import pandas as pd
df = pd.read_csv("data.csv")
df.head()
Answer:- from sklearn.preprocessing import MinMaxScaler
scaler = MinMaxScaler()
df_scaled = scaler.fit_transform(df)
Answer:- from sklearn.linear_model import LinearRegression
model = LinearRegression()
model.fit(X_train, y_train)
Answer:- from sklearn.linear_model import LogisticRegression
model = LogisticRegression()
model.fit(X_train, y_train)
Answer:- from sklearn.tree import DecisionTreeClassifier
model = DecisionTreeClassifier()
model.fit(X_train, y_train)
Answer:-
from sklearn.ensemble import RandomForestClassifier
model = RandomForestClassifier(n_estimators=100, random_state=42)
model.fit(X_train, y_train)
y_pred = model.predict(X_test)
Answer:-
from sklearn.neighbors import KNeighborsClassifier
model = KNeighborsClassifier(n_neighbors=5)
model.fit(X_train, y_train)
y_pred = model.predict(X_test)
Answer:-
from sklearn.svm import SVC
model = SVC(kernel='linear')
model.fit(X_train, y_train)
y_pred = model.predict(X_test)
Answer:-
from sklearn.decomposition import PCA
pca = PCA(n_components=2)
X_pca = pca.fit_transform(X)
Answer:-
from sklearn.cluster import KMeans
kmeans = KMeans(n_clusters=3, random_state=42)
clusters = kmeans.fit_predict(X)
Answer:-
from sklearn.naive_bayes import GaussianNB
model = GaussianNB()
model.fit(X_train, y_train)
y_pred = model.predict(X_test)
Answer:-
from sklearn.feature_selection import SelectKBest, chi2
X_new = SelectKBest(score_func=chi2, k=5).fit_transform(X, y)
Answer:-
from sklearn.impute import SimpleImputer
imputer = SimpleImputer(strategy='mean')
X_imputed = imputer.fit_transform(X)
Answer:-
from sklearn.model_selection import StratifiedKFold, cross_val_score
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)
scores = cross_val_score(model, X, y, cv=cv)
Answer:-
from sklearn.model_selection import GridSearchCV
from sklearn.neighbors import KNeighborsClassifier
param_grid = {'n_neighbors': [3, 5, 7]}
grid_search = GridSearchCV(KNeighborsClassifier(), param_grid, cv=5)
grid_search.fit(X_train, y_train)
Answer:-
from sklearn.model_selection import RandomizedSearchCV
from sklearn.ensemble import RandomForestClassifier
param_dist = {'n_estimators': [50, 100, 200], 'max_depth': [5, 10, 20]}
random_search = RandomizedSearchCV(RandomForestClassifier(), param_dist, cv=5, n_iter=5)
random_search.fit(X_train, y_train)
Answer:-
from sklearn.metrics import classification_report
print(classification_report(y_test, y_pred))
Answer:-
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense
model = Sequential([
    Dense(32, activation='relu', input_shape=(X_train.shape[1],)),
    Dense(16, activation='relu'),
    Dense(1, activation='sigmoid')
])
model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])
model.fit(X_train, y_train, epochs=10, batch_size=32)
Answer:-
import joblib
# Save the model
joblib.dump(model, "model.pkl")
# Load the model
loaded_model = joblib.load("model.pkl")
Answer:-
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression
pipeline = Pipeline([
    ('scaler', StandardScaler()),
    ('classifier', LogisticRegression())
])
pipeline.fit(X_train, y_train)
Answer:- A Confusion Matrix is a table that summarizes classification performance.
Actual \ Predicted | Positive (P) | Negative (N) |
Positive (P) | True Positive (TP) | False Negative (FN) |
Negative (N) | False Positive (FP) | True Negative (TN) |
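A minimal scikit-learn sketch of computing this matrix, assuming y_test and y_pred exist:
from sklearn.metrics import confusion_matrix
cm = confusion_matrix(y_test, y_pred)
print(cm)  # rows correspond to actual classes, columns to predicted classes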
Answer:- Activation functions introduce non-linearity, enabling neural networks to learn complex patterns.
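A small NumPy sketch of three common activation functions:
import numpy as np
def relu(x):
    return np.maximum(0, x)      # 0 for negative inputs, identity otherwise
def sigmoid(x):
    return 1 / (1 + np.exp(-x))  # squashes inputs into (0, 1)
def tanh(x):
    return np.tanh(x)            # squashes inputs into (-1, 1)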
Answer:- Cross-validation is used to evaluate model performance by splitting data into multiple training and test sets.
Answer:- Feature engineering involves creating new meaningful features from raw data. Examples: extracting date parts from timestamps, binning continuous values, and creating interaction terms.
Answer:- ✅ Easy to interpret and visualize.
✅ Handles both numerical and categorical data.
✅ No need for feature scaling.
✅ Can handle missing values.
❌ Prone to overfitting (solved using pruning or ensemble methods).
Answer:- A cost function measures the difference between predicted and actual values; the model is optimized to minimize it. Examples: Mean Squared Error (MSE) for regression, Cross-Entropy for classification.
Answer:- Outliers are data points that significantly differ from the rest of the data. Common detection methods include the IQR rule, z-scores, and Isolation Forests.
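A minimal pandas sketch of IQR-based outlier detection; df is assumed to be a DataFrame, and "value" is a hypothetical numeric column name:
# Flag points more than 1.5 * IQR outside the quartile range
q1, q3 = df["value"].quantile(0.25), df["value"].quantile(0.75)
iqr = q3 - q1
outliers = df[(df["value"] < q1 - 1.5 * iqr) | (df["value"] > q3 + 1.5 * iqr)]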
Answer:- Reinforcement Learning (RL) is a trial-and-error learning process where an agent interacts with the environment to maximize cumulative rewards.
Answer:- A Markov Chain is a stochastic process where the next state depends only on the current state, not past states. It is used in RL and Hidden Markov Models (HMM).
Answer:- PCA reduces dimensionality by transforming features into principal components that capture the most variance. Steps: standardize the data, compute the covariance matrix, extract its eigenvectors and eigenvalues, and project the data onto the top components.
Answer:- Overfitting occurs when a model learns noise instead of underlying patterns, leading to poor generalization. Common remedies include regularization, cross-validation, early stopping, and pruning (for decision trees).
Answer:- Kernels transform data into a higher-dimensional space to make it linearly separable.
Answer:-Autoencoders are neural networks used for unsupervised learning to encode and reconstruct data. They are useful for anomaly detection and dimensionality reduction.
Answer:- LSTMs are specialized RNNs designed to handle long-term dependencies by using memory cells and gates (input, forget, output gates).
Answer:- Hyperparameter tuning finds the best parameters (e.g., learning rate, tree depth) for optimal model performance.
Answer:- A/B testing compares two versions of a model (A & B) to determine which performs better using statistical significance.
Answer:- The Variance Inflation Factor (VIF) measures multicollinearity between features: VIF_i = 1 / (1 - R_i^2), where R_i^2 comes from regressing feature i on all the other features. Values above roughly 5-10 indicate problematic multicollinearity.
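A minimal statsmodels sketch of computing a VIF per feature; X is assumed to be a pandas DataFrame of predictors:
import pandas as pd
import statsmodels.api as sm
from statsmodels.stats.outliers_influence import variance_inflation_factor
Xc = sm.add_constant(X)  # add an intercept column before computing VIFs
vif = pd.Series([variance_inflation_factor(Xc.values, i) for i in range(Xc.shape[1])], index=Xc.columns)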
Answer:- A perceptron is a simple neural network unit used for binary classification with a weighted sum and activation function.
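A minimal NumPy sketch of the perceptron learning rule (learning rate and epoch count are arbitrary illustrative choices):
import numpy as np
def train_perceptron(X, y, lr=0.1, epochs=10):
    w, b = np.zeros(X.shape[1]), 0.0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            # Step activation: predict 1 if the weighted sum exceeds 0
            pred = 1 if np.dot(w, xi) + b > 0 else 0
            # Update weights only when the prediction is wrong
            w += lr * (yi - pred) * xi
            b += lr * (yi - pred)
    return w, b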
Answer:- Model drift occurs when model performance degrades over time due to changing data patterns.
Answer:- The cold-start problem occurs when new users/items lack interaction history, making recommendations difficult.
Answer:- An ensemble combines multiple models to improve accuracy (e.g., Random Forest, Gradient Boosting).
Answer:- Hinge loss is used in SVMs to penalize misclassified points that violate the margin.
Answer:- A Transformer uses self-attention mechanisms to capture long-range dependencies (used in NLP models like BERT).
Answer:- A neural network that learns similarity between pairs of inputs, used in facial recognition.
Answer:- An attention mechanism focuses on important parts of the input sequence, improving translation and NLP models.
Answer:- A model that learns the probability distribution of data to generate new samples (e.g., GANs, VAEs).
Answer:- A deep learning model designed to process graph-structured data.
Answer:- An algorithm used by Google to rank web pages based on link popularity.
Answer:- Semi-supervised learning uses a mix of labeled and unlabeled data for training models.
Answer:- A probabilistic neural network used for feature learning and recommendation.
Answer:- A neural network that captures spatial relationships between features.
Answer:- A technique where a model learns how to learn, used in few-shot learning.
Answer:- High-dimensional data can degrade model performance by increasing sparsity (the curse of dimensionality). Solutions: dimensionality reduction (e.g., PCA), feature selection, and regularization.
Answer:- Elastic Net combines L1 (Lasso) and L2 (Ridge) regularization, handling multicollinearity better than Lasso alone.
Answer:- Logistic regression uses the sigmoid function to map inputs to probabilities for binary classification.
Answer:- A flexible extension of linear regression where response variables follow different distributions (e.g., Poisson, Binomial).
Answer:- AIC measures the goodness of fit of a model while penalizing complexity to prevent overfitting: AIC = 2k - 2 ln(L), where k is the number of parameters and L the maximized likelihood. Lower AIC is better.
Answer:- The kernel trick maps data into a higher-dimensional space where it becomes linearly separable.
Answer:- The Huber loss combines the behavior of MSE (Mean Squared Error) and MAE (Mean Absolute Error), making it useful for handling outliers.
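For reference, the standard Huber loss with threshold δ and residual a = y − ŷ:
L_\delta(a) = \begin{cases} \frac{1}{2} a^2 & \text{if } |a| \le \delta \\ \delta \left( |a| - \frac{1}{2} \delta \right) & \text{otherwise} \end{cases}
It is quadratic for small residuals (like MSE) and linear for large ones (like MAE).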
Answer:- SVR finds a hyperplane with a margin of tolerance (epsilon) around actual values and ignores errors within this margin.
Answer:- A model that identifies cause-and-effect relationships rather than just correlations.
Answer:- Feature scaling ensures that features have a similar range, improving performance for algorithms like SVM, k-NN, and Gradient Descent.
Answer:- The Mahalanobis distance is a distance metric that accounts for correlations between features, useful in anomaly detection.
Answer:- A matrix representing pairwise distances or similarities between data points in clustering algorithms.
Answer:- Hierarchical clustering builds a tree-like structure (dendrogram) to form clusters.
Answer:- DBSCAN (Density-Based Spatial Clustering) groups points based on density. It classifies outliers as noise points.
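A minimal scikit-learn sketch; eps and min_samples are illustrative values that need tuning per dataset:
from sklearn.cluster import DBSCAN
db = DBSCAN(eps=0.5, min_samples=5)
labels = db.fit_predict(X)  # a label of -1 marks points classified as noise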
Answer:- A metric that measures how well each point fits within its cluster, ranging from -1 to 1.
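A minimal scikit-learn sketch, assuming X and cluster labels (e.g., the clusters from the K-Means snippet above) exist:
from sklearn.metrics import silhouette_score
score = silhouette_score(X, clusters)  # closer to 1 means better-separated clusters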
Answer:- HMM is a probabilistic model where states are hidden, and transitions follow probabilities (used in NLP).
Answer:- Monte Carlo simulation uses random sampling to approximate solutions for probabilistic problems.
Answer:- A metric that detects multicollinearity in regression.
Answer:- Measures how a small change in training data affects model predictions.
Answer:- A performance metric that measures the tradeoff between true positives and false positives.
Answer:- A variant of SVM used for anomaly detection by identifying deviations from normal patterns.
Answer:- A Bayesian non-parametric method for clustering.
Answer:- A tradeoff between exploring new options and exploiting known good ones to maximize rewards.
Answer:- A probability distribution commonly used in Bayesian inference.
Answer:- Uses Bayes’ Theorem to classify new instances based on prior probabilities.
Answer:- Balances training error and model complexity to improve generalization.
Answer:- A measure of distance between probability distributions.
Answer:- SGD updates weights incrementally, reducing computational cost and improving convergence speed.
Answer:- The CLT states that the distribution of sample means approaches normality as sample size increases.
Answer:- A technique in Naïve Bayes to prevent zero probabilities by adding small values to all counts.
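For reference, with add-one (Laplace) smoothing the estimated probability of word w in class c becomes:
P(w \mid c) = \frac{\text{count}(w, c) + 1}{\sum_{w'} \text{count}(w', c) + |V|}
where |V| is the vocabulary size, so no word ever receives zero probability.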
Answer:- A hidden variable (also called a latent variable) is a variable that is not directly observed but influences observed variables in a dataset. Hidden variables are used in probabilistic models like Hidden Markov Models (HMM) and Gaussian Mixture Models (GMM).
Example: In a medical dataset, “health status” may be a hidden variable that affects observed symptoms but is not directly recorded.
Answer:- The Pareto Principle (80/20 rule) states that 80% of the outcomes result from 20% of the causes. In ML, it applies in various ways; for example, a small subset of features often drives most of a model's predictive power.
Answer:- A Gaussian Mixture Model (GMM) is a probabilistic clustering algorithm that models data as a combination of multiple Gaussian distributions.
Steps: initialize the Gaussian parameters; E-step: compute each point's probability of belonging to each Gaussian; M-step: update the means, covariances, and mixture weights; repeat until convergence.
Use Case: GMM is useful when clusters have overlapping boundaries (unlike K-Means which assumes spherical clusters).
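A minimal scikit-learn sketch; the number of components is an illustrative choice:
from sklearn.mixture import GaussianMixture
gmm = GaussianMixture(n_components=3, random_state=42)
labels = gmm.fit_predict(X)   # hard cluster assignments
probs = gmm.predict_proba(X)  # soft (probabilistic) memberships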
Answer:- The Theil Index is a statistical measure of inequality used in economics and machine learning to evaluate distribution fairness.
Use Case: Used in model fairness assessment, income inequality studies, and resource allocation problems.
Answer:- The Jensen-Shannon Divergence (JSD) is a measure of similarity between two probability distributions. It is a symmetric and smoothed version of Kullback-Leibler (KL) divergence.
Formula: JSD(P \| Q) = \frac{1}{2} KL(P \| M) + \frac{1}{2} KL(Q \| M), where M = (P + Q) / 2.
Use Case: Used in NLP (word embedding comparison), generative models, and clustering validation.
Answer:- A stationary process is a time series whose statistical properties (mean, variance, autocorrelation) remain constant over time.
Types: strictly stationary (the full joint distribution is time-invariant) and weakly stationary (constant mean and variance, with autocovariance depending only on the lag).
Use Case: In ML, stationary processes are important in time series forecasting (ARIMA models require stationarity).
Answer:- Wasserstein Distance (Earth Mover’s Distance – EMD) measures the minimum cost to transform one probability distribution into another.
Use Case: training Wasserstein GANs (WGANs) and comparing distributions in generative modeling.
Answer:- The Expectation-Maximization (EM) algorithm is an iterative method for estimating parameters in models with latent variables.
Steps: E-step: compute the expected values of the latent variables given the current parameters; M-step: re-estimate the parameters to maximize the expected log-likelihood; repeat until convergence.
Use Case: Used in GMM, HMM, topic modeling (LDA).
Answer:- MRR (Mean Reciprocal Rank) is a metric for evaluating ranking models. It measures how soon the first relevant result appears in a ranked list.
Formula: MRR = \frac{1}{N} \sum_{i=1}^{N} \frac{1}{rank_i}, where rank_i is the position of the first relevant item for the i-th query.
Use Case: Used in search engines, recommendation systems, and question-answering models.
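A minimal sketch of computing MRR from a list of first-relevant-result ranks (the example ranks are hypothetical):
def mean_reciprocal_rank(ranks):
    # ranks: 1-indexed position of the first relevant item for each query
    return sum(1.0 / r for r in ranks) / len(ranks)
print(mean_reciprocal_rank([1, 3, 2]))  # (1 + 1/3 + 1/2) / 3 ≈ 0.61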
Answer:- F-Measure (F1-Score) balances Precision and Recall using their harmonic mean.
F1 = 2 \times \frac{Precision \times Recall}{Precision + Recall}
Why is it used? It is preferred over plain accuracy on imbalanced datasets, where a model can score high accuracy while performing poorly on the minority class.
Answer:- A Markov Blanket for a node in a Bayesian Network consists of its parents, children, and children’s other parents. It defines all the variables needed to predict the node while ignoring the rest.
Use Case: Feature selection in Bayesian Networks.
Answer:- Node.js applications can be deployed using cloud services like AWS, Google Cloud, or platforms like Heroku, using Docker containers, or on traditional VPS using a reverse proxy (e.g., Nginx).
Answer:- The Shapley Value is used in Explainable AI (XAI) to fairly distribute credit among features in a prediction.
Use Case: model interpretability and feature-importance attribution, e.g., via the SHAP library.
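A minimal sketch using the third-party shap library (assumed installed) with a fitted tree-based model and feature matrix X:
import shap
explainer = shap.TreeExplainer(model)   # model: e.g., a fitted RandomForestClassifier
shap_values = explainer.shap_values(X)  # per-feature contribution to each prediction
shap.summary_plot(shap_values, X)       # visualize global feature importance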