Data Engineering Specialization With Cloud

(Batches Start from 8th, 19th, 29th January 2024)

About The Program

With the aim of building a healthy ecosystem aligned with industry standards, REGex Software brings you a Training/Internship Program on “Big Data & AWS Cloud”. We organize this Training/Internship Program to improve the knowledge and skills of students and professionals so that they can become experts in the field of Big Data and land their dream job in software development at large MNCs.

REGex Software Services’ Data Engineering Specialization with Cloud is a valuable resource for both beginners and experts. This specialization program introduces you to the domains of Data Engineering and Cloud, including Hadoop, MapReduce, Hive, Apache Spark, Kafka streaming, SQL, Amazon EMR, and connecting, managing, deploying and updating cloud services, and much more, from basics to advanced. If you want to become a Data Engineer or Business Analyst, REGex has designed this program for you.

Weekly Duration

20 Hours Per week

Location

Physical (Jaipur)
or 
Online (Google Meet)

Duration

6 Months

Participants

25 – 30 per Batch

What you will Learn

Python

Duration: 20 Hours

SQL

Duration: 60 Hours

Hadoop

Duration: 20 Hours

Hive

Duration: 20 Hours

Spark

Duration: 40 Hours

Apache Kafka

Duration: 20 Hours

NoSQL

Duration: 20 Hours

Amazon EMR

Duration: 10 Hours

AWS

Duration: 10 Hours

What you will Learn

  • Linux basics
  • Big Data Analytics & Hadoop
  • HDFS [ Hadoop Distributed File System ]
  • Map-Reduce [ Data Processing ]
  • HIVE
  • Apache Spark on Azure DataBricks
  • NoSQL DataBase
  • Data visualization
  • SQL
  • Power Query & Editor
  • Dashboard & Graph
  • Amazon EMR
  • Learn how to use these tools in the field of Data Analytics
  • AWS Foundations and Services
  • AWS Security & Costs
  • AWS Cloud Services Overview
  • Compute Services Design, Implementation & Management
  • Identity and Access Management (IAM)
  • Auto Scaling Solutions
  • Virtual Network Services – DNS
  • AWS Application Deployment
  • AWS Database Design & Deployment
  • Additional AWS Services.

Study Material

  • E-Notes
  • Poll Tests & Assignments
  • 300+ hours of live video lectures available on demand
  • Access to lecture videos and notes
  • 24*7 Mentorship Support
  • Real-time project assignments

Output

  • Able to think outside the box
  • Expertise in different Big Data tools like HDFS, Hive, Apache Spark and Amazon EMR
  • Work on multiple projects and get the opportunity for an internship at REGex or another company through us
  • Understand how to create data insights by connecting data sets, transforming and cleaning the data into data models, and then creating charts/graphs to visualize the data
  • Become a Data Engineer after completing this program
  • Able to get a package of up to 30 LPA

Why Choose Us

Live Sessions

Live sessions by expert trainers; access to recorded sessions is also available.

Live Projects
Get a chance to work on industry-oriented projects to implement your learning.
24*7 Support
24*7 mentorship support is available for all students to clear all of their doubts.
Opportunities
REGex provides internship/job opportunities to the best students in different companies.


Course Content

  • Basics of Python
  • OOPs Concepts
  • File & Exception Handling
  • Working with Pandas, NumPy & Matplotlib
  • Working with Missing Data
  • Data Grouping
  • Data Subsetting
  • Merging & Joining DataFrames
  • Importing Libraries & Datasets
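To give a flavour of the Pandas topics listed above (missing data, grouping, subsetting, merging), here is a minimal sketch; the data and column names are made up for illustration.

```python
import pandas as pd
import numpy as np

# Hypothetical sales data with a missing value
sales = pd.DataFrame({
    "region": ["north", "north", "south", "south"],
    "amount": [100.0, np.nan, 250.0, 80.0],
})
regions = pd.DataFrame({
    "region": ["north", "south"],
    "manager": ["Asha", "Ravi"],
})

sales["amount"] = sales["amount"].fillna(0)                       # working with missing data
north_only = sales[sales["region"] == "north"]                    # data subsetting
totals = sales.groupby("region", as_index=False)["amount"].sum()  # data grouping
report = totals.merge(regions, on="region")                       # merging / joining DataFrames
print(report)
```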

  • Introduction to the LINUX Operating System and basic LINUX commands
  • Operating System
  • Basic LINUX Commands

Linux File System
  • LINUX File System
  • File Types
  • File Permissions
  • File Related Commands
  • Filters
  • Simple Filters
  • Advanced Filters
Vi Editor
  • Vi Editor
  • Input Mode Commands
  • Vi Editor – Save & Quit
  • Cursor Movement Commands
Shell Programming
  • Shell Variables
  • Environmental Variables
  • Shell script Commands
  • Arithmetic Operations
  • Command Substitution
  • Command Line Arguments
  •  Business Intelligence
  •  Need for Business Intelligence
  • Terms used in BI
  • Components of BI
General concept of Data Warehouse
  • Data Warehouse
  • History of Data Warehousing
  • Need for Data Warehouse
  • Data Warehouse Architecture
  • Data Mining Works with DWH
  • Features of Data warehouse
  • Data Mart
  • Application Areas
Dimensional modeling
  • Dimension modeling
  • Fact and Dimension tables
  • Database schema
  • Schema Design for Modeling
  • Star, SnowFlake
  • Fact Constellation schema
  • Use of Data mining
  • Data mining and Business Intelligence
  • Types of data used in Data mining
  • Data mining applications
  • Data mining Products

● What’s Big Data?
● Big Data: 3V’s
● Explosion of Data
● What’s driving Big Data
● Applications for Big Data Analytics
● Big Data Use Cases
● Benefits of Big Data

    • Functional Dependency
    • Closure of Attributes
    • Types of Keys: Primary Key, Candidate Key & Super Key in DBMS
    • Normalization
    • Indexing
    • Transaction and Concurrency Control
    • Transaction in DBMS
    • ACID Properties in DBMS
    • Joins in DBMS
    • Create & Alter Table
    • Constraints in SQL
    • SQL Queries & Subqueries
    • SQL Stored Procedure
    • View, Cursor & Trigger in SQL
    • Common Table Expression
    • Replace Null and Coalesce Function
    • Running Total In SQL
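A small sketch of a few of the SQL topics above (Common Table Expression, COALESCE, running total), run through Python's built-in sqlite3 module; the sales table and its columns are hypothetical, and window functions assume SQLite 3.25 or newer.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE sales (id INTEGER PRIMARY KEY, region TEXT, amount REAL);
    INSERT INTO sales (region, amount) VALUES
        ('north', 100), ('north', NULL), ('south', 250), ('south', 80);
""")

query = """
WITH cleaned AS (                                      -- Common Table Expression
    SELECT id, region, COALESCE(amount, 0) AS amount   -- replace NULL with 0
    FROM sales
)
SELECT region, amount,
       SUM(amount) OVER (PARTITION BY region ORDER BY id) AS running_total
FROM cleaned;
"""
for row in conn.execute(query):
    print(row)
```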
 

Big Data Tools

● History of Hadoop
● Distributed File System
● What is Hadoop
● Characteristics of Hadoop
● RDBMS Vs Hadoop
● Hadoop Generations
● Components of Hadoop
● HDFS Blocks and Replication
● How Files Are Stored
● HDFS Commands
● Hadoop Daemons
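The HDFS commands above can also be driven from Python. The sketch below simply wraps a few `hdfs dfs` subcommands with subprocess; it assumes a configured Hadoop client on PATH, and the paths and file names are placeholders.

```python
import subprocess

def hdfs(*args):
    """Run an `hdfs dfs` subcommand and return its output."""
    return subprocess.run(["hdfs", "dfs", *args],
                          check=True, capture_output=True, text=True).stdout

hdfs("-mkdir", "-p", "/user/demo")             # create a directory in HDFS
hdfs("-put", "local_data.csv", "/user/demo/")  # copy a local file into HDFS
print(hdfs("-ls", "/user/demo"))               # list the directory
print(hdfs("-cat", "/user/demo/local_data.csv")[:200])  # read file contents
```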

● Difference between Hadoop 1.0 and 2.0
● New Components in Hadoop 2.x
● YARN/MRv2
● Configuration Files in Hadoop 2.x
● Major Hadoop Distributors/Vendors
● Cluster Management & Monitoring
● Hadoop Downloads

● What is distributed computing
● Introduction to Map Reduce
● Map Reduce components
● How MapReduce works
● Word Count execution
● Suitable & unsuitable use cases for MapReduce
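Word count is the classic MapReduce example referenced above. Below is a minimal Hadoop Streaming style sketch in Python; the file name and the local pipeline shown in the docstring are illustrative assumptions.

```python
#!/usr/bin/env python3
"""Word-count sketch in the Hadoop Streaming style (mapper and reducer in one file).

Hypothetical local usage, without a cluster:
    python wordcount.py map < input.txt | sort | python wordcount.py reduce
"""
import sys

def mapper():
    # Map phase: emit a (word, 1) pair per word
    for line in sys.stdin:
        for word in line.split():
            print(f"{word}\t1")

def reducer():
    # Reduce phase: input arrives sorted by key, so sum consecutive runs of the same word
    current, count = None, 0
    for line in sys.stdin:
        word, _, value = line.rstrip("\n").partition("\t")
        if word == current:
            count += int(value)
        else:
            if current is not None:
                print(f"{current}\t{count}")
            current, count = word, int(value)
    if current is not None:
        print(f"{current}\t{count}")

if __name__ == "__main__":
    mapper() if sys.argv[1:] == ["map"] else reducer()
```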

● Architecture
● Basic Syntax
● Import data from a table in a relational database into HDFS
● Import the results of a query from a relational database into HDFS
● Import a table from a relational database into a new or existing Hive table
● Insert or update data from HDFS into a table in a relational database

● Define a Hive-managed table
● Define a Hive external table
● Define a partitioned Hive table
● Define a bucketed Hive table
● Define a Hive table from a select query
● Define a Hive table that uses the ORCFile format
● Create a new ORCFile table from the data in an existing non-ORCFile Hive table
● Specify the delimiter of a Hive table
● Load data into a Hive table from a local directory
● Load data into a Hive table from an HDFS directory
● Load data into a Hive table as the result of a query
● Load a compressed data file into a Hive table
● Update a row in a Hive table
● Delete a row from a Hive table
● Insert a new row into a Hive table
● Join two Hive tables
● Use a subquery within a Hive query
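A short sketch of a few of the Hive operations above (partitioned ORC table, loading from a query, join with a subquery), issued as HiveQL through Spark SQL; it assumes a Spark build with Hive support and a configured metastore, and all table and column names are hypothetical.

```python
from pyspark.sql import SparkSession

spark = (SparkSession.builder
         .appName("hive-sketch")
         .enableHiveSupport()     # lets spark.sql() run HiveQL against the metastore
         .getOrCreate())

# Partitioned, ORC-backed managed table (hypothetical schema)
spark.sql("""
    CREATE TABLE IF NOT EXISTS demo_orders (
        order_id INT,
        amount   DOUBLE
    )
    PARTITIONED BY (order_date STRING)
    STORED AS ORC
""")

# Load one partition from the result of a query (staging_orders is hypothetical)
spark.sql("""
    INSERT INTO demo_orders PARTITION (order_date = '2024-01-08')
    SELECT id, total FROM staging_orders WHERE dt = '2024-01-08'
""")

# Join a table with a subquery (customers is hypothetical)
spark.sql("""
    SELECT o.order_id, c.name
    FROM demo_orders o
    JOIN (SELECT * FROM customers WHERE active = 1) c
      ON o.order_id = c.last_order_id
""").show()
```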

● What is Spark?
● History of Spark
● Spark Architecture
● Spark Shell

● RDD Basics
● Creating RDDs in Spark
● RDD Operations
● Passing Functions to Spark
● Transformations and Actions in Spark
● Spark RDD Persistence

● Pair RDDs
● Transformations on Pair RDDs
● Actions Available on Pair RDDs
● Data Partitioning (Advanced)
● Loading and Saving the Data
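A compact PySpark sketch of the RDD topics above (creating RDDs, transformations and actions, pair RDDs, persistence); the input strings are made up for illustration.

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("rdd-sketch").getOrCreate()
sc = spark.sparkContext

lines = sc.parallelize(["spark makes rdds", "rdds are resilient", "spark is fast"])

words = lines.flatMap(lambda line: line.split())   # transformation
pairs = words.map(lambda w: (w, 1))                # pair RDD
counts = pairs.reduceByKey(lambda a, b: a + b)     # transformation on a pair RDD
counts.persist()                                   # RDD persistence

print(counts.collect())                            # action
print(counts.sortByKey().take(3))                  # action on a pair RDD

spark.stop()
```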

● Accumulators
● Broadcast Variables
● Piping to External Programs
● Numeric RDD Operations
● Spark Runtime Architecture
● Deploying Applications

  •  Spark SQL Overview
  • Spark SQL Architecture

Data Frame

  • What are DataFrames?
  • Manipulating DataFrames
  • Reading data from different file formats
  • Group By & aggregation functions
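A minimal DataFrame sketch covering the points above (reading a file, manipulating columns, group by and aggregation); the CSV path and column names are assumptions.

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("dataframe-sketch").getOrCreate()

# Reading data from a file format (hypothetical path and columns)
orders = spark.read.csv("orders.csv", header=True, inferSchema=True)

# Manipulating DataFrames: filter, derive a column, then group and aggregate
summary = (orders
           .filter(F.col("amount") > 0)
           .withColumn("year", F.year("order_date"))
           .groupBy("region", "year")
           .agg(F.sum("amount").alias("total_amount"),
                F.count(F.lit(1)).alias("num_orders")))

summary.show()
spark.stop()
```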

Spark Streaming

  • What is Spark Streaming?
  • Spark Streaming example

  • Understand the fundamentals of Kafka.
  • Understand the distributed nature of Kafka and its scalability.
  • Understand how data is organized into topics and partitions.
  • Install and set up Kafka on your local machine or a cluster.
  • Learn how to create topics, produce messages, and consume messages using
    Kafka APIs.
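To illustrate producing and consuming messages with a Kafka API, here is a minimal sketch using the third-party kafka-python package; the broker address and topic name are assumptions, and the topic is expected to exist already.

```python
from kafka import KafkaProducer, KafkaConsumer

BROKER = "localhost:9092"   # hypothetical broker
TOPIC = "demo-events"       # hypothetical topic

# Produce a few messages
producer = KafkaProducer(bootstrap_servers=BROKER)
for i in range(3):
    producer.send(TOPIC, value=f"event-{i}".encode("utf-8"))
producer.flush()

# Consume them from the beginning of the topic
consumer = KafkaConsumer(
    TOPIC,
    bootstrap_servers=BROKER,
    auto_offset_reset="earliest",
    consumer_timeout_ms=5000,   # stop iterating when no new messages arrive
)
for message in consumer:
    print(message.partition, message.offset, message.value.decode("utf-8"))
```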

  • Overview of Amazon EMR and its features.
  • Setting up and configuring Amazon EMR clusters.
  • Running big data processing jobs on EMR.
  • Integrating Amazon EMR with other AWS services.
  • Monitoring and optimizing EMR clusters.
  • Security considerations for EMR.
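Setting up an EMR cluster can also be scripted with boto3 (the AWS SDK for Python). The sketch below is illustrative only: the region, cluster name, instance types and log bucket are placeholders, and it assumes the default EMR service roles already exist in the account.

```python
import boto3

emr = boto3.client("emr", region_name="us-east-1")    # hypothetical region

response = emr.run_job_flow(
    Name="demo-emr-cluster",                          # hypothetical cluster name
    ReleaseLabel="emr-6.15.0",
    Applications=[{"Name": "Spark"}, {"Name": "Hive"}],
    LogUri="s3://my-demo-bucket/emr-logs/",           # hypothetical log bucket
    Instances={
        "InstanceGroups": [
            {"InstanceRole": "MASTER", "InstanceType": "m5.xlarge", "InstanceCount": 1},
            {"InstanceRole": "CORE",   "InstanceType": "m5.xlarge", "InstanceCount": 2},
        ],
        "KeepJobFlowAliveWhenNoSteps": True,
    },
    JobFlowRole="EMR_EC2_DefaultRole",
    ServiceRole="EMR_DefaultRole",
)
print("Cluster id:", response["JobFlowId"])
```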

● Introduction of HBase
● Comparison with traditional database
● HBase Data Model (Logical and Physical models)
● HBase Architecture
● Regions and Region Servers
● Partitions
● Compaction (Major and Minor)
● Shell Commands
● HBase using APIs
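A brief sketch of HBase access through an API, using the third-party happybase library over the HBase Thrift server; the host, table and column-family names are hypothetical and the table is assumed to already exist (for example, created from the HBase shell).

```python
import happybase

# Connect to a (hypothetical) HBase Thrift server
connection = happybase.Connection("localhost", port=9090)
table = connection.table("demo_users")     # hypothetical table with family 'info'

# Put a row: column names are 'family:qualifier', values are bytes
table.put(b"user-001", {b"info:name": b"Asha", b"info:city": b"Jaipur"})

# Get a single row back
row = table.row(b"user-001")
print(row[b"info:name"].decode())

# Scan a range of row keys
for key, data in table.scan(row_prefix=b"user-"):
    print(key, data)

connection.close()
```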

NoSQL

  • Introduction to NoSQL databases and their characteristics.
  • Types of NoSQL databases: Document-oriented, Key-Value, Column-Family, Graph.
  • Use cases for NoSQL databases.
  • MongoDB: A popular document-oriented NoSQL database.
  • Redis: A widely used key-value NoSQL database.
  • Cassandra: A column-family NoSQL database.
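As a concrete taste of a document-oriented NoSQL database from the list above, here is a minimal pymongo sketch; it assumes a local MongoDB server and uses made-up database and collection names.

```python
from pymongo import MongoClient

client = MongoClient("mongodb://localhost:27017")   # hypothetical local server
db = client["demo_db"]
students = db["students"]

# Insert a document (schema-less: each document is just a JSON-like dict)
students.insert_one({"name": "Asha", "course": "Data Engineering", "score": 88})

# Query documents with a filter
for doc in students.find({"score": {"$gte": 80}}):
    print(doc["name"], doc["score"])

# Update a document in place
students.update_one({"name": "Asha"}, {"$set": {"score": 91}})
```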

  • Understand the fundamentals of ETL (Extract, Transform, Load) processes.
  • Learn how to install and configure Talend.
  • Explore Talend’s interface and understand its key components.
  • Practice using Talend to extract data from different sources, perform transformations, and load it into target systems.

● Pre-requisites
● Introduction
● Architecture

● Installation and Configuration
● Repository
● Projects
● Metadata Connection
● Context Parameters
● Jobs / Joblets
● Components
● Important components
● Aggregation & working with Input & output data

● The Pseudo Live Project (PLP) program is primarily meant to handhold participants who are new to the technology. In PLP, more importance is given to “Process Adherence”.
● The following SDLC activities are carried out during PLP:
o Requirement Analysis
o Design (High-Level Design and Low-Level Design)
o Design of UTP (Unit Test Plan) with test cases
o Coding
o Code Review
o Testing
o Deployment
o Configuration Management
o Final Presentation

AWS Cloud

1. Design Resilient Architectures

  • AWS global infrastructure (for example, Availability Zones, AWS Regions, Amazon Route 53)
  • AWS managed services with appropriate use cases (for example, Amazon Comprehend, Amazon Polly)
  • Basic networking concepts (for example, route tables)
  • Disaster recovery (DR) strategies (for example, backup and restore, pilot light, warm standby, active-active failover, recovery point objective [RPO], recovery time objective [RTO])
  • Distributed design patterns
  • Failover strategies
  • Immutable infrastructure
  • Load balancing concepts (for example, Application Load Balancer)
  • Proxy concepts (for example, Amazon RDS Proxy)
  • Service quotas and throttling (for example, how to configure the service quotas for a workload in a standby environment)
  • Storage options and characteristics (for example, durability, replication)
  • Workload visibility (for example, AWS X-Ray)

Skills in:
  • Determining automation strategies to ensure infrastructure integrity
  • Determining the AWS services required to provide a highly available and/or fault-tolerant architecture across AWS Regions or Availability Zones
  • Identifying metrics based on business requirements to deliver a highly available solution
  • Implementing designs to mitigate single points of failure
  • Implementing strategies to ensure the durability and availability of data (for example, backups)
  • Selecting an appropriate DR strategy to meet business requirements
  • Using AWS services that improve the reliability of legacy applications and applications not built for the cloud (for example, when application changes are not possible)
  • Using purpose-built AWS services for workloads

2. Design High-Performing Architectures

  • Hybrid storage solutions to meet business requirements
  • Storage services with appropriate use cases (for example, Amazon S3, Amazon Elastic File System [Amazon EFS], Amazon Elastic Block Store [Amazon EBS])
  • Storage types with associated characteristics (for example, object, file, block)

Skills in:
  • Determining storage services and configurations that meet performance demands
  • Determining storage services that can scale to accommodate future needs
  • AWS compute services with appropriate use cases (for example, AWS Batch, Amazon EMR, Fargate)
  • Distributed computing concepts supported by AWS global infrastructure and edge services
  • Queuing and messaging concepts (for example, publish/subscribe)
  • Scalability capabilities with appropriate use cases (for example, Amazon EC2 Auto Scaling, AWS Auto Scaling)
  • Serverless technologies and patterns (for example, Lambda, Fargate)
  • The orchestration of containers (for example, Amazon ECS, Amazon EKS)

Skills in:
  • Decoupling workloads so that components can scale independently
  • Identifying metrics and conditions to perform scaling actions
  • Selecting the appropriate compute options and features (for example, EC2 instance types) to meet business requirements
  • Selecting the appropriate resource type and size (for example, the amount of Lambda memory) to meet business requirements
  • AWS global infrastructure (for example, Availability Zones, AWS Regions)
  • Caching strategies and services (for example, Amazon ElastiCache)
  • Data access patterns (for example, read-intensive compared with write-intensive)
  • Database capacity planning (for example, capacity units, instance types, Provisioned IOPS)
  • Database connections and proxies
  • Database engines with appropriate use cases (for example, heterogeneous migrations, homogeneous migrations)
  • Database replication (for example, read replicas)
  • Database types and services (for example, serverless, relational compared with non-relational, in-memory)

3. Design Cost-Optimized Architectures

  • Access options (for example, an S3 bucket with Requester Pays object storage)
  • AWS cost management service features (for example, cost allocation tags, multi-account billing)
  • AWS cost management tools with appropriate use cases (for example, AWS Cost Explorer, AWS Budgets, AWS Cost and Usage Report)
  • AWS storage services with appropriate use cases (for example, Amazon FSx, Amazon EFS, Amazon S3, Amazon EBS)
  • Backup strategies
  • Block storage options (for example, hard disk drive [HDD] volume types, solid state drive [SSD] volume types)
  • Data lifecycles
  • Hybrid storage options (for example, DataSync, Transfer Family, Storage Gateway)
  • Storage access patterns
  • Storage tiering (for example, cold tiering for object storage)
  • Storage types with associated characteristics (for example, object, file, block)
  • AWS cost management service features (for example, cost allocation tags, multi-account billing)
  • AWS cost management tools with appropriate use cases (for example, Cost Explorer, AWS Budgets, AWS Cost and Usage Report)
  • AWS global infrastructure (for example, Availability Zones, AWS Regions)
  • AWS purchasing options (for example, Spot Instances, Reserved Instances, Savings Plans)
  • Distributed compute strategies (for example, edge processing)
  • Hybrid compute options (for example, AWS Outposts, AWS Snowball Edge)
  • Instance types, families, and sizes (for example, memory optimized, compute optimized, virtualization)
  • Optimization of compute utilization (for example, containers, serverless computing, microservices)
  • Scaling strategies (for example, auto scaling, hibernation)
  • AWS cost management service features (for example, cost allocation tags, multi-account billing)
  • AWS cost management tools with appropriate use cases (for example, Cost Explorer, AWS Budgets, AWS Cost and Usage Report)
  • Caching strategies
  • Data retention policies
  • Database capacity planning (for example, capacity units)
  • Database connections and proxies
  • Database engines with appropriate use cases (for example, heterogeneous migrations, homogeneous migrations)
  • Database replication (for example, read replicas)
  • Database types and services (for example, relational compared with non-relational, Aurora, DynamoDB)
  • AWS cost management service features (for example, cost allocation tags, multi-account billing)
  • AWS cost management tools with appropriate use cases (for example, Cost Explorer, AWS Budgets, AWS Cost and Usage Report)
  • Load balancing concepts (for example, Application Load Balancer)
  • NAT gateways (for example, NAT instance costs compared with NAT gateway costs)
  • Network connectivity (for example, private lines, dedicated lines, VPNs)
  • Network routing, topology, and peering (for example, AWS Transit Gateway, VPC peering)
  • Network services with appropriate use cases (for example, DNS)

Note: Content may be subject to change by REGex as per requirement.

Extra Sessions

Additional sessions on Git, Linux, Docker, AWS basics, Jenkins and many more for all students.

Fee Structure

Indian Fee

Price: ₹59,999/- (Flat 75% off) => ₹25,000/-  
(Limited Period Special Offer)

International Fee

Price: $1200 (Flat 75% off) => $500
(Limited Period Special Offer)

Fee can be paid as No-Cost EMI @ 2,500/month

Cashback Policy

  • You will get your Unique Referral Code after successful paid registration.
  • You will get ₹2,000 cashback directly in your account for each paid registration made with your Unique Referral Code, on a monthly basis (after registrations for this program close).
  • For example: if we receive 10 paid registrations from your Unique Referral Code, you will receive ₹2,000 × 10 = ₹20,000 on a monthly basis.

Related Programs

Data Engineering Specialization with Cloud
AWS Cloud Specialization with DevOps
ML & DL Specialization with Python Django

Enroll Now

(Batches Start from 8th, 19th & 29th January 2024)

*It will help us to reach more
*Extra discount is applicable on one-time payment only. Seats may fill up or the price may increase at any time. No refund policy is available.*