About The Program:
With the aim of building a healthy ecosystem aligned with industry standards, REGex Software brings you a Winter Training/Internship Program on “BigData”. We organize this Winter Training/Internship Program to improve the knowledge and skills of students and professionals, so that they can become experts in the field of BigData and land their dream job in software development at major MNCs.
REGex Software Services’s BigData course is a valuable resource for beginners and experts alike. This course will take you from the basics to advanced topics in Hadoop, HDFS, Hive, Apache Spark, Amazon EMR and more. If you are preparing for a coding interview, REGex introduces this course for you.
24*7 Mentorship Support available for all students to clear all of your doubts
Live sessions by expert trainers, with access to recorded sessions also available
Get a chance to work on industry-oriented projects to implement your learning
REGex provides internship/job opportunities to the best students in different companies.
1. Introduction to Python

Introduction to Python Programming
● Why do we need Python?
● Program structure in Python

Execution steps
● Interactive Shell
● Executable or script files
● User Interface or IDE
Data Types and Operations
● Numbers, Strings, List, Tuple, Dictionary
● Other Core Types
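The core types above can be illustrated with a short sketch (all names and values here are made up for illustration):

```python
# Quick tour of Python's core data types (illustrative values).
number = 42                                 # int
pi = 3.14                                   # float
name = "REGex"                              # str
langs = ["Python", "Scala"]                 # list: ordered and mutable
point = (10, 20)                            # tuple: ordered and immutable
course = {"topic": "BigData", "hours": 40}  # dict: key/value mapping

langs.append("HiveQL")      # lists can grow in place
print(course["topic"])      # dict lookup by key -> BigData
```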
Statements and Syntax in Python
● Assignments, Expressions and prints
● If tests and Syntax Rules
● While and For Loops
● Iterations and Comprehensions
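A small sketch of how a for loop with an if test relates to a list comprehension (the data is illustrative):

```python
# Collect squares of even numbers with an explicit loop...
squares = []
for n in range(5):
    if n % 2 == 0:
        squares.append(n * n)

# ...and the same result as a one-line comprehension.
squares_comp = [n * n for n in range(5) if n % 2 == 0]

print(squares)       # [0, 4, 16]
print(squares_comp)  # [0, 4, 16]
```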
Functions in Python
● Function definition and call
● Function Scope, Arguments
● Function Objects
● Anonymous Functions
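A minimal sketch of these ideas, with a named function, a default argument, a function object, and an anonymous (lambda) function (names are illustrative):

```python
# Function definition with a default argument.
def greet(name, greeting="Hello"):
    return f"{greeting}, {name}!"

say = greet               # functions are objects: bind to another name
double = lambda x: 2 * x  # anonymous function bound to a variable

print(say("REGex"))       # Hello, REGex!
print(double(21))         # 42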
Modules and Packages-Basic | |
● Module Creations and Usage ● Package Creation and Importing | |
Classes in Python | |
● Classes and instances ● Classes method calls | |
File Operations | |
● Opening a file ● Using Files ● Other File tools | |
Libraries | |
● Importing a library ● Math, Numpy | |
2 | LINUX |
Introduction to LINUX Operating System and Basic LINUX commands | |
● Introduction to LINUX Operating System and Basic LINUX commands ● Operating System ● Basic LINUX Commands | |
LINUX File System | |
● LINUX File System ● File Types ● File Permissions ● File Related Commands ● Filters o Simple Filters o Advanced Filters | |
Vi Editor | |
● Vi Editor ● Input Mode Commands ● Vi Editor – Save & Quit ● Cursor Movement Commands | |
Shell Programming | |
● Shell Variables ● Environmental Variables ● Shell script Commands ● Arithmetic Operations ● Command Substitution ● Command Line Arguments | |
3 | Data Warehouse & Modeling Concepts |
Business Intelligence | |
● Business Intelligence ● Need for Business Intelligence ● Terms used in BI ● Components of BI | |
General concept of Data Warehouse | |
● Data Warehouse ● History of Data Warehousing ● Need for Data Warehouse ● Data Warehouse Architecture ● Data Mining Works with DWH ● Features of Data warehouse ● Data Mart ● Application Areas | |
Dimensional modeling | |
● Dimension modeling ● Fact and Dimension tables ● Database schema ● Schema Design for Modeling ● Star, SnowFlake ● Fact Constellation schema ● Use of Data mining ● Data mining and Business Intelligence ● Types of data used in Data mining ● Data mining applications ● Data mining products | |
4 | Big Data Platform |
Big Data Overview | |
● What’s Big Data? ● Big Data: 3V’s ● Explosion of Data ● What’s driving Big Data ● Applications for Big Data Analytics ● Big Data Use Cases ● Benefits of Big Data | |
Hadoop(HDFS) | |
● History of Hadoop ● Distributed File System ● What is Hadoop ● Characteristics of Hadoop ● RDBMS Vs Hadoop ● Hadoop Generations ● Components of Hadoop ● HDFS Blocks and Replication ● How Files Are Stored ● HDFS Commands ● Hadoop Daemons | |
Hadoop 2.0 & YARN | |
● Difference between Hadoop 1.0 and 2.0 ● New Components in Hadoop 2.x ● YARN/MRv2 ● Configuration Files in Hadoop 2.x ● Major Hadoop Distributors/Vendors ● Cluster Management & Monitoring ● Hadoop Downloads |
S. No. | Topic |
---|---|
Map Reduce | |
● What is distributed computing ● Introduction to Map Reduce ● Map Reduce components ● How MapReduce works ● Word Count execution ● Suitable & unsuitable use cases for MapReduce | |
Sqoop | |
● Architecture ● Basic Syntax ● Import data from a table in a relational database into HDFS ● import the results of a query from a relational database into HDFS ● Import a table from a relational database into a new or existing Hive table ● Insert or update data from HDFS into a table in a relational database | |
Hive Programming | |
● Define a Hive-managed table ● Define a Hive external table ● Define a partitioned Hive table ● Define a bucketed Hive table ● Define a Hive table from a select query ● Define a Hive table that uses the ORCFile format ● Create a new ORCFile table from the data in an existing non-ORCFile Hive table ● Specify the delimiter of a Hive table ● Load data into a Hive table from a local directory ● Load data into a Hive table from an HDFS directory ● Load data into a Hive table as the result of a query ● Load a compressed data file into a Hive table ● Update a row in a Hive table ● Delete a row from a Hive table ● Insert a new row into a Hive table ● Join two Hive tables ● Use a subquery within a Hive query | |
Scala
| |
● An overview of functional programming ● Why Scala? ● REPL ● Working with functions ● objects and inheritance ● Working with lists and collections ● Abstract classes | |
5 | Spark in Memory |
SPARK Basics | |
● What is Spark? ● History of Spark ● Spark Architecture ● Spark Shell | |
Working with RDDs in Spark | |
● RDD Basics ● Creating RDDs in Spark ● RDD Operations ● Passing Functions to Spark ● Transformations and Actions in Spark ● Spark RDD Persistence | |
Working with Key/Value Pairs | |
● Pair RDDs ● Transformations on Pair RDDs ● Actions Available on Pair RDDs ● Data Partitioning (Advanced) ● Loading and Saving the Data | |
Spark Advanced | |
● Accumulators ● Broadcast Variables ● Piping to External Programs ● Numeric RDD Operations ● Spark Runtime Architecture ● Deploying Applications | |
SPARK with SQL | |
● Spark SQL Overview ● Spark SQL Architecture | |
DataFrame : | |
● What are dataframe ● Manipulating Dataframes ● Reading new data from different file format ● Group By & Aggregations functions | |
Spark streaming | |
● What is Spark streaming? ● Spark Streaming example | |
6 | No SQL Databases |
Introduction to HBASE | |
● Introduction of HBase ● Comparison with traditional database ● HBase Data Model (Logical and Physical models) ● Hbase Architecture ● Regions and Region Servers ● Partitions ● Compaction (Major and Minor) ● Shell Commands ● HBase using APIs | |
7 | Talend |
Talend Basics | |
● Pre-requisites ● Introduction ● Architecture | |
Talend Data Integration | |
● Installation and Configuration ● Repository ● Projects ● Metadata Connection ● Context Parameters ● Jobs / Joblets ● Components ● Important components ● Aggregation & working with Input & output data | |
8 | Pseudo Live Project (PLP) |
● Pseudo Live Project (PLP) program is primarily to handhold participants who are fresh
into the technology. In PLP, more importance given to “Process Adherence” ● The following SDLC activities are carried out during PLP o Requirement Analysis o Design ( High Level Design and Low Level Design) o Design of UTP(Unit Test Plan) with test cases o Coding o Code Review o Testing o Deployment o Configuration Management o Final Presentation |
WhatsApp us