Big Data Lab
Objectives:
In today’s world, where data is the new oil, big data can be thought of as a system that takes in crude oil and refines it into fuel. Big Data refers to large volumes of structured, semi-structured, or unstructured data. The data pool is so voluminous that it becomes difficult for an organization to manage and process it using traditional databases and software techniques. The term therefore covers not only the enormous amount of available data but also the entire process of gathering, storing, and analysing it. Today’s business enterprises are data-driven, and without data no enterprise can gain a competitive advantage. Big Data is now so widespread that it is easier to name the companies that are not using it than those that are.
The Big Data Lab was set up to educate students in all aspects of large-scale and distributed information systems and to prepare them for highly skilled jobs in emerging, fast-growing fields such as cloud computing, health care informatics, finance, data integration, and data analytics.
Setting up a Big Data Lab involves a combination of hardware, software, and networking components. The lab's hardware is as follows:
Systems: 20 × PLEXTEK DESKTOP MQ-765
Processor: Intel Core i7 9700 (3 GHz), B660 chipset, 12 cores per processor
RAM: 16 GB DDR4 at 2666 MHz, expandable to 64 GB using spare DIMM slots
Storage: 1 TB (1000 GB) 7200 RPM HDD and 256 GB SSD
Graphics: 2 GB NVIDIA® GeForce GT 710, plus integrated Intel HD graphics
Peripherals: optical mouse, keyboard, and 21.5" LED-backlit monitor (1920 × 1080 resolution)
The software installed in the lab is as follows:
Hadoop Distribution:
Cloudera: Cloudera provides distributions of various open-source big data technologies, including Apache Hadoop, Apache Spark, Apache Hive, Apache HBase, Apache Kafka, and others. These open-source components are freely available and can be used without any licensing fees.
Apache Hadoop: Apache Hadoop is free and open-source software distributed under the Apache License 2.0.
Hadoop Ecosystem Tools: Various tools that complement Hadoop are installed, such as:
Practical Project Experiments
1. Building chatbots
2. Credit card fraud detection
3. Fake news detection
4. Forest fire prediction
5. Classifying breast cancer
6. Driver drowsiness detection
7. Recommender systems
8. Sentiment analysis
9. Exploratory data analysis
10. Gender detection and age detection
11. Recognizing speech emotion
12. Customer segmentation
Coordinator:
Dr Sharmistha Bhattacharjee
Scientist-D
sharmisthab[at]nielit[dot]gov[dot]in