Introduction
Hadoop is an open-source software framework for distributed computing. It is reliable, scalable and well suited to big data workloads. A Hadoop cluster commonly consists of a NameNode, a Secondary NameNode, a ResourceManager and DataNodes. The NameNode stores the filesystem metadata, such as the mapping of files to blocks, in a file called fsimage, and records later changes in an edit log. The Secondary NameNode is a helper for the NameNode. It periodically performs a checkpoint: it fetches the fsimage and the edit log from the NameNode, merges the logged changes into the fsimage to produce an up-to-date image, and sends the new fsimage back to the NameNode.
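The checkpoint described above can be pictured with a small toy model. This is not how HDFS actually stores its metadata (the real fsimage and edit log are binary files); the sketch assumes a hypothetical plain-text image with one path per line and a hypothetical edit log of "ADD <path>" and "DEL <path>" lines, and replays the log onto the image to produce a new one, in the same spirit as the Secondary NameNode merging edits into fsimage. The function name checkpoint is also illustrative only.

```shell
#!/usr/bin/env bash
# Toy model of a checkpoint (NOT the real HDFS file formats):
#   image file: one path per line (hypothetical format)
#   edit log:   "ADD <path>" or "DEL <path>" per line, oldest first (hypothetical)
# Paths are assumed not to contain spaces.

# checkpoint IMAGE EDITS OUT: replay EDITS onto IMAGE and write the merged image to OUT
checkpoint() {
  local image="$1" edits="$2" out="$3"
  awk '
    FILENAME == ARGV[1] { if ($0 != "") present[$0] = 1; next }  # load the current image
    $1 == "ADD" { present[$2] = 1 }                                # apply edits in log order
    $1 == "DEL" { delete present[$2] }
    END { for (p in present) print p }                             # emit the merged image
  ' "$image" "$edits" | LC_ALL=C sort > "$out"
}
```

For example, checkpoint fsimage edits fsimage.new writes the merged image to fsimage.new while leaving the old image untouched. In a real cluster, how often the Secondary NameNode checkpoints is configured in hdfs-site.xml, for instance through the dfs.namenode.checkpoint.period property (in seconds).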
Background
If you have a number of files that need version control, you may have heard of Git. Git is a mature version control system used widely around the world. A repository can be stored online using GitHub, which offers free public repositories; private repositories used to require a paid plan, although GitHub now offers them for free as well. Another option is to use a local Git repository together with Amazon Web Services (AWS). Amazon Simple Storage Service (S3) is an online storage service that stores your data and makes it accessible from anywhere with an internet connection.
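One possible way to combine a local Git repository with S3 is sketched below; it is not the only approach. It assumes Git and the AWS CLI are installed and that AWS credentials are configured. The function name backup_repo_to_s3, the bucket name and the DRY_RUN switch are hypothetical. The repository is packed with git bundle into a single file, which is then copied to S3 with aws s3 cp.

```shell
#!/usr/bin/env bash
# Hypothetical helper: back up a local Git repository to S3 as one bundle file.
# Assumes Git and (unless DRY_RUN=1) the AWS CLI with configured credentials.
# Prints the path of the local bundle file on success.

backup_repo_to_s3() {
  local repo_dir="$1" bucket="$2"
  local name bundle dest
  name="$(basename "$(cd "$repo_dir" && pwd)")"
  bundle="$(mktemp -d)/${name}.bundle"
  dest="s3://${bucket}/${name}.bundle"

  # Pack every branch and tag (--all) plus the objects they need into one file
  git -C "$repo_dir" bundle create "$bundle" --all || return 1

  if [ "${DRY_RUN:-0}" = 1 ]; then
    # Dry run: show the upload command instead of contacting AWS
    echo "would run: aws s3 cp $bundle $dest" >&2
  else
    aws s3 cp "$bundle" "$dest" >&2 || return 1
  fi
  printf '%s\n' "$bundle"
}
```

For example, backup_repo_to_s3 ./myproject my-git-backups would upload myproject.bundle to the (placeholder) bucket my-git-backups. A bundle keeps the backup to a single S3 object, and after downloading it you can clone or fetch from it like any other Git remote.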
Cron is a system daemon that runs scheduled jobs, called cron jobs, in the background. Cron jobs are a very helpful tool for system administrators who need to carry out tasks automatically. Before setting up a cron job, you need to install cron.
Install Cron on Debian Linux
Log in to a Linux machine running Debian and execute the following commands to install cron:
    sudo apt-get update
    sudo apt-get install cron
After cron is successfully installed, you should be able to set up tasks.
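Once cron is installed, a task is scheduled by adding a line to a crontab: five time fields (minute, hour, day of month, month, day of week) followed by the command to run. The usual interactive way is crontab -e. The hypothetical helper below instead adds an entry from a script, reading the current crontab with crontab -l and reinstalling it through crontab -, and it skips entries that are already present. The function name, schedule and script path are placeholders.

```shell
#!/usr/bin/env bash
# Hypothetical helper: append one entry to the current user's crontab,
# skipping it if an identical line is already installed.
# Crontab line format: minute hour day-of-month month day-of-week command

add_cron_job() {
  local schedule="$1" cmd="$2"
  local entry="$schedule $cmd"
  local current
  # "crontab -l" fails when the user has no crontab yet; treat that as empty
  current="$(crontab -l 2>/dev/null)" || current=""

  # -x matches whole lines, -F treats the entry as a fixed string (no regex)
  if printf '%s\n' "$current" | grep -qxF -- "$entry"; then
    echo "already scheduled: $entry"
    return 0
  fi

  # Reinstall the existing entries followed by the new one; "crontab -" reads stdin
  { [ -n "$current" ] && printf '%s\n' "$current"; printf '%s\n' "$entry"; } | crontab -
}
```

For example, add_cron_job "0 2 * * *" /usr/local/bin/backup.sh would run the (hypothetical) backup script every day at 02:00. You can confirm that the cron service itself is running with systemctl status cron.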