17 April 2015

install hadoop

        $ brew install hadoop
        ==> Downloading http://www.apache.org/dyn/closer.cgi?path=hadoop/common/hadoop-2.6.0/h
        ==> Best Mirror http://mirrors.hust.edu.cn/apache/hadoop/common/hadoop-2.6.0/hadoop-2.
        ######################################################################## 100.0%
        ==> Caveats
        In Hadoop's config file:
          /usr/local/Cellar/hadoop/2.6.0/libexec/etc/hadoop/hadoop-env.sh,
          /usr/local/Cellar/hadoop/2.6.0/libexec/etc/hadoop/mapred-env.sh and
          /usr/local/Cellar/hadoop/2.6.0/libexec/etc/hadoop/yarn-env.sh
        $JAVA_HOME has been set to be the output of:
          /usr/libexec/java_home
        ==> Summary
        🍺  /usr/local/Cellar/hadoop/2.6.0: 6140 files, 307M, built in 8.9 minutes
  • hadoop will be installed in the directory /usr/local/Cellar/hadoop
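
  • to verify the install, check the version banner:

        $ hadoop version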

configuring hadoop

  1. create a soft link (see my post on the ln command for the difference between a soft link and a hard link)

         $ cd /usr/local
         $ ln -s Cellar/hadoop/2.6.0 hadoop
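
    • the link can be verified with readlink, which prints the target it points at:

        $ readlink /usr/local/hadoop
        > Cellar/hadoop/2.6.0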
    
  2. edit hadoop-env.sh

         $ cd hadoop/libexec/etc/hadoop/
         $ pico hadoop-env.sh
         # export HADOOP_OPTS="$HADOOP_OPTS -Djava.net.preferIPv4Stack=true"
         export HADOOP_OPTS="$HADOOP_OPTS -Djava.net.preferIPv4Stack=true -Djava.security.krb5.realm= -Djava.security.krb5.kdc="
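         # the empty krb5 realm/kdc values above silence the
         # "Unable to load realm info from SCDynamicStore" warning on os x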
    
  3. edit core-site.xml

         $ pico core-site.xml
    
         <?xml version="1.0" encoding="UTF-8"?>
         <?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
    
         <configuration>
             <property>
                 <name>hadoop.tmp.dir</name>
                 <value>/usr/local/hadoop/hdfs/tmp</value>
                 <description>A base for other temporary directories.</description>
             </property>
             <property>
                 <name>fs.default.name</name>
                 <value>hdfs://localhost:9000</value>
             </property>
         </configuration>
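
    • hadoop will create hadoop.tmp.dir on its own when hdfs is formatted, but creating it up front avoids permission surprises:

        $ mkdir -p /usr/local/hadoop/hdfs/tmp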
    
  4. edit mapred-site.xml

         $ pico mapred-site.xml
    
         <?xml version="1.0"?>
         <?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
         <configuration>
             <property>
                 <name>mapred.job.tracker</name>
                 <value>localhost:9010</value>
             </property>
         </configuration>
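
    • on a fresh install mapred-site.xml may not exist yet; hadoop ships a template it can be copied from:

        $ cp mapred-site.xml.template mapred-site.xml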
    
  5. edit hdfs-site.xml

         $ pico hdfs-site.xml
    
         <?xml version="1.0" encoding="UTF-8"?>
         <?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
         <configuration>
             <property>
                 <name>dfs.replication</name>
                 <value>1</value>
             </property>
         </configuration>
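
    • a replication factor of 1 is enough here, since a single-node setup has no other datanodes to replicate blocks to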
    
  6. create aliases

    1. create hstart and hstop

       $ cd
       $ pico .bash_profile
      
       alias hstart="/usr/local/hadoop/sbin/start-dfs.sh;/usr/local/hadoop/sbin/start-yarn.sh"
       alias hstop="/usr/local/hadoop/sbin/stop-yarn.sh;/usr/local/hadoop/sbin/stop-dfs.sh"
      
    2. and reload the profile so the aliases take effect

       $ source ~/.bash_profile
      
  7. before we can run hadoop we first need to format the hdfs

         $ hdfs namenode -format
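         # this is a one-time step; formatting again later would wipe
         # everything stored in hdfs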
    

ssh localhost

  1. check ssh keys

    • nothing needs to be done here if you have already generated ssh keys

    • to verify, just check that the ~/.ssh/id_rsa and ~/.ssh/id_rsa.pub files exist

    • if not, a passphrase-less key pair (hadoop needs one for non-interactive logins) can be generated using

        $ ssh-keygen -t rsa -P ""
      
  2. enable remote login

    • check remote login

        # system preferences -> sharing -> remote login
        $ sudo systemsetup -setremotelogin on
        Password:
        setremotelogin: remote login is already On.
        $ sudo systemsetup -setremotelogin off
        Do you really want to turn remote login off? If you do, you will lose this connection and can only turn it back on locally at the server (yes/no)? yes
        $ sudo systemsetup -setremotelogin on
      
    • authorize ssh keys

        $ cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
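        # ssh ignores authorized_keys if it is writable by anyone else
        $ chmod 600 ~/.ssh/authorized_keys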
      
    • try to login

        $ ssh localhost
        > last login: Fri Mar ...
        $ exit
      

running hadoop

  1. now we can run hadoop just by typing

         $ hstart
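
    • a quick way to confirm the daemons came up is jps, which ships with the jdk; NameNode, DataNode, SecondaryNameNode, ResourceManager and NodeManager should all be listed:

        $ jps
        > ...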
    
  2. and stopping using

         $ hstop
    

download examples

  1. examples: a hadoop-mapreduce-examples jar already ships with the distribution

  2. test them out using

         $ hadoop jar </path/to/hadoop-examples file> pi 10 100
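
    • with the homebrew layout from this post the examples jar typically lives under libexec/share/hadoop/mapreduce (the exact path and jar name below are assumptions, adjust to your version):

        $ hadoop jar /usr/local/hadoop/libexec/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.6.0.jar pi 10 100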
    

good to know

  1. hadoop web interfaces (default ports):

    • resource manager: http://localhost:8088
    • hdfs namenode: http://localhost:50070

errors

  1. failed to start namenode

         $ hdfs namenode
    
    • running the namenode in the foreground shows the actual error; if it complains that the namenode storage directory is missing or not formatted, format it (note this erases everything in hdfs):

        $ hadoop namenode -format
      
  2. no such file or directory

         $ hstart
         $ hdfs dfs -ls /
    
    • hdfs has no home directory for the current user yet, so we need to create the directory structure hadoop expects:

        $ whoami
        > spaceship
        $ hdfs dfs -mkdir -p /user/spaceship
        > ...
        $ hdfs dfs -ls
        > ...
        $ hdfs dfs -put book.txt
        > ...
        $ hdfs dfs -ls
        > ...
        > Found 1 items
      

