$ yum -y install openssh
$ su hadoop
$ passwd hadoop
$ ssh-keygen -t dsa -P '' -f ~/.ssh/id_dsa
Generating public/private dsa key pair.
Enter file in which to save the key (/home/hadoop/.ssh/id_dsa):
Enter passphrase (empty for no passphrase):
$ cat ~/.ssh/id_dsa.pub >> ~/.ssh/authorized_keys
$ chmod 600 ~/.ssh/authorized_keys
$ ssh hadoop@localhost
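The key-generation and authorization steps above can be sketched in an isolated scratch directory. This sketch uses ed25519 rather than DSA, since recent OpenSSH releases no longer accept DSA keys; the directory and file names are only stand-ins for the real ~/.ssh paths.

```shell
# Sketch of the passwordless-SSH key setup, run in a scratch directory.
# ed25519 is used because recent OpenSSH builds reject DSA keys.
DIR=$(mktemp -d)
ssh-keygen -t ed25519 -N '' -f "$DIR/id_ed25519" -q   # empty passphrase
cat "$DIR/id_ed25519.pub" >> "$DIR/authorized_keys"   # authorize the new key
chmod 600 "$DIR/authorized_keys"                      # sshd requires tight permissions
```

The chmod matters: sshd silently ignores an authorized_keys file that is group- or world-writable.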
Pick a good home for Hadoop. Here I put it under /opt; download it and extract:
$ cd /opt
$ wget 'http://apache.ntu.edu.tw/hadoop/core/hadoop-0.20.2/hadoop-0.20.2.tar.gz'
$ tar xzvf hadoop-0.20.2.tar.gz
$ mv hadoop-0.20.2 hadoop
Edit hadoop-env.sh and add the following at the very end of the file:
$ vim /opt/hadoop/conf/hadoop-env.sh
export JAVA_HOME=/usr/java/jdk1.x.x-xx
Check that Hadoop runs:
$ /opt/hadoop/bin/hadoop
Usage: hadoop [--config confdir] COMMAND
where COMMAND is one of:
...(output omitted)
Test Local (Standalone) Mode. In this mode Hadoop runs as a single, non-distributed Java process with no daemons started (it is pseudo-distributed mode that runs each daemon in a separate Java process).
$ cd /opt/hadoop
$ mkdir input
$ cp conf/*.xml input
$ bin/hadoop jar hadoop-*-examples.jar grep input output 'dfs[a-z.]+'
10/02/22 10:47:42 INFO jvm.JvmMetrics: Initializing JVM Metrics with processName
...(output omitted)
$ cat /opt/hadoop/output/*
1 dfsadmin
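The example job's result can be cross-checked with an ordinary shell pipeline that applies the same regex and counts the matches. The sample input below is fabricated for illustration; against the real conf/*.xml files you would point grep at those instead.

```shell
# Shell equivalent of the grep example job: extract every match of the
# regex from the input files, then count occurrences of each match.
IN=$(mktemp -d)   # fabricated sample input for illustration
printf '<name>dfs.replication</name>\n<name>dfsadmin</name>\n' > "$IN/sample.xml"
grep -Eho 'dfs[a-z.]+' "$IN"/*.xml | sort | uniq -c | sort -rn
```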
Edit the config files:
$ vim /opt/hadoop/conf/core-site.xml
<configuration>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://localhost:9000</value>
  </property>
</configuration>
$ vim /opt/hadoop/conf/hdfs-site.xml
<configuration>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
</configuration>
$ vim /opt/hadoop/conf/mapred-site.xml
<configuration>
  <property>
    <name>mapred.job.tracker</name>
    <value>localhost:9001</value>
  </property>
</configuration>
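The three edits above can also be scripted. The sketch below writes the same one-property files; CONF defaults to a scratch directory for a dry run, so point it at /opt/hadoop/conf to apply the changes for real.

```shell
# Sketch: generate the three pseudo-distributed config files from a script
# instead of editing each one by hand. CONF defaults to a scratch directory;
# set CONF=/opt/hadoop/conf to write the real files.
CONF=${CONF:-$(mktemp -d)}
write_prop() {   # $1=file  $2=property name  $3=value
  printf '<configuration><property><name>%s</name><value>%s</value></property></configuration>\n' \
    "$2" "$3" > "$CONF/$1"
}
write_prop core-site.xml   fs.default.name    hdfs://localhost:9000
write_prop hdfs-site.xml   dfs.replication    1
write_prop mapred-site.xml mapred.job.tracker localhost:9001
```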
$ /opt/hadoop/bin/start-all.sh
namenode running as process 17390. Stop it first.
localhost: datanode running as process 17514. Stop it first.
...(output omitted)
The "Stop it first" messages mean daemons from an earlier run are still up. On a first run against a fresh install, format HDFS once with $ /opt/hadoop/bin/hadoop namenode -format before starting the daemons.
Browse the web administration interfaces and start testing:
NameNode:http://localhost:50070/
JobTracker:http://localhost:50030/
Copy files into the distributed file system:
$ /opt/hadoop/bin/hadoop fs -put conf input
Run the example jar to verify everything works:
$ /opt/hadoop/bin/hadoop jar hadoop-*-examples.jar grep input output 'dfs[a-z.]+'
10/02/22 11:10:07 INFO mapred.FileInputFormat: Total input paths to process : 13
10/02/22 11:10:07 INFO mapred.JobClient: Running job: job_201002221103_0001
...(output omitted)
Copy the files from the distributed file system back to the local file system to inspect the result:
$ /opt/hadoop/bin/hadoop fs -get output output
$ cat output/*
cat: output/output: Is a directory
1 dfsadmin
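The "Is a directory" error happens because cat output/* hits a subdirectory inside the fetched folder (here the HDFS output directory landed inside the pre-existing local output/ from the standalone test). Catting only regular files avoids it; the directory layout below is made up purely to illustrate.

```shell
# Sketch: cat output/* fails on any subdirectory inside the fetched folder;
# cat only regular files (recursively) instead. Layout is fabricated.
OUT=$(mktemp -d)
mkdir -p "$OUT/output/_logs"          # hypothetical nested directory
echo '1 dfsadmin' > "$OUT/output/part-00000"
find "$OUT" -type f -exec cat {} +    # prints only file contents, no errors
```

Alternatively, the result can be read straight off HDFS with /opt/hadoop/bin/hadoop fs -cat output/*, skipping the local copy entirely.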